cwltool


:Name: cwltool
:Version: 1.0.20170828135420
:Home page: https://github.com/common-workflow-language/cwltool
:Summary: Common workflow language reference implementation
:Upload time: 2017-08-28 13:55:47
:Author: Common workflow language working group
:Requirements: requests (>=2.4.3), ruamel.yaml (>=0.12.4, <0.15), rdflib (==4.2.2),
  rdflib-jsonld (==0.4.0), shellescape (==3.4.1), schema-salad (>=2.6, <3), typing (==3.5.3)

==================================================================
Common Workflow Language tool description reference implementation
==================================================================

CWL conformance tests: |Build Status| Travis CI: |Unix Build Status|

.. |Unix Build Status| image:: https://img.shields.io/travis/common-workflow-language/cwltool/master.svg?label=unix%20build
   :target: https://travis-ci.org/common-workflow-language/cwltool

This is the reference implementation of the Common Workflow Language.  It is
intended to be feature complete and to provide comprehensive validation of CWL
files, as well as other tools related to working with CWL.

It is written and tested for Python 2.7 and 3.x (x = 3, 4, 5, 6).

The reference implementation consists of two packages.  The ``cwltool`` package
is the primary one: it provides the ``cwltool`` Python module, which contains the
reference implementation, and a console executable of the same name.

The ``cwlref-runner`` package is optional and provides an additional entry point
under the alias ``cwl-runner``, which is the implementation-agnostic name for the
default CWL interpreter installed on a host.

Install
-------

It is highly recommended to set up a virtual environment before installing ``cwltool``:

.. code:: bash

  virtualenv -p python2 venv   # Create a virtual environment, can use `python3` as well
  source venv/bin/activate     # Activate environment before installing `cwltool`

1. Install the official package from PyPI (this also installs the ``cwltool`` package):

.. code:: bash

  pip install cwlref-runner

If installing alongside another CWL implementation, install only ``cwltool``:

.. code:: bash

  pip install cwltool

2. To install from source:

.. code:: bash

  git clone https://github.com/common-workflow-language/cwltool.git # clone cwltool repo
  cd cwltool         # Switch to source directory
  pip install .      # Install `cwltool` from source
  cwltool --version  # Check if the installation works correctly

Remember, if co-installing multiple CWL implementations then you need to
maintain which implementation ``cwl-runner`` points to via a symbolic file
system link or `another facility <https://wiki.debian.org/DebianAlternatives>`_.

Running tests locally
---------------------

-  Running basic tests ``(/tests)``:

We use `tox <https://github.com/common-workflow-language/cwltool/tree/master/tox.ini>`_
to run various tests in all supported Python environments.
You can run the test suite by simply running the following in the terminal:
``pip install tox; tox``

A list of all environments can be seen using:
``tox --listenvs``
and a specific test environment can be run using:
``tox -e <env name>``

-  Running the entire suite of CWL conformance tests:

The GitHub repository for the CWL specifications contains a script that tests a CWL
implementation against a wide array of valid CWL files using the `cwltest <https://github.com/common-workflow-language/cwltest>`_
program.

Instructions for running these tests can be found in the Common Workflow Language Specification repository at https://github.com/common-workflow-language/common-workflow-language/blob/master/CONFORMANCE_TESTS.md

Run on the command line
-----------------------

Simple command::

  cwl-runner [tool-or-workflow-description] [input-job-settings]

Or, if you have multiple CWL implementations installed and want to override
the default ``cwl-runner``, use::

  cwltool [tool-or-workflow-description] [input-job-settings]

Use with boot2docker
--------------------
boot2docker runs Docker inside a virtual machine and only mounts ``/Users``
on it. The default behavior of cwltool is to create temporary directories under e.g.
``/Var``, which is not accessible to Docker containers.

To run CWL successfully with boot2docker you need to set the ``--tmpdir-prefix``
and ``--tmp-outdir-prefix`` to somewhere under ``/Users``::

    $ cwl-runner --tmp-outdir-prefix=/Users/username/project --tmpdir-prefix=/Users/username/project wc-tool.cwl wc-job.json

.. |Build Status| image:: https://ci.commonwl.org/buildStatus/icon?job=cwltool-conformance
   :target: https://ci.commonwl.org/job/cwltool-conformance/
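
When launching runs from a script on a boot2docker host, the two prefix options
can be assembled programmatically. The following is a minimal sketch, assuming a
helper named ``boot2docker_command`` (a hypothetical name, not part of cwltool)
that only builds the argument list for the command shown above:

.. code:: python

  def boot2docker_command(tool, job, project_dir, runner="cwl-runner"):
      """Build a command line whose temporary directories live under /Users."""
      # boot2docker only mounts /Users, so refuse prefixes located elsewhere.
      if not project_dir.startswith("/Users/"):
          raise ValueError("project_dir must be under /Users: %r" % project_dir)
      return [
          runner,
          "--tmp-outdir-prefix=" + project_dir,
          "--tmpdir-prefix=" + project_dir,
          tool,
          job,
      ]

  cmd = boot2docker_command("wc-tool.cwl", "wc-job.json", "/Users/username/project")

The resulting list can be handed to ``subprocess.check_call(cmd)`` to start the run.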

Tool or workflow loading from remote or local locations
-------------------------------------------------------

``cwltool`` can run tool and workflow descriptions on both local and remote
systems via its support for HTTP[S] URLs.

Input job files and Workflow steps (via the ``run`` directive) can reference CWL
documents using absolute or relative local filesystem paths. If a relative path
is referenced and that document isn't found in the current directory, then the
locations listed in the following section of the specification are searched:
http://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem
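
As a rough illustration of such a lookup, the sketch below lists candidate paths
in a ``commonwl`` directory under the XDG data locations. The function name
``candidate_paths``, the use of ``XDG_DATA_HOME`` and ``XDG_DATA_DIRS``, their
defaults, and the search order are all assumptions made for this example; the
linked section of the specification remains the authoritative description.

.. code:: python

  import os

  def candidate_paths(name, environ=None):
      """List places a relative CWL document name might be looked up (assumed order)."""
      environ = os.environ if environ is None else environ
      # Assumed default for XDG_DATA_HOME: ~/.local/share
      home_share = os.path.join(os.path.expanduser("~"), ".local", "share")
      shares = [environ.get("XDG_DATA_HOME", home_share)]
      # Assumed default for XDG_DATA_DIRS: /usr/local/share and /usr/share
      shares.extend(environ.get("XDG_DATA_DIRS", "/usr/local/share:/usr/share").split(":"))
      return [os.path.join(share, "commonwl", name) for share in shares if share]

  paths = candidate_paths("echo.cwl", {"XDG_DATA_HOME": "/home/me/.local/share"})

A caller would then pick the first entry of ``paths`` for which ``os.path.exists`` is true.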


Use with GA4GH Tool Registry API
--------------------------------

cwltool can launch tools directly from `GA4GH Tool Registry API`_ endpoints.

By default, cwltool searches https://dockstore.org/ .  Use ``--add-tool-registry`` to add other registries to the search path.

For example ::

  cwltool --non-strict quay.io/collaboratory/dockstore-tool-bamstats:master test.json

and (defaults to latest when a version is not specified) ::

  cwltool --non-strict quay.io/collaboratory/dockstore-tool-bamstats test.json

For this example, grab the test.json (and input file) from https://github.com/CancerCollaboratory/dockstore-tool-bamstats

.. _`GA4GH Tool Registry API`: https://github.com/ga4gh/tool-registry-schemas

Import as a module
------------------

Add

.. code:: python

  import cwltool

to your script.

The easiest way to use cwltool to run a tool or workflow from Python is to use a ``Factory``:

.. code:: python

  import cwltool.factory
  fac = cwltool.factory.Factory()

  echo = fac.make("echo.cwl")
  result = echo(inp="foo")

  # result["out"] == "foo"

Leveraging SoftwareRequirements (Beta)
--------------------------------------

CWL tools may be decorated with ``SoftwareRequirement`` hints that cwltool
may in turn use to resolve to packages in various package managers or
dependency management systems such as `Environment Modules
<http://modules.sourceforge.net/>`__.

Using ``SoftwareRequirement`` hints with cwltool requires an optional
dependency, so be sure to specify the ``deps`` modifier when
installing cwltool. For instance::

  $ pip install 'cwltool[deps]'

Installing cwltool in this fashion enables several new command line options.
The most general of these options is ``--beta-dependency-resolvers-configuration``.
This option allows one to specify a dependency resolvers configuration file.
This file may be written in either XML or YAML and describes the
plugins to enable in order to "resolve" ``SoftwareRequirement`` dependencies.

To discuss some of these plugins and how to configure them, first consider the
following ``hint`` definition for an example CWL tool.

.. code:: yaml

  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93

Now imagine deploying cwltool on a cluster with Software Modules installed
and that a ``seqtk`` module is available at version ``r93``. This means cluster
users likely won't have the ``seqtk`` binary on their ``PATH`` by default, but after
sourcing this module with the command ``modulecmd sh load seqtk/r93``, ``seqtk`` is
available on the ``PATH``. A simple dependency resolvers configuration file, called
``dependency-resolvers-conf.yml`` for instance, that would enable cwltool to source
the correct module environment before executing the above tool would simply be:

.. code:: yaml

  - type: module

The outer list indicates that one plugin is being enabled; the plugin parameters are
defined as a dictionary for this one list item. The only required parameter
for the plugin above is ``type``, which defines the plugin type. This parameter
is required for all plugins. The available plugins and the parameters
available for each are documented (incompletely) `here
<https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__.
Unfortunately, this documentation is in the context of Galaxy tool
``requirement``\ s instead of CWL ``SoftwareRequirement``\ s, but the concepts map fairly directly.
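
The list-of-dictionaries shape described above can be checked with a few lines of
Python. This is only a sketch of that structure: ``check_resolvers_conf`` is a
hypothetical helper, not a cwltool or galaxy-lib function, and the configuration
is written as a Python literal rather than parsed from YAML.

.. code:: python

  def check_resolvers_conf(conf):
      """Return the plugin types in order, insisting that every entry sets 'type'."""
      if not isinstance(conf, list):
          raise ValueError("top level must be a list of plugin entries")
      types = []
      for index, entry in enumerate(conf):
          if not isinstance(entry, dict) or "type" not in entry:
              raise ValueError("entry %d is missing the required 'type' parameter" % index)
          types.append(entry["type"])
      return types

  # Python equivalent of the one-line YAML file shown above.
  conf = [{"type": "module"}]
  plugin_types = check_resolvers_conf(conf)
  # plugin_types == ["module"]

Each additional list item would enable one more plugin, in the order given.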

cwltool is distributed with an example of such a seqtk tool and a corresponding
sample job. It could be executed from the cwltool root, using a dependency resolvers
configuration file such as the one above, with the command::

  cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \
      tests/seqtk_seq.cwl \
      tests/seqtk_seq_job.json

This example demonstrates both that cwltool can leverage
existing software installations and also handle workflows with dependencies
on different versions of the same software and libraries. However, the above
example requires an existing module setup, so it is impossible to test it
"out of the box" with cwltool. For a more isolated test that demonstrates all
the same concepts, the resolver plugin type ``galaxy_packages`` can be used.

"Galaxy packages" are a lighter weight alternative to Environment Modules that are
really just defined by a way to lay out directories into packages and versions
to find little scripts that are sourced to modify the environment. They have
been used for years in the Galaxy community to adapt Galaxy tools to cluster
environments but require neither knowledge of Galaxy nor any special tools to
set up. These should work just fine for CWL tools.

The cwltool source code repository's test directory is set up with a very simple
directory that defines a set of "Galaxy packages" (but really just defines one
package named ``random-lines``). The directory layout is simply::

  tests/test_deps_env/
    random-lines/
      1.0/
        env.sh

If the ``galaxy_packages`` plugin is enabled and pointed at the
``tests/test_deps_env`` directory in cwltool's root and a ``SoftwareRequirement``
such as the following is encountered:

.. code:: yaml

  hints:
    SoftwareRequirement:
      packages:
      - package: 'random-lines'
        version:
        - '1.0'

Then cwltool will simply find that ``env.sh`` file and source it before executing
the corresponding tool. That ``env.sh`` script is only responsible for modifying
the job's ``PATH`` to add the required binaries.
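
The lookup implied by that directory layout is simple enough to sketch. The
function below is an illustration under the assumption that the script is always
found at ``<base_path>/<package>/<version>/env.sh``; ``find_env_script`` is a
hypothetical name and this is not galaxy-lib's actual implementation.

.. code:: python

  import os

  def find_env_script(base_path, package, version):
      """Return <base_path>/<package>/<version>/env.sh if it exists, else None."""
      candidate = os.path.join(base_path, package, version, "env.sh")
      return candidate if os.path.isfile(candidate) else None

  # From cwltool's root directory this would look for the bundled example package.
  script = find_env_script("tests/test_deps_env", "random-lines", "1.0")

A ``None`` result would mean this resolver cannot satisfy the requirement.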

This is a full example that works since resolving "Galaxy packages" has no
external requirements. Try it out by executing the following command from cwltool's
root directory::

  cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \
      tests/random_lines.cwl \
      tests/random_lines_job.json

The resolvers configuration file in the above example was simply:

.. code:: yaml

  - type: galaxy_packages
    base_path: ./tests/test_deps_env

It is possible that the ``SoftwareRequirement``\ s in a given CWL tool will not
match the module names for a given cluster. Such requirements can be re-mapped
to specific deployed packages and/or versions using another file specified with
the resolver plugin parameter ``mapping_files``. We will
demonstrate this using ``galaxy_packages``, but the concepts apply equally well
to Environment Modules or Conda packages (described below), for instance.

So consider the resolvers configuration file
(``tests/test_deps_env_resolvers_conf_rewrite.yml``):

.. code:: yaml

  - type: galaxy_packages
    base_path: ./tests/test_deps_env
    mapping_files: ./tests/test_deps_mapping.yml

And the corresponding mapping configuration file (``tests/test_deps_mapping.yml``):

.. code:: yaml

  - from:
      name: randomLines
      version: 1.0.0-rc1
    to:
      name: random-lines
      version: '1.0'

This says that if cwltool encounters a requirement of ``randomLines`` at version
``1.0.0-rc1`` in a tool, it should rewrite it for our specific plugin as ``random-lines`` at
version ``1.0``. cwltool has a test tool called ``random_lines_mapping.cwl``
that contains such a source ``SoftwareRequirement``. To try out this example with
mapping, execute the following command from the cwltool root directory::

  cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \
      tests/random_lines_mapping.cwl \
      tests/random_lines_job.json
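
The effect of a mapping rule can be illustrated in plain Python. This is a sketch
of the rewrite described above, not the galaxy-lib code: ``apply_mapping`` is a
hypothetical helper, and treating a rule without a ``version`` as matching any
version is an assumption of this example.

.. code:: python

  def apply_mapping(rules, name, version):
      """Rewrite (name, version) using the first rule whose 'from' entry matches."""
      for rule in rules:
          source, target = rule["from"], rule["to"]
          # Assumption: a rule that omits 'version' matches every version.
          if source["name"] == name and source.get("version", version) == version:
              return target["name"], target.get("version", version)
      return name, version

  # Python equivalent of tests/test_deps_mapping.yml as shown above.
  rules = [{
      "from": {"name": "randomLines", "version": "1.0.0-rc1"},
      "to": {"name": "random-lines", "version": "1.0"},
  }]
  resolved = apply_mapping(rules, "randomLines", "1.0.0-rc1")
  # resolved == ("random-lines", "1.0")

Requirements that match no rule pass through unchanged, so the mapping file only
needs entries for names that differ from the deployed ones.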

The previous examples demonstrated leveraging existing infrastructure to
provide requirements for CWL tools. If instead a real package manager is used,
cwltool has the opportunity to install requirements as needed. While initial
support for Homebrew/Linuxbrew plugins is available, the most developed such
plugin is for the `Conda <https://conda.io/docs/#>`__ package manager. Conda has the nice properties
of allowing multiple versions of a package to be installed simultaneously,
not requiring elevated permissions to install Conda itself or packages using
Conda, and being cross-platform. For these reasons, cwltool may run as a normal
user, install its own Conda environment and manage multiple versions of Conda packages
on both Linux and Mac OS X.

The Conda plugin can be endlessly configured, but a sensible set of defaults
that has proven a powerful stack for dependency management within the Galaxy tool
development ecosystem can be enabled by simply passing cwltool the
``--beta-conda-dependencies`` flag.

With this we can use the seqtk example above without Docker and without
any externally managed services - cwltool should install everything it needs
and create an environment for the tool. Try it out with the following command::

  cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json

The CWL specification allows URIs to be attached to ``SoftwareRequirement``\ s
that allow disambiguation of package names. While the mapping files described above
allow deployers to adapt tools to their infrastructure, this mechanism allows
tools to adapt their requirements to multiple package managers. To demonstrate
this within the context of seqtk, we can simply break the package name we
use and then specify a specific Conda package as follows:

.. code:: yaml

  hints:
    SoftwareRequirement:
      packages:
      - package: seqtk_seq
        version:
        - '1.2'
        specs:
        - https://anaconda.org/bioconda/seqtk
        - https://packages.debian.org/sid/seqtk

The example can be executed using the command::

  cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json
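
To make the role of ``specs`` concrete, the sketch below picks a Conda package
name out of such a list of URIs. It assumes that Conda packages are identified by
URIs of the form ``https://anaconda.org/<channel>/<package>``; the helper name
``conda_package_from_specs`` is hypothetical and the real resolver may match
URIs differently.

.. code:: python

  from urllib.parse import urlparse

  def conda_package_from_specs(specs, default):
      """Pick a Conda package name from the spec URIs, falling back to the default."""
      for uri in specs:
          parsed = urlparse(uri)
          parts = [part for part in parsed.path.split("/") if part]
          # Assumed URI shape: https://anaconda.org/<channel>/<package>
          if parsed.netloc == "anaconda.org" and len(parts) == 2:
              return parts[1]
      return default

  specs = [
      "https://anaconda.org/bioconda/seqtk",
      "https://packages.debian.org/sid/seqtk",
  ]
  package = conda_package_from_specs(specs, default="seqtk_seq")
  # package == "seqtk"

Here the deliberately wrong name ``seqtk_seq`` is only used when no URI identifies
a Conda package.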

The plugin framework for managing resolution of these software requirements
is maintained as part of `galaxy-lib <https://github.com/galaxyproject/galaxy-lib>`__, a small, portable subset of the Galaxy
project. More information on configuration and implementation can be found
at the following links:

- `Dependency Resolvers in Galaxy <https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__
- `Conda for [Galaxy] Tool Dependencies <https://docs.galaxyproject.org/en/latest/admin/conda_faq.html>`__
- `Mapping Files - Implementation <https://github.com/galaxyproject/galaxy/commit/495802d229967771df5b64a2f79b88a0eaf00edb>`__
- `Specifications - Implementation <https://github.com/galaxyproject/galaxy/commit/81d71d2e740ee07754785306e4448f8425f890bc>`__
- `Initial cwltool Integration Pull Request <https://github.com/common-workflow-language/cwltool/pull/214>`__

CWL Tool Control Flow
---------------------

Technical outline of how cwltool works internally, for maintainers.

#. Use CWL ``load_tool()`` to load the document.

   #. Fetches the document from file or URL
   #. Applies preprocessing (syntax/identifier expansion and normalization)
   #. Validates the document based on cwlVersion
   #. If necessary, updates the document to latest spec
   #. Constructs a Process object using the ``make_tool()`` callback.  This yields a
      CommandLineTool, Workflow, or ExpressionTool.  For workflows, this
      recursively constructs each workflow step.
   #. To construct custom types for CommandLineTool, Workflow, or
      ExpressionTool, provide a custom ``make_tool()``

#. Iterate on the ``job()`` method of the Process object to get back runnable jobs.

   #. ``job()`` is a generator method (uses the Python iterator protocol)
   #. Each time the ``job()`` method is invoked in an iteration, it returns one
      of: a runnable item (an object with a ``run()`` method), ``None`` (indicating
      there is currently no work ready to run) or end of iteration (indicating
      the process is complete.)
   #. Invoke the runnable item by calling ``run()``.  This runs the tool and gets output.
   #. Output of a process is reported by an output callback.
   #. ``job()`` may be iterated over multiple times.  It will yield all the work
      that is currently ready to run and then yield None.

#. ``Workflow`` objects create corresponding ``WorkflowJob`` and ``WorkflowJobStep`` objects to hold the workflow state for the duration of the job invocation.

   #. The WorkflowJob iterates over each WorkflowJobStep and determines if the
      inputs of the step are ready.
   #. When a step is ready, it constructs an input object for that step and
      iterates on the ``job()`` method of the workflow job step.
   #. Each runnable item is yielded back up to the top-level run loop.
   #. When a step job completes and receives an output callback, the
      job outputs are assigned to the output of the workflow step.
   #. When all steps are complete, the intermediate files are moved to a final
      workflow output, intermediate directories are deleted, and the output
      callback for the workflow is called.

#. The ``job()`` method of a ``CommandLineTool`` yields a single runnable object.

   #. The CommandLineTool ``job()`` method calls ``makeJobRunner()`` to create a
      ``CommandLineJob`` object
   #. The job method configures the CommandLineJob object by setting public
      attributes
   #. The job method iterates over file and directory inputs to the
      CommandLineTool and creates a "path map".
   #. Files are mapped from their "resolved" location to a "target" path where
      they will appear at tool invocation (for example, a location inside a
      Docker container.)  The target paths are used on the command line.
   #. Files are staged to target paths using either Docker volume binds (when
      using containers) or symlinks (if not).  This staging step enables files
      to be logically rearranged or renamed independent of their source layout.
   #. The ``run()`` method of CommandLineJob executes the command line tool or
      Docker container, waits for it to complete, collects output, and makes
      the output callback.
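
The iteration protocol outlined above can be mimicked with toy classes. The
sketch below is not cwltool code: ``EchoJob``, ``EchoProcess`` and
``run_to_completion`` are hypothetical stand-ins, and the ``(output, status)``
callback signature is an assumption chosen for the example.

.. code:: python

  class EchoJob(object):
      """Toy runnable item: run() does the work and reports through a callback."""
      def __init__(self, message, output_callback):
          self.message = message
          self.output_callback = output_callback

      def run(self):
          self.output_callback({"out": self.message}, "success")

  class EchoProcess(object):
      """Toy process whose job() generator yields a single runnable item."""
      def __init__(self, message):
          self.message = message

      def job(self, output_callback):
          yield EchoJob(self.message, output_callback)

  def run_to_completion(process):
      """Drive job() until iteration ends, running each item as it appears."""
      results = []

      def collect(output, status):
          results.append((output, status))

      for runnable in process.job(collect):
          if runnable is None:
              # Nothing is ready and this synchronous loop has nothing running,
              # so no further progress is possible.
              break
          runnable.run()
      return results

  results = run_to_completion(EchoProcess("foo"))
  # results == [({"out": "foo"}, "success")]

A workflow would follow the same pattern, with its ``job()`` generator yielding
the runnable items of whichever steps are ready.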


Extension points
----------------

The following functions can be provided to ``main()``, to ``load_tool()``, or to the
executor to override or augment the listed behaviors.

executor
  ::

    executor(tool, job_order_object, **kwargs)
      (Process, Dict[Text, Any], **Any) -> Tuple[Dict[Text, Any], Text]

  A top-level workflow execution loop; it should synchronously execute a process
  object and return an output object.

makeTool
  ::

    makeTool(toolpath_object, **kwargs)
      (Dict[Text, Any], **Any) -> Process

  Construct a Process object from a document.

selectResources
  ::

    selectResources(request)
      (Dict[Text, int]) -> Dict[Text, int]

  Take a resource request and turn it into a concrete resource assignment.

versionfunc
  ::

    versionfunc()
      () -> Text

  Return version string.

make_fs_access
  ::

    make_fs_access(basedir)
      (Text) -> StdFsAccess

  Return a file system access object.

fetcher_constructor
  ::

    fetcher_constructor(cache, session)
      (Dict[unicode, unicode], requests.sessions.Session) -> Fetcher

  Construct a Fetcher object with the supplied cache and HTTP session.

resolver
  ::

    resolver(document_loader, document)
      (Loader, Union[Text, dict[Text, Any]]) -> Text

  Resolve a relative document identifier to an absolute one which can be fetched.

logger_handler
  ::

    logger_handler
      logging.Handler

  Handler object for logging.
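
As an illustration of one of these hooks, the sketch below shows a function with
the ``selectResources`` signature listed above. The request keys (``coresMin``,
``ramMin`` and so on, modeled on the CWL ``ResourceRequirement`` field names),
the result keys, and the host limits are assumptions made for this example
rather than a description of what cwltool passes in.

.. code:: python

  def select_resources(request):
      """Grant the minimum requested amount, refusing requests above local limits."""
      limits = {"cores": 4, "ram": 8192}  # hypothetical host limits
      granted = {}
      for name, limit in limits.items():
          wanted = request[name + "Min"]
          if wanted > limit:
              raise ValueError("%s request %d exceeds limit %d" % (name, wanted, limit))
          granted[name] = wanted
      return granted

  assignment = select_resources(
      {"coresMin": 2, "coresMax": 8, "ramMin": 1024, "ramMax": 4096}
  )
  # assignment == {"cores": 2, "ram": 1024}

A function like this would be supplied under the ``selectResources`` name so that
resource requests are turned into assignments according to local policy.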
            

Raw data

            {
    "maintainer": "", 
    "docs_url": null, 
    "requires_python": "", 
    "maintainer_email": "", 
    "cheesecake_code_kwalitee_id": null, 
    "keywords": "", 
    "upload_time": "2017-08-28 13:55:47", 
    "requirements": [
        {
            "name": "requests", 
            "specs": [
                [
                    ">=", 
                    "2.4.3"
                ]
            ]
        }, 
        {
            "name": "ruamel.yaml", 
            "specs": [
                [
                    ">=", 
                    "0.12.4"
                ], 
                [
                    "<", 
                    "0.15"
                ]
            ]
        }, 
        {
            "name": "rdflib", 
            "specs": [
                [
                    "==", 
                    "4.2.2"
                ]
            ]
        }, 
        {
            "name": "rdflib-jsonld", 
            "specs": [
                [
                    "==", 
                    "0.4.0"
                ]
            ]
        }, 
        {
            "name": "shellescape", 
            "specs": [
                [
                    "==", 
                    "3.4.1"
                ]
            ]
        }, 
        {
            "name": "schema-salad", 
            "specs": [
                [
                    ">=", 
                    "2.6"
                ], 
                [
                    "<", 
                    "3"
                ]
            ]
        }, 
        {
            "name": "typing", 
            "specs": [
                [
                    "==", 
                    "3.5.3"
                ]
            ]
        }
    ], 
    "author": "Common workflow language working group", 
    "home_page": "https://github.com/common-workflow-language/cwltool", 
    "github_user": "common-workflow-language", 
    "appveyor": true, 
    "download_url": "https://pypi.python.org/packages/3f/a4/d76db1a5acf961f0a52f7c05fdeefac3146a52ac6235f7f9e9095e1ea08c/cwltool-1.0.20170828135420.tar.gz", 
    "platform": "", 
    "version": "1.0.20170828135420", 
    "cheesecake_documentation_id": null, 
    "description": "==================================================================\nCommon Workflow Language tool description reference implementation\n==================================================================\n\nCWL conformance tests: |Build Status| Travis CI: |Unix Build Status|\n\n.. |Unix Build Status| image:: https://img.shields.io/travis/common-workflow-language/cwltool/master.svg?label=unix%20build\n   :target: https://travis-ci.org/common-workflow-language/cwltool\n\nThis is the reference implementation of the Common Workflow Language.  It is\nintended to feature complete and provide comprehensive validation of CWL\nfiles as well as provide other tools related to working with CWL.\n\nThis is written and tested for Python ``2.7 and 3.x {x = 3, 4, 5, 6}``\n\nThe reference implementation consists of two packages.  The ``cwltool`` package\nis the primary Python module containing the reference implementation in the\n``cwltool`` module and console executable by the same name.\n\nThe ``cwlref-runner`` package is optional and provides an additional entry point\nunder the alias ``cwl-runner``, which is the implementation-agnostic name for the\ndefault CWL interpreter installed on a host.\n\nInstall\n-------\n\nIt is highly recommended to setup virtual environment before installing `cwltool`:\n\n.. code:: bash\n\n  virtualenv -p python2 venv   # Create a virtual environment, can use `python3` as well\n  source venv/bin/activate     # Activate environment before installing `cwltool`\n\n1. Installing the official package from PyPi (will install \"cwltool\" package as\nwell)\n\n.. code:: bash\n\n  pip install cwlref-runner\n\nIf installing alongside another CWL implementation then\n\n.. code:: bash\n\n  pip install cwltool\n\n2. To install from source\n\n.. code:: bash\n\n  git clone https://github.com/common-workflow-language/cwltool.git # clone cwltool repo\n  cd cwltool         # Switch to source directory\n  pip install .      
# Install `cwltool` from source\n  cwltool --version  # Check if the installation works correctly\n\nRemember, if co-installing multiple CWL implementations then you need to\nmaintain which implementation ``cwl-runner`` points to via a symbolic file\nsystem link or `another facility <https://wiki.debian.org/DebianAlternatives>`_.\n\nRunning tests locally\n---------------------\n\n-  Running basic tests ``(/tests)``:\n\nWe use `tox <https://github.com/common-workflow-language/cwltool/tree/master/tox.ini>`_\nto run various tests in all supported Python environments.\nYou can run the test suite by simply running the following in the terminal:\n``pip install tox; tox``\n\nList of all environment can be seen using:\n``tox --listenvs``\nand running a specfic test env using:\n``tox -e <env name>``\n\n-  Running the entire suite of CWL conformance tests:\n\nThe GitHub repository for the CWL specifications contains a script that tests a CWL\nimplementation against a wide array of valid CWL files using the `cwltest <https://github.com/common-workflow-language/cwltest>`_\nprogram\n\nInstructions for running these tests can be found in the Common Workflow Language Specification repository at https://github.com/common-workflow-language/common-workflow-language/blob/master/CONFORMANCE_TESTS.md\n\nRun on the command line\n-----------------------\n\nSimple command::\n\n  cwl-runner [tool-or-workflow-description] [input-job-settings]\n\nOr if you have multiple CWL implementations installed and you want to override\nthe default cwl-runner use::\n\n  cwltool [tool-or-workflow-description] [input-job-settings]\n\nUse with boot2docker\n--------------------\nboot2docker is running docker inside a virtual machine and it only mounts ``Users``\non it. 
The default behavior of CWL is to create temporary directories under e.g.\n``/Var`` which is not accessible to Docker containers.\n\nTo run CWL successfully with boot2docker you need to set the ``--tmpdir-prefix``\nand ``--tmp-outdir-prefix`` to somewhere under ``/Users``::\n\n    $ cwl-runner --tmp-outdir-prefix=/Users/username/project --tmpdir-prefix=/Users/username/project wc-tool.cwl wc-job.json\n\n.. |Build Status| image:: https://ci.commonwl.org/buildStatus/icon?job=cwltool-conformance\n   :target: https://ci.commonwl.org/job/cwltool-conformance/\n\nTool or workflow loading from remote or local locations\n-------------------------------------------------------\n\n``cwltool`` can run tool and workflow descriptions on both local and remote\nsystems via its support for HTTP[S] URLs.\n\nInput job files and Workflow steps (via the `run` directive) can reference CWL\ndocuments using absolute or relative local filesytem paths. If a relative path\nis referenced and that document isn't found in the current directory then the\nfollowing locations will be searched:\nhttp://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem\n\n\nUse with GA4GH Tool Registry API\n--------------------------------\n\nCwltool can launch tools directly from `GA4GH Tool Registry API`_ endpoints.\n\nBy default, cwltool searches https://dockstore.org/ .  Use --add-tool-registry to add other registries to the search path.\n\nFor example ::\n\n  cwltool --non-strict quay.io/collaboratory/dockstore-tool-bamstats:master test.json\n\nand (defaults to latest when a version is not specified) ::\n\n  cwltool --non-strict quay.io/collaboratory/dockstore-tool-bamstats test.json\n\nFor this example, grab the test.json (and input file) from https://github.com/CancerCollaboratory/dockstore-tool-bamstats\n\n.. _`GA4GH Tool Registry API`: https://github.com/ga4gh/tool-registry-schemas\n\nImport as a module\n------------------\n\nAdd\n\n.. 
code:: python\n\n  import cwltool\n\nto your script.\n\nThe easiest way to use cwltool to run a tool or workflow from Python is to use a Factory\n\n.. code:: python\n\n  import cwltool.factory\n  fac = cwltool.factory.Factory()\n\n  echo = f.make(\"echo.cwl\")\n  result = echo(inp=\"foo\")\n\n  # result[\"out\"] == \"foo\"\n\nLeveraging SoftwareRequirements (Beta)\n--------------------------------------\n\nCWL tools may be decoarated with ``SoftwareRequirement`` hints that cwltool\nmay in turn use to resolve to packages in various package managers or\ndependency management systems such as `Environment Modules\n<http://modules.sourceforge.net/>`__.\n\nUtilizing ``SoftwareRequirement`` hints using cwltool requires an optional\ndependency, for this reason be sure to use specify the ``deps`` modifier when\ninstalling cwltool. For instance::\n\n  $ pip install 'cwltool[deps]'\n\nInstalling cwltool in this fashion enables several new command line options.\nThe most general of these options is ``--beta-dependency-resolvers-configuration``.\nThis option allows one to specify a dependency resolvers configuration file.\nThis file may be specified as either XML or YAML and very simply describes various\nplugins to enable to \"resolve\" ``SoftwareRequirement`` dependencies.\n\nTo discuss some of these plugins and how to configure them, first consider the\nfollowing ``hint`` definition for an example CWL tool.\n\n.. code:: yaml\n\n  SoftwareRequirement:\n    packages:\n    - package: seqtk\n      version:\n      - r93\n\nNow imagine deploying cwltool on a cluster with Software Modules installed\nand that a ``seqtk`` module is avaialble at version ``r93``. This means cluster\nusers likely won't have the ``seqtk`` the binary on their ``PATH`` by default but after\nsourcing this module with the command ``modulecmd sh load seqtk/r93`` ``seqtk`` is\navailable on the ``PATH``. 
A simple dependency resolvers configuration file, called\n``dependency-resolvers-conf.yml`` for instance, that would enable cwltool to source\nthe correct module environment before executing the above tool would simply be:\n\n.. code:: yaml\n\n  - type: module\n\nThe outer list indicates that one plugin is being enabled, the plugin parameters are\ndefined as a dictionary for this one list item. There is only one required parameter\nfor the plugin above, this is ``type`` and defines the plugin type. This parameter\nis required for all plugins. The available plugins and the parameters\navailable for each are documented (incompletely) `here\n<https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__.\nUnfortunately, this documentation is in the context of Galaxy tool\n``requirement`` s instead of CWL ``SoftwareRequirement`` s, but the concepts map fairly directly.\n\ncwltool is distributed with an example of such seqtk tool and sample corresponding\njob. It could executed from the cwltool root using a dependency resolvers\nconfiguration file such as the above one using the command::\n\n  cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \\\n      tests/seqtk_seq.cwl \\\n      tests/seqtk_seq_job.json\n\nThis example demonstrates both that cwltool can leverage\nexisting software installations and also handle workflows with dependencies\non different versions of the same software and libraries. However the above\nexample does require an existing module setup so it is impossible to test this example\n\"out of the box\" with cwltool. For a more isolated test that demonstrates all\nthe same concepts - the resolver plugin type ``galaxy_packages`` can be used.\n\n\"Galaxy packages\" are a lighter weight alternative to Environment Modules that are\nreally just defined by a way to lay out directories into packages and versions\nto find little scripts that are sourced to modify the environment. 
They have\nbeen used for years in the Galaxy community to adapt Galaxy tools to cluster\nenvironments but require neither knowledge of Galaxy nor any special tools to\nset up. These should work just fine for CWL tools.\n\nThe cwltool source code repository's test directory is set up with a very simple\ndirectory that defines a set of \"Galaxy packages\" (but really just defines one\npackage named ``random-lines``). The directory layout is simply::\n\n  tests/test_deps_env/\n    random-lines/\n      1.0/\n        env.sh\n\nSuppose the ``galaxy_packages`` plugin is enabled and pointed at the\n``tests/test_deps_env`` directory in cwltool's root and a ``SoftwareRequirement``\nsuch as the following is encountered:\n\n.. code:: yaml\n\n  hints:\n    SoftwareRequirement:\n      packages:\n      - package: 'random-lines'\n        version:\n        - '1.0'\n\nThen cwltool will simply find that ``env.sh`` file and source it before executing\nthe corresponding tool. That ``env.sh`` script is only responsible for modifying\nthe job's ``PATH`` to add the required binaries.\n\nThis is a full example that works since resolving \"Galaxy packages\" has no\nexternal requirements. Try it out by executing the following command from cwltool's\nroot directory::\n\n  cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \\\n      tests/random_lines.cwl \\\n      tests/random_lines_job.json\n\nThe resolvers configuration file in the above example was simply:\n\n.. code:: yaml\n\n  - type: galaxy_packages\n    base_path: ./tests/test_deps_env\n\nIt is possible that the ``SoftwareRequirement`` s in a given CWL tool will not\nmatch the module names for a given cluster. Such requirements can be re-mapped\nto specific deployed packages and/or versions using another file specified using\nthe resolver plugin parameter ``mapping_files``. 
We will\ndemonstrate this using ``galaxy_packages``, but the concepts apply equally well\nto Environment Modules or Conda packages (described below), for instance.\n\nSo consider the resolvers configuration file\n(``tests/test_deps_env_resolvers_conf_rewrite.yml``):\n\n.. code:: yaml\n\n  - type: galaxy_packages\n    base_path: ./tests/test_deps_env\n    mapping_files: ./tests/test_deps_mapping.yml\n\nAnd the corresponding mapping configuration file (``tests/test_deps_mapping.yml``):\n\n.. code:: yaml\n\n  - from:\n      name: randomLines\n      version: 1.0.0-rc1\n    to:\n      name: random-lines\n      version: '1.0'\n\nThis says that if cwltool encounters a requirement of ``randomLines`` at version\n``1.0.0-rc1`` in a tool, it should rewrite it for our specific plugin as ``random-lines`` at\nversion ``1.0``. cwltool has a test tool called ``random_lines_mapping.cwl``\nthat contains such a source ``SoftwareRequirement``. To try out this example with\nmapping, execute the following command from the cwltool root directory::\n\n  cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \\\n      tests/random_lines_mapping.cwl \\\n      tests/random_lines_job.json\n\nThe previous examples demonstrated leveraging existing infrastructure to\nprovide requirements for CWL tools. If instead a real package manager is used,\ncwltool has the opportunity to install requirements as needed. While initial\nsupport for Homebrew/Linuxbrew plugins is available, the most developed such\nplugin is for the `Conda <https://conda.io/docs/#>`__ package manager. Conda has the nice properties\nof allowing multiple versions of a package to be installed simultaneously,\nnot requiring elevated permissions to install Conda itself or packages using\nConda, and being cross-platform. 
For these reasons, cwltool may run as a normal\nuser, install its own Conda environment and manage multiple versions of Conda packages\non both Linux and Mac OS X.\n\nThe Conda plugin can be endlessly configured, but a sensible set of defaults\nthat has proven a powerful stack for dependency management within the Galaxy tool\ndevelopment ecosystem can be enabled by simply passing cwltool the\n``--beta-conda-dependencies`` flag.\n\nWith this we can use the seqtk example above without Docker and without\nany externally managed services - cwltool should install everything it needs\nand create an environment for the tool. Try it out with the following command::\n\n  cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json\n\nThe CWL specification allows URIs to be attached to ``SoftwareRequirement`` s\nthat allow disambiguation of package names. If the mapping files described above\nallow deployers to adapt tools to their infrastructure, this mechanism allows\ntools to adapt their requirements to multiple package managers. To demonstrate\nthis within the context of seqtk, we can simply break the package name we\nuse and then specify a specific Conda package as follows:\n\n.. code:: yaml\n\n  hints:\n    SoftwareRequirement:\n      packages:\n      - package: seqtk_seq\n        version:\n        - '1.2'\n        specs:\n        - https://anaconda.org/bioconda/seqtk\n        - https://packages.debian.org/sid/seqtk\n\nThe example can be executed using the command::\n\n  cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json\n\nThe plugin framework for managing resolution of these software requirements\nis maintained as part of `galaxy-lib <https://github.com/galaxyproject/galaxy-lib>`__ - a small, portable subset of the Galaxy\nproject. 
More information on configuration and implementation can be found\nat the following links:\n\n- `Dependency Resolvers in Galaxy <https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__\n- `Conda for [Galaxy] Tool Dependencies <https://docs.galaxyproject.org/en/latest/admin/conda_faq.html>`__\n- `Mapping Files - Implementation <https://github.com/galaxyproject/galaxy/commit/495802d229967771df5b64a2f79b88a0eaf00edb>`__\n- `Specifications - Implementation <https://github.com/galaxyproject/galaxy/commit/81d71d2e740ee07754785306e4448f8425f890bc>`__\n- `Initial cwltool Integration Pull Request <https://github.com/common-workflow-language/cwltool/pull/214>`__\n\nCWL Tool Control Flow\n---------------------\n\nTechnical outline of how cwltool works internally, for maintainers.\n\n#. Use CWL ``load_tool()`` to load the document.\n\n   #. Fetches the document from file or URL\n   #. Applies preprocessing (syntax/identifier expansion and normalization)\n   #. Validates the document based on ``cwlVersion``\n   #. If necessary, updates the document to the latest spec\n   #. Constructs a Process object using the ``make_tool()`` callback.  This yields a\n      CommandLineTool, Workflow, or ExpressionTool.  For workflows, this\n      recursively constructs each workflow step.\n   #. To construct custom types for CommandLineTool, Workflow, or\n      ExpressionTool, provide a custom ``make_tool()``\n\n#. Iterate on the ``job()`` method of the Process object to get back runnable jobs.\n\n   #. ``job()`` is a generator method (uses the Python iterator protocol)\n   #. Each time the ``job()`` method is invoked in an iteration, it returns one\n      of: a runnable item (an object with a ``run()`` method), ``None`` (indicating\n      there is currently no work ready to run) or end of iteration (indicating\n      the process is complete).\n   #. Invoke the runnable item by calling ``run()``.  This runs the tool and gets output.\n   #. 
Output of a process is reported by an output callback.\n   #. ``job()`` may be iterated over multiple times.  It will yield all the work\n      that is currently ready to run and then yield ``None``.\n\n#. ``Workflow`` objects create corresponding ``WorkflowJob`` and ``WorkflowJobStep`` objects to hold the workflow state for the duration of the job invocation.\n\n   #. The WorkflowJob iterates over each WorkflowJobStep and determines if the\n      inputs of the step are ready.\n   #. When a step is ready, it constructs an input object for that step and\n      iterates on the ``job()`` method of the workflow job step.\n   #. Each runnable item is yielded back up to the top-level run loop.\n   #. When a step job completes and receives an output callback, the\n      job outputs are assigned to the output of the workflow step.\n   #. When all steps are complete, the intermediate files are moved to a final\n      workflow output, intermediate directories are deleted, and the output\n      callback for the workflow is called.\n\n#. ``CommandLineTool`` ``job()`` objects yield a single runnable object.\n\n   #. The CommandLineTool ``job()`` method calls ``makeJobRunner()`` to create a\n      ``CommandLineJob`` object.\n   #. The job method configures the CommandLineJob object by setting public\n      attributes.\n   #. The job method iterates over file and directory inputs to the\n      CommandLineTool and creates a \"path map\".\n   #. Files are mapped from their \"resolved\" location to a \"target\" path where\n      they will appear at tool invocation (for example, a location inside a\n      Docker container).  The target paths are used on the command line.\n   #. Files are staged to target paths using either Docker volume binds (when\n      using containers) or symlinks (if not).  This staging step enables files\n      to be logically rearranged or renamed independent of their source layout.\n   #. 
The ``run()`` method of CommandLineJob executes the command line tool or\n      Docker container, waits for it to complete, collects output, and makes\n      the output callback.\n\n\nExtension points\n----------------\n\nThe following functions can be provided to ``main()``, to ``load_tool()``, or to the\nexecutor to override or augment the listed behaviors.\n\nexecutor\n  ::\n\n    executor(tool, job_order_object, **kwargs)\n      (Process, Dict[Text, Any], **Any) -> Tuple[Dict[Text, Any], Text]\n\n  A top-level workflow execution loop. It should synchronously execute a process\n  object and return an output object.\n\nmakeTool\n  ::\n\n    makeTool(toolpath_object, **kwargs)\n      (Dict[Text, Any], **Any) -> Process\n\n  Construct a Process object from a document.\n\nselectResources\n  ::\n\n    selectResources(request)\n      (Dict[Text, int]) -> Dict[Text, int]\n\n  Take a resource request and turn it into a concrete resource assignment.\n\nversionfunc\n  ::\n\n    versionfunc()\n      () -> Text\n\n  Return version string.\n\nmake_fs_access\n  ::\n\n    make_fs_access(basedir)\n      (Text) -> StdFsAccess\n\n  Return a file system access object.\n\nfetcher_constructor\n  ::\n\n    fetcher_constructor(cache, session)\n      (Dict[unicode, unicode], requests.sessions.Session) -> Fetcher\n\n  Construct a Fetcher object with the supplied cache and HTTP session.\n\nresolver\n  ::\n\n    resolver(document_loader, document)\n      (Loader, Union[Text, dict[Text, Any]]) -> Text\n\n  Resolve a relative document identifier to an absolute one which can be fetched.\n\nlogger_handler\n  ::\n\n    logger_handler\n      logging.Handler\n\n  Handler object for logging.", 
    "tox": true, 
    "lcname": "cwltool", 
    "bugtrack_url": "", 
    "github": true, 
    "coveralls": true, 
    "name": "cwltool", 
    "license": "", 
    "travis_ci": true, 
    "github_project": "cwltool", 
    "summary": "Common workflow language reference implementation", 
    "split_keywords": [], 
    "author_email": "common-workflow-language@googlegroups.com", 
    "urls": [
        {
            "has_sig": false, 
            "upload_time": "2017-08-28T13:55:47", 
            "comment_text": "", 
            "python_version": "source", 
            "url": "https://pypi.python.org/packages/3f/a4/d76db1a5acf961f0a52f7c05fdeefac3146a52ac6235f7f9e9095e1ea08c/cwltool-1.0.20170828135420.tar.gz", 
            "md5_digest": "c6baabec59b98fa00671ee100ce1e592", 
            "downloads": 0, 
            "filename": "cwltool-1.0.20170828135420.tar.gz", 
            "packagetype": "sdist", 
            "path": "3f/a4/d76db1a5acf961f0a52f7c05fdeefac3146a52ac6235f7f9e9095e1ea08c/cwltool-1.0.20170828135420.tar.gz", 
            "size": 277556
        }
    ], 
    "_id": null, 
    "cheesecake_installability_id": null
}