#############################################################################################
``cwltool``: The reference implementation of the Common Workflow Language standards
#############################################################################################
|Linux Status| |Coverage Status| |Docs Status|
PyPI: |PyPI Version| |PyPI Downloads Month| |Total PyPI Downloads|
Conda: |Conda Version| |Conda Installs|
Debian: |Debian Testing package| |Debian Stable package|
Quay.io (Docker): |Quay.io Container|
.. |Linux Status| image:: https://github.com/common-workflow-language/cwltool/actions/workflows/ci-tests.yml/badge.svg?branch=main
:target: https://github.com/common-workflow-language/cwltool/actions/workflows/ci-tests.yml
.. |Debian Stable package| image:: https://badges.debian.net/badges/debian/stable/cwltool/version.svg
:target: https://packages.debian.org/stable/cwltool
.. |Debian Testing package| image:: https://badges.debian.net/badges/debian/testing/cwltool/version.svg
:target: https://packages.debian.org/testing/cwltool
.. |Coverage Status| image:: https://img.shields.io/codecov/c/github/common-workflow-language/cwltool.svg
:target: https://codecov.io/gh/common-workflow-language/cwltool
.. |PyPI Version| image:: https://badge.fury.io/py/cwltool.svg
:target: https://badge.fury.io/py/cwltool
.. |PyPI Downloads Month| image:: https://pepy.tech/badge/cwltool/month
:target: https://pepy.tech/project/cwltool
.. |Total PyPI Downloads| image:: https://static.pepy.tech/personalized-badge/cwltool?period=total&units=international_system&left_color=black&right_color=orange&left_text=Total%20PyPI%20Downloads
:target: https://pepy.tech/project/cwltool
.. |Conda Version| image:: https://anaconda.org/conda-forge/cwltool/badges/version.svg
:target: https://anaconda.org/conda-forge/cwltool
.. |Conda Installs| image:: https://anaconda.org/conda-forge/cwltool/badges/downloads.svg
:target: https://anaconda.org/conda-forge/cwltool
.. |Quay.io Container| image:: https://quay.io/repository/commonwl/cwltool/status
:target: https://quay.io/repository/commonwl/cwltool
.. |Docs Status| image:: https://readthedocs.org/projects/cwltool/badge/?version=latest
:target: https://cwltool.readthedocs.io/en/latest/?badge=latest
:alt: Documentation Status
This is the reference implementation of the `Common Workflow Language open
standards <https://www.commonwl.org/>`_. It is intended to be feature complete
and provide comprehensive validation of CWL
files as well as provide other tools related to working with CWL.
``cwltool`` is written and tested for
`Python <https://www.python.org/>`_ ``3.x {x = 6, 8, 9, 10, 11}``.
The reference implementation consists of two packages. The ``cwltool`` package
is the primary Python module containing the reference implementation in the
``cwltool`` module and a console executable of the same name.
The ``cwlref-runner`` package is optional and provides an additional entry point
under the alias ``cwl-runner``, which is the implementation-agnostic name for the
default CWL interpreter installed on a host.
``cwltool`` is provided by the CWL project, `a member project of Software Freedom Conservancy <https://sfconservancy.org/news/2018/apr/11/cwl-new-member-project/>`_
and our `many contributors <https://github.com/common-workflow-language/cwltool/graphs/contributors>`_.
.. contents:: Table of Contents
*******
Install
*******
``cwltool`` packages
====================
Your operating system may offer cwltool directly. For `Debian <https://tracker.debian.org/pkg/cwltool>`_, `Ubuntu <https://launchpad.net/ubuntu/+source/cwltool>`_,
and similar Linux distributions, try
.. code:: bash
sudo apt-get install cwltool
If you encounter an error, first try to update package information by using
.. code:: bash
sudo apt-get update
If you are running macOS or another UNIX-like operating system and you want to use packages prepared by the conda-forge project, then
please follow the install instructions for `conda-forge <https://conda-forge.org/#about>`_ (if you haven't already) and then
.. code:: bash
conda install -c conda-forge cwltool
All of the above methods of installing ``cwltool`` use packages that might contain bugs already fixed in newer versions or be missing desired features.
If the packaged version of ``cwltool`` available to you is too old, then we recommend installing using ``pip`` and ``venv``
.. code:: bash
python3 -m venv env # Create a virtual environment named 'env' in the current directory
source env/bin/activate # Activate environment before installing `cwltool`
Then install the latest ``cwlref-runner`` package from PyPI (which will install the latest ``cwltool`` package as
well)
.. code:: bash
pip install cwlref-runner
If installing alongside another CWL implementation (like ``toil-cwl-runner`` or ``arvados-cwl-runner``) then instead run
.. code:: bash
pip install cwltool
MS Windows users
================
1. `Install Windows Subsystem for Linux 2 and Docker Desktop <https://docs.docker.com/docker-for-windows/wsl/#prerequisites>`_.
2. `Install Debian from the Microsoft Store <https://www.microsoft.com/en-us/p/debian/9msvkqc78pk6>`_.
3. Set Debian as your default WSL 2 distro: ``wsl --set-default debian``.
4. Return to Docker Desktop, choose ``Settings`` → ``Resources`` → ``WSL Integration`` and under "Enable integration with additional distros" select "Debian".
5. Reboot if you have not already done so.
6. Launch Debian and follow the Linux instructions above (``apt-get install cwltool`` or use the ``venv`` method).
Network problems from within WSL2? Try `these instructions <https://github.com/microsoft/WSL/issues/4731#issuecomment-702176954>`_ followed by ``wsl --shutdown``.
``cwltool`` development version
===============================
Or you can skip the direct ``pip`` commands above and install the latest development version of ``cwltool``:
.. code:: bash
git clone https://github.com/common-workflow-language/cwltool.git # clone (copy) the cwltool git repository
cd cwltool # Change to source directory that git clone just downloaded
pip install .[deps] # Installs ``cwltool`` from source
cwltool --version # Check if the installation works correctly
Remember, if co-installing multiple CWL implementations, then you need to
maintain which implementation ``cwl-runner`` points to via a symbolic file
system link or `another facility <https://wiki.debian.org/DebianAlternatives>`_.
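For a simple per-user setup, a plain symbolic link is usually enough. A minimal sketch, assuming a ``~/.local/bin`` directory that already exists and is on your ``PATH``:

.. code:: bash

   # Point the generic ``cwl-runner`` alias at this cwltool installation
   ln -sf "$(command -v cwltool)" ~/.local/bin/cwl-runner
   cwl-runner --version  # should now report the cwltool version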
Recommended Software
====================
We strongly suggest installing the following:
* One of the following software container engines
* `Podman <https://podman.io/getting-started/installation>`_
* `Docker <https://docs.docker.com/engine/install/>`_
* Singularity/Apptainer: See `Using Singularity`_
* udocker: See `Using uDocker`_
* `node.js <https://nodejs.org/en/download/>`_ for evaluating CWL Expressions quickly
(required for `udocker` users, optional but recommended for the other container engines).
Without these, some examples in the CWL tutorials at http://www.commonwl.org/user_guide/ may not work.
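A quick way to check that the recommended software is visible on your ``PATH`` (this sketch assumes you chose Docker and node.js; substitute the command for your container engine):

.. code:: bash

   docker --version   # or: podman --version / singularity --version / udocker --version
   node --version     # used to evaluate CWL Expressions quickly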
***********************
Run on the command line
***********************
Simple command::
cwl-runner my_workflow.cwl my_inputs.yaml
Or, if you have multiple CWL implementations installed and you want to override
the default ``cwl-runner``, then use::
cwltool my_workflow.cwl my_inputs.yml
You can set cwltool options in the environment with ``CWLTOOL_OPTIONS``;
these will be inserted at the beginning of the command line::
export CWLTOOL_OPTIONS="--debug"
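For example, the following makes every invocation in the current shell run with ``--debug``; the workflow and input file names are reused from the commands above:

.. code:: bash

   export CWLTOOL_OPTIONS="--debug"
   # Behaves as if you had typed: cwl-runner --debug my_workflow.cwl my_inputs.yaml
   cwl-runner my_workflow.cwl my_inputs.yaml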
Use with boot2docker on macOS
=============================
boot2docker runs Docker inside a virtual machine, and it only mounts ``/Users``
on it. The default behavior of CWL is to create temporary directories under e.g.
``/var``, which is not accessible to Docker containers.
To run CWL successfully with boot2docker you need to set the ``--tmpdir-prefix``
and ``--tmp-outdir-prefix`` to somewhere under ``/Users``::
$ cwl-runner --tmp-outdir-prefix=/Users/username/project --tmpdir-prefix=/Users/username/project wc-tool.cwl wc-job.json
Using uDocker
=============
Some shared computing environments don't support Docker software containers for technical or policy reasons.
As a workaround, the CWL reference runner supports the `udocker <https://github.com/indigo-dc/udocker>`_
program on Linux via the ``--udocker`` option.
udocker installation: https://indigo-dc.github.io/udocker/installation_manual.html
Run `cwltool` just as you usually would, but with ``--udocker`` prior to the workflow path:
.. code:: bash
cwltool --udocker https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/test-cwl-out2.cwl https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/empty.json
As was mentioned in the `Recommended Software`_ section, node.js is required when using udocker, since it is needed to evaluate CWL Expressions.
Using Singularity
=================
``cwltool`` can also use `Singularity <https://github.com/hpcng/singularity/releases/>`_ version 2.6.1
or later as a Docker container runtime.
``cwltool`` with Singularity will run software containers specified in
``DockerRequirement`` and therefore works with Docker images only; native
Singularity images are not supported. To use Singularity as the Docker container
runtime, provide the ``--singularity`` command-line option to ``cwltool``.
With Singularity, ``cwltool`` can pass all CWL v1.0 conformance tests, except
those involving Docker container ENTRYPOINTs.
Example:
.. code:: bash
cwltool --singularity https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/cat3-tool-mediumcut.cwl https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/cat-job.json
Running a tool or workflow from remote or local locations
=========================================================
``cwltool`` can run tool and workflow descriptions on both local and remote
systems via its support for HTTP[S] URLs.
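For example, both the tool description and the job file can be fetched over HTTPS; this sketch reuses the conformance-test URLs from the Singularity example above:

.. code:: bash

   cwltool https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/cat3-tool-mediumcut.cwl https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/cat-job.json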
Input job files and Workflow steps (via the `run` directive) can reference CWL
documents using absolute or relative local filesystem paths. If a relative path
is referenced and that document isn't found in the current directory, then the
following locations will be searched:
http://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem
You can also use `cwldep <https://github.com/common-workflow-language/cwldep>`_
to manage dependencies on external tools and workflows.
Overriding workflow requirements at load time
=============================================
Sometimes a workflow needs additional requirements to run in a particular
environment or with a particular dataset. To avoid the need to modify the
underlying workflow, cwltool supports requirement "overrides".
The format of the "overrides" object is a mapping of item identifier (workflow,
workflow step, or command line tool) to the process requirements that should be applied.
.. code:: yaml
cwltool:overrides:
echo.cwl:
requirements:
EnvVarRequirement:
envDef:
MESSAGE: override_value
Overrides can be specified either on the command line, or as part of the job
input document. Workflow steps are identified using the name of the workflow
file followed by the step name as a document fragment identifier "#id".
Override identifiers are relative to the top-level workflow document.
.. code:: bash
cwltool --overrides overrides.yml my-tool.cwl my-job.yml
.. code:: yaml
input_parameter1: value1
input_parameter2: value2
cwltool:overrides:
workflow.cwl#step1:
requirements:
EnvVarRequirement:
envDef:
MESSAGE: override_value
.. code:: bash
cwltool my-tool.cwl my-job-with-overrides.yml
Combining parts of a workflow into a single document
====================================================
Use ``--pack`` to combine a workflow made up of multiple files into a
single compound document. This operation takes all the CWL files
referenced by a workflow and builds a new CWL document with all
Process objects (CommandLineTool and Workflow) in a list in the
``$graph`` field. Cross references (such as ``run:`` and ``source:``
fields) are updated to internal references within the new packed
document. The top-level workflow is named ``#main``.
.. code:: bash
cwltool --pack my-wf.cwl > my-packed-wf.cwl
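The packed document is still a regular CWL document, so it can be validated and run directly; ``my-inputs.yml`` below is a hypothetical input object for the workflow:

.. code:: bash

   cwltool --validate my-packed-wf.cwl
   cwltool my-packed-wf.cwl my-inputs.yml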
Running only part of a workflow
===============================
You can run a partial workflow with the ``--target`` (``-t``) option. This
takes the name of an output parameter, workflow step, or input
parameter in the top-level workflow. You may provide multiple
targets.
.. code:: bash
cwltool --target step3 my-wf.cwl
If a target is an output parameter, it will run only the steps
that contribute to that output. If a target is a workflow step, it
will run the workflow starting from that step. If a target is an
input parameter, it will only run the steps connected to
that input.
Use ``--print-targets`` to get a listing of the targets of a workflow.
To see which steps will run, use ``--print-subgraph`` with
``--target`` to get a printout of the workflow subgraph for the
selected targets.
.. code:: bash
cwltool --print-targets my-wf.cwl
cwltool --target step3 --print-subgraph my-wf.cwl > my-wf-starting-from-step3.cwl
Visualizing a CWL document
==========================
The ``--print-dot`` option will print a file suitable for the Graphviz ``dot`` program. Here is a bash one-liner to generate a Scalable Vector Graphics (SVG) file:
.. code:: bash
cwltool --print-dot my-wf.cwl | dot -Tsvg > my-wf.svg
Modeling a CWL document as RDF
==============================
CWL documents can be expressed as RDF triple graphs.
.. code:: bash
cwltool --print-rdf --rdf-serializer=turtle mywf.cwl
Environment Variables in cwltool
================================
This reference implementation supports several ways of setting
environment variables for tools, in addition to the standard
``EnvVarRequirement``. The sequence of steps applied to create the
environment is:
0. If the ``--preserve-entire-environment`` flag is present, then begin with the current
environment, else begin with an empty environment.
1. Add any variables specified by ``--preserve-environment`` option(s) (see the sketch after this list).
2. Set ``TMPDIR`` and ``HOME`` per `the CWL v1.0+ CommandLineTool specification <https://www.commonwl.org/v1.0/CommandLineTool.html#Runtime_environment>`_.
3. Apply any ``EnvVarRequirement`` from the ``CommandLineTool`` description.
4. Apply any manipulations required by any ``cwltool:MPIRequirement`` extensions.
5. Substitute any secrets required by the ``Secrets`` extension.
6. Modify the environment in response to ``SoftwareRequirement`` (see below).
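A sketch of steps 0 and 1 from the list above; the variable names are only examples, and repeating ``--preserve-environment`` once per variable is an assumption about the flag's usage:

.. code:: bash

   # Start from an empty environment, but carry over selected host variables
   cwltool --preserve-environment PATH --preserve-environment HOME my_workflow.cwl my_inputs.yaml

   # Or start from the complete host environment
   cwltool --preserve-entire-environment my_workflow.cwl my_inputs.yaml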
Leveraging SoftwareRequirements (Beta)
--------------------------------------
CWL tools may be decorated with ``SoftwareRequirement`` hints that cwltool
may in turn use to resolve to packages in various package managers or
dependency management systems such as `Environment Modules
<http://modules.sourceforge.net/>`__.
Utilizing ``SoftwareRequirement`` hints with cwltool requires an optional
dependency; for this reason, be sure to specify the ``deps`` modifier when
installing cwltool. For instance::
$ pip install 'cwltool[deps]'
Installing cwltool in this fashion enables several new command line options.
The most general of these options is ``--beta-dependency-resolvers-configuration``.
This option allows one to specify a dependency resolver's configuration file.
This file may be specified as either XML or YAML and simply describes the various
plugins to enable for "resolving" ``SoftwareRequirement`` dependencies.
Using these hints will allow cwltool to modify the environment in
which your tool runs, for example by loading one or more Environment
Modules. The environment is constructed as above, and then the environment
may be modified by the selected tool resolver. This currently means that
you cannot override any environment variables set by the selected tool
resolver. Note that the environment given to the configured dependency
resolver has the variable `_CWLTOOL` set to `1` to allow introspection.
To discuss some of these plugins and how to configure them, first consider the
following ``hint`` definition for an example CWL tool.
.. code:: yaml
SoftwareRequirement:
packages:
- package: seqtk
version:
- r93
Now imagine deploying cwltool on a cluster with Software Modules installed
and that a ``seqtk`` module is available at version ``r93``. This means cluster
users likely won't have the binary ``seqtk`` on their ``PATH`` by default, but after
sourcing this module with the command ``modulecmd sh load seqtk/r93``, ``seqtk`` is
available on the ``PATH``. A simple dependency resolvers configuration file, called
``dependency-resolvers-conf.yml`` for instance, that would enable cwltool to source
the correct module environment before executing the above tool would simply be:
.. code:: yaml
- type: modules
The outer list indicates that one plugin is being enabled; the plugin parameters are
defined as a dictionary for this one list item. The only required parameter
for the plugin above is ``type``, which defines the plugin type. This parameter
is required for all plugins. The available plugins and the parameters
available for each are documented (incompletely) `here
<https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__.
Unfortunately, this documentation is in the context of Galaxy tool
``requirement`` s instead of CWL ``SoftwareRequirement`` s, but the concepts map fairly directly.
cwltool is distributed with an example of such a seqtk tool and a corresponding sample
job. It can be executed from the cwltool root using a dependency resolvers
configuration file such as the one above with the command::
cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \
tests/seqtk_seq.cwl \
tests/seqtk_seq_job.json
This example demonstrates that cwltool can both leverage
existing software installations and handle workflows with dependencies
on different versions of the same software and libraries. However, the above
example does require an existing module setup, so it cannot be tested
"out of the box" with cwltool. For a more isolated test that demonstrates all
the same concepts, the resolver plugin type ``galaxy_packages`` can be used.
"Galaxy packages" are a lighter-weight alternative to Environment Modules that are
really just defined by a way to lay out directories into packages and versions
to find little scripts that are sourced to modify the environment. They have
been used for years in Galaxy community to adapt Galaxy tools to cluster
environments but require neither knowledge of Galaxy nor any special tools to
setup. These should work just fine for CWL tools.
The cwltool source code repository's test directory is set up with a very simple
directory that defines a set of "Galaxy packages" (but really just defines one
package named ``random-lines``). The directory layout is simply::
tests/test_deps_env/
random-lines/
1.0/
env.sh
If the ``galaxy_packages`` plugin is enabled and pointed at the
``tests/test_deps_env`` directory in cwltool's root and a ``SoftwareRequirement``
such as the following is encountered:
.. code:: yaml
hints:
SoftwareRequirement:
packages:
- package: 'random-lines'
version:
- '1.0'
Then cwltool will simply find that ``env.sh`` file and source it before executing
the corresponding tool. That ``env.sh`` script is only responsible for modifying
the job's ``PATH`` to add the required binaries.
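For illustration, an ``env.sh`` of this kind typically contains nothing more than a ``PATH`` manipulation; the install prefix below is made up:

.. code:: bash

   # Hypothetical contents of an env.sh for the random-lines package
   export PATH="/opt/random-lines/1.0/bin:$PATH"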
This is a full example that works since resolving "Galaxy packages" has no
external requirements. Try it out by executing the following command from cwltool's
root directory::
cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \
tests/random_lines.cwl \
tests/random_lines_job.json
The resolvers configuration file in the above example was simply:
.. code:: yaml
- type: galaxy_packages
base_path: ./tests/test_deps_env
It is possible that the ``SoftwareRequirement`` s in a given CWL tool will not
match the module names for a given cluster. Such requirements can be re-mapped
to specific deployed packages or versions using another file, specified via
the resolver plugin parameter ``mapping_files``. We will
demonstrate this using ``galaxy_packages``, but the concepts apply equally well
to Environment Modules or Conda packages (described below), for instance.
So consider the following resolvers configuration file
(``tests/test_deps_env_resolvers_conf_rewrite.yml``):
.. code:: yaml
- type: galaxy_packages
base_path: ./tests/test_deps_env
mapping_files: ./tests/test_deps_mapping.yml
And the corresponding mapping configuration file (`tests/test_deps_mapping.yml`):
.. code:: yaml
- from:
name: randomLines
version: 1.0.0-rc1
to:
name: random-lines
version: '1.0'
This says that if cwltool encounters a requirement for ``randomLines`` at version
``1.0.0-rc1`` in a tool, it should be rewritten for our specific plugin as ``random-lines`` at
version ``1.0``. cwltool has such a test tool, called ``random_lines_mapping.cwl``,
that contains such a source ``SoftwareRequirement``. To try out this example with
mapping, execute the following command from the cwltool root directory::
cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \
tests/random_lines_mapping.cwl \
tests/random_lines_job.json
The previous examples demonstrated leveraging existing infrastructure to
provide requirements for CWL tools. If instead a real package manager is used,
cwltool has the opportunity to install requirements as needed. While initial
support for Homebrew/Linuxbrew plugins is available, the most developed such
plugin is for the `Conda <https://conda.io/docs/#>`__ package manager. Conda has the nice properties
of allowing multiple versions of a package to be installed simultaneously,
not requiring elevated permissions to install Conda itself or packages using
Conda, and being cross-platform. For these reasons, cwltool can run as a normal
user, install its own Conda environment, and manage multiple versions of Conda packages
on Linux and macOS.
The Conda plugin can be endlessly configured, but a sensible set of defaults
that has proven a powerful stack for dependency management within the Galaxy tool
development ecosystem can be enabled by simply passing cwltool the
``--beta-conda-dependencies`` flag.
With this, we can use the seqtk example above without Docker or any externally managed services - cwltool should install everything it needs
and create an environment for the tool. Try it out with the following command::
cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
The CWL specification allows URIs to be attached to ``SoftwareRequirement`` s
that allow disambiguation of package names. If the mapping files described above
allow deployers to adapt tools to their infrastructure, this mechanism allows
tools to adapt their requirements to multiple package managers. To demonstrate
this within the context of the seqtk example, we can simply break the package name we
use and then specify a specific Conda package as follows:
.. code:: yaml
hints:
SoftwareRequirement:
packages:
- package: seqtk_seq
version:
- '1.2'
specs:
- https://anaconda.org/bioconda/seqtk
- https://packages.debian.org/sid/seqtk
The example can be executed using the command::
cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json
The plugin framework for managing the resolution of these software requirements
is maintained as part of `galaxy-tool-util <https://github.com/galaxyproject/galaxy/tree/dev/packages/tool_util>`__, a small,
portable subset of the Galaxy project. More information on configuration and implementation can be found
at the following links:
- `Dependency Resolvers in Galaxy <https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__
- `Conda for [Galaxy] Tool Dependencies <https://docs.galaxyproject.org/en/latest/admin/conda_faq.html>`__
- `Mapping Files - Implementation <https://github.com/galaxyproject/galaxy/commit/495802d229967771df5b64a2f79b88a0eaf00edb>`__
- `Specifications - Implementation <https://github.com/galaxyproject/galaxy/commit/81d71d2e740ee07754785306e4448f8425f890bc>`__
- `Initial cwltool Integration Pull Request <https://github.com/common-workflow-language/cwltool/pull/214>`__
Use with GA4GH Tool Registry API
================================
Cwltool can launch tools directly from `GA4GH Tool Registry API`_ endpoints.
By default, cwltool searches https://dockstore.org/ . Use ``--add-tool-registry`` to add other registries to the search path.
For example ::
cwltool quay.io/collaboratory/dockstore-tool-bamstats:develop test.json
and (defaults to latest when a version is not specified) ::
cwltool quay.io/collaboratory/dockstore-tool-bamstats test.json
For this example, grab the test.json (and input file) from https://github.com/CancerCollaboratory/dockstore-tool-bamstats ::
wget https://dockstore.org/api/api/ga4gh/v2/tools/quay.io%2Fbriandoconnor%2Fdockstore-tool-bamstats/versions/develop/PLAIN-CWL/descriptor/test.json
wget https://github.com/CancerCollaboratory/dockstore-tool-bamstats/raw/develop/rna.SRR948778.bam
.. _`GA4GH Tool Registry API`: https://github.com/ga4gh/tool-registry-schemas
Running MPI-based tools that need to be launched
================================================
Cwltool supports an extension to the CWL spec
``http://commonwl.org/cwltool#MPIRequirement``. When the tool
definition has this in its ``requirements``/``hints`` section, and
cwltool has been run with ``--enable-ext``, then the tool's command
line will be extended with the commands needed to launch it with
``mpirun`` or similar. You can specify the number of processes to
start as either a literal integer or an expression (that will result
in an integer). For example::
#!/usr/bin/env cwl-runner
cwlVersion: v1.1
class: CommandLineTool
$namespaces:
cwltool: "http://commonwl.org/cwltool#"
requirements:
cwltool:MPIRequirement:
processes: $(inputs.nproc)
inputs:
nproc:
type: int
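A sketch of running such a tool; the file name ``mpi_tool.cwl`` is hypothetical, and passing ``--nproc`` on the command line relies on cwltool generating command-line options from the tool's inputs:

.. code:: bash

   cwltool --enable-ext mpi_tool.cwl --nproc 4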
Interaction with containers: the MPIRequirement currently prepends its
commands to the front of the command line that is constructed. If you
wish to run a containerized application in parallel, for simple use
cases, this does work with Singularity, depending upon the platform
setup. However, this combination should be considered "alpha" -- please
do report any issues you have! This does not work with Docker at the
moment. (More precisely, you get `n` copies of the same single process
image run at the same time that cannot communicate with each other.)
The host-specific parameters are configured in a simple YAML file
(specified with the ``--mpi-config-file`` flag). The allowed keys are
given in the following table; all are optional.
+----------------+------------------+----------+------------------------------+
| Key | Type | Default | Description |
+================+==================+==========+==============================+
| runner | str | "mpirun" | The primary command to use. |
+----------------+------------------+----------+------------------------------+
| nproc_flag | str | "-n" | Flag to set number of |
| | | | processes to start. |
+----------------+------------------+----------+------------------------------+
| default_nproc | int | 1 | Default number of processes. |
+----------------+------------------+----------+------------------------------+
| extra_flags | List[str] | [] | A list of any other flags to |
| | | | be added to the runner's |
| | | | command line before |
| | | | the ``baseCommand``. |
+----------------+------------------+----------+------------------------------+
| env_pass | List[str] | [] | A list of environment |
| | | | variables that should be |
| | | | passed from the host |
| | | | environment through to the |
| | | | tool (e.g., giving the |
| | | | node list as set by your |
| | | | scheduler). |
+----------------+------------------+----------+------------------------------+
| env_pass_regex | List[str] | [] | A list of python regular |
| | | | expressions that will be |
| | | | matched against the host's |
| | | | environment. Those that match|
| | | | will be passed through. |
+----------------+------------------+----------+------------------------------+
| env_set | Mapping[str,str] | {} | A dictionary whose keys are |
| | | | the environment variables set|
| | | | and the values being the |
| | | | values. |
+----------------+------------------+----------+------------------------------+
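As a sketch, a minimal configuration using the keys above might be written and used as follows; the ``srun`` runner, the extra flag, and the environment variable are illustrative choices for a Slurm-like cluster (not defaults), and ``mpi_tool.cwl`` is the hypothetical tool from the example above:

.. code:: bash

   cat > mpi-config.yml <<'EOF'
   runner: srun
   nproc_flag: "-n"
   extra_flags: ["--mpi=pmix"]
   env_pass: ["SLURM_JOB_NODELIST"]
   EOF

   cwltool --enable-ext --mpi-config-file mpi-config.yml mpi_tool.cwl --nproc 4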
Enabling Fast Parser (experimental)
===================================
For very large workflows, `cwltool` can spend a lot of time in
initialization before the first step runs. There is an experimental
flag, ``--fast-parser``, which can dramatically reduce the
initialization overhead; however, as of this writing it has several limitations:
- Error reporting is generally worse than with the standard parser, so you will want to use it with workflows that you know are already correct.
- It does not check for dangling links (these will become runtime errors instead of loading errors)
- Several other cases fail, as documented in https://github.com/common-workflow-language/cwltool/pull/1720
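Enabling it only requires adding the flag to an otherwise normal invocation; the workflow and input file names below are placeholders:

.. code:: bash

   cwltool --fast-parser my_workflow.cwl my_inputs.yaml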
***********
Development
***********
Running tests locally
=====================
- Running basic tests (``/tests``):
To run the basic tests after installing ``cwltool``, execute the following:
.. code:: bash
pip install -r test-requirements.txt
pytest ## N.B. This requires node.js or docker to be available
To run the test suite in all supported Python environments, we use `tox <https://github.com/common-workflow-language/cwltool/tree/main/tox.ini>`_.
First clone the complete code repository (see the ``git clone`` instructions above) and then run
the following in the terminal:
``pip install "tox<4"; tox -p``
A list of all environments can be seen using:
``tox --listenvs``
and a specific test environment can be run using:
``tox -e <env name>``
and additionally a specific test can be run using this format:
``tox -e py310-unit -- -v tests/test_examples.py::test_scandeps``
- Running the entire suite of CWL conformance tests:
The GitHub repository for the CWL specifications contains a script that tests a CWL
implementation against a wide array of valid CWL files using the `cwltest <https://github.com/common-workflow-language/cwltest>`_
program.
Instructions for running these tests can be found in the Common Workflow Language Specification repository at https://github.com/common-workflow-language/common-workflow-language/blob/main/CONFORMANCE_TESTS.md .
Import as a module
==================
Add
.. code:: python
import cwltool
to your script.
The easiest way to use cwltool to run a tool or workflow from Python is to use a Factory:
.. code:: python
import cwltool.factory
fac = cwltool.factory.Factory()
echo = fac.make("echo.cwl")
result = echo(inp="foo")
# result["out"] == "foo"
CWL Tool Control Flow
=====================
Technical outline of how cwltool works internally, for maintainers.
#. Use CWL ``load_tool()`` to load document.
#. Fetches the document from file or URL
#. Applies preprocessing (syntax/identifier expansion and normalization)
#. Validates the document based on cwlVersion
#. If necessary, updates the document to the latest spec
#. Constructs a Process object using the ``make_tool()`` callback. This yields a
CommandLineTool, Workflow, or ExpressionTool. For workflows, this
recursively constructs each workflow step.
#. To construct custom types for CommandLineTool, Workflow, or
ExpressionTool, provide a custom ``make_tool()``
#. Iterate on the ``job()`` method of the Process object to get back runnable jobs.
#. ``job()`` is a generator method (uses the Python iterator protocol)
#. Each time the ``job()`` method is invoked in an iteration, it returns one
of: a runnable item (an object with a ``run()`` method), ``None`` (indicating
there is currently no work ready to run), or end of iteration (indicating
the process is complete).
#. Invoke the runnable item by calling ``run()``. This runs the tool and gets output.
#. An output callback reports the output of a process.
#. ``job()`` may be iterated over multiple times. It will yield all the work
that is currently ready to run and then yield None.
#. ``Workflow`` objects create corresponding ``WorkflowJob`` and ``WorkflowJobStep`` objects to hold the workflow state for the duration of the job invocation.
#. The WorkflowJob iterates over each WorkflowJobStep and determines if the
inputs to the step are ready.
#. When a step is ready, it constructs an input object for that step and
iterates on the ``job()`` method of the workflow job step.
#. Each runnable item is yielded back up to the top-level run loop.
#. When a step job completes and receives an output callback, the
job outputs are assigned to the output of the workflow step.
#. When all steps are complete, the intermediate files are moved to a final
workflow output, intermediate directories are deleted, and the workflow's output callback is called.
#. A ``CommandLineTool``'s ``job()`` method yields a single runnable object.
#. The CommandLineTool ``job()`` method calls ``make_job_runner()`` to create a
``CommandLineJob`` object.
#. The job method configures the CommandLineJob object by setting public
attributes.
#. The job method iterates over file and directories inputs to the
CommandLineTool and creates a "path map".
#. Files are mapped from their "resolved" location to a "target" path where
they will appear at tool invocation (for example, a location inside a
Docker container). The target paths are used on the command line.
#. Files are staged to target paths using either Docker volume binds (when
using containers) or symlinks (if not). This staging step enables files
to be logically rearranged or renamed independent of their source layout.
#. The ``run()`` method of CommandLineJob executes the command line tool or
Docker container, waits for it to complete, collects output, and makes
the output callback.
Extension points
================
The following functions can be passed to main() to override or augment
the listed behaviors.
executor
::
executor(tool, job_order_object, runtimeContext, logger)
(Process, Dict[Text, Any], RuntimeContext) -> Tuple[Dict[Text, Any], Text]
An implementation of the top-level workflow execution loop should
synchronously run a process object to completion and return the
output object.
versionfunc
::
()
() -> Text
Return version string.
logger_handler
::
logger_handler
logging.Handler
Handler object for logging.
The following functions can be set in LoadingContext to override or
augment the listed behaviors.
fetcher_constructor
::
fetcher_constructor(cache, session)
(Dict[unicode, unicode], requests.sessions.Session) -> Fetcher
Construct a Fetcher object with the supplied cache and HTTP session.
resolver
::
resolver(document_loader, document)
(Loader, Union[Text, dict[Text, Any]]) -> Text
Resolve a relative document identifier to an absolute one that can be fetched.
The following functions can be set in RuntimeContext to override or
augment the listed behaviors.
construct_tool_object
::
construct_tool_object(toolpath_object, loadingContext)
(MutableMapping[Text, Any], LoadingContext) -> Process
Hook to construct a Process object (e.g. CommandLineTool) from a document.
select_resources
::
selectResources(request)
(Dict[str, int], RuntimeContext) -> Dict[Text, int]
Take a resource request and turn it into a concrete resource assignment.
make_fs_access
::
make_fs_access(basedir)
(Text) -> StdFsAccess
Return a file system access object.
In addition, when providing custom subclasses of Process objects, you can override the following methods:
CommandLineTool.make_job_runner
::
make_job_runner(RuntimeContext)
(RuntimeContext) -> Type[JobBase]
Create and return a job runner object (this implements concrete execution of a command line tool).
Workflow.make_workflow_step
::
make_workflow_step(toolpath_object, pos, loadingContext, parentworkflowProv)
(Dict[Text, Any], int, LoadingContext, Optional[ProvenanceProfile]) -> WorkflowStep
Create and return a workflow step object.
Raw data
{
"_id": null,
"home_page": "https://github.com/common-workflow-language/cwltool",
"name": "cwltool",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.14,>=3.8",
"maintainer_email": null,
"keywords": null,
"author": "Common workflow language working group",
"author_email": "common-workflow-language@googlegroups.com",
"download_url": "https://files.pythonhosted.org/packages/f1/4a/6c0d44ed5a0785544ecd7817a031d1a5f98e2e40f6daf68ddb34155db42a/cwltool-3.1.20241007082533.tar.gz",
"platform": null,
"description": "#############################################################################################\n``cwltool``: The reference implementation of the Common Workflow Language standards\n#############################################################################################\n\n|Linux Status| |Coverage Status| |Docs Status|\n\nPyPI: |PyPI Version| |PyPI Downloads Month| |Total PyPI Downloads|\n\nConda: |Conda Version| |Conda Installs|\n\nDebian: |Debian Testing package| |Debian Stable package|\n\nQuay.io (Docker): |Quay.io Container|\n\n.. |Linux Status| image:: https://github.com/common-workflow-language/cwltool/actions/workflows/ci-tests.yml/badge.svg?branch=main\n :target: https://github.com/common-workflow-language/cwltool/actions/workflows/ci-tests.yml\n\n.. |Debian Stable package| image:: https://badges.debian.net/badges/debian/stable/cwltool/version.svg\n :target: https://packages.debian.org/stable/cwltool\n\n.. |Debian Testing package| image:: https://badges.debian.net/badges/debian/testing/cwltool/version.svg\n :target: https://packages.debian.org/testing/cwltool\n\n.. |Coverage Status| image:: https://img.shields.io/codecov/c/github/common-workflow-language/cwltool.svg\n :target: https://codecov.io/gh/common-workflow-language/cwltool\n\n.. |PyPI Version| image:: https://badge.fury.io/py/cwltool.svg\n :target: https://badge.fury.io/py/cwltool\n\n.. |PyPI Downloads Month| image:: https://pepy.tech/badge/cwltool/month\n :target: https://pepy.tech/project/cwltool\n\n.. |Total PyPI Downloads| image:: https://static.pepy.tech/personalized-badge/cwltool?period=total&units=international_system&left_color=black&right_color=orange&left_text=Total%20PyPI%20Downloads\n :target: https://pepy.tech/project/cwltool\n\n.. |Conda Version| image:: https://anaconda.org/conda-forge/cwltool/badges/version.svg\n :target: https://anaconda.org/conda-forge/cwltool\n\n.. |Conda Installs| image:: https://anaconda.org/conda-forge/cwltool/badges/downloads.svg\n :target: https://anaconda.org/conda-forge/cwltool\n\n.. |Quay.io Container| image:: https://quay.io/repository/commonwl/cwltool/status\n :target: https://quay.io/repository/commonwl/cwltool\n\n.. |Docs Status| image:: https://readthedocs.org/projects/cwltool/badge/?version=latest\n :target: https://cwltool.readthedocs.io/en/latest/?badge=latest\n :alt: Documentation Status\n\nThis is the reference implementation of the `Common Workflow Language open\nstandards <https://www.commonwl.org/>`_. It is intended to be feature complete\nand provide comprehensive validation of CWL\nfiles as well as provide other tools related to working with CWL.\n\n``cwltool`` is written and tested for\n`Python <https://www.python.org/>`_ ``3.x {x = 6, 8, 9, 10, 11}``\n\nThe reference implementation consists of two packages. The ``cwltool`` package\nis the primary Python module containing the reference implementation in the\n``cwltool`` module and console executable by the same name.\n\nThe ``cwlref-runner`` package is optional and provides an additional entry point\nunder the alias ``cwl-runner``, which is the implementation-agnostic name for the\ndefault CWL interpreter installed on a host.\n\n``cwltool`` is provided by the CWL project, `a member project of Software Freedom Conservancy <https://sfconservancy.org/news/2018/apr/11/cwl-new-member-project/>`_\nand our `many contributors <https://github.com/common-workflow-language/cwltool/graphs/contributors>`_.\n\n.. 
contents:: Table of Contents\n\n*******\nInstall\n*******\n\n``cwltool`` packages\n====================\n\nYour operating system may offer cwltool directly. For `Debian <https://tracker.debian.org/pkg/cwltool>`_, `Ubuntu <https://launchpad.net/ubuntu/+source/cwltool>`_,\nand similar Linux distribution try\n\n.. code:: bash\n\n sudo apt-get install cwltool\n\nIf you encounter an error, first try to update package information by using\n\n.. code:: bash\n\n sudo apt-get update\n\nIf you are running macOS X or other UNIXes and you want to use packages prepared by the conda-forge project, then\nplease follow the install instructions for `conda-forge <https://conda-forge.org/#about>`_ (if you haven't already) and then\n\n.. code:: bash\n\n conda install -c conda-forge cwltool\n\nAll of the above methods of installing ``cwltool`` use packages that might contain bugs already fixed in newer versions or be missing desired features.\nIf the packaged version of ``cwltool`` available to you is too old, then we recommend installing using ``pip`` and ``venv``\n\n.. code:: bash\n\n python3 -m venv env # Create a virtual environment named 'env' in the current directory\n source env/bin/activate # Activate environment before installing `cwltool`\n\nThen install the latest ``cwlref-runner`` package from PyPi (which will install the latest ``cwltool`` package as\nwell)\n\n.. code:: bash\n\n pip install cwlref-runner\n\nIf installing alongside another CWL implementation (like ``toil-cwl-runner`` or ``arvados-cwl-runner``) then instead run\n\n.. code:: bash\n\n pip install cwltool\n\nMS Windows users\n================\n\n1. `Install Windows Subsystem for Linux 2 and Docker Desktop <https://docs.docker.com/docker-for-windows/wsl/#prerequisites>`_. \n2. `Install Debian from the Microsoft Store <https://www.microsoft.com/en-us/p/debian/9msvkqc78pk6>`_.\n3. Set Debian as your default WSL 2 distro: ``wsl --set-default debian``.\n4. Return to the Docker Desktop, choose ``Settings`` \u2192 ``Resources`` \u2192 ``WSL Integration`` and under \"Enable integration with additional distros\" select \"Debian\",\n5. Reboot if you have not yet already.\n6. Launch Debian and follow the Linux instructions above (``apt-get install cwltool`` or use the ``venv`` method)\n\nNetwork problems from within WSL2? Try `these instructions <https://github.com/microsoft/WSL/issues/4731#issuecomment-702176954>`_ followed by ``wsl --shutdown``.\n\n``cwltool`` development version\n===============================\n\nOr you can skip the direct ``pip`` commands above and install the latest development version of ``cwltool``:\n\n.. 
code:: bash\n\n git clone https://github.com/common-workflow-language/cwltool.git # clone (copy) the cwltool git repository\n cd cwltool # Change to source directory that git clone just downloaded\n pip install .[deps] # Installs ``cwltool`` from source\n cwltool --version # Check if the installation works correctly\n\nRemember, if co-installing multiple CWL implementations, then you need to\nmaintain which implementation ``cwl-runner`` points to via a symbolic file\nsystem link or `another facility <https://wiki.debian.org/DebianAlternatives>`_.\n\nRecommended Software\n====================\n\nWe strongly suggested to have the following installed:\n\n* One of the following software container engines\n\n * `Podman <https://podman.io/getting-started/installation>`_\n * `Docker <https://docs.docker.com/engine/install/>`_\n * Singularity/Apptainer: See `Using Singularity`_\n * udocker: See `Using uDocker`_\n\n* `node.js <https://nodejs.org/en/download/>`_ for evaluating CWL Expressions quickly\n (required for `udocker` users, optional but recommended for the other container engines).\n\nWithout these, some examples in the CWL tutorials at http://www.commonwl.org/user_guide/ may not work.\n\n***********************\nRun on the command line\n***********************\n\nSimple command::\n\n cwl-runner my_workflow.cwl my_inputs.yaml\n\nOr if you have multiple CWL implementations installed and you want to override\nthe default cwl-runner then use::\n\n cwltool my_workflow.cwl my_inputs.yml\n\nYou can set cwltool options in the environment with ``CWLTOOL_OPTIONS``,\nthese will be inserted at the beginning of the command line::\n\n export CWLTOOL_OPTIONS=\"--debug\"\n\nUse with boot2docker on macOS\n=============================\nboot2docker runs Docker inside a virtual machine, and it only mounts ``Users``\non it. The default behavior of CWL is to create temporary directories under e.g.\n``/Var`` which is not accessible to Docker containers.\n\nTo run CWL successfully with boot2docker you need to set the ``--tmpdir-prefix``\nand ``--tmp-outdir-prefix`` to somewhere under ``/Users``::\n\n $ cwl-runner --tmp-outdir-prefix=/Users/username/project --tmpdir-prefix=/Users/username/project wc-tool.cwl wc-job.json\n\nUsing uDocker\n=============\n\nSome shared computing environments don't support Docker software containers for technical or policy reasons.\nAs a workaround, the CWL reference runner supports using the `udocker <https://github.com/indigo-dc/udocker>`_\nprogram on Linux using ``--udocker``.\n\nudocker installation: https://indigo-dc.github.io/udocker/installation_manual.html\n\nRun `cwltool` just as you usually would, but with ``--udocker`` prior to the workflow path:\n\n.. code:: bash\n\n cwltool --udocker https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/test-cwl-out2.cwl https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/empty.json\n\nAs was mentioned in the `Recommended Software`_ section,\n\nUsing Singularity\n=================\n\n``cwltool`` can also use `Singularity <https://github.com/hpcng/singularity/releases/>`_ version 2.6.1\nor later as a Docker container runtime.\n``cwltool`` with Singularity will run software containers specified in\n``DockerRequirement`` and therefore works with Docker images only, native\nSingularity images are not supported. 
To use Singularity as the Docker container\nruntime, provide ``--singularity`` command line option to ``cwltool``.\nWith Singularity, ``cwltool`` can pass all CWL v1.0 conformance tests, except\nthose involving Docker container ENTRYPOINTs.\n\nExample\n\n.. code:: bash\n\n cwltool --singularity https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/cat3-tool-mediumcut.cwl https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/cat-job.json\n\nRunning a tool or workflow from remote or local locations\n=========================================================\n\n``cwltool`` can run tool and workflow descriptions on both local and remote\nsystems via its support for HTTP[S] URLs.\n\nInput job files and Workflow steps (via the `run` directive) can reference CWL\ndocuments using absolute or relative local filesystem paths. If a relative path\nis referenced and that document isn't found in the current directory, then the\nfollowing locations will be searched:\nhttp://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem\n\nYou can also use `cwldep <https://github.com/common-workflow-language/cwldep>`_\nto manage dependencies on external tools and workflows.\n\nOverriding workflow requirements at load time\n=============================================\n\nSometimes a workflow needs additional requirements to run in a particular\nenvironment or with a particular dataset. To avoid the need to modify the\nunderlying workflow, cwltool supports requirement \"overrides\".\n\nThe format of the \"overrides\" object is a mapping of item identifier (workflow,\nworkflow step, or command line tool) to the process requirements that should be applied.\n\n.. code:: yaml\n\n cwltool:overrides:\n echo.cwl:\n requirements:\n EnvVarRequirement:\n envDef:\n MESSAGE: override_value\n\nOverrides can be specified either on the command line, or as part of the job\ninput document. Workflow steps are identified using the name of the workflow\nfile followed by the step name as a document fragment identifier \"#id\".\nOverride identifiers are relative to the top-level workflow document.\n\n.. code:: bash\n\n cwltool --overrides overrides.yml my-tool.cwl my-job.yml\n\n.. code:: yaml\n\n input_parameter1: value1\n input_parameter2: value2\n cwltool:overrides:\n workflow.cwl#step1:\n requirements:\n EnvVarRequirement:\n envDef:\n MESSAGE: override_value\n\n.. code:: bash\n\n cwltool my-tool.cwl my-job-with-overrides.yml\n\n\nCombining parts of a workflow into a single document\n====================================================\n\nUse ``--pack`` to combine a workflow made up of multiple files into a\nsingle compound document. This operation takes all the CWL files\nreferenced by a workflow and builds a new CWL document with all\nProcess objects (CommandLineTool and Workflow) in a list in the\n``$graph`` field. Cross references (such as ``run:`` and ``source:``\nfields) are updated to internal references within the new packed\ndocument. The top-level workflow is named ``#main``.\n\n.. code:: bash\n\n cwltool --pack my-wf.cwl > my-packed-wf.cwl\n\n\nRunning only part of a workflow\n===============================\n\nYou can run a partial workflow with the ``--target`` (``-t``) option. This\ntakes the name of an output parameter, workflow step, or input\nparameter in the top-level workflow. You may provide multiple\ntargets.\n\n.. 
code:: bash\n\n cwltool --target step3 my-wf.cwl\n\nIf a target is an output parameter, it will only run only the steps\nthat contribute to that output. If a target is a workflow step, it\nwill run the workflow starting from that step. If a target is an\ninput parameter, it will only run the steps connected to\nthat input.\n\nUse ``--print-targets`` to get a listing of the targets of a workflow.\nTo see which steps will run, use ``--print-subgraph`` with\n``--target`` to get a printout of the workflow subgraph for the\nselected targets.\n\n.. code:: bash\n\n cwltool --print-targets my-wf.cwl\n\n cwltool --target step3 --print-subgraph my-wf.cwl > my-wf-starting-from-step3.cwl\n\n\nVisualizing a CWL document\n==========================\n\nThe ``--print-dot`` option will print a file suitable for Graphviz ``dot`` program. Here is a bash onliner to generate a Scalable Vector Graphic (SVG) file:\n\n.. code:: bash\n\n cwltool --print-dot my-wf.cwl | dot -Tsvg > my-wf.svg\n\nModeling a CWL document as RDF\n==============================\n\nCWL documents can be expressed as RDF triple graphs.\n\n.. code:: bash\n\n cwltool --print-rdf --rdf-serializer=turtle mywf.cwl\n\n\nEnvironment Variables in cwltool\n================================\n\nThis reference implementation supports several ways of setting\nenvironment variables for tools, in addition to the standard\n``EnvVarRequirement``. The sequence of steps applied to create the\nenvironment is:\n\n0. If the ``--preserve-entire-environment`` flag is present, then begin with the current\n environment, else begin with an empty environment.\n\n1. Add any variables specified by ``--preserve-environment`` option(s).\n\n2. Set ``TMPDIR`` and ``HOME`` per `the CWL v1.0+ CommandLineTool specification <https://www.commonwl.org/v1.0/CommandLineTool.html#Runtime_environment>`_.\n\n3. Apply any ``EnvVarRequirement`` from the ``CommandLineTool`` description.\n\n4. Apply any manipulations required by any ``cwltool:MPIRequirement`` extensions.\n\n5. Substitute any secrets required by ``Secrets`` extension.\n\n6. Modify the environment in response to ``SoftwareRequirement`` (see below).\n\n\nLeveraging SoftwareRequirements (Beta)\n--------------------------------------\n\nCWL tools may be decorated with ``SoftwareRequirement`` hints that cwltool\nmay in turn use to resolve to packages in various package managers or\ndependency management systems such as `Environment Modules\n<http://modules.sourceforge.net/>`__.\n\nUtilizing ``SoftwareRequirement`` hints using cwltool requires an optional\ndependency, for this reason be sure to use specify the ``deps`` modifier when\ninstalling cwltool. For instance::\n\n $ pip install 'cwltool[deps]'\n\nInstalling cwltool in this fashion enables several new command line options.\nThe most general of these options is ``--beta-dependency-resolvers-configuration``.\nThis option allows one to specify a dependency resolver's configuration file.\nThis file may be specified as either XML or YAML and very simply describes various\nplugins to enable to \"resolve\" ``SoftwareRequirement`` dependencies.\n\nUsing these hints will allow cwltool to modify the environment in\nwhich your tool runs, for example by loading one or more Environment\nModules. The environment is constructed as above, then the environment\nmay modified by the selected tool resolver. This currently means that\nyou cannot override any environment variables set by the selected tool\nresolver. 
Note that the environment given to the configured dependency\nresolver has the variable `_CWLTOOL` set to `1` to allow introspection.\n\nTo discuss some of these plugins and how to configure them, first consider the\nfollowing ``hint`` definition for an example CWL tool.\n\n.. code:: yaml\n\n SoftwareRequirement:\n packages:\n - package: seqtk\n version:\n - r93\n\nNow imagine deploying cwltool on a cluster with Software Modules installed\nand that a ``seqtk`` module is available at version ``r93``. This means cluster\nusers likely won't have the binary ``seqtk`` on their ``PATH`` by default, but after\nsourcing this module with the command ``modulecmd sh load seqtk/r93`` ``seqtk`` is\navailable on the ``PATH``. A simple dependency resolvers configuration file, called\n``dependency-resolvers-conf.yml`` for instance, that would enable cwltool to source\nthe correct module environment before executing the above tool would simply be:\n\n.. code:: yaml\n\n - type: modules\n\nThe outer list indicates that one plugin is being enabled, the plugin parameters are\ndefined as a dictionary for this one list item. There is only one required parameter\nfor the plugin above, this is ``type`` and defines the plugin type. This parameter\nis required for all plugins. The available plugins and the parameters\navailable for each are documented (incompletely) `here\n<https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__.\nUnfortunately, this documentation is in the context of Galaxy tool\n``requirement`` s instead of CWL ``SoftwareRequirement`` s, but the concepts map fairly directly.\n\ncwltool is distributed with an example of such seqtk tool and sample corresponding\njob. It could executed from the cwltool root using a dependency resolvers\nconfiguration file such as the above one using the command::\n\n cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \\\n tests/seqtk_seq.cwl \\\n tests/seqtk_seq_job.json\n\nThis example demonstrates both that cwltool can leverage\nexisting software installations and also handle workflows with dependencies\non different versions of the same software and libraries. However the above\nexample does require an existing module setup so it is impossible to test this example\n\"out of the box\" with cwltool. For a more isolated test that demonstrates all\nthe same concepts - the resolver plugin type ``galaxy_packages`` can be used.\n\n\"Galaxy packages\" are a lighter-weight alternative to Environment Modules that are\nreally just defined by a way to lay out directories into packages and versions\nto find little scripts that are sourced to modify the environment. They have\nbeen used for years in Galaxy community to adapt Galaxy tools to cluster\nenvironments but require neither knowledge of Galaxy nor any special tools to\nsetup. These should work just fine for CWL tools.\n\nThe cwltool source code repository's test directory is setup with a very simple\ndirectory that defines a set of \"Galaxy packages\" (but really just defines one\npackage named ``random-lines``). The directory layout is simply::\n\n tests/test_deps_env/\n random-lines/\n 1.0/\n env.sh\n\nIf the ``galaxy_packages`` plugin is enabled and pointed at the\n``tests/test_deps_env`` directory in cwltool's root and a ``SoftwareRequirement``\nsuch as the following is encountered.\n\n.. 
code:: yaml\n\n hints:\n SoftwareRequirement:\n packages:\n - package: 'random-lines'\n version:\n - '1.0'\n\nThen cwltool will simply find that ``env.sh`` file and source it before executing\nthe corresponding tool. That ``env.sh`` script is only responsible for modifying\nthe job's ``PATH`` to add the required binaries.\n\nThis is a full example that works since resolving \"Galaxy packages\" has no\nexternal requirements. Try it out by executing the following command from cwltool's\nroot directory::\n\n cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \\\n tests/random_lines.cwl \\\n tests/random_lines_job.json\n\nThe resolvers configuration file in the above example was simply:\n\n.. code:: yaml\n\n - type: galaxy_packages\n base_path: ./tests/test_deps_env\n\nIt is possible that the ``SoftwareRequirement`` s in a given CWL tool will not\nmatch the module names for a given cluster. Such requirements can be re-mapped\nto specific deployed packages or versions using another file specified using\nthe resolver plugin parameter `mapping_files`. We will\ndemonstrate this using `galaxy_packages,` but the concepts apply equally well\nto Environment Modules or Conda packages (described below), for instance.\n\nSo consider the resolvers configuration file.\n(`tests/test_deps_env_resolvers_conf_rewrite.yml`):\n\n.. code:: yaml\n\n - type: galaxy_packages\n base_path: ./tests/test_deps_env\n mapping_files: ./tests/test_deps_mapping.yml\n\nAnd the corresponding mapping configuration file (`tests/test_deps_mapping.yml`):\n\n.. code:: yaml\n\n - from:\n name: randomLines\n version: 1.0.0-rc1\n to:\n name: random-lines\n version: '1.0'\n\nThis is saying if cwltool encounters a requirement of ``randomLines`` at version\n``1.0.0-rc1`` in a tool, to rewrite to our specific plugin as ``random-lines`` at\nversion ``1.0``. cwltool has such a test tool called ``random_lines_mapping.cwl``\nthat contains such a source ``SoftwareRequirement``. To try out this example with\nmapping, execute the following command from the cwltool root directory::\n\n cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \\\n tests/random_lines_mapping.cwl \\\n tests/random_lines_job.json\n\nThe previous examples demonstrated leveraging existing infrastructure to\nprovide requirements for CWL tools. If instead a real package manager is used\ncwltool has the opportunity to install requirements as needed. While initial\nsupport for Homebrew/Linuxbrew plugins is available, the most developed such\nplugin is for the `Conda <https://conda.io/docs/#>`__ package manager. Conda has the nice properties\nof allowing multiple versions of a package to be installed simultaneously,\nnot requiring evaluated permissions to install Conda itself or packages using\nConda, and being cross-platform. For these reasons, cwltool may run as a normal\nuser, install its own Conda environment and manage multiple versions of Conda packages\non Linux and Mac OS X.\n\nThe Conda plugin can be endlessly configured, but a sensible set of defaults\nthat has proven a powerful stack for dependency management within the Galaxy tool\ndevelopment ecosystem can be enabled by simply passing cwltool the\n``--beta-conda-dependencies`` flag.\n\nWith this, we can use the seqtk example above without Docker or any externally managed services - cwltool should install everything it needs\nand create an environment for the tool. 

The CWL specification allows URIs to be attached to ``SoftwareRequirement`` s
to disambiguate package names. If the mapping files described above
allow deployers to adapt tools to their infrastructure, this mechanism allows
tools to adapt their requirements to multiple package managers. To demonstrate
this within the context of the seqtk example, we can simply break the package name we
use and then specify a specific Conda package as follows:

.. code:: yaml

  hints:
    SoftwareRequirement:
      packages:
      - package: seqtk_seq
        version:
        - '1.2'
        specs:
        - https://anaconda.org/bioconda/seqtk
        - https://packages.debian.org/sid/seqtk

The example can be executed using the command::

  cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json

The plugin framework for managing the resolution of these software requirements
is maintained as part of `galaxy-tool-util <https://github.com/galaxyproject/galaxy/tree/dev/packages/tool_util>`__ - a small,
portable subset of the Galaxy project. More information on configuration and implementation can be found
at the following links:

- `Dependency Resolvers in Galaxy <https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__
- `Conda for [Galaxy] Tool Dependencies <https://docs.galaxyproject.org/en/latest/admin/conda_faq.html>`__
- `Mapping Files - Implementation <https://github.com/galaxyproject/galaxy/commit/495802d229967771df5b64a2f79b88a0eaf00edb>`__
- `Specifications - Implementation <https://github.com/galaxyproject/galaxy/commit/81d71d2e740ee07754785306e4448f8425f890bc>`__
- `Initial cwltool Integration Pull Request <https://github.com/common-workflow-language/cwltool/pull/214>`__

Use with GA4GH Tool Registry API
================================

cwltool can launch tools directly from `GA4GH Tool Registry API`_ endpoints.

By default, cwltool searches https://dockstore.org/ . Use ``--add-tool-registry`` to add other registries to the search path.

For example::

  cwltool quay.io/collaboratory/dockstore-tool-bamstats:develop test.json

and (the version defaults to ``latest`` when it is not specified)::

  cwltool quay.io/collaboratory/dockstore-tool-bamstats test.json

For this example, grab ``test.json`` (and the input file) from https://github.com/CancerCollaboratory/dockstore-tool-bamstats ::

  wget https://dockstore.org/api/api/ga4gh/v2/tools/quay.io%2Fbriandoconnor%2Fdockstore-tool-bamstats/versions/develop/PLAIN-CWL/descriptor/test.json
  wget https://github.com/CancerCollaboratory/dockstore-tool-bamstats/raw/develop/rna.SRR948778.bam

.. _`GA4GH Tool Registry API`: https://github.com/ga4gh/tool-registry-schemas

Running MPI-based tools that need to be launched
================================================

cwltool supports an extension to the CWL spec,
``http://commonwl.org/cwltool#MPIRequirement``. When the tool
definition has this in its ``requirements``/``hints`` section, and
cwltool has been run with ``--enable-ext``, then the tool's command
line will be extended with the commands needed to launch it with
``mpirun`` or similar. You can specify the number of processes to
start as either a literal integer or an expression (that will result
in an integer). For example::

  #!/usr/bin/env cwl-runner
  cwlVersion: v1.1
  class: CommandLineTool
  $namespaces:
    cwltool: "http://commonwl.org/cwltool#"
  requirements:
    cwltool:MPIRequirement:
      processes: $(inputs.nproc)
  inputs:
    nproc:
      type: int

Interaction with containers: the MPIRequirement currently prepends its
commands to the front of the command line that is constructed. If you
wish to run a containerized application in parallel, this does work with
Singularity for simple use cases, depending upon the platform
setup. However, this combination should be considered "alpha" -- please
do report any issues you have! This does not work with Docker at the
moment. (More precisely, you get ``n`` copies of the same single-process
image run at the same time that cannot communicate with each other.)

The host-specific parameters are configured in a simple YAML file
(specified with the ``--mpi-config-file`` flag). The allowed keys are
given in the following table; all are optional.

+----------------+------------------+----------+------------------------------+
| Key            | Type             | Default  | Description                  |
+================+==================+==========+==============================+
| runner         | str              | "mpirun" | The primary command to use.  |
+----------------+------------------+----------+------------------------------+
| nproc_flag     | str              | "-n"     | Flag to set number of        |
|                |                  |          | processes to start.          |
+----------------+------------------+----------+------------------------------+
| default_nproc  | int              | 1        | Default number of processes. |
+----------------+------------------+----------+------------------------------+
| extra_flags    | List[str]        | []       | A list of any other flags to |
|                |                  |          | be added to the runner's     |
|                |                  |          | command line before          |
|                |                  |          | the ``baseCommand``.         |
+----------------+------------------+----------+------------------------------+
| env_pass       | List[str]        | []       | A list of environment        |
|                |                  |          | variables that should be     |
|                |                  |          | passed from the host         |
|                |                  |          | environment through to the   |
|                |                  |          | tool (e.g., giving the       |
|                |                  |          | node list as set by your     |
|                |                  |          | scheduler).                  |
+----------------+------------------+----------+------------------------------+
| env_pass_regex | List[str]        | []       | A list of Python regular     |
|                |                  |          | expressions that will be     |
|                |                  |          | matched against the host's   |
|                |                  |          | environment. Those that match|
|                |                  |          | will be passed through.      |
+----------------+------------------+----------+------------------------------+
| env_set        | Mapping[str,str] | {}       | A dictionary of environment  |
|                |                  |          | variables to set and the     |
|                |                  |          | values to give them.         |
+----------------+------------------+----------+------------------------------+
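
For example, a configuration for a hypothetical Slurm-based cluster might look
like the sketch below. The keys come from the table above, but the values
(``srun``, the PMIx flag, and the Slurm variable names) are illustrative
assumptions for one particular site, not recommendations.

.. code:: yaml

  # illustrative --mpi-config-file contents for a Slurm + srun setup
  runner: srun
  nproc_flag: "-n"
  default_nproc: 2
  extra_flags: ["--mpi=pmix"]
  env_pass: ["SLURM_JOB_ID", "SLURM_NODELIST"]
  env_set:
    OMP_NUM_THREADS: "1"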

Enabling Fast Parser (experimental)
===================================

For very large workflows, ``cwltool`` can spend a lot of time in
initialization, before the first step runs. There is an experimental
flag, ``--fast-parser``, which can dramatically reduce this
initialization overhead; however, as of this writing it has several limitations:

- Error reporting in general is worse than with the standard parser, so you will want to use it only with workflows that you know are already correct.

- It does not check for dangling links (these become runtime errors instead of loading errors).

- Several other cases fail, as documented in https://github.com/common-workflow-language/cwltool/pull/1720
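
Enabling it is just a matter of adding the flag to an otherwise normal
invocation; the workflow and job file names below are placeholders::

  cwltool --fast-parser my-large-workflow.cwl my-large-workflow-job.yml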

***********
Development
***********

Running tests locally
=====================

- Running basic tests ``(/tests)``:

To run the basic tests after installing ``cwltool``, execute the following:

.. code:: bash

  pip install -r test-requirements.txt
  pytest   ## N.B. This requires node.js or docker to be available

To run various tests in all supported Python environments, we use `tox <https://github.com/common-workflow-language/cwltool/tree/main/tox.ini>`_. To run the test suite in all supported Python environments,
first clone the complete code repository (see the ``git clone`` instructions above) and then run
the following in the terminal:
``pip install "tox<4"; tox -p``

A list of all environments can be seen using:
``tox --listenvs``
and a specific test environment can be run using:
``tox -e <env name>``
and additionally a specific test can be run using this format:
``tox -e py310-unit -- -v tests/test_examples.py::test_scandeps``

- Running the entire suite of CWL conformance tests:

The GitHub repository for the CWL specifications contains a script that tests a CWL
implementation against a wide array of valid CWL files using the `cwltest <https://github.com/common-workflow-language/cwltest>`_
program.

Instructions for running these tests can be found in the Common Workflow Language Specification repository at https://github.com/common-workflow-language/common-workflow-language/blob/main/CONFORMANCE_TESTS.md .

Import as a module
==================

Add

.. code:: python

  import cwltool

to your script.

The easiest way to use cwltool to run a tool or workflow from Python is to use a Factory:

.. code:: python

  import cwltool.factory
  fac = cwltool.factory.Factory()

  echo = fac.make("echo.cwl")
  result = echo(inp="foo")

  # result["out"] == "foo"
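
The Factory can also be combined with cwltool's context objects to customize how
tools are executed. The sketch below assumes a cwltool release in which
``cwltool.factory.Factory`` accepts a ``runtime_context`` keyword argument, so
double-check the constructor signature of your installed version before relying
on it.

.. code:: python

  import cwltool.factory
  from cwltool.context import RuntimeContext

  # sketch: tweak execution settings before building the Factory
  runtime_context = RuntimeContext()
  runtime_context.use_container = False                     # run tools without a container engine
  runtime_context.tmpdir_prefix = "/tmp/cwltool-scratch/"   # illustrative scratch location

  fac = cwltool.factory.Factory(runtime_context=runtime_context)
  echo = fac.make("echo.cwl")   # the same example tool as above
  result = echo(inp="foo")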

CWL Tool Control Flow
=====================

Technical outline of how cwltool works internally, for maintainers.

#. Use CWL ``load_tool()`` to load the document.

   #. Fetches the document from file or URL
   #. Applies preprocessing (syntax/identifier expansion and normalization)
   #. Validates the document based on cwlVersion
   #. If necessary, updates the document to the latest spec
   #. Constructs a Process object using the ``make_tool()`` callback. This yields a
      CommandLineTool, Workflow, or ExpressionTool. For workflows, this
      recursively constructs each workflow step.
   #. To construct custom types for CommandLineTool, Workflow, or
      ExpressionTool, provide a custom ``make_tool()``

#. Iterate on the ``job()`` method of the Process object to get back runnable jobs.

   #. ``job()`` is a generator method (uses the Python iterator protocol)
   #. Each time the ``job()`` method is invoked in an iteration, it returns one
      of: a runnable item (an object with a ``run()`` method), ``None`` (indicating
      there is currently no work ready to run), or end of iteration (indicating
      the process is complete).
   #. Invoke the runnable item by calling ``run()``. This runs the tool and gets output.
   #. An output callback reports the output of a process.
   #. ``job()`` may be iterated over multiple times. It will yield all the work
      that is currently ready to run and then yield None.

#. A ``Workflow`` object creates corresponding ``WorkflowJob`` and ``WorkflowJobStep`` objects to hold the workflow state for the duration of the job invocation.

   #. The WorkflowJob iterates over each WorkflowJobStep and determines if the
      inputs of the step are ready.
   #. When a step is ready, it constructs an input object for that step and
      iterates on the ``job()`` method of the workflow job step.
   #. Each runnable item is yielded back up to the top-level run loop.
   #. When a step job completes and receives an output callback, the
      job outputs are assigned to the output of the workflow step.
   #. When all steps are complete, the intermediate files are moved to the final
      workflow output, intermediate directories are deleted, and the workflow's output callback is called.

#. ``CommandLineTool`` ``job()`` calls yield a single runnable object.

   #. The CommandLineTool ``job()`` method calls ``make_job_runner()`` to create a
      ``CommandLineJob`` object.
   #. The job method configures the CommandLineJob object by setting public
      attributes.
   #. The job method iterates over file and directory inputs to the
      CommandLineTool and creates a "path map".
   #. Files are mapped from their "resolved" location to a "target" path where
      they will appear at tool invocation (for example, a location inside a
      Docker container). The target paths are used on the command line.
   #. Files are staged to target paths using either Docker volume binds (when
      using containers) or symlinks (if not). This staging step enables files
      to be logically rearranged or renamed independent of their source layout.
   #. The ``run()`` method of CommandLineJob executes the command line tool or
      Docker container, waits for it to complete, collects output, and makes
      the output callback.
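
To make the iteration described above concrete, here is a deliberately
simplified sketch of the ``job()``/``run()`` loop. It is written in the spirit of
cwltool's single-threaded executor but is not the actual implementation; the
callback and context handling are reduced to the bare minimum.

.. code:: python

  # simplified sketch of the job()/run() loop outlined above; cwltool's real
  # executor additionally handles provenance, locking, and mutation management
  def run_process(process, job_order, runtime_context):
      results = []

      def output_callback(out, process_status):
          # invoked when a process (or the whole workflow) finishes
          results.append((process_status, out))

      for runnable in process.job(job_order, output_callback, runtime_context):
          if runnable is not None:
              runnable.run(runtime_context)   # execute the tool and trigger callbacks
          else:
              # nothing is ready to run yet; a real executor would wait here
              pass
      return results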

Extension points
================

The following functions can be passed to ``main()`` to override or augment
the listed behaviors.

executor
  ::

    executor(tool, job_order_object, runtimeContext, logger)
      (Process, Dict[Text, Any], RuntimeContext) -> Tuple[Dict[Text, Any], Text]

  An implementation of the top-level workflow execution loop; it should
  synchronously run a process object to completion and return the
  output object.

versionfunc
  ::

    ()
      () -> Text

  Return the version string.

logger_handler
  ::

    logger_handler
      logging.Handler

  Handler object for logging.

The following functions can be set in LoadingContext to override or
augment the listed behaviors.

fetcher_constructor
  ::

    fetcher_constructor(cache, session)
      (Dict[unicode, unicode], requests.sessions.Session) -> Fetcher

  Construct a Fetcher object with the supplied cache and HTTP session.

resolver
  ::

    resolver(document_loader, document)
      (Loader, Union[Text, dict[Text, Any]]) -> Text

  Resolve a relative document identifier to an absolute one that can be fetched.

The following functions can be set in RuntimeContext to override or
augment the listed behaviors.

construct_tool_object
  ::

    construct_tool_object(toolpath_object, loadingContext)
      (MutableMapping[Text, Any], LoadingContext) -> Process

  Hook to construct a Process object (e.g. CommandLineTool) from a document.

select_resources
  ::

    selectResources(request)
      (Dict[str, int], RuntimeContext) -> Dict[Text, int]

  Take a resource request and turn it into a concrete resource assignment.

make_fs_access
  ::

    make_fs_access(basedir)
      (Text) -> StdFsAccess

  Return a file system access object.

In addition, when providing custom subclasses of Process objects, you can override the following methods:

CommandLineTool.make_job_runner
  ::

    make_job_runner(RuntimeContext)
      (RuntimeContext) -> Type[JobBase]

  Create and return a job runner object (this implements the concrete execution of a command line tool).

Workflow.make_workflow_step
  ::

    make_workflow_step(toolpath_object, pos, loadingContext, parentworkflowProv)
      (Dict[Text, Any], int, LoadingContext, Optional[ProvenanceProfile]) -> WorkflowStep

  Create and return a workflow step object.
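
As an illustration of the first group of extension points, the sketch below
passes a custom ``versionfunc`` and ``logger_handler`` to ``cwltool.main.main()``.
The keyword-argument names follow the descriptions above; treat them as
assumptions and verify them against the ``main()`` signature of your installed
cwltool version.

.. code:: python

  import logging
  import sys

  from cwltool.main import main

  # sketch: route cwltool's log output through a custom handler and report a
  # wrapper-specific version string
  handler = logging.StreamHandler(sys.stderr)
  handler.setFormatter(logging.Formatter("[my-wrapper] %(levelname)s %(message)s"))

  def my_version() -> str:
      return "my-cwltool-wrapper 0.1 (illustrative)"

  exit_code = main(
      argsl=["echo.cwl", "--inp", "foo"],   # same example tool as in the Factory section
      versionfunc=my_version,
      logger_handler=handler,
  )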
"bugtrack_url": null,
"license": null,
"summary": "Common workflow language reference implementation",
"version": "3.1.20241007082533",
"project_urls": {
"Download": "https://github.com/common-workflow-language/cwltool",
"Homepage": "https://github.com/common-workflow-language/cwltool"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bb783112dc8b199dd0983f676df68372bb9af8fefb5999bb8f83ca3076f98bb8",
"md5": "fb4255f758ed2d8afa438831dcbd8f65",
"sha256": "406153e8831b94af0b6d316391dad0810ba53ac3580c55584ef5f328a0471a1b"
},
"downloads": -1,
"filename": "cwltool-3.1.20241007082533-py3-none-any.whl",
"has_sig": false,
"md5_digest": "fb4255f758ed2d8afa438831dcbd8f65",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.14,>=3.8",
"size": 1643662,
"upload_time": "2024-10-07T15:38:43",
"upload_time_iso_8601": "2024-10-07T15:38:43.854328Z",
"url": "https://files.pythonhosted.org/packages/bb/78/3112dc8b199dd0983f676df68372bb9af8fefb5999bb8f83ca3076f98bb8/cwltool-3.1.20241007082533-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f14a6c0d44ed5a0785544ecd7817a031d1a5f98e2e40f6daf68ddb34155db42a",
"md5": "ed4b3e4747c3389e2f5aee5bab861884",
"sha256": "f7e65f276ebf40caaafef24992a6e3911ef5b260c6d1e884a2ab3068757f6936"
},
"downloads": -1,
"filename": "cwltool-3.1.20241007082533.tar.gz",
"has_sig": false,
"md5_digest": "ed4b3e4747c3389e2f5aee5bab861884",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.14,>=3.8",
"size": 1417866,
"upload_time": "2024-10-07T15:38:47",
"upload_time_iso_8601": "2024-10-07T15:38:47.520736Z",
"url": "https://files.pythonhosted.org/packages/f1/4a/6c0d44ed5a0785544ecd7817a031d1a5f98e2e40f6daf68ddb34155db42a/cwltool-3.1.20241007082533.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-07 15:38:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "common-workflow-language",
"github_project": "cwltool",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"requirements": [
{
"name": "requests",
"specs": [
[
">=",
"2.6.1"
]
]
},
{
"name": "ruamel.yaml",
"specs": [
[
">=",
"0.16.0"
],
[
"<",
"0.19"
]
]
},
{
"name": "rdflib",
"specs": [
[
"<",
"7.1"
],
[
">=",
"4.2.2"
]
]
},
{
"name": "schema-salad",
"specs": [
[
">=",
"8.7"
],
[
"<",
"9"
]
]
},
{
"name": "prov",
"specs": [
[
"==",
"1.5.1"
]
]
},
{
"name": "mypy-extensions",
"specs": []
},
{
"name": "psutil",
"specs": [
[
">=",
"5.6.6"
]
]
},
{
"name": "importlib_resources",
"specs": [
[
">=",
"1.4"
]
]
},
{
"name": "coloredlogs",
"specs": []
},
{
"name": "pydot",
"specs": [
[
">=",
"1.4.1"
],
[
"<",
"3"
]
]
},
{
"name": "argcomplete",
"specs": [
[
">=",
"1.12.0"
]
]
},
{
"name": "pyparsing",
"specs": [
[
"!=",
"3.0.2"
]
]
},
{
"name": "cwl-utils",
"specs": [
[
">=",
"0.32"
]
]
},
{
"name": "spython",
"specs": [
[
">=",
"0.3.0"
]
]
}
],
"tox": true,
"lcname": "cwltool"
}