About

Using CWL to support EHR-based phenotyping from Martin Chapman

Phenoflow

Phenoflow is the name for a conceptual model and a microservice architecture, which includes this fork of CWL Viewer, that aim to enhance the reproducibility and portability of computable phenotypes.

Cite as: Phenoflow: A Microservice Architecture for Portable Workflow-based Phenotype Definitions (2021). Martin Chapman, Luke V Rasmussen, Jennifer A Pacheco, Vasa Curcin

CWL Viewer

CWL Viewer is a richly featured web visualisation suite for workflows written in the Common Workflow Language with an aim of facilitating sharing, understanding and discovery as well as encouraging best practices when writing workflows and their tooling.

Cite as: https://doi.org/10.7490/f1000research.1114375.1

Technical Report: https://doi.org/10.5281/zenodo.823295

CWL Viewer also won the F1000Research Best Poster Award at ISMB/ECCB 2017 for its poster submission.

This project was developed at the eScience Lab at The University of Manchester, with work supported by Bioexcel, funded by the European Union Horizon 2020 program under grant agreement 675728.

Contributions are welcome in the form of issues and pull requests to the GitHub repository.

Privacy policy

CWL Viewer publishes visualizations of workflows from publicly available git repositories hosted by third parties such as github.com or gitlab.com. Anyone can submit a workflow, which will be added to our public listing.

Tracking usage

We do not track individual users of CWL Viewer, but we do record general usage (e.g. web server access log) for operational purposes and to prevent abuse. We may use HTTP session cookies in order to assist workflow submission, but do not use cookies to identify users.

What information is held?

We hold information about public open source workflows in order to visualize them graphically and textually, and to make their declared metadata accessible to the public in different formats such as linked data. This information may be held until its removal is explicitly requested; however, we reserve the right to remove any workflow from the listing without prior notice.

Metadata shown from the public workflows may include personal data, including authorship or as part of workflow descriptions. We retrieve this information from the submitted git repository. Downloading a workflow or its metadata may include information from the git repository not otherwise shown in the CWL Viewer interface, e.g. authors from git commit history.

For performance reasons the CWL Viewer may keep a copy of the checked out git repository and the derived metadata. We may at a later date retrieve published changes from the original repository to update the information held.

Where is information exposed?

Workflows and their metadata can be accessed in CWL Viewer through the public listing in a browser or programmatically through the API, and can be downloaded in multiple formats such as ZIP, SVG or RDF.

CWL Viewer generates and exposes permalinks which reference the git commit and the workflow path within the git repository, but not the git repository location or username. These permalinks are only resolvable with the public https://view.commonwl.org/ if it has previously visualized a corresponding public git repository.

Metadata from public workflows may be published to the OpenAIRE registry, including author names and workflow title.

Best Practices

In order to ensure that your workflow is well presented in CWL Viewer, we recommend following the CWL Best Practices. Those which are specifically relevant to the viewer are detailed below, but we suggest that you try to meet as many as possible to improve the general quality and reproducibility of your workflows.

Some limitations of the CWL Viewer which you may need to be aware of are also described here.

Label Strings

Include a top level short label summarising each tool and workflow

Labels give the user an easy human-readable version of the name for the tool or workflow

For workflows this will be displayed at the top of the page as the title and for tools it will be displayed in the table and as the name of the step in the visualisation. If a label is given at the step level, it will take priority over the top level tool label. You can use this to provide a more descriptive label of the tool's application in the particular step if preferred.
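As a minimal sketch of the labelling described above, the workflow below sets a top level label and overrides the tool's own label at the step level. The file name sort.cwl and all identifiers are hypothetical and chosen only for illustration.

```yaml
cwlVersion: v1.0
class: Workflow
label: Sort sequencing reads          # shown as the page title

inputs:
  unsorted_reads: File
outputs:
  sorted_reads:
    type: File
    outputSource: sort_reads/sorted_reads

steps:
  sort_reads:
    # A step level label takes priority over the label declared
    # at the top level of sort.cwl (a hypothetical tool description).
    label: Sort reads by coordinate
    run: sort.cwl
    in:
      unsorted_reads: unsorted_reads
    out: [sorted_reads]
```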

Doc Strings

If useful, include a top level doc string providing a longer, more detailed description than was provided in the label (see above)

Docs give the user a detailed description of the role a tool or workflow performs

For workflows this will be displayed at the top of the page under the title and for tools it will be displayed in the table. If a doc string is given at the step level, it will take priority over the top level tool doc. You can use this to provide a more detailed description of the tool's application in the particular step if preferred.
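A doc string can be added in the same way as a label. The sketch below assumes a hypothetical tool description sort.cwl and uses illustrative identifiers throughout.

```yaml
cwlVersion: v1.0
class: Workflow
label: Sort a table
doc: |
  Sorts the lines of a tab-separated table alphabetically.
  This text is shown under the title on the workflow page.

inputs:
  unsorted_table: File
outputs:
  sorted_table:
    type: File
    outputSource: sort_table/sorted_table

steps:
  sort_table:
    # A step level doc takes priority over the doc declared
    # at the top level of sort.cwl (hypothetical).
    doc: Sorts the submitted table before it is returned to the user.
    run: sort.cwl
    in:
      unsorted_table: unsorted_table
    out: [sorted_table]
```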

Conceptual Identifiers

All input and output identifiers should reflect their conceptual identity. Generic and uninformative names such as result or input/output should be avoided

Helpful identifiers allow for the links between steps in the CWL file to be easily distinguished

Identifiers are displayed in the tables and are unique to the step. The label is also used as a replacement for the identifier in the visualisation if provided.
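To illustrate, the following sketch of a tool description uses identifiers that describe what each parameter is rather than generic names. The tool and all identifiers are assumptions made for this example.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [grep]

inputs:
  search_pattern:          # rather than "input1"
    type: string
    inputBinding:
      position: 1
  log_file:                # rather than "input2" or "file"
    type: File
    inputBinding:
      position: 2

outputs:
  matching_lines:          # rather than "result" or "output"
    type: stdout
    # If present, the label replaces the identifier in the visualisation.
    label: Lines matching the pattern

stdout: matching_lines.txt
```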

Format Specification

The format field should be specified for all input and output Files

Tools should use format identifiers from a relevant ontology such as the EDAM Ontology in the case of Bioinformatics tools. For plain types use the IANA media type list with $namespaces: { iana: "https://www.iana.org/assignments/media-types/" }, for example iana:text/plain, iana:text/tab-separated-values

The use of formal standards for format fields enables implementations to provide checks for compatibility in formatting of files

Ontologies will be parsed and the name of and link to the format displayed in the table on workflow pages. Plain formats will have the iana.org link given but will not display the name of the format.
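A sketch of both kinds of format declaration is shown below: an EDAM identifier for a FASTA input and an IANA media type for a plain text output. The tool itself and its identifiers are hypothetical.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [wc, -l]

$namespaces:
  edam: "http://edamontology.org/"
  iana: "https://www.iana.org/assignments/media-types/"

inputs:
  sequences:
    type: File
    format: edam:format_1929     # FASTA in the EDAM Ontology
    inputBinding:
      position: 1

outputs:
  line_count:
    type: File
    format: iana:text/plain      # plain type from the IANA media type list
    outputBinding:
      glob: line_count.txt

stdout: line_count.txt
```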

Separation of Concerns

Each CommandLineTool description should focus on a single operation only, even if the (sub)command is capable of more.

This allows for easier reuse of the tool in other workflows and makes its purpose easier to understand

In CWL Viewer this ensures that steps are clear in purpose within the workflow and generated visualisation

JavaScript Elimination

Evaluate all use of JavaScript for possible elimination or replacement. For instance, for the manipulation of File names and paths, often one of the built-in File properties such as basename, nameroot, nameext, etc. could be used instead

Tool runners can provide more efficient implementations of built-in functionality, which makes JavaScript expressions a last resort

CWL Viewer does not take into account JavaScript expressions when extracting information about your workflows
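As an illustration of replacing JavaScript with a built-in File property, the sketch below derives the output file name from nameroot using a plain parameter reference, so no InlineJavascriptRequirement is declared. The tool and identifiers are assumptions made for this example.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [sort]

inputs:
  unsorted_table:
    type: File
    inputBinding:
      position: 1

outputs:
  sorted_table:
    type: File
    outputBinding:
      # Parameter reference to the built-in nameroot property
      # (the file name without its extension).
      glob: $(inputs.unsorted_table.nameroot).sorted.txt

# The same parameter reference names the captured standard output.
stdout: $(inputs.unsorted_table.nameroot).sorted.txt
```

For an input named table.tsv this would produce table.sorted.txt without any JavaScript expression being evaluated.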

Use of Subworkflows

CWL implementations which also implement SubworkflowFeatureRequirement can support nesting workflows as a step within others. Complex workflows with individual components which can be abstracted should utilise this to make their workflow modular and allow sections of them to be easily reused

Extracting subworkflows enables them to be run, developed and tested individually. It also makes them easier to understand

Subworkflows are simplified in the visualisations and are linked as a different workflow in the Step tables on each workflow page
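A minimal sketch of nesting is shown below. It assumes a hypothetical file quality_control_workflow.cwl that is itself a Workflow with an input named reads and an output named report.

```yaml
cwlVersion: v1.0
class: Workflow
label: Analyse reads after quality control

requirements:
  - class: SubworkflowFeatureRequirement   # needed to run a Workflow as a step

inputs:
  raw_reads: File
outputs:
  quality_report:
    type: File
    outputSource: quality_control/report

steps:
  quality_control:
    # Hypothetical nested workflow; it can also be run and tested on its own.
    run: quality_control_workflow.cwl
    in:
      reads: raw_reads
    out: [report]
```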

Attribution

Include attribution information in your workflow and tool descriptions

For example, to attribute a person as the author of a workflow or tool with name, email and ORCID information, include the following statements at the top level:

$namespaces: { s: "http://schema.org/" }
s:author:
- class: s:Person
  s:name: Mark Robinson
  s:email: mailto:mark@example.com
  s:id: http://orcid.org/0000-0002-8184-7507

For attributing organisations, see this workflow as an example

Attribution information allows your workflows and tooling to be used by others while recognising your contributions. The inclusion of an ORCID allows you to be uniquely identified from other researchers

CWL Viewer parses attribution information for inclusion in the Research Object Manifest from both the Git commit logs and from the CWL descriptions themselves when expressed in the http://schema.org/author format as above

Licensing

Include an OSI-approved open source license in your workflow and tool descriptions

For example, the following two statements at the top level of a workflow or tool description license it under the Apache License 2.0:

$namespaces: { s: "http://schema.org/" }
s:license: "https://www.apache.org/licenses/LICENSE-2.0"

A permissive open source license allows others to remix and use your tooling and workflows to prevent the community from repeating development effort, allowing everyone to benefit

CWL Viewer is designed to allow people to locate and make use of the workflows developed by others as well as to share and demonstrate work, and open source licenses promote this goal

Limitations

Research Objects

Research Objects are constructed from the containing directory of the workflow file. This means tooling external to the directory but used by the workflow will not be included (see GitHub issue)

We recommend that you keep all files in the containing folder for current use of CWL Viewer

SSH Cloning

Repositories cannot be cloned or used as submodules via SSH URLs, because this would require SSH keys to be set up.

We do not plan to support SSH, because making key setup a required step for downloading a workflow would harm reproducibility.

Others

Other limitations or unimplemented features can be viewed on the GitHub issues page