About
Phenoflow
Phenoflow is the name for a conceptual model and a microservice architecture, which includes this fork of CWL Viewer, that aim to enhance the reproducibility and portability of computable phenotypes.
CWL Viewer
CWL Viewer is a richly featured web visualisation suite for workflows written in the Common Workflow Language with an aim of facilitating sharing, understanding and discovery as well as encouraging best practices when writing workflows and their tooling.
Cite as: https://doi.org/10.7490/f1000research.1114375.1
Technical Report: https://doi.org/10.5281/zenodo.823295
CWL Viewer also won the F1000Research Best Poster Award at ISMB/ECCB 2017 for its poster submission.
This project was developed at the eScience Lab at The University of Manchester, with work supported by BioExcel, funded by the European Union Horizon 2020 program under grant agreement 675728.
Contributions are welcome in the form of issues and pull requests to the GitHub repository.
Privacy policy
CWL Viewer publishes visualizations of workflows from publicly available git repositories hosted by third parties like github.com or gitlab.com. Anyone can submit a workflow, which will be added to our public listing.
Tracking usage
We do not track individual users of CWL Viewer, but we do record general usage (e.g. web server access log) for operational purposes and to prevent abuse. We may use HTTP session cookies in order to assist workflow submission, but do not use cookies to identify users.
What information is held?
We hold information about public open source workflows in order to visualize them graphically and textually, and to make their declared metadata accessible to the public in different formats such as linked data. This information may be held until its removal is explicitly requested; however, we reserve the right to remove any workflow from the listing without prior notice.
Metadata shown from the public workflows may include personal data, including authorship or as part of workflow descriptions. We retrieve this information from the submitted git repository. Downloading a workflow or its metadata may include information from the git repository not otherwise shown in the CWL Viewer interface, e.g. authors from git commit history.
For performance reasons the CWL Viewer may keep a copy of the checked out git repository and the derived metadata. We may at a later date retrieve published changes from the original repository to update the information held.
Where is information exposed?
Workflows and their metadata can be accessed in CWL Viewer through the public listing by browsers or programmatically through the API, and can be downloaded in multiple formats like ZIP, SVG or RDF.
CWL Viewer generates and exposes permalinks which reference the git commit and the workflow path within the git repository, but not the git repository location or username. These permalinks are only resolvable with the public https://view.commonwl.org/ if it has previously visualized a corresponding public git repository.
Metadata from public workflows may be published to the OpenAIRE registry, including author names and workflow title.
Best Practices
To ensure that your workflow is well presented in CWL Viewer, we recommend following the CWL Best Practices. Those which are specifically relevant to the viewer are detailed below, but it is suggested that you try to meet as many as possible to improve the general quality and reproducibility of your workflows.
Some limitations of the CWL Viewer which you may need to be aware of are also described here.
Label Strings
Include a top level short label summarising each tool and workflow
Labels give the user an easy human-readable version of the name for the tool or workflow
For workflows this will be displayed at the top of the page as the title and for tools it will be displayed in the table and as the name of the step in the visualisation. If a label is given at the step level, it will take priority over the top level tool label. You can use this to provide a more descriptive label of the tool's application in the particular step if preferred.
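As a minimal sketch of the two label levels described above, the workflow below sets a top level label and a step-level label. The file name tools/aligner.cwl and all identifiers are hypothetical and only serve as illustration.

```yaml
cwlVersion: v1.0
class: Workflow
# Top level label: shown as the title of the workflow page
label: Align sequencing reads

inputs:
  sequencing_reads: File
  reference_genome: File

outputs:
  aligned_reads:
    type: File
    outputSource: align/aligned_reads

steps:
  align:
    # Step-level label: takes priority over the label inside the tool file
    label: Align reads to the reference genome
    run: tools/aligner.cwl   # hypothetical tool with its own top level label
    in:
      sequencing_reads: sequencing_reads
      reference_genome: reference_genome
    out: [aligned_reads]
```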
Doc Strings
If useful, include a top level doc string providing a longer, more detailed description than was provided in the label (see above)
Docs give the user a detailed description of the role a tool or workflow performs
For workflows this will be displayed at the top of the page under the title and for tools it will be displayed in the table. If a doc string is given at the step level, it will take priority over the top level tool doc. You can use this to provide a more detailed description of the tool's application in the particular step if preferred.
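For illustration, a small tool description with both a label and a doc string might look like the sketch below. The tool (a wrapper around the standard sort command) and its identifiers are assumptions chosen for the example, not taken from an existing repository.

```yaml
cwlVersion: v1.0
class: CommandLineTool
# Short label: used as the name in the table and the visualisation
label: Sort text lines
# Longer doc string: shown alongside the tool in the table
doc: |
  Sorts the lines of a plain text file alphabetically
  and writes the sorted lines to standard output.
baseCommand: sort

inputs:
  unsorted_lines:
    type: File
    inputBinding:
      position: 1

outputs:
  sorted_lines:
    type: stdout

stdout: sorted_lines.txt
```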
Conceptual Identifiers
All input and output identifiers should reflect their conceptual identity. Generic and uninformative names such as result or input/output should be avoided
Helpful identifiers allow for the links between steps in the CWL file to be easily distinguished
Identifiers are displayed in the tables and are unique to the step. The label is also used as a replacement for the identifier in the visualisation if provided.
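The fragment below sketches what conceptual identifiers could look like inside a tool description. The identifiers and the file name aligned.bam are hypothetical.

```yaml
inputs:
  # Descriptive identifiers instead of generic ones such as "input"
  reference_genome:
    type: File
    # Optional label: replaces the identifier in the visualisation
    label: Reference genome
  sequencing_reads:
    type: File

outputs:
  # Descriptive identifier instead of "result" or "output"
  aligned_reads:
    type: File
    outputBinding:
      glob: aligned.bam   # hypothetical output file name
```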
Format Specification
The format field should be specified for all input and output Files
Tools should use format identifiers from a relevant ontology such as the EDAM Ontology in the case of Bioinformatics tools.
For plain types use the IANA media type list with $namespaces: { iana: "https://www.iana.org/assignments/media-types/" }, for example iana:text/plain, iana:text/tab-separated-values
The use of formal standards for format fields enables implementations to provide checks for compatibility in formatting of files
Ontologies will be parsed and the name of and link to the format displayed in the table on workflow pages. Plain formats will have the iana.org link given but will not display the name of the format.
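A sketch of how both kinds of format identifier might be declared in one description is shown below. The identifiers are hypothetical, and the edam prefix and the EDAM schema version are assumptions for the example; edam:format_1929 is the EDAM identifier for FASTA.

```yaml
$namespaces:
  edam: http://edamontology.org/
  iana: https://www.iana.org/assignments/media-types/
$schemas:
  # Ontology file from which format names can be resolved (version is an assumption)
  - http://edamontology.org/EDAM_1.18.owl

inputs:
  protein_sequences:
    type: File
    format: edam:format_1929   # FASTA, taken from the EDAM Ontology

outputs:
  summary_table:
    type: File
    format: iana:text/tab-separated-values   # plain type from the IANA list
    outputBinding:
      glob: summary.tsv   # hypothetical output file name
```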
Separation of Concerns
Each CommandLineTool description should focus on a single operation only, even if the (sub)command is capable of more.
This allows for easier reuse of the tool in other workflows and easier understanding of its purpose
In CWL Viewer this ensures that steps are clear in purpose within the workflow and generated visualisation
JavaScript Elimination
Evaluate all use of JavaScript for possible elimination or replacement. For instance, for the manipulation of File names and paths, often one of the built in File properties such as basename, nameroot, nameext etc could be used instead
Tool runners can provide more efficient implementations of built in functionality, so JavaScript expressions should be a last resort
CWL Viewer does not take into account JavaScript expressions when extracting information about your workflows
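As a sketch of this replacement, the tool below names its output using a plain parameter reference to the built in basename property, so no InlineJavascriptRequirement is needed. The tool (a wrapper around gzip) and its identifiers are assumptions chosen for the example.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [gzip, -c]

inputs:
  text_file:
    type: File
    inputBinding:
      position: 1

outputs:
  compressed_file:
    type: stdout

# Parameter reference with string interpolation, no JavaScript required.
# A JavaScript equivalent that this avoids would be something like:
#   ${ return inputs.text_file.path.split('/').pop() + ".gz"; }
stdout: $(inputs.text_file.basename).gz
```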
Use of Subworkflows
CWL implementations which also implement SubworkflowFeatureRequirement can support nesting workflows as a step within others. Complex workflows with individual components which can be abstracted should utilise this to make their workflow modular and allow sections of them to be easily reused
Extracting subworkflows enables them to be run, developed and tested individually. It also makes them easier to understand
Subworkflows are simplified in the visualisations and are linked as a different workflow in the Step tables on each workflow page
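A nested workflow might be declared roughly as in the sketch below. The file quality_control.cwl is hypothetical and is assumed to contain a description with class: Workflow; all identifiers are illustrative.

```yaml
cwlVersion: v1.0
class: Workflow
requirements:
  # Required for running another workflow as a step
  SubworkflowFeatureRequirement: {}

inputs:
  raw_reads: File

outputs:
  quality_report:
    type: File
    outputSource: quality_control/quality_report

steps:
  quality_control:
    # Hypothetical subworkflow, which can also be run and tested on its own
    run: quality_control.cwl
    in:
      raw_reads: raw_reads
    out: [quality_report]
```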
Attribution
Include attribution information in your workflow and tool descriptions
For example, to attribute a person as the author of a workflow or tool with name, email and ORCID information, include the following statements at the top level:
$namespaces: { s: "http://schema.org/" }
s:author:
  - class: s:Person
    s:name: Mark Robinson
    s:email: mailto:mark@example.com
    s:id: http://orcid.org/0000-0002-8184-7507
For attributing organisations, see this workflow as an example
Attribution information allows your workflows and tooling to be used by others while recognising your contributions. The inclusion of an ORCID allows you to be uniquely identified from other researchers
CWL Viewer parses attribution information for inclusion in the Research Object Manifest from both the Git commit logs and from the CWL descriptions themselves when expressed in the http://schema.org/author format as above
Licensing
Include an OSI-approved open source license in your workflow and tool descriptions
For example, the following two statements at the top level of a workflow or tool description license it under the Apache V2.0 License:
$namespaces: { s: "http://schema.org/" }
s:license: "https://www.apache.org/licenses/LICENSE-2.0"
A permissive open source license allows others to remix and use your tooling and workflows, which saves the community from repeating development effort and allows everyone to benefit
CWL Viewer is designed to allow people to locate and make use of the workflows developed by others as well as to share and demonstrate work, and open source licenses promote this goal
Limitations
Research Objects
Research Objects are constructed from the containing directory of the workflow file. This means tooling external to the directory but used by the workflow will not be included (see GitHub issue)
We recommend that you keep all files in the containing folder for current use of CWL Viewer
SSH Cloning
Repositories cannot be cloned or used as submodules through SSH URLs, because this requires SSH keys to be set up.
We do not plan to support SSH, because making key setup a required step for downloading a workflow would harm reproducibility.
Others
Other limitations or unimplemented features can be viewed on the GitHub issues page