Create a User Defined Workflow (GitHub or Upload via API)


Creating a Form Bio Nextflow Workflow

Adding a workflow.json schema defining inputs and validation.

Form Bio Nextflow Runtime Specification

Workflow Run Quotas / Limitations

Two quotas currently limit how many workflow runs an organization or project can execute concurrently, and how many processes/task VMs a single workflow run can execute in parallel.

Both quotas can be increased per organization or project by sending a service request to support@formbio.com that specifies the organization whose quota you want to increase.

  1. Project Concurrent Workflow Runs (Default: 50) - Limits the number of concurrent workflow runs for a given Form Bio Project (quota increases can be applied across an Org or to a specific Project).
  2. Nextflow concurrent processes/tasks per Workflow Run, executor.queueSize (Default: 50) - Specifies the maximum number of parallel processes/task VMs for a given workflow run (quota increases can be applied to an Org, Project, or User Email). See https://www.nextflow.io/docs/latest/config.html#scope-executor
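To illustrate what a cap like executor.queueSize means in practice, the following Python sketch (a local simulation, not Form Bio or Nextflow code) pushes a batch of simulated tasks through a worker pool capped at queue_size and records the peak number of tasks in flight. The function peak_concurrency and its thread-pool stand-in for task VMs are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(num_tasks: int, queue_size: int, task_seconds: float = 0.01) -> int:
    """Run num_tasks simulated tasks with at most queue_size in flight; return the observed peak."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def task() -> None:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(task_seconds)  # stand-in for a task VM doing its work
        with lock:
            in_flight -= 1

    # max_workers plays the role of executor.queueSize: extra tasks wait for a free slot.
    with ThreadPoolExecutor(max_workers=queue_size) as pool:
        for _ in range(num_tasks):
            pool.submit(task)
    return peak


if __name__ == "__main__":
    print(peak_concurrency(num_tasks=200, queue_size=50))
```

With the default cap of 50, a run that scatters into 200 equally long tasks never has more than 50 running at once, so it needs at least four rounds of execution (200 / 50); raising the quota shortens that only if the tasks themselves are not the bottleneck.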

Nextflow Workflow outputs

params.output - Form Bio Workflow output folder

Nextflow outputs can be published to the Form Bio platform's workflow outputs using the reserved Nextflow parameter --output (params.output).

Custom workflow output parameter name

If your workflow already defines its own output parameter, you can map it to the Form Bio workflow outputs using the following templated syntax:

# you can specify an existing Nextflow parameter or set a default value in your workflow schema
# e.g. nf-core workflows usually use `--outdir`
--outdir='{{formbio.params.output}}'
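A minimal Python sketch of how such a template could be resolved before the run starts, assuming the platform performs a plain string substitution of {{formbio.params.<name>}} placeholders. The function render_formbio_template, the regex, and the example org/project names are hypothetical and are not the Form Bio implementation; the output path follows the params.output pattern listed under Form Bio Reserved Nextflow Param values.

```python
import re

# Matches placeholders such as {{formbio.params.output}} (whitespace inside the braces allowed).
PLACEHOLDER = re.compile(r"\{\{\s*formbio\.params\.(\w+)\s*\}\}")


def render_formbio_template(value: str, reserved_params: dict[str, str]) -> str:
    """Replace each {{formbio.params.<name>}} placeholder with the reserved param's value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in reserved_params:
            raise KeyError(f"unknown Form Bio param: {name}")
        return reserved_params[name]

    return PLACEHOLDER.sub(substitute, value)


if __name__ == "__main__":
    reserved = {"output": "formbio://my-org/my-project/pipeline-outputs/output/"}
    print("--outdir=" + render_formbio_template("{{formbio.params.output}}", reserved))
    # --outdir=formbio://my-org/my-project/pipeline-outputs/output/
```

Under this assumption, the workflow simply receives its usual --outdir value, already pointing at the Form Bio output folder, so no changes to the workflow code are needed.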

Form Bio Reserved Nextflow Configuration values

The following configuration is provided by the Form Bio platform at workflow runtime; any values the workflow sets for these settings in nextflow.config or main.nf are ignored.

🧑‍💻
Engineering Note: The Form Bio reserved configuration is provided via two different Nextflow configuration files:

  1. formbio.config - generated at runtime by the Form Bio API; provides runtime configuration specific to a Form Bio project (e.g. customer project context).
  2. $HOME/nextflow.config - provided as the default configuration in the Form Bio Nextflow Docker container base image, gls-nextflow.

The base Nextflow head node Docker container used to launch all workflows is defined here:

See Nextflow Configuration documentation

| Nextflow Configuration | Value | Description |
|---|---|---|
| process.executor | "google-lifesciences" or "google-batch" | The Nextflow executor used to run the workflow; depends on whether the workflow runs via Google Life Sciences or Google Batch. |
| process.time | 30d | Sets a new default process timeout (the default is 7 days). |
| executor.queueSize | 50 (default), or a per-Form Bio Project override provided by the LaunchDarkly flag enable-larger-queue-size (https://app.launchdarkly.com/default/production/features/enable-larger-queue-size/targeting) | The number of tasks (VMs) the executor handles in parallel (the default for Life Sciences is 1000). |
| google.storage.maxTransferAttempts | 10 | Increases retries for intermittent GCS API errors such as 503 Service Unavailable (the default is 0). See https://www.nextflow.io/docs/latest/google.html?highlight=maxtransferattempts |
| google.project | tundra-prd (default), or the BYO BID GCP project ID | The GCP project that workflow tasks execute in: the central production project tundra-prd, or, for BYO BID projects, the customer's own GCP project. |
| google.region | Google Batch executor: us-central1 (or the BYO BID project's region). Batch supports only a single region for executing tasks, but can use multiple zones; see https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#LocationPolicy.FIELDS.allowed_locations. Google Life Sciences executor (load balanced across US regions): ["us-central1", "us-west2", "us-west4", "us-east1", "us-west1"] | The GCP region(s) that workflow tasks execute in. |
| -work-dir | ./work | The local working directory for Nextflow tasks/steps of processes running with executor=local (Docker). |
| -bucket-dir | formbio://${org}/${project}/pipeline-outputs/${workflow-run-folder} | The GCS working directory for Nextflow tasks/steps, also used for intermediate staging of channel files and as the cache for resuming failed workflows. |
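A minimal Python sketch of the precedence rule described in this section: values a workflow sets for reserved keys are ignored, reserved values come from the platform, and all other workflow settings pass through. It assumes configs are flattened to dotted keys and covers only the config-scope entries of the table (-work-dir and -bucket-dir are command-line options). RESERVED_CONFIG_KEYS and effective_config are hypothetical; they model the rule, not how Nextflow or the Form Bio API actually merge configuration files.

```python
# Hypothetical flat representation of the reserved config-scope keys from the table.
RESERVED_CONFIG_KEYS = frozenset({
    "process.executor",
    "process.time",
    "executor.queueSize",
    "google.storage.maxTransferAttempts",
    "google.project",
    "google.region",
})


def effective_config(workflow_config: dict, platform_config: dict) -> dict:
    """Merge dotted-key configs: workflow values pass through, reserved keys come only from the platform."""
    # Drop anything the workflow sets for a reserved key; it would be ignored at runtime.
    merged = {k: v for k, v in workflow_config.items() if k not in RESERVED_CONFIG_KEYS}
    # Apply the platform-provided reserved values on top.
    merged.update({k: v for k, v in platform_config.items() if k in RESERVED_CONFIG_KEYS})
    return merged


if __name__ == "__main__":
    workflow = {"process.time": "2h", "process.memory": "8 GB"}
    platform = {"process.time": "30d", "process.executor": "google-batch"}
    print(effective_config(workflow, platform))
    # {'process.memory': '8 GB', 'process.time': '30d', 'process.executor': 'google-batch'}
```

In this model, a workflow that needs a shorter per-process limit cannot rely on setting process.time globally, because that key is reserved.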

Form Bio Reserved Nextflow Param values

The following params are reserved and provided by the Form Bio platform; any values set for them by the workflow or supplied as user input params are overwritten.

| Workflow Param | Value | Description |
|---|---|---|
| --output (params.output) | formbio://${org}/${project}/pipeline-outputs/output/ | Form Bio (GCS) path for workflow output results. |
| --region (params.region) | Google Life Sciences executor: us-central1,us-west2,us-west4,us-east1,us-west1; Google Batch executor: us-central1 | GCP region(s) to execute the workflow in, based on the BYOBID project (this may be unused, since the config regions are also set). |
| --bqLabels (params.bqLabels) | Example: "formbio-org":"sbx-uat","formbio-project-id":"40c9baa0-9156-4e5e-a784-f3ffffd022ef","formbio-user-id":"d4312620-981b-4276-b619-567bb9518e07","formbio-operation":"workflow_run" | GCP resource labels provided by Form Bio, used to track/categorize cloud usage costs by Org/Project/Workflow/User. |
| --registry (params.registry) | gcr.io/bioinfo-devel | Docker container registry used by Form Bio managed workflows (likely not relevant to BYO WF user-defined workflows). |
| --cloudprj (params.cloudprj) | tundra-prd or the BYOBID customer GCP project ID | Deprecated. Possibly unused, since --registry is overridden; in Form Bio managed workflows, the default is params.registry = "gcr.io/${params.cloudprj}". |
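The example params.bqLabels value is a comma-separated list of quoted "key":"value" pairs. Assuming that exact format (a JSON object body without its surrounding braces), a workflow-side helper could parse it as sketched below. parse_bq_labels is hypothetical, and its check is a simplified, ASCII-only version of GCP's label rules (lowercase letters, digits, underscores, and dashes; at most 63 characters; keys start with a letter).

```python
import json
import re

# Simplified, ASCII-only GCP label rules: lowercase letters, digits, "_" and "-", max 63 chars.
LABEL_KEY = re.compile(r"[a-z][a-z0-9_-]{0,62}")
LABEL_VALUE = re.compile(r"[a-z0-9_-]{0,63}")


def parse_bq_labels(raw: str) -> dict[str, str]:
    """Parse a params.bqLabels string of comma-separated "key":"value" pairs into a dict."""
    # Assumption: the value is a JSON object body without the surrounding braces.
    labels = json.loads("{" + raw + "}")
    for key, value in labels.items():
        if not LABEL_KEY.fullmatch(key) or not LABEL_VALUE.fullmatch(str(value)):
            raise ValueError(f"not a valid GCP label: {key}={value}")
    return labels


if __name__ == "__main__":
    print(parse_bq_labels('"formbio-org":"sbx-uat","formbio-operation":"workflow_run"'))
    # {'formbio-org': 'sbx-uat', 'formbio-operation': 'workflow_run'}
```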

Form Bio GitHub App Integration

Our GitHub app allows for faster, observable automation driven by pushes to workflow repositories on GitHub. Once a workflow is imported, any push to that repository triggers an upload of the workflow under a version corresponding to the branch that was pushed to.
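A minimal Python sketch of how a version name could be derived from a GitHub push event, assuming the version is simply the branch name. GitHub's push webhook payload carries the pushed ref in its "ref" field (e.g. refs/heads/main); the function version_for_push and the choice to ignore tag pushes are hypothetical and not the Form Bio App's actual logic.

```python
from typing import Optional

BRANCH_PREFIX = "refs/heads/"


def version_for_push(push_payload: dict) -> Optional[str]:
    """Return the pushed branch name (hypothetically used as the workflow version), or None."""
    ref = push_payload.get("ref", "")
    if not ref.startswith(BRANCH_PREFIX):
        return None  # e.g. tag pushes (refs/tags/...) produce no upload in this sketch
    return ref[len(BRANCH_PREFIX):]


if __name__ == "__main__":
    print(version_for_push({"ref": "refs/heads/main"}))
    # main
```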


Importing a new or existing workflow

  1. Inside the Form Bio web app, navigate to the Manage section under Workflows
  2. Select Create New
  3. If you have not linked your GitHub account to the Form Bio GitHub App yet, click Sign in with GitHub and authorize access
  4. Once authorized, you will be redirected to the Web App and see an Import Workflow from GitHub screen
  5. Select formbio under Select Account
  6. Select the repository containing the workflow you would like to import under Select Repository
    1. If you do not see the repository you are looking for, you will need to install the App on it
    2. To do this, open up the Select Account dropdown again and select Add or Modify Installations
    3. From there, select the formbio organization, then select the repositories you would like to install the App on.
    4. Click Update Access
    5. You will be redirected back to the Web App where you can now select that repository
  7. The Select Workflow list should automatically detect the workflows in the repo’s workflows/ directory that contain a workflow.json file.
  8. Click Configure on the workflow you’d like to import
  9. The configure form should be pre-populated with the default values and paths. We strongly discourage changing any of these at this time.
  10. Click Import Workflow and you’ll be redirected to the main branch’s build in progress. See step 6 in Monitor.
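The detection in step 7 can be approximated locally before importing, for example to check that a repository is laid out as expected. This Python sketch assumes each workflow lives in its own subdirectory, workflows/<name>/workflow.json; detect_workflows is hypothetical and is not the App's detection code.

```python
from pathlib import Path


def detect_workflows(repo_root: str) -> list[str]:
    """Return names of subdirectories of <repo_root>/workflows/ that contain a workflow.json file."""
    workflows_dir = Path(repo_root) / "workflows"
    if not workflows_dir.is_dir():
        return []
    return sorted(
        child.name
        for child in workflows_dir.iterdir()
        if child.is_dir() and (child / "workflow.json").is_file()
    )


if __name__ == "__main__":
    # Run from the root of a local clone of the workflow repository.
    print(detect_workflows("."))
```

A workflow directory that this check skips (for example, one missing its workflow.json) would likely not appear in the Select Workflow list either.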

Monitor

  1. Inside the Form Bio web app, navigate to the Manage section under Workflows (See first step in Import)
  2. You’ll see a list of cards representing all workflows that have been uploaded to the current project (whether via CLI or GitHub App)
  3. Workflows imported via the GitHub App are indicated by a GitHub logo and a View Deployments button
  4. Click View Deployments to see a record of all uploads since being imported. To see logs for a particular upload, click View Logs.
  5. You’ll be taken to a logs view, displaying data about the workflow, version, and build status.
  6. If an upload was successful, you’ll see a Go to Launch button that takes you to the docs page for that version, where you can launch the workflow
