Precision Medicine

Expression Analysis in RNASeq
Use Cases
Summary and Methods
Inputs
Outputs
Workflow Walkthrough
Results Walkthrough
Citations
Built with
Genomic Variant Analysis
Use Cases
Summary and Methods
Inputs
Outputs
Workflow Walkthrough
Results Walkthrough
Citations
Built with
Human Haplotype
Summary
Workflow Walkthrough
Citations
Built with

Expression Analysis in RNASeq

This workflow can be used to quantify gene expression, detect splice variants, and perform differential expression analysis.

Version 1.1.1

Use Cases

Determine differentially expressed genes between two or more groups of samples (treated vs untreated, knock-out vs wildtype, cell type A vs cell type B)
Determine differentially expressed transcripts between two or more groups of samples
Compare the gene expression profiles of samples

Summary and Methods

This workflow is designed to help the user thoroughly analyze RNA sequencing data. Currently, two functions are supported: Full Analysis and Recalculate Statistics. Both functions include the option to specify whether the data include Human Cancer Samples. Click the toggles below to learn more about each function.

‣

Full Analysis

Summary

This workflow is designed to help the user determine differential gene abundance and differential expression between two or more groups of samples. The user provides as input a folder containing all the read files needed for analysis and a sample description file relating sample IDs to attributes. The user receives as output differential gene and transcript abundance analysis files and comparison files between the two or more groups of samples.

Methods

This analysis was performed using the Expression Analysis in RNASeq workflow on the Form Bio platform. Reads are trimmed using TrimGalore [1] to remove low-quality (qual < 25) ends of reads and to discard reads shorter than 35 bp. Trimmed reads are aligned to a reference genome using STAR [2] (default) or HISAT2 [3]. Duplicate reads can optionally be marked using Picard MarkDuplicates. BAMs from the same sample generated by multiple runs are merged using Samtools [4]. The abundance of transcripts and genes is assessed using featureCounts to generate raw gene counts [5], StringTie to generate FPKM [6], and Salmon to generate raw transcript counts [7]. Sample comparisons and differential gene/transcript expression analysis are performed using edgeR [8], DESeq2 [9], and IsoformSwitchAnalyzeR [10].
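As a rough illustration of the abundance units mentioned above, the following sketch computes CPM and FPKM from raw counts using the textbook formulas. It is not the code the workflow runs: StringTie derives FPKM from assembled transcript coverage, and the gene names, counts, and lengths here are hypothetical.

```python
def cpm(counts):
    """Counts per million: scale each raw count by the library size."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("library has no assigned reads")
    return {gene: n * 1e6 / library_size for gene, n in counts.items()}


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of feature per million assigned fragments."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("library has no assigned reads")
    return {
        gene: n * 1e9 / (lengths_bp[gene] * library_size)
        for gene, n in counts.items()
    }


# Hypothetical raw counts and gene lengths (bp) for a single sample.
raw_counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
gene_lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

print(cpm(raw_counts))
print(fpkm(raw_counts, gene_lengths))
```

Because FPKM divides by feature length, geneA and geneB end up with the same FPKM here even though geneB has three times as many reads.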

‣

Recalculate Statistics

Summary

This workflow is designed to help the user determine differentially expressed genes using abundance counts generated by previous workflow analysis. The user will provide a folder containing output files from previous analyses. The workflow will perform the statistical analysis again with a different composition of samples and output the results of this analysis.

Methods

Sample comparisons and differential gene/transcript expression analysis are performed using edgeR [8], DESeq2 [9], and IsoformSwitchAnalyzeR [10].

‣

Human Cancer

Summary

This workflow is designed to help the user determine differential gene abundance and differential expression between two or more groups of human tumor samples. The user provides as input a folder containing all read files needed for analysis and a sample description file relating sample IDs to attributes. The user receives as output differential gene and transcript abundance analysis files and comparison files between the two or more groups of samples.

Methods

‣

Inputs

Run Name: This is a unique name for each run of pipelines in your project
Organism: Reference Genome used for alignment
Reference Genome Annotation: Annotation that should be used for determining gene and transcript counts.
Input Folder: This is the folder that contains all of the fastq files that will be used in this analysis
File Format
Sample Description File

This file matches the sequence files to samples. Sequence data from multiple runs are merged if they share the same SampleID.
The RunID should be part of the FASTQ file names.
SampleGroup is necessary for statistical analysis. There must be at least 2 samples per group.

RunID	SampleID	SampleGroup
SRR994739	SAMEA9454349	Treated
SRR994740	SAMEA9454349	Treated
SRR994741	SAMEA9454341	Untreated
SRR994742	SAMEA9454348	Treated
SRR994743	SAMEA9454348	Treated
SRR994744	SAMEA9454342	Untreated
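The rules above can be checked before launching a run. The sketch below is a hypothetical pre-flight check, not part of the platform: it reads a tab-separated sample description file, collects the runs that would be merged per SampleID, and verifies that each SampleGroup has at least two samples. The function name and the extra rule that a sample belongs to exactly one group are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def load_sample_sheet(handle):
    """Return (runs per sample, samples per group) from a tab-separated sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    required = {"RunID", "SampleID", "SampleGroup"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    runs_by_sample = defaultdict(list)
    group_of = {}
    for row in reader:
        sample, group = row["SampleID"], row["SampleGroup"]
        runs_by_sample[sample].append(row["RunID"])
        # Assumption: a sample may belong to only one group.
        if group_of.setdefault(sample, group) != group:
            raise ValueError(f"{sample} is assigned to more than one group")

    samples_by_group = defaultdict(set)
    for sample, group in group_of.items():
        samples_by_group[group].add(sample)

    too_small = sorted(g for g, s in samples_by_group.items() if len(s) < 2)
    if too_small:
        raise ValueError(f"groups with fewer than 2 samples: {too_small}")
    return dict(runs_by_sample), {g: sorted(s) for g, s in samples_by_group.items()}


sheet = (
    "RunID\tSampleID\tSampleGroup\n"
    "SRR994739\tSAMEA9454349\tTreated\n"
    "SRR994740\tSAMEA9454349\tTreated\n"
    "SRR994741\tSAMEA9454341\tUntreated\n"
    "SRR994742\tSAMEA9454348\tTreated\n"
    "SRR994743\tSAMEA9454348\tTreated\n"
    "SRR994744\tSAMEA9454342\tUntreated\n"
)
runs, groups = load_sample_sheet(io.StringIO(sheet))
print(runs)
print(groups)
```

With the example table, each Treated sample has two runs to merge, and both groups contain two samples, so the check passes.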

Advanced Parameters

Algorithms

‣

Mark Duplicates Algorithm

‣

Trim Reads

true
false

‣

Alignment Algorithm

Library Types

‣

Orientation

I = inward
O = outward
M = matching

‣

Read Origin

F = Forward
R = Reverse
'' = Single End/Unknown

‣

Stranded

S = stranded
U = unstranded
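The three codes above resemble the components of a Salmon-style library type string such as ISR or IU. Assuming the workflow combines them in Salmon's order (orientation, then strandedness, then read origin), a minimal sketch of the composition could look like the following. The function name and the validation rules are illustrative assumptions, not the platform's implementation.

```python
def build_library_type(orientation="", stranded="U", read_origin=""):
    """Compose a library type string from the three codes listed above.

    orientation: "I", "O", "M", or "" for single-end reads
    stranded:    "S" or "U"
    read_origin: "F", "R", or "" when single-end/unknown
    """
    if orientation not in ("", "I", "O", "M"):
        raise ValueError(f"unknown orientation: {orientation!r}")
    if stranded not in ("S", "U"):
        raise ValueError(f"unknown strandedness: {stranded!r}")
    if read_origin not in ("", "F", "R"):
        raise ValueError(f"unknown read origin: {read_origin!r}")
    # Assumption: a read origin is only meaningful for stranded libraries.
    if stranded == "S" and not read_origin:
        raise ValueError("stranded libraries need a read origin (F or R)")
    if stranded == "U" and read_origin:
        raise ValueError("unstranded libraries take no read origin")
    return orientation + stranded + read_origin


print(build_library_type("I", "S", "R"))  # paired-end, inward, stranded, reverse
print(build_library_type("I", "U"))       # paired-end, inward, unstranded
print(build_library_type("", "S", "F"))   # single-end, stranded, forward
```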

‣

Outputs

‣

Merged Sorted BAM Files per Sample

bams/SampleID.bam
bams/SampleID.bam.bai

‣

Salmon Output

featurects/SampleID.salmon.tar.gz

‣

StringTie Output

featurects/SampleID_stringtie
featurects/SampleID.fpkm.txt

‣

Gene/Transcript Abundances

featurects/SampleID.cts.txt
featurects/SampleID.cts.txt.summary
countTable.fpkm.txt
countTable.logCPM.txt
countTable.stats.txt
countTable.txt

‣

BigWig Files

featurects/SampleID.unique.bw
featurects/SampleID.all.bw

‣

MultiQC Raw Tables

multiqc_data/multiqc.log
multiqc_data/multiqc_data.json
multiqc_data/multiqc_fastqc.txt
multiqc_data/multiqc_featureCounts.txt
multiqc_data/multiqc_general_stats.txt
multiqc_data/multiqc_samtools_flagstat.txt
multiqc_data/multiqc_samtools_stats.txt
multiqc_data/multiqc_sources.txt

‣

Raw BAM QC Tables

SampleID/SampleID.alnstat.txt
SampleID/SampleID.flagstat.txt
SampleID/SampleID_fastqc.html
SampleID/SampleID_fastqc.zip

‣

MultiQC HTML

multiqc_report.html

‣

Differential Gene Abundance Analysis

Group1_Group2.edgeR.txt
Group1_Group2.gene2path.txt
Group1_Group2.stringDB.txt

‣

Sample Comparison

countTable.mds.txt
countTable.pca.txt
countTable.pcapercvar.txt
countTable.sampleDists.txt

‣

Differential Transcript Abundance Analysis

countTable.dexseq.txt
gene.trxstats.txt
splicingEnrichment.txt
splicingIsoformUsage.txt
splicingResults.html
splicingSummary.txt

‣

Workflow Walkthrough

Navigate to the Expression Analysis in RNASeq launcher card. You can use the search bar at the top right corner, or use the Google DeepOmics, Precision Medicine, Functional Genomics, or Next Generation Sequencing tags to find the workflow card.

Select the version from the dropdown box in the top right corner. When ready to begin analysis, click “Run Workflow”.

This workflow currently supports two functions: Full Analysis and Recalculate Statistics. Both functions include the option to specify whether the data include Human Cancer Samples. Checking this option adds gene fusion predictions to the analysis to show alterations causing gene fusion events.

Let’s look at the Recalculate Statistics option, which repeats the statistical analysis performed in previous workflow runs. Select this function from the dropdown box. Also provide the directory containing the files to be analyzed as well as a file relating RunIDs, SampleIDs, and sample attributes such as SampleGroup (this table can be created within the workflow itself).

Select a reference genome and annotation version for the workflow run.

Name the workflow run, then take a minute to review workflow settings and parameters. When you’re satisfied, click “Run Workflow” at the bottom-left corner.

‣

Results Walkthrough

To view results for your Expression Analysis in RNASeq workflow, first find your workflow run from the Activity tab of the platform. You can use the search bar to search for it. Select your workflow run for more information.

After selecting your workflow run, click Open Analysis in the upper right-hand corner to open the RNASeq Analysis Portal in the RNASeq Dashboard (opens as a separate tab) to view an interactive summary of your data. You may also navigate to the Files tab to view and download analysis outputs in the output folder. These folders are also available in the File Explorer.

Use the RNASeq Analysis Portal to view your data analysis. Navigate the tabs across the top or use the links in the Introduction tab.

‣

Citations

Krueger, F., James, F., Ewels, P., Afyounian, E. & Schuster-Boeckler, B. FelixKrueger/TrimGalore: V0.6.7 - DOI via Zenodo. (2021) doi:10.5281/ZENODO.5127899.
Dobin, A. et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics (Oxford, England) 29, 15–21 (2013).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37, 907–915 (2019).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Liao, Y., Smyth, G. K. & Shi, W. FeatureCounts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biology 20, (2019).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods 14, 417–419 (2017).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics (Oxford, England) 26, 139–140 (2010).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15, 550 (2014).
Vitting-Seerup, K. & Sandelin, A. IsoformSwitchAnalyzeR: Analysis of changes in genome-wide patterns of alternative splicing and its functional consequences. Bioinformatics 35, 4469–4471 (2019).
Haas, B. et al. STAR-Fusion: Fast and Accurate Fusion Transcript Detection from RNA-Seq. bioRxiv 120295 (2017) doi:10.1101/120295.
Feng, Y.-Y. et al. RegTools: Integrated analysis of genomic and transcriptomic data for discovery of splicing variants in cancer. bioRxiv 436634 (2018) doi:10.1101/436634.

Built with

Genomic Variant Analysis

Identify single-nucleotide variants (SNVs), indels, and structural variants in diploid genome resequencing projects by comparison to a reference genome.

Version 1.6.1

Use Cases

Determine variants in DNA samples compared to a reference genome including single nucleotide variants (SNVs), insertions, deletions and structural variants

Germline Variant Calling
Variant Calling in Ancient DNA
Somatic Mutation Detection

Determine variants in DNA samples compared to a custom reference genome for small or synthetic genomes

Plasmid
Virus
Bacteria
Synthetic Genome

Supported sequencing platforms include Illumina, PacBio, and Oxford Nanopore (ONT).

Summary and Methods

This workflow is designed to help the user determine variants in DNA samples when compared to a reference genome. Currently, four input DNA data types are supported: Germline (Diploid), Ancient DNA, Small Genomes (Viral/Prokaryotic/Synthetic), and Somatic (Human Cancer). Workflows can be run with Parabricks, Sentieon, or native open-source tools (NOST). Click the toggles below to learn more about each supported data type.

‣

Germline (Diploid)

Summary

This workflow is designed to help the user determine germline variants in diploid DNA. The user will provide input FastQ files containing the diploid DNA to be analyzed, and will receive as output a summary of germline variants in the DNA compared to the chosen reference genome.

Methods

This analysis was performed using the Germline Variant Analysis workflow on the Form Bio platform. When the data type is “Germline (Diploid)”, this workflow determines genetic variants, including SNVs, insertions, and deletions, in high-quality NGS data compared to a reference genome. Reads are trimmed using TrimGalore [1] or FastP [2] to remove low-quality (qual < 25) ends of reads and to discard reads shorter than 35 bp. These default values can be changed by the user. This workflow can be run with native open-source tools (NOST), Sentieon, or Parabricks.

With NOST and Sentieon, trimmed reads are aligned to a reference genome using BWA-MEM [3], Minimap2 [4], or Winnowmap [5], depending on data type. Duplicate reads can optionally be marked using Picard MarkDuplicates [6]. BAMs from the same sample generated by multiple runs are merged using Samtools [7]. Alignment quality is assessed using FastQC [8], Samtools [7], Bedtools [9], and Qualimap [10]. Variants can be detected with joint calling using Freebayes [11], Samtools/Bcftools [12], DNAScope, and GATK4 [13].

With Parabricks, trimmed reads are aligned, duplicate reads are marked, and alignment quality is assessed using fq2bam. Quality metrics are summarized with MultiQC. Variants can be detected with GATK [13] and DeepVariant [14] to produce gVCF files. Genotyping of gVCF files is determined using GLnexus [15]. Variant effects are determined using SnpEff [16].
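To make the default trimming thresholds in the Methods above concrete (quality below 25, length below 35 bp), here is a deliberately simplified sketch of 3'-end quality trimming followed by a length filter. It is an illustration only: TrimGalore and FastP use their own trimming algorithms and also handle adapters, and the function name and example reads are hypothetical.

```python
def trim_read(sequence, qualities, min_quality=25, min_length=35):
    """Trim low-quality bases from the 3' end, then apply a length filter.

    Returns (sequence, qualities) for a kept read, or None if the trimmed
    read is shorter than min_length.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities differ in length")
    end = len(sequence)
    # Walk back from the 3' end while the base quality is below the cutoff.
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    if end < min_length:
        return None
    return sequence[:end], qualities[:end]


read = "ACGT" * 10  # hypothetical 40 bp read

# Three poor bases at the 3' end: trimmed to 37 bp and kept.
kept = trim_read(read, [30] * 37 + [10] * 3)
print(len(kept[0]))

# Ten poor bases at the 3' end: trimmed to 30 bp and discarded.
print(trim_read(read, [30] * 30 + [10] * 10))
```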

‣

Ancient DNA

Summary

This workflow is designed to help the user determine variants in ancient DNA. The user will provide input FastQ files containing the DNA to be analyzed, and will receive as output a summary of variants in the DNA compared to the chosen reference genome.

Methods

This analysis was performed using the Germline Variant Analysis workflow on the Form Bio platform. When the data type is “Ancient”, this workflow can be used to determine genetic variants in high-quality NGS data in your project compared to a supported reference genome. Reads are trimmed using AdapterRemoval [17], FastP [2], or TrimGalore [1] to remove low-quality (qual < 25) ends of reads and to discard reads shorter than 35 bp. These default values can be changed by the user. Contaminants are detected using Kraken [18] with a confidence of 0.8, using Kraken’s precompiled database or a custom database from which the human genome has been removed. Unclassified trimmed reads are aligned to a reference genome using BWA MEM [3] or BWA Aln (with a seed length of 16,500, a maximum edit distance of 0.01, and a maximum of 2 gap opens). BAMs from the same library generated by multiple runs are merged using Samtools [7]. Duplicate reads from the same library can optionally be marked using PaleoMIX [19] or Picard MarkDuplicates [6]. BAMs from the same sample generated by multiple libraries are merged using Samtools [7]. Base recalibration is done using mapDamage2 [20]. Alignment quality is assessed using Qualimap [10], DamageProfiler [21], and MultiQC [22]. Germline variants can be detected using Freebayes [11], Samtools/Bcftools [12], and GATK4 [13]. To increase the speed of analysis, the Parabricks (requires GPUs) or Sentieon optimized versions of these algorithms are used for BWA MEM and GATK4. Genotyping of gVCF files is determined using GLnexus [15]. Variant effects are determined using SnpEff [16].
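The BWA Aln settings quoted above correspond to standard `bwa aln` options: `-l` for seed length, `-n` for the edit distance setting, and `-o` for gap opens. The sketch below only assembles such a command line as a list. The file names and thread count are hypothetical, and the exact invocation used by the workflow is not documented here, so treat this as an assumption about how the parameters would be passed.

```python
import shlex


def bwa_aln_command(reference, fastq, seed_length=16500, max_edit=0.01,
                    max_gap_opens=2, threads=4):
    """Build a `bwa aln` command with the ancient-DNA settings quoted above."""
    return [
        "bwa", "aln",
        "-l", str(seed_length),    # seed length; longer than any read, so seeding is effectively off
        "-n", str(max_edit),       # edit distance setting
        "-o", str(max_gap_opens),  # maximum number of gap opens
        "-t", str(threads),
        reference,
        fastq,
    ]


# Hypothetical file names, for illustration only.
command = bwa_aln_command("reference.fa", "sample.trimmed.fastq.gz")
print(shlex.join(command))
```

`bwa aln` writes suffix-array coordinates, so a follow-up `bwa samse` or `bwa sampe` step is needed to produce SAM output.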

‣

Small Genomes (Viral/Prokaryotic/Synthetic)

Summary

This workflow is designed to help the user determine germline variants in small or synthetic genomes, with an option to provide a custom genome sequence. The user will provide input FastQ files containing the DNA to be analyzed, and will receive as output a summary of variants in the DNA compared to the chosen reference genome. Small genomes are assumed to be haploid.

Methods

This analysis was performed using the Germline Variant Analysis workflow on the Form Bio platform. When the data type is “Small Genome”, this workflow can be used to determine genetic variants in high-quality NGS data in your project compared to a supported reference genome. Reads are trimmed using TrimGalore [1]. Trimmed reads are aligned to a reference genome using BWA MEM [3]. Duplicate reads are marked using Picard MarkDuplicates [6]. Germline variants can be detected using Samtools/Bcftools [12]. To increase the speed of analysis, the Parabricks or Sentieon optimized versions of these algorithms can be used for BWA MEM. Variant effects are determined using SnpEff [16] if the genome is provided by the platform. For SARS-CoV-2, genome sequences of samples are determined using BCFtools [12] and lineage classification is determined using PANGOLIN [23].

‣

Somatic (Human Cancer)

Summary

This workflow is designed to help the user determine variants in somatic DNA. The user will provide input FastQ files containing the DNA to be analyzed, and will receive as output a summary of variants in the DNA compared to the chosen reference genome.

Methods

When the data type is “Somatic”, this workflow can be used to determine genetic variants in tumor NGS data compared to a supported reference genome. When a normal sample is provided, specialized somatic variant calling methods are applied, allowing users to filter germline variants from the resulting VCF files. Reads are trimmed using TrimGalore [1] or FastP [2] to remove low-quality (qual < 25) ends of reads and to discard reads shorter than 35 bp. These default values can be changed by the user. Trimmed reads are aligned to a reference genome using BWA MEM [3], Minimap2 [4], or Winnowmap [5], depending on data type. Duplicate reads are marked using Picard MarkDuplicates [6], per library if library information is provided. BAMs from the same sample generated by multiple runs are merged using Samtools [7]. Alignment quality is assessed using FastQC [8], Samtools [7], Bedtools [9], and Qualimap [10]. Quality reports are produced by MultiQC. Variant effects are determined using SnpEff [16]. Somatic variants can be detected in somatic or tumor-only mode using Strelka2 [24], Freebayes [11], DeepSomatic, Mutect2 [25], and TNScope. To increase the speed of analysis, the Parabricks or Sentieon optimized versions of these algorithms are used for BWA MEM and GATK4. When a matched normal sample is present, tumor/normal germline SNP matching is confirmed using NGSCheckMate [26] and microsatellite instability is assessed using MSI-Sensor [27].

‣

Inputs

Run Name: This is a unique name for each run of pipelines in your project
Organism: Reference Genome used for alignment
Reference Genome Annotation: Annotation that should be used for determining gene and transcript counts.
Input Folder: This is the folder that contains all of the fastq files that will be used in this analysis
Sample Description File

This file matches the sequence files to samples. Sequence data from multiple runs are merged if they share the same SampleID.
The RunID should be part of the FASTQ file names.
SampleGroup is necessary for statistical analysis. There must be at least 2 samples per group.
File Format

RunID	SampleID
SRR994739	SAMEA9454349
SRR994740	SAMEA9454349
SRR994741	SAMEA9454341
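Since the RunID is expected to appear in each FASTQ file name, the mapping from files to samples can be sketched as a simple substring match. This is a hypothetical illustration of that convention, not the platform's matching logic. The file names are invented, and files that match zero or several RunIDs are reported rather than guessed.

```python
from collections import defaultdict


def group_fastqs_by_sample(fastq_names, run_to_sample):
    """Assign FASTQ files to samples via the RunID embedded in each file name."""
    files_by_sample = defaultdict(list)
    unmatched = []
    for name in sorted(fastq_names):
        matches = [run for run in run_to_sample if run in name]
        if len(matches) != 1:
            # No RunID, or an ambiguous match: leave it for manual review.
            unmatched.append(name)
            continue
        files_by_sample[run_to_sample[matches[0]]].append(name)
    return dict(files_by_sample), unmatched


run_to_sample = {
    "SRR994739": "SAMEA9454349",
    "SRR994740": "SAMEA9454349",
    "SRR994741": "SAMEA9454341",
}
# Hypothetical file names in the input folder.
fastqs = [
    "SRR994739_R1.fastq.gz",
    "SRR994740_R1.fastq.gz",
    "SRR994741_R1.fastq.gz",
    "notes_R1.fastq.gz",
]
by_sample, unmatched = group_fastqs_by_sample(fastqs, run_to_sample)
print(by_sample)
print(unmatched)
```

Files from SRR994739 and SRR994740 land under the same SampleID, which is the set of runs that would be merged.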

Capture Bedfile

The intervals in the capture BED file indicate regions where alignments are expected, based on the target capture kit.
Make sure that no column names (header row) are present in the file.
The fourth column can indicate a region name and is used to identify poorly captured regions.

SeqName	Start	End	Name
chr1	1787293	1787413	GNB1:GNB1_chr1:1718769-1718876:chr1:1718769-1718876_1
chr1	1787353	1787473	GNB1:GNB1_chr1:1718769-1718876:chr1:1718769-1718876_2
chr1	1789040	1789160	GNB1:GNB1_chr1:1720491-1720708:chr1:1720491-1720708_1
chr1	1789160	1789280	GNB1:GNB1_chr1:1720491-1720708:chr1:1720491-1720708_2
chr1	1790375	1790495	GNB1:GNB1_chr1:1721833-1722035:chr1:1721833-1722035_1
chr1	1790495	1790615	GNB1:GNB1_chr1:1721833-1722035:chr1:1721833-1722035_2
chr1	1793187	1793307	GNB1:GNB1_chr1:1724683-1724750:chr1:1724683-1724750_1
chr1	1793247	1793367	GNB1:GNB1_chr1:1724683-1724750:chr1:1724683-1724750_2
chr1	1804380	1804500	GNB1:GNB1_chr1:1735857-1736020:chr1:1735857-1736020_1
chr1	1804500	1804620	GNB1:GNB1_chr1:1735857-1736020:chr1:1735857-1736020_2
chr1	1806416	1806536	GNB1:GNB1_chr1:1737913-1737977:chr1:1737913-1737977_1
chr1	1806476	1806596	GNB1:GNB1_chr1:1737913-1737977:chr1:1737913-1737977_2
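A capture BED file can be sanity-checked against the requirements above before upload. The sketch below is a hypothetical helper, not a platform feature: it flags rows with fewer than three tab-separated fields, non-numeric coordinates (which usually indicates a leftover header row), and intervals whose start is not less than their end.

```python
def validate_capture_bed(lines):
    """Return a list of human-readable problems found in BED lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append(f"line {number}: expected at least 3 tab-separated fields")
            continue
        start, end = fields[1], fields[2]
        if not (start.isdigit() and end.isdigit()):
            problems.append(f"line {number}: start/end are not integers (header row?)")
            continue
        if int(start) >= int(end):
            problems.append(f"line {number}: start must be less than end")
    return problems


good = [
    "chr1\t1787293\t1787413\tregion_1\n",
    "chr1\t1787353\t1787473\tregion_2\n",
]
with_header = ["SeqName\tStart\tEnd\tName\n"] + good

print(validate_capture_bed(good))
print(validate_capture_bed(with_header))
```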

Advanced Parameters

Algorithms

‣

Trim Reads

true
false

‣

Alignment Algorithm

‣

Mark Duplicates Algorithm

Sequence Data Type

Option	Meaning
sr	Short single-end reads without splicing (-k21 -w11 --sr --frag=yes -A2 -B8 -O12,32 -E2,1 -r100 -p.5 -N20 -f1000,5000 -n2 -m20 -s40 -g100 -2K50m --heap-sort=yes --secondary=no). This is the default mode.
map-ont	Align noisy long reads of ~10% error rate to a reference genome.
map-hifi	Align PacBio high-fidelity (HiFi) reads to a reference genome (-k19 -w19 -U50,500 -g10k -A1 -B4 -O6,26 -E2,1 -s200).
map-pb	Align older PacBio continuous long (CLR) reads to a reference genome (-Hk19).
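The options in this table are Minimap2 presets, which are selected on the command line with `-x`. As a minimal sketch, the following builds a Minimap2 command for a chosen preset, with `-a` requesting SAM output. The file names and thread count are hypothetical, and the workflow's actual invocation may include further options.

```python
import shlex

# Presets listed in the table above.
PRESETS = {"sr", "map-ont", "map-hifi", "map-pb"}


def minimap2_command(preset, reference, reads, threads=4):
    """Build a minimap2 command for one of the documented presets."""
    if preset not in PRESETS:
        raise ValueError(f"unsupported preset: {preset!r}")
    return ["minimap2", "-a", "-x", preset, "-t", str(threads), reference, *reads]


# Hypothetical inputs: Oxford Nanopore reads against a reference FASTA.
command = minimap2_command("map-ont", "reference.fa", ["sample.fastq.gz"])
print(shlex.join(command))
```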

‣

Outputs

‣

Merged Sorted BAM Files per Sample

bams/SampleID.bam
bams/SampleID.bam.bai

‣

Variants (genomevariants)

DELLY VCF
Freebayes
Mutect2
Strelka2
SVABA
Union VCF
Filtered VCF
MAF

‣

MultiQC HTML

multiqc_data/multiqc.log
multiqc_data/multiqc_data.json
multiqc_data/multiqc_fastqc.txt
multiqc_data/multiqc_general_stats.txt
multiqc_data/multiqc_picard_dups.txt
multiqc_data/multiqc_samtools_flagstat.txt
multiqc_data/multiqc_samtools_stats.txt
multiqc_data/multiqc_sources.txt
multiqc_report.html

‣

NGS Checkmate

profiling/SamplePair_all.txt
profiling/SamplePair_matched.txt

‣

MSI

profiling/SamplePair.msi.txt

‣

Sample QC

SampleID/SampleID.alnstats.txt
SampleID/SampleID.covhist.txt
SampleID/SampleID.flagstat.txt
SampleID/SampleID.genomecov.txt
SampleID/SampleID.libcomplex.txt
SampleID/SampleID_exoncoverage.txt
SampleID/SampleID_lowcoverage.txt
SampleID/SampleID_fastqc.html
SampleID/SampleID_fastqc.zip

‣

Workflow Walkthrough

Navigate to the Genomic Variant Analysis workflow launcher on the Form Bio platform. You can locate the workflow using the search bar at the top right corner, or by using the Google DeepOmics, Functional Genomics, Precision Medicine, or Next-Generation Sequencing filters on the left-hand side.
Select the version from the dropdown versioning menu in the top right corner. On this page, you can find information about the workflow analysis. When ready to begin, click Run Workflow.

Select the type of input data to be analyzed. Currently, four types are supported: Germline, Ancient DNA, Somatic, and Viral/Prokaryotic/Synthetic Genome. Also provide the platform that was used to collect the data. Select the type of analysis to be run. Finally, provide the directory containing the files to be analyzed as well as a file relating RunIDs, LibraryIDs, and SampleIDs (this table can be created within the workflow itself).

Select a reference genome to which the input data will be compared. You may also optionally upload a BED file detailing genomic regions of note.

Tune additional parameters related to your workflow run. These parameters may change depending on your input data.

Give your workflow run a unique name, and review the input data and run parameters. When ready to submit, click “Run Workflow”.

‣

Results Walkthrough

To view results for your Genomic Variant Analysis workflow, first find your workflow run from the Activity tab of the platform. You can use the search bar to search for it. Select your workflow run for more information.

On the Results tab, you can see a preview summary of the analysis.

Under the All Files tab, you can view the final HTML file, which is nested in the output folder. You may view or download this file. This file can also be found in the File Explorer.

‣

Citations

Krueger, F., James, F., Ewels, P., Afyounian, E. & Schuster-Boeckler, B. FelixKrueger/TrimGalore: V0.6.7 - DOI via Zenodo. (2021) doi:10.5281/ZENODO.5127899.
Chen, S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2, e107 (2023).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 [q-bio.GN] (2013).
Li, H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Jain, C., Rhie, A., Hansen, N. F., Koren, S. & Phillippy, A. M. Long-read mapping to repetitive reference sequences using Winnowmap2. Nature Methods 19, 705–710 (2022).
Thomer, A. K., Twidale, M. B., Guo, J. & Yoder, M. J. Picard Tools. in Conference on Human Factors in Computing Systems - Proceedings (2016).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Andrews, S. et al. FastQC. (2012).
Quinlan, A. R. & Hall, I. M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Okonechnikov, K., Conesa, A. & García-Alcalde, F. Qualimap 2: Advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32, 292–294 (2016).
Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. (2012).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics 43, 491–498 (2011).
Yun, T. et al. Accurate, scalable cohort variant calls using DeepVariant and GLnexus. (2020) doi:10.1101/2020.02.10.942086.
Lin, M. F. et al. GLnexus: Joint variant calling for large cohort sequencing. (2018) doi:10.1101/343970.
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly 6, 80–92 (2012).
Schubert, M., Lindgreen, S. & Orlando, L. AdapterRemoval v2: Rapid adapter trimming, identification, and read merging. BMC Research Notes 9, 88 (2016).
Wood, D. E. & Salzberg, S. L. Kraken: Ultrafast metagenomic sequence classification using exact alignments. Genome Biology 15, R46 (2014).
Schubert, M. et al. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nature Protocols 9, 1056–1082 (2014).
Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. F. & Orlando, L. mapDamage2.0: Fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684 (2013).
Neukamm, J., Peltzer, A. & Nieselt, K. DamageProfiler: Fast damage pattern calculation for ancient DNA. Bioinformatics 37, 3652–3653 (2021).
Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature Microbiology 5, 1403–1407 (2020).
Kim, S. et al. Strelka2: Fast and accurate calling of germline and somatic variants. Nature Methods 15, 591–594 (2018).
Benjamin, D. et al. Calling Somatic SNVs and Indels with Mutect2. (2019) doi:10.1101/861054.
Lee, S. et al. NGSCheckMate: Software for validating sample identity in Next-generation sequencing studies within and across data types. Nucleic Acids Research 45, e103 (2017).
Jia, P. et al. MSIsensor-pro: Fast, Accurate, and Matched-normal-sample-free Detection of Microsatellite Instability. Genomics, Proteomics and Bioinformatics 18, 65–71 (2020).
Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nature Biotechnology 37, 224–226 (2019).

Built with

Human Haplotype

Determine the haplotype of certain human genes, including HLA, RBG, and Codis.

Version 0.0.2

Summary

Sequence reads are aligned and their haplotype is predicted using HISAT-genotype [1].

‣

Workflow Walkthrough

Navigate to the Human Haplotype workflow launcher. You can use the search bar at the top right to find this workflow, or you can use the Precision Medicine or Next-Generation Sequencing filters on the left-hand side.
Select the version from the dropdown versioning menu. You can view information about the use-cases and workflow analysis here. When ready to begin, click “Run Workflow”.

Provide a directory that contains your FastA files for analysis. Then, select the gene you wish to determine the haplotype for. Currently, HLA, RBG, and Codis are supported.

Give your workflow run a unique name, then review input data and parameters. When ready to submit, click “Run Workflow”.

‣

Citations

Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37, 907–915 (2019) doi:10.1038/s41587-019-0201-4.