Genomics

16S Sequencing

Perform taxonomic classification on 16S sequencing data.

Workflow Walkthrough

  1. Navigate to the 16S Sequencing Analysis launcher card. You can also find it using the search bar at the top right or the “Genomics” tag on the left-hand side.
  2. Select the version from the dropdown menu on the top right side.
  3. Click “Run Workflow” in the top right corner.
  4. Complete the launcher tabs:
    1. Select the algorithm (MOTHUR or QIIME) and the file directory containing the input reads.
    2. Provide a sample attribute file relating run IDs to sample IDs.
    3. Name the workflow run and review the workflow parameters. When you are satisfied, click “Run Workflow” at the bottom right corner.
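The exact layout of the sample attribute file is not specified here. As a sketch only, assuming a tab-separated file with one row per sequencing run, the snippet below writes such a table. The column names `run_id` and `sample_id`, the helper `write_sample_attributes`, and the example IDs are all hypothetical; check the launcher for the headers it actually expects.

```python
import csv
import io


def write_sample_attributes(run_to_sample, handle):
    """Write a run-ID -> sample-ID table as tab-separated text.

    The column names "run_id" and "sample_id" are hypothetical placeholders;
    check the launcher for the header names and delimiter it actually expects.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["run_id", "sample_id"])
    for run_id, sample_id in sorted(run_to_sample.items()):
        writer.writerow([run_id, sample_id])


# Hypothetical run and sample IDs, for illustration only.
buffer = io.StringIO()
write_sample_attributes({"run_02": "soil_B", "run_01": "soil_A"}, buffer)
print(buffer.getvalue(), end="")
```

Sorting by run ID simply makes the file deterministic; row order is not known to matter to the workflow.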

Genome Assembly

This workflow can take short-read Illumina or long-read ONT or PacBio data, along with OMNI-C or Hi-C reads for scaffolding, to create genome assemblies. It is enhanced with Google DeepOmics tools such as DeepConsensus and DeepPolisher.

Version 1.0.5

Use Cases

  • Assemble genomes from short and/or long-read sequencing files

Summary and Methods

This workflow has been designed to create draft genome assemblies from short- and/or long-read FastQ sequencing files. It can also polish, purge, filter, and evaluate these assemblies as they are created. If supplied with Fast5/Pod5 files, the workflow can perform ONT basecalling before assembly. The user provides short- and/or long-read FastQ files as input and receives a draft genome assembly as output. The sections below describe how this workflow processes short reads and long or mixed reads.

Short Read

Summary

This workflow is designed to help create draft genome assemblies from short-read sequencing data, such as Illumina or WGS (whole genome shotgun) data. The user provides a directory of sequencing files as input, optionally with OMNI-C and/or Hi-C data for polishing, and receives a draft genome assembly as output.

Methods

This analysis was performed using the Genome Assembly workflow on the Form Bio platform. If paired-end reads are not already interleaved, they are first interleaved with BBMap [1] and combined into a single file. The consolidated reads are then assembled with SPAdes [2], metaSPAdes [3], MEGAHIT [4], and/or SKESA [5]. Finally, the resulting assemblies are evaluated with QUAST [6], BUSCO [7], and/or Merqury [8].
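Interleaving places the two mates of each read pair next to each other in a single FastQ file (R1 read 1, R2 read 1, R1 read 2, and so on). The workflow does this with BBMap; the minimal sketch below only illustrates the resulting record order, assuming plain 4-line FastQ records held in memory. The function names are hypothetical and this is not the workflow's own code.

```python
from itertools import islice, zip_longest


def fastq_records(lines):
    """Group FastQ lines into 4-line records (header, sequence, '+', quality)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FastQ record")
        yield record


def interleave(r1_lines, r2_lines):
    """Return R1[0], R2[0], R1[1], R2[1], ... as one flat list of FastQ lines."""
    out = []
    for rec1, rec2 in zip_longest(fastq_records(r1_lines), fastq_records(r2_lines)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        out.extend(rec1)  # mate 1 of the pair
        out.extend(rec2)  # mate 2 immediately follows
    return out


# Hypothetical two-pair example.
r1 = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "GGCC", "+", "IIII"]
r2 = ["@read1/2", "TTAA", "+", "IIII", "@read2/2", "CCGG", "+", "IIII"]
print("\n".join(interleave(r1, r2)))
```

Raising an error on unequal read counts reflects that a mate without a partner cannot be placed in an interleaved file.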

Long Read or Mixed Read

Summary

This workflow is designed to help the user create draft genome assemblies from long-read sequencing data, such as PacBio or ONT data. The user provides a directory of sequencing files as input, optionally with OMNI-C and/or Hi-C data for polishing, and receives a draft genome assembly as output.

Methods

This analysis was performed using the Genome Assembly workflow on the Form Bio platform. The workflow first performs basecalling, if requested. When supplied with Fast5/Pod5 files, it performs ONT basecalling with either Dorado or Bonito. Alternatively, if the input is PacBio subread uBAMs or the sequencer run folder, consensus reads are generated with circular consensus sequencing (CCS) [9], and DeepConsensus [10] can optionally be used to improve the basecalls. If Bonito is run, modbam2bed can optionally be run to create a methylation BED file. A draft assembly is then created from the provided FastQ files or ONT basecalls with Flye [11], Shasta [12], Verkko [13], and/or hifiasm [14]. The draft assemblies are polished by the selected polishers, which run in the following order: racon [15], medaka consensus, pilon [16], juicer [17] + 3ddna [18], TGS Gap Closer [19], DeepPolisher [20], Ragtag Scaffold [21], Ragtag Patch [21], and Gapless [22]. After polishing, purging can optionally be run with the Purge Dups workflow using minimap2 [23], and small contigs can be filtered out with Seqtk. Between all steps, the assemblies are evaluated with QUAST [6], BUSCO [24], and/or Merqury [8].
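Because the polishers always run in the fixed order listed above, the order in which you select them presumably does not matter. The sketch below expresses that ordering as a simple lookup; `polishing_plan` is a hypothetical helper for illustration, not a platform function, and the step names are copied from the text.

```python
# Fixed polishing order as documented above (names as written in the text).
POLISHER_ORDER = [
    "racon", "medaka consensus", "pilon", "juicer + 3ddna", "TGS Gap Closer",
    "DeepPolisher", "Ragtag Scaffold", "Ragtag Patch", "Gapless",
]


def polishing_plan(selected):
    """Return the selected polishers in the order the workflow runs them."""
    unknown = set(selected) - set(POLISHER_ORDER)
    if unknown:
        raise ValueError(f"unknown polisher(s): {sorted(unknown)}")
    return [step for step in POLISHER_ORDER if step in selected]


# Selection order is irrelevant; the documented order wins.
print(polishing_plan({"Gapless", "racon", "DeepPolisher"}))
```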

Inputs

  • Folder containing Fast5/Pod5 ONT reads (optional)
  • Folder containing PacBio reads (optional)
  • FastQ file of ONT reads (optional)
  • FastQ file of PacBio HiFi reads (optional)
  • FastQ files of forward and reverse Hi-C or OMNI-C reads (optional)
  • Folder of WGS Reads (optional)
  • FastA file of an assembly of a related organism, preferably from the same family (optional)

Mandatory Inputs - Sequencing data files

Parameters:

  • Type of sequencing data input (PacBio, ONT, etc.)
  • Basecalling algorithm
  • Assembly algorithm(s)
  • Polishing algorithm(s)
  • Evaluation algorithm(s)

Outputs

  • ONT FastQ Basecall file (only if basecalling is selected)
  • Draft assembly in FastA format
  • Polished assembly in FastA format
  • Optional assembly evaluation step outputs
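QUAST's contiguity report includes N50: the contig length L such that contigs of length L or longer make up at least half of the total assembly length. As a rough cross-check on a draft or polished FastA, the sketch below computes N50 from contig lengths. The helper names are hypothetical, and QUAST applies a minimum contig length by default, so its reported value may differ from this unfiltered calculation.

```python
def fasta_lengths(lines):
    """Return the length of each sequence in FastA-formatted lines."""
    lengths, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may span several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(contig_lengths):
    """Smallest length among the largest contigs that together cover >= 50% of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 10 + 9 + 8 = 27 of 54 bases
```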

Workflow Walkthrough

  1. Navigate to the Genome Assembly launcher card. You can use the search bar in the top right corner, or the Google DeepOmics or Genomics tags, to find the workflow card.
  2. Select the version from the dropdown box in the top right corner. When ready to begin analysis, click “Run Workflow”.
  3. Select which sequencing technology was used to generate the input data. PacBio and ONT are long-read technologies; they can be combined with each other as well as with OMNI-C and Hi-C reads for the polishing steps. Illumina is a short-read technology. WGS (whole genome shotgun sequencing) takes a directory of short-read FastQ files as input and creates a single assembly. Decide whether to run basecalling. Finally, provide the sequencing data file(s) to be analyzed (or a directory, in the case of WGS).
  4. Parameters and other fields on this tab differ depending on the input type; an ONT assembly input, for example, is set up differently from a WGS input.
  5. On the next tab, select the desired assembly algorithms. Some default options are already selected.
  6. For long-read data, select the desired polishing options and post-assembly algorithms on the next tab. Some default options are already selected.
  7. On the next tab, choose which evaluation algorithms to run.
    • When using BUSCO, you’ll be asked to choose a lineage. If you’re unsure, select “auto-lineage”, which attempts to determine the lineage of the input but results in a longer runtime.
    • If running Merqury, you’ll be asked for an expected genome size in base pairs.
  8. Finally, give your workflow run a unique name and review the input data and run parameters. When ready to submit, click “Run Workflow”.
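Merqury's documentation recommends choosing the k-mer size from the genome size G and a tolerable k-mer collision rate p (0.001 by default in its `best_k.sh` helper) via k = log4(G(1 - p)/p). The sketch below reproduces that arithmetic for a hypothetical 3.1 Gb genome. Whether the workflow derives k this way from the genome size you enter is an assumption; the helper name is hypothetical.

```python
import math


def merqury_best_k(genome_size_bp, collision_rate=0.001):
    """k = log4(G * (1 - p) / p), the k-mer size formula described for Merqury."""
    return math.log(genome_size_bp * (1 - collision_rate) / collision_rate, 4)


k = merqury_best_k(3.1e9)  # hypothetical 3.1 Gb genome
print(f"k = {k:.2f}; round up to {math.ceil(k)}")
```

Rounding up to the next integer keeps the expected collision rate at or below p.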

Results Walkthrough

  1. To view results for your Genome Assembly workflow, first find your workflow run in the Activity tab of the platform. You can use the search bar to search for it. Select your workflow run for more information.
  2. Upon selection, results from your workflow run are summarized in the Results tab. HTML output files can be previewed or opened from here.
  3. Under the All Files tab, you can view the final HTML files, which are nested in the output folder. You may view or download these files; they can also be found in the File Explorer.

Citations

  1. Bushnell, B. BBMap: A Fast, Accurate, Splice-Aware Aligner. (2014).
  2. Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A. & Korobeynikov, A. Using SPAdes De Novo Assembler. Current Protocols in Bioinformatics 70, e102 (2020).
  3. Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: A new versatile metagenomic assembler. Genome Research 27, 824–834 (2017).
  4. Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
  5. Souvorov, A., Agarwala, R. & Lipman, D. J. SKESA: Strategic k-mer extension for scrupulous assemblies. Genome Biology 19, 153 (2018).
  6. Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: Quality assessment tool for genome assemblies. Bioinformatics (Oxford, England) 29, 1072–1075 (2013).
  7. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular Biology and Evolution 38, 4647–4654 (2021).
  8. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: Reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 245 (2020).
  9. Travers, K. J., Chin, C.-S., Rank, D. R., Eid, J. S. & Turner, S. W. A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Research 38, e159 (2010).
  10. Baid, G. et al. DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer. Nature Biotechnology 41, 232–238 (2023).
  11. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology 37, 540–546 (2019).
  12. Shafin, K. et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nature Biotechnology 38, 1044–1053 (2020).
  13. Rautiainen, M. et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nature Biotechnology 41, 1474–1482 (2023).
  14. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175 (2021).
  15. Vaser, R., Sović, I., Nagarajan, N. & Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Research 27, 737–746 (2017).
  16. Walker, B. J. et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLOS ONE 9, e112963 (2014).
  17. Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems 3, 95–98 (2016).
  18. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science (New York, N.Y.) 356, 92–95 (2017).
  19. Xu, M. et al. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. GigaScience 9, giaa094 (2020).
  20. Google/deeppolisher. (2024).
  21. Alonge, M. et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biology 23, 258 (2022).
  22. Schmeing, S. & Robinson, M. D. Gapless provides combined scaffolding, gap filling, and assembly correction with long reads. Life Science Alliance 6, e202201471 (2023).
  23. Li, H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
  24. Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. (2021) doi:10.48550/arXiv.2106.11799.

Genome Coordinate Conversion

Convert the location of a set of genomic features, such as genes, transcription factor binding sites, or promoters, from one genome to another.

Version 1.0.1

Use Cases

  • Create a Genome Coordinate Conversion File between two genomes
  • Map the location of a set of genomic features (e.g. genes, transcription factor binding sites, promoters) from the target genome to the query genome
  • Filter the genomic features converted to the query genome for overlap with a second set of genomic features whose coordinates are already in the query genome

Summary

This workflow is designed to help the user convert the location of a set of genomic features, such as genes, transcription factor binding sites, or promoters, from one genome to another. The user provides as input a query genome to convert to and a target genome to convert from. If a Genome Coordinate Conversion File is not provided, one can be generated from the two input FastA files. The user receives as output the location of the genomic features on the query genome, plus the Genome Coordinate Conversion File if requested.

Methods

This analysis was performed using the Genome Coordinate Conversion workflow on the Form Bio platform. The Genome Coordinate Conversion File is a whole-genome alignment between the target genome and the query genome. Only features that lie in regions of homology between the two genomes are mapped. If the Genome Coordinate Conversion File does not yet exist, this workflow can generate one from a pair of genome FastA files using either LastZ [1] or SegAlign [2] (a GPU-optimized version of LastZ). CrossMap [3] then uses this chain file to convert the coordinates of genomic features from the target genome to the query genome. The input file of genomic features in the target (reference) genome can be in BED, VCF, BAM, or MAF format. CrossMap outputs a BED file with the locations of these features in the query genome. If a second BED file of genomic features in the query genome is provided, the workflow filters the converted features for overlap with it [4].
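To show why only features in homologous regions are mapped, the sketch below parses a single UCSC chain (a header line, then `size dt dq` lines, ending with a final `size` line) into ungapped aligned blocks and lifts one 0-based target position to the query. It is a simplification under stated assumptions: both strands are `+`, there is one chain, and positions in gaps are simply unmapped. CrossMap handles reverse strands, multiple chains, and whole intervals; the function names and example chain are hypothetical.

```python
def parse_chain(lines):
    """Parse one UCSC chain (header line + alignment lines) into ungapped blocks.

    Returns (t_name, q_name, blocks), where each block is
    (target_start, query_start, size) in 0-based coordinates.
    Simplification: only chains with '+' on both strands are handled.
    """
    header = lines[0].split()
    if header[0] != "chain" or len(header) < 12:
        raise ValueError("expected a 'chain' header line")
    t_name, t_strand, t_pos = header[2], header[4], int(header[5])
    q_name, q_strand, q_pos = header[7], header[9], int(header[10])
    if t_strand != "+" or q_strand != "+":
        raise NotImplementedError("this sketch handles '+'/'+' chains only")
    blocks = []
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue
        size = int(fields[0])
        blocks.append((t_pos, q_pos, size))
        if len(fields) == 3:  # gap before the next block: dt bases in target, dq in query
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return t_name, q_name, blocks


def lift_position(blocks, t_pos):
    """Map a 0-based target position to the query, or None if it is outside every block."""
    for t_start, q_start, size in blocks:
        if t_start <= t_pos < t_start + size:
            return q_start + (t_pos - t_start)
    return None


# Hypothetical chain: 50 aligned bases, a 10-base target / 20-base query gap, then 90 aligned bases.
chain = [
    "chain 1000 chrA 1000 + 100 250 chrB 900 + 0 160 1",
    "50 10 20",
    "90",
]
t_name, q_name, blocks = parse_chain(chain)
for pos in (120, 155, 200):  # 120 -> 20, 155 -> None (falls in the gap), 200 -> 110
    print(t_name, pos, "->", q_name, lift_position(blocks, pos))
```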

Inputs

  • Genome Coordinate Conversion File (in chain format; see UCSC's description at https://genome.ucsc.edu/goldenPath/help/chain.html) OR FastA of Target Genome and FastA of Query Genome.
  • File 1 of genomic features with coordinates in the Target Genome (optional)
    • Can be in VCF, MAF, BAM, or BED format
  • File 2 of Genomic Features with coordinates in the Query Genome (optional)
    • Must be in BED format
    • After conversion to the Query Genome, the genomic features from File 1 are filtered to those that intersect genomic features in File 2.
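The File 2 filter keeps converted features that intersect at least one File 2 feature, which is similar in spirit to `bedtools intersect -u`; whether the workflow uses that exact option is not stated. The sketch below implements the idea for BED-style half-open intervals `(chrom, start, end, ...)`. The function names and example features are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open BED intervals overlap if each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


def filter_by_overlap(features, filters):
    """Keep each feature (chrom, start, end, ...) overlapping any filter interval on the same chromosome."""
    by_chrom = {}
    for chrom, start, end, *_ in filters:
        by_chrom.setdefault(chrom, []).append((start, end))
    kept = []
    for feature in features:
        chrom, start, end = feature[:3]
        if any(overlaps(start, end, s, e) for s, e in by_chrom.get(chrom, [])):
            kept.append(feature)  # reported once, however many filters it hits
    return kept


converted = [("chrB", 10, 50, "geneX"), ("chrB", 100, 120, "geneY"), ("chrC", 0, 10, "geneZ")]
file2 = [("chrB", 40, 60), ("chrC", 10, 20)]
print(filter_by_overlap(converted, file2))  # geneZ only touches chrC:10, so it is not kept
```

A linear scan keeps the sketch short; for genome-scale files an interval tree or sorted sweep would be the usual choice.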

Mandatory Inputs:

  • Target genome FastA file
  • Query genome FastA file

Optional Inputs:

  • Genome Coordinate Conversion File: needed for converting genomic regions; if it is not provided, it is created from the FastA files

Required Parameters:

  • Analysis type (create a Genome Coordinate Conversion File, or convert genomic regions using an existing Genome Coordinate Conversion File)
  • Query genome type (provided in the project, or select one of our supported genomes)
  • Target genome type (provided in the project, or select one of our supported genomes)

Optional Parameters:

  • Alignment algorithm (the default is LastZ; the user must specify SegAlign to use it instead)
  • LastZ - Mode: default OR self (alignments between genomes of the same species) OR divergent (for species that diverged more than 150 million years ago, e.g. human vs. platypus)
  • LastZ - Linear gap: medium (for species that diverged less than 100 million years ago) OR loose (for more distantly related species)
  • SegAlign - Related: TRUE if the target and query genomes are closely related
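The LastZ options above follow divergence-time rules of thumb. The sketch below encodes only the thresholds stated in this list (self for the same species, divergent beyond 150 million years, medium linear gap below 100 million years, loose otherwise). `suggest_lastz_settings` is a hypothetical helper, not a workflow parameter, and your own divergence estimate is an input you must supply.

```python
def suggest_lastz_settings(divergence_mya, same_species=False):
    """Suggest (mode, linear_gap) from approximate divergence time in millions of years.

    Encodes only the rules of thumb listed in the parameter descriptions;
    it is not a function provided by the workflow.
    """
    if same_species:
        mode = "self"
    elif divergence_mya > 150:
        mode = "divergent"
    else:
        mode = "default"
    linear_gap = "medium" if divergence_mya < 100 else "loose"
    return mode, linear_gap


for mya in (0, 90, 120, 200):
    print(mya, suggest_lastz_settings(mya, same_species=(mya == 0)))
```

Between 100 and 150 million years the two rules disagree in spirit (default mode, loose gap); treat that range as a judgment call.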

Outputs

  • Genome Coordinate Conversion File (in chain format) between the Target Genome and the Query Genome ("liftover.chn")
  • BED file with location of genomic features in the Query Genome that lie in regions of homology in the Target Genome
  • BED file of genomic features in the Query Genome, filtered for overlap with Input File 2.

Workflow Walkthrough

  1. Navigate to the Genome Coordinate Conversion workflow on the Form Bio platform. This workflow can also be found using the search bar at the top right or by selecting either the “Genomics” or “Sequence Alignment” filter.
  2. On the launcher page, you can view use cases, a brief summary of the analysis, and information on the inputs and outputs of the workflow. When ready to begin, click “Run Workflow” in the top right corner.
  3. On the inputs tab, select the type of analysis you wish to perform: either creation of a Genome Coordinate Conversion File, or conversion of genomic regions using a preexisting file. Then, upload the query and target genomes as FastA files.
  4. Review the workflow parameters and file inputs. Give your workflow run a unique name. When ready to submit the job, click “Run Workflow” at the bottom right corner.
Results Walkthrough

  1. To view the results of your Genome Coordinate Conversion workflow, first find and select your workflow run from the Activity tab.
  2. Navigate to the Files tab. Under output, all workflow files will be listed. You may also view these files in the File Explorer on the left-hand side.

Citations

  1. Harris, R. Improved Pairwise Alignment of Genomic DNA. ProQuest (2007).
  2. Goenka, S. D., Turakhia, Y., Paten, B. & Horowitz, M. SegAlign: A scalable GPU-based whole genome aligner. in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 1–13 (IEEE Press, 2020).
  3. Zhao, H. et al. CrossMap: A versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014).
  4. Quinlan, A. R. & Hall, I. M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

FLAG: Eukaryote Genome Annotation

Annotate eukaryote genomes from an input FastA file.

Version 2.1.0

Use Cases

  • Annotate eukaryote genomes

Summary

Genome annotation uses computational algorithms to predict the locations of potential genes and tRNAs, a process known as structural annotation. Once these locations are found, the genes are functionally annotated by labeling them with commonly used gene names, such as KRT8 and KRAS.

The longest part of this process runs as several parallel steps, including RNA transcript-to-genome alignment, protein-to-genome alignment, and gene prediction. Once these steps are complete, the predicted genes and alignments are combined to form a consensus structural annotation. This structural annotation is then formatted consistently, following conventions similar to NCBI's, and functionally annotated with EnTAP.

Methods

This analysis was performed using the FLAG: Eukaryote Gene Annotation workflow on the Form Bio platform. First, if the input genome is unmasked, it is masked with WindowMasker [1], RepeatMasker [2], or RepeatModeler [3] in conjunction with RepeatMasker. Protein and transcript data are then aligned to the genome in parallel; additional protein or transcript data can also be retrieved from databases with BLAST. Depending on the predictors selected, gene prediction runs in parallel or in series with the protein and transcript alignments. After all alignments and gene predictions finish, they are combined and filtered to produce more complete consensus gene predictions and to remove unlikely predictions. The protein-coding annotations are also combined with tRNA annotations from tRNAscan. Once all annotations are filtered and combined, functional annotation (labeling genes with names such as KRAS or BRCA2) is performed with EnTAP. The structural and functional annotations are then merged into a single file in GTF format, similar to NCBI's. Finally, annotation statistics are calculated with AGAT [4] and BUSCO [5]. Further methodology can be found in the FLAG paper.
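Soft-masking conventionally marks repeats in lowercase while leaving the sequence itself intact. If you are unsure whether your genome is already masked, the sketch below reports the fraction of lowercase bases in a FastA. How the workflow itself decides that a genome is "unmasked" is not documented here, and `softmasked_fraction` is a hypothetical helper.

```python
def softmasked_fraction(fasta_lines):
    """Fraction of sequence characters that are lowercase (soft-masked) in a FastA."""
    total = masked = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip headers and blank lines
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return masked / total if total else 0.0


# Hypothetical toy sequence: half the bases are lowercase.
print(softmasked_fraction([">chr1", "ACGTacgt", "NNnn"]))
```

A value near zero for a large eukaryotic genome suggests it has not been soft-masked yet.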

Workflow Walkthrough

  1. Navigate to the FLAG: Eukaryote Gene Annotation workflow on the Form Bio platform. You can find the workflow using the search bar on the top right or by using the Genomics filter on the left-hand side.
  2. Take a moment to view information on inputs, outputs, and a workflow summary. Select the version from the dropdown menu. When ready to begin, click “Run Workflow”.
  3. Start by providing the scientific name of the species to annotate, formatted as the genus and species separated by an underscore (for example, drosophila_melanogaster). Then, upload FastA files for the genome, proteins, and RNA. Choose whether to softmask the input files.
  4. To find homologous sequences, choose whether to search for transcripts or proteins; generally, searching for transcripts alone is recommended. Then, select the database to search, such as RefSeq.
  5. Tune the parameters related to genome size and choose which gene prediction algorithms to use. You may also provide a genome assembly and annotation file from a related organism.
  6. Review the workflow inputs and parameters. Give your workflow run a unique name, then click “Run Workflow” to submit it for analysis.
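The species name is entered as genus and species joined by an underscore. The sketch below normalizes a name typed with spaces into that form. The lowercase output matches the drosophila_melanogaster example; whether capitalization actually matters to the workflow is not stated, and `format_species_name` is a hypothetical helper.

```python
import re


def format_species_name(name):
    """'Drosophila melanogaster' -> 'drosophila_melanogaster' (genus_species, lowercase)."""
    parts = name.replace("_", " ").strip().lower().split()
    if len(parts) < 2 or not all(re.fullmatch(r"[a-z]+", p) for p in parts):
        raise ValueError(f"expected 'Genus species', got {name!r}")
    return "_".join(parts)


print(format_species_name("Drosophila melanogaster"))
```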

Results Walkthrough

  1. To view the results of your FLAG run, first locate and select your workflow run from the Activity tab of the Form Bio platform.
  2. Select the Files tab and navigate to the nested outputs folder. All workflow analysis files are here. You can also view these files through the File Explorer.
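After downloading the final annotation, a quick sanity check is to count GTF records by feature type (column 3). This is a minimal sketch, not a substitute for the AGAT and BUSCO statistics the workflow already reports; the helper name and example records are hypothetical.

```python
from collections import Counter


def count_gtf_features(lines):
    """Count GTF records by feature type (column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        counts[fields[2]] += 1
    return counts


# Hypothetical records for one two-exon gene.
attrs = 'gene_id "g1"; transcript_id "t1";'
gtf = [
    "##description: hypothetical example",
    "\t".join(["chr1", "FLAG", "gene", "100", "900", ".", "+", ".", 'gene_id "g1";']),
    "\t".join(["chr1", "FLAG", "transcript", "100", "900", ".", "+", ".", attrs]),
    "\t".join(["chr1", "FLAG", "exon", "100", "300", ".", "+", ".", attrs]),
    "\t".join(["chr1", "FLAG", "exon", "500", "900", ".", "+", ".", attrs]),
]
print(dict(count_gtf_features(gtf)))
```

In practice you would pass an open file handle (`with open(path) as fh: count_gtf_features(fh)`).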

Citations

  1. Morgulis, A., Gertz, E. M., Schäffer, A. A. & Agarwala, R. WindowMasker: Window-based masker for sequenced genomes. Bioinformatics 22, 134–141 (2006).
  2. Hubley, R. Rmhubley/RepeatMasker. (2023).
  3. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences 117, 9451–9457 (2020).
  4. Dainat, J. et al. NBISweden/AGAT: AGAT-v1.1.0. (2023) doi:10.5281/zenodo.7950165.
  5. Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. (2021) doi:10.48550/arXiv.2106.11799.

Prokaryotic (Meta)Genome Analysis

This workflow analyzes prokaryotic genome sequences in three modes: Cultured Genome Assembly (Single Prokaryotic Genome) and Gene Finding and Annotation; 16S rRNA Taxonomic Profiling; and Whole Shotgun Metagenomic Analysis.

Version 1.0.2

Use Cases

  • Analyze 16S rRNA sequences for use in taxonomic profiling
  • Assemble single-cultured prokaryote genomes, and find and annotate genes within the genome
  • Assemble and annotate metagenomes and recover metagenome-assembled genomes (MAGs)

Summary and Methods

This workflow has been designed to help the user analyze prokaryotic genomes. There are currently 3 supported modes: Cultured Genome Assembly (Single Prokaryotic Genome) and Gene Finding and Annotation, 16S rRNA Taxonomic Profiling, and Whole Shotgun Metagenomic Analysis.

Cultured Genome Assembly

Summary

Cultured Genome Assembly involves growing a pure culture of the microbe in the lab, extracting its DNA, and sequencing it to generate a high-quality, complete genome assembly. This approach is particularly useful for microbes that can be easily cultured in the lab and it allows for the characterization of specific strains or isolates.

16S rRNA Taxonomic Profiling

Summary

16S rRNA Taxonomic Profiling is a quick and cost-effective method for characterizing microbial communities. It provides insights into the diversity and structure of microbial ecosystems by examining the 16S rRNA gene, a highly conserved bacterial and archaeal gene whose variable regions serve as a molecular marker for identifying and distinguishing species or genera.

Methods

This analysis was performed using the Prokaryote Genome Analysis workflow on the Form Bio platform. This workflow takes one or more FastQ files and runs 16S sequencing analysis to determine the composition of microbial communities. In addition to taxonomic classification, when running MOTHUR, the workflow removes common contaminants (such as human, dog, and cat sequences) and runs rarefaction, sample comparisons, beta diversity measurements, analysis of molecular variance (AMOVA), and homogeneity of molecular variance (HOMOVA) calculations. When running QIIME 2, the workflow denoises reads, performs operational taxonomic unit (OTU) clustering, builds a phylogenetic tree for diversity analysis, runs rarefaction and alpha and beta diversity analyses, performs differential abundance testing with ANCOM, and creates Emperor plots [18, 19].
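Rarefaction subsamples every sample to the same read depth so that diversity estimates are comparable, and alpha diversity summarizes the richness and evenness within each sample. MOTHUR and QIIME 2 implement these steps themselves; the sketch below only illustrates the idea, subsampling a hypothetical taxon-count table without replacement and computing Shannon diversity with the natural logarithm (tools differ in the log base they use). The function names are hypothetical.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a taxon -> read-count table to `depth` reads without replacement."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the sample's read count")
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    return Counter(rng.sample(pool, depth))


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


sample = {"Bacteroides": 30, "Prevotella": 10}  # hypothetical counts
rarefied = rarefy(sample, depth=20)
print(rarefied, round(shannon(rarefied), 3))
```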

Whole Shotgun Metagenomic Analysis

Summary

Whole Shotgun Metagenomic Analysis involves sequencing the DNA from an environmental sample, such as soil or water, without the need for culturing individual microbes. The resulting metagenomic dataset contains DNA sequences from all the organisms present in the sample, including those that are difficult or impossible to culture in the lab. The genomes of individual microbes can then be reconstructed from the metagenomic dataset using specialized software.

Methods

This analysis was performed using the Prokaryote Genome Analysis workflow on the Form Bio platform. This workflow takes one or more FastQ files, decontaminates the reads, assembles and annotates them, and finally creates MAGs from the assemblies before performing QC and MAG identification. This workflow is loosely based on the IMG/JGI assembly and MAG workflows [3, 4].
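MAG quality control usually rests on estimated completeness and contamination, such as the values reported by CheckM [2]. The thresholds this workflow applies are not stated here; as an assumed example, the sketch below uses the completeness and contamination cut-offs of the MIMAG reporting standard and omits MIMAG's additional rRNA and tRNA requirements for high-quality drafts. `mimag_tier` is a hypothetical helper.

```python
def mimag_tier(completeness, contamination):
    """Classify a bin from completeness and contamination percentages.

    Uses MIMAG-style cut-offs only; the extra rRNA/tRNA requirements for
    the high-quality tier are not checked in this sketch.
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


for bin_stats in [(96.2, 1.1), (72.0, 4.0), (95.0, 12.0)]:  # hypothetical bins
    print(bin_stats, mimag_tier(*bin_stats))
```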

Inputs

Single-end or paired-end FastQ file(s) in FastQ or FastQ.gz format. Can be ONT, PacBio, or Illumina.

Outputs

Cultured Single Genome or Metagenome

  • Assembly
  • Functional Annotation
  • Structural Annotation

16S rRNA

  • Taxonomic classification
  • Rarefaction
  • Alpha and beta diversity
  • OTU clustering
  • Emperor plots
  • AMOVA
  • HOMOVA
  • Differential Abundance Testing
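Beta diversity compares community composition between samples. Which beta-diversity metrics MOTHUR or QIIME 2 compute in this workflow is not listed here; Bray-Curtis dissimilarity, sketched below on raw counts, is one widely used example, ranging from 0 for identical samples to 1 for samples that share no taxa. The helper name and counts are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two taxon -> count dicts.

    Equivalent to sum(|a_i - b_i|) / sum(a_i + b_i), written as
    1 - 2 * sum(min(a_i, b_i)) / (total_a + total_b).
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1 - 2 * shared / total


gut = {"Bacteroides": 6, "Prevotella": 4}          # hypothetical sample 1
soil = {"Bacteroides": 2, "Streptomyces": 8}       # hypothetical sample 2
print(bray_curtis(gut, soil))
```

Tools often rarefy or convert to relative abundances first, which changes the values but not the formula.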

Metagenome

  • Bins
  • MAGs

Workflow Walkthrough

  1. Navigate to the ProkGenoRanger: Prokaryotic (Meta)Genome Analysis workflow on the Form Bio platform. You can use the “Genomics” filter on the left-hand side to help find this workflow.
  2. Select the version from the dropdown versioning menu in the top-right corner. You can view some information about the workflow inputs and parameters on this page. When ready to begin, click “Run Workflow”.
  3. Select the data type you wish to analyze. Currently, there are three supported functions: 16S rRNA analysis, Cultured Strain (Single-Genome Analysis), and Metagenome Assembly. Then, provide your input data and a sample attribute file. Additional inputs may be required depending on your input data type.
  4. Edit additional parameters based on your input data, including the analysis algorithm, polishing algorithm, and minimum read/contig lengths. These parameters may differ depending on your chosen data type.
  5. Give your workflow run a unique name, and then review the inputs and parameters associated with your analysis. When ready to submit the run, click “Run Workflow”.

Results Walkthrough

  1. To view the results of your ProkGenoRanger workflow, first find and select your workflow run from the Activity tab.
  2. Navigate to the Files tab, then the output folder. All files associated with your workflow run are here.

Citations

  1. Pierre-Alain Chaumeil, Aaron J Mussig, Philip Hugenholtz, Donovan H Parks, GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1925–1927, https://doi.org/10.1093/bioinformatics/btz848
  2. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25(7):1043‐1055. doi:10.1101/gr.186072.114
  3. Chen IA, Chu K, Palaniappan K, et al. IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes. Nucleic Acids Res. 2019;47(D1):D666‐D677. doi:10.1093/nar/gky901
  4. Clum, Alicia & Huntemann, Marcel & Bushnell, Brian & Foster, Brian & Foster, Bryce & Roux, Simon & Hajek, Patrick & Varghese, Neha & Mukherjee, Supratim & Reddy, T. & Daum, Chris & Yoshinaga, Yuko & O’Malley, Ronan & Seshadri, Rekha & Kyrpides, Nikos & Eloe-Fadrosh, Emiley & Chen, I-Min & Copeland, Alex & Ivanova, Natalia. (2021). DOE JGI metagenome workflow. mSystems. 6. 10.1128/mSystems.00804-20.
  5. Cantalapiedra, Carlos & Hernández-Plaza, Ana & Letunic, Ivica & Bork, Peer & Huerta-Cepas, Jaime. (2021). eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Molecular biology and evolution. 38. 10.1093/molbev/msab293.
  6. Vaser, Robert & Sovic, Ivan & Nagarajan, Niranjan & Sikic, Mile. (2017). Fast and accurate de novo genome assembly from long uncorrected reads. Genome Research. 27. gr.214270.116. 10.1101/gr.214270.116.
  7. Mikhail Kolmogorov, Derek M. Bickhart, Bahar Behsaz, Alexey Gurevich, Mikhail Rayko, Sung Bong Shin, Kristen Kuhn, Jeffrey Yuan, Evgeny Polevikov, Timothy P. L. Smith and Pavel A. Pevzner "metaFlye: scalable long-read metagenome assembly using repeat graphs", Nature Methods, 2020 doi:10.1038/s41592-020-00971-x
  8. BBMap – Bushnell B. – sourceforge.net/projects/bbmap/
  9. Li, Heng. (2018). Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics (Oxford, England). 34. 10.1093/bioinformatics/bty191.
  10. Chan, Patricia & Lin, Brian Y & Mak, Allysia J & Lowe, Todd. (2021). TRNAscan-SE 2.0: Improved detection and functional classification of transfer RNA genes. Nucleic Acids Research. 49. 10.1093/nar/gkab688.
  11. Bland, Charles & Ramsey, Teresa & Sabree, Fareedah & Lowe, Micheal & Brown, Kyndall & Kyrpides, Nikos & Philip, Hugenholtz. (2007). CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC bioinformatics. 8. 209. 10.1186/1471-2105-8-209.
  12. Cui, Xuefeng & Lu, Zhiwu & Wang, Sheng & Wang, Jim & Gao, Xin. (2016). CMsearch: Simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction. Bioinformatics. 32. i332-i340. 10.1093/bioinformatics/btw271.
  13. Hyatt, D., Chen, GL., LoCascio, P.F. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010). https://doi.org/10.1186/1471-2105-11-119
  14. Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, Wang Z. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 2019 Jul 26;7:e7359. doi: 10.7717/peerj.7359. PMID: 31388474; PMCID: PMC6662567.
  15. Yu-Wei Wu, Blake A. Simmons, Steven W. Singer, MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets, Bioinformatics, Volume 32, Issue 4, 15 February 2016, Pages 605–607, https://doi.org/10.1093/bioinformatics/btv638
  16. Sieber, C.M.K., Probst, A.J., Sharrar, A. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol 3, 836–843 (2018). https://doi.org/10.1038/s41564-018-0171-1
  17. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017 May;27(5):824-834. doi: 10.1101/gr.213959.116. Epub 2017 Mar 15. PMID: 28298430; PMCID: PMC5411777.
  18. Bolyen E, Rideout JR, Dillon MR, et al. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37: 852–857. https://doi.org/10.1038/s41587-019-0209-9
  19. Schloss PD et al. 2009. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology 75:7537–7541.
  20. BBMap - Bushnell B. - sourceforge.net/projects/bbmap/
  21. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017 May;27(5):824-834. doi: 10.1101/gr.213959.116. Epub 2017 Mar 15. PMID: 28298430; PMCID: PMC5411777.
  22. Li, D., Liu, C-M., Luo, R., Sadakane, K., and Lam, T-W., (2015) MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, doi: 10.1093/bioinformatics/btv033 [PMID: 25609793].
  23. Alexandre Souvorov, Richa Agarwala and David J. Lipman. SKESA: strategic k-mer extension for scrupulous assemblies. Genome Biology 2018 19:153. doi.org/10.1186/s13059-018-1540-z
  24. Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A., & Korobeynikov, A. (2020). Using SPAdes de novo assembler. Current Protocols in Bioinformatics, 70, e102. doi: 10.1002/cpbi.102
  25. Mikhail Kolmogorov, Jeffrey Yuan, Yu Lin and Pavel Pevzner, "Assembly of Long Error-Prone Reads Using Repeat Graphs", Nature Biotechnology, 2019 doi:10.1038/s41587-019-0072-8
  26. Vaser R, Sovic I, Nagarajan N, Sikic M "Fast and accurate de novo genome assembly from long uncorrected reads." Genome Res. 2017 May;27(5):737-746. https://doi.org/10.1101/gr.214270.116
  27. Bruce J. Walker, Thomas Abeel, Terrance Shea, Margaret Priest, Amr Abouelliel, Sharadha Sakthikumar, Christina A. Cuomo, Qiandong Zeng, Jennifer Wortman, Sarah K. Young, Ashlee M. Earl (2014) Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE 9(11): e112963. doi:10.1371/journal.pone.0112963
  28. O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014
  29. Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, (2017).
