🌐

External Data

Download Public Data Files

Download publicly available Sequence Read Archive (SRA), Gene Expression Omnibus (GEO), or Recount3 data from their respective databases, files from a URL, or gene sequences.

Version 1.0.1

Use Cases

Access and download data from a variety of sources

  • Sequence Read Archive (SRA) data from the NCBI and EMBL
  • Gene Expression Omnibus (GEO) data from the NCBI
  • Recount3 data: a public RNA-seq project covering human and mouse samples

Summary and Methods

This workflow helps the user download data from a variety of sources, including Sequence Read Archive (SRA) data, gene sequences from supported genomes, the Gene Expression Omnibus (GEO), and Recount3. Click the toggles below to learn more about how the workflow accesses data from each source.

‣

Sequence Read Archive (SRA) Data

Summary

This workflow helps the user download Sequence Read Archive (SRA) data from the NCBI and EMBL. The user provides a list of SRA IDs to retrieve and receives the associated SRA data.

Methods

This analysis was performed using the Download Data Files workflow on the Form Bio platform. The user provided a list of Sequence Read Archive (SRA) IDs as an input file. The workflow retrieved the associated SRA data from the NCBI and EMBL as FastQ files.

‣

Gene Expression Omnibus (GEO)

Summary

This workflow helps the user download Gene Expression Omnibus (GEO) data from the NCBI. The user provides a list of GEO IDs as input and receives the data associated with those IDs.

Methods

This analysis was performed using the Download Data Files workflow on the Form Bio platform. The user provided a list of Gene Expression Omnibus (GEO) IDs as an input file. The workflow returned the files associated with the input GEO IDs; the available file types vary by sample.

‣

Recount3: Gene Expression Data in Mouse/Human

Summary

This workflow helps the user access data from the Recount3 project. The user provides a list of Recount3 project IDs as input and receives the data associated with those project IDs.

Methods

This analysis was performed using the Download Data Files workflow on the Form Bio platform. The user provided a list of Recount3 project IDs as an input file. The workflow then returned the files associated with the input project IDs; the available file types vary by project.

‣

Inputs

  • Run Name
    • This is a unique name for each run of pipelines in your project
  • SRAList
    • File with SRA Run, Sample, or Project IDs: SRR/ERR, SAM, PRJ
  • File with GEO Sample IDs: GSM
  • Recount3 Project ID
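
Before uploading an ID list, it can help to check that each entry starts with one of the prefixes listed above. The following is a minimal sketch and not part of the workflow itself: the function names and category labels are hypothetical, and it checks only the three-letter prefix, not whether the ID exists in the source database.

```python
from typing import List, Optional, Tuple

# Prefixes come from the Inputs list above; the category labels are
# illustrative names chosen for this sketch.
PREFIXES = {
    "SRR": "sra_run",
    "ERR": "sra_run",
    "SAM": "sra_sample",
    "PRJ": "sra_project",
    "GSM": "geo_sample",
}


def classify_id(identifier: str) -> Optional[str]:
    """Return a category for a recognized prefix, or None otherwise."""
    cleaned = identifier.strip().upper()
    # Require at least one character after the three-letter prefix.
    if len(cleaned) <= 3:
        return None
    return PREFIXES.get(cleaned[:3])


def split_valid(ids: List[str]) -> Tuple[List[str], List[str]]:
    """Split IDs into (recognized, unrecognized), stripping whitespace."""
    valid, invalid = [], []
    for raw in ids:
        target = valid if classify_id(raw) else invalid
        target.append(raw.strip())
    return valid, invalid


if __name__ == "__main__":
    good, bad = split_valid(["SRR994739", " SAMEA9454349 ", "not-an-id"])
    print("recognized:", good)
    print("unrecognized:", bad)
```

Entries that land in the unrecognized list are worth reviewing by hand before you submit the file.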
‣

Outputs

  • For SRA: FastQ files, one per RunID
    • RunID.fastq.gz
    • Sample to Run IDs
      RunID        SampleID
      SRR994739    SAMEA9454349
      SRR994740    SAMEA9454349
      SRR994741    SAMEA9454341
      SRR994742    SAMEA9454348
      SRR994743    SAMEA9454348
      SRR994744    SAMEA9454342
  • For Recount3 and GEO, there are a variety of files available depending on sample/project.
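
Because one sample can have several runs, you may want to group the SRA run IDs by sample after the download. The sketch below uses the example Sample to Run IDs listing shown above. It assumes the mapping is whitespace-delimited text with a header row, which this page does not confirm, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Example mapping copied from the Outputs section. The whitespace-delimited
# layout with a header row is an assumption made for this sketch.
MAPPING_TEXT = """\
RunID SampleID
SRR994739 SAMEA9454349
SRR994740 SAMEA9454349
SRR994741 SAMEA9454341
SRR994742 SAMEA9454348
SRR994743 SAMEA9454348
SRR994744 SAMEA9454342
"""


def runs_by_sample(text: str) -> Dict[str, List[str]]:
    """Group run IDs under their sample ID, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        if not line.strip():
            continue  # ignore blank lines
        run_id, sample_id = line.split()
        groups[sample_id].append(run_id)
    return dict(groups)


if __name__ == "__main__":
    for sample, runs in runs_by_sample(MAPPING_TEXT).items():
        # Each run is expected to have a matching RunID.fastq.gz file.
        files = [f"{run}.fastq.gz" for run in runs]
        print(sample, files)
```

Grouping this way makes it easier to see which FastQ files belong together if you later merge runs per sample.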
‣

Workflow Walkthrough

  1. Navigate to the Download Public Data Files workflow. You can use the search bar at the top-right corner to find the workflow, or use the External Data or Power Tools filter on the left-hand side.
  2. Select the version from the dropdown menu in the top-right corner. When ready to begin analysis, click the “Run Workflow” button.
  3. image
  4. To start, select the type of data you wish to retrieve: Sequence Read Archive (SRA) data, Gene Expression Omnibus (GEO) data, or Recount3 data. Depending on the resource, you will be asked to provide a file containing the IDs of the data to retrieve: SRA IDs, GEO IDs, or the Recount3 project ID, respectively.
  5. image
  6. Give the workflow a unique name, and review the workflow inputs and parameters. When ready to submit, click “Run Workflow”.
  7. image
‣

Results Walkthrough

  1. To begin, find the workflow run in the Activity tab. Select your workflow.
  2. On this page, you can view an array of information about your workflow run. To find your downloaded files, select the Files tab.
  3. image
  4. In the Files tab, select the Output folder to view all output files.
  5. image
  6. Alternatively, you can find retrieved files in the pipeline-outputs folder.
  7. image
  8. Once in the folder, select getncbi.
  9. image
  10. Find the folder corresponding to your workflow run, then select output to view files.
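
If you copy the output folder to your own machine, a quick check can confirm that every requested SRA run produced a file. This is a minimal sketch that assumes the RunID.fastq.gz naming described under Outputs and a local copy of the folder. The function name and the example path are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def missing_fastq(run_ids: Iterable[str], output_dir: str) -> List[str]:
    """Return the run IDs that have no RunID.fastq.gz file in output_dir."""
    folder = Path(output_dir)
    return [
        run_id
        for run_id in run_ids
        if not (folder / f"{run_id}.fastq.gz").is_file()
    ]


if __name__ == "__main__":
    # "output" is a placeholder for wherever you saved the run's files.
    requested = ["SRR994739", "SRR994740"]
    absent = missing_fastq(requested, "output")
    if absent:
        print("No FastQ file found for:", ", ".join(absent))
    else:
        print("All requested runs have a FastQ file.")
```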

Built with

image
image