Genomics Data Repositories

Rockefeller University, Bioinformatics Resource Centre


Data Repositories


Getting hold of HTS data

  • From public repositories.
  • From collaborators.
  • By sequencing some of your own material!

Repositories for HTS



Public Repositories for HTS

  • Several public sources of HTS data exist.
  • We concentrate first on those that act as repositories.
    • GEO (Gene Expression Omnibus).
    • ENA (European Nucleotide Archive).
    • SRA (Sequence Read Archive, formerly the Short Read Archive).

Gene Expression Omnibus

  • GEO holds different types of biological datasets.
  • Very popular for submission of data accompanying publication.
  • Captures metadata, processed files and raw data.
  • GEO was not built for HTS data.
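
GEO's different record types can be told apart from the accession prefix: GSE for a series, GSM for a sample, GPL for a platform and GDS for a curated dataset. The following is a minimal sketch, not part of the original slides, that classifies an accession and builds its GEO record URL. The function name `describe_geo_accession` and the accession `GSE0000000` are placeholders chosen for illustration.

```python
import re

# Accession prefixes used by GEO and the record type each one denotes.
GEO_RECORD_TYPES = {
    "GSE": "series",
    "GSM": "sample",
    "GPL": "platform",
    "GDS": "curated dataset",
}

GEO_ACCESSION = re.compile(r"(GSE|GSM|GPL|GDS)\d+")


def describe_geo_accession(accession):
    """Return the record type and GEO web address for an accession.

    Hypothetical helper for illustration; it does not contact GEO.
    """
    normalised = accession.strip().upper()
    match = GEO_ACCESSION.fullmatch(normalised)
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    return {
        "accession": normalised,
        "type": GEO_RECORD_TYPES[match.group(1)],
        "url": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=" + normalised,
    }


# GSE0000000 is a made-up accession used only to show the call.
record = describe_geo_accession("gse0000000")
print(record["type"], record["url"])
```

The series (GSE) record is usually the one cited in a publication; it links out to its samples and, for HTS data, to the raw reads held in SRA.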


Sequence Read Archive

  • SRA (www.ncbi.nlm.nih.gov/sra)

  • NCBI's HTS-specific repository.
  • Sequencing-specific metadata.
  • Stores raw data (in SRA format).
  • SRA format - reading it requires the SRA Toolkit.
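
As a rough sketch of what "requires SRA Toolkit" means in practice, the snippet below assembles the two toolkit commands commonly used to turn a run into FASTQ: `prefetch` to download the SRA file and `fasterq-dump` to convert it. It only builds and prints the commands and does not run them. The helper name `sra_toolkit_commands`, the output directory and the accession `SRR0000000` are assumptions for illustration, and available flags can differ between toolkit versions, so check `fasterq-dump --help` on your installation.

```python
import re
import shlex

# Run accessions start with SRR (NCBI), ERR (ENA) or DRR (DDBJ).
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def sra_toolkit_commands(run_accession, outdir="fastq"):
    """Build the SRA Toolkit commands for one run (hypothetical helper).

    Returns argument lists suitable for subprocess.run; nothing is executed.
    """
    if RUN_ACCESSION.fullmatch(run_accession) is None:
        raise ValueError(f"not a run accession: {run_accession!r}")
    return [
        # Download the run in SRA format.
        ["prefetch", run_accession],
        # Convert to FASTQ, writing paired reads to separate files.
        ["fasterq-dump", run_accession, "--split-files", "--outdir", outdir],
    ]


# SRR0000000 is a placeholder, not a real run.
for command in sra_toolkit_commands("SRR0000000"):
    print(shlex.join(command))
```

Passing each list to `subprocess.run(command, check=True)` would execute the step, assuming the SRA Toolkit is installed and on the `PATH`.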


European Nucleotide Archive

  • ENA acts as a European HTS repository.
  • Mirrors much of SRA.
  • Stores raw data.
  • No SRA format - FASTQ by default.
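
Because ENA serves FASTQ directly, the file locations for a run can be looked up and downloaded without any conversion step. The sketch below is one way to do this through the ENA Portal API file report. It builds the query URL and parses a tab-separated report into download links. The helper names, the accession `SRR0000000` and the example report text are invented for illustration, and the network request itself is left out.

```python
import csv
import io
from urllib.parse import urlencode

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build a file report query for a run, experiment or study accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    })
    return ENA_FILEREPORT + "?" + query


def fastq_urls(report_tsv):
    """Map each run accession to its list of FASTQ links.

    The fastq_ftp column holds ';'-separated paths without a scheme,
    one per file (two for paired-end runs).
    """
    urls = {}
    for row in csv.DictReader(io.StringIO(report_tsv), delimiter="\t"):
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        urls[row["run_accession"]] = ["ftp://" + p for p in paths]
    return urls


# Made-up report text in the shape the API returns; not real data.
EXAMPLE_REPORT = (
    "run_accession\tfastq_ftp\n"
    "SRR0000000\tftp.example.org/SRR0000000_1.fastq.gz;"
    "ftp.example.org/SRR0000000_2.fastq.gz\n"
)

print(filereport_url("SRR0000000"))
print(fastq_urls(EXAMPLE_REPORT))
```

In real use the text passed to `fastq_urls` would come from fetching `filereport_url(...)`, for example with `urllib.request.urlopen`.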

Other Repositories


ENCODE Portal

The ENCODE portal provides access to raw data and to processed, standardised results.


Repositories for processed data

  • Other specialist repositories exist.
  • The recount2 database provides standardised counts for user analysis.
  • Other databases such as ImmGen, BodyMap and Expression Atlas provide RNA-seq data for specific cells and tissues.

Reference data

  • Reference genomes are available from many locations.
  • Different assemblies.
    • Major revisions - change locations.
    • Minor revisions - update annotation.
  • Genome sequence stored as FASTA.
  • Gene build as GFF3 or GTF.
  • iGenomes contains full annotation files for many genomes.
  • UCSC GenArk contains annotation for model and non-model organisms.
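
To make the two reference formats concrete, here is a minimal sketch of reading them with plain Python. FASTA is a `>` header line followed by sequence lines. GTF and GFF3 share nine tab-separated columns and differ mainly in the last one: GTF uses `key "value";` pairs, GFF3 uses `key=value;`. The function names and the toy records are made up for illustration. The attribute parser is deliberately simple and would mis-split values that contain semicolons, so a dedicated library is a better choice for real annotation files.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def parse_annotation_line(line):
    """Parse one GTF or GFF3 feature line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns")
    attributes = {}
    for item in fields[8].split(";"):
        item = item.strip()
        if not item:
            continue
        if "=" in item and '"' not in item:
            # GFF3 style: key=value
            key, value = item.split("=", 1)
        else:
            # GTF style: key "value"
            key, _, value = item.partition(" ")
            value = value.strip().strip('"')
        attributes[key] = value
    return {
        "seqname": fields[0],
        "feature": fields[2],
        "start": int(fields[3]),  # 1-based, inclusive
        "end": int(fields[4]),    # 1-based, inclusive
        "strand": fields[6],
        "attributes": attributes,
    }


# Toy records, not taken from any real genome.
toy_fasta = [">chr1 toy sequence", "ACGT", "AC", ">chr2", "GG"]
toy_gtf = 'chr1\ttoy\tgene\t11\t20\t.\t+\t.\tgene_id "G1"; gene_name "Demo";'
print(dict(read_fasta(toy_fasta)))
print(parse_annotation_line(toy_gtf)["attributes"])
```

Coordinates in an annotation file only make sense against the assembly they were built on, which is why a major revision (changed locations) requires a matching gene build.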