As we have seen earlier, genomic intervals are often stored in BED format.
In Genomics and High Throughput sequencing we make extensive use of Genomic Intervals to store simple annotation and summarised results over regions.
We also want to be able to assess spatial relation between such genomic intervals within our linear genome.
class: inverse, center, middle
Two popular Bioconductor packages for dealing with Genomics Intervals are:
The first package we will look at is the GenomicRanges package.
Remember we can install Bioconductor packages (fairly) easily using Bioconductor package commands.
::install(version = "3.10")
BiocManager::install("GenomicRanges") BiocManager
Now we have the package installed, we can load the library.
library(GenomicRanges)
class: inverse, center, middle
To construct a GenomicRanges object we require a set of genomic intervals.
Intervals (genomic and non genomic intervals) are created using the IRanges() function. We must specify start and end positions as numeric vectors to create our IRanges object.
<- IRanges(start=c(1,10,20),end=c(2,11,30))
myIntervals class(myIntervals)
## [1] "IRanges"
## attr(,"package")
## [1] "IRanges"
The resulting IRanges object contains our intervals and an additional column describing widths of intervals.
myIntervals
## IRanges object with 3 ranges and 0 metadata columns:
## start end width
## <integer> <integer> <integer>
## [1] 1 2 2
## [2] 10 11 2
## [3] 20 30 11
Now to create a Genomic Ranges interval object or GRanges object. We extend our IRanges object to also include chrosomosome/contig information.
To do this we can supply our new IRanges object and a vector of corresponding chromosome locations to the seqnames parameter in GRanges() function.
<- GRanges(seqnames=c("chr1","chr1","chr2"),
myGenomicIntervals
myIntervals)class(myGenomicIntervals)
## [1] "GRanges"
## attr(,"package")
## [1] "GenomicRanges"
The created GRanges object contains our original intervals from the IRanges object and now the chromosomes/contigs on which the intervals exist.
myGenomicIntervals
## GRanges object with 3 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-2 *
## [2] chr1 10-11 *
## [3] chr2 20-30 *
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
We can retrieve genomic intervals from our GRanges using indexing as with vectors, matrices and dataframes.
c(2,3),] myGenomicIntervals[
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 10-11 *
## [2] chr2 20-30 *
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
Unlike data frames and matrices, we dont need the second “,” to specify rows not columns.
c(2,3)] myGenomicIntervals[
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 10-11 *
## [2] chr2 20-30 *
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
To access the columns of a GRanges object we must use special accessor functions - seqnames(),start(), end() which correspond to the columns we wish to retrieve.
We get start and stop positions using the start() and end() functions.
start(myGenomicIntervals)
## [1] 1 10 20
end(myGenomicIntervals)
## [1] 2 11 30
We can use the start() and end() functions to update our start and end positions as well.
start(myGenomicIntervals) <- c(1,5,15)
start(myGenomicIntervals)
## [1] 1 5 15
We can not however create intervals with negative widths. Our start positions cannot be greater than our end positions.
start(myGenomicIntervals) <- c(10,50,150)
## Error in validObject(x): invalid class "IRanges" object:
## 'width(x)' cannot contain negative integers
We can retrieve the chromosome names from GenomicRanges in a similar manner to start() and end() functions using the seqnames() function.
<- seqnames(myGenomicIntervals)
contigNames contigNames
## factor-Rle of length 3 with 2 runs
## Lengths: 2 1
## Values : chr1 chr2
## Levels(2): chr1 chr2
The contig names are stored in another special object called a RLE (run length encoded). This RLE object saves memory and allows for more efficient manipulation of long stretches of repeated names.
We can convert to a character vector using the as.character() function
as.character(contigNames)
## [1] "chr1" "chr1" "chr2"
As with the start() and end() functions we can update our contig names using the seqnames() function.
seqnames(myGenomicIntervals) <- c("chr2","chr2","chr1")
seqnames(myGenomicIntervals)
## factor-Rle of length 3 with 2 runs
## Lengths: 2 1
## Values : chr2 chr1
## Levels(2): chr1 chr2
As with all factors however, we can not add contigs not already in use in seqnames column.
seqnames(myGenomicIntervals) <- c("chr3","chr2","chr1")
## Error in .normalize_seqnames_replacement_value(value, x): levels of supplied 'seqnames' must be identical to 'seqlevels(x)'
To add new contig we must update the seqlevels, like levels for factors but now for contigs in GRanges objects.
The seqlevels() identifies which contigs we are using.
seqlevels(myGenomicIntervals)
## [1] "chr1" "chr2"
We can update the seqlevels in use by again using the seqlevels() function.
seqlevels(myGenomicIntervals) <- c("chr1","chr2","chr3")
seqlevels(myGenomicIntervals)
## [1] "chr1" "chr2" "chr3"
Now we can add contig for chromosome 3 (chr3).
seqnames(myGenomicIntervals) <- c("chr1","chr2","chr3")
myGenomicIntervals
## GRanges object with 3 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-2 *
## [2] chr2 5-11 *
## [3] chr3 15-30 *
## -------
## seqinfo: 3 sequences from an unspecified genome; no seqlengths
We can include strand information in GRanges objects.
We can set strand to be “+” , “-” , or unstranded with "*".
<- GRanges(seqnames=c("chr1","chr1","chr2"),
myGenomicIntervals
myIntervals,strand=c("+","-","*"))
myGenomicIntervals
## GRanges object with 3 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-2 +
## [2] chr1 10-11 -
## [3] chr2 20-30 *
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
Similarly we can extract and set strand information using the strand() function.
strand(myGenomicIntervals) <- c("+","+","-")
strand(myGenomicIntervals)
## factor-Rle of length 3 with 2 runs
## Lengths: 2 1
## Values : + -
## Levels(3): + - *
The strand information is also stored as a factor RLE. This contains the levels “+”, “-”, "*" and only these levels.
<- GRanges(seqnames=c("chr1","chr1","chr2"),
myGenomicIntervals
myIntervals,strand=c("+","+","+"))
strand(myGenomicIntervals)
## factor-Rle of length 3 with 1 run
## Lengths: 3
## Values : +
## Levels(3): + - *
Finally we can set and extract names for our genomic intervals using the names() function.
names(myGenomicIntervals) <- c("peak1","peak2","peak3")
myGenomicIntervals
## GRanges object with 3 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## peak1 chr1 1-2 +
## peak2 chr1 10-11 +
## peak3 chr2 20-30 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
As with vectors we can use to names to index and retrieve genomic intervals of interest.
"peak2"] myGenomicIntervals[
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## peak2 chr1 10-11 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
We will often want to store more information on our genomic intervals. We can use the mcols (metadata columns) to store this information.
To add additional metadata columns we can simply supply the desired column name and the values we want to the GRanges() function as with data frames.
<- IRanges(start=c(1,2,2),end=c(2,11,30))
myIntervals <- GRanges(seqnames=c("chr1","chr1","chr2"),
myGenomicIntervals strand=c("+","+","+"),
myIntervals,Score=c(10,20,40),
Comment=c("GoodQC","GoodQC","BadQC"))
We can retrieve this metadata column information as with data frames by using the $ accessor
$Score myGenomicIntervals
## [1] 10 20 40
$Comment myGenomicIntervals
## [1] "GoodQC" "GoodQC" "BadQC"
We to retrieve all metadata columns as a data frames we can use the mcols() function
mcols(myGenomicIntervals)
## DataFrame with 3 rows and 2 columns
## Score Comment
## <numeric> <character>
## 1 10 GoodQC
## 2 20 GoodQC
## 3 40 BadQC
We can convert the entire GRanges to a data.frame using the as.data.frame() function.
as.data.frame(myGenomicIntervals)
## seqnames start end width strand Score Comment
## 1 chr1 1 2 2 + 10 GoodQC
## 2 chr1 2 11 10 + 20 GoodQC
## 3 chr2 2 30 29 + 40 BadQC
The GenomicRanges package also has a quick short cut to create GRanges from character strings.
We simply specify a genomic range in a similar format to that used by IGV - Chromosome:Start-End (chr1:110-120).
GRanges("chr1:110-120")
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 110-120 *
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Often genomics programs will export results in table with a similar format for the GenomicIntervals. The BRC pipeline outputs in a format of Chromosome:Start:End (chr1:110:120) to allow separation into columns in programs such as excel.
To convert the formats we can use some string manipulation tools. In the BRC example we will need to replace last “:” with a “-”. We can use the stringi package’s stri_replace_last_fixed function.
library(stringi)
<- "chr1:110:120"
myRange <- stri_replace_last_fixed(myRange, ':', '-')
newRange newRange
## [1] "chr1:110-120"
Another common format is IDforRange:Chromosome:Start:End. We can clean this up in additional step using another stringi function.
library(stringi)
<- "MyID:chr1:110:120"
myRange <- stri_replace_last_fixed(myRange, ':', '-')
newRange <- stri_replace_first_regex(newRange, '\\w*:', '')
newerRange newerRange
## [1] "chr1:110-120"
GRanges(newerRange)
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 110-120 *
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
One of the big advantage of using GRanges objects to handle genomic data is the set of functions which allows us to manipulate and handle genomic intervals.
The include functions to resize, merge/reduce and shift genomic ranges.
To adjust the postion of a GRanges object we can manually adjust start and end positions, but GenomicRanges provides some more user friendly methods.
The shift function allows us to move by a specified number of basepairs.
1] myGenomicIntervals[
## GRanges object with 1 range and 2 metadata columns:
## seqnames ranges strand | Score Comment
## <Rle> <IRanges> <Rle> | <numeric> <character>
## [1] chr1 1-2 + | 10 GoodQC
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
shift(myGenomicIntervals[1],shift = 10)
## GRanges object with 1 range and 2 metadata columns:
## seqnames ranges strand | Score Comment
## <Rle> <IRanges> <Rle> | <numeric> <character>
## [1] chr1 11-12 + | 10 GoodQC
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
The resize() function allows to adjust the width of regions.
3] myGenomicIntervals[
## GRanges object with 1 range and 2 metadata columns:
## seqnames ranges strand | Score Comment
## <Rle> <IRanges> <Rle> | <numeric> <character>
## [1] chr2 2-30 + | 40 BadQC
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
resize(myGenomicIntervals[3], width=20)
## GRanges object with 1 range and 2 metadata columns:
## seqnames ranges strand | Score Comment
## <Rle> <IRanges> <Rle> | <numeric> <character>
## [1] chr2 2-21 + | 40 BadQC
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
By default the resize() function adjust the width of regions from the start of the region. We can control this behavior by specifying the fix parameter as start, end or center to specify where we wish to resize from.
3] myGenomicIntervals[
## GRanges object with 1 range and 2 metadata columns:
## seqnames ranges strand | Score Comment
## <Rle> <IRanges> <Rle> | <numeric> <character>
## [1] chr2 2-30 + | 40 BadQC
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
resize(myGenomicIntervals[3], width=20, fix="end")
## GRanges object with 1 range and 2 metadata columns:
## seqnames ranges strand | Score Comment
## <Rle> <IRanges> <Rle> | <numeric> <character>
## [1] chr2 11-30 + | 40 BadQC
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
The resize() function by default takes into account strand information.
For negative strand intervals, when specifying fix=start it considers the start as the end position i.e. Useful for Gene positions.
resize(myGenomicIntervals[3],width=20, fix="start")
## GRanges object with 1 range and 2 metadata columns:
## seqnames ranges strand | Score Comment
## <Rle> <IRanges> <Rle> | <numeric> <character>
## [1] chr2 2-21 + | 40 BadQC
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
strand(myGenomicIntervals)[3] <- "-"
resize(myGenomicIntervals[3],width=20, fix="start")
## GRanges object with 1 range and 2 metadata columns:
## seqnames ranges strand | Score Comment
## <Rle> <IRanges> <Rle> | <numeric> <character>
## [1] chr2 11-30 - | 40 BadQC
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
A final function for adjusting genomic ranges which we will review is the reduce() function.
The reduce function merges overlapping regions into a single genomic interval.
First lets make some overlapping intervals
<- GRanges(seqnames=c("chr1","chr1","chr2"),
myGenomicIntervals ranges=IRanges(start=c(1,2,2),end=c(2,11,30)),
strand=c("+","+","+"))
myGenomicIntervals
## GRanges object with 3 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-2 +
## [2] chr1 2-11 +
## [3] chr2 2-30 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
Now we can merge them with reduce() function taking into account chromosomes.
<- reduce(myGenomicIntervals)
mergedGenomicIntervals mergedGenomicIntervals
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-11 +
## [2] chr2 2-30 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
The reduce() function also takes into account the strand information and only merges genomic intervals on same strand.
strand(myGenomicIntervals) <- c("+","-","+")
<- reduce(myGenomicIntervals)
mergedGenomicIntervals mergedGenomicIntervals
## GRanges object with 3 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-2 +
## [2] chr1 2-11 -
## [3] chr2 2-30 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
We can merge intervals across strands by setting ignore.strand=TRUE. Note that all regions are now considered strandless.
strand(myGenomicIntervals) <- c("+","-","+")
<- reduce(myGenomicIntervals,
mergedGenomicIntervals ignore.strand=TRUE)
mergedGenomicIntervals
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-11 *
## [2] chr2 2-30 *
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
Some particularly useful methods/operators in GenomicRanges allows us to assess the relationship between two sets of GRanges objects.
One of the most common is the %over% operator which allows us to identify overlapping regions.
First we make two GRanges objects.
<- GRanges(seqnames=c("chr1","chr1"),
myGenomicIntervals1 ranges=IRanges(start=c(1,25),
end=c(20,30)),
strand=c("+","+"))
<- GRanges(seqnames=c("chr1","chr1"),
myGenomicIntervals2 ranges=IRanges(start=c(22,100),
end=c(27,130)),
strand=c("+","+"))
myGenomicIntervals1
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-20 +
## [2] chr1 25-30 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
myGenomicIntervals2
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 22-27 +
## [2] chr1 100-130 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Now to find all regions in myGenomicIntervals1 which overlap with intervals in myGenomicIntervals2 we simply use the %over% operator.
This will return a logical vector specifying if regions in myGenomicIntervals1 overlap any regions in myGenomicIntervals2.
%over% myGenomicIntervals2 myGenomicIntervals1
## [1] FALSE TRUE
We can use this logical vector then to subset our myGenomicIntervals1 set as we have with vectors.
%over% myGenomicIntervals2] myGenomicIntervals1[myGenomicIntervals1
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 25-30 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Another useful function to find overlaps is the findOverlaps() function.
This returns a new object Hits object.
<- findOverlaps(myGenomicIntervals1,myGenomicIntervals2)
myOverlaps class(myOverlaps)
## [1] "SortedByQueryHits"
## attr(,"package")
## [1] "S4Vectors"
myOverlaps
## Hits object with 1 hit and 0 metadata columns:
## queryHits subjectHits
## <integer> <integer>
## [1] 2 1
## -------
## queryLength: 2 / subjectLength: 2
To extract the indices of genomic intervals in myGenomicIntervals1 overlapping myGenomicIntervals2 we can use the queryHits() function.
To get the corresponding indicies of overlapping myGenomicIntervals2 regions we can use the subjectHits() function.
queryHits(myOverlaps)
## [1] 2
subjectHits(myOverlaps)
## [1] 1
We can use these indices to retrieve the overlapping intervals.
queryHits(myOverlaps)] myGenomicIntervals1[
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 25-30 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
subjectHits(myOverlaps)] myGenomicIntervals2[
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 22-27 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Another common set of operations we often want to perform between our Genomic Intervals is to find regions which are nearest to, precede or follow each other.
We can perform these actions using the nearest, precede and follow functions.
<- GRanges(seqnames=c("chr1","chr1"),
myGenomicIntervals1 ranges=IRanges(start=c(10,20),
end=c(25,30)),
strand=c("+","+"))
<- GRanges(seqnames=c("chr1","chr1"),
myGenomicIntervals2 ranges=IRanges(start=c(1,10000),
end=c(2,10002)),
strand=c("+","+"))
myGenomicIntervals1
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 10-25 +
## [2] chr1 20-30 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
myGenomicIntervals2
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-2 +
## [2] chr1 10000-10002 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
The nearest() function accepts the two GenomicRanges we wish to compare as arguments and returns the index of intervals in second GenomicRanges nearest to those in first GenomicRanges.
<- nearest(myGenomicIntervals1,myGenomicIntervals2)
indexOfNearest indexOfNearest
## [1] 1 1
I can use this vector of index positions to then extract intervals in second GenomicRanges nearest to those in first GenomicRanges.
myGenomicIntervals2[indexOfNearest]
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-2 +
## [2] chr1 1-2 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
We can use the precede() and follow() functions in much the same way to the nearest() function. This returns the index of elements in second GenomicRanges argument that are preceeded or followed by elements in first GenomicRanges argument.
<- precede(myGenomicIntervals1,myGenomicIntervals2)
precedeIndex <- follow(myGenomicIntervals1,myGenomicIntervals2)
followIndex myGenomicIntervals2[precedeIndex]
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 10000-10002 +
## [2] chr1 10000-10002 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Another useful function is distanceToNearest() which again accepts arguments like nearest() or findOverlaps() functions and provides an object containing indexes of nearest intervals including a column of distances between nearest intervals.
<- distanceToNearest(myGenomicIntervals1,myGenomicIntervals2)
distances distances
## Hits object with 2 hits and 1 metadata column:
## queryHits subjectHits | distance
## <integer> <integer> | <integer>
## [1] 1 1 | 7
## [2] 2 1 | 17
## -------
## queryLength: 2 / subjectLength: 2
mcols(distances)
## DataFrame with 2 rows and 1 column
## distance
## <integer>
## 1 7
## 2 17
As we have seen, Genomic Intervals are often stores as BED files.
The rtracklayer package provide mechanisms to import and export from BED3/6 files using the import.bed and export.bed functions.
library(rtracklayer)
<- import.bed(con="data/SicerPeaks.bed")
mySicerPeaks mySicerPeaks
## GRanges object with 3 ranges and 2 metadata columns:
## seqnames ranges strand | name score
## <Rle> <IRanges> <Rle> | <character> <numeric>
## [1] chrY 90739201-90744599 * | peak1 1
## [2] chrY 90766601-90767999 * | peak2 20
## [3] chrX 6399201-6401199 * | peak3 40
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
Exporting to BED is achieved by using the export.bed() function.
This produces a valid BED file for use in IGV and other softwares.
export.bed(mySicerPeaks, con="moreSicerPeaks.bed")
We saw how we can extract sequences from a BSgenome object previously using subseq() function.
library(BSgenome.Mmusculus.UCSC.mm10)
subseq(BSgenome.Mmusculus.UCSC.mm10$chr10,1,100)
## 100-letter DNAString object
## seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
With GRanges objects we can specify the BSgenome for the sequence and GRanges as regions to be extracted.
Here we can use the getSeq() function to extract sequences under our Sicer Peaks on the mm10 genome.
<- getSeq(BSgenome.Mmusculus.UCSC.mm10, names=mySicerPeaks)
sicerSeq sicerSeq
## DNAStringSet object of length 3:
## width seq
## [1] 5399 AACAGGGCCTGACTGCGTACAGACATGTCCCCCT...GCCCCGGAGATCGGCGCTGGGCCCTAGCCTAGA
## [2] 1399 CATTCACCCACAGTGATCCCCCCGCCTCAGAAAC...TCAGAAAATCACTCACAGTCATCCCCCTGCCTC
## [3] 1999 TTGGCCAGCTTATCAGTGGGTGTTCTTTCGAGTA...CAAGATTGCCAGGTGGAGGGACTTCATGAACTT
The resulting DNAStringSet may be written to a FASTA file for further analysis in external programs.
writeXStringSet(sicerSeq,"sicerSeq.fa")