In this exercise, we will practice manipulating VCF files. Use the VCF file “data/SAMN01882168_filt.vcf.gz” to answer the following questions.
Read in the VCF file and make a VRanges object.
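One possible way to approach this step is sketched below with the Bioconductor package VariantAnnotation. The exercise does not state the genome build, so the `genome = "hg19"` argument is an assumption and should be replaced with the build the file was actually called against.

```r
# Sketch: read the exercise VCF and coerce it to a VRanges object.
library(VariantAnnotation)

vcf_path <- "data/SAMN01882168_filt.vcf.gz"

# ASSUMPTION: the genome build is not given in the exercise; "hg19" is a placeholder.
vcf <- readVcf(vcf_path, genome = "hg19")

# Coerce the VCF object to VRanges (one range per alternate allele per sample).
vr <- as(vcf, "VRanges")
vr
```

The intermediate `vcf` object is kept because the header and genotype matrices are convenient to query from it in later steps.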
Extract the genotype (FORMAT) fields from the VCF header and explain the abbreviations.
## DataFrame with 10 rows and 3 columns
## Number Type Description
## <character> <character> <character>
## GT 1 String Genotype
## AD R Integer Allelic depths for t..
## DP 1 Integer Approximate read dep..
## GQ 1 Integer Genotype Quality
## MIN_DP 1 Integer Minimum DP observed ..
## PGT 1 String Physical phasing hap..
## PID 1 String Physical phasing ID ..
## PL G Integer Normalized, Phred-sc..
## RGQ 1 Integer Unconditional refere..
## SB 4 Integer Per-sample component..
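A table like the one above can be obtained from the VCF header alone. The sketch below uses `scanVcfHeader()` from VariantAnnotation, which parses only the header, so no genome build needs to be supplied. Printing the `Description` column separately is a suggestion for seeing the full text that the console display truncates.

```r
# Sketch: list the genotype (FORMAT) fields declared in the VCF header.
library(VariantAnnotation)

vcf_path <- "data/SAMN01882168_filt.vcf.gz"

# Parse only the header lines of the file.
hdr <- scanVcfHeader(vcf_path)

# DataFrame with one row per FORMAT field: Number, Type, Description.
geno_fields <- geno(hdr)
geno_fields

# The console display truncates long descriptions; print them in full.
geno_fields$Description
```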
Subset to just the variants on chr21.
Extract the GT information from the chr21 subset and draw a bar chart of the number of variants for each genotype.
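The subsetting and genotype-counting steps could be sketched as follows. This is a minimal example under two assumptions that the exercise does not confirm: the genome build (`"hg19"` is a placeholder) and that the file holds a single sample, so the first column of the GT matrix is used.

```r
# Sketch: keep chr21 variants, tabulate genotypes, and plot the counts.
library(VariantAnnotation)
library(ggplot2)

# ASSUMPTION: genome build is a placeholder; it is not stated in the exercise.
vcf <- readVcf("data/SAMN01882168_filt.vcf.gz", genome = "hg19")

# Logical vector marking the variants located on chr21.
on_chr21 <- as.character(seqnames(rowRanges(vcf))) == "chr21"
vcf_chr21 <- vcf[on_chr21, ]

# GT is a variants-by-samples character matrix.
# ASSUMPTION: single-sample file, so take the first column.
gt <- geno(vcf_chr21)$GT[, 1]

# Count variants per genotype and convert to a data frame for plotting.
gt_counts <- as.data.frame(table(genoType = gt), responseName = "variants")

ggplot(gt_counts, aes(x = genoType, y = variants)) +
  geom_col() +
  labs(x = "Genotype", y = "Number of variants")
```

`geom_col()` is used rather than `geom_bar()` because the counts are already computed by `table()`.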
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 13.00 19.00 25.05 30.00 629.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 99.00 99.00 91.32 99.00 99.00
## Warning: Transformation introduced infinite values in continuous x-axis
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 12 rows containing non-finite values (stat_bin).
## variant chr start end refBase altBase refCount altCount
## 1 chr21:9412358_TA/T chr21 9412358 9412359 TA T 6 4
## 2 chr21:9413584_G/A chr21 9413584 9413584 G A 3 8
## genoType gtQuality
## 1 0/1 81
## 2 0/1 62
##
## DEL INS SNP
## 612 482 9536
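The exercise does not show how the DEL/INS/SNP labels were produced. One plausible approach, sketched below, compares the lengths of the reference and alternate alleles. The function name `classify_mut_type` and the fallback label `"Other"` are hypothetical, and the two example alleles are taken from the rows printed above.

```r
# Sketch: label each variant by comparing reference and alternate allele lengths.
# classify_mut_type() is a hypothetical helper, not part of any package.
classify_mut_type <- function(ref, alt) {
  ref_len <- nchar(ref)
  alt_len <- nchar(alt)
  ifelse(ref_len == 1 & alt_len == 1, "SNP",
         ifelse(ref_len > alt_len, "DEL",
                ifelse(ref_len < alt_len, "INS",
                       "Other")))  # ASSUMPTION: equal-length multi-base changes
}

# The two variants shown in the output above.
ref_base <- c("TA", "G")
alt_base <- c("T", "A")

mut_type <- classify_mut_type(ref_base, alt_base)
mut_type         # "DEL" "SNP"
table(mut_type)  # counts per mutation type
```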
## variant chr start end refBase altBase refCount altCount
## 1 chr21:9412358_TA/T chr21 9412358 9412359 TA T 6 4
## 2 chr21:9413584_G/A chr21 9413584 9413584 G A 3 8
## genoType gtQuality mutType nuSub TiTv
## 1 0/1 81 DEL TA>T <NA>
## 2 0/1 62 SNP G>A Ti
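The `nuSub` and `TiTv` columns above are consistent with pasting the reference and alternate bases together and then checking single-base changes against the four transitions (A>G, G>A, C>T, T>C). The sketch below illustrates that idea. The helper name `classify_ti_tv` is hypothetical, and it assumes alleles contain only A, C, G and T.

```r
# Sketch: build the substitution string and classify SNPs as Ti or Tv.
# classify_ti_tv() is a hypothetical helper, not part of any package.
classify_ti_tv <- function(ref, alt) {
  is_snp <- nchar(ref) == 1 & nchar(alt) == 1
  nu_sub <- paste0(ref, ">", alt)
  # Transitions: purine <-> purine or pyrimidine <-> pyrimidine.
  transitions <- c("A>G", "G>A", "C>T", "T>C")
  # Non-SNPs get NA; any other single-base change is called a transversion.
  ifelse(!is_snp, NA_character_,
         ifelse(nu_sub %in% transitions, "Ti", "Tv"))
}

# The two variants shown in the output above.
ref_base <- c("TA", "G")
alt_base <- c("T", "A")

nu_sub <- paste0(ref_base, ">", alt_base)
ti_tv  <- classify_ti_tv(ref_base, alt_base)

nu_sub  # "TA>T" "G>A"
ti_tv   # NA "Ti"
```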