VCF Quality Control¶
Overview¶
To evaluate the quality of a VCF file, different metrics are calculated using granite qcVCF
.
The software calculates both sample-based, as well as, family-based metrics.
The metrics currently available for sample are:
- variant types distribution
- base substitutions
- transition-transversion ratio
- heterozygosity ratio
- depth of coverage (GATK)
- depth of coverage (raw)
The metrics currently available for family are:
- mendelian errors in trio
Definitions¶
variant types distribution¶
Total number of variants classified by type as:
- DELetion (ACTG>A or ACTG>*)
- INSertion (A>ACTG or *>ACTG)
- Single-Nucleotide Variant (A>T)
- Multi-Allelic Variant (A>T,C)
- Multi-Nucleotide Variant (AA>TT)
base substitutions¶
Total number of SNVs classified by the type of substitution (e.g. C>T).
transition-transversion ratio¶
Ratio of transitions to transversions in SNVs. It is expected to be [2, 2.20] for WGS and [2.6, 3.3] for WES.
heterozygosity ratio¶
Ratio of heterozygous to alternate homozygous variants. It is expected to be [1.5, 2.5] for WGS analysis. Heterozygous and alternate homozygous sites are counted by variant type.
depth of coverage¶
Average depth of all variant sites called in the sample.
Depth of coverage (GATK) is calculated based on DP values as assigned by GATK. Depth of coverage (raw) is calculated based on raw read counts calculated directly from the bam file.
mendelian errors in trio¶
Variant sites in proband that are not consistent with mendelian inheritance rules based on parent genotypes. Mendelian errors are counted by variant type and classified based on genotype combinations in trio as:
Proband | Father | Mother | Type |
---|---|---|---|
0/1 | 0/0 | 0/0 | de novo |
0/1 | 1/1 | 1/1 | error |
1/1 | 0/0 | (any) | error |
1/1 | (any) | 0/0 | error |
1/1 | 0/1 | ./. | (any) | missing in parent |
1/1 | 0/1 | (any) | ./. | missing in parent |