How to use a custom cross-contamination VCF file with DRAGEN
The DRAGEN cross-sample contamination module uses a probabilistic mixture model to estimate the fraction of reads in a sample that may be from another human source. DRAGEN supports separate modes for germline and somatic samples. Currently, contamination estimation is supported for WGS, WES and large panels (10 Mb or more).
The cross-contamination metric is enabled by including one of the following flags along with a compatible VCF: --qc-cross-cont-vcf (Germline) or --qc-somatic-contam-vcf (Somatic). For germline mode, the VCF should contain marker sites (rsIDs) with population allele frequencies close to 0.5.
Pre-built contamination VCF files for different human references can be found at /opt/edico/config:
[xxxx@localhost config]$ cd /opt/edico/config
[xxxx@localhost config]$ ls -l *contamination*
-rw-r--r-- 1 root root 472015 May 6 2022 sample_cross_contamination_resource_GRCh37.vcf.gz
-rw-r--r-- 1 root root 477335 May 6 2022 sample_cross_contamination_resource_hg19.vcf.gz
-rw-r--r-- 1 root root 481607 May 6 2022 sample_cross_contamination_resource_hg38.vcf.gz
-rw-r--r-- 1 root root 1134475 May 6 2022 somatic_sample_cross_contamination_resource_GRCh37.vcf.gz
-rw-r--r-- 1 root root 1149422 May 6 2022 somatic_sample_cross_contamination_resource_hg19.vcf.gz
-rw-r--r-- 1 root root 1046203 May 6 2022 somatic_sample_cross_contamination_resource_hg38.vcf.gz
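As a minimal sketch of how the flags and the pre-built files listed above fit together, the helper below prints the flag and resource path for a given mode and reference build. The function name contam_args is hypothetical and not part of DRAGEN; the flag names, directory, and file names are taken from this article.

```shell
# Hypothetical helper (not part of DRAGEN): print the contamination flag and
# the matching pre-built VCF path for a given mode and reference build.
# Usage: contam_args germline|somatic GRCh37|hg19|hg38
contam_args() {
    mode="$1"
    build="$2"
    dir="/opt/edico/config"
    case "$build" in
        GRCh37|hg19|hg38) ;;
        *) echo "unsupported build: $build" >&2; return 1 ;;
    esac
    case "$mode" in
        germline)
            printf '%s\n' "--qc-cross-cont-vcf ${dir}/sample_cross_contamination_resource_${build}.vcf.gz" ;;
        somatic)
            printf '%s\n' "--qc-somatic-contam-vcf ${dir}/somatic_sample_cross_contamination_resource_${build}.vcf.gz" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Example (not executed here): append the result to an existing DRAGEN command.
# dragen <your usual options> $(contam_args germline hg38)
```

To use a custom VCF instead, pass its path to the same flag in place of the pre-built file.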
If a custom VCF is used, the order of contigs in the VCF (both the header and the variant calls) must mirror the contig order of the reference used for analysis. The reference contig order can be found in the hash_table.cfg file inside the hash table folder. The contamination VCF does not need to include every contig from the reference, but the variant calls must be sorted in the same order as the reference.
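The ordering requirement can be checked before running DRAGEN. The sketch below is one possible approach, not an Illumina tool: it assumes the reference contig names have been copied by hand from hash_table.cfg into a plain text file, one name per line in reference order (ref_contigs.txt is a hypothetical file name), and it reads uncompressed VCF text on standard input. It checks only contig order in the header and records, not position order within a contig.

```shell
# Sketch: verify that contigs in a VCF (header and records) follow the
# reference contig order. Reads uncompressed VCF text on stdin.
# $1 = text file with one reference contig name per line, in reference order
#      (hypothetical helper file, prepared manually from hash_table.cfg).
check_contig_order() {
    awk -F'\t' -v ref="$1" '
        # Compare the rank of this contig with the highest rank seen so far
        # for the same kind of line ("header" or "record").
        function check(name, kind) {
            if (!(name in rank)) {
                printf "ERROR: %s contig %s not in reference list\n", kind, name
                bad = 1; exit
            }
            if (rank[name] < last[kind] + 0) {
                printf "ERROR: %s contig %s out of order at line %d\n", kind, name, NR
                bad = 1; exit
            }
            last[kind] = rank[name]
        }
        # Load the reference order: rank 1 is the first contig.
        BEGIN { while ((getline line < ref) > 0) if (line != "") rank[line] = ++n }
        /^##contig=/ {
            if (match($0, /ID=[^,>]+/))
                check(substr($0, RSTART + 3, RLENGTH - 3), "header")
            next
        }
        /^#/ { next }
        NF   { check($1, "record") }
        END  { if (!bad) print "OK"; exit bad + 0 }
    '
}

# Example (not executed here):
# gzip -cd custom_contamination.vcf.gz | check_contig_order ref_contigs.txt
```

The function prints OK and returns 0 when the order is consistent; otherwise it prints the first offending contig and returns a non-zero status. Contigs missing from the VCF are allowed, but a contig that is absent from the reference list is reported as an error.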
Using an incorrectly sorted VCF can result in a WatchDog timeout error in DRAGEN v3.x (versions before v4.0.3). In DRAGEN v4.0.3, the following error is reported, ending in a segmentation fault:
Invalid line:
Please check the validity of the contamination reference VCF and ensure the contigs are consistent with the reference.
FATAL: Caught signal Segmentation fault (11)
More information about the cross-contamination module can be found in the DRAGEN user guide (under the Mapping and Aligning Metrics section).
For any feedback or questions regarding this article (Illumina Knowledge Article #7495), contact Illumina Technical Support at techsupport@illumina.com.