Reference genome and target regions BED file for the DRAGEN PopGen app
Last updated
© 2023 Illumina, Inc. All rights reserved. All trademarks are the property of Illumina, Inc. or their respective owners. Trademark information: illumina.com/company/legal.html. Privacy policy: illumina.com/company/legal/privacy.html
The DRAGEN PopGen app in BaseSpace runs the gVCF genotyper. It takes a set of single-sample or multisample gVCFs as input and returns a multisample VCF with an entry for each variant seen in any of the input VCFs or gVCFs. The pipeline does not adjust genotypes based on population information.
Reference Genome
The app accepts hg38, hg19, Ensembl GRCh37, Ensembl hs37d5, or a custom reference. All input gVCFs must have been generated with the same reference build that is selected in the app. The built-in options are the alt-aware, graph-based versions of these genomes. A different version of the same base genome can be used as long as the total number of contigs in the selected reference matches the number in the reference used to generate the gVCFs (the contigs are listed in the gVCF header). For example, using gVCFs generated with the hg38 alt-aware (non-graph-based) reference while hg38 is selected in the PopGen app results in the following error:
Fatal exception: Exception thrown in [location] line [number] -- Different numbers of contigs in FASTA reference genome ([number] contigs) and VCF [file name].vcf.gz([number] contigs)
However, gVCFs generated with a graph-based, alt-masked version of a genome can be used with the app's default reference for that genome, because both have the same number of contigs.
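Because the check that fails is a contig count comparison, it can help to compare the counts yourself before launching an analysis. The following is a minimal sketch, assuming a bash shell with grep, gzip, and awk, an uncompressed FASTA reference, and a gzip- or bgzip-compressed gVCF. The function names are hypothetical helpers, not part of DRAGEN or BaseSpace, and the sketch compares counts only; it does not reproduce how the app itself performs the check.

```shell
#!/usr/bin/env bash
# Minimal sketch: compare contig counts between a FASTA reference and a gVCF.
# Assumes an uncompressed FASTA and a gzip/bgzip-compressed gVCF. The function
# names are hypothetical helpers, not part of DRAGEN or BaseSpace.

# Count sequences in a FASTA file (each contig starts with a ">" header line).
count_fasta_contigs() {
  grep -c '^>' "$1"
}

# Count ##contig lines in the gVCF header. awk stops at the #CHROM line,
# so the variant records are not scanned.
count_gvcf_contigs() {
  gzip -dc "$1" | awk '/^##contig=/ { n++ } /^#CHROM/ { exit } END { print n + 0 }'
}

# Print both counts and return non-zero if they differ.
check_contig_counts() {
  local fasta_n gvcf_n
  fasta_n=$(count_fasta_contigs "$1")
  gvcf_n=$(count_gvcf_contigs "$2")
  if [ "$fasta_n" -eq "$gvcf_n" ]; then
    echo "OK: $fasta_n contigs in both files"
  else
    echo "Mismatch: FASTA has $fasta_n contigs, gVCF header has $gvcf_n" >&2
    return 1
  fi
}
```

For example, `check_contig_counts reference.fa sample.gvcf.gz` (hypothetical file names) prints the shared count or reports the mismatch. If a .fai index exists for the FASTA, `wc -l < reference.fa.fai` gives the same sequence count without reading the sequences.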
Another option is to use a custom reference. Note that the app requires a FASTA file as input, not a hash table.
Target Regions BED file
The app allows the selection of a BED file (.bed or .bed.gz) that defines the target regions to which joint analysis is restricted. Contig names in the BED file must match those of the chosen reference, and the file must have exactly three columns: contig, start coordinate, and end coordinate (in that order). BED files for standard kits (used for variant calling or for calculating coverage over custom regions) may not be valid input, because the app expects the entries in the BED file to be discrete (non-overlapping). Using a BED file with overlapping or incorrectly ordered entries results in the following error:
~[error] Incorrectly ordered or overlapping regions at: chr[number]:[start coordinate]-[stop coordinate] (last region was chr[number]:[start coordinate]-[stop coordinate])
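The requirements above can be checked locally before upload. This is a minimal sketch, assuming tab-separated input; `check_bed` is a hypothetical helper. It treats a region as overlapping when its start is less than the end of an earlier region on the same contig (BED intervals are half-open), and it requires each contig to appear in one contiguous block. The app's own validation may be stricter, for example about contig order relative to the reference or about regions that merely touch.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that a BED file has exactly three tab-separated columns
# and discrete, ordered regions. check_bed is a hypothetical helper; the app's
# own validation may differ.
check_bed() {
  # Read .bed.gz transparently; anything else is read as plain text.
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *) cat "$1" ;;
  esac | awk -F'\t' '
    NF != 3 { printf "line %d: expected 3 columns, found %d\n", NR, NF; bad = 1; next }
    $2 + 0 >= $3 + 0 { printf "line %d: start is not less than end\n", NR; bad = 1; next }
    {
      if ($1 != prev_chr) {
        # Each contig must appear in a single contiguous block of lines.
        if ($1 in seen) { printf "line %d: contig %s appears again after other contigs\n", NR, $1; bad = 1 }
        seen[$1] = 1; prev_chr = $1; prev_end = 0
      } else if ($2 + 0 < prev_end) {
        # Regions that only touch (start == previous end) are not flagged here.
        printf "line %d: %s:%s-%s overlaps or precedes the previous region\n", NR, $1, $2, $3; bad = 1
      }
      if ($3 + 0 > prev_end) prev_end = $3 + 0
    }
    END { exit bad }'
}
```

Each problem is reported with its line number, and the function returns non-zero if any problem was found, so it can gate an upload script.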
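If needed, convert a standard BED file to one with discrete intervals using the bedtools merge command. bedtools merge requires its input to be sorted by contig and then by start position, for example with sort -k1,1 -k2,2n.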
sort -k1,1 -k2,2n [BED file] > [Sorted BED]
bedtools merge -i [Sorted BED] > [Merged BED]
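Where bedtools is not installed, the same sort-then-merge step can be approximated with standard sort and awk. This is a minimal sketch under that assumption, not a replacement for bedtools; `merge_bed` is a hypothetical name. Like bedtools merge with its default settings, it joins both overlapping and book-ended intervals and writes only the first three columns.

```shell
#!/usr/bin/env bash
# Minimal sketch of sort + merge without bedtools. merge_bed is a hypothetical
# helper; like bedtools merge with default settings, it joins overlapping and
# book-ended intervals and writes three columns.
merge_bed() {
  local tab
  tab=$(printf '\t')
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *) cat "$1" ;;
  esac |
    cut -f1-3 |
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2n |
    awk -F'\t' -v OFS='\t' '
      # Start a new run when the contig changes.
      $1 != chr {
        if (chr != "") print chr, start, end
        chr = $1; start = $2 + 0; end = $3 + 0; next
      }
      # Overlapping or touching: extend the current run.
      $2 + 0 <= end { if ($3 + 0 > end) end = $3 + 0; next }
      # Gap: emit the current run and start a new one.
      { print chr, start, end; start = $2 + 0; end = $3 + 0 }
      END { if (chr != "") print chr, start, end }'
}
```

Usage: `merge_bed kit_targets.bed > merged.bed` (hypothetical file names). Because this sort orders contigs lexically (chr1, chr10, chr2, ...), the output may still need reordering if the app expects contigs in the reference's order.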
For any feedback or questions regarding this article (Illumina Knowledge Article #7889), contact Illumina Technical Support.