How to run the DRAGEN PopGen app on BaseSpace?
The DRAGEN PopGen app on BaseSpace runs the gVCF genotyper. It takes a set of single-sample or multisample gVCFs as input and returns a multisample VCF with an entry for each variant seen in any of the inputs. The pipeline does not adjust genotypes based on population information.
Workflow
1. Generate single-sample or multisample VCFs/gVCFs. An app such as DRAGEN Germline can be used for this.
2. Launch the DRAGEN PopGen app using the output from step 1. The reference genome used in both steps should match.
3. DRAGEN PopGen outputs a multisample VCF with one sample column per input sample.
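Because the reference used to generate the gVCFs should match across the workflow, it can help to sanity-check the inputs before launching the app. The sketch below is a hypothetical local check, not part of the app: it assumes each gVCF header carries standard VCF `##contig` lines and treats identical contig lines as a proxy for "same reference". Matching contigs do not prove the FASTA is identical, but a mismatch is a strong sign that different references were used.

```python
import gzip


def read_contigs(path):
    """Return the ##contig header lines of a VCF/gVCF (gzipped or plain), in file order."""
    opener = gzip.open if str(path).endswith(".gz") else open
    contigs = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##contig="):
                contigs.append(line.rstrip("\n"))
            elif line.startswith("#CHROM"):
                break  # end of header; skip the variant records
    return contigs


def same_reference(paths):
    """True if every file declares an identical list of ##contig lines (a heuristic check)."""
    if not paths:
        return True
    contig_lists = [read_contigs(p) for p in paths]
    return all(c == contig_lists[0] for c in contig_lists[1:])
```

For example, `same_reference(["s1.hard-filtered.gvcf.gz", "s2.hard-filtered.gvcf.gz"])` (hypothetical local filenames) returns `False` if one sample was aligned to a build with differently named or sized chromosomes.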
Inputs
Input to the app can be provided in one of these three forms.
Small variant gVCF files: directly select input gVCF files; up to 99 files can be selected.
Datasets containing gVCFs: for example, datasets output by applications or datasets created during uploads; up to 100 datasets can be selected.
Project input: select projects that have gVCFs; up to 100 projects can be selected.
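The selection limits above can be checked before a run is configured. This is a minimal, hypothetical helper: the limits are the ones stated in this article, while the `kind` names and the function itself are illustrative and not part of BaseSpace.

```python
# Per-input-type selection limits as stated in this article.
SELECTION_LIMITS = {"files": 99, "datasets": 100, "projects": 100}


def check_selection(kind, selected):
    """Return the number of selected items, or raise ValueError if it exceeds the limit for `kind`."""
    limit = SELECTION_LIMITS[kind]
    if len(selected) > limit:
        raise ValueError(f"{len(selected)} {kind} selected; the app accepts at most {limit}")
    return len(selected)
```

For instance, selecting 150 gVCF files directly would fail this check, which suggests grouping them into datasets or projects instead.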
For the Datasets and Project inputs, the app by default searches for files that end with '.hard-filtered.gvcf.gz'. This behavior can be changed with the 'Filename Selection Suffix' parameter in the Advanced Settings section.
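A rough model of the suffix-based selection is a plain "ends with" filter over the filenames in each dataset or project. The sketch below is an approximation for illustration, not the app's actual implementation; in particular, case-sensitive matching is an assumption.

```python
DEFAULT_SUFFIX = ".hard-filtered.gvcf.gz"  # default value described in this article


def select_gvcfs(filenames, suffix=DEFAULT_SUFFIX):
    """Keep only filenames ending with `suffix` (assumed case-sensitive), preserving order."""
    return [name for name in filenames if name.endswith(suffix)]
```

With the default suffix, index files such as `*.gvcf.gz.tbi` and non-gVCF outputs are skipped. A broader suffix such as `.gvcf.gz` would match more files, which could pick up more than one gVCF per sample if a dataset holds several.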
Output
The app outputs a multisample VCF that has an entry for each variant in any of the input gVCFs.
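Since the multisample output has one sample column per input sample, a quick way to confirm that every expected sample made it into the cohort is to read the sample names from the `#CHROM` header line. In the VCF format, the first nine columns are fixed (CHROM through FORMAT) and any remaining columns are samples. The output filename is not specified here, so the path passed to this sketch is whatever file you download from the app.

```python
import gzip

FIXED_COLUMNS = 9  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT


def sample_names(vcf_path):
    """Return the sample column names from a VCF's #CHROM header line (gzipped or plain)."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                return line.rstrip("\n").split("\t")[FIXED_COLUMNS:]
    raise ValueError(f"no #CHROM header line found in {vcf_path}")
```

Comparing `len(sample_names(path))` with the number of input samples gives a simple completeness check.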
Limitations
The app currently supports up to around 800 full-size DRAGEN WGS gVCFs. This limit comes from disk space: the app runs on an instance with 5 TB of disk, which accommodates around 800 gVCFs of ~4 GB each. If your gVCFs are larger, lower the number of input gVCFs accordingly to run them with the BaseSpace app.
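To estimate how far to lower the input count, one rough approach is to scale the documented figure (about 800 gVCFs at ~4 GB each) in proportion to your mean gVCF size. This is a back-of-the-envelope sketch, not an Illumina formula. It assumes disk use grows linearly with input size, and it caps the result at 800 because the article does not state that more inputs are supported when files are smaller.

```python
# Figures stated in this article: ~800 gVCFs of ~4 GB each fit the app's 5 TB instance.
DOCUMENTED_COUNT = 800
DOCUMENTED_SIZE_GB = 4.0


def estimated_max_gvcfs(mean_gvcf_size_gb):
    """Linearly scale the documented ~800 x ~4 GB capacity to another mean gVCF size (capped at 800)."""
    if mean_gvcf_size_gb <= 0:
        raise ValueError("mean gVCF size must be positive")
    budget_gb = DOCUMENTED_COUNT * DOCUMENTED_SIZE_GB  # ~3,200 GB of input gVCFs
    return min(DOCUMENTED_COUNT, int(budget_gb // mean_gvcf_size_gb))
```

Under these assumptions, gVCFs averaging 5 GB give an estimate of 640 inputs and 8 GB gives 400. Leaving some margin below the estimate is prudent, since intermediate and output files also use disk space.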
For feedback or questions about this article (Illumina Knowledge Article #7517), contact Illumina Technical Support at techsupport@illumina.com.