DRAGEN Imputation App v4.0.3 on BaseSpace

Link to BaseSpace Sequence Hub (BSSH) App

Demo Project

DRAGEN v4.0 adds the ability to infer bi-allelic single nucleotide polymorphism (SNP) variants from low-coverage sequencing samples by packaging the GLIMPSE software (Olivier Delaneau & Simone Rubinacci, 2020). The DRAGEN implementation of GLIMPSE makes variant imputation scalable by adding the following features:

  • End-to-end pipeline: the three phases of GLIMPSE (Chunk, Phase, and Ligate) are executed by a single command, on one chromosome or multiple chromosomes.

  • Software-supported acceleration.

Input parameters

App modes

  • Germline variant calling (hg38) + Imputation: starting from FASTQ files.

  • Imputation only: starting from variant call format (VCF) files.

Input files: VCFs or FASTQs

For VCFs: The input VCFs are expected to contain all the variants in the reference panel. The app supports up to 100 single-sample VCFs or one multi-sample VCF.

The sample(s) to be imputed must have the following format:

  • VCF, multi-sample VCF, BCF, or multi-sample BCF (zipped or unzipped). Genome variant call format (gVCF) is not supported.

  • Must contain GL (genotype likelihoods) or PL (Phred-scaled genotype likelihoods) information.
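The input requirements above can be checked before launching the app. The following is a minimal sketch, not part of the app itself: the function names `vcf_cat`, `has_likelihoods`, and `count_samples` are hypothetical, and the check inspects only the VCF header of text VCFs (plain or gzip/bgzip-compressed). BCF files are binary and would need a dedicated tool to read.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not part of DRAGEN or BaseSpace.

# Stream a VCF whether it is plain text or gzip/bgzip-compressed.
vcf_cat() {
  gzip -cdf -- "$1"
}

# Succeed if the header declares a GL or PL FORMAT field.
# This is a header-level check only; it does not inspect individual records.
has_likelihoods() {
  vcf_cat "$1" | awk '
    /^##FORMAT=<ID=(GL|PL),/ { found = 1 }
    /^#CHROM/                { exit }        # end of header
    END                      { exit !found }'
}

# Print the number of sample columns (columns after the 9 fixed VCF columns).
count_samples() {
  vcf_cat "$1" | awk -F'\t' '
    /^#CHROM/ { print (NF > 9 ? NF - 9 : 0); found = 1; exit }
    END       { if (!found) exit 1 }'
}
```

For example, `has_likelihoods sample.vcf.gz && count_samples sample.vcf.gz` (with `sample.vcf.gz` standing in for a real input) prints the sample count only when a GL or PL field is declared, which can then be compared against the 100-sample limit.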

Reference panel:

A per-chromosome reference panel in VCF or BCF format that lists all the imputation positions in the targeted regions along with the corresponding haplotypes must be provided.

  • IRPv1 is an autosomal SNP reference panel containing the 2504 samples from the 1000 Genomes Project, variant called from the ~50x NYGC data using DRAGEN 3.7.6 against hg38. Singleton SNPs and SNPs observed to be out of Hardy-Weinberg equilibrium (HWE) have been excluded. For SNPs where more than one alternative allele was observed, only the most frequently observed alternative allele was retained. SNPs were phased using SHAPEIT4.

    • Note: chrX, chrY, and chrM are not supported in the IRPv1 reference panel. If a different reference panel is provided, the pseudoautosomal (PAR) and non-PAR regions must be treated as different chromosomes.

  • This app also supports using a custom hg38 reference panel. Select 'Custom' from the menu, and then use the Custom Reference Panel control to select the custom reference panel.

Custom reference panel: Select an hg38 reference panel tar or tar.gz to be used for analysis (only available when "Custom" is selected in the "Reference Panel" drop-down list). To upload a new reference panel, navigate to a Project and select the "Import" feature or use the BaseSpace Command-line Interface (CLI) to upload.

The reference panel can be downloaded here.

Optionally, select a custom hg38 BED file to use for imputation. Contigs present in this BED file will be imputed. If no BED file is selected, all positions in the reference panel are imputed. To upload a new BED file, navigate to a Project and select the "Import" feature or use the BaseSpace CLI to upload.
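Because the BED file determines which contigs are imputed, it can help to list the contigs it names before uploading it. The sketch below is illustrative only: `bed_contigs` is a hypothetical helper, and it assumes a standard whitespace-delimited BED file with at least three columns, skipping comment, `track`, and `browser` lines.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct contig names found in a BED file.
bed_contigs() {
  awk '!/^(#|track|browser)/ && NF >= 3 { print $1 }' "$1" | LC_ALL=C sort -u
}
```

Running `bed_contigs regions.bed` (where `regions.bed` is a placeholder name) prints one contig per line, which can be compared against the chromosomes available in the selected reference panel.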

Additional settings

Output files (dragen_imputation folder)

The VCF imputation tool generates the following outputs:

  • Imputed variant file with concatenated imputed variants: one VCF or msVCF file that contains all the specified regions/chromosomes, with the name .impute.vcf.gz.

Intermediate files:

  • Chunk regions to be passed along to the internal phase step with the name .impute.chunk.out.txt.

  • Imputed variants for each chunk identified: a VCF or msVCF file, matching the input sample format, with the name _chr_start-end.impute.phase.vcf.gz.

  • A text file with the paths to all the generated _chr_start-end.impute.phase.vcf.gz files, with the name .impute.phase.out.txt.
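After downloading the dragen_imputation folder, the per-chunk outputs can be verified against the list file. This is a minimal sketch under one assumption that the article does not state: that the .impute.phase.out.txt file holds one file path per line. The function name `check_phase_outputs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: verify that every path listed in a phase list file
# exists and is non-empty. Assumes one path per line (not confirmed by the
# article). Returns non-zero if any listed file is missing or empty.
check_phase_outputs() {
  list_file=$1
  missing=0
  while IFS= read -r path || [ -n "$path" ]; do
    [ -z "$path" ] && continue                 # skip blank lines
    if [ ! -s "$path" ]; then
      echo "missing or empty: $path" >&2
      missing=$((missing + 1))
    fi
  done < "$list_file"
  [ "$missing" -eq 0 ]
}
```

The paths in the list are those recorded at analysis time, so they may need to be adjusted if the files have been moved to a different directory.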

Known limitations:

  • Supports the hg38 reference only.

  • Supports up to 100 samples.

  • Input VCF files are expected to be force genotyped for all variants present in the reference panel.

  • Can impute diploid regions only.
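The force-genotyping expectation can be checked by comparing the sites in a reference panel chromosome file against the sites in an input VCF. The following sketch is an illustration rather than the app's own validation: `site_keys` and `missing_sites` are hypothetical names, the comparison key (CHROM, POS, REF, ALT) is an assumption about what counts as the same variant, and only text VCFs (plain or gzip/bgzip-compressed) are handled.

```shell
#!/bin/sh
# Hypothetical helpers; not part of DRAGEN.

# Print one sorted, unique CHROM:POS:REF:ALT key per VCF record.
site_keys() {
  gzip -cdf -- "$1" \
    | awk -F'\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' \
    | LC_ALL=C sort -u
}

# Print the reference-panel sites that are absent from the input VCF.
# Usage: missing_sites PANEL_VCF INPUT_VCF
missing_sites() {
  tmp_panel=$(mktemp) || return 1
  tmp_input=$(mktemp) || { rm -f "$tmp_panel"; return 1; }
  site_keys "$1" > "$tmp_panel"
  site_keys "$2" > "$tmp_input"
  LC_ALL=C comm -23 "$tmp_panel" "$tmp_input"   # lines only in the panel
  rm -f "$tmp_panel" "$tmp_input"
}
```

An empty result suggests that the input VCF covers every panel site under the assumed key. Any printed keys are panel variants that the input does not contain.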

Default command in BSSH app

/opt/edico/bin/dragen \
  --enable-imputation true \
  --imputation-chunk-input-region-list /NzCbis/dragen_imputation/temp_dir/reference_chr_names.txt \
  --imputation-ref-panel-dir /NzCbis/reference_panel/IRPv1 \
  --imputation-ref-panel-prefix IRPv1 \
  --imputation-phase-input-list /NzCbis/dragen_imputation/temp_dir/input_vcfs.sorted.txt \
  --imputation-genome-map-dir /app/genetic_maps/b38 \
  --imputation-phase-filter-input-sample-in-ref false \
  --output-directory /NzCbis/dragen_imputation \
  --output-file-prefix input_vcfs

Reference: DRAGEN User guide v4.0

For any feedback or questions regarding this article (Illumina Knowledge Article #7061), contact Illumina Technical Support at techsupport@illumina.com.

© 2023 Illumina, Inc. All rights reserved. All trademarks are the property of Illumina, Inc. or their respective owners. Trademark information: illumina.com/company/legal.html. Privacy policy: illumina.com/company/legal/privacy.html