How to run the DRAGEN MSI Pipeline
Microsatellites are genomic regions of short DNA motifs that are repeated 5-50 times and are associated with high mutation rates. Microsatellite Instability (MSI) results from deficiencies in the DNA mismatch repair pathway and can be used as a critical biomarker to predict immunotherapy responses in multiple tumor types.
DRAGEN MSI runs in one of three modes, selected with the --msi-command option:
Collect Evidence Mode (collect-evidence)
Tumor Normal Mode (tumor-normal)
Tumor Only Mode (tumor-only)
A microsatellite sites file that lists the microsatellite sites of interest in a given reference genome is required to run this pipeline. The recommended tool for generating this file is msisensor-pro; see the tool's documentation for more information.
The scan command of msisensor-pro can be used to generate the microsatellite sites file:
msisensor-pro scan -d hg38.fa -o hg38_microsatellites_pro.tsv -p 1
options:
-d [string] reference genome sequences file, *.fasta format
-o [string] output homopolymers and microsatellites file
-l [int] minimal homopolymer size, default=5
-c [int] context length, default=5
-m [int] maximal homopolymer size, default=50
-s [int] maximal length of microsate, default=5
-r [int] minimal repeat times of microsate, default=3
-p [int] output homopolymer only, 0: no; 1: yes, default=0
-h help
The '-p 1' flag must be added to the command so that only homopolymers are output. The MSI pipeline has only been benchmarked for homopolymers and cannot work with repeat regions longer than 100 bp. Adding this flag should result in a compatible MSI sites file.
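As a quick sanity check before running DRAGEN, the sites file can be scanned to confirm that it contains only homopolymers. The sketch below is a minimal example, not part of DRAGEN or msisensor-pro. It assumes the scan output is tab-separated with a header row that includes a column named repeat_unit_length (1 for a homopolymer); the function name check_homopolymer_only is hypothetical, and the column name should be verified against your own file.

```shell
# Sketch: confirm a microsatellite sites file contains only homopolymers.
# Assumption: tab-separated file whose header row includes a column
# named "repeat_unit_length" (value 1 = homopolymer).
check_homopolymer_only() {
  local sites_file="$1"
  awk -F'\t' '
    NR == 1 {
      # Locate the repeat_unit_length column by name in the header row.
      for (i = 1; i <= NF; i++) if ($i == "repeat_unit_length") col = i
      next
    }
    col && $col != 1 { bad++ }
    END {
      if (!col) { print "repeat_unit_length column not found" > "/dev/stderr"; exit 2 }
      if (bad)  { printf "%d non-homopolymer sites found\n", bad > "/dev/stderr"; exit 1 }
    }
  ' "$sites_file"
}
```

Calling check_homopolymer_only hg38_microsatellites_pro.tsv returns exit status 0 when every site has a repeat unit length of 1, 1 when other sites are present, and 2 when the expected column is missing.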
For WGS and WES samples, the tumor-normal mode is recommended. Here is an example command for a tumor-normal analysis.
dragen \
--msi-command tumor-normal \
--msi-coverage-threshold 60 \
--msi-microsatellites-file msi_file \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--enable-map-align=true \
--RGID=read_group_ID \
--RGSM=read_group_sample \
--ref-dir={reference_directory} \
--enable-map-align-output=true \
--enable-sort=true \
--enable-duplicate-marking=true \
--tumor-fastq1 {tumor_fq1} \
--tumor-fastq2 {tumor_fq2} \
--fastq-file1 {fq1} \
--fastq-file2 {fq2}
The tumor-only mode requires a panel of normals, which can be generated by running the MSI pipeline in collect-evidence mode on normal samples. The panel must contain at least 20 normal samples (a hard-coded requirement for running the tumor-only mode). Here is an example collect-evidence command for one normal sample.
dragen -f \
--ref-dir={reference_directory} \
--fastq-file1 {fq1} \
--fastq-file2 {fq2} \
--output-directory={output_directory} \
--enable-map-align=true \
--RGID=read_group_ID \
--RGSM=read_group_sample \
--output-file-prefix={prefix} \
--enable-map-align-output=true \
--enable-sort=true \
--enable-duplicate-marking=true \
--msi-command collect-evidence \
--msi-coverage-threshold 60 \
--msi-microsatellites-file msi_file
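Because the panel needs at least 20 normals, the collect-evidence command has to be repeated for each normal sample. The sketch below shows one way to generate those commands. It assumes a hypothetical tab-separated manifest with three columns (sample name, FASTQ R1 path, FASTQ R2 path), reuses the sample name as RGID, RGSM, and output prefix, and writes each sample to its own subdirectory of the output root; all of these are illustrative choices, not DRAGEN requirements. The function only prints the commands so they can be reviewed before anything is run.

```shell
# Sketch: print one collect-evidence command per normal sample.
# Assumptions (hypothetical): manifest is tab-separated with columns
# sample, fastq R1, fastq R2; paths contain no whitespace.
print_collect_evidence_cmds() {
  local manifest="$1" ref_dir="$2" msi_file="$3" out_root="$4"
  local sample fq1 fq2 tab
  tab=$(printf '\t')
  while IFS="$tab" read -r sample fq1 fq2; do
    [ -n "$sample" ] || continue   # skip blank lines
    printf 'dragen -f'
    printf ' --ref-dir=%s' "$ref_dir"
    printf ' --fastq-file1 %s --fastq-file2 %s' "$fq1" "$fq2"
    printf ' --output-directory=%s/%s' "$out_root" "$sample"
    printf ' --output-file-prefix=%s' "$sample"
    printf ' --enable-map-align=true --RGID=%s --RGSM=%s' "$sample" "$sample"
    printf ' --enable-map-align-output=true --enable-sort=true'
    printf ' --enable-duplicate-marking=true'
    printf ' --msi-command collect-evidence --msi-coverage-threshold 60'
    printf ' --msi-microsatellites-file %s\n' "$msi_file"
  done < "$manifest"
}
```

For example, print_collect_evidence_cmds normals.tsv {reference_directory} msi_file {output_directory} > run_normals.sh writes one command per line to a script that can be inspected and then executed.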
Once this is done, move the MSI .dist files to a separate folder (normal_reference_directory). This folder is then passed to the MSI pipeline in tumor-only mode with the --msi-ref-normal-dir option.
dragen \
--msi-command tumor-only \
--msi-coverage-threshold 60 \
--msi-microsatellites-file msi_file \
--msi-ref-normal-dir normal_reference_directory \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--enable-map-align=true \
--RGID=read_group_ID \
--RGSM=read_group_sample \
--ref-dir={reference_directory} \
--enable-map-align-output=true \
--enable-sort=true \
--enable-duplicate-marking=true \
--tumor-fastq1 {tumor_fq1} \
--tumor-fastq2 {tumor_fq2}
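Before a tumor-only run, the .dist files from the collect-evidence runs have to be gathered into the normal reference folder, and that folder needs at least 20 of them. The sketch below illustrates that step under stated assumptions: it treats any file ending in .dist under the collect-evidence output root as an MSI evidence file, and it copies rather than moves the files so the original outputs stay intact. The function name build_normal_reference_dir is hypothetical, and the destination folder should sit outside the output root.

```shell
# Sketch: gather MSI .dist files into one folder and check the panel size.
# Assumption: every file ending in ".dist" under out_root is an MSI
# evidence file from a collect-evidence run, with a unique file name.
build_normal_reference_dir() {
  local out_root="$1" normal_ref_dir="$2" count
  mkdir -p "$normal_ref_dir"
  # Copy (not move) so the original run outputs are left untouched.
  find "$out_root" -type f -name '*.dist' -exec cp {} "$normal_ref_dir"/ \;
  count=$(find "$normal_ref_dir" -maxdepth 1 -type f -name '*.dist' | wc -l | tr -d ' ')
  echo "$count"
  # Succeed only if the panel meets the 20-sample minimum.
  [ "$count" -ge 20 ]
}
```

The function prints the number of .dist files found in the normal reference folder and returns a non-zero exit status when there are fewer than 20, so it can gate the tumor-only command in a script.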
More information about the MSI pipeline can be found in the user guide.
For any feedback or questions regarding this article (Illumina Knowledge Article #7508), contact Illumina Technical Support at techsupport@illumina.com.