How to run the DRAGEN MSI Pipeline

Microsatellites are genomic regions of short DNA motifs that are repeated 5-50 times and are associated with high mutation rates. Microsatellite Instability (MSI) results from deficiencies in the DNA mismatch repair pathway and can be used as a critical biomarker to predict immunotherapy responses in multiple tumor types.

DRAGEN MSI runs in one of three modes, selected with the --msi-command option:

  1. Collect Evidence Mode (collect-evidence)

  2. Tumor Normal Mode (tumor-normal)

  3. Tumor Only Mode (tumor-only)

A microsatellite sites file that lists the microsatellite sites of interest in a given reference genome is required to run this pipeline. The recommended tool for generating this file is msisensor-pro; information about this tool can be found in its documentation.

The msisensor-pro scan command can be used to generate the microsatellite sites file:

msisensor-pro scan -d hg38.fa -o hg38_microsatellites_pro.tsv -p 1

options:

-d [string] reference genome sequences file, *.fasta format

-o [string] output homopolymers and microsatellites file

-l [int] minimal homopolymer size, default=5

-c [int] context length, default=5

-m [int] maximal homopolymer size, default=50

-s [int] maximal length of microsate, default=5

-r [int] minimal repeat times of microsate, default=3

-p [int] output homopolymer only, 0: no; 1: yes, default=0

-h help

The '-p 1' flag must be added to the command so that only homopolymers are output. The MSI pipeline has been benchmarked only for homopolymers and cannot work with repeat regions longer than 100 bp. Adding this flag should result in a compatible MSI sites file.
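As a quick sanity check before running DRAGEN, the sites file can be scanned for entries that are not homopolymers or that span more than 100 bp. The sketch below is not part of DRAGEN or msisensor-pro. It assumes a tab-separated file with a header row that names the columns repeat_unit_length and repeat_times; those column names and the function name count_incompatible_sites are assumptions, so check them against the header of your own file.

```shell
# Sketch: count sites that are not homopolymers or that are longer than 100 bp.
# Assumption: tab-separated input whose header row contains the column names
# "repeat_unit_length" and "repeat_times".
count_incompatible_sites() {
  awk -F'\t' '
    NR == 1 {
      # Map header names to column positions.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("repeat_unit_length" in col) || !("repeat_times" in col)) {
        print "expected header columns not found" > "/dev/stderr"
        err = 1
        exit 2
      }
      next
    }
    {
      unit  = $(col["repeat_unit_length"])
      times = $(col["repeat_times"])
      # A homopolymer has a repeat unit of length 1; region size is unit * times.
      if (unit != 1 || unit * times > 100) bad++
    }
    END {
      if (err) exit 2
      print bad + 0
    }
  ' "$1"
}
```

For example, count_incompatible_sites hg38_microsatellites_pro.tsv prints the number of flagged sites; a file generated with '-p 1' would be expected to report 0 under these assumptions.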

For WGS and WES samples, the tumor-normal mode is recommended. Here is an example command for a tumor-normal analysis.

dragen \
  --msi-command tumor-normal \
  --msi-coverage-threshold 60 \
  --msi-microsatellites-file msi_file \
  --output-directory={output_directory} \
  --output-file-prefix={prefix} \
  --enable-map-align=true \
  --RGID=read_group_ID \
  --RGSM=read_group_sample \
  --ref-dir={reference_directory} \
  --enable-map-align-output=true \
  --enable-sort=true \
  --enable-duplicate-marking=true \
  --tumor-fastq1 {tumor_fq1} \
  --tumor-fastq2 {tumor_fq2} \
  --fastq-file1 {fq1} \
  --fastq-file2 {fq2}

The tumor-only mode requires a panel of normals, which can be generated by running the MSI pipeline in collect-evidence mode. The panel must contain at least 20 normal samples; this is a hard-coded requirement of the tumor-only mode. Here is an example command for collect-evidence mode.

dragen -f \
  --ref-dir={reference_directory} \
  --fastq-file1 {fq1} \
  --fastq-file2 {fq2} \
  --output-directory={output_directory} \
  --enable-map-align=true \
  --RGID=read_group_ID \
  --RGSM=read_group_sample \
  --output-file-prefix={prefix} \
  --enable-map-align-output=true \
  --enable-sort=true \
  --enable-duplicate-marking=true \
  --msi-command collect-evidence \
  --msi-coverage-threshold 60 \
  --msi-microsatellites-file msi_file
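Because the panel needs at least 20 normals, the collect-evidence command above will typically be repeated for each normal sample. The following is a minimal dry-run sketch that only prints one command per sample rather than running DRAGEN. The sample list (a tab-separated file with sample name, FASTQ read 1, and FASTQ read 2), the function name print_collect_evidence_cmds, the per-sample output subdirectory, and the reuse of the sample name for RGID, RGSM, and the output prefix are all assumptions made for illustration. Only flags shown in the example above are used.

```shell
# Sketch: print (do not run) one collect-evidence command per normal sample.
# Assumption: the sample list is a hypothetical tab-separated file with the
# columns sample name, FASTQ read 1, FASTQ read 2, and no header row.
# Paths containing spaces are not quoted in the printed commands.
print_collect_evidence_cmds() {
  samples_tsv=$1   # hypothetical sample list
  ref_dir=$2       # reference directory
  msi_file=$3      # microsatellite sites file
  out_root=$4      # a subdirectory per sample is assumed under this path
  while IFS="$(printf '\t')" read -r sample fq1 fq2; do
    [ -n "$sample" ] || continue
    printf 'dragen -f --ref-dir=%s --fastq-file1 %s --fastq-file2 %s' \
      "$ref_dir" "$fq1" "$fq2"
    printf ' --output-directory=%s/%s --output-file-prefix=%s' \
      "$out_root" "$sample" "$sample"
    # Assumption: the sample name doubles as read group ID and sample.
    printf ' --enable-map-align=true --RGID=%s --RGSM=%s' "$sample" "$sample"
    printf ' --enable-map-align-output=true --enable-sort=true'
    printf ' --enable-duplicate-marking=true --msi-command collect-evidence'
    printf ' --msi-coverage-threshold 60 --msi-microsatellites-file %s\n' \
      "$msi_file"
  done < "$samples_tsv"
}
```

Printing the commands first makes it easy to review them before execution. Giving each sample its own output prefix also keeps the resulting files from overwriting one another.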

Once this is done, the MSI .dist files can be moved to a separate folder (normal_reference_directory) and used by the MSI pipeline in tumor-only mode through the --msi-ref-normal-dir option.
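Gathering the files and checking the panel size can be scripted. The sketch below copies every file ending in .dist from a directory tree into one panel directory and reports whether at least 20 were collected. The function name build_normal_panel and the assumption that all relevant files match *.dist are illustrative; the panel directory should sit outside the source tree, and files that share a name would overwrite each other.

```shell
# Sketch: copy MSI .dist files into one directory and check the panel size.
# Assumption: the collect-evidence outputs of interest all end in ".dist"
# and have distinct file names (for example, one output prefix per sample).
build_normal_panel() {
  src_root=$1    # directory tree holding the collect-evidence outputs
  panel_dir=$2   # directory later passed to --msi-ref-normal-dir
  mkdir -p "$panel_dir" || return 1
  find "$src_root" -type f -name '*.dist' -exec cp {} "$panel_dir"/ \;
  count=0
  for f in "$panel_dir"/*.dist; do
    if [ -f "$f" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
  # Succeeds only if the 20-sample minimum is met.
  [ "$count" -ge 20 ]
}
```

A call such as build_normal_panel collect_evidence_outputs normal_reference_directory prints the number of files collected and returns a non-zero status when fewer than 20 are present. The resulting directory is what the tumor-only command passes to --msi-ref-normal-dir.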

dragen \
  --msi-command tumor-only \
  --msi-coverage-threshold 60 \
  --msi-microsatellites-file msi_file \
  --msi-ref-normal-dir normal_reference_directory \
  --output-directory={output_directory} \
  --output-file-prefix={prefix} \
  --enable-map-align=true \
  --RGID=read_group_ID \
  --RGSM=read_group_sample \
  --ref-dir={reference_directory} \
  --enable-map-align-output=true \
  --enable-sort=true \
  --enable-duplicate-marking=true \
  --tumor-fastq1 {tumor_fq1} \
  --tumor-fastq2 {tumor_fq2}

More information about the MSI pipeline can be found in the user guide.

For any feedback or questions regarding this article (Illumina Knowledge Article #7508), contact Illumina Technical Support techsupport@illumina.com.

© 2023 Illumina, Inc. All rights reserved. All trademarks are the property of Illumina, Inc. or their respective owners. Trademark information: illumina.com/company/legal.html. Privacy policy: illumina.com/company/legal/privacy.html