# How to run the DRAGEN MSI Pipeline

Microsatellites are genomic regions in which short DNA motifs are repeated 5-50 times; they are associated with high mutation rates. Microsatellite instability (MSI) results from deficiencies in the DNA mismatch repair pathway and can serve as a critical biomarker for predicting immunotherapy response in multiple tumor types.

DRAGEN MSI runs in one of three modes, selected with the `--msi-command` option:

1. Collect Evidence Mode (collect-evidence)
2. Tumor Normal Mode (tumor-normal)
3. Tumor Only Mode (tumor-only)

A **microsatellite sites** file that lists the microsatellite sites of interest in a **given reference genome** is required to run this pipeline. The recommended tool for generating this file is **msisensor-pro**; information about this tool can be found [here](https://github.com/xjtu-omics/msisensor-pro/wiki).

The [scan](https://github.com/xjtu-omics/msisensor-pro/wiki/Key-Commands#scan-from-msisensor) command can be used to generate the microsatellites sites file.

```shell
msisensor-pro scan -d hg38.fa -o hg38_microsatellites_pro.tsv -p 1
```

```
options:
  -d [string]  reference genome sequences file, *.fasta format
  -o [string]  output homopolymers and microsatellites file
  -l [int]     minimal homopolymer size, default=5
  -c [int]     context length, default=5
  -m [int]     maximal homopolymer size, default=50
  -s [int]     maximal length of microsate, default=5
  -r [int]     minimal repeat times of microsate, default=3
  -p [int]     output homopolymer only, 0: no; 1: yes, default=0
  -h           help
```

The **`-p 1`** flag is **required** so that only **homopolymers** are output. The MSI pipeline has only been **benchmarked** on **homopolymers** and **cannot** work with **repeat regions longer than 100 bp**. Adding this flag should result in a compatible MSI sites file.
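The homopolymer and 100 bp constraints above can be checked on a generated sites file before it is handed to DRAGEN. The following is a minimal sketch, not part of DRAGEN or msisensor-pro: the function name `check_msi_sites` is hypothetical, and it assumes the sites file is tab-separated with a header row that names `repeat_unit_length` and `repeat_times` columns. Verify that assumption against your own file.

```shell
# Report entries that are not homopolymers or whose repeat tract exceeds 100 bp.
# ASSUMPTION: tab-separated file whose header row contains the column names
# "repeat_unit_length" and "repeat_times"; adjust the names if yours differ.
check_msi_sites() {
    awk -F'\t' '
        NR == 1 {
            # Map header names to column positions.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("repeat_unit_length" in col) || !("repeat_times" in col)) {
                print "expected header columns not found" > "/dev/stderr"
                bad = 2
                exit
            }
            next
        }
        {
            unit  = $(col["repeat_unit_length"]) + 0
            tract = unit * $(col["repeat_times"])
            if (unit != 1 || tract > 100) {
                printf "line %d: unit length %d, tract %d bp\n", NR, unit, tract
                bad = 1
            }
        }
        END { exit bad }   # 0 = all entries pass, 1 = offending entries, 2 = bad header
    ' "$1"
}
```

Calling `check_msi_sites hg38_microsatellites_pro.tsv` prints one line per offending entry and returns a non-zero status if any are found, so it can gate a pipeline script.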

For **WGS and WES** samples, the **tumor-normal mode** is **recommended**. Here is an example command for a tumor-normal analysis.

```shell
dragen \
  --msi-command tumor-normal \
  --msi-coverage-threshold 60 \
  --msi-microsatellites-file msi_file \
  --output-directory={output_directory} \
  --output-file-prefix={prefix} \
  --enable-map-align=true \
  --RGID=read_group_ID \
  --RGSM=read_group_sample \
  --ref-dir={reference_directory} \
  --enable-map-align-output=true \
  --enable-sort=true \
  --enable-duplicate-marking=true \
  --tumor-fastq1 {tumor_fq1} \
  --tumor-fastq2 {tumor_fq2} \
  --fastq-file1 {fq1} \
  --fastq-file2 {fq2}
```

The **tumor-only** mode requires a **panel of normals**, which is generated by running the MSI pipeline in collect-evidence mode on each normal sample. The panel must contain **at least 20 normal samples** (a hard-coded requirement for running the tumor-only mode). Here is an example collect-evidence command for one normal sample.

```shell
dragen -f \
  --ref-dir={reference_directory} \
  --fastq-file1 {fq1} \
  --fastq-file2 {fq2} \
  --output-directory={output_directory} \
  --enable-map-align=true \
  --RGID=read_group_ID \
  --RGSM=read_group_sample \
  --output-file-prefix={prefix} \
  --enable-map-align-output=true \
  --enable-sort=true \
  --enable-duplicate-marking=true \
  --msi-command collect-evidence \
  --msi-coverage-threshold 60 \
  --msi-microsatellites-file msi_file
```
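Because the panel needs at least 20 normals, the collect-evidence command is repeated once per normal sample. The sketch below only prints the commands so they can be reviewed before anything is run. The function name, the sample sheet layout (tab-separated: sample ID, FASTQ read 1, FASTQ read 2), and the reuse of the sample ID for `--RGID`, `--RGSM`, the output prefix and the output subdirectory are all assumptions made for illustration; `REF_DIR`, `OUT_ROOT` and `MSI_FILE` are placeholders.

```shell
# Print (do not run) one collect-evidence command per normal sample.
# ASSUMPTIONS: hypothetical tab-separated sample sheet with the columns
# sample ID, FASTQ read 1, FASTQ read 2; paths contain no spaces.
# REF_DIR, OUT_ROOT and MSI_FILE are placeholders set by the caller.
print_collect_evidence_cmds() {
    tab=$(printf '\t')
    # Same options as the single-sample collect-evidence example.
    fmt='dragen -f --ref-dir=%s --fastq-file1 %s --fastq-file2 %s'
    fmt="$fmt --output-directory=%s/%s --output-file-prefix=%s"
    fmt="$fmt --RGID=%s --RGSM=%s --enable-map-align=true"
    fmt="$fmt --enable-map-align-output=true --enable-sort=true"
    fmt="$fmt --enable-duplicate-marking=true --msi-command collect-evidence"
    fmt="$fmt --msi-coverage-threshold 60 --msi-microsatellites-file %s"
    while IFS=$tab read -r sample fq1 fq2; do
        [ -n "$sample" ] || continue   # skip blank lines
        printf "$fmt\n" "$REF_DIR" "$fq1" "$fq2" "$OUT_ROOT" "$sample" \
            "$sample" "$sample" "$sample" "$MSI_FILE"
    done < "$1"
}
```

For example, `print_collect_evidence_cmds normals.tsv > run_normals.sh` writes one command per line to a script that can be inspected and then executed.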

Once collect-evidence has been run on every normal sample, move the resulting MSI **.dist files** into a separate folder (`normal_reference_directory`) and pass that folder to the tumor-only run with `--msi-ref-normal-dir`.
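Gathering the files and confirming the 20-sample minimum can be scripted. This is a minimal sketch under stated assumptions: the function name `build_normal_panel` is hypothetical, each normal's collect-evidence run is assumed to have written one file ending in `.dist` under a common output root, file names are assumed to be unique across samples, and the panel folder must sit outside that output root. It copies rather than moves, so the original outputs stay in place.

```shell
# Copy MSI .dist files into one panel folder, then check the panel size.
# ASSUMPTIONS: one *.dist file per normal under "$1", unique file names,
# and a panel folder "$2" located outside "$1".
build_normal_panel() {
    out_root=$1
    panel_dir=$2
    mkdir -p "$panel_dir" || return 2
    find "$out_root" -type f -name '*.dist' -exec cp {} "$panel_dir"/ \;
    n=$(find "$panel_dir" -maxdepth 1 -type f -name '*.dist' | wc -l | tr -d ' ')
    echo "panel contains $n .dist file(s)"
    [ "$n" -ge 20 ]   # tumor-only mode needs at least 20 normals
}
```

A call such as `build_normal_panel /path/to/normal_runs normal_reference_directory` returns a non-zero status when fewer than 20 files were collected, which signals that more normals are needed before running tumor-only mode.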

```shell
dragen \
  --msi-command tumor-only \
  --msi-coverage-threshold 60 \
  --msi-microsatellites-file msi_file \
  --msi-ref-normal-dir normal_reference_directory \
  --output-directory={output_directory} \
  --output-file-prefix={prefix} \
  --enable-map-align=true \
  --RGID=read_group_ID \
  --RGSM=read_group_sample \
  --ref-dir={reference_directory} \
  --enable-map-align-output=true \
  --enable-sort=true \
  --enable-duplicate-marking=true \
  --tumor-fastq1 {tumor_fq1} \
  --tumor-fastq2 {tumor_fq2}
```
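The three example commands differ mainly in which input options accompany `--msi-command`. As a quick reference, the hypothetical helper below restates those differences. It is only a summary of the examples in this article, not an exhaustive or authoritative list of what DRAGEN accepts in each mode.

```shell
# Print the mode-specific input options used in this article's examples.
# The mapping restates those examples only; consult the user guide for
# the full set of options each mode supports.
msi_mode_inputs() {
    case "$1" in
        collect-evidence)
            echo "--fastq-file1 --fastq-file2" ;;
        tumor-normal)
            echo "--tumor-fastq1 --tumor-fastq2 --fastq-file1 --fastq-file2" ;;
        tumor-only)
            echo "--tumor-fastq1 --tumor-fastq2 --msi-ref-normal-dir" ;;
        *)
            echo "unknown --msi-command value: $1" >&2
            return 1 ;;
    esac
}
```

For instance, `msi_mode_inputs tumor-only` shows that the tumor-only example supplies tumor FASTQs plus the panel-of-normals folder and no matched-normal FASTQs.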

More information about the MSI pipeline can be found in the [user guide](https://support-docs.illumina.com/SW/DRAGEN_v40/Content/SW/DRAGEN/Biomarkers_MSI.htm).


*For any feedback or questions regarding this article (Illumina Knowledge Article #7508), contact Illumina Technical Support* [*techsupport@illumina.com*](mailto:techsupport@illumina.com?subject=Question%2FFeedback%20Regarding%20Illumina%20Knowledge%20Article%20#000007508%20-%20Software%20\&body=Dear%20Illumina%20Technical%20Support,%0D%0A%0D%0A)*.*

