# How to parse the Counts sparse matrix file output by the DRAGEN scRNA and scATAC pipelines

The DRAGEN Single Cell pipelines generate a **count matrix per cell**, with unique **UMIs per gene (scRNA)** or **counts per peak (scATAC)**, and output it in [**Matrix Market format**](https://math.nist.gov/MatrixMarket/formats.html) (matrix.mtx.gz), a format typically used for storing **sparse matrices**. To explore the output matrix in a human-readable form, users can load it into a "dense" dataframe in Python or another programming language. Keep in mind, however, that a "sparse" representation is preferable whenever possible: a "dense" matrix stores every zero entry explicitly and therefore requires **significantly more memory and disk space**. Several tools are available for working efficiently with "sparse" representations of single-cell matrices (e.g., scanpy in Python).
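To make the sparse-versus-dense trade-off more concrete, the following sketch (an illustration, not part of the DRAGEN tooling) builds a small, mostly-zero toy matrix with SciPy, writes it in Matrix Market coordinate format, and compares the memory held by a sparse (CSR) copy and a dense copy. The matrix size, the ~2% density, and the random values are arbitrary assumptions and do not come from real DRAGEN output.

```python
import os
import tempfile

import numpy as np
from scipy import io as sio
from scipy import sparse

# Hypothetical toy count matrix: 2,000 features x 500 cells, ~2% non-zero.
# The random values only stand in for real counts.
counts = sparse.random(2000, 500, density=0.02, format="csr", random_state=0)
counts.data = np.ceil(counts.data * 10)  # turn values into small positive "counts"
counts = counts.astype(np.int32)


def sparse_nbytes(m: sparse.csr_matrix) -> int:
    """Bytes used by the three arrays that back a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


dense = counts.toarray()  # every zero is now stored explicitly
print(f"sparse (CSR): {sparse_nbytes(counts):,} bytes")
print(f"dense:        {dense.nbytes:,} bytes")

# Write the matrix in Matrix Market coordinate format, inspect the header, read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.mtx")
    sio.mmwrite(path, counts)
    with open(path) as fh:
        header = fh.readline().strip()  # the %%MatrixMarket banner line
        # the first non-comment line holds "rows columns non-zero-entries"
        size_line = next(line for line in fh if not line.startswith("%"))
    reloaded = sparse.csr_matrix(sio.mmread(path))

n_rows, n_cols, n_entries = (int(x) for x in size_line.split())
print(header)
print(n_rows, n_cols, n_entries)
```

The coordinate format lists only the non-zero entries, one `row column value` line each after the size line, which is why it stays compact for single-cell data where most gene/cell combinations are zero.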

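The **cell barcodes** are stored in the **barcodes.tsv.gz** file, and the **feature names** are stored in a **genes.tsv.gz (scRNA)** or a **peaks.tsv.gz (scATAC)** file. Once the matrix is transposed to a cells-by-features layout (as in the code below), the barcodes become the **row names** and the genes or peaks become the **column names**.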
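Because `matrix.mtx.gz` itself carries no names, it can help to confirm how its rows and columns line up with the barcodes and features files before attaching names. The sketch below reads only the Matrix Market size line (so the full matrix is never loaded) and compares it with the line counts of the two TSV files. The helper names (`mtx_dimensions`, `count_lines`, `matrix_orientation`) and the tiny toy files are hypothetical, and the check assumes one entry per line in each `.tsv.gz` file with no header row.

```python
import gzip
import os
import tempfile


def mtx_dimensions(path: str) -> tuple[int, int, int]:
    """Return (rows, cols, non-zero entries) from a Matrix Market header."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if not line.startswith("%"):  # first non-comment line holds the sizes
                rows, cols, nnz = (int(x) for x in line.split())
                return rows, cols, nnz
    raise ValueError(f"no size line found in {path}")


def count_lines(path: str) -> int:
    """Count entries in a gzipped TSV file, assuming one entry per line."""
    with gzip.open(path, "rt") as fh:
        return sum(1 for _ in fh)


def matrix_orientation(matrix_path: str, features_path: str, barcodes_path: str) -> str:
    """Report whether the matrix is stored features x barcodes or barcodes x features."""
    rows, cols, _ = mtx_dimensions(matrix_path)
    n_features = count_lines(features_path)
    n_barcodes = count_lines(barcodes_path)
    if (rows, cols) == (n_features, n_barcodes):
        return "features x barcodes"
    if (rows, cols) == (n_barcodes, n_features):
        return "barcodes x features"
    raise ValueError("matrix dimensions do not match the features/barcodes files")


# Hypothetical toy files: 3 genes, 2 barcodes, 2 non-zero entries.
with tempfile.TemporaryDirectory() as tmp:
    matrix = os.path.join(tmp, "matrix.mtx.gz")
    genes = os.path.join(tmp, "genes.tsv.gz")
    barcodes = os.path.join(tmp, "barcodes.tsv.gz")
    with gzip.open(matrix, "wt") as fh:
        fh.write("%%MatrixMarket matrix coordinate integer general\n3 2 2\n1 1 5\n3 2 1\n")
    with gzip.open(genes, "wt") as fh:
        fh.write("G1\tGeneA\nG2\tGeneB\nG3\tGeneC\n")
    with gzip.open(barcodes, "wt") as fh:
        fh.write("AAAC-1\nAAAG-1\n")
    orientation = matrix_orientation(matrix, genes, barcodes)

print(orientation)  # features x barcodes
```

If the features and barcodes files happen to have the same number of lines, this check cannot tell the two orientations apart and simply reports the first match.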

The matrix can be converted into a "dense" representation with two Python modules, `scanpy` and `pandas`. This procedure has been tested with **Python 3.10.0, scanpy 1.9.3, and pandas 1.5.3**.

First, install the required libraries:

```shell
pip install -U scanpy pandas
```

Within Python, the matrix can be loaded into a "dense" representation with the following commands:

```python
# import libraries
import pandas as pd
import scanpy as sc

# define paths to input files
matrix_path = "path/to/matrix.mtx.gz"
genes_path = "path/to/genes.tsv.gz"  # path/to/peaks.tsv.gz for scATAC data
barcodes_path = "path/to/barcodes.tsv.gz"

# load the matrix through scanpy and transpose it so rows are cells and columns are features
adata = sc.read_mtx(matrix_path).T
# feature names come from the second column of the features file, barcodes from the first column
adata.var_names = pd.read_csv(genes_path, sep="\t", header=None)[1]
adata.obs_names = pd.read_csv(barcodes_path, sep="\t", header=None)[0]

# convert scanpy internal format (AnnData) to dense pandas DataFrame
df = pd.DataFrame(adata.X.todense(), index=adata.obs_names, columns=adata.var_names)

# save it as CSV file
df.to_csv("output_matrix.csv")
```
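When a fully dense table is not actually required, a sparse pandas DataFrame can attach the same barcodes and feature names while storing only the non-zero counts. The sketch below is an alternative that skips scanpy and uses `scipy.io.mmread` with `pandas.DataFrame.sparse.from_spmatrix`. It assumes the Matrix Market file is stored features by barcodes (as implied by the `.T` in the code above), and `feature_column=1` simply mirrors the `[1]` column used above for `genes.tsv.gz`; the right column for `peaks.tsv.gz` may differ. The function name `load_sparse_counts` and the toy files are hypothetical.

```python
import gzip
import os
import tempfile

import pandas as pd
from scipy import io as sio
from scipy import sparse


def load_sparse_counts(matrix_path, features_path, barcodes_path, feature_column=1):
    """Load a features x barcodes Matrix Market file as a sparse cells x features DataFrame."""
    # mmread reads .mtx.gz directly; transpose so rows are cells and columns are features
    matrix = sparse.csr_matrix(sio.mmread(matrix_path)).T.tocsr()
    features = pd.read_csv(features_path, sep="\t", header=None)[feature_column].astype(str)
    barcodes = pd.read_csv(barcodes_path, sep="\t", header=None)[0].astype(str)
    # only the non-zero entries are stored; zeros are the implicit fill value
    return pd.DataFrame.sparse.from_spmatrix(matrix, index=barcodes.values, columns=features.values)


# Hypothetical toy files: 3 genes, 2 barcodes, 2 non-zero entries.
with tempfile.TemporaryDirectory() as tmp:
    matrix_path = os.path.join(tmp, "matrix.mtx.gz")
    genes_path = os.path.join(tmp, "genes.tsv.gz")
    barcodes_path = os.path.join(tmp, "barcodes.tsv.gz")
    with gzip.open(matrix_path, "wt") as fh:
        fh.write("%%MatrixMarket matrix coordinate integer general\n3 2 2\n1 1 5\n3 2 1\n")
    with gzip.open(genes_path, "wt") as fh:
        fh.write("G1\tGeneA\nG2\tGeneB\nG3\tGeneC\n")
    with gzip.open(barcodes_path, "wt") as fh:
        fh.write("AAAC-1\nAAAG-1\n")
    df = load_sparse_counts(matrix_path, genes_path, barcodes_path)

print(df.shape)
print(df.sparse.density)  # fraction of entries actually stored
```

A dense copy can still be produced later with `df.sparse.to_dense()`, ideally after subsetting to the cells or features of interest.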

The matrix can be saved in different output formats (e.g., CSV), although this may not be recommended: a dense file stores every zero entry and can use a large amount of disk space.
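To get a feel for the disk-usage difference, the sketch below writes the same toy matrix once as a dense CSV (as in the code above) and once as a compressed sparse `.npz` file with `scipy.sparse.save_npz`, then compares file sizes. The matrix dimensions, density, and the `cell_*`/`feature_*` names are made-up assumptions. Note that `.npz` stores only the matrix, so barcodes and feature names would need to be kept separately (for example, in the original TSV files).

```python
import os
import tempfile

import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical toy matrix: 300 cells x 1,000 features, ~2% non-zero (random values).
counts = sparse.random(300, 1000, density=0.02, format="csr", random_state=1)
counts.data = np.ceil(counts.data * 10)  # small positive "counts"
counts = counts.astype(np.int32)
cells = [f"cell_{i}" for i in range(counts.shape[0])]
features = [f"feature_{j}" for j in range(counts.shape[1])]

with tempfile.TemporaryDirectory() as tmp:
    # Dense CSV: every zero is written out as text.
    csv_path = os.path.join(tmp, "output_matrix.csv")
    pd.DataFrame(counts.toarray(), index=cells, columns=features).to_csv(csv_path)

    # Sparse alternative: only non-zero entries, compressed; names are not included.
    npz_path = os.path.join(tmp, "output_matrix.npz")
    sparse.save_npz(npz_path, counts, compressed=True)

    csv_bytes = os.path.getsize(csv_path)
    npz_bytes = os.path.getsize(npz_path)
    restored = sparse.load_npz(npz_path)  # round-trip check

print(f"dense CSV:         {csv_bytes:,} bytes")
print(f"sparse .npz (CSR): {npz_bytes:,} bytes")
```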


*For any feedback or questions regarding this article (Illumina Knowledge Article #7911), contact Illumina Technical Support* [*techsupport@illumina.com*](mailto:techsupport@illumina.com?subject=Question%2FFeedback%20Regarding%20Illumina%20Knowledge%20Article%20#000007911%20-%20Software%20\&body=Dear%20Illumina%20Technical%20Support,%0D%0A%0D%0A)*.*
