Analysis pipeline and output FAQ for the Illumina 5 Base WGS and Enrichment kits
What information is reported when using the analysis?
The 5-Base pipeline performs methylation calling, small variant calling, and CNV calling.
Key output files include the CX report (genome-wide cytosine methylation report), mapping/alignment metrics, coverage metrics, methylation calling metrics, and M-bias metrics.
See here for a detailed writeup of methylation-associated outputs. All variant calling (VC) outputs not otherwise discussed are identical to those of the DNA analysis workflows.
How is the *CX_report generated?
The CX report is generated by counting the number of methyl-converted and non-methyl-converted cytosines in the properly paired reads covering the specified position. Reporting is limited to reference cytosines with nonzero coverage.
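As an illustration of how the two counts at each position translate into a methylation level, here is a minimal Python sketch. It assumes a Bismark-style, tab-separated CX report with seven columns (chromosome, position, strand, methylated count, unmethylated count, context, trinucleotide context); that column order is an assumption and should be checked against the output documentation. The names CxRecord, parse_cx_line, and methylation_fraction are hypothetical helpers, not part of DRAGEN.

```python
from typing import NamedTuple, Optional


class CxRecord(NamedTuple):
    chrom: str
    pos: int
    strand: str
    methylated: int      # reads supporting a methylated cytosine
    unmethylated: int    # reads supporting an unmethylated cytosine
    context: str         # e.g. CG, CHG, CHH
    trinucleotide: str


def parse_cx_line(line: str) -> CxRecord:
    """Parse one line of an ASSUMED seven-column, tab-separated CX report."""
    chrom, pos, strand, meth, unmeth, context, tri = line.rstrip("\n").split("\t")[:7]
    return CxRecord(chrom, int(pos), strand, int(meth), int(unmeth), context, tri)


def methylation_fraction(rec: CxRecord) -> Optional[float]:
    """Return methylated / (methylated + unmethylated), or None at zero coverage."""
    total = rec.methylated + rec.unmethylated
    if total == 0:
        return None
    return rec.methylated / total


# Made-up example record, not real data.
example = "chr1\t10469\t+\t3\t1\tCG\tCGC"
record = parse_cx_line(example)
print(record.chrom, record.pos, methylation_fraction(record))
```

The zero-coverage guard is only defensive, since the report is limited to cytosines with nonzero coverage.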
What is M-bias?
The M-bias report describes the methylation proportion at each possible position in the read. It provides the numerical values behind the M-bias plot, which can indicate the presence of fundamental technical biases in the methylation calling of reads. See here for more details.
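To make "methylation proportion per read position" concrete, the following Python sketch tallies CpG calls by position from XM tag strings. It assumes the Bismark XM encoding (Z for a methylated CpG, z for an unmethylated CpG, and . for non-cytosine positions) and counts positions along the string as stored, ignoring read orientation. Both are simplifying assumptions, and mbias_profile is a hypothetical helper rather than the pipeline's implementation.

```python
from collections import defaultdict


def mbias_profile(xm_tags, meth="Z", unmeth="z"):
    """Return {position: methylated proportion} for 1-based positions with calls."""
    counts = defaultdict(lambda: [0, 0])  # position -> [methylated, unmethylated]
    for xm in xm_tags:
        for pos, call in enumerate(xm, start=1):
            if call == meth:
                counts[pos][0] += 1
            elif call == unmeth:
                counts[pos][1] += 1
    # Only positions with at least one call exist in counts, so m + u > 0.
    return {pos: m / (m + u) for pos, (m, u) in sorted(counts.items())}


# Made-up XM strings for three short reads.
profile = mbias_profile(["Z..z.", "z..Z.", "Z...."])
for pos, proportion in profile.items():
    print(pos, round(proportion, 3))
```

A flat profile across positions suggests no position-dependent bias; a shift near either end of the read is the pattern the M-bias plot is meant to expose.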
Does the DRAGEN pipeline automatically calculate (and utilize) any cutoff values from the M-bias output? If yes, is there a way to see how this was applied? If not, is there a way to provide these values for trimming, etc.?
No read trimming for M-bias is currently applied during data processing. M-bias is calculated at the end of secondary analysis and provided as a QC readout. Generally, M-bias plots are used to assess whether additional trimming is needed to remove artifacts at the ends of reads. Based on its own analysis of M-bias, the Illumina team can recommend trimming 5 bases from Read 2 if a customer is interested in additional trimming.
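One possible way to read an M-bias profile programmatically is sketched below in Python. It is not Illumina's analysis: the median baseline, the 0.05 tolerance, and the function name suggest_trim are all illustrative assumptions. The function counts how many positions at each end of the read deviate from the plateau in the middle.

```python
from statistics import median


def suggest_trim(profile, tolerance=0.05):
    """Return (bases_to_trim_at_start, bases_to_trim_at_end).

    profile is a list of methylation proportions ordered by read position.
    A position is treated as biased when it differs from the median of the
    whole profile by more than tolerance (an arbitrary, illustrative cutoff).
    """
    if not profile:
        return 0, 0
    baseline = median(profile)

    def leading_outliers(values):
        count = 0
        for value in values:
            if abs(value - baseline) <= tolerance:
                break
            count += 1
        return count

    return leading_outliers(profile), leading_outliers(reversed(profile))


# Made-up Read 2 profile with a dip at the start and a spike at the end.
read2_profile = [0.60, 0.68, 0.75, 0.76, 0.75, 0.74, 0.76, 0.75, 0.75, 0.90]
print(suggest_trim(read2_profile))
```

Any suggested values should be confirmed against the plot itself before being used for trimming.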
Are soft-clipped bases included in the calculations for the cytosine report or not?
Soft-clipped bases are not included in the cytosine report. The first or last several bases of a read may be soft-clipped if they contain the UMI.
If the CRAMs and CX reports from DRAGEN are already available, is it possible to skip alignment and methylation calling and run only 5-Base-aware variant calling from the CRAMs? If so, how should the command line specify that the analysis is still 5-Base aware while skipping alignment and methylation calling?
Illumina has not tested starting from CRAM input, although using CRAM files may also work. BAM input is supported, provided the data were originally aligned using a 5-Base pipeline.
Specifying --methylation-conversion=illumina --enable-map-align=false enables the desired behavior: alignment is skipped and 5-Base variant calling is run.
Additionally specifying --methylation-generate-cytosine-report=false avoids generating a second CX report if one has already been obtained.
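For scripted runs, the options above could be assembled as in the following Python sketch. Only the three methylation-related options come from this article. The remaining options (--ref-dir, --bam-input, --output-directory, --output-file-prefix, --enable-variant-caller) are assumptions drawn from general DRAGEN usage and should be checked against the user guide for the installed version. All paths and the build_dragen_args helper are hypothetical.

```python
import shlex


def build_dragen_args(bam_path, ref_dir, out_dir, prefix, regenerate_cx_report=False):
    """Assemble a DRAGEN argument list for 5-Base variant calling from a BAM."""
    args = [
        "dragen",
        "--ref-dir", ref_dir,                  # assumption: reference hash table directory
        "--bam-input", bam_path,               # assumption: BAM from a prior 5-Base alignment
        "--output-directory", out_dir,         # assumption
        "--output-file-prefix", prefix,        # assumption
        "--enable-variant-caller=true",        # assumption
        "--methylation-conversion=illumina",   # from this article
        "--enable-map-align=false",            # from this article
    ]
    if not regenerate_cx_report:
        # From this article: avoid writing a second CX report.
        args.append("--methylation-generate-cytosine-report=false")
    return args


# Hypothetical paths; print the command for review instead of running it.
command = build_dragen_args("/data/sample.bam", "/refs/hash_table", "/out", "sample_vc")
print(shlex.join(command))
```

Printing the command first keeps the sketch safe to run on a machine without DRAGEN; the same list can be passed to subprocess.run once it has been reviewed.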
Is the alignment BAM file available for lambda and pUC19 controls?
Yes. The lambda and pUC19 reads are present in the same alignment BAM that contains the human reads; the analysis generates only one such file. They are flagged as unmapped to avoid contaminating any downstream processing, but their alignment and methylation information is retained. Reads can be extracted with a command such as samtools view -f 4 -@ 8 ${BAM} | grep -e "ca:Z:puc19" -e "ca:Z:lambda". The ca tag contains the alignment coordinates. XM/XR/XG methylation tags are retained with no changes.
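The same filter can be expressed in Python when the reads need further processing, for example on SAM text piped from samtools view. This is a sketch under assumptions: control_reads is a hypothetical helper, and the ca tag values in the example are made up, since the exact layout of the tag value is not described here beyond its prefix.

```python
def control_reads(sam_lines, controls=("puc19", "lambda")):
    """Yield (read_name, ca_value) for unmapped reads tagged as control reads."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        if not int(fields[1]) & 0x4:
            continue  # keep only reads flagged as unmapped, like samtools view -f 4
        for tag in fields[11:]:
            # Same match as the grep above: a ca:Z: tag whose value starts with a control name.
            if tag.startswith("ca:Z:") and tag[5:].startswith(controls):
                yield fields[0], tag[5:]
                break


# Made-up SAM records; the ca values are placeholders, not real coordinates.
sam = [
    "@HD\tVN:1.6",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF\tca:Z:lambda_example",
    "r2\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF\tXM:Z:....",
    "r3\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF\tca:Z:puc19_example",
]
for name, ca_value in control_reads(sam):
    print(name, ca_value)
```

Unlike grep on the whole line, this checks the tag fields only, so a read name or sequence that happens to contain the same text is not matched.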
Is it possible to get the MM tags in the BAMs?
Illumina does not currently support reporting MM tags in the BAM files.
What are known limitations of the analyses?
See the page here for additional information.
For any feedback or questions regarding this article (Illumina Knowledge Article #9951), contact Illumina Technical Support.
