Sample-specific nucleotide error bias estimation filter and systematic noise filter in DRAGEN

Reference for Sample-specific NTD filter.

Reference for Systematic noise filtering.

DRAGEN can compensate for oxidation and deamination artifacts that arise upstream of the sequencing system and are common in FFPE samples. DRAGEN does this by estimating nucleotide mutation biases on a per-sample basis, taking read orientation into account. During variant calling, DRAGEN corrects for nucleotide substitution biases by combining the estimated parameters with the base call quality scores, which modifies the nucleotide error rates used by the hidden Markov model.

Nucleotide (NTD) Error Bias Estimation is on by default and is recommended as a replacement for the orientation bias filter. Both methods account for strand-specific biases (systematic differences between F1R2 and F2R1 reads). In addition, NTD error estimation accounts for non-strand-specific biases, such as sample-wide elevation of a certain SNV type (eg, C > T or any other transition or transversion). NTD error estimation can also capture biases in a trinucleotide context.

The DRAGEN systematic noise filter is available in somatic mode, and can be used to reduce false positive calls by accounting for site-specific noise.

To build the systematic noise file, use normal samples prepared the same way as the tumor sample; ie, build a "fresh frozen" systematic noise file to analyze fresh frozen tumors, and build an FFPE systematic noise file to analyze FFPE tumors. For noise generation, use approximately 50 normal samples.

  1. The systematic noise and NTD filters are complementary and designed to be used together, especially for FFPE samples.

  • The NTD filter removes lower-VAF oxidation and deamination artifacts. Some FFPE artifacts may be more region specific (SINE and specifically Alu regions). These region-specific artifacts may have variant allele frequencies (eg, 5%) that are too high to be removed by the NTD filter alone.

  • NTD adjusts the read base qualities based on empirical measurements of the sample's noise. This is independent of the systematic noise filter. Because the NTD filter can adjust base qualities (eg, for C > T bases), this adjustment can in turn affect the somatic quality (SQ) score.

  • The systematic noise filter runs after the SQ score is finalized. If either the SQ score or the systematic noise AQ score fails, then the call is filtered.

  • In DRAGEN V4.0, --vc-enable-unequal-ntd-errors is always recommended and is on by default. The trimer context is not on by default, but can optionally be enabled for samples with sufficient coverage (WGS/WES or panels with deep coverage).
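The order of the two checks described above can be sketched as follows. This is a minimal illustration, not DRAGEN's implementation: the cutoffs sq_min and aq_min, the label strings, and the convention that a score "fails" when it falls below its cutoff are all assumptions made for the example.

```python
def filter_labels(sq, aq, sq_min, aq_min):
    """Return the checks a call fails; an empty list means the call is kept.

    sq is the somatic quality score (already reflecting any NTD base-quality
    adjustment) and aq is the systematic noise score. The cutoffs and label
    strings are hypothetical, chosen only to illustrate the "either fails"
    rule.
    """
    labels = []
    if sq < sq_min:
        labels.append("sq_fail")
    if aq < aq_min:
        labels.append("aq_fail")
    return labels


def is_filtered(sq, aq, sq_min, aq_min):
    # A call is filtered if either score fails its check.
    return bool(filter_labels(sq, aq, sq_min, aq_min))
```

For example, under these assumptions a call with a high SQ score but a low AQ score is still filtered, which is how a region-specific artifact that survives the NTD adjustment can be removed by the systematic noise filter.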

  2. The systematic noise file and NTD catch two different sources of error.

  • The systematic noise file catches errors that tend to occur systematically at certain genomic positions, consistently across normal samples; ie, it catches positions that tend to be noisy across samples.

  • NTD catches sample-specific errors and looks for over-representation of certain SNV types (eg, many C > T conversions). If a certain SNV type is over-represented compared to the others, NTD flags that type as a systematic error.

  • Users can enable both the systematic noise file and NTD in the same run.
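The idea of looking for an over-represented SNV type within one sample can be sketched in a few lines. This is only an illustration of the concept, not the estimator DRAGEN uses: the function names, the comparison against a uniform share across the 12 possible single-base substitutions, and the fold cutoff are all assumptions.

```python
from collections import Counter


def snv_type_fractions(snvs):
    """Return the share of each 'REF>ALT' type among (ref, alt) pairs."""
    counts = Counter(f"{ref}>{alt}" for ref, alt in snvs if ref != alt)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {snv_type: n / total for snv_type, n in counts.items()}


def over_represented(fractions, fold=3.0):
    """List types whose share exceeds `fold` times a uniform share.

    There are 12 possible single-base substitutions, so the uniform share is
    1/12. Both the uniform baseline and the default fold are hypothetical.
    """
    uniform_share = 1 / 12
    return sorted(t for t, f in fractions.items() if f > fold * uniform_share)


# Toy sample dominated by C > T observations.
observations = [("C", "T")] * 8 + [("A", "G"), ("T", "C")]
fractions = snv_type_fractions(observations)
flagged = over_represented(fractions)
```

In this toy sample, C > T makes up 8 of 10 observations and is the only type flagged. A site-level noise file works differently: it looks at the same position across many normal samples rather than at substitution types within one sample.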

  3. NTD error estimation can also capture biases in a trinucleotide context. More information on the --vc-enable-trimer-context=true option follows.

  • To estimate a larger set of parameters in a trimer context (recommended on sufficiently large panels when coverage is above 1000X), specify --vc-enable-trimer-context=true.

  • The NTD filter accounts for biases such as sample-wide elevation of a certain SNV type or any other transition or transversion; eg, many C > T and G > A compared to other SNV types.

  • When --vc-enable-trimer-context is set to true, the NTD filter further breaks down each SNV type by taking into account the bases immediately preceding and following the substituted base. In the case of C > T, for example, it breaks down the counts as ACA > ATA, CCA > CTA, GCA > GTA, TCA > TTA, and so on for the other flanking bases. This makes it possible to catch noise levels that are context specific (ie, the noise causes an over-representation of a particular SNV type, but only in a certain context).

  • The trimer context is not enabled by default because it requires more observations to get an accurate estimate of the noise for each bucket. Enable it only for high-depth samples, such as liquid samples.
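The trimer breakdown described above can be sketched as a simple bucketing step. This is an illustrative sketch, not DRAGEN's code: the function names and the tuple layout (preceding base, reference base, alternate base, following base) are assumptions made for the example.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def trimer_label(prev_base, ref, alt, next_base):
    """Label a substitution with its flanking bases, eg 'ACA>ATA'."""
    return f"{prev_base}{ref}{next_base}>{prev_base}{alt}{next_base}"


def contexts_for(ref, alt):
    """All trimer buckets for one substitution type (4 x 4 flanking bases)."""
    return [trimer_label(p, ref, alt, n) for p, n in product(BASES, repeat=2)]


def trimer_counts(observations):
    """Count observations per bucket.

    Each observation is a (prev_base, ref, alt, next_base) tuple.
    """
    return Counter(trimer_label(p, r, a, n) for p, r, a, n in observations)


# Toy observations: three C > T substitutions in two different contexts.
observations = [("A", "C", "T", "A"), ("G", "C", "T", "A"), ("G", "C", "T", "A")]
counts = trimer_counts(observations)
c_to_t_buckets = contexts_for("C", "T")
```

Each substitution type splits into 16 buckets once both flanking bases are considered, so the same number of observations is spread much more thinly. That is consistent with the guidance to use the trimer context only when coverage is high.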

For any feedback or questions regarding this article (Illumina Knowledge Article #6989), contact Illumina Technical Support techsupport@illumina.com.

© 2023 Illumina, Inc. All rights reserved. All trademarks are the property of Illumina, Inc. or their respective owners. Trademark information: illumina.com/company/legal.html. Privacy policy: illumina.com/company/legal/privacy.html