Optimizing Library Insert Size Representation on NovaSeq X Series Instruments
Last updated
© 2023 Illumina, Inc. All rights reserved. All trademarks are the property of Illumina, Inc. or their respective owners. Trademark information: illumina.com/company/legal.html. Privacy policy: illumina.com/company/legal/privacy.html
Background
Shorter insert sizes can result in slight reductions in variant calling performance for single nucleotide variants (SNVs) and insertions/deletions (indels). The overall representation of insert sizes within a sequenced library pool is affected by several factors. This article discusses the major factors with respect to sequencing on the NovaSeq X Series, and how users can optimize their workflow to achieve a representative range of insert sizes from their library pool.
Insert size representation on the NovaSeq X Series is affected by the following parameters:
Loading Concentration
Insert Lengths within a Library
Library Cleanup
Library Type
Library Pool
Loading Concentration
Loading concentrations above the optimal range can reduce the mean insert size for some libraries because shorter fragments cluster preferentially under these conditions. In this example, a TruSeq DNA PCR-Free library was prepared targeting an insert size of 450-500 bp. The mean insert size was 470 bp when loaded at 75 pM on a NovaSeq X 10B flow cell, and decreased to 450 bp when loaded at 110 pM.
In these cases, reducing the loading concentration can increase mean insert size and result in data that is more representative of the actual library insert size. In practice, insert size should be considered along with other factors such as %PF, %Duplicates and Coverage.
Insert Lengths within a Library
A wide insert size distribution within a pool can bias the mean insert size downward because short inserts cluster preferentially. Increasing the loading concentration strengthens this preference for short inserts, which can also increase the variation of index representation within the pool. When loading concentrations are too high, it becomes more difficult to ensure that each sample obtains the minimum number of reads required for analysis. For example, with TruSeq DNA PCR-Free libraries that have a 30 bp insert size difference, the spread in %Reads obtained by each sample within a pool (reflected in increasing CV) grows as the loading concentration increases from 80 pM to 120 pM.
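The spread in %Reads per sample can be expressed as a coefficient of variation (CV). The following is a minimal Python sketch, assuming per-sample read counts are already available (for example, from a demultiplexing report). The function name percent_reads_cv, the sample names, and the counts are hypothetical and are not part of any Illumina software.

```python
from statistics import mean, stdev


def percent_reads_cv(read_counts):
    """Return per-sample %Reads and their CV (in percent) for one pool.

    read_counts: dict mapping sample name -> number of reads (hypothetical input).
    Uses the sample standard deviation, so at least two samples are required.
    """
    total = sum(read_counts.values())
    pct = {name: 100.0 * n / total for name, n in read_counts.items()}
    values = list(pct.values())
    cv = 100.0 * stdev(values) / mean(values)
    return pct, cv


# Illustrative counts only, not measured data.
pct, cv = percent_reads_cv({"sample_A": 90, "sample_B": 110})
print(pct, round(cv, 2))
```

Computing this value for the same pool at each loading concentration in a titration gives a simple way to see whether index representation is widening as concentration increases.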
Individual samples within a pool that have shorter insert sizes also obtain more reads, which increases index CV. For example, in a TruSeq DNA PCR-Free library pool with a 30 bp insert size difference, increasing the amount of the larger-insert libraries in the pool by 20% results in significantly lower CV compared to equimolar pooling. This approach requires running every library on a Bioanalyzer or TapeStation to determine its average insert size.
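The pooling adjustment can be sketched as a small volume calculation. This Python example assumes that "increasing by 20%" means adding 1.2 times the molar amount used for the other libraries, and that each library concentration is known in nM (1 nM equals 1 fmol/µL). The function name pooling_volumes, the base_fmol target, and the example libraries are hypothetical. Which libraries count as "larger" is left to the user, based on Bioanalyzer or TapeStation results.

```python
def pooling_volumes(conc_nm, larger_ids, boost=1.20, base_fmol=10.0):
    """Return the volume (µL) of each library to add to the pool.

    conc_nm:    dict mapping library id -> concentration in nM (= fmol/µL).
    larger_ids: set of library ids judged to have the larger inserts.
    boost:      molar multiplier for larger-insert libraries (assumed 1.20 = +20%).
    base_fmol:  molar amount per library under equimolar pooling (assumed value).
    """
    volumes = {}
    for lib_id, conc in conc_nm.items():
        target_fmol = base_fmol * (boost if lib_id in larger_ids else 1.0)
        volumes[lib_id] = target_fmol / conc
    return volumes


# Illustrative concentrations only, not measured data.
volumes = pooling_volumes({"lib_short": 2.0, "lib_long": 2.0}, {"lib_long"})
for lib_id, vol in volumes.items():
    print(f"{lib_id}: {vol:.2f} uL")
```

The 20% figure comes from the single example above. The appropriate adjustment for other library types or size differences would need to be determined empirically.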
Library Cleanup
Illumina recommends a library cleanup step to reduce very short inserts and adapter dimers and to minimize their impact on CV. Additional cleanup can help improve primary and secondary metrics by tightening the insert size distribution. Insert size representation can be altered by applying additional bead cleanups to the final library pool, using modified bead ratios. For example, performing an additional cleanup of Illumina DNA PCR-Free libraries can increase average insert size.
Please see Optimal variant calling with Illumina DNA PCR-Free Prep on the NovaSeq™ X Series for more details.
Library Pool
Mixing different library pools together on the same lane shows a preference for clustering short inserts. Different library types can have different clustering efficiencies, even when they are similar in size. Additionally, shorter libraries cluster more efficiently and will obtain more reads relative to longer libraries. Therefore, mixing different library types in the same sequencing lane can have unpredictable outcomes.
Illumina does not recommend mixing libraries with extreme differences in insert size on the same lane, for example micro-RNA libraries (~150 bp total) with whole-genome sequencing libraries (~600 bp total).
In such cases, the NovaSeq X might preferentially cluster the shorter-insert libraries, tilting the balance toward shorter inserts.
When mixing samples of varied sizes, consider whether reads could run into the adapter sequence of the shorter library, as this may cause quality issues for the longer libraries in the same pool.
Mixing asymmetrical 10X Genomics libraries (28 cycles in Read 1) with other libraries in the same lane is not recommended.
Exercise caution when combining other libraries with Illumina Stranded RNA Prep libraries, which require a dark cycle recipe applied to the entire flow cell for optimal performance. Dark cycle recipes for Illumina Stranded RNA libraries are only compatible with certain library types. For more information, refer to Custom recipes for Illumina Stranded libraries on NovaSeq X/X Plus.
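For the adapter read-through consideration above, a rough check is to compare each library's insert length with the number of read cycles: a read longer than the insert continues into adapter sequence. The Python sketch below illustrates only that comparison. The function name adapter_readthrough_cycles, the insert lengths, and the 151-cycle read length are hypothetical values chosen for illustration, and insert length here excludes adapters.

```python
def adapter_readthrough_cycles(insert_bp, read_cycles):
    """Return how many cycles of a single read extend past the insert.

    insert_bp:   insert length in bp, excluding adapters (assumed known).
    read_cycles: number of sequencing cycles in the read.
    A result of 0 means the read ends within the insert.
    """
    return max(0, read_cycles - insert_bp)


# Illustrative insert lengths only, not measured data.
pool = {"short_insert_lib": 30, "long_insert_lib": 450}
readthrough = {
    name: adapter_readthrough_cycles(insert_bp, read_cycles=151)
    for name, insert_bp in pool.items()
}
for name, cycles in readthrough.items():
    print(f"{name}: {cycles} cycles into adapter")
```

A nonzero value for any library in a planned pool indicates that read-through is possible at the chosen read length and that the mix may warrant a closer look before sequencing.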
No Significant Differences Across Flow Cell Types
Internal studies show similar mean insert size for TruSeq DNA PCR-Free libraries sequenced across 10B and 25B flow cells when loaded at optimal concentrations. Illumina recommends titrating the loading concentration if the range of insert sizes is larger than expected when comparing similar library types on 10B and 25B Flow Cells.
Library Type
The impact of loading concentration changes is also affected by library type. Different library types have different compositions and size distributions, which can affect insert size as loading concentration is varied. For example, the insert size of Illumina Stranded Total RNA and Whole Exome libraries is less sensitive to loading concentration increases when compared to the TruSeq DNA PCR-Free (whole-genome sequencing) data.
Illumina has also seen a difference in insert size representation between NovaSeq 6000 and NovaSeq X Series that is driven by library type. The largest differences between NovaSeq 6000 and NovaSeq X Series occur when insert size is greater than 200 bp.
For any feedback or questions regarding this article (Illumina Knowledge Article #8912), contact Illumina Technical Support techsupport@illumina.com.