What are Single-Read (SR) and Paired-End (PE) reads, and what are the read requirements for sequencing?
Single-Read (SR) sequencing involves sequencing DNA from only one end, and is the simplest way to utilize Illumina sequencing. One library molecule will yield one sequencing read (Read1).
Figure 1. Library sequenced with Single-Read (SR) setup.
Paired-End (PE) sequencing allows users to sequence both ends of a library fragment. One library molecule will yield two sequencing reads (Read1 and Read2).
Figure 2. Library sequenced with Paired-End (PE) read setup.
**How many cycles are needed for sequencing?**
Sequencing kits contain reagents for a particular number of cycles of Sequencing by Synthesis (SBS) chemistry. A sequencing cycle in SBS refers to a single step in which a labeled nucleotide is added to a growing DNA strand, imaged to identify the base, and then chemically cleaved to prepare for the next nucleotide addition. Therefore, one cycle of SBS corresponds to one base pair (bp) that can be sequenced. The cycle number of a kit generally indicates the number of cycles that can be used for sequencing the insert during Read1 (SR) or Read1 + Read2 (PE), with additional cycles provided for index sequencing. To find the total number of cycles each cartridge provides for insert and index sequencing, and to verify that this total is not exceeded, see How many cycles of SBS chemistry are in my kit?
For example, if using a 100-cycle NextSeq 1000/2000 cartridge for SR sequencing, Read1 can be a maximum of 100 cycles (1x100 run). If performing Paired-End sequencing, Read1 + Read2 cannot exceed 100 cycles, with 2x50 being the most common setup for a 100-cycle kit.
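The cycle budget described above can be sketched as a simple check. This is a minimal illustration, assuming that index cycles are supplied separately by the kit and are therefore not counted against the nominal cycle number; `fits_kit` and `describe` are hypothetical helper names, not part of any Illumina software.

```python
def fits_kit(kit_cycles: int, read1: int, read2: int = 0) -> bool:
    """Return True if the insert reads (Read1 + Read2) fit the kit's nominal cycles.

    Assumption: index cycles are provided in addition to the nominal cycle
    number and are not checked here.
    """
    if read1 <= 0 or read2 < 0:
        raise ValueError("read1 must be positive; read2 may be 0 for SR runs")
    return read1 + read2 <= kit_cycles


def describe(read1: int, read2: int = 0) -> str:
    """Format a run setup in the NxL notation used in the text (e.g. 1x100, 2x50)."""
    if read2 == 0:
        return f"1x{read1}"
    if read1 == read2:
        return f"2x{read1}"
    return f"{read1}+{read2}"


# Setups for a 100-cycle cartridge, as in the example above
for r1, r2 in [(100, 0), (50, 50), (75, 75)]:
    print(describe(r1, r2), fits_kit(100, r1, r2))
# prints:
# 1x100 True
# 2x50 True
# 2x75 False
```

The check only covers insert cycles; platform-specific limits on a single long read (such as the 600-cycle case below) are a separate constraint.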
Note that quality can decline as read length increases; therefore, a PE run (eg, 2x50) will have higher overall quality than an SR run with the same number of cycles (eg, 1x100), especially at longer read lengths.
Also, for certain higher-cycle kits, running one long single read is not supported, and paired-end sequencing must be performed. For example, with a MiSeq v3 600-cycle kit, Illumina does not recommend a 1x600 bp run; the run should be configured as paired-end 2x300 bp. See Maximum read length for Illumina sequencing platforms for more information.
**How is read length affected by library size?**
Depending on the library's insert size, there may be no overlap, some overlap, or complete overlap between Read1 and Read2 (Figure 3). Some analysis applications benefit from some overlap between Read1 and Read2, but sequencing with complete overlap is generally an inefficient use of SBS reagents, because the same piece of DNA is sequenced twice with no new information. Such libraries (Figure 3C) may be sequenced with a lower-cycle kit (eg, a 100-cycle kit at 2x50 or 1x100 read length, instead of a 200-cycle kit at 2x100 read length) to obtain the same sequencing information.
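The three overlap cases can be estimated from the insert size and the two read lengths. This is a minimal sketch, assuming that "insert size" means the bases between the adapters, that Read1 and Read2 start from opposite ends of the insert, and that any adapter bases read past the insert are ignored. The function names and the insert sizes used below (300, 150, and 100 bp) are hypothetical examples, not values taken from Figure 3.

```python
def read_overlap(insert_bp: int, read1: int, read2: int) -> int:
    """Number of insert bases covered by both Read1 and Read2.

    Read1 covers insert positions [0, read1) and Read2, read from the
    opposite end, covers [insert_bp - read2, insert_bp). Both ranges are
    clipped to the insert itself, so adapter read-through is ignored.
    """
    r1_end = min(read1, insert_bp)
    r2_start = max(0, insert_bp - read2)
    return max(0, r1_end - r2_start)


def overlap_class(insert_bp: int, read1: int, read2: int) -> str:
    """Classify a PE setup as no, some, or complete overlap."""
    ov = read_overlap(insert_bp, read1, read2)
    if ov == 0:
        return "no overlap"
    if ov == insert_bp:  # both reads span the whole insert
        return "complete overlap"
    return "some overlap"


# Hypothetical inserts sequenced PE 2x100
for insert in (300, 150, 100):
    print(insert, read_overlap(insert, 100, 100), overlap_class(insert, 100, 100))
# prints:
# 300 0 no overlap
# 150 50 some overlap
# 100 100 complete overlap
```

In the complete-overlap case, each read already covers the full insert, which is why a shorter setup such as 1x100 recovers the same information.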
Figure 3. Libraries of different insert sizes, sequenced PE 2x100 with (A) no overlap, (B) some overlap, or (C) complete overlap.
**How many reads are needed for RNA sequencing?**
Many RNA sequencing applications call for a specific number of "reads per sample." For example, gene expression profiling of the human transcriptome commonly requires "25 million reads per sample." In this application, the intent is to sequence 25 million unique molecules of RNA per sample, which is equivalent to 25 M Single-Reads or 50 M Paired-End reads. Using the NextSeq 1000/2000 specifications (Figure 4) as an example, one library fragment seeds and amplifies in each nanowell cluster, so each cluster represents one unique library molecule. Illumina's specifications pages list both SR and PE values; when assessing the sequencer's output, it is simplest to consider the SR output, which equals the number of clusters, that is, the number of unique molecules that can be sequenced.
Figure 4. NextSeq 1000/2000 XLEAP Specifications
Using the NextSeq 1000/2000 P2 XLEAP flow cell as an example, the output is 400 million (M) clusters, meaning 400 M unique molecules can be sequenced. If sequenced as Single-Reads, this results in 400 M reads; if sequenced as Paired-End reads, it results in 800 M reads (still representing 400 M molecules). Therefore, if an application requires 25 M reads per sample, the samples per run can be calculated as follows: 400 M Single-Reads (or molecules) / 25 M reads (molecules) per sample = 16 samples per run. The same result is obtained from the Paired-End output once it is understood that 25 M unique molecules is equivalent to 50 M PE reads: 800 M PE reads / 50 M PE reads per sample = 16 samples per run.
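The arithmetic above can be written as a short sketch showing that counting in SR units (molecules) or in PE reads gives the same answer, provided both numbers use the same unit. It assumes that partial samples are rounded down and ignores practical factors such as pooling imbalance or control spike-ins; `samples_per_run` is a hypothetical helper name.

```python
def samples_per_run(run_output: int, reads_per_sample: int) -> int:
    """Whole samples per run; both arguments must use the same unit
    (both SR reads/clusters, or both PE reads). Rounds down."""
    return run_output // reads_per_sample


P2_CLUSTERS = 400_000_000      # P2 XLEAP output in clusters (SR reads), from the text
READS_PER_SAMPLE = 25_000_000  # 25 M reads (molecules) per sample

# Counting molecules: 400 M SR reads / 25 M per sample
sr = samples_per_run(P2_CLUSTERS, READS_PER_SAMPLE)
# Counting PE reads: 800 M PE reads / 50 M PE reads per sample
pe = samples_per_run(2 * P2_CLUSTERS, 2 * READS_PER_SAMPLE)
print(sr, pe)  # prints: 16 16
```

Doubling both the run output and the per-sample requirement for PE cancels out, which is why the SR output alone is usually enough for planning.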
Alternatively, some applications call for a specific number of "read pairs" per sample. A read pair is akin to a pair of shoes: if there are 4 "pairs of shoes" (read pairs), then there are 8 total shoes (8 PE reads) but only 4 unique shoe types (4 molecules). Therefore, if an application requires 600 M "read pairs" per sample, 600 M "pairs of shoes" are needed, which in turn means 600 M unique shoe types (molecules) are needed. Since 1 molecule = 1 cluster, this is also equivalent to 600 M SR or 1200 M PE reads. Using the P4 XLEAP specifications as an example, 1.8 B Single-Reads (molecules) / 600 M molecules per sample = 3 samples per run. Alternatively, 3.6 B Paired-End reads / 1200 M Paired-End reads per sample = 3 samples per run.
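Because "reads per sample" and "read pairs per sample" count different things, it can help to convert every requirement into clusters (unique molecules) before dividing. The sketch below follows the equivalences stated above (1 cluster = 1 SR read = 1 read pair = 2 PE reads); the unit labels and the `clusters_needed` helper are hypothetical names chosen for illustration.

```python
def clusters_needed(amount: int, unit: str) -> int:
    """Convert a per-sample requirement into clusters (unique molecules).

    Hypothetical unit labels:
      "sr_reads"   - single reads, one per cluster
      "read_pairs" - Read1 + Read2 from the same cluster, one pair per cluster
      "pe_reads"   - individual PE reads, two per cluster
    """
    if unit in ("sr_reads", "read_pairs"):
        return amount
    if unit == "pe_reads":
        return amount // 2
    raise ValueError(f"unknown unit: {unit}")


P4_CLUSTERS = 1_800_000_000  # P4 XLEAP output in clusters (SR reads), from the text

for amount, unit in [(600_000_000, "read_pairs"), (1_200_000_000, "pe_reads")]:
    per_sample = clusters_needed(amount, unit)
    print(unit, P4_CLUSTERS // per_sample)
# prints:
# read_pairs 3
# pe_reads 3
```

A common planning error is to treat "600 M read pairs" as 600 M PE reads, which would halve the depth each sample actually receives.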
**How many reads are needed for DNA sequencing?**
Unlike RNA sequencing, which often calls for "reads per sample" or "read pairs per sample," DNA sequencing aims for a specific depth of "coverage" per sample, which takes into account the genome or target size as well as the read length. This is because the genome length is well defined and the full genome is present in 2 copies per diploid cell (for diploid organisms). In contrast, the transcriptome is not fully expressed in every cell (eg, the insulin gene is highly expressed in the pancreas but not in the brain), and expression also differs between genes (pancreatic cells express much higher levels of insulin mRNA than of other genes). Therefore, more variables and more complex math are required to determine the samples per run for DNA applications than for RNA applications. For more information about estimating coverage for DNA-Seq, see the technical note Estimating Sequencing Coverage, consult the Sequencing Coverage Calculator, or contact the local sales specialist.
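As a rough illustration of how genome (or target) size and read length enter the calculation, the sketch below uses the widely used Lander/Waterman relation C = L × N / G (mean coverage C, read length L, number of reads N, genome or target size G). This is a simplified estimate under stated assumptions, not the method of the referenced technical note or calculator: it assumes every read is unique, maps uniquely, and is evenly distributed, with no duplicates or filtered reads. The ~3.1 Gb genome size and the 30x, 2x150 setup are illustrative values only.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean fold coverage, C = L * N / G (simplified Lander/Waterman estimate)."""
    return read_length * num_reads / genome_size


def reads_for_coverage(target: float, read_length: int, genome_size: int) -> int:
    """Reads N needed for a target mean coverage, N = C * G / L, rounded up."""
    return math.ceil(target * genome_size / read_length)


GENOME = 3_100_000_000  # assumed ~3.1 Gb human genome size, illustrative only

n_reads = reads_for_coverage(30, 150, GENOME)  # 30x with 150 bp reads
clusters = math.ceil(n_reads / 2)              # 2x150 PE: two reads per cluster
print(n_reads, clusters)  # prints: 620000000 310000000
```

For targeted panels or exomes, `genome_size` would be replaced by the target size, and real runs typically need extra reads to offset duplicates, off-target reads, and uneven coverage.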
**Additional resources**
- Advantages of paired-end and single-read sequencing
- Read length recommendations
For any feedback or questions regarding this article (Illumina Knowledge Article #9547), contact Illumina Technical Support.