FASTQ files explained
Illumina sequencing technology uses cluster generation and sequencing by synthesis (SBS) chemistry to sequence millions or billions of clusters on a flow cell, depending on the sequencing platform. During SBS chemistry, for each cluster, base calls are made and stored for every cycle of sequencing by the Real-Time Analysis (RTA) software on the instrument. RTA stores the base call data in the form of individual base call (or BCL) files. When sequencing completes, the base calls in the BCL files must be converted into sequence data. This process is called BCL to FASTQ conversion.
A FASTQ file is a text file that contains the sequence data from the clusters that pass filter on a flow cell (for more information on clusters passing filter, see the “additional information” section of this bulletin). If samples were multiplexed, the first step in FASTQ file generation is demultiplexing. Demultiplexing assigns clusters to a sample, based on the cluster’s index sequence(s). After demultiplexing, the assembled sequences are written to FASTQ files per sample. If samples were not multiplexed, the demultiplexing step does not occur, and, for each flow cell lane, all clusters are assigned to a single sample.
For a single-read run, one Read 1 (R1) FASTQ file is created for each sample per flow cell lane. For a paired-end run, one R1 and one Read 2 (R2) FASTQ file is created for each sample for each lane. FASTQ files are compressed and created with the extension *.fastq.gz.
What does a FASTQ file look like?
For each cluster that passes filter, a single sequence is written to the corresponding sample’s R1 FASTQ file, and, for a paired-end run, a single sequence is also written to the sample’s R2 FASTQ file. Each entry in a FASTQ files consists of 4 lines:
- 1.A sequence identifier with information about the sequencing run and the cluster. The exact contents of this line vary by based on the BCL to FASTQ conversion software used.
- 2.The sequence (the base calls; A, C, T, G and N).
- 3.A separator, which is simply a plus (+) sign.
Here is an example of a single entry in a R1 FASTQ file:
How to view a FASTQ file
FASTQ files can contain up to millions of entries and can be several megabytes or gigabytes in size, which often makes them too large to open in a normal text editor. Generally, it is not necessary to view FASTQ files, because they are intermediate output files used as input for tools that perform downstream analysis, such as alignment to a reference or de novo assembly.
If you need to view a FASTQ file for troubleshooting purposes or out of curiosity, you will need either a text editor that can handle very large files, or access to a Unix or Linux system where large files can be viewed via the command line.
How to generate FASTQ files
For all runs uploaded to BaseSpace Sequence Hub, FASTQ file generation automatically occurs after the run is completely uploaded, and the FASTQ files are used as input for the various analysis apps on BaseSpace Sequence Hub. On BaseSpace Sequence Hub, you can find your FASTQ files in the project(s) associated with your run.
For information on the different settings that can be applied during FASTQ file generation, see the software user guides below.