Estimating Disk Space Utilization on the MiSeq i100 Series
Background
This guide provides disk space usage estimates for sequencing runs on the MiSeq i100 Series. The calculations are based on key parameters such as flow cell size, run type, read length, and data quality. The estimates cover BCL (base call) files, compressed FASTQ files, and BAM/CRAM secondary analysis files, to support resource planning for sequencing projects.
Key Parameters
The estimates in this guide cover the following flow cell and run configurations.
Flow Cell Types
5M
25M
Run Types and Read Length
2x151: Paired-end reads, assuming 10 bp dual-index, 322 cycles in total.
2x301: Paired-end reads, assuming 10 bp dual-index, 622 cycles in total.
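The cycle totals above follow from two reads plus two index reads. A minimal sketch of that arithmetic, assuming the 10 bp dual-index configuration stated in this guide (the function name is illustrative, not part of any Illumina tool):

```python
def total_cycles(read_length: int, index_length: int = 10) -> int:
    """Total cycles for a paired-end, dual-index run.

    Two sequencing reads of `read_length` cycles each, plus two index
    reads of `index_length` cycles each (10 bp assumed, as in this guide).
    """
    return 2 * read_length + 2 * index_length


print(total_cycles(151))  # 322
print(total_cycles(301))  # 622
```

A run with a different index length would change the total, and with it the size estimates that follow.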
Disk Usage Estimates
The table below summarizes the estimated disk usage for different sequencing configurations on the MiSeq i100 Series:
| Flow cell | 5M | 25M | 5M | 25M |
|---|---|---|---|---|
| Run type | 2x151 | 2x151 | 2x301 | 2x301 |
| Total Run Length (cycles) | 322 | 322 | 622 | 622 |
| Reads at Specification (M Reads) | 5 | 25 | 5 | 25 |
| Theoretical Max Reads (M Reads)* | 6.5 | 32.7 | 6.5 | 32.7 |
| Estimated Max Gbases | 2.1 | 10.5 | 4.1 | 20.3 |
| Est. Max Total BCL size (GB) | 0.8 | 4.2 | 1.6 | 8.1 |
| Est. Secondary File Sizes: fastq.gz (GB) | 1.1 | 5.5 | 2.1 | 10.6 |
| Est. Secondary File Sizes: fastq.ora (GB) | 0.3 | 1.3 | 0.5 | 2.4 |
| Est. Secondary File Sizes: BAM (GB) | 0.9 | 4.7 | 1.8 | 9.2 |
| Est. Secondary File Sizes: CRAM (GB) | 0.3 | 1.5 | 0.6 | 2.8 |
| Est. Secondary File Sizes: other files (GB) | 0.05 | 0.05 | 0.05 | 0.05 |
| Est. Total Run Folder Size (GB, with bcl, gz, bam) | 2.9 | 14.5 | 5.6 | 27.9 |
| Est. Total storable runs on instrument (bcl, gz, bam) | 477 | 96 | 249 | 50 |
| Est. Total Run Folder Size (GB, with bcl, ora, cram) | 1.4 | 7.0 | 2.7 | 13.5 |
| Est. Total storable runs on instrument (bcl, ora, cram) | 972 | 200 | 512 | 103 |
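The run folder totals in the table are approximately the sum of the BCL size, one FASTQ format, one alignment format, and the other secondary files. The sketch below reproduces that sum for the 5M 2x151 column and divides a storage capacity by it. The capacity value is a hypothetical placeholder, not the instrument's specification, and the helper names are illustrative. Because the table values are rounded, results can differ slightly from the published totals.

```python
# Per-run file sizes in GB for the 5M flow cell, 2x151 run (from the table above).
SIZES_GB = {
    "bcl": 0.8,
    "fastq_gz": 1.1,
    "fastq_ora": 0.3,
    "bam": 0.9,
    "cram": 0.3,
    "other": 0.05,
}


def run_folder_gb(sizes: dict, fastq_key: str, alignment_key: str) -> float:
    """Sum BCL, one FASTQ format, one alignment format, and other files."""
    return sizes["bcl"] + sizes[fastq_key] + sizes[alignment_key] + sizes["other"]


def storable_runs(capacity_gb: float, folder_gb: float) -> int:
    """Number of whole run folders that fit in the given capacity."""
    return int(capacity_gb // folder_gb)


# ASSUMPTION: 1000 GB is a placeholder capacity chosen for illustration only.
HYPOTHETICAL_CAPACITY_GB = 1000.0

gz_bam = run_folder_gb(SIZES_GB, "fastq_gz", "bam")
ora_cram = run_folder_gb(SIZES_GB, "fastq_ora", "cram")
print(f"bcl + gz + bam:   {gz_bam:.2f} GB per run")
print(f"bcl + ora + cram: {ora_cram:.2f} GB per run")
print(storable_runs(HYPOTHETICAL_CAPACITY_GB, gz_bam), "runs fit (gz, bam)")
print(storable_runs(HYPOTHETICAL_CAPACITY_GB, ora_cram), "runs fit (ora, cram)")
```

Switching from gz/bam to ora/cram output roughly halves the per-run footprint, which is why the storable run counts in the table roughly double.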
Key Insights
*The theoretical maximum read count assumes that 85% of reads pass filter. It is an approximation of the maximum output per flow cell and is not a specification or an indication of guaranteed performance.
For further details, see the MiSeq i100 Specifications support page.
BCL File Sizes: The BCL size is the sum of all .cbcl files within the /Data directory. It is proportional to the number of reads generated and the read length.
For the smallest configuration (5M flow cell, 2x151), BCL disk usage is minimal at ~0.84 GB.
For the largest configuration (25M flow cell, 2x301), BCL disk usage rises to ~8.1 GB.
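To compare an actual run against these estimates, the .cbcl files under the run's Data directory can be summed directly. The following is a minimal sketch, assuming the run folder is readable from the machine running the script and that sizes are reported in decimal gigabytes (1 GB = 10^9 bytes). The function name and the example path are hypothetical.

```python
from pathlib import Path


def total_cbcl_bytes(run_folder: Path) -> int:
    """Sum the sizes of all .cbcl files found anywhere under <run_folder>/Data.

    Returns 0 if the Data directory does not exist.
    """
    data_dir = run_folder / "Data"
    if not data_dir.is_dir():
        return 0
    return sum(p.stat().st_size for p in data_dir.rglob("*.cbcl") if p.is_file())


# Hypothetical run folder path; replace with a real one.
run = Path("/path/to/run_folder")
print(f"Total BCL size: {total_cbcl_bytes(run) / 1e9:.2f} GB")
```

A result well above the table value for the same configuration may indicate a run that exceeded the theoretical maximum read assumption or a different cycle count.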
Compressed FASTQ Sizes:
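FASTQ files are the compressed output of sequencing reads. In GZIP (.gz) format, their sizes are generally 25-30% larger than the corresponding BCL files.
FASTQ file size varies considerably with the compression format: DRAGEN ORA (.ora) files are roughly a quarter the size of the GZIP (.gz) files for the same run (for example, 1.3 GB versus 5.5 GB for a 25M 2x151 run).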
Planning Recommendations:
Ensure adequate storage for both raw (BCL) and processed (FASTQ) files.
If further Secondary Analysis is performed, ensure that there is sufficient space for the .bam or .cram analysis files depending on workflow configuration.
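The recommendations above can be turned into a quick pre-run check: add up the estimated run folder sizes for a batch of planned runs and compare the total with the free space on the target volume. This is a sketch only. The folder sizes are copied from the table in this guide, while the profile names, function names, and the path passed to `shutil.disk_usage` are assumptions made for illustration.

```python
import shutil

# Estimated total run folder sizes in GB, keyed by (flow cell, run type).
FOLDER_GB = {
    "gz_bam": {
        ("5M", "2x151"): 2.9, ("25M", "2x151"): 14.5,
        ("5M", "2x301"): 5.6, ("25M", "2x301"): 27.9,
    },
    "ora_cram": {
        ("5M", "2x151"): 1.4, ("25M", "2x151"): 7.0,
        ("5M", "2x301"): 2.7, ("25M", "2x301"): 13.5,
    },
}


def required_gb(planned_runs, profile: str) -> float:
    """Total estimated storage for a list of (flow cell, run type) tuples."""
    return sum(FOLDER_GB[profile][run] for run in planned_runs)


def fits(planned_runs, profile: str, path: str = ".") -> bool:
    """True if the estimate fits in the free space of the volume holding `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9  # decimal GB assumed
    return required_gb(planned_runs, profile) <= free_gb


planned = [("5M", "2x151"), ("25M", "2x301"), ("25M", "2x301")]
for profile in FOLDER_GB:
    need = required_gb(planned, profile)
    print(f"{profile}: {need:.1f} GB needed, fits here: {fits(planned, profile)}")
```

Leaving some headroom beyond the estimate is prudent, since the table values are approximations rather than guaranteed sizes.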
Conclusion
This guide provides estimates for disk space usage on the MiSeq i100 Series, helping labs optimize storage requirements and ensure smooth sequencing operations. For specific workflows or extended configurations, additional considerations may apply.
For any feedback or questions regarding this article (Illumina Knowledge Article #9356), contact Illumina Technical Support at techsupport@illumina.com.