Estimating Disk Space Utilization on the MiSeq i100 Series

Background

This guide provides disk space usage estimates for sequencing runs on the MiSeq i100 Series. The calculations are based on key parameters such as flow cell size, run type, read length, and data quality. The estimates cover BCL (base call) files, compressed FASTQ files, and BAM or CRAM secondary analysis files, to support storage planning for sequencing projects.

Key Parameters

The estimates in this guide cover the following flow cell and run configurations.

  1. Flow Cell Types

  • 5M

  • 25M

  2. Run Types and Read Lengths

  • 2x151: Paired-end reads with 10 bp dual indexes, 322 cycles in total (2 x 151 + 2 x 10).

  • 2x301: Paired-end reads with 10 bp dual indexes, 622 cycles in total (2 x 301 + 2 x 10).
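The cycle totals above are the read cycles plus the index cycles. A minimal sketch of that arithmetic follows; the helper name `total_cycles` and its parameters are illustrative and not part of any Illumina software.

```python
def total_cycles(read_length: int, index_length: int = 10,
                 paired_end: bool = True, dual_index: bool = True) -> int:
    """Total sequencing cycles = read cycles + index cycles."""
    n_reads = 2 if paired_end else 1
    n_indexes = 2 if dual_index else 1
    return n_reads * read_length + n_indexes * index_length


print(total_cycles(151))  # 322
print(total_cycles(301))  # 622
```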

Disk Usage Estimates

The table below summarizes the estimated disk usage for different sequencing configurations on the MiSeq i100 Series:

| Flow cell                                            | 5M    | 25M   | 5M    | 25M   |
|------------------------------------------------------|-------|-------|-------|-------|
| Run type                                             | 2x151 | 2x151 | 2x301 | 2x301 |
| Total Run Length (cycles)                            | 322   | 322   | 622   | 622   |
| Reads at Specification (M Reads)                     | 5     | 25    | 5     | 25    |
| Theoretical Max Reads (M Reads)*                     | 6.5   | 32.7  | 6.5   | 32.7  |
| Estimated Max Gbases                                 | 2.1   | 10.5  | 4.1   | 20.3  |
| Est. Max Total BCL size (GB)                         | 0.8   | 4.2   | 1.6   | 8.1   |
| Est. Secondary File Sizes:                           |       |       |       |       |
| - fastq.gz size (GB)                                 | 1.1   | 5.5   | 2.1   | 10.6  |
| - fastq.ora size (GB)                                | 0.3   | 1.3   | 0.5   | 2.4   |
| - BAM size (GB)                                      | 0.9   | 4.7   | 1.8   | 9.2   |
| - CRAM size (GB)                                     | 0.3   | 1.5   | 0.6   | 2.8   |
| - Other Sec files (GB)                               | 0.05  | 0.05  | 0.05  | 0.05  |
| Est. Total Run Folder Size (in GB w/ bcl, gz, bam)   | 2.9   | 14.5  | 5.6   | 27.9  |
| Est. Total storable runs on instrument (gz, bam)     | 477   | 96    | 249   | 50    |
| Est. Total Run Folder Size (in GB w/ bcl, ora, cram) | 1.4   | 7.0   | 2.7   | 13.5  |
| Est. Total storable runs on instrument (ora, cram)   | 972   | 200   | 512   | 103   |
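The derived rows of the table can be approximately reproduced from its other rows. The sketch below assumes that maximum Gbases is the theoretical maximum reads multiplied by the total cycle count, and that each run folder total is the sum of the BCL, FASTQ, alignment, and other file sizes. Both relationships are inferred from the published figures rather than stated by Illumina, and the function names are illustrative. Because the inputs are rounded to one decimal place, results can differ from the table by 0.1 to 0.2 GB.

```python
# Values copied from the table above (sizes in GB, reads in millions).
TABLE = {
    ("5M", "2x151"): dict(max_reads_m=6.5, cycles=322, bcl=0.8, fastq_gz=1.1,
                          fastq_ora=0.3, bam=0.9, cram=0.3, other=0.05),
    ("25M", "2x151"): dict(max_reads_m=32.7, cycles=322, bcl=4.2, fastq_gz=5.5,
                           fastq_ora=1.3, bam=4.7, cram=1.5, other=0.05),
    ("5M", "2x301"): dict(max_reads_m=6.5, cycles=622, bcl=1.6, fastq_gz=2.1,
                          fastq_ora=0.5, bam=1.8, cram=0.6, other=0.05),
    ("25M", "2x301"): dict(max_reads_m=32.7, cycles=622, bcl=8.1, fastq_gz=10.6,
                           fastq_ora=2.4, bam=9.2, cram=2.8, other=0.05),
}


def max_gbases(row):
    """Millions of reads x cycles per read, converted to gigabases (assumed relationship)."""
    return row["max_reads_m"] * row["cycles"] / 1000


def run_folder_gb(row, fastq="fastq_gz", alignment="bam"):
    """Sum of BCL, one FASTQ format, one alignment format, and other files."""
    return row["bcl"] + row[fastq] + row[alignment] + row["other"]


for config, row in TABLE.items():
    print(config,
          round(max_gbases(row), 2),
          round(run_folder_gb(row), 2),                       # bcl + gz + bam
          round(run_folder_gb(row, "fastq_ora", "cram"), 2))  # bcl + ora + cram
```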

Key Insights

  1. Theoretical Max Reads (marked * in the table): The theoretical maximum assumes that 85% of reads pass filter. It approximates the maximum output per flow cell and is not a specification or a guarantee of performance.

    1. For further details, see the MiSeq i100 Specifications support page.

  2. BCL File Sizes: The total BCL size is the sum of all .cbcl files within the /Data directory. It is proportional to the number of reads generated and to the read length.

    1. For smaller runs (5M flow cell), disk usage is minimal at ~0.84 GB for 2x151.

    2. For larger runs with longer read lengths (25M flow cell), usage rises to ~8.1 GB for 2x301.

  3. Compressed FASTQ Sizes:

    1. FASTQ files hold the sequencing reads in compressed form. In GZIP (.gz) format, their sizes are generally 25-30% larger than the corresponding BCL files.

    2. FASTQ file size varies considerably with the compression format. In the table above, DRAGEN ORA (.ora) files are roughly a quarter of the size of the equivalent GZIP (.gz) files.

  4. Planning Recommendations:

    1. Ensure adequate storage for both raw (BCL) and processed (FASTQ) files.

    2. If further Secondary Analysis is performed, ensure that there is sufficient space for the .bam or .cram analysis files depending on workflow configuration.
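As a planning aid, the run folder totals from the table can be used to check whether a planned set of runs fits in the available storage. This is a rough sketch only: the function names are illustrative, and the free space value is supplied by the user because this guide does not state the instrument's usable disk capacity.

```python
import math

# Est. total run folder sizes (GB) copied from the table above.
RUN_FOLDER_GB = {
    ("5M", "2x151"): {"gz_bam": 2.9, "ora_cram": 1.4},
    ("25M", "2x151"): {"gz_bam": 14.5, "ora_cram": 7.0},
    ("5M", "2x301"): {"gz_bam": 5.6, "ora_cram": 2.7},
    ("25M", "2x301"): {"gz_bam": 27.9, "ora_cram": 13.5},
}


def storage_needed_gb(plan, output="gz_bam"):
    """Total GB for a plan mapping (flow cell, run type) -> number of runs."""
    return sum(RUN_FOLDER_GB[config][output] * n_runs
               for config, n_runs in plan.items())


def runs_that_fit(free_gb, flow_cell, run_type, output="gz_bam"):
    """Whole number of runs of one configuration that fit in free_gb."""
    return math.floor(free_gb / RUN_FOLDER_GB[(flow_cell, run_type)][output])


# Hypothetical plan: ten 25M 2x301 runs and twenty 5M 2x151 runs.
plan = {("25M", "2x301"): 10, ("5M", "2x151"): 20}
print(round(storage_needed_gb(plan), 1))              # gz + bam outputs
print(round(storage_needed_gb(plan, "ora_cram"), 1))  # ora + cram outputs
print(runs_that_fit(500, "25M", "2x301"))             # assumes 500 GB free
```

Choosing ORA and CRAM outputs roughly halves the estimated run folder size in every configuration in the table, so the output format is worth settling before sizing storage.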

Conclusion

This guide provides estimates for disk space usage on the MiSeq i100 Series, helping labs optimize storage requirements and ensure smooth sequencing operations. For specific workflows or extended configurations, additional considerations may apply.

For any feedback or questions regarding this article (Illumina Knowledge Article #9356), contact Illumina Technical Support at techsupport@illumina.com.


© 2023 Illumina, Inc. All rights reserved. All trademarks are the property of Illumina, Inc. or their respective owners. Trademark information: illumina.com/company/legal.html. Privacy policy: illumina.com/company/legal/privacy.html