Upgrading from bcl2fastq to BCL Convert
The Illumina BCL Convert software is a standalone local Linux application that converts the binary base call (BCL) files produced by Illumina sequencing systems to FASTQ files. Derived from software for the Illumina DRAGEN Bio-IT Platform, BCL Convert handles large data sets faster and more efficiently than the older bcl2fastq software. Both bcl2fastq and BCL Convert are currently supported, but BCL Convert is planned to replace bcl2fastq in the future. The BCL Convert compatibility support page provides a broad comparison between the two programs. This article provides a detailed comparison of usage and feature changes between the latest release of bcl2fastq (v2.20) and BCL Convert.
Feature | bcl2fastq 2.20 | BCL Convert | Changes from bcl2fastq |
---|---|---|---|
File name | <Sample_Name>_S#_L00#_<R or I>#_00#.fastq.gz OR <Sample_ID>_S#_L00#_R#_001.fastq.gz (if Sample_Name is not present) | <Sample_ID>_S#_L00#_<R# or I#>_001.fastq.gz | Sample_ID is always part of the output file name. Sample_Name is ignored unless it is enabled with --sample-name-column-enabled (version 4.0 and later). |
FASTQ Header | @Instrument:RunID:FlowCellID:Lane:Tile:X:Y[:UMI] ReadNum:FilterFlag:0:IndexSequence or SampleNumber | @Instrument:RunID:FlowCellID:Lane:Tile:X:Y[:UMI] ReadNum:N:0:IndexSequence or SampleNumber | The filter flag is always set to N and the control bits to 0. A missing instrument name is not supported. |
Determine Expected Input Files | Expects Config.xml in defined input folder: <input folder>/Data/Intensities/BaseCalls If no Config.xml, expects RunInfo.xml in defined input folder. | Expects Config.xml in defined input folder: <input folder>/Data/Intensities/BaseCalls If no Config.xml, expects RunInfo.xml in defined input folder. | No change. |
Show command line options and help information | --help or -h; --version or -v | --help or -h; --version or -V | Use an uppercase -V to display version information. |
Run Folder | Command Line: --runfolder-dir or -R | Command Line: --bcl-input-directory | The input is the same (the top level of the run folder); only the command line option has changed. |
Input Folder | Command Line: --input-dir or -i | None | The path to the BaseCalls folder cannot be specified directly. |
Output Folder | Command Line: --output-dir or -o (default input folder) | Command Line: --output-directory (required; by default the directory must not already exist) Command Line: --force, -f (allows output to be written to an existing folder) | The output directory is required. Specify a path for a new folder, or use --force to write to an existing folder. |
Sample Sheet Format | V1 format only. | V1 and V2 formats both accepted. | See software guide for examples of V1 and V2 formatting changes. |
Sample Sheet Path | Command Line: --sample-sheet (default input folder, sample sheet not required) | Command Line: --sample-sheet (default input folder, sample sheet required) | The sample sheet is now required. By default, the software searches for SampleSheet.csv in the input run folder; use --sample-sheet to specify the path if the file is elsewhere. Note: At least one Sample_ID is required in the Data section of the sample sheet. |
Ignore missing base call (BCL) files | Command Line: --ignore-missing-bcls (default off) | Command Line: --strict-mode false (default false) | Missing or corrupt BCL files are ignored, and the corresponding base calls are replaced with N with a quality score of 2 (#). |
Ignore missing or corrupt Filter files | Command Line: --ignore-missing-filter (default off). Assumes PF for all tiles with missing filter files. | Command Line: --strict-mode false (default false). Note the behavior change: no FASTQ entries are written for reads in tiles with missing filter files. | BCL Convert does not produce FASTQ entries for any reads whose filter file is missing. |
Ignore missing or corrupt Position files | Command Line: --ignore-missing-positions (default off) | Command Line: --strict-mode false (default false) | FASTQ file header will contain automatically generated unique XY positions when position files are missing. |
Assume that failed reads are PF | Command Line: --with-failed-reads (default off) | None | No longer supported. |
Ignore beginning or end of read | Sample Sheet: Read1StartFromCycle # (default 1) Read2StartFromCycle # (default 1) Read1EndWithCycle # (default last cycle) Read2EndWithCycle # (default last cycle) Command Line: --use-bases-mask Y#;N# (default all cycles used) | Sample Sheet: OverrideCycles,Y#;N# (default all cycles used) | OverrideCycles can only be applied to the entire analysis; there is no per-lane option for OverrideCycles. Cycles cannot be ignored from the middle of a read. |
Use a subset of bases for i7/i5 indexes | Sample Sheet: Use subset of index cycles for demultiplexing by providing shortened sequence in index or index2 column within a lane. Command Line: --use-bases-mask I#N# (default use all index cycles defined in RunInfo.xml). | Sample Sheet: Use subset of index cycles for demultiplexing by providing shortened sequence in index or index2 column and providing desired length in OverrideCycles setting (default use all index cycles defined in RunInfo.xml) | The number of cycles defined per read in OverrideCycles must always match the number of cycles in the corresponding read of the RunInfo.xml. |
Wildcard Index Sequences | None | None | Wildcard entries (N) for indexes are not supported. |
Index FASTQs | Sample Sheet: CreateFastqForIndexReads 0 or 1 (default 0) Command Line: --create-fastq-for-index-reads | Sample Sheet: CreateFastqForIndexReads,0 or 1 (default 0) | Generating FASTQs for index reads is off by default; add the sample sheet setting with a value of 1 to enable it. When an index read is specified as a UMI with OverrideCycles, the UMI read is output to a FASTQ file. This feature was introduced in BCL Convert v3.7.5. |
FASTQ Compression Specification | Command Line: --no-bgzf-compression --fastq-compression-level | Command Line: --fastq-compression-level (default 1, can specify 0-9) | FASTQ files can be gzip-compressed at levels 0-9. Multiple gzip compression regions with large block sizes are appended to the same file. Some tools may not fully decompress these files if they do not continue past the first gzip region. This feature was introduced in version 4.0. |
Barcode Mismatches | Command Line: --barcode-mismatches # or #,# (default 1 == 1,1) | Sample Sheet: BarcodeMismatchesIndex1,# (default 1) BarcodeMismatchesIndex2,# (default 1) Note: Command Line no longer supported. | Mismatches for Index 1 and Index 2 must be specified separately. |
Barcode Collision Checks | For a dual index run, requires both index1 and index2 to have a collision in order to error out based on a barcode collision. | Depending on version, index collision checks are relaxed, strict, or configurable: CombinedIndexCollisionCheck,value where value is the number of the lane or lanes that use this behavior (separate multiple lanes with semicolons). IndependentIndexCollisionCheck,value enables optional strict checking, where value is the number of the lane or lanes that use this behavior (separate multiple lanes with semicolons). | Evaluation of Hamming distance for barcode collisions differs between versions. "Relaxed" mode matches the default bcl2fastq behavior. |
Adapter Read 1, 2 Trimming | Sample Sheet: Adapter/TrimAdapter, A/T/C/G AdapterRead2/TrimAdapterRead2,A/T/C/G | Sample Sheet: AdapterRead1,A/T/C/G AdapterRead2,A/T/C/G AdapterBehavior,trim (default trim) | Read 1 and Read 2 adapters must be specified separately. |
Adapter Read 1, 2 Masking | Sample Sheet: MaskAdapter A/T/C/G MaskAdapterRead2 A/T/C/G | Sample Sheet: AdapterRead1,A/T/C/G AdapterRead2,A/T/C/G AdapterBehavior,mask (default trim) | Read 1 and Read 2 adapters must be specified separately. |
Adapter Stringency | Command Line: --adapter-stringency # (default 0.9, 0.0-1.0 allowed) | Sample Sheet: AdapterStringency,# (default 0.9, 0.5-1.0 allowed) Note: Command Line no longer supported. | The allowed range is now 0.5-1.0 instead of 0.0-1.0. |
Adapter Matching Algorithm | Sample Sheet in [Settings] section: FindAdaptersWithIndels 1 (default on, 0 = Sliding Window) | Sample Sheet: FindAdaptersWithIndels,true (default off, false = Sliding Window) | Indel-aware adapter finding was introduced in version 4.0. |
Trimming last bases when they match the adapter | Always trims or masks the final X bases when they overlap with the adapter provided according to stringency settings. | Sample Sheet: MinimumAdapterOverlap,# (default 1, 1-3 allowed) Never trims or masks fewer than X bases when they overlap with the adapter provided, regardless of stringency settings, where X is the MinimumAdapterOverlap value. | Default behavior is identical. |
Minimum Read Length | Command Line: --minimum-trimmed-read-length # (default 35) | Sample Sheet: MinimumTrimmedReadLength,# (default 35) Note: Command Line no longer supported. | Part of sample sheet. |
Minimum Number of ATCG Bases per Read | Command Line: --mask-short-adapter-reads # (default 22) | Sample Sheet: MaskShortReads,# (default 22) Note: Command Line no longer supported. | Part of sample sheet. |
UMI Settings | Sample Sheet: TrimUMI 0 or 1 (default 0) Read1UMIStartFromCycle # (default 1) Read2UMIStartFromCycle # (default 1) Read1UMILength # Read2UMILength # | Sample Sheet: OverrideCycles,U# TrimUMI,0 or 1 (default 1) | UMIs can now be defined in index or genomic reads. UMIs are trimmed by default. The TrimUMI option was introduced in BCL Convert v3.7.5. |
Use Subset of Tiles for Processing | Command Line: --tiles (provide list of tiles to include) | Command Line: --tiles (provide list of tiles to include) | --tiles was introduced in BCL Convert v3.9. All tiles specified must also exist in RunInfo.xml; however, if tiles are specified as a range, BCL Convert analyzes only the tiles that exist in RunInfo.xml. |
Exclude Tiles from processing | Sample Sheet: ExcludeTiles #: Provide list or range of tiles to exclude from processing (default no tiles) ExcludeTilesLaneX: Provide list or range of tiles in lane X to exclude from processing (default no tiles) | Command Line: --first-tile-only true (default false) or --exclude-tiles (provide list of tiles to exclude) | For testing purposes, BCL Convert can process only the first tile with --first-tile-only true; this option is not compatible with NovaSeq SP flow cells. --exclude-tiles was introduced in BCL Convert v3.9. All tiles specified for --exclude-tiles must exist in the set defined by --tiles, or in RunInfo.xml if --tiles is not used. |
Logging | Console output | Console output. Warning, error, and information log files in <output_directory>/Logs. FastqComplete.txt is written to <output_directory>/Logs after all FASTQs are created. | New support for log files, with less verbose logging. New output file: FastqComplete.txt is generated in the Logs folder. |
Association of Samples and output FASTQ files | None | Written to fastq_list.csv in <output_directory>/Reports | A new report associating samples with their output FASTQ files is generated. |
Combine multiple FASTQ files | Command Line: --no-lane-splitting (default off) | Command Line: --no-lane-splitting (default off) Sample sheet: NoLaneSplitting,true or false (default false) | Enabling this setting concatenates FASTQ files that would otherwise be separated by lane. FASTQs are output with the naming convention <Sample_ID>_S#_<R or I>#_001.fastq.gz (no L00# field). Reports are generated with values separated by lane. The command line option was introduced in BCL Convert v3.7.5 and the sample sheet setting in v3.8. Command line and sample sheet settings must be consistent. |
Reverse Complement all reads | Sample Sheet: ReverseComplement 1 (default 0) | None | Impacts Nextera Mate Pair kits, which are not supported by BCL Convert. |
Sample Project | Sample Sheet: Creates directory with sample project name. Can use multiple samples in same project. Cannot use “all” or “default” as project name. | Command Line: --bcl-sampleproject-subdirectories true (default false) Sample Sheet: Sample_Project column in Data section. | By default, all FASTQ files are placed in the same output directory regardless of sample sheet columns. The command line option must be set to generate subdirectories. |
Sample Name | Sample Sheet: Used for FASTQ name. Cannot use “all” or “undetermined” as name. | Command Line: --sample-name-column-enabled true (default false) Sample Sheet: Sample_Name column in Data section | By default, FASTQ files are named according to the Sample_ID column unless Sample_Name is enabled, which requires the command line option. When enabled, Sample_Name must be specified for every sample in the sample sheet, and reports include Sample_Name. When used with Sample_Project, FASTQ files are output to subdirectories named by Sample_ID. This feature was introduced in version 4.0. |
IndexMetricsOut.bin Output Location | Command Line: --interop-dir (default <runfolder-dir>/InterOp) | Always output to <output_directory>/Reports. | User cannot configure IndexMetricsOut.bin output location. |
Number of perfect barcodes, 1 mismatch barcodes | Provided in DemultiplexingStats.xml and HTML report. | Provided in Demultiplex_Stats.csv (<output_directory>/Reports) | HTML demultiplexing reports are not produced. |
Unknown Barcodes | Provided in AdapterTrimming.txt, DemultiplexingStats.xml, DemuxSummaryF#L#.txt, HTML report. | Reported in Top_Unknown_Barcodes.csv (default top 1000 per lane). Command Line: --num-unknown-barcodes-reported # (number of unknown barcodes to report in Top_Unknown_Barcodes.csv; any value 0 or above, or “all”); this option was introduced in v3.10. | AdapterTrimming.txt is not generated. |
Adapter Trimming Metrics | Provided in AdapterTrimming.txt | Provided in Adapter_Metrics.csv | |
Lane-Specific Processing | Define only the desired lanes in the sample sheet. | 1. Sample Sheet: define only the desired lanes. 2. Command Line: --bcl-only-lane # (default all lanes in sample sheet). | The sample sheet change and the command line option are both required. |
Processing Options | Command Line: --loading-threads --processing-threads --writing-threads | Command Line: --bcl-num-decompression-threads --bcl-conversion-threads --bcl-num-compression-threads --bcl-num-parallel-tiles | Defaults are set dynamically. These options were introduced in BCL Convert v3.7.5. |
No Sample Sheet | A sample sheet was not required; without one, the software output all reads to Undetermined FASTQ files. | Command Line: | All samples go to Undetermined FASTQ files. Cannot be specified with any of the following options: This feature was introduced in version 4.0. |
Per-sample settings | N/A (all settings are global) | Settings can be either global or per-sample. A setting used per sample cannot also be set globally. Per-sample settings are added as columns to the BCLConvert_Data section of the sample sheet and must have a value for every sample when used. Available settings are: OverrideCycles, BarcodeMismatchesIndex1, BarcodeMismatchesIndex2, AdapterRead1, AdapterRead2, AdapterBehavior, AdapterStringency | This option was introduced in BCL Convert v4.1.5. |
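
The File name and Combine multiple FASTQ files rows describe the BCL Convert naming pattern. The sketch below assembles a name from its parts. The function name `bclconvert_fastq_name` is hypothetical, and it assumes the caller already knows the sample's S# number. Passing `lane=None` mimics the `--no-lane-splitting` layout, which has no `L00#` field.

```python
from typing import Optional


def bclconvert_fastq_name(sample_id: str, sample_number: int, read_type: str,
                          read_number: int, lane: Optional[int] = None) -> str:
    """Build a FASTQ name in the BCL Convert style (hypothetical helper).

    sample_number is the S# value and read_type is "R" or "I".
    lane=None mimics --no-lane-splitting output, which omits the L00# field.
    """
    if read_type not in ("R", "I"):
        raise ValueError("read_type must be 'R' or 'I'")
    parts = [sample_id, f"S{sample_number}"]
    if lane is not None:
        parts.append(f"L{lane:03d}")  # lane 2 -> L002
    parts.append(f"{read_type}{read_number}")
    parts.append("001")  # fixed trailing field, as in the table's template
    return "_".join(parts) + ".fastq.gz"


print(bclconvert_fastq_name("Tumor-01", 1, "R", 1, lane=2))  # Tumor-01_S1_L002_R1_001.fastq.gz
print(bclconvert_fastq_name("Tumor-01", 1, "I", 2))          # Tumor-01_S1_I2_001.fastq.gz
```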
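
Scripts that read FASTQ headers can split the format in the FASTQ Header row as follows. This is a minimal sketch that assumes a single space separates the two halves of the header and that the optional UMI is the eighth colon-separated field. `FastqHeader` and `parse_header` are hypothetical names, and the example values are made up.

```python
from typing import NamedTuple, Optional


class FastqHeader(NamedTuple):
    instrument: str
    run_id: str
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int
    umi: Optional[str]
    read_num: int
    filter_flag: str   # always "N" in BCL Convert output
    control_bits: int  # always 0 in BCL Convert output
    index: str         # index sequence(s) or sample number


def parse_header(line: str) -> FastqHeader:
    """Split a header of the form @Instrument:RunID:...:Y[:UMI] ReadNum:N:0:Index."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    left, right = line[1:].rstrip("\n").split(" ", 1)
    fields = left.split(":")
    if len(fields) not in (7, 8):  # the 8th field is the optional UMI
        raise ValueError(f"expected 7 or 8 ':'-separated fields, got {len(fields)}")
    umi = fields[7] if len(fields) == 8 else None
    read_num, filter_flag, control_bits, index = right.split(":", 3)
    return FastqHeader(
        fields[0], fields[1], fields[2],
        int(fields[3]), int(fields[4]), int(fields[5]), int(fields[6]),
        umi, int(read_num), filter_flag, int(control_bits), index,
    )


h = parse_header("@A00123:45:HABCDEFXX:1:1101:1000:2000 1:N:0:ACGTACGT+TTGGCCAA")
print(h.lane, h.tile, h.umi, h.index)  # 1 1101 None ACGTACGT+TTGGCCAA
```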
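
Many rows in the table move a bcl2fastq command line option to one of three places: a renamed BCL Convert option, a sample sheet setting, or `--strict-mode false`. The hypothetical helper below applies only the mappings stated in this table to an existing bcl2fastq argument list and reports what still needs attention. It does not run anything, and it flags options it does not recognize rather than guessing a translation. It assumes the BCL Convert executable is invoked as `bcl-convert`.

```python
from typing import Dict, List, Tuple

# Renamed options taken from the comparison table (each takes one value).
RENAMED = {
    "--runfolder-dir": "--bcl-input-directory",
    "-R": "--bcl-input-directory",
    "--output-dir": "--output-directory",
    "-o": "--output-directory",
    "--sample-sheet": "--sample-sheet",
    "--tiles": "--tiles",
}
# bcl2fastq options that became BCL Convert sample sheet settings.
MOVED_TO_SAMPLE_SHEET = {
    "--adapter-stringency": "AdapterStringency",
    "--minimum-trimmed-read-length": "MinimumTrimmedReadLength",
    "--mask-short-adapter-reads": "MaskShortReads",
}
# Flags whose behavior is covered by --strict-mode false.
STRICT_MODE_FLAGS = {"--ignore-missing-bcls", "--ignore-missing-filter",
                     "--ignore-missing-positions"}
UNSUPPORTED_WITH_VALUE = {"--input-dir", "-i"}
UNSUPPORTED_FLAGS = {"--with-failed-reads"}


def migrate(args: List[str]) -> Tuple[List[str], Dict[str, str], List[str]]:
    """Return (bcl-convert argv, sample sheet settings, notes for manual review)."""
    argv = ["bcl-convert"]
    settings: Dict[str, str] = {}
    notes: List[str] = []
    i = 0
    while i < len(args):
        opt = args[i]
        if opt in RENAMED:
            argv += [RENAMED[opt], args[i + 1]]
            i += 2
        elif opt in MOVED_TO_SAMPLE_SHEET:
            settings[MOVED_TO_SAMPLE_SHEET[opt]] = args[i + 1]
            i += 2
        elif opt == "--barcode-mismatches":
            # "1" means 1,1; "0,1" sets index 1 and index 2 separately.
            values = args[i + 1].split(",")
            settings["BarcodeMismatchesIndex1"] = values[0]
            settings["BarcodeMismatchesIndex2"] = values[-1]
            i += 2
        elif opt in STRICT_MODE_FLAGS:
            # false is already the default; it is written out here for clarity.
            if "--strict-mode" not in argv:
                argv += ["--strict-mode", "false"]
            i += 1
        elif opt in UNSUPPORTED_WITH_VALUE:
            notes.append(f"{opt} has no BCL Convert equivalent")
            i += 2
        elif opt in UNSUPPORTED_FLAGS:
            notes.append(f"{opt} is no longer supported")
            i += 1
        else:
            # Unknown options (and any value they take) need manual review.
            notes.append(f"{opt} is not handled by this sketch")
            i += 1
    if "--output-directory" not in argv:
        notes.append("--output-directory is required by BCL Convert")
    return argv, settings, notes
```

The returned settings belong in the settings section of the sample sheet (for example `BarcodeMismatchesIndex1,0`), because the corresponding command line options are no longer supported. Because the output directory must not already exist by default, `--force` may also be needed when re-running into the same folder.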
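
BCL Convert requires a sample sheet with at least one Sample_ID in its Data section, so a pre-flight check can catch an empty sheet before a run is launched. This sketch makes two assumptions: the data section is headed `[Data]` (V1) or `[BCLConvert_Data]` (V2), and its first row names the columns. `sample_ids` and `require_samples` are hypothetical helpers, not part of BCL Convert.

```python
import csv
import io
from typing import List

# V1 sheets use [Data]; V2 sheets use [BCLConvert_Data].
DATA_SECTIONS = {"[Data]", "[BCLConvert_Data]"}


def sample_ids(sample_sheet_text: str) -> List[str]:
    """Collect Sample_ID values from the data section of a sample sheet."""
    ids: List[str] = []
    in_data = False
    header = None
    for row in csv.reader(io.StringIO(sample_sheet_text)):
        first = row[0].strip() if row else ""
        if first.startswith("["):  # a new [Section] begins
            in_data = first in DATA_SECTIONS
            header = None
            continue
        if not in_data or not any(cell.strip() for cell in row):
            continue  # outside the data section, or a blank line
        if header is None:
            header = [cell.strip() for cell in row]  # first row names the columns
            continue
        sample_id = dict(zip(header, row)).get("Sample_ID", "").strip()
        if sample_id:
            ids.append(sample_id)
    return ids


def require_samples(sample_sheet_text: str) -> List[str]:
    """Raise if the sheet would be rejected for having no Sample_ID."""
    ids = sample_ids(sample_sheet_text)
    if not ids:
        raise ValueError("no Sample_ID found in the data section")
    return ids
```

To check a file on disk, pass its contents, for example `require_samples(open("SampleSheet.csv", newline="").read())`.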
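
The OverrideCycles rows make two points. Each read's cycle count must match the corresponding read in RunInfo.xml, and a shortened index is expressed by masking the remaining cycles. A minimal validator is sketched below. It assumes reads are separated by semicolons and that each segment is Y, I, U, or N followed by a cycle count. Any other syntax BCL Convert may accept is not handled, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

READ_PATTERN = re.compile(r"(?:[YINU]\d+)+")
SEGMENT = re.compile(r"([YINU])(\d+)")


def parse_override_cycles(value: str) -> List[List[Tuple[str, int]]]:
    """Split e.g. "Y151;I8N2;I8;Y151" into per-read (type, cycles) segments."""
    reads = []
    for read in value.split(";"):
        if not READ_PATTERN.fullmatch(read):
            raise ValueError(f"cannot parse read definition {read!r}")
        reads.append([(kind, int(n)) for kind, n in SEGMENT.findall(read)])
    return reads


def check_against_runinfo(value: str, runinfo_cycles: List[int]) -> None:
    """Raise if any read's cycle total differs from the RunInfo.xml read length."""
    reads = parse_override_cycles(value)
    if len(reads) != len(runinfo_cycles):
        raise ValueError(f"{len(reads)} reads defined, RunInfo.xml has {len(runinfo_cycles)}")
    for number, (segments, expected) in enumerate(zip(reads, runinfo_cycles), start=1):
        total = sum(n for _, n in segments)
        if total != expected:
            raise ValueError(f"read {number}: {total} cycles defined, RunInfo.xml has {expected}")
```

For example, consider a run whose RunInfo.xml defines 10-cycle index reads and a sample sheet with 8-base indexes. `Y151;I8N2;I8N2;Y151` passes. `Y151;I8;I8;Y151` is rejected, because each index read then defines only 8 of its 10 cycles.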
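
The FASTQ Compression row warns that output files consist of several gzip regions (members) appended together. The Python standard library illustrates the difference between a reader that stops after the first member and one that continues. The two-record input is made up, and this sketch does not reproduce BCL Convert's own block sizes.

```python
import gzip
import zlib


def first_member_only(data: bytes) -> bytes:
    """Decompress like a tool that stops after the first gzip member."""
    decoder = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header and trailer
    return decoder.decompress(data)  # bytes after member 1 are left in decoder.unused_data


def all_members(data: bytes) -> bytes:
    """gzip.decompress continues through every concatenated member."""
    return gzip.decompress(data)


# Two members appended to one file, each compressed at level 1.
data = (gzip.compress(b"@read1\nACGT\n+\nFFFF\n", compresslevel=1)
        + gzip.compress(b"@read2\nTTGG\n+\nFFFF\n", compresslevel=1))

print(first_member_only(data).count(b"@"))  # 1
print(all_members(data).count(b"@"))        # 2
```

If a downstream tool reports fewer reads than expected, check first whether it reads past the first gzip member.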
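
The Barcode Collision Checks row contrasts two behaviors. In the combined check, a collision is reported only when both indexes collide (the relaxed, bcl2fastq-compatible behavior). The alternative is optional strict, independent checking. The sketch below is an illustration under stated assumptions, not BCL Convert's implementation. It treats two barcodes as colliding when their Hamming distance is at most twice the allowed mismatches, so that their mismatch neighborhoods can overlap. It also interprets the independent mode as flagging a collision in either index. All names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("indexes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def collides(a: str, b: str, mismatches: int) -> bool:
    # Assumed rule: with m mismatches allowed, some read lies within m of
    # both barcodes exactly when their distance is at most 2 * m.
    return hamming(a, b) <= 2 * mismatches


def find_collisions(samples: List[Tuple[str, str, str]], mm1: int = 1, mm2: int = 1,
                    mode: str = "combined") -> List[Tuple[str, str]]:
    """samples holds (sample_id, index1, index2) tuples for one lane."""
    found = []
    for (id_a, a1, a2), (id_b, b1, b2) in combinations(samples, 2):
        c1 = collides(a1, b1, mm1)
        c2 = collides(a2, b2, mm2)
        if mode == "combined":
            hit = c1 and c2  # relaxed: both indexes must collide
        elif mode == "independent":
            hit = c1 or c2   # strict (assumed): a collision in either index counts
        else:
            raise ValueError(f"unknown mode {mode!r}")
        if hit:
            found.append((id_a, id_b))
    return found


samples = [("A", "ACGTACGT", "TTTTTTTT"), ("B", "ACGTACGA", "GGGGGGGG")]
print(find_collisions(samples))                      # []
print(find_collisions(samples, mode="independent"))  # [('A', 'B')]
```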
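
The adapter rows combine four settings: AdapterRead1/AdapterRead2, AdapterBehavior (trim or mask), AdapterStringency, and MinimumAdapterOverlap. To show how they interact, here is a simplified sliding-window matcher that ignores indels. It is not Illumina's algorithm. Treating stringency as the minimum fraction of matching bases in the overlap is an assumption, and the function names are hypothetical.

```python
from typing import Optional


def find_adapter_start(read: str, adapter: str, stringency: float = 0.9,
                       min_overlap: int = 1) -> Optional[int]:
    """Return the first read position where the adapter is judged to start."""
    for start in range(len(read)):
        overlap = min(len(read) - start, len(adapter))
        if overlap < min_overlap:
            break  # later start positions only give shorter overlaps
        window = read[start:start + overlap]
        matches = sum(r == a for r, a in zip(window, adapter))
        if matches / overlap >= stringency:
            return start
    return None


def apply_adapter(read: str, adapter: str, behavior: str = "trim",
                  stringency: float = 0.9, min_overlap: int = 1) -> str:
    start = find_adapter_start(read, adapter, stringency, min_overlap)
    if start is None:
        return read
    if behavior == "trim":
        return read[:start]
    if behavior == "mask":
        return read[:start] + "N" * (len(read) - start)  # keep length, mask with N
    raise ValueError(f"unknown behavior {behavior!r}")


ADAPTER = "AGATCGGAAGAGC"  # example adapter sequence
read = "TTTTTTTT" + "AGATCGGAAG"
print(apply_adapter(read, ADAPTER))                   # TTTTTTTT
print(apply_adapter(read, ADAPTER, behavior="mask"))  # TTTTTTTTNNNNNNNNNN
```

Raising `min_overlap` to 3 would stop a single trailing base that happens to match the adapter's first base from being trimmed. This mirrors the role MinimumAdapterOverlap plays in the table.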
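
The UMI Settings row defines UMIs with `U#` segments in OverrideCycles and trims them from the read by default (TrimUMI,1). Below is a minimal sketch of that split for a single read definition. It assumes N cycles are simply dropped and does not handle quality strings; `split_read` is a hypothetical name.

```python
import re
from typing import Tuple

SEGMENT = re.compile(r"([YINU])(\d+)")


def split_read(seq: str, read_definition: str, trim_umi: bool = True) -> Tuple[str, str]:
    """Return (sequence kept in the read, UMI) for one read definition such as "U8Y143"."""
    if not re.fullmatch(r"(?:[YINU]\d+)+", read_definition):
        raise ValueError(f"cannot parse {read_definition!r}")
    segments = [(kind, int(n)) for kind, n in SEGMENT.findall(read_definition)]
    if sum(n for _, n in segments) != len(seq):
        raise ValueError("read definition does not cover every cycle")
    kept, umi, pos = [], [], 0
    for kind, n in segments:
        chunk = seq[pos:pos + n]
        pos += n
        if kind == "U":
            umi.append(chunk)
            if not trim_umi:  # TrimUMI,0 leaves the UMI bases in the read
                kept.append(chunk)
        elif kind in ("Y", "I"):
            kept.append(chunk)
        # N cycles are dropped (assumption for this sketch)
    return "".join(kept), "".join(umi)


print(split_read("ACGTACGT" + "T" * 10, "U8Y10"))  # ('TTTTTTTTTT', 'ACGTACGT')
```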
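
The two tile rows describe how `--tiles` and `--exclude-tiles` narrow the set of tiles processed. The set logic can be sketched as follows. The sketch assumes tiles are given as explicit identifiers (ranges are not modeled) and uses made-up tile names; `tiles_to_process` is hypothetical.

```python
from typing import Iterable, List, Optional


def tiles_to_process(runinfo_tiles: Iterable[str], tiles: Optional[Iterable[str]] = None,
                     exclude: Optional[Iterable[str]] = None) -> List[str]:
    runinfo = set(runinfo_tiles)
    if tiles is not None:
        requested = set(tiles)
        unknown = requested - runinfo
        if unknown:  # every tile passed to --tiles must exist in RunInfo.xml
            raise ValueError(f"tiles not in RunInfo.xml: {sorted(unknown)}")
        base = requested
    else:
        base = runinfo
    excluded = set(exclude or ())
    if not excluded <= base:  # exclusions must come from the allowed set
        raise ValueError(f"excluded tiles outside the allowed set: {sorted(excluded - base)}")
    return sorted(base - excluded)


runinfo = ["1_1101", "1_1102", "1_1103"]
print(tiles_to_process(runinfo, exclude=["1_1102"]))  # ['1_1101', '1_1103']
```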
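
BCL Convert writes the association between samples and FASTQ files to fastq_list.csv in <output_directory>/Reports. The sketch below groups FASTQ paths by sample. It assumes the file has a header row with the columns RGSM (sample), Read1File, and Read2File, so check those names against your own output before relying on it. `fastqs_by_sample` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def fastqs_by_sample(fastq_list_text: str) -> Dict[str, List[str]]:
    """Group FASTQ paths from fastq_list.csv by sample (column names assumed)."""
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(fastq_list_text)):
        by_sample[row["RGSM"]].append(row["Read1File"])
        if row.get("Read2File"):  # empty for single-read runs
            by_sample[row["RGSM"]].append(row["Read2File"])
    return dict(by_sample)
```

To use it on real output, read the file with `open(path, newline="")` and pass its contents to the function.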
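
The Per-sample settings row states two rules. First, a setting used per sample cannot also be set globally. Second, a per-sample column must have a value for every sample. Below is a hypothetical check of those rules. It assumes the global settings and the BCLConvert_Data rows have already been parsed into dictionaries.

```python
from typing import Dict, List

# Per-sample settings listed in the table (BCL Convert v4.1.5 and later).
PER_SAMPLE_SETTINGS = {
    "OverrideCycles", "BarcodeMismatchesIndex1", "BarcodeMismatchesIndex2",
    "AdapterRead1", "AdapterRead2", "AdapterBehavior", "AdapterStringency",
}


def check_per_sample(global_settings: Dict[str, str],
                     data_rows: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means both rules are satisfied."""
    problems: List[str] = []
    columns = set().union(*(row.keys() for row in data_rows))
    for name in sorted(columns & PER_SAMPLE_SETTINGS):
        if name in global_settings:
            problems.append(f"{name} is set both globally and per sample")
        for row in data_rows:
            if not (row.get(name) or "").strip():  # missing or blank value
                problems.append(f"{name} missing for sample {row.get('Sample_ID')}")
    return problems
```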
For any feedback or questions regarding this article (Illumina Knowledge Article #3710), contact Illumina Technical Support at techsupport@illumina.com.