Upgrading from bcl2fastq to BCL Convert

The Illumina BCL Convert software is a standalone local Linux application that converts the binary base call (BCL) files produced by Illumina sequencing systems to FASTQ files. Based on software derived from the Illumina DRAGEN Bio-IT platform, BCL Convert offers improvements to the speed and efficiency of handling large data sets compared to the older bcl2fastq software. While both bcl2fastq and BCL Convert are currently supported, BCL Convert is planned to replace bcl2fastq in the future. The BCL Convert compatibility support page provides a broad comparison between the two programs. This article provides a detailed comparison of usage and feature changes between the latest release of bcl2fastq (v2.20) and BCL Convert.
Feature
bcl2fastq 2.20
BCL Convert
Changes from bcl2fastq
File name
<Sample_Name>_S#_L00#_<R or I>#_00#.fastq.gz OR <Sample_ID>_S#_L00#_R#_001.fastq.gz (if Sample_Name is not present)
<Sample_ID>_S#_L00#_<R# or I#>_001.fastq.gz
Always include Sample ID as part of file output naming convention. Sample name is ignored in BCL Convert.
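As an illustration of the BCL Convert naming pattern in this row, the following minimal sketch builds an expected FASTQ file name from its parts. The function name `bcl_convert_fastq_name` and the example values are hypothetical; only the pattern `<Sample_ID>_S#_L00#_<R# or I#>_001.fastq.gz` comes from the table.

```python
import re


def bcl_convert_fastq_name(sample_id: str, sample_number: int, lane: int, read: str) -> str:
    """Build a name following <Sample_ID>_S#_L00#_<R# or I#>_001.fastq.gz.

    `read` is a read label such as "R1", "R2", "I1" or "I2".
    """
    if not re.fullmatch(r"[RI][1-9]", read):
        raise ValueError(f"unexpected read label: {read!r}")
    # The lane is zero-padded to three digits (L001, L002, ...).
    return f"{sample_id}_S{sample_number}_L{lane:03d}_{read}_001.fastq.gz"


print(bcl_convert_fastq_name("NA12878", 1, 2, "R1"))
```

Because Sample_Name is ignored by default, a script that previously predicted bcl2fastq output names from Sample_Name would need to switch to Sample_ID.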
FASTQ Header
@Instrument:RunID:FlowCellID:Lane:Tile:X:Y[:UMI] ReadNum:FilterFlag:0:IndexSequence or SampleNumber
@Instrument:RunID:FlowCellID:Lane:Tile:X:Y[:UMI] ReadNum:N:0:IndexSequence or SampleNumber
The filter flag is always set to “N” and the control bit is always “0”. Missing instrument is not supported.
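The header layout in this row can be split into named fields with a short parser. This is a sketch only: the `FastqHeader` type, the function name, and the example header values are made up for illustration, and the parser assumes the exact field order shown in the table.

```python
from typing import NamedTuple, Optional


class FastqHeader(NamedTuple):
    instrument: str
    run_id: str
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int
    umi: Optional[str]
    read_num: int
    filter_flag: str
    control_bit: str
    index: str  # index sequence or sample number


def parse_fastq_header(line: str) -> FastqHeader:
    """Parse '@Instrument:RunID:FlowCellID:Lane:Tile:X:Y[:UMI] ReadNum:Flag:0:Index'."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    left, right = line[1:].rstrip("\n").split(" ", 1)
    fields = left.split(":")
    if len(fields) not in (7, 8):
        raise ValueError(f"expected 7 or 8 colon-separated fields, got {len(fields)}")
    umi = fields[7] if len(fields) == 8 else None  # the UMI field is optional
    read_num, filter_flag, control_bit, index = right.split(":", 3)
    return FastqHeader(fields[0], fields[1], fields[2], int(fields[3]), int(fields[4]),
                       int(fields[5]), int(fields[6]), umi, int(read_num),
                       filter_flag, control_bit, index)


header = parse_fastq_header("@A00123:45:HXXXXXXXX:1:1101:1000:2000 1:N:0:ACGTACGT+TGCATGCA")
print(header.filter_flag, header.control_bit, header.index)
```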
Determine Expected Input Files
Expects Config.xml in defined input folder: <input folder>/Data/Intensities/BaseCalls If no Config.xml, expects RunInfo.xml in defined input folder.
Expects Config.xml in defined input folder: <input folder>/Data/Intensities/BaseCalls If no Config.xml, expects RunInfo.xml in defined input folder.
No change.
Show command line options and help information
--help or -h --version or -v
--help or -h --version or -V
Command requires upper case letter “V” to get version info in the command line.
Run Folder
Command Line: --runfolder-dir or -R
Command Line: --bcl-input-directory
Input specified is the same (top level of run folder is the input), only the command line option has changed.
Input Folder
Command Line: --input-dir or -i
None
Cannot specify path to BaseCalls folder specifically.
Output Folder
Command Line: --output-dir or -o (default input folder)
Command Line: --output-directory (required; by default, the directory must not already exist) Command Line: --force, -f (allows output to be written to existing folder)
Output directory is required. Specify directory for new folder, otherwise, use --force to use an existing folder.
Sample Sheet Format
V1 format only.
V1 and V2 formats both accepted.
See software guide for examples of V1 and V2 formatting changes.
Sample Sheet Path
Command Line: --sample-sheet (default input folder, sample sheet not required)
Command Line: --sample-sheet (default input folder, sample sheet required)
Sample Sheet is now required. The software will default to search for SampleSheet.csv in input run folder. The --sample-sheet option is used to specify the path to the file if it is not in the default location. Note: At least one Sample_ID is required in the Data section of the Sample Sheet.
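The run folder, output folder, and sample sheet rows above can be combined into one invocation. The sketch below only assembles an argument list and does not run anything. It assumes the executable is named `bcl-convert`, which this article does not state; the helper name and the paths in the usage line are hypothetical, and only the four options come from the table.

```python
from typing import List, Optional


def bcl_convert_args(run_folder: str, output_dir: str,
                     sample_sheet: Optional[str] = None,
                     force: bool = False) -> List[str]:
    """Assemble a BCL Convert command line from the options listed in the table."""
    args = ["bcl-convert",  # assumed executable name
            "--bcl-input-directory", run_folder,   # replaces --runfolder-dir / -R
            "--output-directory", output_dir]      # required by BCL Convert
    if sample_sheet is not None:
        # Only needed when SampleSheet.csv is not in the run folder.
        args += ["--sample-sheet", sample_sheet]
    if force:
        # Allows writing into an output folder that already exists.
        args.append("--force")
    return args


print(" ".join(bcl_convert_args("/data/run1", "/data/run1_fastq")))
```

The resulting list could be passed to `subprocess.run` on a system where BCL Convert is installed.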
Ignore missing base call (BCL) files
Command Line: --ignore-missing-bcls (default off)
Command Line: --strict-mode false (default false)
Missing or corrupt BCL is ignored and the corresponding base call is replaced with an N with a quality score of 2 (#).
Ignore missing or corrupt Filter files
Command Line: --ignore-missing-filter (default off) Assumes PF for all tiles with missing filter files.
Command Line: --strict-mode false (default false) Note behavior change, no FASTQ entries for reads in tiles with missing filter files
BCL Convert does not produce FASTQ entries for any reads where the filter file is missing.
Ignore missing or corrupt Position files
Command Line: --ignore-missing-positions (default off)
Command Line: --strict-mode false (default false)
FASTQ file header will contain automatically generated unique XY positions when position files are missing.
Assume that failed reads are PF
Command Line: --with-failed-reads (default off)
None
No longer supported.
Ignore beginning or end of read
Sample Sheet: Read1StartFromCycle # (default 1) Read2StartFromCycle # (default 1) Read1EndWithCycle # (default last cycle) Read2EndWithCycle # (default last cycle) Command Line: --use-bases-mask Y#;N# (default all cycles used)
Sample Sheet: OverrideCycles,Y#;N# (default all cycles used)
OverrideCycles can only be applied to the entire analysis; there is no per-lane option for OverrideCycles. Cycles cannot be ignored from the middle of a read.
Use a subset of bases for i7/i5 indexes
Sample Sheet: Use subset of index cycles for demultiplexing by providing shortened sequence in index or index2 column within a lane. Command Line: --use-bases-mask I#N# (default use all index cycles defined in RunInfo.xml).
Sample Sheet: Use subset of index cycles for demultiplexing by providing shortened sequence in index or index2 column and providing desired length in OverrideCycles setting (default use all index cycles defined in RunInfo.xml)
The number of cycles defined per read in OverrideCycles must always match the number of cycles in the corresponding read of the RunInfo.xml.
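The requirement that OverrideCycles match the RunInfo.xml cycle counts can be checked before starting a conversion. The following sketch assumes OverrideCycles strings made of semicolon-separated reads, each a sequence of a letter (Y, I, U, or N) followed by a cycle count. The function names are hypothetical, and the RunInfo.xml cycle counts are passed in as a plain list rather than parsed from the file.

```python
import re
from typing import List, Sequence

_SEGMENT = re.compile(r"([YINU])(\d+)")


def cycles_per_read(override_cycles: str) -> List[int]:
    """Return the total number of cycles each read of an OverrideCycles string covers."""
    totals = []
    for read in override_cycles.split(";"):
        segments = _SEGMENT.findall(read)
        # Reject anything that is not fully covered by letter+count segments.
        if not segments or "".join(kind + count for kind, count in segments) != read:
            raise ValueError(f"unparseable read mask: {read!r}")
        totals.append(sum(int(count) for _, count in segments))
    return totals


def matches_run_info(override_cycles: str, run_info_cycles: Sequence[int]) -> bool:
    """True when every read covers exactly the cycles listed for it in RunInfo.xml."""
    return cycles_per_read(override_cycles) == list(run_info_cycles)


print(cycles_per_read("Y150N1;I8;I8;Y150N1"))
print(matches_run_info("Y150;I8;I8;Y150", [151, 8, 8, 151]))
```

In the second call the reads cover only 150 of 151 cycles, so the check fails; the unused cycle would have to be masked explicitly, for example with `N1`.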
Wildcard Index Sequences
None
None
Wildcard entries (N) for indexes are not supported.
Index FASTQs
Sample Sheet: CreateFastqForIndexReads 0 or 1 (default 0) Command Line: --create-fastq-for-index-reads
Sample Sheet: CreateFastqForIndexReads, 0 or 1 (default 0)
Generating FASTQs for index reads is off by default; add the sample sheet setting with a value of 1 to enable. When an index read is specified as a UMI with OverrideCycles, the UMI read will be output to a FASTQ file. This feature was introduced in BCL Convert version 3.7.5.
FASTQ Compression Specification
Command Line: --no-bgzf-compression --fastq-compression-level
Command Line: --fastq-compression-level (default 1, can specify 0-9)
FASTQ files can be gzipped at a compression level of 0-9. Multiple gzip compression regions are appended to the same file with large block sizes. Some tools could have trouble fully decompressing these files if they do not continue past the first gzip region. This feature was introduced in version 4.0.
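The caveat about tools stopping after the first gzip region can be reproduced with the Python standard library. This sketch uses two small in-memory gzip members rather than real BCL Convert output, and the function names and read contents are illustrative. It contrasts a single-stream decoder, which stops at the end of the first member, with `gzip.decompress`, which reads all concatenated members.

```python
import gzip
import zlib

# Two independently compressed regions appended into one byte string,
# standing in for a FASTQ file made of several gzip members.
first = gzip.compress(b"@read1\nACGT\n+\nFFFF\n")
second = gzip.compress(b"@read2\nTTGA\n+\nFFFF\n")
multi_member = first + second


def decompress_first_member_only(data: bytes) -> bytes:
    """Decode a single gzip stream; anything after it is left in unused_data."""
    decoder = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip container
    return decoder.decompress(data)


def decompress_all_members(data: bytes) -> bytes:
    """Decode every concatenated gzip member."""
    return gzip.decompress(data)


print(len(decompress_first_member_only(multi_member)))
print(len(decompress_all_members(multi_member)))
```

A reader that behaves like the first function would silently return only part of the reads, which is the failure mode the table warns about.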
Barcode Mismatches
Command Line: --barcode-mismatches # or #,# (default 1 == 1,1)
Sample Sheet: BarcodeMismatchesIndex1,# (default 1) BarcodeMismatchesIndex2,# (default 1) Note: Command Line no longer supported.
Index # must be specified separately.
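When migrating a pipeline, the old `--barcode-mismatches` value has to be rewritten as two sample sheet settings. The sketch below does that translation; the function name is hypothetical, and the key spelling `BarcodeMismatchesIndex1`/`BarcodeMismatchesIndex2` follows the per-sample settings list in this article. It assumes a dual-index run; whether the second setting should be omitted for single-index runs is not covered here.

```python
from typing import List


def barcode_mismatch_settings(cli_value: str = "1") -> List[str]:
    """Translate a bcl2fastq --barcode-mismatches value ('#' or '#,#') into settings lines."""
    parts = cli_value.split(",")
    if len(parts) == 1:
        # A single value applied to both indexes in bcl2fastq (1 == 1,1).
        parts = parts * 2
    if len(parts) != 2:
        raise ValueError(f"expected '#' or '#,#', got {cli_value!r}")
    index1, index2 = (int(part) for part in parts)
    return [f"BarcodeMismatchesIndex1,{index1}",
            f"BarcodeMismatchesIndex2,{index2}"]


print("\n".join(barcode_mismatch_settings("0,2")))
```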
Barcode Collision Checks
For a dual index run, requires both index1 and index2 to have a collision in order to error out based on a barcode collision.
Depending on version, index collision checks are relaxed, strict, or configurable:
  • 3.9.x: Relaxed by default, no option to change. A collision in one index can be rescued by having enough diversity in the combined dual index sequence.
  • v3.10 and 4.0: Strict by default, no option to change. For a dual index run, requires only 1 index (index1 or index2) to have a collision in order to error out based on a barcode collision. Identical indexes within index1 or index2 (combinatorial dual indexes) still supported.
  • 4.1.5: Strict by default but configurable to relaxed mode using sample sheet option:
CombinedIndexCollisionCheck,value Where value corresponds to the number of the lane or lanes to use this behavior; multiple lane values should be semicolon-separated.
  • 4.1.7 and 4.2.x: Relaxed mode by default. CombinedIndexCollisionCheck removed. Mode now configured by using sample sheet option:
IndependentIndexCollisionCheck,value to allow optional strict checking, where value corresponds to the number of the lane or lanes to use this behavior; multiple lane values should be semicolon-separated.
Evaluation of Hamming distance for barcode collision differs between versions. "Relaxed" mode matches default bcl2fastq behavior.
Adapter Read 1, 2 Trimming
Sample Sheet: Adapter/TrimAdapter, A/T/C/G AdapterRead2/TrimAdapterRead2,A/T/C/G
Sample Sheet: AdapterRead1,A/T/C/G AdapterRead2,A/T/C/G AdapterBehavior,trim (default trim)
Read 1 and Read 2 adapters must be specified separately.
Adapter Read 1, 2 Masking
Sample Sheet: MaskAdapter A/T/C/G MaskAdapterRead2 A/T/C/G
Sample Sheet: AdapterRead1,A/T/C/G AdapterRead2,A/T/C/G AdapterBehavior,mask (default trim)
Read 1 and Read 2 adapters must be specified separately.
Adapter Stringency
Command Line: --adapter-stringency # (default 0.9, 0.0-1.0 allowed)
Sample Sheet: AdapterStringency,# (default 0.9, 0.5-1.0 allowed) Note: Command Line no longer supported.
The range is now 0.5-1.0 vs 0.0-1.0.
Adapter Matching Algorithm
Sample Sheet in [Settings] section: FindAdaptersWithIndels 1 (default on, 0 = Sliding Window)
Sample Sheet: FindAdaptersWithIndels,true (default off, false = Sliding Window)
Finding adapters with indels was introduced in version 4.0.
Trimming last bases when they match the adapter
Always trims or masks the final X bases when they overlap with the adapter provided according to stringency settings.
Sample Sheet: MinimumAdapterOverlap,# (default 1, 1-3 allowed) Never trims or masks less than X bases when they overlap with the adapter provided regardless of stringency settings, where X is the MinimumAdapterOverlap provided.
Default behavior is identical.
Minimum Read Length
Command Line: --minimum-trimmed-read-length # (default 35)
Sample Sheet: MinimumTrimmedReadLength,# (default 35) Note: Command Line no longer supported.
Part of sample sheet.
Minimum Number of ATCG Bases per Read
Command Line: --mask-short-adapter-reads # (default 22)
Sample Sheet: MaskShortReads,# (default 22) Note: Command Line no longer supported.
Part of sample sheet.
UMI Settings
Sample Sheet: Trim UMI 0,1 (default 0) Read1UMIStartFromCycle # (default 1) Read2UMIStartFromCycle # (default 1) Read1UMILength # Read2UMILength #
Sample Sheet: OverrideCycles,U# TrimUMI,0 or 1 (default 1)
UMIs can now be defined in index or genomic reads. Default is to trim UMIs. The TrimUMI option was introduced in BCL Convert version 3.7.5.
Use Subset of Tiles for Processing
Command Line: --tiles (provide list of tiles to include)
Command Line: --tiles (provide list of tiles to include)
--tiles introduced in BCL Convert version 3.9. All terms specified must also exist in the RunInfo.xml, however if specified in a range, BCL Convert will only analyze tiles that exist in the RunInfo.xml.
Exclude Tiles from processing
Sample Sheet: ExcludeTiles #: Provide list or range of tiles to exclude from processing (default no tiles) ExcludeTilesLaneX: Provide list or range of tiles in lane X to exclude from processing (default no tiles)
Command Line: --first-tile-only true (default false) Or --exclude-tiles (provide list of tiles to exclude)
For testing purposes, BCL Convert can process only the first tile when --first-tile-only is set to true. This option is not compatible with NovaSeq SP flow cells. --exclude-tiles was introduced in BCL Convert version 3.9. All terms specified for --exclude-tiles must match the set of tiles defined by --tiles if it is specified, or by the RunInfo.xml otherwise. When specified with --tiles, all tiles specified for --exclude-tiles must exist among those whitelisted by --tiles.
Logging
Console output
Console output. Warnings/Errors/Information log files in <output_directory>/Logs. FastqComplete.txt is written to <output_directory>/Logs after all FASTQs are created.
New support for logging files. Less verbose logging. New output file: FastqComplete.txt is generated in the Logs folder.
Association of Samples and output FASTQ files
None
Goes into fastq_list.csv in <output_directory>/Reports
New report of samples and output FASTQ file association is now generated.
Combine multiple FASTQ files
Command Line: --no-lane-splitting (default off)
Command Line: --no-lane-splitting (default off) Sample sheet: NoLaneSplitting,true or false (default false)
Concatenation of FASTQ files separated by lane can be done by enabling this setting. FASTQs will be output with the naming convention <Sample_ID>_S#_<R or I>#_001.fastq.gz (no L00# included). Reports will be generated with values separated by lane. Command line option introduced in BCL Convert version 3.7.5. Sample sheet setting introduced in BCL Convert 3.8. Command line and sample sheet settings must be consistent.
Reverse Complement all reads
Sample Sheet: ReverseComplement 1 (default 0)
None
Impacts Nextera Mate Pair kits, which are not supported by BCL Convert.
Sample Project
Sample Sheet: Creates directory with sample project name. Can use multiple samples in same project. Cannot use “all” or “default” as project name.
Command Line: --bcl-sampleproject-subdirectories true (default false) Sample Sheet: Sample_Project column in Data section.
By default, all FASTQ files will be placed into the same output directory regardless of Sample Sheet columns. Command Line must be set in order to generate subdirectories.
Sample Name
Sample Sheet: Used for FASTQ name. Cannot use “all” or “undetermined” as name.
Command Line: --sample-name-column-enabled true (default false) Sample Sheet: Sample_Name column in Data section
By default, all FASTQ files will be named according to the Sample_ID column unless Sample_Name is enabled. Command Line must be set in order to enable. When enabled, Sample_Name must be specified for every sample in the sample sheet. Reports will include Sample_Name when enabled on the command line. When specified with Sample_Project, FASTQ files will be output to a subdirectory named by Sample_ID. This feature was introduced in version 4.0.
IndexMetricsOut.bin Output Location
Command Line: --interop-dir (default <runfolder-dir>/InterOp)
Always output to <output_directory>/Reports.
User cannot configure IndexMetricsOut.bin output location.
Number of perfect barcodes, 1 mismatch barcodes
Provided in DemultiplexingStats.xml and HTML report.
Provided in Demultiplex_Stats.csv (<output_directory>/Reports)
HTML reports with demultiplexing reports are not produced.
Unknown Barcodes
Provided in AdapterTrimming.txt, DemultiplexingStats.xml, DemuxSummaryF#L#.txt, HTML report.
Reported in Top_Unknown_Barcodes.csv (default top 1000 per lane). Command Line: --num-unknown-barcodes-reported # (define number of unknown barcodes to be reported in Top_Unknown_Barcodes.csv, any value 0 or above or “all”); this feature introduced in v3.10.
AdapterTrimming.txt is not generated.
Adapter Trimming Metrics
Provided in AdapterTrimming.txt
Provided in Adapter_Metrics.csv
Lane-Specific Processing
Define only the desired lanes in the sample sheet.
  1. Define only the desired lanes in the sample sheet.
  2. Command Line: --bcl-only-lane # (default all lanes in sample sheet).
Sample sheet change and command line option are both required.
Processing Options
Command Line: --loading-threads --processing-threads --writing-threads
Command Line: --bcl-num-decompression-threads --bcl-conversion-threads --bcl-num-compression-threads --bcl-num-parallel-tiles
Defaults are set dynamically. These options were introduced in BCL Convert version 3.7.5.
No Sample Sheet
Could provide no sample sheet and software would output all FASTQ files to Undetermined.
Command Line: --no-sample-sheet (default disabled, sample sheet required)
All samples will go to Undetermined FASTQ files. Cannot be specified with any of the following options:
  • bcl-sampleproject-subdirectories
  • sample-name-column-enabled
  • bcl-only-matched-reads
  • num-unknown-barcodes-reported
  • bcl-validate-sample-sheet-only
  • sample-sheet
This feature introduced in version 4.0.
Per-sample settings
N/A (all settings are global)
Settings can be either global or per-sample. Per-sample settings cannot also be used as global settings. Per-sample settings are added as columns to the BCLConvert_Data section of the sample sheet and must contain values for each sample when used. Available settings are: OverrideCycles, BarcodeMismatchesIndex1, BarcodeMismatchesIndex2, AdapterRead1, AdapterRead2, AdapterBehavior, AdapterStringency.
This option introduced in BCL Convert v4.1.5.
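The per-sample rules in the last row (a value for every sample, and no setting used both globally and per sample) can be enforced when generating a sample sheet. The sketch below writes only the settings and data sections. The function name, sample values, and adapter sequence are illustrative, and the `[BCLConvert_Settings]` section name is an assumption about the V2 sample sheet format that is not taken from this article.

```python
import csv
import io
from typing import Dict, List


def bclconvert_sections(samples: List[Dict[str, str]],
                        global_settings: Dict[str, str]) -> str:
    """Render global settings plus a data section with per-sample setting columns."""
    columns = list(samples[0])
    for sample in samples:
        # Every sample needs the same columns, each with a non-empty value.
        if list(sample) != columns or any(value == "" for value in sample.values()):
            raise ValueError("per-sample columns must contain values for each sample")
    clash = set(columns) & set(global_settings)
    if clash:
        raise ValueError(f"settings used both globally and per sample: {sorted(clash)}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["[BCLConvert_Settings]"])  # assumed section name
    for key, value in global_settings.items():
        writer.writerow([key, value])
    writer.writerow([])  # blank line between sections
    writer.writerow(["[BCLConvert_Data]"])
    writer.writerow(columns)
    for sample in samples:
        writer.writerow([sample[column] for column in columns])
    return buffer.getvalue()


example = [
    {"Sample_ID": "S1", "index": "ACGTACGT", "OverrideCycles": "Y151;I8;Y151"},
    {"Sample_ID": "S2", "index": "TTGCATGC", "OverrideCycles": "Y151;I8;Y151"},
]
print(bclconvert_sections(example, {"AdapterRead1": "CTGTCTCTTATACACATCT"}))
```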
For any feedback or questions regarding this article (Illumina Knowledge Article #3710), contact Illumina Technical Support.
© 2023 Illumina, Inc. All rights reserved. All trademarks are the property of Illumina, Inc. or their respective owners. Trademark information: illumina.com/company/legal.html. Privacy policy: illumina.com/company/legal/privacy.html