BCL Convert and index UMIs: How to demultiplex samples and output a UMI FASTQ

Summary

BCL Convert offers options for processing Unique Molecular Identifiers (UMIs) in both the data reads and the index reads. The UMI sequence is always added to the header line of the sample FASTQs, but some applications require a separate FASTQ containing only the UMI sequence. This article uses an example paired-end run with dual index reads, one of which is a UMI, to describe how to configure BCL Convert to both demultiplex the data and generate a separate FASTQ for the UMI read when performing local analysis. Note that BaseSpace Sequence Hub cannot be used to generate the UMI FASTQ, as it does not retain FASTQs created from index cycles.

Run setup

In this example run, the data reads are 101 bp, index 1 is a 20 bp UMI, and index 2 is 8 bp. When using a v2 sample sheet, the Reads section should contain matching cycle counts for each read (Read1Cycles and Read2Cycles of 101, Index1Cycles of 20, and Index2Cycles of 8).
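As a minimal sketch, the Reads section for this run could be generated as follows. The helper name `reads_section` is hypothetical and only for illustration; the cycle counts come from the example run described above.

```python
# Read lengths from the example run, keyed by the v2 sample sheet [Reads] settings.
READ_CYCLES = {
    "Read1Cycles": 101,
    "Read2Cycles": 101,
    "Index1Cycles": 20,  # the UMI read
    "Index2Cycles": 8,   # the index used for demultiplexing
}


def reads_section(cycles):
    """Return the [Reads] section as sample sheet text (hypothetical helper)."""
    lines = ["[Reads]"]
    lines += [f"{key},{value}" for key, value in cycles.items()]
    return "\n".join(lines)


print(reads_section(READ_CYCLES))
```

The printed text can be pasted into the sample sheet; the values must match the cycles actually sequenced on the instrument.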

Sample sheet setup

To specify index 1 as a UMI and retain the UMI data, enter the following options in the BCLConvert_Settings section of the sample sheet:

  • The OverrideCycles option tells BCL Convert the location of the UMI bases and how to handle each read. For this run, the value is Y101;U20;I8;Y101.

  • The CreateFastqForIndexReads,1 option generates FASTQs for both index reads; here index 1 will be the UMI data while index 2 will contain the sequences used for demultiplexing.

  • TrimUMI,0 is required to turn off the default trimming of UMI bases from the FASTQ files.
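The settings listed above can be sketched as sample sheet text, together with a small sanity check that each OverrideCycles mask adds up to the cycles sequenced for that read. This is an illustrative sketch, not part of BCL Convert: the function names are hypothetical, and the parser assumes only the Y (read), I (index), U (UMI), and N (skipped) cycle letters.

```python
import re

# The three settings as they might appear in the sample sheet.
SETTINGS = """[BCLConvert_Settings]
OverrideCycles,Y101;U20;I8;Y101
CreateFastqForIndexReads,1
TrimUMI,0"""

# Cycle counts in sequencing order: Read 1, Index 1, Index 2, Read 2.
RUN_CYCLES = [101, 20, 8, 101]


def parse_override_cycles(value):
    """Split an OverrideCycles value into (letter, count) segments per read."""
    reads = []
    for mask in value.split(";"):
        if not re.fullmatch(r"(?:[YIUN]\d+)+", mask):
            raise ValueError(f"unrecognized mask: {mask!r}")
        segments = re.findall(r"([YIUN])(\d+)", mask)
        reads.append([(letter, int(count)) for letter, count in segments])
    return reads


def matches_run(value, run_cycles):
    """Return True if every read's mask adds up to the cycles sequenced."""
    totals = [sum(n for _, n in read) for read in parse_override_cycles(value)]
    return totals == list(run_cycles)


# Pull the OverrideCycles value out of the settings text and check it.
settings = dict(line.split(",", 1) for line in SETTINGS.splitlines()[1:])
print(parse_override_cycles(settings["OverrideCycles"]))
print(matches_run(settings["OverrideCycles"], RUN_CYCLES))
```

A check like this catches a common mistake, where the mask for a read is longer or shorter than the number of cycles in the Reads section.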

In the Data section of the sample sheet, retain only the column and sequences for the index used for demultiplexing. In this example, the first index is the UMI, so the Data section contains only the second index column and its sequences.
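A Data section of this shape could be written as in the sketch below. The sample IDs and index sequences are made up for illustration, the helper name `data_section` is hypothetical, and the column header is written here as Index2; confirm the exact header spelling against the sample sheet documentation for your BCL Convert version.

```python
import csv
import io

# Hypothetical sample IDs and 8 bp index 2 sequences, for illustration only.
SAMPLES = [
    ("Sample1", "ACGTACGT"),
    ("Sample2", "TGCATGCA"),
]


def data_section(samples):
    """Return a [BCLConvert_Data] section that lists only the second index."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Sample_ID", "Index2"])  # no column for index 1 (the UMI)
    writer.writerows(samples)
    return "[BCLConvert_Data]\n" + buffer.getvalue()


print(data_section(SAMPLES))
```

Leaving the first index column out entirely, rather than writing it empty, follows the best practice described in the notes.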

Notes:

  • It is best practice to avoid leaving an empty index column in the sample sheet (in this case an empty column for the first index). Some versions of BCL Convert accept an empty column while others fail with an error.

  • Do not use Ns in place of the index cycles for the UMI read. The OverrideCycles option tells BCL Convert to handle the cycles as UMI.

  • If the UMI is in the second index, adjust the OverrideCycles option and index column usage in the Data section as appropriate.

Output files

Using these options for the example above will generate demultiplexed data with four files: R1 and R2 contain the two data reads, I1 contains the UMI sequences, and I2 contains the index sequences used for demultiplexing. If the I2 file is not needed for downstream analysis, it can be deleted. Some third-party pipelines may require renaming the files to R1, R2, and R3; we recommend consulting the documentation of any analysis pipeline that requires the UMI FASTQ for more information.
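If renaming is needed, a rename plan could be sketched as below. The mapping is an assumption for a hypothetical pipeline that expects the UMI read as R2 and the second data read as R3; the order a real pipeline expects must be taken from its documentation. The sketch also assumes the usual FASTQ naming pattern ending in `_R1_001.fastq.gz`, `_I1_001.fastq.gz`, and so on, and the function name `plan_renames` is hypothetical.

```python
import re

# Assumed mapping for a hypothetical pipeline that expects the UMI as R2.
# Other pipelines may expect a different order; check their documentation.
RENAME = {"R1": "R1", "I1": "R2", "R2": "R3"}

# Matches names such as Sample1_S1_L001_I1_001.fastq.gz.
FASTQ_NAME = re.compile(r"^(?P<prefix>.+)_(?P<read>[RI][12])_001\.fastq\.gz$")


def plan_renames(filenames, mapping=RENAME):
    """Return {old_name: new_name} for files whose read tag is in mapping."""
    planned = {}
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match and match["read"] in mapping:
            new_tag = mapping[match["read"]]
            planned[name] = f"{match['prefix']}_{new_tag}_001.fastq.gz"
    return planned


files = [
    "Sample1_S1_L001_R1_001.fastq.gz",
    "Sample1_S1_L001_R2_001.fastq.gz",
    "Sample1_S1_L001_I1_001.fastq.gz",
    "Sample1_S1_L001_I2_001.fastq.gz",
]
for old, new in plan_renames(files).items():
    print(old, "->", new)
```

Because the original R2 becomes R3 while I1 becomes R2, copying the files into a separate output directory under their new names is safer than renaming in place, which could overwrite the original R2 file.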

Note about mixed UMI-index reads

If the UMI sequence is part of an index read that also contains bases for demultiplexing, for example a 20-cycle read containing 12 bases of UMI and 8 bases of index, the instructions in this article can be adapted with some caveats. Here the sample sheet settings would be adjusted so that OverrideCycles describes both the UMI and the index cycles within that read (for example, U12I8 for the first index read if the UMI bases are sequenced first).
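As a sketch, the OverrideCycles value for such a mixed read can be assembled from its segments. The order of UMI and index bases within the read is an assumption here (UMI first); swap the two segments if the index bases are sequenced first. The helper names are hypothetical.

```python
def read_mask(segments):
    """Join (letter, cycles) pairs into one read's OverrideCycles mask."""
    return "".join(f"{letter}{cycles}" for letter, cycles in segments)


def override_cycles(read1, index1, index2, read2):
    """Build the full OverrideCycles value in sequencing order."""
    return ";".join(read_mask(read) for read in (read1, index1, index2, read2))


# Assumption: the 12 UMI bases precede the 8 index bases in index read 1.
value = override_cycles(
    read1=[("Y", 101)],
    index1=[("U", 12), ("I", 8)],
    index2=[("I", 8)],
    read2=[("Y", 101)],
)
print(f"OverrideCycles,{value}")
```

The 12 and 8 cycles of the first index read still add up to the 20 cycles given for Index1Cycles in the Reads section.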

The Data section of the sample sheet would then contain both index columns, each listing only the bases used for demultiplexing (again, do not N-pad the index containing the UMI).

In this scenario, BCL Convert will produce a FASTQ containing both the UMI and index sequences for the first index (and a FASTQ containing the sequences for the second). There is no option to configure BCL Convert to output only the UMI cycles of the read into a FASTQ, so the file must be post-processed to trim the output down to the UMI bases.
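The post-processing step could look like the sketch below, which cuts the sequence and quality lines of each record down to the UMI bases. It is an illustration rather than an Illumina tool: it assumes standard four-line FASTQ records and, by default, that the 12 UMI bases are the first bases of the read. The function names and file names are hypothetical.

```python
import gzip


def trim_fastq_records(lines, start, length):
    """Yield FASTQ lines with sequence and quality cut to the UMI bases."""
    for number, line in enumerate(lines):
        text = line.rstrip("\n")
        # In a four-line record, line 2 is the sequence and line 4 the quality.
        if number % 4 in (1, 3):
            text = text[start:start + length]
        yield text + "\n"


def trim_fastq_file(src, dst, start=0, length=12):
    """Write a gzipped copy of src that keeps only the UMI bases."""
    with gzip.open(src, "rt") as reader, gzip.open(dst, "wt") as writer:
        writer.writelines(trim_fastq_records(reader, start, length))


# Example call with hypothetical file names:
# trim_fastq_file("Sample1_S1_L001_I1_001.fastq.gz", "Sample1_S1_L001_UMI_001.fastq.gz")
```

If the index bases come before the UMI in the read, pass `start=8` so that the slice begins after the 8 index bases.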

For any feedback or questions regarding this article (Illumina Knowledge Article #7337), contact Illumina Technical Support at techsupport@illumina.com.

© 2023 Illumina, Inc. All rights reserved. All trademarks are the property of Illumina, Inc. or their respective owners. Trademark information: illumina.com/company/legal.html. Privacy policy: illumina.com/company/legal/privacy.html