Analysis FAQ for the TruPath Genome assay

At launch of Illumina TruPath Genome, what variants will be covered?

At launch, DRAGEN Germline analysis of TruPath data will call a variety of variant types that customers are already used to getting from DRAGEN. This includes SNVs, CNVs, SVs and Repeat Expansions (STRs). The proximity-enhanced mapped read technology enabled by the TruPath Genome assay vastly increases the ability to phase variants. With high molecular weight (HMW) DNA extractions, up to ~98% of heterozygous SNVs can be phased.

It also enables high-confidence mapping of highly homologous, repetitive, and segmentally duplicated regions. Multi-region joint detection (MRJD) for 9 paralogous genes will be enabled at launch (listed below) with more to be added in future DRAGEN versions. See the DRAGEN Product Documentation for more information on MRJD.

Gene / Region | Disease research relevance
PMS2 | Lynch Syndrome
SMN1, SMN2 | Spinal Muscular Atrophy
NCF1 | Chronic Granulomatous Disease
RCCX (CYP21A2, TNXB) | Congenital Adrenal Hyperplasia, Ehlers-Danlos syndrome
STRC | Recessive Nonsyndromic Hearing Loss
CYP2D6 | Pharmacogenetics
CYP11B1, CYP11B2 | Glucocorticoid-remediable Aldosteronism
CFHR1, CFHR2, CFHR3, CFHR4 | Atypical Hemolytic Uremic Syndrome
USP18 | Type I Interferonopathy

Pathogenicity of Short Tandem Repeats (STRs) is often a function of the size of the repetitive regions. In the case of Autosomal Dominant traits, it is crucial to know the actual size of the STRs in each haplotype, rather than the average of the two. Analysis of TruPath Genome data can recover more accurate sizes of STRs in each haplotype.
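To illustrate why per-haplotype sizes matter, the minimal sketch below compares the average repeat count with the count on each haplotype. The repeat counts and the threshold are hypothetical values chosen only for illustration; they do not describe any real locus or any DRAGEN output.

```python
def str_summary(hap1_repeats: int, hap2_repeats: int, threshold: int) -> dict:
    """Compare per-haplotype repeat counts and their average against a threshold."""
    average = (hap1_repeats + hap2_repeats) / 2
    return {
        "average": average,
        # Averaging can hide a single expanded allele.
        "average_exceeds": average >= threshold,
        # Per-haplotype sizing reveals it.
        "any_haplotype_exceeds": max(hap1_repeats, hap2_repeats) >= threshold,
    }


# Hypothetical example: one allele with 20 repeats, one with 50 repeats,
# and an illustrative threshold of 40 repeats.
result = str_summary(20, 50, threshold=40)
print(result)
```

In this made-up case the average stays below the threshold while one haplotype exceeds it, which is the situation the paragraph above describes for dominant traits.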

What is MRJD? Is there more information about it?

Multi-Region Joint Detection (MRJD) is a computational method that detects SNVs in paralogous regions of the genome. The TruPath Genome analysis expands on that by using proximity information to create a haplotype-resolved view of complex paralogous regions. It does this without needing prior knowledge of common population haplotypes. Additional information and a list of genomic regions covered by MRJD can be found in the DRAGEN Product Documentation.

What is "phasing" and what are "phased variants"?

Phasing is the process of resolving which sequencing reads came from the same allele as other reads. If two reads are "phased" then they came from the same allele. Likewise, "phased variants" are separate variants that are determined to be from the same allele and are therefore part of the same haplotype. The top frame in the image below shows two variants that are phased onto the same allele, and the bottom frame shows the same two variants located on different alleles. Both cases produce the same genotype, but their implications can be drastically different. For example, if dysfunction of Gene A causes a disease phenotype only when both copies of the gene are affected, the bottom scenario will present with the disease phenotype. The top scenario will not, because one copy of Gene A is still unaffected.
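The distinction can be sketched with standard VCF genotype notation, where "0|1" is a phased heterozygous call and "0/1" is an unphased one. The function below is an illustrative helper, not part of DRAGEN, and it assumes both variants are biallelic, heterozygous, and reported in the same phase set.

```python
def phase_relationship(gt_a: str, gt_b: str) -> str:
    """Classify two heterozygous variants as 'cis', 'trans', or 'unknown'.

    'cis'   : both alternate alleles sit on the same haplotype.
    'trans' : the alternate alleles sit on opposite haplotypes.
    'unknown': at least one genotype is unphased ('/' separator).
    """
    for gt in (gt_a, gt_b):
        alleles = sorted(gt.replace("/", "|").split("|"))
        if alleles != ["0", "1"]:
            raise ValueError(f"expected a heterozygous biallelic genotype, got {gt!r}")
    if "/" in gt_a or "/" in gt_b:
        return "unknown"
    # Identical phased strings place both alternate alleles on the same haplotype.
    return "cis" if gt_a == gt_b else "trans"


print(phase_relationship("0|1", "0|1"))  # same haplotype: one gene copy unaffected
print(phase_relationship("0|1", "1|0"))  # opposite haplotypes: both copies affected
print(phase_relationship("0/1", "0|1"))  # cannot tell without phasing
```

All three pairs have the same unphased genotype (heterozygous at both sites), which is why phasing is needed to tell the first two cases apart.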

What is a haplotype?

A haplotype is the set of alleles carried together on a single copy of a chromosome. In diploid eukaryotes, such as humans, an individual receives one maternal copy and one paternal copy of each chromosome. Each inherited copy, together with the variants it carries, constitutes a haplotype.

What metrics should be focused on to help assess run/assay quality?

Metrics and expected ranges that are useful for customers to review can be found here.

Will Emedgene (EMG) be able to interpret TruPath DRAGEN output at FCR?

Yes, Emedgene v40 should be able to ingest TruPath DRAGEN 4.5 data. The level of interpretation will depend on the variant type initially and expand later.

Are there any limitations of Emedgene with respect to TruPath Genome data at launch?

Some of the limitations within Emedgene with respect to TruPath data are:

  1. Case creation is only possible from VCF and visualization files.

  2. Complex BND SVs are visible in IGV but aren't annotated in the variant table.

  3. Colocation plot visualization is not available in EMG, but is available in ICA.

These and any additional limitations will be included in the EMG Release Notes for v100.40.x.

Is TruPath Genome data compatible with Illumina Connected Multiomics (ICM)?

ICM will not be compatible with TruPath Genome data at launch. However, TruPath Genome data can be evaluated in Emedgene.

How big is the data footprint from a TruPath Genome sequencing run and subsequent analysis?

With the current C2 and C8 flowcells available at launch, the size of the run folder is ~220 GB on average, per sample. The BAM files comprise the largest footprint within the analysis output, ranging between 150-200 GB per sample. This is assuming all variant calling capabilities are turned on, including MRJD calling, SV calling, STR calling, and targeted callers.
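As a rough planning aid, the figures above can be turned into a per-project estimate. This sketch assumes the run folder and the BAM output are stored separately and therefore add up, and it ignores all other analysis outputs, so the result is a lower bound rather than an official sizing.

```python
RUN_FOLDER_GB = 220          # average run folder size per sample (from the text above)
BAM_GB_RANGE = (150, 200)    # BAM size range per sample (from the text above)


def footprint_gb(n_samples: int) -> tuple[int, int]:
    """Return (low, high) storage estimates in GB for run folders plus BAMs."""
    low = n_samples * (RUN_FOLDER_GB + BAM_GB_RANGE[0])
    high = n_samples * (RUN_FOLDER_GB + BAM_GB_RANGE[1])
    return low, high


low, high = footprint_gb(8)
print(f"8 samples: {low}-{high} GB")  # 8 samples: 2960-3360 GB
```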

What is the adapter trimming sequence?

CTGTCTCTTATACACATCT

Do the FASTQ files generated from the TruPath Genome assay look different from standard FASTQ files? How is the proximity information encoded in the input FASTQs?

The TruPath FASTQ files look identical to standard FASTQ files. It is important to note that the key to the technology is the input of HMW DNA and on-flow cell tagmentation, causing DNA templates from the same molecule to land in neighboring nanowells. The analysis pipeline uses the sequence of the reads from the FASTQ file along with the positional information found in the FASTQ header, and applies a proximity model to the data to increase mapping confidence.
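For orientation, the positional fields in a standard Illumina FASTQ header (instrument, run, flow cell, lane, tile, x, y) can be read as in the sketch below. This only illustrates where the coordinates live. It does not reproduce the DRAGEN proximity model, and the example header is made up.

```python
from typing import NamedTuple


class ReadPosition(NamedTuple):
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int


def parse_position(header: str) -> ReadPosition:
    """Extract flow cell coordinates from an Illumina FASTQ header line.

    Expected layout: @instrument:run:flowcell:lane:tile:x:y [read:filtered:control:index]
    """
    header = header.strip()
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    # Keep only the part before the first space, then split the colon-delimited fields.
    fields = header[1:].split(" ")[0].split(":")
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 colon-separated fields, got {len(fields)}")
    _, _, flowcell, lane, tile, x, y = fields[:7]
    return ReadPosition(flowcell, int(lane), int(tile), int(x), int(y))


def same_tile(a: ReadPosition, b: ReadPosition) -> bool:
    """True when two reads come from the same flow cell, lane, and tile."""
    return (a.flowcell, a.lane, a.tile) == (b.flowcell, b.lane, b.tile)


# Made-up headers for illustration only.
r1 = parse_position("@SIM:1:FCX:1:15:6329:1045 1:N:0:2")
r2 = parse_position("@SIM:1:FCX:1:15:6340:1050 1:N:0:2")
print(r1, same_tile(r1, r2))
```

The `same_tile` helper is included because, as stated later in this article, the analysis pipeline only assesses links within a sequencing tile.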

Can TruPath FASTQs be used in non-TruPath analysis pipeline? Would it still work as it has all the same attributes compared to normal short-read sequencing FASTQs?

Yes, but the analysis will not produce proximity information or the resulting improvements in variant calling. Once DRAGEN 4.5.4 is released, there will be a single Illumina DRAGEN Germline pipeline for all DRAGEN Germline analyses. The --enable-proximity=true flag will instruct DRAGEN to run the input data through the proximity-aware TruPath Genome analysis workflow.
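A minimal sketch of how a wrapper script might toggle the proximity flag is shown below. Only --enable-proximity=true comes from this article. The other options follow general DRAGEN command-line usage and should be checked against the DRAGEN Product Documentation for the installed version, and all paths and the sample name are placeholders.

```python
import shlex


def build_dragen_command(ref_dir: str, fastq_r1: str, fastq_r2: str,
                         output_dir: str, sample_id: str,
                         enable_proximity: bool) -> list[str]:
    """Assemble a DRAGEN Germline argument list (illustrative, not validated)."""
    args = [
        "dragen",
        "--ref-dir", ref_dir,
        "--fastq-file1", fastq_r1,
        "--fastq-file2", fastq_r2,
        "--RGID", sample_id,
        "--RGSM", sample_id,
        "--output-directory", output_dir,
        "--output-file-prefix", sample_id,
        "--enable-map-align", "true",
        "--enable-variant-caller", "true",
    ]
    if enable_proximity:
        # Flag named in this article for the TruPath Genome workflow.
        args.append("--enable-proximity=true")
    return args


cmd = build_dragen_command("/path/to/ref", "sample_R1.fastq.gz", "sample_R2.fastq.gz",
                           "/path/to/output", "sample1", enable_proximity=True)
print(shlex.join(cmd))
```

Building the command as a list keeps the flag optional, so the same wrapper can serve standard germline data by passing `enable_proximity=False`.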

What happens if a customer selects to “Enable Proximity Analysis” on data not generated with the TruPath Genome workflow? (e.g. Illumina DNA Prep library dataset)

The analysis pipeline will first check the proximity read threshold, and if it is high enough, it will perform the full Illumina DRAGEN TruPath Genome Analysis workflow. This check is done by performing a small preliminary mapping pass on a subset of input reads (~1M or 1 tile worth of data) to determine whether the data is TruPath Genome data or not. If it is, analysis will proceed. If this check fails, then analysis will fail and exit (and it will not perform the DRAGEN Germline analysis, either).

What reference genomes are supported with the DRAGEN TruPath Genome analysis pipeline?

Currently GRCh38 is supported.

What are the analysis options for the TruPath Genome assay and how does it work?

There is a DRAGEN Germline 4.5.2 ICA application that can be launched manually within ICA. Analysis can also be launched automatically, after a run is complete, when the run is planned in BaseSpace Run Planning and set for cloud analysis with the DRAGEN Germline 4.5.2 application. There is also an Illumina TruPath Genome Software v4.5.2 Installer for on-prem DRAGEN servers (v4 hardware and newer) available on the Illumina Support site. The TruPath Genome Product Documentation includes all the additional analysis functionalities as well as new proximity and variant calling metrics. On-instrument local analysis is not supported.

Will each TruPath customer need to set up a private domain/tenant on our cloud environment to run analyses in the cloud?

At a minimum, a cloud subscription is required to pay for compute costs. The TruPath Genome Product Documentation has information about acquiring a subscription. The customer would need to set up a private domain/tenant in order to run the analyses directly in ICA. Alternatively, one would need access to BaseSpace in order to plan and run cloud analysis via the autolaunch feature.

Can BaseSpace-only users still use the auto-launch option?

Yes, BaseSpace-only users can still use autolaunch. The user will select the DRAGEN Germline application as they have in the past, selecting the newest version (currently 4.5.2) and selecting Yes for the Enable Proximity Analysis option in the Run Planning analysis configuration page. When this pipeline version is selected in BaseSpace Run Planning, a message reminds users that this analysis mode is intended for use with the TruPath Genome assay. If a user selects the option to Enable Proximity Analysis but did not use the TruPath Genome assay to generate the sequencing data, the analysis will fail. Version 4.5.2 of the DRAGEN Germline pipeline is publicly available in BaseSpace Run Planning.

Will the DRAGEN Germline 4.5.X stand-alone BaseSpace app include TruPath proximity options?

Yes, when released, the DRAGEN Germline 4.5.X standalone BaseSpace app will be like prior versions of the application, but with the addition of a new TruPath proximity analysis option. This will be a batch option to run all the variant callers and TruPath workflow within the BaseSpace app. Note that the standalone BaseSpace app will have no visualization capabilities; these are available only for autolaunch and ICA standalone analyses.

Will DRAGEN 4.5.X on-prem installers be released to the public?

Yes. Illumina TruPath Genome 4.5.2 on-prem DRAGEN installers are available on the Illumina support website. TruPath proximity analysis has only been tested, and is only supported, on v4 DRAGEN hardware or newer.

Is there a DRAGEN license specific to TruPath analysis for on-prem DRAGEN servers? If so, how will customers receive and install this license? What about offline/dark sites? What is the cost for on-prem analysis?

Yes, the proximity license is specific to TruPath analysis. The command-line option for proximity mode (--enable-proximity=true) triggers that license to be used to run the TruPath workflow. For connected on-prem DRAGEN customers, the license will be made available for free, assuming they have purchased the TruPath Genome assay kit. Illumina will update connected DRAGEN servers automatically via our CRM.

Dark/offline customers will need to contact customer care ([email protected]) to receive new license files that can be used to update the server manually.

What is the estimated cloud compute cost for a TruPath Genome sample? What is the estimated cost of compute and storage? Does the cost of the kit cover data analysis? Is analysis cloud-based only?

The median compute cost, as of publishing this document, is ~12.5 iCredits per sample for cloud analysis. Most analyses will cost between ~11 and ~14 iCredits per sample. This cost is expected to drop as pipeline improvements are made. Cloud storage costs are estimated to be ~7 iCredits per sample per month for storage of unarchived data.
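The arithmetic behind a budget estimate can be sketched as follows. It uses the median compute figure and the storage estimate quoted above. Actual charges vary per analysis and are expected to change, so treat the output as an approximation.

```python
COMPUTE_ICREDITS_PER_SAMPLE = 12.5        # median, as quoted above
STORAGE_ICREDITS_PER_SAMPLE_MONTH = 7.0   # unarchived data, as quoted above


def estimate_icredits(n_samples: int, months_stored: float) -> dict:
    """Estimate cloud compute and storage iCredits for a batch of samples."""
    compute = n_samples * COMPUTE_ICREDITS_PER_SAMPLE
    storage = n_samples * months_stored * STORAGE_ICREDITS_PER_SAMPLE_MONTH
    return {"compute": compute, "storage": storage, "total": compute + storage}


# Example: 24 samples kept unarchived for 3 months.
estimate = estimate_icredits(24, 3)
print(estimate)  # {'compute': 300.0, 'storage': 504.0, 'total': 804.0}
```

In this example, storage overtakes compute within two months of retention, which suggests that archiving policy can matter more than per-sample compute cost.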

There are no additional compute nor storage charges from Illumina for TruPath Genome analyses run with an on-prem DRAGEN server.

Will DRAGEN gigabase quotas for on-prem TruPath analyses be tracked separately from other quotas (e.g., Genome)?

Yes. DRAGEN gigabase usage for proximity analysis is tracked under its own quota, separately from the Genome license quota. The cost of the proximity license is included with the purchase of the assay, so customers who have purchased the assay do not pay an additional fee for the proximity license.

Where is the data visualized (i.e., how can one view colocation plots)?

Data can be visualized in DRAGEN reports, ICA visualization, IGV, HiGlass, and custom (homebrew) tools. Additional details can be found in the Data Visualization and Analysis section of the DRAGEN Product Documentation.

How long does DRAGEN Germline analysis of TruPath Genome take?

The full DRAGEN analysis, including proximity-aware map/align with all variant callers enabled, takes ~3 hours per sample.

How many samples are analyzed per node in a TruPath Genome cloud analysis?

By default, 5 samples are assigned to each node, and the samples on a node are analyzed in series. Within ICA, this number can be increased or decreased as desired. Decreasing it requires more FPGA-enabled compute resources per analysis, and these resources can have long wait times depending on total system load.
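The trade-off between samples per node and turnaround can be sketched as below, using the ~3 hours per sample quoted earlier in this article. The sketch assumes that all nodes start at the same time and run concurrently, and it ignores queue wait times for FPGA-enabled resources, so real turnaround may be longer.

```python
import math


def estimate_turnaround(n_samples: int, samples_per_node: int = 5,
                        hours_per_sample: float = 3.0) -> tuple[int, float]:
    """Return (nodes needed, wall-clock hours) for serial analysis on each node."""
    if samples_per_node < 1:
        raise ValueError("samples_per_node must be at least 1")
    if n_samples <= 0:
        return 0, 0.0
    nodes = math.ceil(n_samples / samples_per_node)
    # The fullest node determines the wall-clock time.
    busiest = min(n_samples, samples_per_node)
    return nodes, busiest * hours_per_sample


print(estimate_turnaround(12))                      # (3, 15.0)
print(estimate_turnaround(12, samples_per_node=1))  # (12, 3.0)
```

With the default of 5 samples per node, 12 samples need 3 nodes and about 15 hours. With 1 sample per node, the same batch needs 12 nodes and about 3 hours, at the cost of requesting more compute resources.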

Can the analysis results identify if input was fragmented due to vigorous pipetting or otherwise degraded?

There would be a reduction in Q25 proximity rate, and, in turn, likely lower Q25 proximity coverage as well as a reduction in template length and phasing block metrics. If a similar quality sample has been run in a different lane then it could be compared directly.

Because the algorithm is trying to find DNA on the flow cell nearby, does this mean some structural variants may be missed?

Structural variants are detected by reviewing physical proximity information genome wide; areas of the genome which should not be close together (without a structural variation) should be dark (no signal) on a colocation plot. Therefore, this analysis can be used to detect structural variations purely based on physical location on the flow cell.

What SKUs will be available to customers who wish to run analysis with on-prem DRAGEN server?

There will not be SKUs for on-prem DRAGEN servers as Illumina will not charge for this.

What options will on-prem DRAGEN server customers have with respect to visualizing TruPath output?

Currently, users are advised to use third-party tools, such as IGV and HiGlass. Additional details can be found in the Data Visualization and Analysis section of the DRAGEN Product Documentation.

Will cloud TruPath Genome customers need to purchase a Professional/Enterprise ICA subscription, or will they be provided with free, Basic ICA tier access?

TruPath Genome customers will receive a free Illumina Connected Software subscription, with purchase of the TruPath assay, which is equivalent to the ICA Basic tier.

The analysis pipeline will only assess links within a sequencing tile. The time and computation costs needed to assess links across tile boundaries currently outweigh the benefits gained from doing so.

What are the expected metrics, like proximity rate?

Up-to-date information on these and other TruPath metrics can be found in the TruPath Genome Product Documentation.

Is there any TruPath genome demo data available?

There is a demo data set within the DRAGEN 4.5 Entitled Bundle that includes the DRAGEN Germline 4.5.2 analysis pipeline for TruPath Genome data. Outside of ICA, customers can contact Illumina Technical Support ([email protected]) for further assistance.

For any feedback or questions regarding this article (Illumina Knowledge Article #10170), contact Illumina Technical Support ([email protected]).
