How to check Infinium samples for possible cross-sample contamination in GenomeStudio
Last updated
© 2023 Illumina, Inc. All rights reserved. All trademarks are the property of Illumina, Inc. or their respective owners. Trademark information: illumina.com/company/legal.html. Privacy policy: illumina.com/company/legal/privacy.html
The Illumina Genome Viewer in GenomeStudio can be used to determine the number of different genotypes present in a sample. This is the basis of Copy Number Variation analysis, but it can also be used to detect cross-sample contamination. For diploid organisms, including humans, three genotypes (AA, AB, BB) are expected on autosomal chromosomes, with B Allele Frequencies (BAF) of 0, 0.5, and 1 respectively. If DNA from multiple individuals is present, intermediate BAF values are observed (e.g., AAAB has a BAF of 0.25, and ABBB has a BAF of 0.75). If intermediate BAF values occur across the entire genome, they are likely caused by cross-sample contamination rather than copy number variation.
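The BAF arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of GenomeStudio: the function names are hypothetical, and it assumes every chromosome copy in the mixture contributes equally to the signal, which is the assumption behind the AAAB = 0.25 example. In real mixtures with unequal DNA amounts, the intermediate bands sit at other positions.

```python
def baf_of(genotype: str) -> float:
    """BAF of a pooled genotype string such as 'AAAB' (hypothetical helper).

    Assumes each allele copy contributes equally to the signal.
    """
    return genotype.count("B") / len(genotype)


def expected_baf_levels(n_copies: int) -> list[float]:
    """All BAF values possible with n_copies equally weighted chromosome copies."""
    return [b / n_copies for b in range(n_copies + 1)]


if __name__ == "__main__":
    print(baf_of("AAAB"))          # one B allele out of four copies
    print(expected_baf_levels(2))  # single diploid sample
    print(expected_baf_levels(4))  # two diploid samples mixed equally
```

Under this assumption, a single diploid sample gives the three expected levels, and an equal mix of two samples adds the 0.25 and 0.75 levels described above.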
When Infinium data do not meet call rate specifications, it can be helpful for troubleshooting to determine whether the failures are due to cross-sample contamination or to noisy data (caused by processing issues and/or low-quality or degraded input DNA).
In the GenomeStudio project, select Tools > Show Genome Viewer.
Select sample(s) by marking the check box next to the Sample name in Table-Sample-SubColumn.
In the SubColumn section, scroll down and check the boxes for B Allele Freq and Log R Ratio.
Select Add to favorite then OK.
Note: To change the samples selected, under the IGV Data Workspace > Data Plots tab, select the leftmost Add icon (two squares and a + icon) and update Sample or Subcolumn selection.
Select the Update button to display the B Allele Freq and Log R Ratio plots. Open the Chromosome Browser by selecting View > Chromosome Browser. The Chromosome Browser shows the plots of multiple samples simultaneously, whereas the Genome Viewer displays only one sample at a time.
By default, the Chromosome Browser is zoomed in to display the region of the chromosome indicated by the red box on the schematic of the chromosome below the B Allele Freq and Log R Ratio plots. It is often easier to notice patterns if more data is viewed. Adjust the zoom to view an entire chromosome by selecting the 'Zoom to Chromosome' icon in the tool bar.
Non-contaminated samples have three lines on the B Allele Frequency plot, running at 0, 0.5, and 1 (representing AA, AB, and BB genotypes respectively). The Log R Ratio plot is, ideally, a narrow, straight line running at 0. Contaminated samples have more than the expected three bands in the BAF plot.
1. Non-contaminated sample with clean data
BAF: Three lines running at 0, 0.5, and 1 (exceptions: X chromosome in males, Y chromosome in females).
LRR: One line, running in a narrow band centered at y = 0.
2. Noisy data
BAF: No clear, distinct banding pattern of genotype nodes (the regular pattern is barely detectable beneath the noise).
LRR: A very thick band with a scattered 'waterfall' of data points.
3. Cross-sample contamination
BAF: Extra nodes present (red arrows). The number of chromosomes present is the total number of nodes minus 1. In this example there are 7 nodes, meaning that 6 chromosomes are present. Since each sample contributes two chromosomes, three samples are present.
LRR: One clean line running at 0.
Note: it is possible for data to be both noisy and contaminated.
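The node-counting rule for the contaminated example can be written out as a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes diploid samples whose nodes are all visible and distinct. Noisy data or unequal mixing can hide or merge nodes, so treat the result as a rough estimate.

```python
def samples_from_nodes(n_nodes: int, ploidy: int = 2) -> tuple[int, int]:
    """Estimate (chromosomes, samples) from the number of BAF nodes.

    Rule from the text: chromosomes = nodes - 1, samples = chromosomes / ploidy.
    Raises ValueError when the node count does not fit the rule.
    """
    n_chromosomes = n_nodes - 1
    if n_chromosomes < ploidy or n_chromosomes % ploidy != 0:
        raise ValueError(f"{n_nodes} nodes does not fit the counting rule")
    return n_chromosomes, n_chromosomes // ploidy


if __name__ == "__main__":
    chromosomes, samples = samples_from_nodes(7)
    print(chromosomes, samples)  # the 7-node example from the text
```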
Abnormal B Allele Frequency plots, such as the one in example 3 above, can also indicate copy number variation (CNV). CNV regions can be distinguished from cross-sample contamination by checking whether the extra B Allele Frequency nodes are present across all chromosomes (cross-sample contamination) or only in certain regions of the genome (possible CNV). True CNV regions are also reflected in the LRR plot as regions that deviate from 0.
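For exported data, the genome-wide versus regional check could be approximated along the following lines. This is a rough sketch, not an Illumina method: the function names, the per-chromosome granularity, and the `tol` and `min_fraction` thresholds are all assumptions chosen for illustration. Sex chromosomes should be left out of the input because of the exceptions noted above, and a small CNV within a chromosome may fall below the threshold.

```python
EXPECTED_BAF = (0.0, 0.5, 1.0)  # AA, AB, BB


def intermediate_fraction(bafs, tol=0.1):
    """Fraction of BAF values farther than tol from every expected band."""
    off_band = [b for b in bafs if min(abs(b - e) for e in EXPECTED_BAF) > tol]
    return len(off_band) / len(bafs)


def classify_baf_pattern(baf_by_chrom, tol=0.1, min_fraction=0.2):
    """Label a sample from autosomal BAF values grouped by chromosome.

    Returns 'normal', 'genome-wide' (consistent with contamination),
    or 'regional' (possible CNV). Thresholds are illustrative assumptions.
    """
    usable = {c: v for c, v in baf_by_chrom.items() if v}  # skip empty chromosomes
    flagged = [c for c, v in usable.items()
               if intermediate_fraction(v, tol) >= min_fraction]
    if not flagged:
        return "normal"
    if len(flagged) == len(usable):
        return "genome-wide"
    return "regional"


if __name__ == "__main__":
    mixed = {"1": [0.0, 0.25, 0.75, 1.0], "2": [0.17, 0.5, 0.83, 0.33]}
    print(classify_baf_pattern(mixed))
```

A 'regional' result would still need to be confirmed against the LRR plot, as described above.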
For any feedback or questions regarding this article (Illumina Knowledge Article #2130), contact Illumina Technical Support at techsupport@illumina.com.