Confirming identities of replicate & matched samples with SNP-targeted assays in methylation arrays
Last updated
© 2023 Illumina, Inc. All rights reserved. All trademarks are the property of Illumina, Inc. or their respective owners. Trademark information: illumina.com/company/legal.html. Privacy policy: illumina.com/company/legal/privacy.html
Methylation experiments often include samples derived from different tissues of the same individual (matched samples). They may also include replicate samples to confirm data quality. Customers occasionally need to confirm sample mapping, so the ability to confirm the genotypic identity of replicate and matched samples is important. For this purpose, the MethylationEPIC BeadChip includes 59 assays, and the Methylation450 BeadChip includes 65 assays, that target RS-SNPs (SNPs with target IDs starting with “rs”) instead of CpGs. These SNP-based assays behave similarly to genotyping assays: samples with AA, AB, and BB genotypes yield beta values that fall around 0.0, 0.5, and 1.0, respectively. Samples derived from the same individual therefore display closely matching beta values across the SNP-based assays, while samples derived from different individuals show scattered results.
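To make the beta-value reasoning concrete, the following is a minimal Python sketch, not an Illumina tool. It assigns each SNP assay a genotype call from its beta value and reports how often two samples agree. The function names, the 0.2 and 0.8 cutoffs, and the beta values are illustrative assumptions and do not come from GenomeStudio.

```python
def call_genotype(beta, low=0.2, high=0.8):
    """Map a beta value to a genotype label.

    The 0.2 / 0.8 cutoffs are assumed for illustration only.
    """
    if beta < low:
        return "AA"
    if beta > high:
        return "BB"
    return "AB"


def genotype_concordance(betas_a, betas_b):
    """Fraction of SNP assays where two samples get the same genotype call."""
    if len(betas_a) != len(betas_b) or not betas_a:
        raise ValueError("need two equal-length, non-empty beta lists")
    matches = sum(
        call_genotype(a) == call_genotype(b) for a, b in zip(betas_a, betas_b)
    )
    return matches / len(betas_a)


# Made-up beta values for five rs assays (not real data).
blood = [0.03, 0.52, 0.97, 0.48, 0.05]
saliva_same_person = [0.05, 0.49, 0.95, 0.51, 0.02]
other_person = [0.51, 0.04, 0.96, 0.97, 0.50]

print(genotype_concordance(blood, saliva_same_person))  # all five calls agree
print(genotype_concordance(blood, other_person))        # only one call agrees
```

With real data, the lists would hold the AVG_Beta values of the 59 (MethylationEPIC) or 65 (Methylation450) rs assays for each sample, in the same assay order.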
The following two methods assume a working knowledge of GenomeStudio, including the ability to generate scatter plots, run cluster analysis to generate dendrograms, and (optionally) use the filter tool to select a subset of the entries in a table. The first method uses scatter plots based only on the RS-SNP assays on the MethylationEPIC chip (the method also works with Methylation450 data) to compare two samples at a time and confirm whether they are derived from the same person. The second method generates a dendrogram based on the RS-SNP assays to highlight samples with extremely high correlation, which are likely to be derived from the same individual.
Method 1 - scatter plot to determine if 2 selected samples are derived from the same individual
In the Sample Methylation Profile of the GenomeStudio MethylationEPIC project, scroll to the bottom of the table, and select all the target IDs that start with “rs”.
Right-click anywhere in the Sample Methylation Profile table, and in the pop-up menu, select “Show only selected rows”. The table should now display only 59 lines, all of which have target IDs starting with “rs”.
Alternatively, open the filter tool and set the filter such that “TargetID has rs”.
In the “Columns” list of the “Scatter Plot” dialog box, select one of the two samples in question. Leave the sub column set to “AVG_Beta”. Select the “Y Axis” button.
In the “Columns” list, select the second of the two samples in question. Keeping the sub column set to “AVG_Beta” again, select the “X Axis” button.
Select “OK” to generate the scatter plot.
If the two samples are derived from the same individual, there will be three clusters of data points: one in the lower left of the plot, one in the center, and one in the upper right.
If the two samples are not derived from the same individual, there will be nine clusters of data points: three each across the top, middle, and bottom of the plot.
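The three-cluster versus nine-cluster reading of the scatter plot can also be checked numerically on an exported table. The following is a minimal Python sketch under stated assumptions: the row layout, the sample names S1 to S3, the placeholder target IDs, the 0.2 and 0.8 cutoffs, and both function names are hypothetical and are not part of GenomeStudio. It keeps only the rs assays, places each data point in a 3x3 grid, and counts the points that fall off the diagonal.

```python
from collections import Counter


def bin_beta(beta, low=0.2, high=0.8):
    """Return 0, 1, or 2 for the low, middle, or high beta cluster (assumed cutoffs)."""
    if beta < low:
        return 0
    if beta > high:
        return 2
    return 1


def scatter_grid(rows, sample_x, sample_y):
    """Count rs assays in each cell of the 3x3 grid seen in the scatter plot.

    rows: list of dicts with a "TargetID" key and one AVG_Beta value per sample.
    Returns a Counter keyed by (x_bin, y_bin).
    """
    grid = Counter()
    for row in rows:
        if not row["TargetID"].startswith("rs"):
            continue  # keep only the SNP assays, mirroring the "rs" selection step
        grid[(bin_beta(row[sample_x]), bin_beta(row[sample_y]))] += 1
    return grid


def looks_like_same_individual(grid, max_off_diagonal=0):
    """True when (almost) all points sit in the three diagonal clusters."""
    off_diagonal = sum(n for (x, y), n in grid.items() if x != y)
    return off_diagonal <= max_off_diagonal


# Made-up rows with placeholder IDs (not real probes or real data).
rows = [
    {"TargetID": "cg_demo_1", "S1": 0.80, "S2": 0.10, "S3": 0.40},
    {"TargetID": "rs_demo_1", "S1": 0.04, "S2": 0.06, "S3": 0.50},
    {"TargetID": "rs_demo_2", "S1": 0.51, "S2": 0.47, "S3": 0.95},
    {"TargetID": "rs_demo_3", "S1": 0.96, "S2": 0.93, "S3": 0.05},
    {"TargetID": "rs_demo_4", "S1": 0.49, "S2": 0.53, "S3": 0.52},
]

print(looks_like_same_individual(scatter_grid(rows, "S1", "S2")))  # diagonal only
print(looks_like_same_individual(scatter_grid(rows, "S1", "S3")))  # points off the diagonal
```

With real data, a small nonzero `max_off_diagonal` may be a reasonable allowance for noisy assays; the appropriate value is a judgment call and is not specified in this article.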
Method 2 - using cluster analysis to identify samples in the list that are likely derived from the same individuals
Follow Steps 1 and 2 of Method 1 so that the table displays only the target IDs that start with “rs”.
Select the “Sample Methylation Profile” tab to make sure it is the active table. The selected tab is highlighted in light blue.
At the top of the GenomeStudio window, select the “Run Cluster Analysis” icon, which appears as a miniature dendrogram. A dialog box titled “Cluster Analysis: Sample Methylation Profile” will appear.
Select the “Select all” button at the bottom of the window, if desired, to include all samples in the project in the analysis. Alternatively, to assess only certain samples, select the samples to include in the analysis in the list under the “Groups” heading.
In the “Cluster” section, select the “Samples” option. Under “Metric”, select the type of correlation to use in the analysis.
Select “Create Dendrogram”.
Samples that are from the same individual should display extremely high correlation, typically greater than 98%.
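The correlation criterion can be illustrated outside GenomeStudio with a short Python sketch. It does not build a dendrogram and does not reproduce GenomeStudio's clustering; it only computes pairwise Pearson correlation across the rs assays and lists the sample pairs above a threshold. The choice of Pearson correlation, the function names, the sample names, and the beta values are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def likely_same_individual_pairs(betas_by_sample, threshold=0.98):
    """Return (sample_a, sample_b, r) for every pair with r above the threshold."""
    pairs = []
    for a, b in combinations(sorted(betas_by_sample), 2):
        r = pearson(betas_by_sample[a], betas_by_sample[b])
        if r > threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Made-up AVG_Beta values for six rs assays per sample (not real data).
betas_by_sample = {
    "P1_blood": [0.03, 0.52, 0.97, 0.48, 0.05, 0.96],
    "P1_saliva": [0.05, 0.49, 0.95, 0.51, 0.02, 0.94],
    "P2_blood": [0.51, 0.04, 0.96, 0.97, 0.50, 0.03],
}

for a, b, r in likely_same_individual_pairs(betas_by_sample):
    print(a, b, r)
```

The 0.98 default mirrors the >98% figure quoted above; only the two P1 samples in this made-up data exceed it.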
Note for Method 1: before choosing the samples for the axes, select the Scatter Plot icon in the row of icons above the Sample Methylation Profile table. A dialog box titled “Scatter Plot” will appear.
Note: in the dendrogram, replicates and matched samples are joined by connecting bars at the far right. This tool is not intended to display familial relationships, and Illumina does not suggest using it for anything other than identifying samples derived from the same individual.
For any feedback or questions regarding this article (Illumina Knowledge Article #3719), contact Illumina Technical Support at techsupport@illumina.com.