How to properly format and use a custom reference genome in MiSeq Reporter
MiSeq instruments running MiSeq Control Software (MCS) 2.6 or earlier include several pre-installed reference genomes, which many MiSeq Reporter analysis workflows use during the alignment and variant calling steps. These workflows can also use custom reference genomes added by the user. To be used for analysis in MiSeq Reporter, a custom reference genome file must be properly formatted and added to both the MiSeq Reporter and Illumina Experiment Manager (IEM) Genome repositories. Custom reference genomes cannot be used for analysis in BaseSpace.
- The file must be in FASTA format using the *.fa extension, not the *.fasta extension.
- Sequence identifiers are limited to 24 characters and can only contain letters, numbers, hyphens, and underscores. All other characters, including blank spaces, are not allowed in the sequence identifiers.
- Examples of good and bad sequence identifiers:
- Each sequence line in the file is limited to 80 characters in length. The sequences must contain only the upper-case characters A, T, C, and G. Ns, blank spaces, and lower-case characters are not allowed.
- Each line in the file must end with a line feed character (LF), not a carriage return (CR) and not a CR followed by an LF (CRLF). Some source code editors, such as Notepad++, include an option to view all characters, which displays invisible characters such as end-of-line characters. In Notepad++, to change all end-of-line characters to LF:
- Go to Edit > EOL Conversion > Unix Format.
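The formatting rules above can be checked automatically before the file is copied to the instrument. The following Python sketch is not an Illumina tool; the function name `check_fasta_bytes` is hypothetical, and it assumes that the whole header line after `>` is the sequence identifier and that empty lines count as invalid sequence lines (the rules above do not mention either case explicitly).

```python
import re

# Rules taken from the list above; names here are illustrative only.
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,24}")   # letters, numbers, hyphens, underscores
SEQ_PATTERN = re.compile(r"[ACGT]+")              # upper-case A, C, G, T only
MAX_SEQ_LINE_LENGTH = 80


def check_fasta_bytes(data, filename="genome.fa"):
    """Return a list of problems found; an empty list means no rule was violated."""
    problems = []

    if not filename.endswith(".fa"):
        problems.append("file name must use the .fa extension")
    if b"\r" in data:
        problems.append("file contains carriage returns (CR); use LF-only line endings")
    if data and not data.endswith(b"\n"):
        problems.append("last line does not end with LF")

    # Non-ASCII bytes become U+FFFD and fail the patterns below.
    lines = data.decode("ascii", errors="replace").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final LF

    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r")  # CR is already reported once above
        if line.startswith(">"):
            # Assumption: everything after '>' is the sequence identifier.
            if not ID_PATTERN.fullmatch(line[1:]):
                problems.append(f"line {number}: invalid sequence identifier")
        else:
            if len(line) > MAX_SEQ_LINE_LENGTH:
                problems.append(f"line {number}: longer than 80 characters")
            # Assumption: an empty line is treated as an invalid sequence line.
            if not SEQ_PATTERN.fullmatch(line):
                problems.append(f"line {number}: only upper-case A, C, G, T are allowed")

    return problems
```

To check a file on disk, read it in binary mode so the line endings are preserved, for example `check_fasta_bytes(open(path, "rb").read(), path)`, and review any messages returned.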
Add the reference file to the MiSeq Reporter Genome repository:
- 1. On the MiSeq PC, navigate to the Genome repository. The default path is C:\Illumina\MiSeq Reporter\Genomes.
- 2. In the Genomes folder (repository), create a new folder and give the new folder a genome name. Avoid using any spaces in the name.
- 3. Place the custom reference genome FASTA file inside the new folder.
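For sites that add custom genomes regularly, the folder creation and copy steps above could be scripted. The sketch below is only an illustration using the Python standard library; the function name `install_reference` and the example genome name are hypothetical, and the script does nothing beyond what the manual steps describe.

```python
import shutil
from pathlib import Path


def install_reference(fasta_path, repository, genome_name):
    """Create repository/genome_name and copy the FASTA file into it.

    Returns the path of the copied file.
    """
    fasta_path = Path(fasta_path)
    repository = Path(repository)

    if not genome_name or " " in genome_name:
        raise ValueError("genome folder name must be non-empty and contain no spaces")
    if fasta_path.suffix != ".fa":
        raise ValueError("reference file must use the .fa extension")
    if not repository.is_dir():
        raise FileNotFoundError(f"Genome repository not found: {repository}")

    target_dir = repository / genome_name
    # exist_ok=False: fail rather than silently reuse an existing genome folder.
    target_dir.mkdir(exist_ok=False)
    return Path(shutil.copy2(fasta_path, target_dir / fasta_path.name))


# Example call (hypothetical file and genome names, default repository path):
# install_reference("my_genome.fa",
#                   r"C:\Illumina\MiSeq Reporter\Genomes",
#                   "MyCustomGenome")
```

The repository path is passed in as a parameter rather than hard-coded, because the Genome repository location can differ from the default.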
Add the reference file to the Illumina Experiment Manager (IEM) Genome repository:
- 1. Open IEM and select the Folder Settings button. Take note of the Genome Repository path.
- 2. Navigate to the Genomes folder shown in the Genome Repository path.
- Note: The Genomes directory is often hidden by default. Either copy and paste the path from the folder settings into a new Computer window, or configure Windows to show hidden files so that you can follow the full path. To show hidden files, open the Windows Control Panel, select Folder Options, and click the View tab. Select Show hidden files, folders and drives and uncheck the Hide extensions for known file types setting.
- 3. In the Genomes folder, create a new folder and give the folder a genome name (do not use spaces in the name), then place your custom reference genome FASTA file inside the new folder.
- 4. Close and re-open IEM.