This option was created to evaluate genotyping error detection software. It lets the user introduce errors into a given percentage of the genotypes at a marker by changing the input genotype, subject to certain requirements (such as allele frequencies, whether to select a homozygous or heterozygous genotype, etc.). The option can be turned on and off in the Mega2 input menu.
The mistyping simulation step requires selecting the loci at which errors
should be introduced, the probability model for introducing errors, and
the names of the output files, which list the changed genotypes and
the percentage of genotypes changed.
The error simulation menu is as follows:
Error model and loci selection menu
0) Done with this menu - please proceed.
 1) Apply error model to selected loci.
 2) Apply error to all except selected loci.
 3) Select error model                    Uniform
 4) Change error probability              [0.050]
 5) Mistyping genotypes file name         error_genos.06   [new]
 6) Mistyping summary file name           error_sum.06     [new]
Marker selection
Loci can be selected by two methods: (a) by specifying loci that should have
errors, or (b) by specifying loci that should NOT have errors.
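The two selection methods amount to keeping or dropping the listed loci. The sketch below illustrates that logic only; the function name and the locus names other than D06G025 are hypothetical, and it is not taken from Mega2's source.

```python
def loci_to_modify(all_loci, selected, exclude=False):
    """Return the loci that will receive errors, keeping the input order.

    exclude=False mirrors method (a): errors go to the selected loci.
    exclude=True mirrors method (b): errors go to all loci except those selected.
    """
    chosen = set(selected)
    if exclude:
        return [locus for locus in all_loci if locus not in chosen]
    return [locus for locus in all_loci if locus in chosen]


# Hypothetical locus list; only D06G025 appears in this document.
loci = ["D06G025", "D06G026", "D06G027"]
print(loci_to_modify(loci, ["D06G025"]))                # method (a)
print(loci_to_modify(loci, ["D06G025"], exclude=True))  # method (b)
```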
Error model selection
Currently input error probabilities can follow any one of three models:
Uniform: a uniform rate means that alternate genotypes are selected with a uniform probability. This probability is set to 0.05 by default and can be changed via the menu.
Error rates specified in the map file: these rates have to be given as a 4th column under the heading "error". They cannot be changed once Mega2 has started running.
The SimWalk2 error model: this involves the specification of 5 separate
error probabilities, which can be changed using the menu. For an explanation
of these probability values, refer to the paper
"Detection and Integration of Genotyping Errors in Statistical Genetics"
by Sobel et al. in AJHG, Vol 70: pages 496-508.
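As a rough illustration of the uniform model and of rates supplied in the map file, the sketch below reads a rate from a 4th column headed "error" and then replaces genotypes at that rate. It is not Mega2's implementation. The first three map columns, the sample values in the map text, the function names, and the rule that every alternate genotype is equally likely are all assumptions made for the example.

```python
import io
import itertools
import random


def read_error_rates(map_file):
    """Read per-marker error rates from a map file with a 4th 'error' column."""
    header = map_file.readline().split()
    if len(header) < 4 or header[3] != "error":
        raise ValueError("expected a 4th column headed 'error'")
    rates = {}
    for line in map_file:
        fields = line.split()
        if len(fields) >= 4:
            # Assumption: the marker name is in the 3rd column.
            rates[fields[2]] = float(fields[3])
    return rates


def mistype(genotypes, alleles, error_prob, rng):
    """Replace each genotype, with probability error_prob, by a different one."""
    # All unordered genotypes over the allele set, e.g. (1, 1), (1, 2), (2, 2).
    possible = list(itertools.combinations_with_replacement(sorted(alleles), 2))
    result = []
    for geno in genotypes:
        original = tuple(sorted(geno))
        alternates = [g for g in possible if g != original]
        if alternates and rng.random() < error_prob:
            # Assumption: every alternate genotype is equally likely.
            result.append(rng.choice(alternates))
        else:
            result.append(original)
    return result


# Hypothetical map file contents; only the "error" heading comes from the text.
map_text = io.StringIO(
    "Chromosome  Map.k.a  Name     error\n"
    "6           0.0      D06G025  0.05\n"
)
rates = read_error_rates(map_text)

rng = random.Random(1)  # fixed seed so the run is repeatable
original = [(5, 6), (6, 6), (4, 6), (5, 5)]
changed = mistype(original, alleles=range(1, 7),
                  error_prob=rates["D06G025"], rng=rng)
n_changed = sum(a != b for a, b in zip(original, changed))
print(n_changed, "of", len(original), "genotypes changed")
```

For the uniform model the same `mistype` call would simply take the menu value (0.05 by default) as `error_prob` instead of a rate read from the map file.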
Two output files and a log file are created for
each run of Mega2 with the mistyping simulation option. The log file is named
MEGA2.ERR and behaves like the other log files. It contains details of
the options selected by the user via the menu, and a log of each genotype
changed in the process.
The two other output files, a genotypes file and a summary file, are
formatted as tables for easy reading. Here is part of a
genotypes file created with the SimWalk2 error model:
Locus    Pedigree  Person  Orig1  Orig2  Mis1  Mis2  Error type
D06G025  1         10      6      6      1     1     E3 Homozygote
D06G025  1         460     5      6      6     6     E1
D06G025  1         461     5      6      3     3     E5 Heterozygote
D06G025  1         685     5      5      1     3     E5 Homozygote
D06G025  2         18      4      6      4     4     E1
D06G025  2         25      6      6      5     6     E4
D06G025  2         469     4      6      2     5     E3 Heterozygote
D06G025  3         52      4      6      6     6     E1

And here is the corresponding summary file:
Locus    Genotypes  Errors  Overall_rate  Obs_E1  Obs_E2  Obs_E3  Obs_E4  Obs_E5
D06G025  1497       59      0.039         0.020   0.034   0.004   0.059   0.004

The last 5 columns refer to the percentage of errors actually introduced in each error category.
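A summary line can be sanity-checked by recomputing the overall rate from the counts. The sketch below assumes that Overall_rate is Errors divided by Genotypes, which matches the example row but is not stated explicitly here; the function name and the rounding tolerance are hypothetical.

```python
def check_summary_line(line, tolerance=0.0005):
    """Recompute Errors / Genotypes and compare it with the reported rate."""
    fields = line.split()
    locus = fields[0]
    genotypes, errors = int(fields[1]), int(fields[2])
    reported = float(fields[3])
    recomputed = errors / genotypes
    # The reported rate is printed to 3 decimals, so allow for rounding.
    return locus, recomputed, abs(recomputed - reported) <= tolerance


row = "D06G025 1497 59 0.039 0.020 0.034 0.004 0.059 0.004"
locus, rate, consistent = check_summary_line(row)
print(locus, round(rate, 3), consistent)  # prints: D06G025 0.039 True
```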