Genotyping error simulation

Overview

This option was created to evaluate genotyping error detection software. It lets the user introduce errors into a chosen percentage of the genotypes within a marker by changing the input genotype subject to certain requirements (such as allele frequencies, or whether to select a homozygous or heterozygous genotype). The option can be turned on and off in the Mega2 input menu.

Parameter selection menu

The mistyping simulation step requires the user to select the loci at which errors should be introduced, the probability model for introducing errors, and the names of the two output files: one listing the changed genotypes and one reporting the percentage of genotypes changed.

The error simulation menu is as follows:

Error model and loci selection menu
 0) Done with this menu - please proceed.
 1) Apply error model to selected loci.
 2) Apply error to all except selected loci.
 3) Select error model              Uniform
 4) Change error probability       [0.050]
 5) Mistyping genotypes file name  error_genos.06  [new]
 6) Mistyping summary file name    error_sum.06    [new]
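
To make the role of the error probability in menu item 4 concrete, here is a minimal Python sketch of one way a uniform error model could work. It is not Mega2's code: the function name mistype_uniform, the allele_freqs input, and the rule of redrawing both alleles by allele frequency until the genotype differs are all assumptions made for illustration.

```python
import random

def mistype_uniform(genotypes, allele_freqs, error_prob=0.05, rng=None):
    """Sketch of an ASSUMED uniform error model (not Mega2's implementation).

    genotypes    : list of (allele1, allele2) tuples for one locus
    allele_freqs : dict mapping allele -> frequency (hypothetical input)
    error_prob   : chance that any single genotype is replaced
    Returns (new_genotypes, changes), where changes lists
    (index, original, replacement) for every genotype that was altered.
    """
    rng = rng or random.Random()
    alleles = [a for a, f in allele_freqs.items() if f > 0]
    if len(alleles) < 2:
        raise ValueError("need at least two alleles with non-zero frequency")
    weights = [allele_freqs[a] for a in alleles]

    new_genotypes, changes = [], []
    for index, original in enumerate(genotypes):
        replacement = original
        if rng.random() < error_prob:
            # Redraw both alleles until the unordered genotype really differs.
            while sorted(replacement) == sorted(original):
                replacement = tuple(rng.choices(alleles, weights=weights, k=2))
            changes.append((index, original, replacement))
        new_genotypes.append(replacement)
    return new_genotypes, changes
```

With error_prob=0.05, as in the menu default shown above, roughly one genotype in twenty would be altered on average under this assumed model.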

Marker selection
Loci can be selected in two ways: (a) by specifying the loci that should have errors, or (b) by specifying the loci that should NOT have errors. These correspond to menu items 1 and 2, respectively.
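
The two selection methods are complements of each other, which the following Python sketch illustrates. The function loci_receiving_errors and its exclude flag are hypothetical names chosen for this example, not part of Mega2, and the locus names other than D06G025 are made up.

```python
def loci_receiving_errors(all_loci, selected, exclude=False):
    """Return the loci that will have errors introduced, in map order.

    exclude=False mimics method (a): errors go to the selected loci.
    exclude=True  mimics method (b): errors go to every locus EXCEPT
    the selected ones.
    """
    chosen = set(selected)
    unknown = chosen - set(all_loci)
    if unknown:
        raise ValueError(f"unknown loci: {sorted(unknown)}")
    if exclude:
        return [locus for locus in all_loci if locus not in chosen]
    return [locus for locus in all_loci if locus in chosen]


# Hypothetical marker list; only D06G025 appears in this manual.
markers = ["D06G024", "D06G025", "D06G026"]
with_errors = loci_receiving_errors(markers, ["D06G025"])
without = loci_receiving_errors(markers, ["D06G025"], exclude=True)
```

Method (b) is the shorter route when nearly all loci should receive errors and only a few are to be left untouched.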

Error model selection
Currently input error probabilities can follow any one of three models:

Error simulation output files

Two output files and a log file are created for each run of Mega2 with the mistyping simulation option. The log file is named MEGA2.ERR and behaves like the other log files. It records the options the user selected in the menu and logs each genotype changed in the process.
The two other output files, a genotypes file and a summary file, are formatted as tables for easy reading. Here is part of a genotypes file created with the SimWalk2 error model:


Locus   Pedigree Person Orig1 Orig2  Mis1  Mis2    Error type
D06G025        1     10     6     6      1     1   E3 Homozygote
D06G025        1    460     5     6      6     6              E1
D06G025        1    461     5     6      3     3 E5 Heterozygote
D06G025        1    685     5     5      1     3   E5 Homozygote
D06G025        2     18     4     6      4     4              E1
D06G025        2     25     6     6      5     6              E4
D06G025        2    469     4     6      2     5 E3 Heterozygote
D06G025        3     52     4     6      6     6              E1
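
For downstream checks, a genotypes file in this layout can be read with a few lines of Python. The sketch below assumes the layout shown in the excerpt: one header line, whitespace-separated columns, and an error type that may contain a space (for example "E3 Homozygote"). The names MistypedGenotype and parse_genotypes_table are illustrative only.

```python
from typing import NamedTuple, Tuple

class MistypedGenotype(NamedTuple):
    locus: str
    pedigree: str
    person: str
    original: Tuple[str, str]   # Orig1, Orig2
    mistyped: Tuple[str, str]   # Mis1, Mis2
    error_type: str             # e.g. "E1" or "E3 Homozygote"

def parse_genotypes_table(text):
    """Parse the genotypes table, assuming the first non-blank line is the header."""
    lines = [line for line in text.splitlines() if line.strip()]
    records = []
    for line in lines[1:]:
        # Split on whitespace at most 7 times so that a two-word error
        # type stays together in the final field.
        locus, ped, person, o1, o2, m1, m2, etype = line.strip().split(None, 7)
        records.append(
            MistypedGenotype(locus, ped, person, (o1, o2), (m1, m2), etype)
        )
    return records
```

Alleles and IDs are kept as strings here, since nothing in the excerpt guarantees that they are always numeric.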

And here is the corresponding summary file:

Locus    Genotypes  Errors  Overall_rate Obs_E1 Obs_E2 Obs_E3 Obs_E4 Obs_E5
D06G025       1497      59         0.039  0.020  0.034  0.004  0.059  0.004

The last five columns (Obs_E1 through Obs_E5) give the observed rate of errors actually introduced in each error category.
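
In the example row, Overall_rate is consistent with Errors divided by Genotypes (59 / 1497 is about 0.039). The Python sketch below checks that relationship for a summary row. Treating Overall_rate as this ratio is an inference from the sample values rather than a documented definition, and check_summary_row is a hypothetical helper name.

```python
def check_summary_row(line, tolerance=0.0005):
    """Compare the reported Overall_rate with Errors / Genotypes.

    Assumes the column order shown in the summary file: Locus, Genotypes,
    Errors, Overall_rate, then the five Obs_E* columns.
    Returns (locus, computed_rate, matches_reported_value).
    """
    fields = line.split()
    locus = fields[0]
    genotypes = int(fields[1])
    errors = int(fields[2])
    reported = float(fields[3])
    computed = errors / genotypes
    # The file prints three decimals, so allow for rounding.
    return locus, computed, abs(computed - reported) <= tolerance


row = "D06G025       1497      59         0.039  0.020  0.034  0.004  0.059  0.004"
locus, rate, ok = check_summary_row(row)
```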