Mega2 requires three matched files as its input. These are the locus, pedigree, and map data files. This trio of files can be supplemented by an omit file, for omitting specific data points from all reformatted output files. It is easiest if you give these files names with the same extension, as then Mega2 will automatically fill in the file names for you when you specify the chromosome number. So, for example, if your files contained information for just chromosome 4 markers, then it is easiest if you name them as follows:
datain.04 pedin.04 map.04 omit.04
If you are creating your files on a Windows or DOS system and then transferring
them to a Unix machine, please remember to convert the DOS end-of-line
characters to Unix end-of-line characters. Mega2 will detect DOS end-of-line
characters and terminate with an error. See the trouble-shooting
section for more details.
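The end-of-line conversion can be done with standard tools; as a minimal sketch, the following Python helper performs the same replacement. The function names are illustrative and are not part of Mega2.

```python
def dos_to_unix(data: bytes) -> bytes:
    """Replace DOS (CR LF) line endings with Unix (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at `path` in place with Unix line endings."""
    with open(path, "rb") as handle:
        original = handle.read()
    with open(path, "wb") as handle:
        handle.write(dos_to_unix(original))
```

For example, convert_file("pedin.04") would rewrite the pedigree file from the example above before it is passed to Mega2.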
Default name:
datain.##, where ## is the number of the chromosome (01, 02,..., 23)
or datain.ex, where ex is the input file extension.
For example, if the chromosome number chosen is 2, then Mega2 looks
for the file datain.02 in the current directory.
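The naming convention can be illustrated with a short Python sketch that builds the default file names for a given chromosome number. The helper name and the returned dictionary keys are hypothetical; only the file-name patterns come from this manual.

```python
def default_input_names(chromosome: int) -> dict:
    """Build the default Mega2 input file names for one chromosome."""
    if not 1 <= chromosome <= 23:
        raise ValueError("chromosome number must be between 1 and 23")
    extension = f"{chromosome:02d}"  # two digits: 01, 02, ..., 23
    return {
        "locus": f"datain.{extension}",
        "pedigree": f"pedin.{extension}",
        "map": f"map.{extension}",
        "omit": f"omit.{extension}",
    }
```

Calling default_input_names(2) gives datain.02, pedin.02, map.02 and omit.02.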
The locus data file is in standard LINKAGE format
with the addition of locus names, which must be specified. The standard
(but not well-known) LINKAGE convention for including locus names is to put,
right after the number of alleles, a # sign followed by the marker
name. For example:
:::::::::::::: datain.05 ::::::::::::::
5 0 0 5  << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM
0 0.0 0.0 0  << MUT LOCUS, MUT RATE, HAPLOTYPE FREQUENCIES (IF 1)
1 2 3 4 5
1 2 # TRAIT
0.990000 0.010000  << GENE FREQUENCIES
1  << NO. OF LIABILITY CLASSES
0.0000 1.0000 1.0000  << PENETRANCES
0 2 # Q1
0.990000 0.010000  << GENE FREQUENCIES
1  << NO. OF TRAITS
1.000 10.000 20.000  << GENOTYPE MEANS
1.000  << VARIANCE - COVARIANCE MATRIX
1.000  << MULTIPLIER FOR VARIANCE IN HETEROZYGOTES
3 2 # M1
0.500000 0.500000  << GENE FREQUENCIES
3 2 # M2
0.500000 0.500000  << GENE FREQUENCIES
3 2 # M3
0.500000 0.500000  << GENE FREQUENCIES
0 0  << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
0.1 0.1 0.1 0.1  << RECOMBINATION VALUES
1 0.10000 0.45000  << REC VARIED, INCREMENT, FINISHING VALUE
This setup would give the name TRAIT to the
first locus, the name Q1 to the second locus, and the name M1
to the third locus, etc. (The space between the # sign
and the locus name is optional.)
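As an illustration of the naming convention, the following Python sketch pulls the locus names out of the lines of a locus data file. It is not Mega2's parser: it assumes that each locus header line starts with two integers (locus type and number of alleles) followed by the # sign and the name, with or without a space after the #.

```python
import re

# Locus header: "<type> <number of alleles> # <name>" (space after # optional).
LOCUS_HEADER = re.compile(r"^\s*(\d+)\s+(\d+)\s*#\s*(\S+)")


def locus_names(lines):
    """Return the locus names found on locus header lines, in file order."""
    names = []
    for line in lines:
        match = LOCUS_HEADER.match(line)
        if match:
            names.append(match.group(3))
    return names
```

Applied to the example file above, this would return TRAIT, Q1, M1, M2 and M3 in that order.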
Each codominant
marker name must have an exact match in the corresponding map file; if
a locus name in the locus data file is not found in the map file, then
the user is warned about this. If the user still chooses to proceed, any
marker that was not found in the map file will not appear in any of the
output files (as Mega2 would not know which map position to put it in).
HINT: This feature can be used to easily exclude a marker from all files
produced by Mega2 - simply alter the name of the marker in the map file
so that it no longer matches.
The names file is an alternative format for the locus data file that contains only locus names and types. Given a names file, Mega2 reads in the pedigree file and recodes the pedigree marker genotypes as numbered alleles. Here is a names file that corresponds to the locus data file in the preceding section.
:::::::::::::: names.05 ::::::::::::::
A TRAIT
T Q1
M M1
M M2
M M3
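A names file of this simple form can be read with a few lines of Python. This is only a sketch: it assumes the file holds nothing but whitespace-separated type/name pairs, so it does not handle the optional allele frequency and penetrance values. The function name is hypothetical.

```python
def parse_names(text):
    """Return a list of (locus_type, locus_name) pairs from a names file."""
    tokens = text.split()  # any whitespace, including newlines, separates tokens
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (type/name pairs)")
    # Pair each type code (even positions) with the name that follows it.
    return list(zip(tokens[0::2], tokens[1::2]))
```

For the example above, the result lists TRAIT with type A, Q1 with type T, and the three markers with type M.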
Mega2 version 2.5.3 also allows the user to specify one disease allele frequency and 3 penetrance values for an affection status locus with a single liability class. Affection status loci with more than one liability class will still be given default values: equi-frequent alleles (p=0.5, q=0.5) and full penetrance (1/1=0, 1/2=1, 2/2=1). Default values will also be assigned to affection loci that do not have the allele frequency and penetrances specified.
Six locus types are recognized: autosomal numbered (M), X-linked
numbered (X), binary trait with a single liability class (A), binary
trait with multiple liability classes (L), quantitative traits (T), and
covariates (C). The pedigree data file is then processed as follows:
An autosomal or X-linked numbered locus is read in as a
pair of character strings; a binary trait is read
in as a single number, which has to be 0, 1, or 2; a trait locus with multiple
liability classes is read in as a <status, liability-class> pair, where
status is 0, 1, or 2, and liability-class is any number greater than 0.
Recoding takes place as follows:
Numbered allele names are lexically sorted by their value
and assigned allele numbers. Allele frequencies are computed based
on a user-selected criterion.
For X-linked loci, males have to be represented as homozygotes, and the
alleles are counted only once.
Output: The analysis-specific output pedigree and locus files contain
recoded allele numbers. A summary of the recoding process is stored in
the MEGA2.RECODE file.
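The recoding step described above can be sketched in Python as follows. This is an illustration rather than Mega2's implementation: it assumes that the string "0" marks an unknown allele and is kept as 0, and it sorts names as strings, so "10" sorts before "9".

```python
def recode_alleles(genotypes, unknown="0"):
    """Recode allele names as numbers assigned in lexical order.

    `genotypes` is a list of (allele, allele) string pairs for one locus.
    Returns the name-to-number mapping and the recoded genotype pairs.
    """
    names = sorted({allele for pair in genotypes for allele in pair
                    if allele != unknown})
    codes = {name: number for number, name in enumerate(names, start=1)}
    # Unknown alleles are not in `codes` and stay coded as 0.
    recoded = [tuple(codes.get(allele, 0) for allele in pair)
               for pair in genotypes]
    return codes, recoded
```

For the genotypes C/A, B/C and 0/0, the alleles A, B and C receive the numbers 1, 2 and 3, and the genotypes become 3/1, 2/3 and 0/0.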
Default name: pedin.##,
where ## is the number of the chromosome (01, 02, ..., 23) or pedin.ex,
where ex is the input file extension. So if the chromosome number
chosen is 2, then Mega2 looks for the file pedin.02 in the
current directory.
The pedigree data file should be in one of the following formats:
a) pre-Makeped LINKAGE format
b) the standard (post-Makeped) LINKAGE format without loops.
Example of a pre-Makeped file with an inbred pedigree [no. 2] (to match the example locus datafile above):
:::::::::::::: pedin.pre.05 ::::::::::::::
1  1 0 0 1 2 21.2 1 2 1 2 1 1
1  2 0 0 2 0  1.3 1 2 1 1 2 2
1  3 0 0 1 0  0.9 2 2 1 2 2 1
1  4 1 2 2 2 19.1 1 2 1 2 2 1
1  5 1 2 1 2 18.3 2 2 1 2 2 1
1  6 0 0 2 0  0.7 2 2 1 2 1 1
1  7 3 4 2 2 20.5 2 2 2 1 1 2
1  8 3 4 2 2 22.1 2 2 2 2 1 1
1  9 3 4 2 0 11.1 2 2 2 2 1 1
1 10 5 6 1 2 19.5 2 2 2 2 1 1
1 11 5 6 1 2 17.9 2 2 1 2 1 2
2  1 0 0 1 0  1.2 1 2 2 2 2 2
2  2 0 0 2 2 19.1 1 1 1 1 2 1
2  3 0 0 1 0  0.8 1 1 1 2 1 2
2  4 1 2 2 2 21.1 1 1 2 1 2 1
2  5 1 2 2 2 20.3 1 1 2 1 2 1
2  6 0 0 1 0  0.7 2 2 1 2 1 2
2  7 3 4 2 2 18.6 1 1 2 1 2 1
2  8 6 5 1 2 17.6 2 1 2 1 2 1
2  9 8 7 2 2 20.2 1 1 2 1 2 1
2 10 8 7 2 2 22.3 1 1 2 1 2 1
The LINKAGE-format is essentially the de facto standard for coding
pedigree information in a machine-readable form. For
a complete description of this format, please see the Handbook of Human
Genetic Linkage (Terwilliger and Ott 1994) and the LINKAGE Users Guide
(at http://linkage.rockefeller.edu/soft/linkage/).
A pre-Makeped LINKAGE pedigree file consists of columns of integer data. The
pre-Makeped columns are:
Pedigree Person Father Mother Gender Phenotype1 Phenotype2 Phenotype3 ...
where missing parents are entered as 0
(zero), and, for the gender column, a 1 = Male and a 2 = Female
(This is easy to remember if you think of the number of X chromosomes).
Makeped inserts some additional columns of pointers (which would be difficult
to enter by hand) and breaks loops, which is required by the LINKAGE
programs. The columns should be separated by spaces or tabs (any number
of these is allowed).
While the order of the phenotypes is arbitrary, it is common to put the
affection status phenotype first, followed by the marker phenotypes
(which, for codominant markers, are the same as the genotypes).
Phenotype coding:
1) Trait locus: To code a simple affection status locus,
use these codes:
0 = unknown
1 = normal
2 = affected
2) Marker locus: To code a codominant marker
locus phenotype, simply list the two numbered alleles with at least one
space or tab between the alleles. The unknown genotype is coded as 0 0.
Note: Everyone must have either two parents or no parents in
the data set. Thus, to connect relatives
one may need to include people in a pedigree for whom there is at present
no data.
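The column layout and the two-parents-or-none rule lend themselves to a quick check before running Mega2. The Python sketch below is illustrative only, with hypothetical function names: it splits a pre-Makeped record into its leading columns and reports anyone listed with exactly one parent.

```python
def parse_premakeped_line(line):
    """Split one pre-Makeped record into its leading columns and phenotypes."""
    fields = line.split()  # any run of spaces or tabs separates columns
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    return {
        "pedigree": fields[0],
        "person": fields[1],
        "father": fields[2],
        "mother": fields[3],
        "gender": fields[4],
        "phenotypes": fields[5:],
    }


def one_parent_records(records):
    """Return (pedigree, person) for records with exactly one parent given."""
    offenders = []
    for rec in records:
        has_father = rec["father"] != "0"
        has_mother = rec["mother"] != "0"
        if has_father != has_mother:
            offenders.append((rec["pedigree"], rec["person"]))
    return offenders
```

An empty list from one_parent_records means every person has either both parents or neither; the sketch does not check that the listed parents actually appear in the pedigree.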
:::::::::::::: pedin.05 ::::::::::::::
1  1  0 0  4  0  0 1 1 2 21.2 1 2 1 2 1 1
1  2  0 0  4  0  0 2 0 0  1.3 1 2 1 1 2 2
1  3  0 0  7  0  0 1 0 0  0.9 2 2 1 2 2 1
1  4  1 2  7  5  5 2 0 2 19.1 1 2 1 2 2 1
1  5  1 2 10  0  0 1 0 2 18.3 2 2 1 2 2 1
1  6  0 0 10  0  0 2 0 0  0.7 2 2 1 2 1 1
1  7  3 4  0  8  8 2 0 2 20.5 2 2 2 1 1 2
1  8  3 4  0  9  9 2 0 2 22.1 2 2 2 2 1 1
1  9  3 4  0  0  0 2 0 0 11.1 2 2 2 2 1 1
1 10  5 6  0 11 11 1 0 2 19.5 2 2 2 2 1 1
1 11  5 6  0  0  0 1 0 2 17.9 2 2 1 2 1 2
2  1  0 0  4  0  0 1 1 0  1.2 1 2 2 2 2 2
2  2  0 0  4  0  0 2 0 2 19.1 1 1 1 1 2 1
2  3  0 0  7  0  0 1 0 0  0.8 1 1 1 2 1 2
2  4  1 2  7  5  5 2 0 2 21.1 1 1 2 1 2 1
2  5  1 2  8  0  0 2 0 2 20.3 1 1 2 1 2 1
2  6  0 0  8  0  0 1 0 0  0.7 2 2 1 2 1 2
2  7  3 4  9  0  0 2 0 2 18.6 1 1 2 1 2 1
2  8  6 5  0  0  0 1 2 2 17.6 2 1 2 1 2 1
2  9 11 7  0 10 10 2 0 2 20.2 1 1 2 1 2 1
2 10 11 7  0  0  0 2 0 2 22.3 1 1 2 1 2 1
2 11  0 0  9  0  0 1 2 2 17.6 2 1 2 1 2 1
The missing value is user-defined: if there are one or more quantitative trait loci in your input files, then Mega2 will ask you what the missing value is. However, you have to use the same missing value for all of the quantitative traits in your file, and (unfortunately) it has to be a real-valued missing value (but need not be zero!).
Unique IDs for each person can be specified in the pedigree file
using the tag "ID:". Non-numeric pedigree and person identifiers can
be indicated using the "Ped:" and "Per:" tags. These three tags
are case-insensitive; e.g., the ID tag can be any of "ID:", "Id:", "iD:",
or "id:". Unique IDs are also allowed within a pre-Makeped format pedigree
file, but there the "Ped:" and "Per:" tags are not recognized. If
they are provided, Mega2 simply ignores them as long as they are placed
after the phenotype/genotype columns.
For output formats which can handle arbitrary pedigree and person names,
Mega2 allows the user to select which pedigree and person id should be
used in the output file. Mendel is one such option. The output pedigree
and person IDs can be selected via the "Output file names" menu,
which looks like:
==========================================================
MENDEL file name menu
==========================================================
0) Done with this menu - please proceed
1) Locus file name: locus.05 [new]
2) Pedigree filename: pedm.05 [new]
3) M13 batch file name: m13bat.05 [new]
4) Batch file name: batch.05 [new]
5) M13 batch file name: m13bat.05 [new]
6) Person id in output pedigree file: Individual id
7) Pedigree identifier in output
pedigree file: Pedigree number
Select options 0-7 to enter new file names/options >
==========================================================
Selecting option 6 will display the individual id selection menu:
==========================================================
Output person id selection menu:
0) Done with this menu - please proceed.
*1) Individual id
2) Create unique id e.g 1_2, 1=ped, 2=ind
3) Renumber consecutively within pedigree
Select from options 0 - 3 >
==========================================================
The current choice is indicated with an asterisk. Option 2 either
selects the unique ids, if these are present as the "ID:" field
in the input pedigree file, or has Mega2 create unique ids if
they are absent.
Selecting option 7 will display the pedigree id selection menu:
==========================================================
Output pedigree id selection menu:
0) Done with this menu - please proceed
*1) Pedigree number
2) Renumbered consecutively
Select from options 0 - 2 >
==========================================================
This functionality is available for the following options:
Mendel
SimWalk2
Aspex
Genotyping Summary
SOLAR
Pre-makeped
Merlin-SimWalk2-NPL (selection allowed only once, and used in both sets of files)
Merlin
Loki
Non-numeric allele names are allowed inside the pedigree file only with the use of a names file, and only for numbered marker loci. The recoded output pedigrees will have their genotypes altered to numeric alleles. Allele names have to be strings, and may not contain white-space characters, since the pedigree file is read in as a white-space separated column format. See the names file section above for details on recoding.
Mega2 will automatically select an optimal set of loop-breakers if there are loops inside a pre-Makeped pedigree file and the output analysis type requires loop-less pedigrees. For example, the VITESSE and SLINK options require loops to be broken in the pedigree.
Mega2's loop-breaking capabilities have been successfully tested on several pedigrees, including large ones and ones with complex interbreeding structures. Multiple marriages are also handled by the Mega2 loop-breaking procedure, although we currently limit the number of marriages to 10 per person. If your pedigree contains more than 10 marriages per individual, then you are advised to use Makeped to break the loops.
The loop-breaker selection menu allows the user to specify whether the selection of loop-breakers should be limited to the non-founders in a family. The selected list of loop-breakers is displayed as well as recorded in the MEGA2.LOG file.
==========================================================
Loop-breaker selection menu:
0) Done with this menu - please proceed.
1) Select only non-founders as loop-breakers [n].
Enter 1 to toggle, 0 to exit >
Mega2 will automatically reconnect the loops of a linkage pedigree file when necessary. For example, this is done when generating output files in MENDEL, SAGE, and SOLAR formats. The reconnection will, unfortunately, result in a renumbering of person ids.
If the input pedigree was in pre-Makeped format, then the pedigrees remain intact for these options.
Mega2 displays each pair of pedigree records that were re-connected, as well as logging them in the MEGA2.KEYS file.
The map file gives the (relative) map position of each marker in centiMorgans (cM). If two markers fall at exactly the same position, then Mega2 will assume that the marker listed first should come first, and will automatically add a small increment (of 0.0001 cM) to the position of the second marker.
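The effect of this adjustment can be sketched in Python. The code is illustrative, not Mega2's own: it assumes the positions are already listed in map order, and its handling of three or more markers at one position (each one shifted a further 0.0001 cM) is an assumption that this manual does not confirm.

```python
def separate_coincident(positions, increment=0.0001):
    """Shift markers that share a position so that listing order is kept."""
    adjusted = []
    previous_original = None
    for position in positions:
        if adjusted and position == previous_original:
            # Same position as the marker listed just before: nudge it forward.
            adjusted.append(adjusted[-1] + increment)
        else:
            adjusted.append(position)
        previous_original = position
    return adjusted
```

For the positions 0.0, 5.0, 5.0 and 8.0, only the third marker moves, to approximately 5.0001 cM.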
Note that Mega2 can now make the distinction between Haldane and Kosambi map distances by looking at the first line of the map file. If the second column heading contains "Kosambi", then the distances are read in as Kosambi centimorgans.
An additional 4th column can be added to the map file specifying mistyping probabilities for each marker. These values are utilized within the Genotyping error simulation option. This column should have the heading "error" (case-insensitive).
Some of the analysis options, such as SLINK and Vitesse, use recombination fractions. In this case, the appropriate mapping function is used to convert the inter-marker distances into recombination fractions. For example, if the user chooses to output Kosambi map distances for Aspex while the map file contains Haldane distances, the map is first converted to recombination fractions (thetas), which are then converted to Kosambi map distances.
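For reference, the conversion between map distances and recombination fractions can be sketched in Python using the standard Haldane and Kosambi mapping functions. The function names are hypothetical, and the choice to keep the first marker's position unchanged when converting a whole map is an assumption, not documented Mega2 behavior.

```python
import math


def haldane_to_theta(cm):
    """Recombination fraction for a Haldane distance in centiMorgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * cm / 100.0))


def theta_to_haldane(theta):
    """Haldane distance in centiMorgans for a recombination fraction."""
    return -50.0 * math.log(1.0 - 2.0 * theta)


def kosambi_to_theta(cm):
    """Recombination fraction for a Kosambi distance in centiMorgans."""
    return 0.5 * math.tanh(2.0 * cm / 100.0)


def theta_to_kosambi(theta):
    """Kosambi distance in centiMorgans for a recombination fraction."""
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


def haldane_map_to_kosambi(positions_cm):
    """Convert map positions interval by interval, via thetas.

    Assumption: the first marker keeps its original position.
    """
    if not positions_cm:
        return []
    converted = [positions_cm[0]]
    for previous, current in zip(positions_cm, positions_cm[1:]):
        theta = haldane_to_theta(current - previous)
        converted.append(converted[-1] + theta_to_kosambi(theta))
    return converted
```

The conversion is applied to each inter-marker interval and the results are accumulated, because the mapping functions relate distances between adjacent markers, not absolute positions, to recombination fractions.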
Example:
:::::::::::::: map.05 ::::::::::::::
CHROMOSOME KOSAMBI NAME
5 0.0 M1
5 5.0 M3
5 8.0 M2
NOTE: Any marker that is in the locus file must be given a map position in the map file. Thus, the marker names used in the map file must match exactly the names used in the locus file. If a codominant marker locus in the locus file is not found in the map file, then MEGA2 will warn you about this. If you ignore the warning, then this locus will not appear in the output files created by MEGA2. You may have more loci in the map file than appear in the locus file. While you will be warned about this, it does not pose any difficulties.
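The name matching described in this note can be checked ahead of time with a small script. The Python sketch below is not part of Mega2 and makes two assumptions: the search for "Kosambi" in the second column heading is case-insensitive, and any other heading means Haldane distances.

```python
def parse_map(text):
    """Return (mapping_function, [(name, chromosome, position_cM), ...])."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # Assumption: a case-insensitive "kosambi" in column 2 selects Kosambi.
    is_kosambi = len(header) > 1 and "kosambi" in header[1].lower()
    markers = []
    for line in lines[1:]:
        chromosome, position, name = line.split()[:3]
        markers.append((name, chromosome, float(position)))
    return ("Kosambi" if is_kosambi else "Haldane"), markers


def compare_marker_names(locus_markers, map_markers):
    """Return (missing from the map, present only in the map)."""
    map_set = set(map_markers)
    locus_set = set(locus_markers)
    missing = [name for name in locus_markers if name not in map_set]
    extra = [name for name in map_markers if name not in locus_set]
    return missing, extra
```

Markers in the first list would be dropped from the output files if the warning is ignored; markers in the second list only trigger a warning.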
Default name: map.##, where ## is the number of the chromosome (01, 02, ..., 23) or map.ex, where ex is the input file extension. So if the chromosome number chosen is 2, then Mega2 looks for the file map.02 in the current directory.
Hint: See the section on map making utilities for help on creating map files.
:::::::::::::: map.05 ::::::::::::::
CHROMOSOME KOSAMBI NAME MALE FEMALE
5 0.0 M1 0.0  0.0
5 5.0 M3 2.0  7.0
5 8.0 M2 4.0 12.0
The optional omit file permits one to easily delete certain marker genotypes from all Mega2-generated output files. This is useful if certain marker genotypes are Mendelian-inconsistent, yet one wants to preserve the original marker data in the input file. Marker genotypes can be omitted for a whole family at once or for one specific individual.
The omit file should be in the following format:
Each line should have two integers and a string, separated by white space. The
first number is the pedigree number and must match that used in your input
LINKAGE-format file. The second number is the person number and must match
that used in your input LINKAGE-format file. The string should be either
All or the name of the locus. If All is used, the person or
pedigree indicated will be untyped at all the loci. If the person number
is zero, then all marker genotypes will be set to unknown for the entire
pedigree. Otherwise only the indicated person will be untyped.
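The format rules above can be summarized in a short Python sketch that interprets a single omit-file line. It is illustrative only: the function name and the returned dictionary are hypothetical, and treating the keyword All as case-sensitive is an assumption.

```python
def parse_omit_line(line):
    """Interpret one omit-file line: pedigree number, person number, locus."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError("expected: <pedigree> <person> <All or locus name>")
    pedigree, person, locus = int(fields[0]), int(fields[1]), fields[2]
    return {
        "pedigree": pedigree,
        # Person number 0 means the entry applies to the entire pedigree.
        "person": None if person == 0 else person,
        # "All" means every marker locus; otherwise a single named locus.
        "locus": None if locus == "All" else locus,
    }
```

Here a value of None for "person" stands for the whole pedigree, and None for "locus" stands for all marker loci.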
A summary of the omit results will be found in the file omit.log. This file is rewritten the next time MEGA2 is run with an omit file specified. If MEGA2 cannot find a person or pedigree as specified in the input omit file, it will halt with an error message.
Example:
:::::::::::::: omit.05 ::::::::::::::
1 0 All
2 10 M2
2 0 M1
This file generates the following log file:
:::::::::::::: omit.log ::::::::::::::
Marker untyped everyone in pedigree 1
Marker untyped pedigree 2 person 10 at locus M2
Marker untyped everyone in pedigree 2 at locus M1
The omit file can now be used to set trait phenotypes to unknown. Here is such an example:
1 11 AFF2
1 11 QUANT1
These two lines direct Mega2 to untype person 11 of pedigree 1 at affection locus AFF2 and qtl QUANT1. The affection status will be set to unknown (0), and the quantitative phenotype will be set to the appropriate missing value in the output. These actions are logged as well.
Please note that when the marker column contains the keyword "All", it still refers only to marker loci; trait loci are left untouched.
Default name: omit.##, where ## is the number of the chromosome (01, 02, ..., 23) or omit.ex, where ex is the input file extension. So if the chromosome number chosen is 2, then Mega2 looks for the file omit.02 in the current directory.
NOTE: Since a person who breaks a loop is indicated twice in a post-Makeped pedigree file, if you want to untype a loopbreaker, you must explicitly untype both occurrences of this loopbreaker person. Mega2 checks to see if loop-breakers have the same genotypes, and will flag an error otherwise.
Hint: See the section on creating omit files based on errors found by running the pedigree checking program Pedcheck.