Contents
  1. Locus datafile
    - Using a names file instead. [Updated]
    - Names file codes have changed; please be sure to check the names file section below.
  2. Pedigree datafile
    - Using unique person IDs [Updated]
    - Using non-numeric allele names [New]

    - Defining missing values

    - Handling of loops

         -- Loop breaking
         -- Loop reconnection
  3. Map datafile
    - Specifying sex-specific maps [Updated]
  4. Omit datafile [Optional]
II- Input files and their formats

    Mega2 requires three matched files as its input. These are the locus, pedigree, and map data files. This trio of files can be supplemented by an omit file, for omitting specific data points from all reformatted output files. It is easiest if you give these files names with the same extension, as then Mega2 will automatically fill in the file names for you when you specify the chromosome number. So, for example, if your files contained information for just chromosome 4 markers, then it is easiest if you name them as follows:

      datain.04
      pedin.04
      map.04
      omit.04
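
    The naming convention above can be sketched in a few lines of Python. This is only an illustration: the function name default_input_names is hypothetical and is not part of Mega2.

```python
def default_input_names(chromosome: int) -> dict:
    """Build the default input file names for one chromosome number."""
    extension = f"{chromosome:02d}"   # two digits, e.g. 4 -> "04"
    return {
        "locus": f"datain.{extension}",
        "pedigree": f"pedin.{extension}",
        "map": f"map.{extension}",
        "omit": f"omit.{extension}",
    }
```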
     

    WINDOWS/DOS USERS WARNING:

    If you are creating your files on a Windows or DOS system and then transferring them to a Unix machine, please remember to convert the DOS end-of-line characters to Unix end-of-line characters. Mega2 will detect DOS end-of-line characters and terminate with an error. See the trouble-shooting section for more details.
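
    If no conversion utility is at hand, the end-of-line conversion can be done with a short script. The following is a minimal sketch; the function names are hypothetical, and it assumes the files are small enough to read into memory.

```python
def dos_to_unix(data: bytes) -> bytes:
    """Replace DOS (CRLF) line endings with Unix (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at `path` in place with Unix line endings."""
    # Binary mode keeps Python from translating line endings itself.
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(dos_to_unix(data))
```

    For example, convert_file("pedin.04") would rewrite that pedigree file in place.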

    a. Locus datafile:

    Default name: datain.##, where ## is the number of the chromosome (01, 02,..., 23) or datain.ex, where ex is the input file extension. For example, if the chromosome number chosen is 2, then Mega2 looks for the file datain.02 in the current directory.
    The locus data file is in standard LINKAGE format with the addition of locus names, which must be specified. The standard (but not well-known) LINKAGE convention for including locus names is to put a # sign followed by the marker name right after the number of alleles. For example:

    
    ::::::::::::::
    datain.05
    ::::::::::::::
    5 0 0 5  << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM
    0 0.0 0.0 0  << MUT LOCUS, MUT RATE, HAPLOTYPE FREQUENCIES (IF 1)
    1       2 3 4 5
    1   2   # TRAIT
     0.990000 0.010000   << GENE FREQUENCIES
    1 << NO. OF LIABILITY CLASSES
    0.0000 1.0000 1.0000 << PENETRANCES
    0       2  # Q1
     0.990000 0.010000   << GENE FREQUENCIES
    1 << NO. OF TRAITS
    1.000   10.000 20.000 << GENOTYPE MEANS
    1.000   << VARIANCE - COVARIANCE MATRIX
    1.000   << MULTIPLIER FOR VARIANCE IN HETEROZYGOTES
    3   2   # M1
     0.500000 0.500000   << GENE FREQUENCIES
    3   2   # M2
     0.500000 0.500000   << GENE FREQUENCIES
    3   2   # M3
     0.500000 0.500000   << GENE FREQUENCIES
    0 0  << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
    0.1     0.1 0.1 0.1  << RECOMBINATION VALUES
    1 0.10000 0.45000 << REC VARIED, INCREMENT, FINISHING VALUE 
    

    This setup would give the name TRAIT to the first locus, the name Q1 to the second locus, the name M1 to the third locus, and so on. (You may omit the space between the # sign and the locus name, if desired.)
    Each codominant marker name must have an exact match in the corresponding map file; if a locus name in the locus data file is not found in the map file, then the user is warned about this. If the user still chooses to proceed, any marker that was not found in the map file will not appear in any of the output files (as Mega2 would not know which map position to put it in). HINT: This feature can be used to easily exclude a marker from all files produced by Mega2 - simply alter the name of the marker in the map file so that it no longer matches.
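
    To see in advance which markers would be dropped, the name matching described above can be imitated outside Mega2. The sketch below is not Mega2's own code. It assumes that locus names appear on lines of the form "<type> <number of alleles> # NAME", that LINKAGE locus type 3 (numbered alleles) marks a codominant marker, and that the map lines start with the column-heading line. All function names are hypothetical.

```python
def locus_names(locus_lines):
    """Return (type_code, name) pairs from lines like '3   2   # M1'."""
    loci = []
    for line in locus_lines:
        if "#" not in line:
            continue                      # not a locus-definition line
        head, _, tail = line.partition("#")
        fields = head.split()
        name = tail.split()
        if len(fields) >= 2 and fields[0].isdigit() and name:
            loci.append((int(fields[0]), name[0]))
    return loci


def map_names(map_lines):
    """Return the set of names in the third column, skipping the header line."""
    names = set()
    for line in map_lines[1:]:
        fields = line.split()
        if len(fields) >= 3:
            names.add(fields[2])
    return names


def unmapped_markers(locus_lines, map_lines):
    """List numbered-allele loci whose names have no exact match in the map."""
    mapped = map_names(map_lines)
    return [name for code, name in locus_names(locus_lines)
            if code == 3 and name not in mapped]
```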

    Using a Names file as a Locus datafile:

    The names file is an alternate format for the locus data file that contains only locus names and types. Mega2 will then read in the pedigree file and recode the marker genotypes in the pedigree data as numbered alleles. Here is a names file which corresponds to the locus datafile in the preceding section.

    
    ::::::::::::::
    names.05
    ::::::::::::::
    A TRAIT
    T Q1
    M M1
    M M2
    M M3
    

    Mega2 version 2.5.3 also allows the user to specify one disease allele frequency and 3 penetrance values for an affection status locus with a single liability class. Affection status loci with more than one liability class will still be given default values, which are equi-frequent alleles (p=0.5, q=0.5) and full penetrance (1/1=0, 1/2=1, 2/2=1). Default values will also be assigned to affection loci that do not have the allele frequency and penetrances specified.

    Six locus types are recognized: autosomal numbered (M), X-linked numbered (X), binary trait with a single liability class (A), binary trait with multiple liability classes (L), quantitative traits (T), and covariates (C). The pedigree data file is then processed as follows:
    An autosomal or X-linked numbered locus is read in as a pair of character strings; a binary trait is read in as a single number, which has to be 0, 1 or 2; a trait locus with multiple liability classes is read in as a <status, liability-class> pair, where status is 0, 1, or 2, and liability-class is any number greater than 0.
    Recoding takes place as follows:
    Numbered allele names are lexically sorted by their value and assigned allele numbers. Allele frequencies are computed based on a user-selected criterion:

    1. Count all genotyped founders, and from pedigrees without any genotyped founders, count a randomly chosen genotyped person.
    2. Count all genotyped founders only.
    3. Count all genotyped individuals.

    For X-linked loci, males have to be represented as homozygotes, and the alleles are counted only once.
    Output: The analysis-specific output pedigree and locus files contain recoded allele numbers. A summary of the recoding process is stored in the MEGA2.RECODE file.
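
    The recoding step and the second counting criterion (genotyped founders only) can be sketched as follows. This is an illustration under stated assumptions, not Mega2's implementation: it assumes that "0" marks an unknown allele, that "lexically sorted" means plain string order, and that a founder is a person whose father and mother are both "0". The function names are hypothetical.

```python
from collections import Counter


def recode_alleles(genotypes):
    """Map allele names to numbers 1..n in string-sorted order."""
    names = sorted({allele for pair in genotypes for allele in pair
                    if allele != "0"})          # "0" is assumed to mean unknown
    return {name: number for number, name in enumerate(names, start=1)}


def founder_allele_frequencies(people):
    """Estimate allele frequencies by counting genotyped founders only.

    `people` is a list of (father, mother, (allele1, allele2)) tuples.
    """
    counts = Counter()
    for father, mother, pair in people:
        is_founder = father == "0" and mother == "0"
        if is_founder and "0" not in pair:
            counts.update(pair)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: count / total for allele, count in counts.items()}
```

    Note that with plain string order an allele named "10" sorts before one named "2".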


    b. Pedigree datafile:

    Default name: pedin.##, where ## is the number of the chromosome (01, 02, ..., 23) or pedin.ex, where ex is the input file extension. So if the chromosome number chosen is 2, then Mega2 looks for the file pedin.02 in the current directory.
    The pedigree data file should have either of the following formats,

    a) pre-Makeped linkage format

    b) the standard (post-Makeped) LINKAGE-format without loops.

    Example of a pre-Makeped file with an inbred pedigree [no. 2] (to match the example locus datafile above):

    ::::::::::::::
    pedin.pre.05
    ::::::::::::::
       1   1   0   0   1  2  21.2  1 2  1 2  1 1
       1   2   0   0   2  0   1.3  1 2  1 1  2 2
       1   3   0   0   1  0   0.9  2 2  1 2  2 1
       1   4   1   2   2  2  19.1  1 2  1 2  2 1
       1   5   1   2   1  2  18.3  2 2  1 2  2 1
       1   6   0   0   2  0   0.7  2 2  1 2  1 1
       1   7   3   4   2  2  20.5  2 2  2 1  1 2
       1   8   3   4   2  2  22.1  2 2  2 2  1 1
       1   9   3   4   2  0  11.1  2 2  2 2  1 1
       1  10   5   6   1  2  19.5  2 2  2 2  1 1
       1  11   5   6   1  2  17.9  2 2  1 2  1 2
       2   1   0   0   1  0   1.2  1 2  2 2  2 2
       2   2   0   0   2  2  19.1  1 1  1 1  2 1
       2   3   0   0   1  0   0.8  1 1  1 2  1 2
       2   4   1   2   2  2  21.1  1 1  2 1  2 1
       2   5   1   2   2  2  20.3  1 1  2 1  2 1
       2   6   0   0   1  0   0.7  2 2  1 2  1 2
       2   7   3   4   2  2  18.6  1 1  2 1  2 1
       2   8   6   5   1  2  17.6  2 1  2 1  2 1
       2   9   8   7   2  2  20.2  1 1  2 1  2 1
       2  10   8   7   2  2  22.3  1 1  2 1  2 1

    The LINKAGE-format is essentially the de facto standard for coding pedigree information in a machine-readable form. For a complete description of this format, please see the Handbook of Human Genetic Linkage (Terwilliger and Ott 1994) and the LINKAGE Users Guide (at http://linkage.rockefeller.edu/soft/linkage/).
    A pre-Makeped LINKAGE pedigree file consists of columns of integer data. The pre-Makeped columns are:

    Pedigree Person Father Mother Gender Phenotype1 Phenotype2 Phenotype3 ...

      where missing parents are entered as 0 (zero) and, in the gender column, 1 = Male and 2 = Female (this is easy to remember if you think of the number of X chromosomes). Makeped inserts some additional columns of pointers (which would be difficult to enter by hand) and breaks loops, which is required by the LINKAGE programs. The columns should be separated by spaces or tabs (any number of these is allowed).
    While the order of the phenotypes is arbitrary, it is common to put the affection status phenotype first, followed by the marker phenotypes (which, for codominant markers, are the same as the genotypes).
    Phenotype coding:
      1) Trait locus: To code a simple affection status locus, use these codes:

       0 = unknown
       1 = normal
       2 = affected

      2) Marker locus: To code a codominant marker locus phenotype, simply list the two numbered alleles with at least one space or tab between the alleles. The unknown genotype is coded as 0 0.
      Note: Everyone must have either two parents or no parents in the data set. Thus, to connect relatives one may need to include people in a pedigree for whom there is at present no data.
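
    The column layout and the two rules above (gender coded as 1 or 2, and either both parents or no parents present) are easy to check before running Mega2. The following is a minimal sketch with hypothetical function names; it only looks at the five fixed columns and leaves the phenotype columns uninterpreted.

```python
def parse_premakeped(lines):
    """Split each record into its five fixed columns plus the phenotype columns."""
    records = []
    for line in lines:
        fields = line.split()      # any run of spaces or tabs separates columns
        if not fields:
            continue               # skip blank lines
        if len(fields) < 5:
            raise ValueError(f"too few columns: {line!r}")
        pedigree, person, father, mother, gender = fields[:5]
        records.append({
            "pedigree": pedigree, "person": person,
            "father": father, "mother": mother,
            "gender": gender, "phenotypes": fields[5:],
        })
    return records


def check_records(records):
    """Return a list of human-readable problems found in the records."""
    problems = []
    for rec in records:
        label = f"pedigree {rec['pedigree']} person {rec['person']}"
        if (rec["father"] == "0") != (rec["mother"] == "0"):
            problems.append(f"{label}: exactly one parent is missing")
        if rec["gender"] not in ("1", "2"):
            problems.append(f"{label}: gender must be 1 (male) or 2 (female)")
    return problems
```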

    Example of post-Makeped file corresponding to the pre-Makeped file (above):
    ::::::::::::::
    pedin.05
    ::::::::::::::
       1   1   0   0   4   0   0 1  1  2  21.2 1  2  1  2  1  1
       1   2   0   0   4   0   0 2  0  0   1.3 1  2  1  1  2  2
       1   3   0   0   7   0   0 1  0  0   0.9 2  2  1  2  2  1
       1   4   1   2   7   5   5 2  0  2  19.1 1  2  1  2  2  1
       1   5   1   2  10   0   0 1  0  2  18.3 2  2  1  2  2  1
       1   6   0   0  10   0   0 2  0  0   0.7 2  2  1  2  1  1
       1   7   3   4   0   8   8 2  0  2  20.5 2  2  2  1  1  2
       1   8   3   4   0   9   9 2  0  2  22.1 2  2  2  2  1  1
       1   9   3   4   0   0   0 2  0  0  11.1 2  2  2  2  1  1
       1  10   5   6   0  11  11 1  0  2  19.5 2  2  2  2  1  1
       1  11   5   6   0   0   0 1  0  2  17.9 2  2  1  2  1  2
       2   1   0   0   4   0   0 1  1  0   1.2 1  2  2  2  2  2
       2   2   0   0   4   0   0 2  0  2  19.1 1  1  1  1  2  1
       2   3   0   0   7   0   0 1  0  0   0.8 1  1  1  2  1  2
       2   4   1   2   7   5   5 2  0  2  21.1 1  1  2  1  2  1
       2   5   1   2   8   0   0 2  0  2  20.3 1  1  2  1  2  1
       2   6   0   0   8   0   0 1  0  0   0.7 2  2  1  2  1  2
       2   7   3   4   9   0   0 2  0  2  18.6 1  1  2  1  2  1
       2   8   6   5   0   0   0 1  2  2  17.6 2  1  2  1  2  1
       2   9  11   7   0  10  10 2  0  2  20.2 1  1  2  1  2  1
       2  10  11   7   0   0   0 2  0  2  22.3 1  1  2  1  2  1
       2  11   0   0   9   0   0 1  2  2  17.6 2  1  2  1  2  1 
    
    

    Defining missing quantitative phenotype values:

    The missing value is user-defined: if there are one or more quantitative trait loci in your input files, then Mega2 will ask you what the missing value is. However, you have to use the same missing value for all of the quantitative traits in your file, and (unfortunately) it has to be a real-valued missing value (but need not be zero!).


    Using Ped, Per and ID identifiers in the pedigree file

    Unique IDs for each person can be specified in the pedigree file using the tag "Id:". Non-numeric pedigree and person identifiers can be indicated using the "Ped:" and "Per:" tags. These three tags are case-insensitive; for example, the ID tag can be any of "ID:", "Id:", "iD:", or "id:". Unique IDs are allowed within the pre-Makeped format pedigree file, although the "Ped:" and "Per:" tags are not recognized there. If they are provided, Mega2 simply ignores these fields as long as they are placed after the phenotype/genotype columns.
    For output formats which can handle arbitrary pedigree and person names, Mega2 allows the user to select which pedigree and person ID should be used in the output file. Mendel is one such option. The output pedigree and person IDs can be selected via the "Output file names" menu, which looks like:

    
    ==========================================================
      MENDEL file name menu
    ==========================================================
    0) Done with this menu - please proceed
     1) Locus file name:                   locus.05         [new]
     2) Pedigree filename:                 pedm.05          [new]
     3) M13 batch file name:               m13bat.05        [new]
     4) Batch file name:                   batch.05         [new]
     5) M13 batch file name:               m13bat.05        [new]
     6) Person id in output pedigree file:        Individual id
     7) Pedigree identifier in output
                         pedigree file:           Pedigree number
    Select options 0-7 to enter new file names/options >
    
    ==========================================================
    

    Selecting option 6 will display the individual id selection menu:

    
    ==========================================================
    Output person id selection menu:
    0) Done with this menu - please proceed.
    *1) Individual id
     2) Create unique id e.g 1_2, 1=ped, 2=ind
     3) Renumber consecutively within pedigree
    Select from options 0 - 3 >
    ==========================================================
    

    The current choice is indicated with an asterisk. Option 2 either selects the unique ids, if these are present in the "ID:" field of the input pedigree file, or has Mega2 create unique ids if they are absent.
    Selecting option 7 will display the pedigree id selection menu:

    ==========================================================
    Output pedigree id selection menu:
    0) Done with this menu - please proceed
    *1) Pedigree number
     2) Renumbered consecutively
    Select from options 0 - 2 >
    ==========================================================
    

    This functionality is available for the following options:
        Mendel
        SimWalk2
        Aspex
        Genotyping Summary
        SOLAR
        Pre-makeped
        Merlin-SimWalk2-NPL (selection allowed only once, and used in both sets of files)
        Merlin
        Loki


    Using non-numeric allele names

    Non-numeric allele names are allowed inside the pedigree file only with the use of a names file, and only for numbered marker loci. The recoded output pedigrees will have their genotypes altered to numeric alleles. Allele names have to be strings, and may not contain white-space characters, since the pedigree file is read in as a white-space separated column format. See the names file section above for details on recoding.

    Handling of loops inside pedigrees

    Loop breaking

    Mega2 will automatically select an optimal set of loop-breakers if there are loops inside a pre-Makeped pedigree file, and if the output analysis type requires loop-less pedigrees. For example, the VITESSE and SLINK options require loops to be broken in the pedigree.

    Mega2's loop-breaking capabilities have been successfully tested on several pedigrees, including large ones and ones with complex interbreeding structures. Multiple marriages are also handled by the Mega2 loop-breaking procedure, although we currently limit the number of marriages to 10 per person. If your pedigree contains more than 10 marriages per individual, then you are advised to use Makeped in order to break the loops.
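
    To find out whether a pedigree contains a loop at all, one common approach is to treat each person and each mating as a node, link both parents to their mating and the mating to each child, and look for a cycle in the resulting graph. The sketch below does this with a union-find structure. It only detects loops; it does not choose loop-breakers, and it is not Mega2's own procedure. The function name and the record layout are hypothetical.

```python
def has_loop(records):
    """Return True if the pedigree graph contains a cycle.

    `records` is a list of (person, father, mother) tuples for one pedigree,
    with "0" marking a missing parent.
    """
    parent = {}

    def find(node):
        # Follow links to the representative, halving the path on the way.
        while parent.setdefault(node, node) != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False          # already connected: this edge closes a cycle
        parent[root_a] = root_b
        return True

    matings = set()
    for person, father, mother in records:
        if father == "0" or mother == "0":
            continue              # founders add no edges
        mating = ("mating", father, mother)
        if mating not in matings:
            matings.add(mating)
            if not union(("person", father), mating):
                return True
            if not union(("person", mother), mating):
                return True
        if not union(mating, ("person", person)):
            return True
    return False
```

    Applied to the parent columns of the example pre-Makeped file above, this reports no loop for pedigree 1 and a loop for the inbred pedigree 2.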

    Loop-breaker selection menu

    This allows the user to specify whether the selection of loop-breakers should be limited to the non-founders in a family. The selected list of loop-breakers is displayed as well as recorded in the MEGA2.LOG file.

    ==========================================================
    Loop-breaker selection menu:
    0) Done with this menu - please proceed.
     1) Select only non-founders as loop-breakers [n].
    Enter 1 to toggle, 0 to exit > 
    
    

    Loop reconnection

    Mega2 will automatically reconnect the loops of a linkage pedigree file when necessary. For example, this is done when generating output files in MENDEL, SAGE, and SOLAR formats. The reconnection will, unfortunately, result in a renumbering of person ids.

    If the input pedigree was in pre-Makeped format, then the pedigrees remain intact for these options.

    Mega2 displays each pair of pedigree records that were re-connected, as well as logging them in the MEGA2.KEYS file.


     


    c. Map datafile:

    The map file gives the (relative) map position of each marker in centiMorgans (cM). If two markers fall at exactly the same position, then Mega2 will assume that the marker listed first should come first, and will automatically add a small increment (of 0.0001 cM) to the position of the second marker.
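
    The effect of this adjustment can be pictured with a short sketch. It assumes the positions are already listed in map order, and that a run of more than two tied markers is spread out one increment at a time; the second point is an assumption, since only the two-marker case is described here. The function name is hypothetical.

```python
def separate_ties(positions, increment=0.0001):
    """Move a marker that does not lie beyond the previous one by `increment` cM."""
    adjusted = []
    for position in positions:
        if adjusted and position <= adjusted[-1]:
            position = adjusted[-1] + increment
        adjusted.append(position)
    return adjusted
```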

    Note that Mega2 can now make the distinction between Haldane and Kosambi map distances by looking at the first line of the map file. If the second column heading contains "Kosambi", then the distances are read in as Kosambi centiMorgans.

    An additional 4th column can be added to the map file specifying mistyping probabilities for each marker. These values are utilized within the Genotyping error simulation option. This column should have the heading "error" (case-insensitive).

    Some of the analysis options, such as SLINK and Vitesse, use recombination fractions. In this case, the appropriate mapping function is used to convert the inter-marker distances into recombination fractions. For example, in Aspex, if the user chooses to output Kosambi map distances while the map file contains Haldane distances, the map will first be converted to recombination fractions (thetas), which will then be converted to Kosambi map distances.
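
    The conversions described above use the standard Haldane and Kosambi map functions. The sketch below writes them out for distances in centiMorgans; the function names are hypothetical, and the inverse functions assume theta is less than 0.5.

```python
import math


def haldane_to_theta(cm):
    """Recombination fraction for a Haldane distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * cm / 100.0))


def kosambi_to_theta(cm):
    """Recombination fraction for a Kosambi distance in cM."""
    return 0.5 * math.tanh(2.0 * cm / 100.0)


def theta_to_haldane(theta):
    """Haldane distance in cM for a recombination fraction below 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * theta)


def theta_to_kosambi(theta):
    """Kosambi distance in cM for a recombination fraction below 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


def haldane_to_kosambi(cm):
    """Convert an inter-marker Haldane distance to Kosambi by way of theta."""
    return theta_to_kosambi(haldane_to_theta(cm))
```

    The conversion is applied to each inter-marker distance, not to cumulative map positions, because recombination fractions are defined between pairs of loci.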

     Example:

    ::::::::::::::
    map.05
    ::::::::::::::
    CHROMOSOME    KOSAMBI   NAME
    5             0.0       M1
    5             5.0       M3
    5             8.0       M2
    

    NOTE: Any marker that is in the locus file must be given a map position in the map file. Thus, the marker names used in the map file must match exactly the names used in the locus file. If a codominant marker locus in the locus file is not found in the map file, then MEGA2 will warn you about this. If you ignore the warning, then this locus will not appear in the output files created by MEGA2. You may have more loci in the map file than appear in the locus file. While you will be warned about this, it does not pose any difficulties.

    Default name: map.##, where ## is the number of the chromosome (01, 02, ..., 23) or map.ex, where ex is the input file extension. So if the chromosome number chosen is 2, then Mega2 looks for the file map.02 in the current directory.

    Hint: See the section on map making utilities for help on creating map files.

    Specifying sex-specific maps:

    The map file now allows two extra columns for specifying male and female map distances. These columns should appear after the "Name" column. They should be labelled "male" and "female" respectively, and Mega2 is case-insensitive to these headers. The map function for the male and female maps is assumed to be the same as the sex-average map (the second column). Male and female maps are only used within the SimWalk2 option for now. Here is an example of a Mega2 map file containing male and female maps:
    ::::::::::::::
    map.05
    ::::::::::::::
    CHROMOSOME    KOSAMBI   NAME    MALE       FEMALE
    5             0.0       M1      0.0        0.0
    5             5.0       M3      2.0        7.0
    5             8.0       M2      4.0       12.0
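
    A map file with these optional columns can be read by matching the column headings case-insensitively. The following sketch is not Mega2's parser: the function name is hypothetical, and treating any second-column heading without "Kosambi" as Haldane is an assumption. The lines passed in are expected to start with the heading line.

```python
def parse_map(lines):
    """Parse map file lines (heading line first) into one dict per marker."""
    header = [column.lower() for column in lines[0].split()]
    # Assumption: a heading without "kosambi" means Haldane distances.
    map_function = "kosambi" if "kosambi" in header[1] else "haldane"
    markers = []
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue
        row = dict(zip(header, fields))
        marker = {
            "chromosome": fields[0],
            "position": float(fields[1]),
            "name": fields[2],
            "map_function": map_function,
        }
        # Optional columns are picked up by heading, whatever their order.
        for optional in ("male", "female", "error"):
            if optional in row:
                marker[optional] = float(row[optional])
        markers.append(marker)
    return markers
```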
    


    d. Omit datafile: [Optional]

    The optional omit file permits one to easily delete certain marker genotypes from all Mega2-generated output files. This is useful if certain marker genotypes are Mendelian-inconsistent, yet one wants to preserve the original marker data in the input file. Marker genotypes can be omitted for a whole family at once or for one specific individual.

    The omit file should be in the following format:
    Each line should have two integers and a string, separated by white space. The first number is the pedigree number and must match that used in your input LINKAGE-format file. The second number is the person number and must match that used in your input LINKAGE-format file. The string should be either All or the name of the locus. If All is used, the person or pedigree indicated will be untyped at all the loci. If the person number is zero, then all marker genotypes will be set to unknown for the entire pedigree. Otherwise only the indicated person will be untyped.

    A summary of the omit results will be found in the file omit.log. This file is rewritten the next time MEGA2 is run with an omit file specified. If MEGA2 cannot find a person or pedigree as specified in the input omit file, it will halt with an error message.

    Example:

    ::::::::::::::
    omit.05
    ::::::::::::::
    1 0 All
    2 10 M2
    2 0  M1

    This file generates the following log file:

    ::::::::::::::
    omit.log
    ::::::::::::::
    Marker untyped everyone in pedigree 1
    Marker untyped pedigree 2 person 10 at locus M2
    Marker untyped everyone in pedigree 2 at locus M1

    The omit file can now be used to set trait phenotypes to unknown. Here is such an example:

    
    1 11 AFF2
    1 11 QUANT1
    

    These two lines direct Mega2 to untype person 11 of pedigree 1 at affection locus AFF2 and qtl QUANT1. The affection status will be set to unknown (0), and the quantitative phenotype will be set to the appropriate missing value in the output. These actions are logged as well.

    Please note that when the marker column contains the keyword "All", it still refers only to marker loci; trait loci are left untouched.
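
    The omit rules described in this section can be summarized in a short sketch. It is an illustration, not Mega2's code: the function name and the in-memory layout (a dictionary keyed by pedigree and person) are hypothetical, and unknown values are represented by None instead of the format-specific missing codes.

```python
def apply_omits(data, omit_lines, marker_loci):
    """Set omitted entries to None (unknown) in place and return `data`.

    `data` maps (pedigree, person) -> {locus_name: value}.
    `marker_loci` is the set of locus names that are markers.
    """
    for line in omit_lines:
        fields = line.split()
        if len(fields) != 3:
            continue                      # skip blank or malformed lines
        pedigree, person, locus = fields
        # Person number zero selects everyone in the pedigree.
        targets = [key for key in data
                   if key[0] == pedigree and person in ("0", key[1])]
        if not targets:
            raise ValueError(f"no such pedigree/person: {pedigree} {person}")
        for key in targets:
            for name in data[key]:
                # "All" covers marker loci only; traits must be named.
                if (locus == "All" and name in marker_loci) or name == locus:
                    data[key][name] = None
    return data
```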

    Default name: omit.##, where ## is the number of the chromosome (01, 02, ..., 23) or omit.ex, where ex is the input file extension. So if the chromosome number chosen is 2, then Mega2 looks for the file omit.02 in the current directory.

    NOTE: Since a person who breaks a loop is indicated twice in a post-Makeped pedigree file, if you want to untype a loopbreaker, you must explicitly untype both occurrences of this loopbreaker person. Mega2 checks to see if loop-breakers have the same genotypes, and will flag an error otherwise.

    Hint: See the section on creating omit files based on errors found by running the pedigree checking program Pedcheck.
