Running Mega2 in Batch mode.

Overview

Mega2 can be run in batch mode by invoking it with a single argument, the name of a batch file:

       > mega2 MEGA2.BATCH
This runs Mega2 in a non-interactive mode through the following steps:

In the previous version, Mega2 started running in interactive mode after the genotyping error simulation setup and prompted the user for analysis-specific parameters such as output file names. New batch file items have been implemented that make user interaction unnecessary for the bulk of the execution. Subsequent versions will be made fully automatic except for input data errors that cannot be ignored.
Each time you run Mega2 in interactive mode, a new batch file is created. This batch file is named MEGA2.BATCH, and the existing MEGA2.BATCH is moved into MEGA2.BATCH.old.


Batch file format

The batch file is a text file which has a specific format. Each line is either a comment or a definition. Blank lines are ignored. Comment lines begin with a # in the first column. A definition line has two parts, a name and a value. Each definition line is of the form :

     Name=Value

Here is an example batch file:

#          Mon Jun 14 12:48:45 2004
# Lines beginning with # are comments.
#
# Currently implemented keywords:
#   1) Input_Pedigree_File
#   2) Input_Locus_File
#   3) Input_Map_File
#   4) Input_Omit_File
#   5) Input_Untyped_Ped_Option
#   6) Analysis_Option
#   7) Analysis_Sub_Option
#   8) Chromosome_Single
#   9) Chromosomes_Multiple_Num
#   10) Chromosomes_Multiple
#   11) Loci_Selected_Num
#   12) Loci_Selected
#   13) Trait_Single
#   14) Traits_Num
#   15) Traits_Loop_Over
#   16) Traits_Combine
#   17) Trait_Subdirs
#   18) Value_Missing_Quant
#   19) Value_Affecteds
#   20) Error_Loci
#   21) Error_Except_Loci
#   22) Error_Loci_Num
#   23) Error_Model
#   24) Error_Probabilities
#   25) Input_Do_Error_Sim
#   26) Default_Outfile_Names
#   27) Default_Reset_Invalid
#   28) Default_Other_Values
#   29) Default_Ignore_Nonfatal
#   30) Default_Ignore_Xlinked
#   31) Default_Rplot_Options
#   32) Covariates_Num
#   33) Covariates_Selected
#   34) Output_Path
#
# Restrictions on usage:
#   Use either Chromosome_Single or
#     Chromosome_Single and Loci_Selected
#     Chromosomes_Multiple and Chromosomes_Multiple_Num
#
#  Use either Trait_Single or
#     Traits_Loop_Over and Traits_Num or
#     Traits_Combine and Traits_Num.
#
#     Error_Loci and Error_Loci_Num or
#     Error_Except_Loci and Error_Loci_Num.
#
# Default settings :
# Default_Reset_Invalid:
#   "y"= set inconsistent genotypes to 0 and continue.
#   "n"= continue without setting inconsistent genotypes to 0.
# Default_Ignore_Nonfatal :
#   Don't pause for other non-fatal errors in input:
# Default_Other_Values:
#   Use Mega2's default values instead of asking user
#   inside analysis-option menus.
# Default_Ignore_Xlinked for:
#   Ignore the X-linked flag in locus data-file,
#   Instead, set x-linked based on chromosome number(human).
#
Input_Pedigree_File=pedin.06
Input_Locus_File=datain.06
Input_Map_File=map.06
Input_Untyped_Ped_Option=2
Input_Do_Error_Sim=0
Analysis_Option=16
Chromosome_Single=6
Traits_Num=5
Traits_Combine=8 9 10 11 12
Value_Missing_Quant=-9.000000
Output_Path=mega2_results
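
The format rules above (comments start with # in the first column, blank lines are ignored, every other line is Name=Value) can be illustrated with a small reader. This is only a sketch in Python and not Mega2's own parser; the function name parse_batch is hypothetical, and stripping white-space around the name and value, as well as letting a later definition override an earlier one, are assumptions.

```python
def parse_batch(text):
    """Parse Mega2-style batch text into a dict of Name -> Value strings."""
    items = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored
        if line.startswith("#"):
            continue  # comment: '#' in the first column
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"line {lineno}: expected Name=Value, got {line!r}")
        # Assumption: surrounding white-space is not significant.
        items[name.strip()] = value.strip()
    return items


example = """\
# Lines beginning with # are comments.

Input_Pedigree_File=pedin.06
Chromosome_Single=6
Traits_Combine=8 9 10 11 12
"""
settings = parse_batch(example)
print(settings["Traits_Combine"].split())
```

List-valued items such as Traits_Combine are kept as a single string here and split on white-space only when needed.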



Major classes of batch file options

The batch file options are currently classified into 7 different classes which mirror the different menus of Mega2:

Special considerations

The order in which these options appear in the batch file does not affect Mega2's behaviour. Some options are dependent on the definition of others. Each class is regarded as complete only if an adequate number of definitions in that class are provided. For example, in the Input menu items class, one has to have at least the three file names (pedigree, locus and map), the untyped-ped option and the error-simulation setup flag. If one or more of these are missing, Mega2 will stop and require the user to go through the input menu before proceeding through the other sections. These and other requirements are described below, and also in the batch file generated by Mega2.
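
As an illustration of the completeness rule for the Input menu items class, the following Python sketch reports which of the five required definitions are absent. It is not part of Mega2; the function name missing_input_items and the dictionary representation of the batch file are assumptions made for this example.

```python
# The five definitions the Input class needs, as described in the text.
REQUIRED_INPUT_ITEMS = (
    "Input_Pedigree_File",
    "Input_Locus_File",
    "Input_Map_File",
    "Input_Untyped_Ped_Option",
    "Input_Do_Error_Sim",
)


def missing_input_items(settings):
    """Return the required Input items not defined in settings (a dict)."""
    return [name for name in REQUIRED_INPUT_ITEMS if name not in settings]


settings = {
    "Input_Pedigree_File": "pedin.06",
    "Input_Locus_File": "datain.06",
    "Input_Map_File": "map.06",
}
print(missing_input_items(settings))
```

An empty result means the Input class is complete; a non-empty one corresponds to the case where Mega2 would stop at the input menu.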

Since the batch file is required to have a very specific format to run correctly, the user is advised to create several test runs in order to become familiar with different batch files. An existing batch file can then easily be adapted to create a new analysis, or a new set of markers, etc.


Details on batch file items

1) Input_Pedigree_File

This item specifies the input pedigree file name. This is a necessary item. The value should be a string without any white-space characters.

2) Input_Locus_File

This item specifies the input locus data file. This is also a necessary item. The value should be a string without white-space characters.

3) Input_Map_File

This item specifies the input Mega2-format map file. This is also a necessary item. The value should be a string without white-space characters.
All three of the above items must be specified for Mega2 to proceed without stopping for user input at the Input menu.

4) Input_Omit_File

This item specifies the input omit data file. This is an optional item. The value should be a string without white-space characters.

5) Input_Untyped_Ped_Option

This item specifies the handling of marker-untyped pedigrees. This is also a necessary item. The value should be an integer greater than or equal to 1. Values 1-3 select one of the first 3 options of the Untyped pedigree option menu. Values of 4 and above set the minimum number of typed persons a pedigree must have in order to be included in the analysis:
     4 corresponds to at least 1 typed person
     5 corresponds to at least 2 typed persons
and so on, i.e. included pedigrees must have at least <value> - 3 typed persons.
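
The arithmetic for values of 4 and above can be written out as a short Python sketch. The function name min_typed_persons is hypothetical, and returning None for values 1-3 is simply a convention chosen for this example.

```python
def min_typed_persons(option_value):
    """Return the typed-person threshold for values >= 4, else None."""
    if option_value < 1:
        raise ValueError("Input_Untyped_Ped_Option must be >= 1")
    if option_value <= 3:
        return None  # values 1-3 pick a menu option, not a threshold
    return option_value - 3


for value in (2, 4, 5, 10):
    print(value, min_typed_persons(value))
```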

6) Input_Do_Error_Sim

This option decides whether Mega2 should execute the random genotyping error simulation step. Valid values are "yes/no".

8) Analysis_Option

This option corresponds to the Analysis option numbers in the analysis menu. Valid values are 1-26.

9) Analysis_Sub_Option

This option specifies the sub-program of the selected analysis, for analyses that require the user to choose between more than one sub-program. For example, in the SimWalk2 option the user has to select from among 5 sub-options: Haplotyping, Parametric, Non-parametric, IBD and Mistyping detection. These options are listed below along with the allowed values in the Mega2 batch file.

Analysis                  Sub-option                          Value
SimWalk2                  Haplotype                           1, h, H
                          Parametric (location scores)        2, p, P
                          Non-parametric linkage              3, n, N
                          IBD estimation                      4, i, I
                          Mistyping                           5, m, M
Aspex                     sib-ibd                             1, i, I
                          sib-tdt                             2, t, T
                          sib-phase                           3, p, P
                          sib-map                             4, m, M
Summary                   Segregation summary                 1, s, S
                          Allele frequency summary            2, a, A
                          Liability group summary             3, g, G
                          Genotyping success rate summary     4, r, R
                          Quantitative phenotype summary      5, q, Q
Vitesse                   Linkmap                             1, l, L
                          Mlink                               2, m, M
Test for Hardy-Weinberg   Gen                                 1, g, G
                          HWE                                 2, h, H
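
Since each sub-option accepts a number or a letter in either case, a script that prepares batch files may want to normalize the value first. The Python sketch below encodes the table as one string of letters per analysis, in sub-option order. It is illustrative only: the dictionary keys are labels taken from the table rather than Mega2 identifiers, and sub_option_number is a hypothetical helper.

```python
# Letters in sub-option order, taken from the table (position 0 = value 1).
SUB_OPTION_LETTERS = {
    "SimWalk2": "hpnim",
    "Aspex": "itpm",
    "Summary": "sagrq",
    "Vitesse": "lm",
    "Test for Hardy-Weinberg": "gh",
}


def sub_option_number(analysis, value):
    """Map a batch-file value such as '2', 'p' or 'P' to its number."""
    letters = SUB_OPTION_LETTERS[analysis]
    value = value.strip()
    if value.isdigit():
        number = int(value)
        if 1 <= number <= len(letters):
            return number
    elif len(value) == 1 and value.lower() in letters:
        return letters.index(value.lower()) + 1
    raise ValueError(f"{value!r} is not a valid sub-option for {analysis}")


print(sub_option_number("SimWalk2", "N"))
print(sub_option_number("Aspex", "2"))
```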

10) Chromosome_Single

This option corresponds to option 1 of the locus reordering menu and specifies the chromosome selected for analysis. If Loci_Selected is specified as well, then its marker numbers refer to markers on the chromosome specified by this option. The value should be a single integer. The validity of the chromosome is decided after the locus data file and map file are read in, and the chromosome numbers present within the input data are known.

11) Chromosomes_Multiple_Num

This option is a single integer which decides how many chromosome numbers should be read from Chromosomes_Multiple. Valid values are positive integers. This option is necessary for Chromosomes_Multiple to work; otherwise Mega2 will report an error in the batch file and terminate.

12) Chromosomes_Multiple

This option specifies which chromosomes should be selected for output. It should be a list of integers, each being a chromosome number present in the input data (as decided by the map and locus data files). This option also requires that Chromosomes_Multiple_Num be specified. There must be at least as many chromosome numbers as Chromosomes_Multiple_Num specifies.
Of the three options Chromosome_Single, Chromosomes_Multiple_Num and Chromosomes_Multiple, if Chromosome_Single is specified, the others are ignored.
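
The relationship between Chromosome_Single, Chromosomes_Multiple_Num and Chromosomes_Multiple can be sketched in Python as follows. This is not Mega2's code: selected_chromosomes is a hypothetical helper working on a dictionary of batch items, and reading only the first Chromosomes_Multiple_Num entries when more are listed is an assumption.

```python
def selected_chromosomes(settings):
    """Return the chromosome numbers a batch file selects, as integers."""
    # Chromosome_Single wins; the two multiple-chromosome items are ignored.
    if "Chromosome_Single" in settings:
        return [int(settings["Chromosome_Single"])]
    if "Chromosomes_Multiple" in settings:
        if "Chromosomes_Multiple_Num" not in settings:
            raise ValueError("Chromosomes_Multiple requires Chromosomes_Multiple_Num")
        count = int(settings["Chromosomes_Multiple_Num"])
        if count < 1:
            raise ValueError("Chromosomes_Multiple_Num must be positive")
        numbers = [int(tok) for tok in settings["Chromosomes_Multiple"].split()]
        if len(numbers) < count:
            raise ValueError("fewer chromosome numbers than Chromosomes_Multiple_Num")
        return numbers[:count]  # assumption: extra entries are not read
    return []


print(selected_chromosomes({"Chromosomes_Multiple_Num": "2",
                            "Chromosomes_Multiple": "1 2 3"}))
```

Whether each number is actually present in the data can only be checked once the locus and map files have been read, so the sketch does not attempt it.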

13) Loci_Selected

This option specifies the marker numbers to be selected for analysis. As mentioned under Chromosome_Single, these markers are the ones present on the chromosome specified by Chromosome_Single. There should be at least as many marker numbers as the number specified in Loci_Selected_Num. Whether these numbers are valid can only be decided after marker data has been read in.

14) Trait_Single

This option corresponds to menu option 1 of the Trait reordering menu. It should be a positive integer, referring to the list of trait loci present in the input data. Trait loci are numbered 1 - N (the total number of trait loci) in order of their appearance in the locus data file.
For example, if the locus data file lists loci 10-12 as "trait1" through "trait3", specifying a "1" in this option selects "trait1" and so on.

15) Traits_Num

This option refers to the number of trait loci to be selected for analysis in the two following options 16 and 17. This number should be a positive integer less than or equal to the number of trait loci available in the input data.

16) Traits_Loop_Over

This option corresponds to the menu item 2 of the trait reordering menu where traits are selected to be analyzed one at a time. Trait numbers should correspond to the order in which they are listed in the locus file, starting from 1 (as in option 14).

17) Traits_Combine

This option corresponds to the menu item 3 of the trait reordering menu where trait loci are selected to be combined in the same output. The actual trait numbers should correspond to the order in which traits appear in the locus file, and these numbers can be permuted for the purpose of reordering them in the output. There should also be an item numbered N + 1 where N is the number of traits. This item refers to the [MARKERS] item displayed in the trait selection menu-option 3.
Among Trait_Single, Traits_Loop_Over and Traits_Combine (options 14, 16 and 17), Trait_Single takes precedence over the other items, and Traits_Loop_Over takes precedence over Traits_Combine, i.e. if both "Trait_Single" and "Traits_Loop_Over" are defined then the latter is ignored.
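
The precedence among the three trait keywords amounts to taking the first one that is defined, in a fixed order. A minimal Python sketch, assuming the batch items are held in a dictionary and using the hypothetical helper name trait_selection:

```python
# Keywords in order of precedence, highest first.
TRAIT_KEYWORDS = ("Trait_Single", "Traits_Loop_Over", "Traits_Combine")


def trait_selection(settings):
    """Return (keyword, trait numbers) for the keyword that takes effect."""
    for name in TRAIT_KEYWORDS:
        if name in settings:
            numbers = [int(tok) for tok in settings[name].split()]
            return name, numbers
    return None, []


print(trait_selection({"Traits_Combine": "2 1 3", "Traits_Loop_Over": "1 2"}))
```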

18) Trait_Subdirs

These are trait-specific directories in which output files are created if the Traits_Loop_Over option is defined. The value is a list of strings separated by white-space on a single line. The number of directory names read in depends on option 15.

19) Value_Missing_Quant

If the list of trait loci selected for output include one or more QTLs, then this value is interpreted as the missing quantitative phenotype value. The value should be a real number.

20) Value_Affecteds

If any of the affection loci selected have more than one liability class, then this value refers to the list of status-class pairs which should be considered affected.
The input format should be status-class pairs separated by white-space, e.g. 2-1. Either field can be a "*" denoting a wildcard, e.g. 2-* means status 2 and all classes.
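
To make the pair format and the "*" wildcard concrete, here is a small Python sketch that parses such a value and tests whether a given status and liability class would count as affected. It is an illustration only; both function names are hypothetical, and fields are compared as strings.

```python
def parse_affected_pairs(value):
    """Split a value such as '2-1 3-*' into (status, class) string pairs."""
    pairs = []
    for token in value.split():
        status, sep, liability = token.partition("-")
        if not sep or not status or not liability:
            raise ValueError(f"expected a status-class pair, got {token!r}")
        pairs.append((status, liability))
    return pairs


def is_affected(status, liability_class, pairs):
    """True if some pair matches, treating '*' in either field as a wildcard."""
    return any(
        s in ("*", str(status)) and c in ("*", str(liability_class))
        for s, c in pairs
    )


pairs = parse_affected_pairs("2-1 3-*")
print(is_affected(2, 1, pairs), is_affected(2, 2, pairs), is_affected(3, 4, pairs))
```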

21) Error_Loci

These and the following options are used by the genotyping error simulation module of Mega2. This option defines a set of locus numbers at which random genotyping errors will be introduced. These should be a subset of the marker loci chosen in the locus reordering step. The number of loci read in depends on option 23. The locus numbers are selected from all the loci selected in the reordering step.

22) Error_Except_Loci

If this option is defined instead of option 21, this list of loci is excepted from errors. The number of loci read in also depends on option 23 and locus numbers refer to the full list of loci selected in the locus-reordering step.

23) Error_Loci_Num

The value of this option is a single integer denoting how many markers to read in from options 21 or 22. It should be a positive integer.

24) Error_Model

This option specifies the error model to use in error simulation. The value should be a single character: "u" or "U", "m" or "M", or "s" or "S". These refer to the Uniform error probability model, the Marker-specific uniform error probability model, and the SimWalk2 error model respectively.

25) Error_Probabilities

This option specifies the prior genotyping error probability or probabilities for error simulation. It should be a single real value, if the model specified in option 24 is "Uniform", or a list of 5 real values, if the "Simwalk2" error model is specified. Marker-specific error probabilities can only be specified in the map file.
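
The dependence of Error_Probabilities on Error_Model can be checked before a run with a sketch like the one below. It is written in Python for illustration and does not reproduce Mega2's own validation; parse_error_probabilities is a hypothetical name, and the range check assumes only that each value is a probability between 0 and 1.

```python
def parse_error_probabilities(model, value):
    """Validate an Error_Probabilities value against the Error_Model letter."""
    model = model.strip().lower()
    probs = [float(tok) for tok in value.split()]
    if model == "m":
        # Marker-specific probabilities can only be given in the map file.
        if probs:
            raise ValueError("marker-specific probabilities belong in the map file")
        return []
    if model == "u":
        expected = 1  # Uniform model: a single real value
    elif model == "s":
        expected = 5  # SimWalk2 model: a list of 5 real values
    else:
        raise ValueError(f"unknown Error_Model {model!r}")
    if len(probs) != expected:
        raise ValueError(f"model {model!r} needs {expected} value(s), got {len(probs)}")
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("error probabilities must lie between 0 and 1")
    return probs


print(parse_error_probabilities("U", "0.01"))
```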

26) Default_Outfile_Names

If set to yes, the output file-names menu will be skipped for all options. In addition to file names, some of these menus also contain options for selecting pedigree and person identifiers in the output files; these will be set to default values as well.

27) Default_Reset_Invalid

This option defines how invalid genotypes should be handled without pausing for user input via the invalid-genotypes menu (which is skipped). If set to yes, the genotypes will be reset to unknowns; if set to no, invalid genotypes will not be reset.

28) Default_Other_Values

If set to yes, Mega2's default value settings will be used for miscellaneous parameters within various options (such as the random seed for simulate), and Mega2 will skip the input menus meant for those parameters. This has not been completely implemented yet, so Mega2 may still halt and ask for input.

29) Default_Ignore_Nonfatal

If set to yes, Mega2 will not halt for user input on whether to continue execution if non-critical inconsistencies are found in the input data, such as extra locus names in the map file or locus file. Otherwise, it will halt for user input upon encountering such inconsistencies.

30) Default_Ignore_Xlinked

If set to yes, Mega2 will ignore the X-linked flag in input locus file, and set analysis to x-linked if chromosome 23 is selected. If set to no, then Mega2 will pause for user input if the X-linked flag is set and an autosome is selected and vice versa.

31) Default_Rplot_Options

If set to yes, Mega2 will skip the Rplot parameters menu and use its default values instead for options that set up R plots.


32) Covariates_Selected

The trait loci defined in this keyword are output as covariates for the Genehunter-format option. In future releases, we plan to include more options that make use of this keyword.


33) Covariates_Selected_Num

This refers to the number of loci defined in the keyword Covariates_Selected above.
