Creating LOD score plots from Mega2

Overview

Mega2 can use the R statistical package and our nplplot R function to generate postscript files containing LOD score plots. Currently, only the three options listed in the "Supported analysis options" section can use this facility. More analysis options will be added in the near future. The nplplot package must be installed in addition to the standard R libraries. For instructions on installing nplplot, see the following section. Documentation on installing R packages in general is maintained at the CRAN site.

Installing the nplplot package

The nplplot package is distributed as part of the Mega2 bundle as an R package source named nplplot_0.2.tar.gz for UNIX and DARWIN and as nplplot_0.2.zip for WINDOWS. Perl 5.005 or later is required to install this package. To install nplplot within the current installation of R on your computer, log in as root, then from within the directory containing the nplplot_0.2.tar.gz file, type the following command:

    > R CMD INSTALL nplplot_0.2.tar.gz

It is important that INSTALL be in upper-case, and that you specify the complete package file name. This creates a new R library named nplplot. If installation proceeds correctly, the user should see the following messages:


Installing *source* package `nplplot' ...
 R
 data
 help
 >>> Building/Updating help pages for package `nplplot'
     Formats: text html latex example
 DONE (nplplot)

DONE (INSTALL)

If this command is successful, when you start up R and type 'library()' at the R prompt, the 'nplplot' library should appear in the list of installed libraries.
In order to generate the plots from within Mega2, it is not necessary to invoke R. The scripts generated by Mega2 run R in batch mode to create the appropriate postscript files.
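
The installation check described above can also be done from the shell, without starting an interactive R session. The following is a minimal sketch, not part of Mega2 or nplplot: the helper name check_r_package is hypothetical, and it assumes an R installation that provides the Rscript front end on the PATH.

```shell
#!/bin/sh
# Sketch: report whether an R package is installed, without an interactive session.
# check_r_package is a hypothetical helper name, not part of Mega2 or nplplot.
check_r_package() {
    pkg="$1"
    # Assumption: the Rscript front end is available on PATH.
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found on PATH" >&2
        return 2
    fi
    # installed.packages() returns a matrix whose row names are package names;
    # R exits with status 0 if the package is listed and 1 otherwise.
    if Rscript -e "quit(save = 'no', status = if ('$pkg' %in% rownames(installed.packages())) 0 else 1)" >/dev/null 2>&1; then
        echo "$pkg is installed"
    else
        echo "$pkg is NOT installed"
        return 1
    fi
}
```

Calling `check_r_package nplplot` after running R CMD INSTALL gives the same information as looking for nplplot in the output of library().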

Supported analysis options

The following options can currently create R-plots:

Statistics selection menu

SimWalk2 computes 5 statistics for the NPL LOD score computation (see the SimWalk2 output file STATS-*.ALL for a description). This is also true of the Merlin-SimWalk2 option. Allegro computes LOD scores separately for 12 different statistics, each stored in a separate, appropriately named output file (e.g. allegro_linpairs_spt.01). Therefore, each analysis option has its own statistic selection menu where the user can specify which statistics to include in the plots.

Please note that running SimWalk2 or Allegro will still compute all of their statistics; however, only those selected via this menu will be plotted. Further instructions on how to subsequently modify the selections are provided below.

SimWalk2 statistic selection menu:

==========================================================
R plot statistic selection menu:

NPL statistics will be automatically plotted into a
postscript file using R after Simwalk2-NPL
has been run.
Please select the statistics to be included in this R plot.

This list can be later modified in the shell script Rsimwalk2.sh before
running this script.

==========================================================
 1) A
 2) B
 3) C
 4) D
 5) E
==========================================================
Enter string of statistic numbers ('e' to terminate) > 1 2 3 e

Selected statistics: A B C
==========================================================

In the above example, the user has selected statistics A, B, and C for plotting. For the Merlin-SimWalk2 option, only statistics D and E, which correspond to Merlin's Pairs and All statistics respectively, can be selected.

For Merlin, the statistics list is:


==========================================================
 1) ALL
 2) Pairs
==========================================================

The Allegro statistics selection menu is:

==========================================================
R plot statistic selection menu:

NPL statistics will be automatically plotted into a
postscript file using R after Allegro
has been run.
Please select the statistics to be included in this R plot.

This list can be later modified in the shell script Rallegro.sh before
running this script.

==========================================================
 1) exppairs_mpt    Exponential multi-point pairs
 2) exppairs_spt    Exponential single-point pairs
 3) expall_mpt      Exponential multi-point all
 4) expall_spt      Exponential single-point all
 5) linpairs_mpt    Linear multi-point pairs
 6) linpairs_spt    Linear single-point pairs
 7) linall_mpt      Linear multi-point all
 8) linall_spt      Linear single-point all
 9) par_mpt:LOD     Parametric multi-point LOD scores
 10) par_spt:LOD     Parametric single-point LOD scores
 11) par_mpt:HLOD    Parametric multi-point heterogeneity LOD scores
 12) par_spt:HLOD    Parametric single-point heterogeneity LOD scores
==========================================================
Enter string of statistic numbers ('e' to terminate) >

R-plot parameters menu

This menu allows the user to define what to plot, where, and how. Here is what the menu looks like:

==========================================================
R plot customization menu:

 Multiple traits may be combined into a single file
    or plotted separately, one trait per plot file.
 Multiple traits may be combined into a single plot.
    or ploted one trait per plot.
 There may be one plot-file created per chromosome, or
    all combined into one file.
 The number of plots to be displayed on a single page
    can be controlled by changing the <row> and <col> values.
 Y-axis range can be specified by setting maximum and minimum values
    and will be used if LOD scores fall within the specified range.
 Postscript file orientation can be toggled between
    portrait or landscape.

==========================================================

R plot parameter selection menu:
0) Done with this menu - please proceed
 1) Postscript output file name stem     SW2NPL
 2) Combine traits into one plot         [no ]
 3) Combine traits into one file         [no ]
 4) Combine chromosomes into one file    [yes ]
 5) Minimum Y-axis value                 0.00
 6) Maximum Y-axis value                 3.00
 7) Horizontal cut-off line at           2.00
 8) Plots per page: number of rows       2
 9) Plots per page: number of columns    2
 10) Postscript plot orientation          [landscape]
Select from options 0-10 (2,3,4,10 to toggle, 5-9 to change values) >

In this example, data containing marker loci on chromosomes 1, 2, and 3 and traits AFF1 and AFF2 are analyzed with the SimWalk2-NPL option. The user has chosen to plot the statistics selected in the statistics selection menu together on a single plot for AFF1, combining the three chromosomes into a single file, and similarly for AFF2. This results in two separate postscript files, each named SW2NPL.all.ps, one in each trait directory (AFF1/ and AFF2/), with 4 plots in each file in landscape orientation. If the user had chosen to combine traits into a single file, that file would reside in the root directory (i.e. the directory from which Mega2 is executed).
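
Once the plots have been generated, the file layout described in this example can be verified from the shell. This is only a sketch under the settings shown above (chromosomes combined into one file, traits kept separate); check_trait_plots is a hypothetical helper name, and it must be run from the directory in which Mega2 was executed.

```shell
#!/bin/sh
# Sketch: confirm that each trait directory holds the combined plot file.
# check_trait_plots is a hypothetical helper; it takes the postscript file
# name stem followed by one or more trait directory names.
check_trait_plots() {
    stem="$1"
    shift
    missing=0
    for trait in "$@"; do
        f="$trait/$stem.all.ps"
        if [ -f "$f" ]; then
            echo "found   $f"
        else
            echo "missing $f"
            missing=1
        fi
    done
    # Exit status is 0 only when every expected file exists.
    return "$missing"
}
```

For the example above, the call would be `check_trait_plots SW2NPL AFF1 AFF2`.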

Creating plots

For each of the three analysis options listed above, the output file name selection menu now includes a switch that turns plot creation on or off. If it is turned on, these additional files are created:

Thus, the user must first execute either the npl.*.sh or al_script.*.sh files (these may reside in separate trait directories if multiple traits are analyzed), then run Rsimwalk2.sh or Rallegro.sh, respectively.
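
The two-step run order described above can be sketched as a small shell function for the SimWalk2 case. Several details here are assumptions rather than documented Mega2 behavior: that the npl.*.sh scripts are executable and sit either in the top-level directory or one level down in trait directories, that each should be run from its own directory, and that Rsimwalk2.sh sits in the top-level directory. The function name run_simwalk2_then_plot is hypothetical.

```shell
#!/bin/sh
# Sketch of the run order: analysis scripts first, plotting script second.
# Assumptions (not confirmed by the Mega2 documentation): scripts are
# executable, npl.*.sh files live in the top-level directory or in trait
# subdirectories, and Rsimwalk2.sh lives in the top-level directory.
run_simwalk2_then_plot() {
    top="${1:-.}"
    # Step 1: run every analysis script from its own directory,
    # stopping at the first failure.
    for script in "$top"/npl.*.sh "$top"/*/npl.*.sh; do
        [ -f "$script" ] || continue    # skip glob patterns that matched nothing
        ( cd "$(dirname "$script")" && "./$(basename "$script")" ) || return 1
    done
    # Step 2: create the plots only after all analyses have finished.
    ( cd "$top" && ./Rsimwalk2.sh )
}
```

For Allegro, the same pattern would apply with al_script.*.sh and Rallegro.sh substituted for the SimWalk2 script names.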

Output

The SimWalk2 output file is called SW2NPL.##.ps where ## stands for the chromosome number or SW2NPL.all.ps if multiple chromosomes are plotted into one file.

The Allegro output file is called Allegro.##.ps. These files can be viewed using a postscript viewing utility such as Ghostscript, or sent directly to a printer.
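
If PDF copies of the plots are wanted, the postscript files can be converted after the fact. The sketch below assumes that the ps2pdf utility distributed with Ghostscript is on the PATH and that the default file name stems (SW2NPL and Allegro) were kept; ps_plots_to_pdf is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: convert every plot file under a directory from postscript to PDF.
# Assumes Ghostscript's ps2pdf is on PATH and the default name stems are used.
ps_plots_to_pdf() {
    dir="${1:-.}"
    if ! command -v ps2pdf >/dev/null 2>&1; then
        echo "ps2pdf not found on PATH" >&2
        return 2
    fi
    # Match both naming schemes, in the top-level and trait directories.
    find "$dir" -type f \( -name 'SW2NPL.*.ps' -o -name 'Allegro.*.ps' \) |
    while IFS= read -r ps; do
        # ps2pdf takes the input file followed by the output file.
        ps2pdf "$ps" "${ps%.ps}.pdf" && echo "wrote ${ps%.ps}.pdf"
    done
}
```

Running `ps_plots_to_pdf .` from the directory in which Mega2 was executed writes each PDF next to its postscript source.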

Example R plot (postscript)
Example R plot (converted to pdf)