DOCUMENTATION FOR QUIKLINK
INTRODUCTION
QUIKLINK is a program which facilitates setting up linkage analyses using MLINK, LINKMAP, ILINK, VITESSE and MFLINK. To a large extent it performs the same functions as the LCP program supplied with the LINKAGE programs and as DOLINK, although LCP and QUIKLINK can each do things which the other cannot. The main features of QUIKLINK are that it can be used to set up standard linkage analyses quickly and easily, that it can be used in batch mode as well as interactive mode, that it can set up analyses with VITESSE and with MFLINK and that it can be used to set up analyses to estimate marker allele frequencies with ILINK. However it cannot deal with the full range of locus types and analyses which can be handled by LCP and the LINKAGE programs.
Like LCP, QUIKLINK requires a pedigree file and matching locus data file in the standard LINKAGE format. Using these files, for each analysis it sets up a DOS batch file or Unix script file which preprocesses the data files, runs the necessary analysis programs (such as UNKNOWN and MLINK) and writes the output of the linkage analysis into appropriately named files. Unlike LCP, separate batch or script files and separate output files are produced for each analysis. A number of temporary files are generated while the batch or script file is run and these will be automatically deleted unless the file is run with the command line argument nodelete.
QUIKLINK accepts the names of the pedigree and locus data files as command line parameters and then additional lines of input are provided to define the analyses to be set up. Each line relates to one analysis and these lines can either be typed in interactively or read in from an input file which could define a complete set of analyses to be performed. Each analysis is provided with a different name, and all the input and output files for this analysis would have different extensions but the same root name. Typically the input for QUIKLINK consists of the analysis name followed by the locus numbers for the test locus and then for one or more other loci whose position is known. If necessary a second line of input provides recombination fractions between the loci. For example to set up a LINKMAP analysis one might run QUIKLINK as follows:
quiklink alldata.ppd alldata.par
Enter filename and numbers of loci to be used:
lm423 1 4 2 3
Enter 2 recombination fractions between fixed loci:
0.12 0.04
This runs QUIKLINK using a pedigree file called alldata.ppd and a locus data file called alldata.par. These would contain information regarding all the disease and marker loci in the project. A LINKMAP analysis is set up using the filename lm423 (with "lm" standing for LINKMAP and "423" indicating the marker numbers used, it being understood that the locus numbered 1 is the affection locus to be mapped). This means that under MSDOS a batch file called lm423.bat will be written, or under Unix a script file called lm423.sh. Data for the relevant loci will be extracted from the pedigree and locus data files and written to two stripped down files called lm423.ppd and lm423.loc. The loci to be used in the analysis consist of locus 1, which would be the affection locus, and loci 4, 2 and 3, which would be marker loci taken in the order 4-2-3. In the next line the recombination fractions are provided as 0.12 between markers 4 and 2, and 0.04 between markers 2 and 3. The three files lm423.bat (or lm423.sh), lm423.ppd and lm423.loc can then be used to run a 4-point multipoint linkage analysis with LINKMAP, which would be performed by running the batch or script file.
In order to run the batch file under MSDOS one simply enters its name, optionally without the .bat extension. So to run the above example analysis one would enter:
lm423.bat
or just:
lm423
To run the script file under Unix one must specify that it is to be executed using the sh shell. The file is not set up to be directly executable itself so cannot be run just by entering its file name (as one would, for example, with the pedin file produced by LCP). To run the Unix script file, enter sh followed by the filename, for example:
sh lm423.sh
Typically one might run QUIKLINK to set up a number of different analyses at once. For example, one might provide the following lines of input:
lm423 1 4 2 3
0.12 0.04
ml2 1 2
ml3 1 3
ml4 1 4
In addition to the LINKMAP analysis described above, this would set up three MLINK analyses with filenames ml2, ml3 and ml4, which perform two-point linkage analyses of the first locus against marker loci 2, 3 and 4 respectively.
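A set of input lines like this can also be generated by a small program rather than typed by hand. The following is a minimal Python sketch and is not part of QUIKLINK; the helper name analysis_lines is an assumption made for illustration. It only builds the text shown above, which could then be saved to an input file for QUIKLINK to read.

```python
def analysis_lines(name, loci, fractions=()):
    """Return the QUIKLINK input lines for one analysis.

    name      -- root file name for the analysis, e.g. "lm423"
    loci      -- locus numbers, test locus first
    fractions -- recombination fractions between the fixed loci, if any
    """
    lines = [" ".join([name] + [str(locus) for locus in loci])]
    if fractions:
        lines.append(" ".join(str(theta) for theta in fractions))
    return lines


# One LINKMAP analysis followed by three two-point MLINK analyses.
batch = analysis_lines("lm423", [1, 4, 2, 3], [0.12, 0.04])
for marker in (2, 3, 4):
    batch += analysis_lines(f"ml{marker}", [1, marker])

print("\n".join(batch))
```

Keeping the generation in one place makes it easy to add or remove markers without retyping each line.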
FORMAT OF DATA FILES
The pedigree and locus data files must be in the standard format for the LINKAGE programs which is described in their documentation. The pedigree file must be “post-makeped”, i.e. containing pointers for siblings and children and with loops broken. Affection and quantitative loci can only have two alleles.
REFERRING TO DIFFERENT LOCI
When setting up the analyses, loci can either be referred to by their numbers, i.e. numbered from 1 in the order they appear in the main pedigree and locus data files, or they can be referred to by names provided in the locus data file. The format for these names is not strictly part of the standard LINKAGE format for the file, though it is compatible with other programs which use these files. To provide a name for a locus, it is written in the first line of the locus description in the locus data file after the two numbers which define the type of locus (affection/numbered/etc.) and the number of alleles and is preceded by a # (hash) character. For example a locus description could appear as follows:
1 2 # ALZ
9.999000E-001 1.000000E-004 << gene freqs
1 << number of liability classes
0.01000 0.50000 0.50000
This affection locus could then be referred to by the name ALZ, which would also be used to provide titles in the results files produced from the analyses. Additional information, such as comments, can be appended to the line following the locus name and will be ignored.
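The naming convention can be illustrated with a short Python sketch. This is not QUIKLINK's own code: the function name locus_name is hypothetical, and the sketch assumes that the name is the first whitespace-delimited word after the hash character. The marker name D21S1 in the second call is likewise made up for the example.

```python
def locus_name(first_line):
    """Return the name given after '#' on the first line of a locus
    description, or None if the line carries no name."""
    _, hash_mark, rest = first_line.partition("#")
    if not hash_mark:
        return None
    words = rest.split()
    # Anything after the first word is treated as a comment and ignored.
    return words[0] if words else None


print(locus_name("1 2 # ALZ"))                     # ALZ
print(locus_name("3 5 # D21S1 CA repeat marker"))  # D21S1
print(locus_name("3 5"))                           # None
```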
SETTING UP DIFFERENT KINDS OF ANALYSIS
By default if QUIKLINK is provided with two loci it will set up a two-point analysis with MLINK and if it is provided with more than two loci it will set up a multipoint LINKMAP analysis with the first locus being moved over a fixed map consisting of the other loci.
The default behavior of QUIKLINK can be modified in order to set up other kinds of analysis by entering a variety of switches. These can be provided as input lines or can be entered on the command line after the pedigree and locus data file names. Switches all begin with a hyphen. Once a switch has been entered the corresponding option remains in effect until the switch to turn it off is entered.
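The way a switch stays in force until it is turned off can be sketched in Python as follows. This is an illustration only and does not reproduce how QUIKLINK parses its input; the function name track_switches is an assumption. On/off switches such as -v1 and -v0 are stored as booleans, while the -rX and -eN switches described below keep their values, starting from the documented defaults of 1:1 and 5.

```python
def track_switches(lines):
    """Pair each non-switch input line with the switch settings in force."""
    state = {"r": 1.0, "e": 5}  # documented defaults: ratio 1:1, 5 evaluations
    result = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("-") and len(line) >= 2:
            letter, value = line[1], line[2:]
            if letter == "r":
                state["r"] = float(value)
            elif letter == "e":
                state["e"] = int(value)
            else:
                # Digit one switches the option on, digit zero switches it off.
                state[letter] = (value == "1")
        else:
            # Keep a snapshot so later switches do not alter earlier entries.
            result.append((line, dict(state)))
    return result


lines = ["-v1", "4pv426 1 4 2 6", "0.02 0.035", "-v0", "ml2 1 2"]
for line, state in track_switches(lines):
    print(line, state)
```

Note that the sketch does not try to tell analysis lines from lines of recombination fractions; both are returned with the settings that applied when they were read.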
The full range of analyses with their switches is described below.
[default]
Input: filename and two locus numbers
Sets up two-point MLINK analysis with lod score evaluated at recombination fractions 0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5. Produces three files with extensions .bat/.sh, .ppd and .par. When the analysis is run output is sent to files with extensions .res and .out.
Example input:
2p46 4 6
This would set up a two-point MLINK analysis between loci 4 and 6.
[default]
Input: filename and more than two locus numbers followed by two fewer recombination fractions than the number of loci
Sets up multipoint LINKMAP analysis with the first locus being moved over the others, which are separated by the recombination fractions provided. Five positions are tested in each interval. Produces three files with extensions .bat/.sh, .ppd and .loc. When the analysis is run output is sent to files with extensions .res and .out.
Example input:
4p426 1 4 2 6
0.02 0.035
This sets up a 4-point LINKMAP analysis with locus 1 as the test locus being moved against loci 4, 2 and 6 which are separated by recombination fractions of 0.02 and 0.035.
-m1 to use MLINK for multipoints, -m0 to switch off
(Note: When setting a switch on, the character following the letter which defines the switch is the digit one, not the letter L. When setting it off, the character is the digit zero, not the letter O.)
Input: filename and more than two locus numbers followed by two fewer recombination fractions than the number of loci
Sets up a multipoint analysis using MLINK, with the first locus being moved relative to a fixed map consisting of the other loci which are separated by the recombination fractions given. However the order of loci is kept constant, with the test locus being placed at different positions relative to the first fixed locus. The lod score is evaluated for recombination fractions between the test locus and first fixed locus (first and second loci) of 0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5. Produces three files with extensions .bat/.sh, .ppd and .par. When the analysis is run output is sent to files with extensions .res and .out.
Example input:
3pml26 1 2 6
0.02
This sets up an MLINK analysis in which locus 1 is tested against loci 2 and 6, which are separated by a recombination fraction of 0.02. Only the recombination fraction between loci 1 and 2 will be varied. The inclusion of locus 6 in the analysis may provide additional information where locus 2 is uninformative.
-a1 to use MFLINK, -a0 to switch off
Input: filename and two or more locus numbers, one of which is an affection locus, followed by one fewer recombination fractions than the number of loci
This option sets up a "model-free" analysis using MFLINK to test (nearly) all transmission models. One of the loci used must be the affection locus to be tested while the others must all be markers. The order and recombination fractions define the test position for the affection locus. Produces five files with extensions .bat/.sh, .ppd, .par, .lda and .uda. The last two correspond to the linked.dat and unlinked.dat files required by MFLINK. Output is written to a file with extension .mfl.
Example input:
mf214 2 1 4
0.04 0.04
If we assume that 1 is the number of the affection locus then this sets up an MFLINK analysis which tests a position midway between markers 2 and 4, which are separated by a recombination fraction of approximately 0.08.
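The figure of approximately 0.08 can be checked with a short calculation. The Python sketch below assumes no crossover interference, so that the two outer markers recombine when exactly one of the two intervals recombines. The function name is illustrative only, and this is not code taken from QUIKLINK or MFLINK.

```python
def combined_fraction(theta1, theta2):
    """Recombination fraction across two adjacent intervals,
    assuming no interference between them."""
    return theta1 + theta2 - 2 * theta1 * theta2


print(round(combined_fraction(0.04, 0.04), 4))  # 0.0768
```

For small fractions such as these the correction term is minor, which is why simply adding the two values gives a close approximation.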
-v1 to use VITESSE, -v0 to switch off
Input: as for any MLINK, LINKMAP or MFLINK analysis
The effect of this switch is to set up analyses using VITESSE rather than the LINKAGE/FASTLINK package. VITESSE can directly replace UNKNOWN and MLINK or LINKMAP. If the -v switch is used in conjunction with the -a switch then an MFLINK analysis will be set up in which MFLINK uses the VITESSE program rather than MLINK for its likelihood calculations (in fact it uses VNOSCORE instead of NOSCORE - see MFLINK documentation).
Example input:
-v1
4pv426 1 4 2 6
0.02 0.035
This sets up a 4-point analysis using VITESSE with locus 1 as the test locus being moved against loci 4, 2 and 6 which are separated by recombination fractions of 0.02 and 0.035.
-i1 to use ILINK, -i0 to switch off
Input: filename and two or more locus numbers, followed by one fewer recombination fractions than the number of loci
Sets up an ILINK analysis to estimate recombination fractions. Starting values for the recombination fraction[s] between loci are provided. Produces three files with extensions .bat/.sh, .ppd and .par. Output is written to files with extensions .res and .out.
Example input:
i24 2 4
0.1
The recombination fraction between loci 2 and 4 will be estimated, given a starting value of 0.1.
-q1 to estimate allele frequencies, -q0 to switch off
Input: filename and one locus number
Uses ILINK to estimate allele frequencies of a marker. Produces three files with extensions .bat/.sh, .ppd and .par. Output is written to files with extensions .res and .out. (In fact the files will be set up to include a second dummy locus which is uninformative but which seems necessary in order for ILINK to run correctly.)
Example input:
q4 4
Estimates the allele frequencies of locus 4.
-rX to set female:male distance ratio
Sets the ratio of female:male genetic distance to a value other than the default of 1:1. The LINKAGE programs support three options: equal values; a constant ratio; separate ratios in each interval. QUIKLINK only allows a constant ratio (which may be 1:1).
Example input:
-r1.5
2p13 1 3
-r1.0
Sets up a two-point MLINK analysis between loci 1 and 3 with the female:male distance ratio set to 1.5, then returns the ratio to 1:1 for subsequent analyses. Note: the version of VITESSE which I have does not seem to run MLINK analyses if the ratio is not 1, although it can run LINKMAP analyses correctly.
-u1 to target Unix, -u0 to target MSDOS
Sets QUIKLINK to produce either MSDOS batch files or Unix script files. By default files are produced which are suitable for the operating system under which QUIKLINK is running. However this switch allows one, for example, to run QUIKLINK on a PC under MSDOS and to produce the necessary files, including Unix script files, which could then be ftp'd to a Unix workstation to run the analyses.
-eN to set number of evaluations per interval
By default when a LINKMAP analysis (or equivalent VITESSE analysis) is performed, likelihood evaluations will be carried out with the test locus set at 5 positions in each interval. This option allows the default number to be changed.
Example input:
-e10
4p426 1 4 2 6
0.02 0.035
-e5
Sets up a LINKMAP analysis with 10 evaluations per interval, then returns to the default value of 5 for subsequent analyses.
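The input rules given for the different analyses above differ mainly in how many recombination fractions must follow the locus numbers. The following Python sketch collects those rules in one place so that input can be checked before QUIKLINK is run. It is an illustration under stated assumptions, not part of QUIKLINK: the dictionary keys and function names are labels invented for this example.

```python
# How many fewer recombination fractions than loci each analysis expects.
# The keys are labels chosen for this sketch, not QUIKLINK keywords.
FEWER_THAN_LOCI = {
    "mlink": 2,        # default two-point analysis: 2 loci, no fractions
    "linkmap": 2,      # default multipoint analysis
    "mlink_multi": 2,  # -m1
    "mflink": 1,       # -a1
    "ilink": 1,        # -i1
    "allele_freq": 1,  # -q1: 1 locus, no fractions
}


def expected_fractions(analysis, n_loci):
    """Return how many recombination fractions should follow the loci."""
    return n_loci - FEWER_THAN_LOCI[analysis]


def check_input(analysis, loci, fractions):
    """Raise ValueError if the number of fractions does not fit the loci."""
    wanted = expected_fractions(analysis, len(loci))
    if len(fractions) != wanted:
        raise ValueError(
            f"{analysis}: expected {wanted} recombination fractions, "
            f"got {len(fractions)}"
        )


# The example inputs from the sections above all pass silently.
check_input("linkmap", [1, 4, 2, 6], [0.02, 0.035])
check_input("mflink", [2, 1, 4], [0.04, 0.04])
check_input("ilink", [2, 4], [0.1])
```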
RUNNING MULTIPLE SCRIPT FILES
Unlike LCP, QUIKLINK uses a separate set of data, batch/script and output files for each analysis performed. This has some advantages in terms of being able to manage the different analyses and means the output files are suitable for the TABLE utility, but it does mean that there can be large numbers of files to deal with and also that one cannot run a number of analyses sequentially just by running a single batch or script file, as one can with LCP. Instead the batch or script file for each analysis must be run separately by the user.
It is possible to write a simple Unix script file which will run a number of other scripts one after the other in order to perform multiple analyses. For example such a file, named doall.csh, might appear as follows:
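#!/bin/csh
# Run each QUIKLINK script in turn, then move its files out of the way.
mkdir done
foreach f (*.sh)
  sh $f
  # ${f:r} is the script name with its extension removed (the root name)
  mv ${f:r}.* done
end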
This takes each script file (with extension .sh) in turn, runs the analysis and then moves all the files related to that analysis (i.e. those with the same root name) into a subdirectory called done.
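Because every analysis produces several files sharing one root name, it can help to list what to expect for a given analysis. The Python sketch below does this using only the extensions stated in this document. The function and dictionary names are assumptions made for illustration, temporary files are not covered, and for MFLINK only the .mfl output file mentioned above is listed.

```python
# Data files written by QUIKLINK for each kind of analysis.
DATA_EXTENSIONS = {
    "mlink": [".ppd", ".par"],
    "linkmap": [".ppd", ".loc"],
    "mflink": [".ppd", ".par", ".lda", ".uda"],
    "ilink": [".ppd", ".par"],
}

# Output files written when the batch or script file is run.
OUTPUT_EXTENSIONS = {
    "mlink": [".res", ".out"],
    "linkmap": [".res", ".out"],
    "mflink": [".mfl"],
    "ilink": [".res", ".out"],
}


def analysis_files(root, analysis, unix=True):
    """Return the script, data and output file names for one analysis."""
    script = root + (".sh" if unix else ".bat")
    extensions = DATA_EXTENSIONS[analysis] + OUTPUT_EXTENSIONS[analysis]
    return [script] + [root + ext for ext in extensions]


print(analysis_files("lm423", "linkmap"))
print(analysis_files("mf214", "mflink", unix=False))
```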
EXAMPLE USAGE
Normally one might use QUIKLINK to set up a number of related analyses. Once one had decided what analyses one wanted to perform one might list the input commands for these in a text file which could be read in by QUIKLINK, rather than typing the commands in interactively. This would have the advantage that the parameters for the analyses could be set up carefully in advance rather than risking a typing error as they were input, and that one would then have a permanent record of the parameters (such as recombination fractions between markers) which had been used. Also, should it emerge that there were errors in the data files then these could be fixed and then the whole set of analyses could be prepared and run again with minimal effort.
QUIKLINK is supplied with example LINKAGE pedigree and locus data files called alzall.ppd and alzall.par, and an example input file called testql.inp. By running QUIKLINK with these data and input files one can produce examples of each of the different kinds of analysis mentioned above:
quiklink alzall.ppd alzall.par < testql.inp
Examining these files should further clarify the usage of QUIKLINK. An MSDOS batch file and Unix script file called testql.bat and testql.csh respectively are also supplied which contain this command line and which then go on to run all the analyses, provided that all the relevant executables are available on the PATH.
CONTACT DETAILS
Please do not hesitate to contact me with any comments or problems.
Dave Curtis
dcurtis@hgmp.mrc.ac.uk
http://www.gene.ucl.ac.uk/~dcurtis/
Department of Adult Psychiatry, 3rd Floor, Outpatient Building, Royal London Hospital, London E1 1BB. Phone +44 20 7377 7729.
December 1999