mlink, linkmap, lodscore, ilink, and unknown all use the FASTLINK versions of the code. The rest of the programs are version 5.2 of the LINKAGE package.
If you find that the programs provided here are not compiled with the parameters required for your data, use the "compilejob" program to recompile them with the required parameters.
If your LINKMAP jobs are getting large, you may want to try FASTMAP and, depending on your problem, the HOMOZ programs on this menu.
The mutation model may be useful in any circumstance where one has cause
to believe that a substantial fraction of cases are caused by
a new mutation. Important special cases mentioned by Dan and
Joe include:
There is discussion in the vicinity of page 176 in the Terwilliger and Ott book as to how to select the mutation rate when using the mutation model. The default rate given by PREPLINK is plausible in some cases.
"compilejob" is a script for automatically recompiling the fastlink versions with the required maxhap values.
Maxhap is the product of the numbers of alleles at each locus in the run. A maxhap of 1024 is already large, and the current maximum is 1,600. If your job requires a larger maxhap value and you cannot avoid this through better problem design, please contact user support.
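For a quick check of whether a run fits before using compilejob, the product can be computed directly. The following is a minimal C sketch, assuming you already know the number of alleles at each locus in the run; compute_maxhap, maxhap_fits, and MAXHAP_LIMIT are hypothetical names, not part of compilejob or FASTLINK.

```c
#include <assert.h>

/* Current maximum quoted above; a local site limit, not a FASTLINK constant. */
#define MAXHAP_LIMIT 1600L

/* Hypothetical helper: maxhap is the product of the number of alleles
   at each locus used in the run. */
long compute_maxhap(const int *alleles, int nloci)
{
    long product = 1;
    for (int i = 0; i < nloci; i++) {
        assert(alleles[i] > 0);   /* every locus has at least one allele */
        product *= alleles[i];
    }
    return product;
}

/* Returns 1 if the run fits within the current maximum, 0 otherwise. */
int maxhap_fits(const int *alleles, int nloci)
{
    return compute_maxhap(alleles, nloci) <= MAXHAP_LIMIT;
}
```

For example, loci with 2, 8, 8, and 8 alleles give a maxhap of 1024, while loci with 2, 10, 10, and 10 alleles give 2000, which exceeds the current maximum.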
If you so desire, compilejob will allow you to submit a job to the
appropriate batch queue. Small linkage jobs will be submitted to the
small linkage queue, others will be submitted to the big linkage queue.
The decision is currently based on the value of maxhap.
Running Linkage Programs in Batch:
Programs like those of the LINKAGE package run more efficiently when they do not have to share a machine with other jobs. We therefore have one machine dedicated to running one large linkage job at a time. It has 512 MB of memory and a fast single processor. You can submit jobs to run on this machine as below:
batchjob lcp_command_file_name
e.g. batchjob pedin
You will be emailed when this job has started running on an appropriate machine and when it has completed.
Note:
Programs submitted in batch mode using the "batchjob" mechanism are automatically checkpointed using the script-level checkpointing facilities provided by FASTLINK. If you simply restart your job after a machine crash, it may recover the data it has already computed. If it has recovered data, please check the output carefully.
It is in your own interests to read the online documentation for details of checkpointing. The following file is of particular interest: /packages/fastlink/doc/README.checkpoint
It is advisable to have only one linkage "run" per lcp command file, because this makes the checkpointing more likely to work.
From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999
FASTLINK, version 4.1P
Each section in each README file starts with the string "|*|". To browse the sections, use your file viewer to search for this unique string. This is the top level README file.
|*| INTRODUCTION
As described in the papers:
This directory and its subdirectories contain version 4.1P of faster versions of the general pedigree programs of LINKAGE 5.1. Several of our users of earlier versions 1.0 and 1.1 have dubbed the new programs FASTLINK. A PostScript version of the papers can be found in the files paper1.ps, paper2.ps, paper5.ps, paper6.ps, and paper7.ps. Please cite the first two papers (so that all participants in the FASTLINK project get credit), if you use these programs in a published experiment. You should continue to cite the original papers on LINKAGE, listed below, if you use FASTLINK:
README.allele -- explains the diagnostic that states that a pedigree or dataset has unused alleles.
README.bugreport -- suggests how to send in a bug report
README.checkpoint -- explains the checkpointing scheme for LODSCORE and ILINK
README.constants -- explains the mysteries of how to properly set some of the constants in FASTLINK
README.diseq -- explains an option introduced in FASTLINK 4.1P for modeling linkage disequilibrium
README.ILINK -- What does the output of ILINK and LODSCORE mean?
README.loopfile -- explains the syntax of the file loopfile.dat used to transmit genotype inferences from unknown to the main program when working with looped pedigrees
README.lselect -- explains the new easy automatic way to select loop breakers
README.mapfun -- Explains how the mapping functions explained in Chapter 1 of Ott's book actually relate to what is in the code
README.memory -- explains memory requirements
README.scaling -- essay on the output likelihood values from FASTLINK
README.time -- a short essay on estimating the running time of sequential FASTLINK runs.
README.trouble -- LINKAGE/FASTLINK Troubleshooting
README.unknown -- describes modifications to the UNKNOWN preprocessor program first introduced in FASTLINK 2.3P, including improved error reporting and a bugfix.
Send suggestions for other FASTLINK documentation you would like to see to schaffer@helix.nih.gov.
Please let us know if you have problems with the programs, including if you are unhappy with the speedup and are willing to share your data to the extent that we may be able to study the problem. Note that this does not mean you have to tell us anything about what disease you are studying. And of course we will respect any request for confidentiality. We only wish to consider studying problems to see if we can find improvements.
If you read README.updates you will see that lots of the updates are suggested by users who are enthusiastic about FASTLINK, but would like to see it improve. One of the best ways to encourage us to work harder on FASTLINK is to send in your constructive suggestions.
There is a mailing list of over 200 FASTLINK users. If you wish to be on this mailing list, send e-mail to schaffer@helix.nih.gov.
From: ftp-bimas.cit.nih.gov Last mod: January 30, 1996
::::::::::::::
README.allele
::::::::::::::
|*| Diagnostic for extra alleles
This file explains the diagnostic that states that a pedigree or dataset has unused alleles. This diagnostic was implemented by Chris Hoelscher for inclusion in FASTLINK 2.3P and beyond. Allele renumbering is implemented starting in version 3.0P.
The running time of LINKAGE and FASTLINK grows rapidly with the number of alleles specified for each locus used in a run. Therefore, it is important to specify no more alleles than are actually needed for the analysis. Various partial solutions to the "extra allele" problem have been implemented by:
Ellen Wijsman (in the context of LIPED)
Jathine Wong and Cathryn Lewis (in the context of LINKAGE/FASTLINK)
Scott Diehl, Bettie Duke, and Lynn Ploughman (in the context of MENDEL)
Alan Young (in the context of GAS)
At the end of this essay we briefly describe the partial solution implemented by Wijsman and Diehl-Duke-Ploughman. In the context of FASTLINK, their solution is applicable only to the LINKMAP and MLINK programs. We have not implemented an extension of their solution in FASTLINK 3.0P.
|*| Extra alleles in symbols and an example
Suppose a locus has n alleles, A1 through An, that occur in the population at large. Suppose that in a population to be studied with linkage analysis, only alleles A1 through Ak, with k < n-1, occur. Then one may combine alleles A(k+1) through An into one "catch-all" allele unless one is estimating allele frequencies. The frequency of the catch-all allele is the sum of the frequencies of A(k+1) through An.
A concrete FASTLINK example:
Suppose the general population has the possibilities:
Allele      1    2    3    4    5    6
Frequency  .3   .2  .15   .1  .22  .03
and this is encoded in the locus file (datain.dat).
Suppose that the pedigree(s) encoded in the pedigree file (pedin.dat) contain only the alleles 2, 4, and 5. LINKAGE and FASTLINK require that the alleles be numbered consecutively starting at 1. Therefore, in the process of reducing from 6 to 4 alleles it is necessary to renumber the alleles.
Renumber old allele 2 to be new allele 1 with frequency .2
Renumber old allele 4 to be new allele 2 with frequency .1
Renumber old allele 5 to be new allele 3 with frequency .22
Create catch-all allele 4 with frequency .48 (sum of frequencies of old 1, old 3, old 6)
No person should have the catch-all allele, but it is absolutely wrong to omit the catch-all allele.
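The renumbering in this example is mechanical, so it can be sketched in a few lines. This is a minimal C illustration, assuming you already know which alleles occur in the pedigrees; renumber_alleles is a hypothetical name, and FASTLINK's own internal renumbering (used when ALLELE_SPEED is 1) need not look like this. As stated in this section, combining alleles this way is not valid when you are estimating allele frequencies.

```c
#include <assert.h>

/* Hypothetical helper for the renumbering shown above.
   freq[i]    population frequency of old allele i+1 (i = 0..n-1)
   present[i] nonzero if old allele i+1 occurs in the pedigrees
   On return, new_freq[] holds the frequencies of the new alleles and
   new_num[i] holds the new number of old allele i+1. Observed alleles keep
   their relative order; all unobserved alleles map to one catch-all allele
   whose frequency is the sum of theirs. Returns the new number of alleles. */
int renumber_alleles(const double *freq, const int *present, int n,
                     double *new_freq, int *new_num)
{
    int k = 0;
    int missing = 0;
    double catch_all = 0.0;

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        if (present[i]) {
            new_freq[k] = freq[i];
            new_num[i] = ++k;          /* next consecutive number */
        } else {
            catch_all += freq[i];
            missing++;
        }
    }
    if (missing == 0)
        return k;                      /* nothing to combine */

    /* Never omit the catch-all allele, even though no one carries it. */
    for (int i = 0; i < n; i++)
        if (!present[i])
            new_num[i] = k + 1;
    new_freq[k] = catch_all;
    return k + 1;
}
```

Applied to the example, old alleles 2, 4, and 5 become 1, 2, and 3, and old alleles 1, 3, and 6 all map to catch-all allele 4 with frequency .48 (up to floating-point rounding).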
Important technical note: the process of renumbering alleles to reduce their number loses no information in a statistical sense, unless one is estimating allele frequencies. Renumbering is distinct from "downcoding", in which multiple alleles that are distinct and do occur in the population are given the same number, in the interest of reducing running time. In general, downcoding loses information, although there are some special situations in which it does not because the frequencies of some different alleles happen to be identical.
|*| Extra alleles and separating pedigrees
The extra-allele problem often arises when the original data had P pedigrees among which all n alleles occur, but the Q < P pedigrees used in some analysis contain only k < n-1 of the alleles.
The MLINK and LINKMAP programs analyze each pedigree one at a time, and sum the values of -2*(log(likelihood)) for each pedigree. Since allele renumbering makes sense on a per-pedigree basis, it is valid to renumber alleles for each pedigree in an optimal manner. This requires using a different locus file for each pedigree because the renumbering may assign the same new allele number to different old alleles. One annoyance of doing the analysis for each pedigree separately is that the output values must be summed. The separation of input pedigrees and the combination of output results were automated for LIPED by Ellen Wijsman and for MENDEL by Scott Diehl, Bettie Duke, and Lynn Ploughman.
The above solution does not work for ILINK or LODSCORE.
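For MLINK and LINKMAP runs done one pedigree at a time, the per-pedigree results are combined by addition: pedigrees are independent, so the overall likelihood is the product of the per-pedigree likelihoods and the -2*(log(likelihood)) values add. A minimal C sketch, assuming every value was computed at the same theta (and other parameter) values; combine_minus2lnL is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add the -2*ln(likelihood) values from separate
   per-pedigree runs made at the same parameter values. Pedigrees are
   independent, so likelihoods multiply and these values add. */
double combine_minus2lnL(const double *per_pedigree, size_t npeds)
{
    double total = 0.0;
    assert(per_pedigree != NULL || npeds == 0);
    for (size_t i = 0; i < npeds; i++)
        total += per_pedigree[i];
    return total;
}
```

Repeat the sum separately at each theta value of interest; adding values taken at different theta values gives a meaningless total.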
|*| FASTLINK diagnostic error message
The main programs in FASTLINK do not know about all the loci in the locus file (datain.dat). They only know about the loci that are actually used in a given analysis. For example, if an analysis uses loci 1, 7, and 12, in *any* order, locus 1 will have index 1, locus 7 will have index 2, and locus 12 will have index 3 when reported in the diagnostic.
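The numbering used in the diagnostic can be reproduced with a short function. This C sketch assumes, as the example suggests, that a locus's index is its rank in increasing locus-file order among the loci used in the run, regardless of the order in which they were specified; diagnostic_index is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical helper: 1-based index of `locus` among the loci used in the
   run, counted in increasing locus-file order. Returns 0 if `locus` is not
   one of the loci used. */
int diagnostic_index(const int *used, int nused, int locus)
{
    int found = 0, rank = 1;
    for (int i = 0; i < nused; i++) {
        assert(used[i] > 0);          /* locus numbers start at 1 */
        if (used[i] == locus)
            found = 1;
        else if (used[i] < locus)
            rank++;                   /* one more used locus precedes it */
    }
    return found ? rank : 0;
}
```

With loci 12, 1, and 7 specified in that order, locus 1 gets index 1, locus 7 index 2, and locus 12 index 3, matching the example above.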
From: ftp-bimas.cit.nih.gov Last mod: October 6, 1997
|*| How to Submit a Useful Bug Report
Any bug reports on FASTLINK should be sent to Alejandro Schaffer (schaffer@helix.nih.gov). I have been trying to investigate all bug reports as quickly as possible.
The purpose of this document is to tell you what you should send me, so that I can track down any problem as quickly as possible without having to ask for more information. I am really anxious to find and fix whatever bugs remain in FASTLINK and LINKAGE. All the bug reports I've gotten so far have helped tremendously. However, I would like to speed up the process by guiding you in what I need to know to track down a bug successfully.
In general, there are four categories of bugs:
Here is what you should send me for each type of bug report:
Compilation Problems:
Crash or Results obviously bogus or inconsistent with LINKAGE
Anything else
If you run the programs directly rather than using a shell script, then instead of pedin.dat, datain.dat, and script, I would want pedfile.dat, ipedfile.dat, speedfile.dat, datafile.dat.
Beyond the items above, please describe any symptoms that seem relevant to you. I'd rather have too much information than too little. However, please report only the behaviors that you see, not speculations about the causes of those unexpected behaviors.
All data sent to me will be kept in complete confidence.
From: ftp-bimas.cit.nih.gov Last mod: February 9, 1996
Checkpointing in FASTLINK
by K. Shriram and A. A. Schaffer
Rice University/NIH
This README file is meant to accompany version 2.3P and beyond of FASTLINK. See the top-level README file for a roadmap to all the documentation.
This file describes in detail the checkpointing scheme that was implemented by K. Shriram and A. A. Schaffer. Checkpointing means periodically saving the state of a computation. The purpose of checkpointing is to be able to recover from a crash of the underlying computer that causes one of the FASTLINK programs to stop for a reason that has nothing to do with its computation. Two common causes for such crashes are power failures and lightning hits. Right now checkpointing works only for the sequential versions of FASTLINK on UNIX, and for MLINK and LINKMAP on VMS.
A more cursory but more scholarly description of how the checkpointing works can be found in one section of:
This paper can be found in paper2.ps that comes with the FASTLINK distribution. At the time the paper was written, the checkpointing scheme had been implemented only in LODSCORE and ILINK; these are the two difficult cases for checkpointing and the programs where it is most needed.
After seeing the checkpointing scheme in LODSCORE and ILINK for versions 2.0 and 2.1, several users who had suffered machine crashes during LINKMAP runs clamored for extending the scheme to the other two programs. As of version 2.2, all four programs have checkpointing and crash-recovery.
Through version 1.1, FASTLINK provided the same level of functionality as LINKAGE 5.1. Checkpointing adds new functionality, so we decided to write more detailed documentation about the checkpointing facility. Any questions, comments, or complaints should be directed to Alejandro Schaffer (schaffer@nchgr.nih.gov).
[This README file has been organized with each section starting with the string "|*|". To browse the sections, you can use your file viewer to search for this unique string, moving from one section to the next without having to read the intervening material.]
Frequent LINKAGE users almost certainly have had the computer crash during a long run, only to have to start the computation again. We have now included a "checkpointing" package in the code that occasionally saves the state of the computation, so that a crashed program can be restarted without much computation lost. The folklore wisdom seems to be that this form of augmentation to programs is the proper mechanism for recovering from crashes. This file briefly describes the checkpointing process and explains the files connected with our implementation.
There are standard packages that do checkpointing of programs for specific operating systems, but we wanted our code to be somewhat portable because LINKAGE is used on a variety of operating systems.
Unless otherwise specified, the descriptions that follow apply equally to all of ILINK, LODSCORE, LINKMAP, and MLINK. To distinguish the programs, we use their names in the filenames. We annotate this with the string "<>", which should be replaced by the program name in question. Thus, for instance, the filename `checkpoint<>.bak' would denote `checkpointILINK.bak' or `checkpointLODSCORE.bak', depending upon context.
Before getting into details, there are three VERY IMPORTANT cautions in using the FASTLINK crash-recovery scheme.
rm checkpoint* script* outf* main*
Note: extreme care should be taken when removing these files that you don't have other meaningful files in the same directory with any of these prefixes. If "rm" doesn't normally prompt you for each file before removing it, it is probably wiser to delete these files by hand.
2. The time to save the state to a file is not zero. Therefore, if a crash occurs while the state is being saved, the program may be a little confused on restart. In particular, it may unnecessarily redo one or two likelihood function evaluations. When this happens with LINKMAP or MLINK, it means that duplicate data will show up in the output file because they write out their output after each likelihood function evaluation.
3. The checkpointing scheme has been extensively tested with simulated crashes, but we do not induce a crash of the whole system in testing. Furthermore, system-wide crashes can have bizarre and unimaginable side-effects. Therefore, user feedback based on what happened during real crashes and real runs will be invaluable in making the checkpointing system more robust.
|*| The Process
Most of the discussion below focuses on the programs ILINK and LODSCORE. At the end we explain the much simpler method of checkpointing used in LINKMAP and MLINK.
The programs ILINK and LODSCORE perform checkpointing at two distinct types of locations. A checkpoint is created at the start of each iteration (in the function iterate()); it is also made at the beginning of the functions initialize(), outf(), firststep(), decreaset() and increaset(), and at the beginning of the loops in gforward() and gcentral(). We distinguish between these two types by the terms "iteration-" and "function-checkpoint", respectively; the latter term is used since the program proceeds to make one or more calls to the routine fun() shortly after the location of checkpointing.
In the case of LINKMAP a simple checkpoint is taken after each likelihood function evaluation. MLINK is the same except we do not checkpoint on the first function evaluation where the moving marker is unlinked to the others.
The files final.dat and stream.dat (if requested) primarily contain the output, so a checkpointing mechanism must take care to ensure the contents of these files are not altered in any way by the process. In ILINK and LODSCORE, all output to these files takes place in the routine outf() (and in the routines it calls); hence, these files are checkpointed before entry into outf(). More details on this follow under the discussion of the actual files created. In MLINK and LINKMAP these files are updated after each function evaluation, so they have to be checkpointed as well.
|*| The Files
The following is a list of the files created for the purposes of checkpointing. All of these files are placed in the working directory of the current run of the program.
It is important to ensure that none of these files are present at the start of a fresh run; however, do not delete any of these after a run has begun, and especially when trying to recover from a previous run.
NOTE: The file protections set by the program may not be what
you desire. These can be changed by altering the value of the
variable CopyAppendPerms in the file checkpointdefs.h, where the
value specified should be as given to the chmod(1) command. (The
additional leading `0' is essential; it causes the value that
follows to be treated as an octal constant, as required by
chmod(1).)
checkpoint.<> text, binary
For ILINK and LODSCORE:
This file is written at two types of places, namely at an
iteration-checkpoint and at a function-checkpoint. Only three parts
of this file are in text mode; they are:
The date/time stamp marks when the current checkpoint was begun. (This is not necessarily the same as the modification time that the system shows for the file.) Of the two type information fields, the first tells us whether this is an iteration- or function-checkpoint (distinguished by the values "0" and "1", respectively); the second stores additional information about location that varies depending upon the type of checkpoint.
Following these are the bytes that constitute the actual values being stored; these are in an architecture-dependent binary format.
Finally, the end-marker provides us with a means of partially checking for the integrity of the data written in the checkpoint.
For LINKMAP and MLINK only some counters indicating how many function evaluations are complete need to be stored in this file.
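The layout described above (a text header, a binary payload, a text end-marker) can be illustrated with a small writer and reader. This is a minimal C sketch under stated assumptions: the END_MARKER string, the exact line layout, and the function names are hypothetical, and the real checkpoint.<> files store far more state. It also shows why the end-marker is only a partial integrity check: it catches a file that was cut short, but not corrupted bytes inside the payload.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical end-marker; FASTLINK's real marker and layout differ. */
#define END_MARKER "END-CHECKPOINT\n"
#define MARKER_LEN (sizeof END_MARKER - 1)

/* Layout: date/time stamp line, a line with the two type fields
   (0 = iteration-checkpoint, 1 = function-checkpoint, then a location
   code), the raw binary state, and finally the end-marker. */
int write_checkpoint(const char *path, int type, int location,
                     const double *state, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    time_t now = time(NULL);
    const char *stamp = ctime(&now);      /* ends with '\n' */
    if (stamp == NULL)
        stamp = "unknown time\n";
    int ok = fputs(stamp, f) != EOF
          && fprintf(f, "%d %d\n", type, location) > 0
          && fwrite(state, sizeof *state, n, f) == n
          && fputs(END_MARKER, f) != EOF;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Returns 0 only if every part, including the end-marker, is read back;
   a file cut short by a crash fails this check. */
int read_checkpoint(const char *path, int *type, int *location,
                    double *state, size_t n)
{
    char stamp[64];
    char marker[MARKER_LEN];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    int ok = fgets(stamp, sizeof stamp, f) != NULL
          && fscanf(f, "%d %d", type, location) == 2
          && fgetc(f) == '\n'
          && fread(state, sizeof *state, n, f) == n
          && fread(marker, 1, MARKER_LEN, f) == MARKER_LEN
          && memcmp(marker, END_MARKER, MARKER_LEN) == 0;
    fclose(f);
    return ok ? 0 : -1;
}
```

Because the payload is written in the machine's native binary format, a file written this way is only readable on the same kind of architecture, which matches the architecture dependence noted above.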
checkpoint.<>.bak text, binary
When a checkpoint is to be written and a checkpoint file is already found, the existing file is moved to this backup name and the new one is written in its place. The main purpose of doing this is to increase security against crashes: should the crash have damaged the checkpoint file but have left the backup untarnished, the backup may be copied into the checkpoint and computation can be resumed, even if from a slightly earlier stage in the run.
The format of this file is the same as that of the checkpoint file, which is copied into the backup without modification.
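The backup step amounts to a rename before each new checkpoint is written. A minimal C sketch with a hypothetical function name; it relies on POSIX rename(), which replaces an existing backup in a single step (ISO C alone leaves that case implementation-defined).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical sketch: move an existing checkpoint to the backup name
   before a new checkpoint is written. An older backup is replaced.
   Returns 0 on success, including when there is no checkpoint yet. */
int rotate_checkpoint(const char *ckpt, const char *backup)
{
    assert(ckpt != NULL && backup != NULL);
    if (rename(ckpt, backup) == 0)
        return 0;
    return errno == ENOENT ? 0 : -1;   /* first checkpoint: nothing to back up */
}
```

A caller would rotate first and then write the new checkpoint; on restart, if the checkpoint fails its integrity check, the backup can be tried instead, as described above.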
outf.LODSCORE.stream.dat text
outf.ILINK.stream.dat text
main.LINKMAP.stream.dat text
main.MLINK.stream.dat text
outf.ILINK.final.dat text
main.LINKMAP.final.dat text
main.MLINK.final.dat text
outf.LODSCORE.recfile.dat text
These files are created by the subroutine outf() or main(). Their purpose is to maintain copies of the files stream.dat and final.dat (for ILINK, LINKMAP, or MLINK) or recfile.dat (for LODSCORE), respectively, so that if recovery needs to take place after these files have been written to, the two files can be restored to the state they had.
script.<>.final.out text
script.<>.stream.out text
Since the standard scripts being used delete the files final.out and stream.out at the start of execution, the program makes a copy of the current state of these files into the names listed. Thus, when recovering in the midst of a script, the files can be restored to their state when the programs were last entered.
main.LODSCORE.stream.dat text
main.LODSCORE.recfile.dat text
Since a crash can occur in the middle of an iteration in LODSCORE and the output of the previous call to outf() would then be lost, these files are created at the start of the loop in main() so as to preserve the old output (which hasn't yet been appended to final.out and stream.out).
When the checkpoint cannot be recovered accurately, the program checks to see whether the backup exists. Depending upon its presence (but not upon its integrity), one of two messages is displayed. In either case, the user is advised of the circumstance, of a possible cause for it, and of what corrective action might be taken to repair the situation as well as possible.
|*| Modifying Scripts and Checkpointing
Our experience shows that some users request multiple runs of a FASTLINK program with one shell script. As a consequence a crash may occur after some (but not all) of the requested runs are complete. When this happens, it would be nice not to lose the results of the completed runs. A user who restarts the crashed script would not like the runs that were completed previously to be redone. We have made a primitive facility to do this type of checkpointing, which we call "script-level checkpointing". However, for users who want to be safe we recommend doing only one run per shell script.
This section applies if you use script-level checkpointing, and wish to modify the scripts in the region surrounding the calls to ILINK, LODSCORE, MLINK, or LINKMAP, or wish to affect operations done to the files final.out and stream.out. We presume that the user is using shell scripts made with the auxiliary program lcp that comes with LINKAGE. It would be impossible to make a script-level checkpointing scheme that could handle arbitrary scripts. We also assume that the user puts output in final.out and stream.out, using the default options of lcp.
The "standard" scripts for which we support script-level checkpointing affect final.out (and stream.out) on each run as follows for each ILINK run (and similarly for LODSCORE, MLINK, and LINKMAP):
lsp [...]
if [ $? = '0' -o $? = '1' ]
then
    cat lsp.log >> final.out
    cat lsp.stm >> stream.out
    unknown
    if [ $? = '0' ]
    then
        ilink
        if [ $? = '0' ]
        then
            cat final.dat >> final.out
            cat stream.dat >> stream.out
        fi
    fi
fi
To ensure that final.out is in the same state after our program has finished execution as it would be after this piece of script code has run, we have the following code toward the end of ILINK:
copyFile("final.out", ScriptILINKFinalOut);
appendFile("final.dat", ScriptILINKFinalOut);
if (dostream)
{
    copyFile("stream.out", ScriptILINKStreamOut);
    appendFile("stream.dat", ScriptILINKStreamOut);
}
which simulates the operation of the script. This is necessary since, at the stage where this code is run, the script-level checkpoint routine assumes that the run of ILINK has completed successfully, so that this entire invocation of ILINK will be ignored, and the next invocation will copy final.out and stream.out from the files named by the #define'd names above.
Hence, modifying the scripts in the light of script-level checkpointing requires careful study of the operation of the main programs, the scripts, and the program ckpt. In general, it is necessary to mimic in the program whatever would be done in the script, so that during recovery it is impossible to tell whether or not the script was stopped in the first place. However, these mimicking operations must be placed carefully: if they are placed before the script-level checkpoint file is written, then the operations will be performed one extra time, which is undesirable.
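To make "mimicking the script" concrete, here is a C sketch of the effect of `cat final.dat >> final.out` applied to a separate copy, which is what the copyFile/appendFile pair above achieves. The helper names are hypothetical stand-ins, not FASTLINK's copyFile and appendFile, and treating a missing source file as empty is an assumption of this sketch (the shell's >> creates final.out if it does not exist).

```c
#include <assert.h>
#include <stdio.h>

/* Copy (mode "wb") or append (mode "ab") the contents of src onto dst.
   A missing src is treated as empty. Returns 0 on success. */
static int transfer(const char *src, const char *dst, const char *mode)
{
    assert(mode[0] == 'w' || mode[0] == 'a');
    FILE *out = fopen(dst, mode);
    if (out == NULL)
        return -1;
    int ok = 1;
    FILE *in = fopen(src, "rb");
    if (in != NULL) {
        char buf[4096];
        size_t n;
        while (ok && (n = fread(buf, 1, sizeof buf, in)) > 0) {
            if (fwrite(buf, 1, n, out) != n)
                ok = 0;
        }
        fclose(in);
    }
    if (fclose(out) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Mimic "cat dat_file >> out_file" on a separate copy: script_copy ends up
   holding out_file followed by dat_file, and out_file is left untouched. */
int mimic_script_append(const char *out_file, const char *dat_file,
                        const char *script_copy)
{
    if (transfer(out_file, script_copy, "wb") != 0)
        return -1;
    return transfer(dat_file, script_copy, "ab");
}
```

Leaving final.out itself untouched is the point of the design: the next invocation restores final.out from the copy, so the append is never applied twice.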
|*| Using the Script-Level Checkpointing Facility
The program ckpt implements the script-level checkpointing facility (with cooperation from ilink and lodscore, as appropriate). Its primary task is to accept the name of a script to be run, and a specification of which program (ILINK, LODSCORE, LINKMAP, or MLINK) the script is for. A typical invocation might look like this (we use `%' to denote the user's prompt):
% ckpt lodscore aLodscoreScript
or
% ckpt ilink anIlinkScript itsArgument
or
% ckpt linkmap aLinkmapScript itsArgument
or
% ckpt mlink anMlinkScript itsArgument
where the first parameter to ckpt tells it what kind of script it is going to run. The second parameter is the name of the actual script. If there are additional parameters for the script itself, these can be specified after the name of the script, as in the second example (where "itsArgument" is provided). The second run would, hence, be equivalent to running
% anIlinkScript itsArgument
but with the script-level checkpointing facility in action.
The code for ckpt is in the file ckpt.c. To make an executable version, run the command:
make ckpt
|*| Important Caution on Breaking a ckpt Run
The ckpt program executes a system(3) call to invoke a shell in which to run the named script (with its arguments, if any). Hence, if the user decides to abort execution by hitting, say, Control-C (^C), this will certainly stop the invoked shell, but will not necessarily abort the calling process (i.e., ckpt). This has the following deleterious effect: when control returns to ckpt, if ckpt cannot tell that the invoked shell was halted prematurely, it erases its data file, so the next time it is run, it will assume that the previous run exited normally. This is clearly not the desired effect.
Unfortunately, being able to detect premature halting of the invoked shell is dependent upon the value returned by the system() call. This may not work on all operating systems and architectures as desired, making this an unreliable way of stopping execution, should this be desired. It is recommended that, instead, the user do the following:
Again, this is not guaranteed to succeed, but should work on most systems. Note, of course, that it requires the shell to support job control and also that the shell was compiled with this feature installed.
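As an illustration of why detection depends on the value returned by system(), the following C sketch decodes that value with the standard POSIX wait-status macros; it is hypothetical and not taken from ckpt.c. POSIX system() ignores SIGINT in the calling process while the command runs, which is why ckpt keeps going after ^C. Also, many shells exit with status 128 plus the signal number when a command they run is interrupted, instead of dying from the signal themselves; this sketch reports that case as an ordinary non-zero exit, which is the ambiguity described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Classify the value returned by system(3) on a POSIX system:
    0  the shell exited with status 0
    1  the shell exited with a non-zero status
    2  the shell itself was terminated by a signal
   -1  no shell could be created, or the status is not recognized */
int classify_system_status(int status)
{
    if (status == -1)
        return -1;
    if (WIFSIGNALED(status))
        return 2;
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 0 ? 0 : 1;
    return -1;
}
```

Usage: classify_system_status(system("sh myscript")) returns 2 only when the shell was killed by a signal, so a wrapper using this test would still treat a shell that caught ^C and exited with status 130 as a normal failure.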
From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999
FASTLINK, version 4.0P and beyond
This file describes an issue that will affect all users: how to set certain constants, that vary from run to run depending on the data set and chosen loci. See the file README for a roadmap to all the FASTLINK documentation.
The definitions for most of the constants that a user wants to change have now been set up in such a way that they can be modified in the Makefile without ever having to edit the code. An important consequence is that it is now possible to edit just the Makefile and be able to compile different versions of the programs with different values of the constants. If you are not an experienced user of the Make utility, consult your system administrator for help in editing the Makefile.
The following .h files contain declarations of constants and data:
commondefs.h      stuff that is common to all 4 main programs
checkpointdefs.h  stuff for checkpointing
gemdefs.h         stuff for GEMINI, common to LODSCORE and ILINK
moddefs.h         stuff specific to fast versions of programs
slowmoddefs.h     stuff specific to slow versions of programs
ildefs.h          stuff specific to ILINK
lidefs.h          stuff specific to LINKMAP
lodefs.h          stuff specific to LODSCORE
mldefs.h          stuff specific to MLINK
compar.h          stuff specific to parallel FASTLINK
unknown.h         stuff specific to UNKNOWN
|*| Constant definitions - VERY IMPORTANT!!!
There are at least 2 constants defined in moddefs.h that you will want to set before compiling. You can edit the file to put in the appropriate numbers and then compile. The next section explains how to change the constants by editing only the Makefile.
The constants in moddefs.h are
AUTOSOMAL_RUN
SEXDIF_RUN
The user gets a severe warning if either of these constants is set to 0 and should be 1. The program will probably crash after the warning is printed.
AUTOSOMAL_RUN must be 1 if your data is autosomal. It can be 0 if your data is sexlinked. It may be worth changing it to 0 for a sexlinked run because this will drastically reduce the memory requirements and may make it possible to use the faster versions. In terms of correctness, it is always safe to set AUTOSOMAL_RUN to 1.
SEXDIF_RUN must be 1 if your data is autosomal AND you want to allow the male theta and female theta to be DIFFERENT. From our experience, such runs are rare in practice, so we are distributing the code with SEXDIF_RUN set to 0. It is always safe to have SEXDIF_RUN set to 1, but again you can save a lot of memory by setting SEXDIF_RUN to 0. It is safe to set SEXDIF_RUN to 0 if:
Here are some other constants that you might occasionally need to change. There is relatively little harm caused by boosting these constants higher.
maxsys: maximum number of loci used in the run; this is most relevant for LODSCORE, where one may do 2-point analysis on many different pairs of loci.
maxlocus: maximum number of loci in one run of one program.
maxall: maximum number of alleles at a numbered-allele or binary-factors locus
maxfact: maximum number of binary factors; should be at least as large as maxall
maxind: maximum number of people in all pedigrees combined
maxped: maximum number of pedigrees
maxchild: maximum number of children of one parent
maxloop: maximum number of loops
fitmodel: is false unless you are estimating some parameter other than theta
ALLELE_SPEED: is 1 if you want allele renumbering to be used; you should keep it at 1, except when estimating allele frequencies.
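The rules for AUTOSOMAL_RUN and SEXDIF_RUN given earlier in this section can be expressed as a simple check. This C sketch is hypothetical: settings_ok is not a FASTLINK function, and the fallback value for AUTOSOMAL_RUN is a placeholder; SEXDIF_RUN falls back to 0, as in the distributed code. Both can be overridden at compile time with -D, as described in the next section.

```c
#include <assert.h>

#ifndef AUTOSOMAL_RUN
#define AUTOSOMAL_RUN 1   /* placeholder fallback for this sketch */
#endif
#ifndef SEXDIF_RUN
#define SEXDIF_RUN 0      /* distributed setting, per the text */
#endif

/* Hypothetical check: returns 1 if the compiled settings can handle the
   data, 0 if a constant is 0 where the rules above say it must be 1. */
int settings_ok(int autosomal_data, int different_sex_thetas)
{
    assert(autosomal_data == 0 || autosomal_data == 1);
    assert(different_sex_thetas == 0 || different_sex_thetas == 1);
    if (autosomal_data && AUTOSOMAL_RUN == 0)
        return 0;   /* autosomal data needs AUTOSOMAL_RUN = 1 */
    if (autosomal_data && different_sex_thetas && SEXDIF_RUN == 0)
        return 0;   /* different male/female theta needs SEXDIF_RUN = 1 */
    return 1;
}
```

Setting both constants to 1 makes the check always pass, matching the statement that 1 is always safe, at the cost of more memory.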
|*| Setting Constants by Editing only the Makefile
It is now possible to use the -D feature supported by cc, gcc, and most C compilers to change constants during compilation. For example, the default declaration of maxloop now looks like:
#ifndef maxloop
#define maxloop 6
#endif
This tells the C preprocessor that reads the hashed lines: "If maxloop is not already defined, then set maxloop to be 6".
The way you can make maxloop already defined is to include the string -Dmaxloop=<number> in all the compilation commands. For example, if you wanted maxloop to be 8, you would include the flag
-Dmaxloop=8
in your compilation. This overrides the setting of 6 that is in commondefs.h.
See README.Makefile for detailed instructions on how to edit the Makefile to set maxloop and other constants.
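The same #ifndef pattern can be combined with preprocessor checks so that inconsistent -D settings fail at compile time. A minimal C sketch: the default of 6 for maxloop is the one shown above, while the defaults for maxall and maxfact are placeholders, and the maxfact check encodes the rule that maxfact should be at least as large as maxall. loops_fit is a hypothetical helper.

```c
#include <assert.h>

/* Defaults apply only when -D<name>=<value> was not given. */
#ifndef maxloop
#define maxloop 6
#endif
#ifndef maxall
#define maxall 15          /* placeholder default for this sketch */
#endif
#ifndef maxfact
#define maxfact maxall     /* placeholder: at least as large as maxall */
#endif

/* Stop the build if the -D flags give inconsistent values. */
#if maxloop < 1
#error "maxloop must be positive"
#endif
#if maxfact < maxall
#error "maxfact should be at least as large as maxall"
#endif

/* Hypothetical helper: does a pedigree with nloops loops fit? */
int loops_fit(int nloops)
{
    assert(nloops >= 0);
    return nloops <= maxloop;
}
```

Compiling with -Dmaxall=20 -Dmaxfact=10 stops with the #error message, while -Dmaxloop=8 alone lets loops_fit accept up to 8 loops.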
|*| Checking how constants are set for a given executable
FASTLINK now includes a -i option (for info) for ILINK, MLINK, LINKMAP, and LODSCORE that summarizes how the various compilation options/variables are set for a given executable. For example, if you run:
linkmap -i
you get a description of how the program is configured, but nothing interesting is computed. A sample output might be:
Program LINKMAP version 5.10 (1-Feb-1991)
FASTLINK (slow) version 3.0P (29-Sep-1995)
LINKMAP has been compiled with the following options:
CHECKPOINTING is enabled (DOS not defined)
SLOW version (LESSMEMORY defined)
Program constants are set to the following maxima:
8 maximum number of loci (maxlocus)
15 maximum number of alleles at a single locus (maxall)
1000 maximum number of individuals in a pedigree (maxind)
6 maximum number of loops (maxloop)
16 maximum number of children of a parent (maxchild)
This option works for both sequential and parallel versions of FASTLINK.
Flagless runs now also print out "(slow)" with the version number if the given executable is a "slow" version (as seen in the example above).
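As a self-contained illustration of both mechanisms, the sketch below declares overridable defaults in the #ifndef style of commondefs.h and prints a report shaped like the constants section of the linkmap -i output. The default values are simply the sample values shown above, and describe_constants() is a hypothetical helper, not the FASTLINK code that implements -i.

```c
#include <stdio.h>

/* Defaults in the style of commondefs.h (values taken from the sample
   -i output above). Each one is used only if the compiler was not given
   a -D<name>=<number> flag for it, e.g. cc -Dmaxloop=8 ... */
#ifndef maxlocus
#define maxlocus 8
#endif
#ifndef maxall
#define maxall 15
#endif
#ifndef maxind
#define maxind 1000
#endif
#ifndef maxloop
#define maxloop 6
#endif
#ifndef maxchild
#define maxchild 16
#endif

/* Hypothetical helper: writes a report in the style of "linkmap -i"
   into buf and returns the length snprintf reports for the full text. */
int describe_constants(char *buf, size_t len)
{
    return snprintf(buf, len,
                    "%d maximum number of loci (maxlocus)\n"
                    "%d maximum number of alleles at a single locus (maxall)\n"
                    "%d maximum number of individuals in a pedigree (maxind)\n"
                    "%d maximum number of loops (maxloop)\n"
                    "%d maximum number of children of a parent (maxchild)\n",
                    maxlocus, maxall, maxind, maxloop, maxchild);
}
```

Compiling the file once with cc -c and once with cc -c -Dmaxloop=8 changes only the maxloop line of the report, which is exactly the behavior the -D mechanism relies on.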
From: ftp-bimas.cit.nih.gov Last mod: June 28, 1999
Using Conditional Allele Frequencies or Parameterized Disequilibrium in FASTLINK, by Ken Morgan and Alejandro Schaffer
Starting with FASTLINK 4.1, I am exploring the possibility of adding several options to model linkage disequilibrium in FASTLINK. One such option, conditional allele frequencies, has been completed. Conditional allele frequencies means that allele frequencies at markers can depend on the allele at the disease locus on the same chromosome strand (haplotype). This is slightly more general and flexible than the disequilibrium option currently allowed in LINKAGE/FASTLINK.
|*| Basics of using conditional allele frequencies
The user is assumed to be familiar with FASTLINK usage and with estimating allele frequencies with ILINK.
The constant ALLELE_SPEED works as in FASTLINK.
It is located in unknown.h and commondefs.h.
Set
#define ALLELE_SPEED 1
to achieve greater speed in non-estimation mode, when using
conditional allele frequencies.
Set
#define ALLELE_SPEED 0
#define fitmodel true
to allow for estimation of frequencies.
The combination
#define ALLELE_SPEED 0
#define fitmodel true
is always safe, but may be unnecessarily slow.
"Conditional allele frequencies" means that at a marker, the frequency of an allele may depend conditionally on the allele at the disease locus. E.g., the relative frequencies of the haplotypes
Disease 1 1
Marker 1 2
Disease 2 2
Marker 1 2
may be quite different.
Estimating conditional allele frequencies is conceptually different from
estimating haplotype frequencies because in the former case the
disease allele frequencies stay fixed, while in the latter case they do not
stay fixed.
To get started it is necessary to make a basic change in the format
of datafile.dat.
For each marker locus, put 2 lines of allele frequencies instead of 1.
E.g., instead of:
3 5
0.07000000 0.01000000 0.15000000 0.04000000 0.73000000
put
3 5
0.07000000 0.01000000 0.15000000 0.04000000 0.73000000
0.07000000 0.01000000 0.15000000 0.04000000 0.73000000
So long as the two lines of frequencies are equal and you are not estimating frequencies, the results should be identical to regular FASTLINK. If you want to use conditional allele frequencies, you must similarly double all marker allele frequency lines in datafile.dat. This is a reasonable requirement because if for some marker you do not want the allele frequencies to vary depending on the disease allele, then the two lines can be identical.
Do not change the way in which the disease locus is specified.
To tell unknown that you are using conditional allele frequencies, use
unknown -c
instead of
unknown
To tell mlink/ilink/linkmap that you are using conditional allele frequencies,
use:
mlink -c
ilink -c
linkmap -c
instead of
mlink
ilink
linkmap
To estimate conditional allele frequencies with ILINK, the procedure
is similar to regular ILINK.
At the bottom of datafile.dat are two lines that look like:
k
1 1 1 1 ...
or
k
0 1 1 1 ...
where k is the index of the locus for which frequencies are to be
estimated, and
the first number of the last line is:
0 if theta stays fixed
1 if theta is to be estimated
the remainder of the last line has (a - 1) 1's where a is the number
of alleles at locus k.
Caution: The highest numbered allele, a, must occur or regular ILINK
will crash.
For conditional estimation, the last line of datafile.dat should have 2a -1 numbers instead of a numbers. Again the first number may be 0 or 1, and the remaining 2a-2 numbers should be 1's.
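The rule for the last line of datafile.dat can be written down as a small C function. This is a sketch under the rules stated above (one flag for theta, then a - 1 or 2a - 2 ones); iterate_line() is a hypothetical name and is not part of FASTLINK or lcp.

```c
#include <stdio.h>

/* Hypothetical helper: builds the last line of datafile.dat that tells
   ILINK which parameters to iterate over.
     estimate_theta: 1 if theta is estimated, 0 if it stays fixed
     nalleles:       a, the number of alleles at locus k
     conditional:    nonzero when running "ilink -c"
   Regular ILINK expects a numbers (the flag plus a - 1 ones);
   conditional estimation expects 2a - 1 numbers (the flag plus 2a - 2 ones).
   Returns the number of characters written, or -1 if buf is too small. */
int iterate_line(char *buf, size_t len, int estimate_theta, int nalleles,
                 int conditional)
{
    int ones = conditional ? 2 * nalleles - 2 : nalleles - 1;
    int used = snprintf(buf, len, "%d", estimate_theta ? 1 : 0);
    if (used < 0 || (size_t)used >= len)
        return -1;
    for (int i = 0; i < ones; i++) {
        int n = snprintf(buf + used, len - used, " 1");
        if (n < 0 || (size_t)(used + n) >= len)
            return -1;              /* output would be truncated */
        used += n;
    }
    return used;
}
```

For a 4-allele marker, iterate_line(buf, sizeof buf, 1, 4, 0) produces "1 1 1 1" for regular ILINK, while iterate_line(buf, sizeof buf, 0, 4, 1) produces "0 1 1 1 1 1 1" for ilink -c with theta fixed.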
When ilink -c is used to estimate frequencies conditionally, part of the output final.dat might look like:
GENE FREQUENCIES :
0.309689 0.424004 0.250168 0.016138
CONDITIONAL (on disease allele) GENE FREQUENCIES :
0.242056 0.303824 0.452248 0.001871
The first line is conditional on the healthy allele at the disease locus. The second line is conditional on the unhealthy allele at the disease locus.
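The two lines of conditional frequencies combine with the disease allele frequencies in the usual way: P(disease allele d, marker allele m) = P(d) * P(m | d), and the population frequency of a marker allele is that product summed over d. The sketch below applies this to the sample final.dat lines above, assuming a diallelic disease locus with a hypothetical disease allele frequency q; it illustrates the model and is not code from FASTLINK.

```c
#define NMARKER 4

/* Conditional marker allele frequencies from the sample final.dat:
   row 0 is conditional on the healthy disease allele,
   row 1 on the unhealthy (disease) allele. */
static const double cond_freq[2][NMARKER] = {
    {0.309689, 0.424004, 0.250168, 0.016138},
    {0.242056, 0.303824, 0.452248, 0.001871},
};

/* Frequency of the haplotype (disease allele d, marker allele m), with d
   and m counted from 0; q is the (hypothetical) disease allele frequency. */
double haplotype_freq(double q, int d, int m)
{
    double p_d = (d == 1) ? q : 1.0 - q;
    return p_d * cond_freq[d][m];
}

/* Population frequency of marker allele m: sum over both disease alleles. */
double marginal_marker_freq(double q, int m)
{
    return haplotype_freq(q, 0, m) + haplotype_freq(q, 1, m);
}
```

If both rows are made equal, haplotype_freq() factors into independent disease and marker frequencies, which is why duplicated frequency lines reproduce regular FASTLINK.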
|*| Usage Suggestions
(a) the log-likelihood for estimated marker allele frequencies and
the recombination fraction (theta) between the marker and disease
loci;
(b) the log-likelihood for estimated marker allele frequencies
and fixed theta=0.5; the difference in the log-likelihoods is converted
to a lod score.
2. For the test of LD, compare the
log-likelihoods under LD and LE, where both theta and the marker allele
frequencies are jointly estimated.
One may assume that twice this difference is asymptotically distributed as
chi-square with k -1 degrees of freedom (where k = number of
distinct alleles of the marker locus in the data). (The P-value may
need to be estimated empirically for small samples for the situation
where one or more conditional allele frequencies are estimated to be
0.)
3. For a test of linkage allowing for linkage disequilibrium, there are
two approaches.
Approach A: Constrain the two vectors of conditional allele
frequencies to be equal when theta=0.5.
Approach B: Allow the vectors of allele frequencies to be unconstrained
in both cases. Then in the likelihood ratio test for
linkage, the allele frequencies (conditional or not) become nuisance
parameters, and there is only 1 degree of freedom.
Approach B (as a special case of the general idea of
likelihood ratio tests with nuisance parameters)
is advocated by Joe Terwilliger.
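The quantities in these usage suggestions are simple functions of two log-likelihoods. The sketch below assumes natural-log likelihoods (if your values are already base-10 logs, the LOD is the plain difference); the function names are hypothetical. The likelihood ratio statistic is referred to a chi-square distribution with k - 1 degrees of freedom for the LD test in item 2, or with 1 degree of freedom under approach B in item 3.

```c
#define LN10 2.302585092994046  /* natural log of 10 */

/* LOD score from two natural-log likelihoods, e.g. (a) estimated theta
   versus (b) theta fixed at 0.5. */
double lod_score(double lnl_alt, double lnl_null)
{
    return (lnl_alt - lnl_null) / LN10;
}

/* Likelihood ratio statistic: twice the difference in natural-log
   likelihoods, e.g. under LD versus under LE. */
double lrt_statistic(double lnl_alt, double lnl_null)
{
    return 2.0 * (lnl_alt - lnl_null);
}

/* Degrees of freedom for the LD test: k - 1, where k is the number of
   distinct marker alleles observed in the data. */
int ld_test_df(int distinct_alleles)
{
    return distinct_alleles - 1;
}
```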
From: ftp-bimas.cit.nih.gov Last Mod: May 30, 1995
|*| What does the output of ILINK and LODSCORE mean?
This file describes the output that the programs ILINK and LODSCORE print to the screen. For the rest of the text we describe things in terms of ILINK because the output for LODSCORE is very similar. The need for this document was suggested by Marcy Speer (Duke).
ILINK uses the GEMINI optimization procedure to find a locally optimal value of the theta vector of recombination fractions. If you use the default scripts produced by lcp, your initial guess for theta is .1 in every dimension. GEMINI evaluates each theta by its likelihood, seeking to find theta vectors that have a higher pedigree likelihood.
The GEMINI procedure has multiple iterations. Each iteration corresponds to one line of output. Each iteration includes multiple likelihood function evaluations. Each iteration has two phases. In Phase I GEMINI seeks to improve the current best theta. In Phase II, GEMINI estimates the gradient of the likelihood with respect to the current best theta vector. In the first iteration, Phase I only evaluates the likelihood at the initial candidate theta.
When ILINK prints out a line such as:
maxcensor can be reduced to -32767,
it has completed the first likelihood function evaluation.
On long runs, this fact can be used to estimate running time.
A reasonable rough estimate for the number of function evaluations
is 10*(number of dimensions of theta vector). The number of dimensions
of the theta vector is one fewer than the number of loci in most cases.
If maletheta and femaletheta are allowed to differ (sexdif is set to 1),
then the number of dimensions doubles to 2 * (number of loci - 1).
Estimating other parameters (with fitmodel set to true) can also
increase the number of dimensions.
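The rule of thumb above can be turned into a quick calculator. This sketch assumes that each extra parameter estimated with fitmodel adds exactly one dimension, which the text does not state; the function name is hypothetical.

```c
/* Rough number of likelihood function evaluations ILINK needs:
   10 per dimension of the theta vector.
     nloci:        number of loci in the run
     sexdif:       1 if male and female thetas may differ, else 0
     extra_params: additional parameters estimated with fitmodel true
                   (assumed, for illustration, to add one dimension each) */
int estimated_evaluations(int nloci, int sexdif, int extra_params)
{
    int dims = nloci - 1;
    if (sexdif == 1)
        dims *= 2;              /* separate male and female thetas */
    dims += extra_params;
    return 10 * dims;
}
```

For example, a 4-locus run needs roughly 30 evaluations with sex-averaged thetas and roughly 60 with sexdif set to 1.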
After each iteration, ILINK prints out one line with four pieces of information:
ITERATION is a positive integer showing the number of the iteration just completed.
T is an indication of the step size that the GEMINI procedure takes in updating theta. A very small T sometimes indicates that GEMINI did many updates, each of which requires a likelihood function evaluation, so the iteration probably took longer than average.
NFE is a positive integer indicating how many likelihood function evaluations have been done through that iteration.
F is a scaled representation of -2log(likelihood) at the current best theta. Because of the - sign, the value of F decreases until it reaches a local minimum.
After the last printed iteration, ILINK in FASTLINK does one more likelihood function evaluation for the purpose of computing Ott's Generalized LODSCORE which shows up in final.dat (transferred to final.out by the default pedin scripts). Ott's generalized LODSCORE compares -2log(likelihood) at the locally optimal theta to -2log(likelihood) at a theta that is .5 in every component (i.e. each locus unlinked to all the rest). In LINKAGE ILINK more likelihood function evaluations are done after the last printed iteration line, but these likelihood function evaluations are unnecessary (see paper2.ps from the FASTLINK distribution for more details).
Some users run ILINK and LODSCORE with execution scripts that do not delete the output file outfile.dat upon termination. The file outfile.dat is primarily useful in storing information about the values of certain variables at each iteration; these variables are not of interest, except for those who wish to modify the code. Of interest to users is the last thing in outfile.dat which is some description of the condition under which LODSCORE and ILINK terminated. This is a code stored in the variable idg and takes one of 8 values:
Under all circumstances it should be emphasized that if ILINK or LODSCORE is used with only a single starting theta, the output value is only a local optimum and not necessarily a global optimum. It is a good idea to try several different starting thetas. It is perfectly valid to compare the local optima from different starting points and choose the one that gives the best value of -2*log(likelihood); the more starting points tried, the more likely that the best value will be a global optimum.
If ILINK or LODSCORE exits with condition 5 or 6, the output value is pretty safe as a local optimum.
If ILINK or LODSCORE exits with condition 7, the output values are completely unsafe. The source code must be modified to increase iterationMultiple, which is #defined in gemdefs.h.
If ILINK or LODSCORE exits with conditions 1,2,3,4, or 8 the situation is
more nebulous, but it is a good idea to try more experiments to test
how robust the output values are. Try starting from different initial
thetas. One might also try increasing the constant tol in gemdefs.h.
Increasing tol will have the effect of relaxing the convergence
criteria, so that ILINK and LODSCORE may come close to a local optimum,
where a smaller tol causes problems.
If increasing tol helps, then one should:
find the local optimum with the higher tol
reset tol to its previous value
restart the program with the first local optimum as the initial value
This experiment will test whether the initial local optimum can be improved
by more precise calculations.
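For scripts that wrap ILINK or LODSCORE, the advice above amounts to a lookup on idg. The enum and function names below are hypothetical; extracting the code from outfile.dat is left out because the layout of that file is not described here.

```c
/* Advice categories for the termination code idg, as described above. */
enum idg_advice {
    IDG_UNKNOWN,  /* not one of the 8 documented codes */
    IDG_SAFE,     /* 5 or 6: output is pretty safe as a local optimum */
    IDG_CHECK,    /* 1-4 or 8: try other starting thetas, maybe a larger tol */
    IDG_UNSAFE    /* 7: increase iterationMultiple in gemdefs.h and rerun */
};

enum idg_advice classify_idg(int idg)
{
    switch (idg) {
    case 5: case 6:
        return IDG_SAFE;
    case 7:
        return IDG_UNSAFE;
    case 1: case 2: case 3: case 4: case 8:
        return IDG_CHECK;
    default:
        return IDG_UNKNOWN;
    }
}
```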
ILINK or LODSCORE does not allow the theta values to get down to 0.0. Therefore, if one of the locally optimal thetas is reported as close to 0.0, the situation ought to be explored further using LINKMAP or MLINK, which will allow arbitrarily small values of theta.
From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999
All about loopfile.dat
by Dylan Cooper and
Alejandro A. Schaffer
|*| What is loopfile.dat?
Beginning with version 3.0 of FASTLINK we are making a fundamental change in the way loops are handled. The most important manifestation of the change is that the specifications for the preprocessor program UNKNOWN have changed. In particular, for pedigrees with loops, the new UNKNOWN will produce an extra output file to assist the main program. This applies to ILINK, MLINK, and LINKMAP where the standard scripts call UNKNOWN immediately before calling the main program. The change does not apply to LODSCORE for which the standard scripts do not use UNKNOWN. The main programs will still work correctly if the extra file is not present (in particular, if the old version of UNKNOWN is used) and the extra file gets deleted when the main program exits without a crash.
The file whose name is held in the macro LOOPFILE_NAME (probably "loopfile.dat") is produced by the new unknown.c when LOOPSPEED is defined. This file is used to speed up runs of ILINK, LINKMAP, and MLINK when at least one pedigree contains at least one loop.
The contents of the file and the method by which the speed up was obtained rely on the concept of a loop-breaker vector. A loop-breaker vector is an array of single locus genotypes which assigns one single locus genotype to each loop breaker. To understand more about loop breakers, readers of this file are strongly encouraged to read the FASTLINK documentation files traverse.ps and loops.ps.
|*| Syntax of loopfile.dat
loopfile.dat describes the contents of data structures that are created in unknown.c and used in ILINK, LINKMAP, and MLINK. The entries in the file are as follows:
Pedigree: The pedigree for which the following information pertains. Pedigrees are numbered consecutively from 1.
fewer_vects_size: Used for diagnostic output when a malloc fails.
num_loops_considered: Due to space constraints, the number of loops considered in these data structures is bounded. A noticeable speedup is achieved even when only some of the loops in the pedigree are considered. Reducing the macro 'max_vectors_considered' in unknown.c may reduce the value of this variable.
num_loop_vectors: a table indexed by the locus numbers, holding the number of loopbreaker vectors at that locus
loop_vectors: a table indexed by the locus numbers, holding the loopbreaker vectors at each locus
unknown_poss: a table indexed by person id, locus, loopbreaker vector, and single locus genotype. If the corresponding entry is true, the person may have that genotype at that locus when the loopbreakers have been assigned the single locus genotypes specified in the loopbreaker vector.
Single locus genotypes are encoded in order by allele number, discarding genotypes where the second allele is less than the first allele. (These allele combinations are discarded because phase is unimportant for the calculations.) For example, if a locus has 4 alleles there are ten possible genotypes:
allele 1 allele 2 genotype
1 1 0
1 2 1
1 3 2
1 4 3
2 2 4
2 3 5
2 4 6
3 3 7
3 4 8
4 4 9
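The table follows a closed-form pattern: all genotypes whose first allele is smaller come first, followed by the offset of the second allele from the first. The function below is a sketch derived from the table, not taken from unknown.c; it sorts the two alleles first because phase is ignored. Both function names are hypothetical.

```c
/* Index of the unordered single locus genotype {a1, a2} at a locus with
   nalleles alleles, in the order of the table above. Alleles are numbered
   from 1, genotype indices from 0. Returns -1 for invalid alleles. */
int genotype_index(int a1, int a2, int nalleles)
{
    if (a1 < 1 || a2 < 1 || a1 > nalleles || a2 > nalleles)
        return -1;
    if (a1 > a2) {              /* phase is unimportant: smaller allele first */
        int tmp = a1;
        a1 = a2;
        a2 = tmp;
    }
    /* Genotypes with a smaller first allele come first:
       nalleles + (nalleles - 1) + ... summed over a1 - 1 terms. */
    int before = (a1 - 1) * nalleles - (a1 - 1) * (a1 - 2) / 2;
    return before + (a2 - a1);
}

/* Number of single locus genotypes at a locus with nalleles alleles. */
int genotype_count(int nalleles)
{
    return nalleles * (nalleles + 1) / 2;
}
```

For the 4-allele example, genotype_index(2, 4, 4) and genotype_index(4, 2, 4) both return 6, and genotype_count(4) returns 10.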
Below is a hypothetical file with comments indicating what each line means.
Due to the comments, the placement of white space may be distorted.
Starting in FASTLINK 4.1P, some improvements have been made to the genotype inference code, so that information about some inconsistent loop breaker vectors (i.e., vectors whose assignment of genotypes to the loop breakers violates the Mendelian rules of inheritance) is not printed. As a result, for some multi-loop pedigrees, loopfile.dat will be much shorter than, and different from, the loopfile.dat generated by earlier versions of unknown.
Pedigree: 1 : This information is for the first pedigree
fewer_vects_size: 800 : Used in error messages
num_loops_considered: 3 : Three loops were considered
num_loop_vectors:
0 : 6 : 6 loopbreaker vectors at the locus 0
1 : 2 : 2 loopbreaker vectors at the locus 1
2 : 3 : 3 loopbreaker vectors at the locus 2
loop_vectors:
L : 0 : at locus 0
0 : 1 0 0 : loopbreaker vector 0 assigns single locus
1 : 2 0 0 : genotypes 1, 0, and 0 to loop breakers
2 : 0 2 0 : 0, 1, and 2 respectively
3 : 1 2 0 : etc
4 : 2 2 0
5 : 0 0 1
+
L : 1
0 : 0 0 0
1 : 1 0 0
+
L : 2
0 : 0 0 0
1 : 0 1 0
2 : 0 0 1
+
unknown_poss:
id: 3 : person 3 is unknown and has children
L: 0 : at locus 0
0 : 1 : person 3 can have single locus genotype 1
- : - indicates that person 3 is known at locus 0
L: 1 : at locus 1
0 : 0 1 : if loopbreakers have vector 1, person 3 can
1 : 0 1 : have single locus genotype 0 or 1
+ : + indicates unknown at locus 1
L: 2
0 : 0 1
1 : 0 1
2 : : indicates that no genotypes are possible at
+ this locus when the loopbreakers are assigned
this loopbreaker vector
id: 4
L: 0
0 : 0 1 2
1 : 0 1 2
2 :
3 : 0 1 2
4 : 0 1 2
5 :
+
L: 1
0 : 0 1 2
-
L: 2
0 : 0 1 2
1 : 0 1 2
2 :
+
From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999
A new method of selecting loop breakers
Alejandro A. Schaffer
|*| Selecting loop breakers easily and automatically
Exercise 7 on pages 93--96 of Handbook of Human Genetic Linkage
by Terwilliger and Ott describes a complicated, interactive method
to break loops using the makeped program and the program
LOOPS [Xie X, Ott J: Finding all loops in a pedigree.
Am J Hum Genet 1992; 51:A205].
As a result of innovations in FASTLINK 4.0P and FASTLINK 4.1P,
their method is now obsolete.
The new method is as follows:
Note that the flag is the letter 'l', not the number '1'.
This will produce a new output file called
lpedfile.dat
which has all the loops broken for you.
If your goal is to run an lcp-produced script with pedigree file in pedin.dat and locus file in datain.dat, you then
5. Copy lpedfile.dat to pedin.dat
6. Copy the locus file to datain.dat
and run your script.
You will see diagnostic output showing that unknown is still trying to find a better loop breaker set for you during the running of the lcp-produced script. The reasons are as follows.
From: ftp-bimas.cit.nih.gov Last Mod: May 24, 1995
|*| Map Functions Used In LINKAGE/FASTLINK
by Jeremy Buhler
Rice University
This README file tries to connect the discussion of mapping functions in Chapter 1 of Ott's book[3] to what actually happens in LINKAGE/FASTLINK.
LINKAGE/FASTLINK uses two functions for calculating map distance: Haldane's map function [1] and Kosambi's map function [2]. In the programs, these functions are used to calculate the recombination fraction between two flanking markers from the recombination fractions between the adjacent pairs among three markers.
If we have three loci A, B, and C that lie on the chromosome in the order ABC, we say that A and C are flanking markers, and that A and B, as well as B and C, are adjacent markers. If we know the recombination fractions theta(AB) and theta(BC), we would like to determine the fraction theta(AC). One way to determine theta(AC) is to take the sum theta(AB) + theta(BC); this is Morgan's map function, which equates distance on the linkage map with recombination fraction. This approach implicitly assumes that there is never more than one crossover between the flanking loci, which is reasonable only for tightly linked loci (theta < 0.1).
Haldane's map function assumes that crossovers follow a Poisson distribution, with no interference between crossovers. Haldane's function x(theta) is given by
x = -1/2 ln(1 - 2 * theta)
or, inversely,
theta = 1/2 [1 - exp(-2x)]
From this formula, we see that the process of adding recombination fractions while accounting for the new crossover distribution is equivalent to the mathematical manipulation:
x(AC) = x(AB) + x(BC) = -1/2(ln(1 - 2 * theta(AB)) + ln(1 - 2 * theta(BC)))
= -1/2(ln( (1 - 2 * theta(AB)) (1 - 2 * theta(BC)) ))
theta(AC) = 1/2 [1 - exp(-2 x(AC))]
= 1/2 [ 1 - (1 - 2 * theta(AB)) (1 - 2 * theta(BC))]
= 1/2 [ 1 - 1 + 2 * theta(AB) + 2 * theta(BC) - 4 * theta(AB) * theta(BC)]
theta(AC) = theta(AB) + theta(BC) - 2 * theta(AB) * theta(BC)
This formula appears throughout LINKAGE/FASTLINK. Moreover, if we wish to use a map-function-derived theta(AC) and a given theta(AB) to derive theta(BC), we can rewrite the addition formula to find that
theta(BC) (1 - 2 * theta(AB)) = theta(AC) - theta(AB)
theta(BC) = (theta(AC) - theta(AB)) / (1 - 2 * theta(AB))
This last formula is used in LINKMAP to recalculate theta(BC) from the known theta(AC) as B is moved incrementally across the gap between A and C.
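The Haldane addition formula and the inverse used by LINKMAP translate directly into C. The helper names are hypothetical; the functions reproduce the formulas above, not the LINKMAP source.

```c
#include <assert.h>

/* Haldane addition: recombination fraction between the flanking loci A and C
   from the fractions between the adjacent pairs A-B and B-C. */
double haldane_add(double theta_ab, double theta_bc)
{
    return theta_ab + theta_bc - 2.0 * theta_ab * theta_bc;
}

/* Inverse used when B moves between fixed A and C: recover theta(BC)
   from theta(AC) and theta(AB). The denominator vanishes at theta(AB) = 0.5. */
double haldane_theta_bc(double theta_ac, double theta_ab)
{
    assert(theta_ab < 0.5);
    return (theta_ac - theta_ab) / (1.0 - 2.0 * theta_ab);
}
```

For example, haldane_add(0.1, 0.2) returns 0.26, and haldane_theta_bc(0.26, 0.1) recovers 0.2 (up to floating-point rounding).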
Kosambi's map function is based on a model of chiasmal interference. It is given by
x = 1/2 arctanh(2 * theta) = 1/4 ln((1 + 2 * theta) / (1 - 2 * theta))
or, inversely,
theta = 1/2 tanh(2x) = 1/2 (exp(4x) - 1) / (exp(4x) + 1)
Under this mapping function, addition of recombination fractions is equivalent to the following manipulation:
x(AC) = x(AB) + x(BC)
= 1/4 ln((1 + 2 * theta(AB)) / (1 - 2 * theta(AB))) + 1/4 ln((1 + 2 * theta(BC)) / (1 - 2 * theta(BC)))
= 1/4 ln((1 + 2 * theta(AB) + 2 * theta(BC) + 4 * theta(AB) * theta(BC)) / (1 - 2 * theta(AB) - 2 * theta(BC) + 4 * theta(AB) * theta(BC)))
Writing N = 1 + 2 * theta(AB) + 2 * theta(BC) + 4 * theta(AB) * theta(BC) and D = 1 - 2 * theta(AB) - 2 * theta(BC) + 4 * theta(AB) * theta(BC), we have exp(4x(AC)) = N / D, so
theta(AC) = 1/2 (exp(4x(AC)) - 1) / (exp(4x(AC)) + 1)
= 1/2 (N/D - 1) / (N/D + 1)
= 1/2 (N - D) / (N + D)
= 1/2 (4 * theta(AB) + 4 * theta(BC)) / (2 + 8 * theta(AB) * theta(BC))
theta(AC) = (theta(AB) + theta(BC)) / (1 + 4 * theta(AB) * theta(BC))
If the user specifies that interference is to be included in the model
and sets the parameter "independent" (in datain.dat) to 2, then (and
only then) is Kosambi's mapping function used instead of Haldane's.
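The Kosambi addition formula derived above can be sketched the same way. The inverse kosambi_theta_bc() is obtained by solving theta(AC) (1 + 4 * theta(AB) * theta(BC)) = theta(AB) + theta(BC) for theta(BC); the text does not say whether LINKMAP computes it this way, and both function names are hypothetical.

```c
/* Kosambi addition:
   theta(AC) = (theta(AB) + theta(BC)) / (1 + 4 * theta(AB) * theta(BC)) */
double kosambi_add(double theta_ab, double theta_bc)
{
    return (theta_ab + theta_bc) / (1.0 + 4.0 * theta_ab * theta_bc);
}

/* Inverse (derived by simple algebra, not taken from LINKMAP):
   theta(BC) = (theta(AC) - theta(AB)) / (1 - 4 * theta(AC) * theta(AB)).
   Undefined when 4 * theta(AC) * theta(AB) == 1. */
double kosambi_theta_bc(double theta_ac, double theta_ab)
{
    return (theta_ac - theta_ab) / (1.0 - 4.0 * theta_ac * theta_ab);
}
```

With theta(AB) = 0.1 and theta(BC) = 0.2, kosambi_add() gives about 0.278, compared with 0.26 under Haldane's function: with interference, double recombinants are rarer, so fewer recombinations in the two intervals cancel out.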
References
[1] Haldane, J.B.S. 1919. "The combination of Linkage values and the calculation of distances between the loci of linked factors." J. Genet. 8:299-309.
[2] Kosambi, D.D. 1944. "The estimation of map distances from recombination values." Ann. Eugen. 12:172-75.
[3] Ott, J. 1991. Analysis of Human Genetic Linkage (Revised Edition). Baltimore: Johns Hopkins U. Press.
From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999
FASTLINK, version 3.0P and beyond
This file discusses memory requirements for FASTLINK. See the top level README file for a roadmap to all FASTLINK documentation.
|*| Memory Requirements
The FASTLINK programs can require large amounts of memory when doing multilocus analysis. Of course the amount of memory required is very dependent on the number of loci and the number of alleles at each locus. However, even a 100 Mb program is not a problem to run under SunOS, for instance, because it is a virtual memory operating system. Ideally one would want to run a program of this size on a machine with 32 Mb of memory, but in our experience it is possible to run on machines with as little as 12 Mb.
Of course it is necessary to have a swap file with sufficient space to run the OS and have enough free space for the program.
To see how much space a program requires in version 2.0 or earlier, it was possible to use the unix command:
size <program name>
for instance, using linkmap from FASTLINK 2.0:
unix> /usr/bin/size linkmap
text data bss dec hex
139264 8192 28719480 28866936 1b87978
This value under "dec" is the decimal number of bytes for the whole program. So we see in this case that 28.9 Mbytes is required.
Then compare this with the unix pstat command:
unix> /etc/pstat -s
14880k allocated + 3712k reserved = 18592k used, 169000k available
This indicates that a total of 187592 Kbytes or 187 Mb has been allocated on this system for swap space, and with the current job mix, 18.5 Mb are used and 169 Mb are available. So in this case linkmap will be able to run.
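The check above is simple arithmetic, sketched here with the numbers from the size and pstat examples. It assumes pstat reports kilobytes of 1024 bytes; the helper names are hypothetical.

```c
/* Program image size reported by size: text + data + bss, in bytes. */
unsigned long program_bytes(unsigned long text, unsigned long data,
                            unsigned long bss)
{
    return text + data + bss;
}

/* Nonzero if a program of the given size fits in the swap space that
   pstat -s reports as available (assumed to be in 1024-byte kilobytes). */
int fits_in_swap(unsigned long bytes, unsigned long available_kbytes)
{
    return bytes <= available_kbytes * 1024UL;
}
```

With those numbers, fits_in_swap(program_bytes(139264, 8192, 28719480), 169000) is true.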
To enlarge the swap space consult your local system administrator. For a single user system running FASTLINK we recommend 150Mb total swap space as a minimum.
Alternatively, use the "slow" versions of the programs. The term slow
is a little misleading in that these versions will still be
significantly faster than the originals. In the case of linkmap, the
version compiled with
make slowlinkmap
with the current constant settings is less than 1 Mb in size. Any
unix system should have a swap file large enough for this.
Starting in version 2.1 of FASTLINK a lot of the memory allocation is done dynamically (See README.updates). In version 2.2 and beyond, almost all the large data structures are allocated dynamically. This means that you will not be able to detect before running whether you have enough memory. If you do not have enough, the program should exit politely with an explanation, shortly after startup. The advantage of doing the memory allocation at runtime is that it may be possible to use significantly less memory based on knowledge of certain parameters that are available only at runtime.
From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999
FASTLINK, version 4.0P
|*| Output values from FASTLINK are scaled
The output log likelihood values printed by both LINKAGE and FASTLINK are scaled on some pedigrees by an additive constant that depends on the pedigree structure and selection of loop breakers, if any. This means that output log likelihood values should be used only by subtracting one from another to obtain a LOD score. This problem first surfaced in the initial release of FASTLINK because FASTLINK/LINKMAP uses a different scaling convention from LINKAGE/LINKMAP. See the next section below.
The scaling issue became fundamental with FASTLINK 4.0P where the change in loop breaker choice means that the raw output is unlikely to match earlier versions on looped pedigrees. The reason is that when FASTLINK 4.0P changes the selection of loop breakers, this has the side effect of changing the scaling constant. Therefore, FASTLINK 4.0P can be compared for correctness to earlier versions only by comparing LOD scores.
More changes in both loop breaker selection and genotype inference for looped pedigrees were made in version 4.1P. So the printed log likelihood values for versions 4.0P and 4.1P may differ on looped pedigrees. In general, the value for 4.1P should be the same or smaller in magnitude, indicating that less time is being wasted exploring unnecessary genotype combinations for the loop breakers.
|*| Scaling discrepancy - IMPORTANT FOR LINKMAP USERS
Prof. Ellen Wijsman (U. Washington) brought to our attention a situation in which FASTLINK versions of LINKMAP print out some values that differ from those printed out by LINKMAP in LINKAGE 5.1. What follows are two explanations for the discrepancy, one short, and one long.
Short Explanation. The different values represent differences in scaling LINKMAP's representation of the likelihood value. If you run the post-processor program which computes odds, the discrepancies will disappear.
Long Explanation. Because LINKAGE computes with very small numbers, these numbers must be scaled to avoid underflow. Any (log) likelihood values that are printed out by any of the LINKAGE programs are actually scaled by some amount that depends on the structure of the input pedigree(s). Various scaling rules can be used. In LINKAGE 5.1, the programs LODSCORE, ILINK, and MLINK all use the same scaling rules, while LINKMAP uses different scaling rules. We could find no internal or external documentation to explain this difference. The difference arises only for some pedigrees that have loops.
To increase the amount of code that the four programs can share in our versions, we have decided to make our LINKMAP use the same scaling rules as the other three programs.
If you would like details on how to modify our LINKMAP to make it consistent with the old LINKMAP contact schaffer@cs.rice.edu. The necessary editing is simple, but you would have to edit the code each time you switch between LINKMAP and one of the other three programs.
From: ftp-bimas.cit.nih.gov Last mod: July 13, 1995
FASTLINK, version 3.0P and beyond
How long will a sequential FASTLINK run take? This turns out to be extremely difficult to estimate ahead of time, but relatively easy to estimate once the run is underway.
Each FASTLINK run evaluates the same likelihood function at different candidate theta vectors. For MLINK and LINKMAP the user specifies all the candidate theta vectors. For ILINK and LODSCORE they are generated on the fly. It is reasonably safe to assume that each candidate theta takes roughly the same time to evaluate. Therefore, if you know how many candidate thetas there will be, multiply that number by the running time for one theta.
Caution: This approach will not work on a computer where the load (from other users) is varying significantly during the run.
You can estimate the time for one theta by watching the screen. When the first output gets printed after the header information, one theta is complete. Alternatively, MLINK and LINKMAP take a checkpoint after every theta. Therefore, by comparing the timestamps of the files checkpoint.LINKMAP and checkpoint.LINKMAP.bak (or checkpoint.MLINK and checkpoint.MLINK.bak), you may infer how long one candidate theta takes to evaluate. ILINK and LODSCORE usually take checkpoints every one or two thetas, so you must be more careful in making inferences from the timestamps of those checkpoint files. The timestamp of a file can usually be determined with the command "ls -l".
The number of thetas for ILINK and LODSCORE cannot be predetermined, but a good estimate is (10 * number of loci) if you have sex-averaged thetas. If male theta and female theta differ, estimate with (20 * number of loci). After each iteration, ILINK and LODSCORE print an update in which the number following the string NFE (number of function evaluations) is the number of candidate thetas already evaluated. See README.ILINK for more details. These NFE numbers can be used to estimate how much more work remains to be done by using the formula:
(((Number of thetas estimated) / (Number of thetas completed))
* (running time so far)) - (running time so far)
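The remaining-time formula in C, as a sketch with a hypothetical function name. Any consistent time unit works, and thetas_completed must be positive.

```c
/* Estimated remaining running time, following the formula above.
     thetas_estimated: expected total number of candidate thetas
     thetas_completed: candidate thetas evaluated so far (NFE), > 0
     elapsed:          running time so far, in any unit */
double remaining_time(double thetas_estimated, double thetas_completed,
                      double elapsed)
{
    return (thetas_estimated / thetas_completed) * elapsed - elapsed;
}
```

For example, if 10 of an estimated 40 evaluations took 100 minutes, remaining_time(40, 10, 100) returns 300, i.e. about 300 more minutes.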
From: ftp-bimas.cit.nih.gov Last mod: November 28, 1995
LINKAGE/FASTLINK Troubleshooting
Alejandro A. Schaffer
LINKAGE and FASTLINK produce lots of different error messages that may be difficult to understand. This file briefly summarizes the error messages in five groups.
The way to use this file is to note down the error message you got and then use grep to find it in this file to figure out what may be wrong.
Almost all the common errors in the first and third categories are white-space placement errors. Thus the error message should be interpreted only as a clue to the vicinity of the error in the data files.
|*| Error Messages in LINKAGE Main Programs and UNKNOWN
The main programs and UNKNOWN use the same error routine, although in practice some of the errors can occur only in one place or the other. This section describes the errors that have been reported in LINKAGE all along. Other errors may be found in the next two sections.
Error Number:0
Message: Number of loci 17 exceeds the constant maxlocus
What it means: maxlocus is the maximum number of loci that can be used
simultaneously in a run. You can increase maxlocus
by changing commondefs.h or Makefile.
Error Number:1
Message: Number of loci read . Less than minimum of 1
What it means: The first number in locus file (datain.dat) or datafile.dat
is mangled; you probably erred in using preplink to
prepare the locus file.
Error Number:2
Message: Error detected reading loci order. Locus number 17
in position 5 exceeds number of loci
What it means: The third line of your locus file has no locus 17 on it, but you
asked lcp to use that locus. This probably occurred
by using a text editor to add new loci to the locus file
and forgetting to update the locus order on line 3.
Error Number:3
Message: Error detected reading loci order. Illegal locus number 17
in position 2
What it means: Your lcp script wants to use locus 17, but your locus
file does not have 17 loci described. This can occur when
you mix-up data sets.
Error Number:4
Message: Error detected reading loci order. Locus number repeated in
positions 2 and 3
What it means: You probably made a typo in lcp and used the same locus in two
different positions of the fixed locus map
Error Number:5
Message: Error detected reading locus description. Illegal locus type 7 for
locus 6
What it means: The first number in the description of each locus in the
locus file must be 1, 2, 3, or 4.
Error Number:6
Message: Error detected reading locus description for system 7. Number
of alleles 25 exceeds maxall
What it means: One of your loci is described as having 25 alleles in the
locus file. maxall is a constant limiting the maximum
number of alleles at a locus. You can increase maxall to
more than 25, by changing unknown.c, commondefs.h, or
Makefile.
Important: Versions of FASTLINK earlier than 3.0P cannot handle maxall > 31.
Error Number:7
Message: Error detected reading locus description for system 6.
Illegal number of alleles 0
What it means: One of your loci is described as having 0 alleles.
This is likely a white space error in the locus file
causing the wrong string to be interpreted as the number
of alleles
Error Number:8
Message: Error detected reading locus description for system 6.
Number of factors 17 exceeds maxfact
What it means: Similar to error number 6. There is a constant maxfact
that is the maximum number of binary factors allowed at a
locus of that type. You can change maxfact in unknown.c,
commondefs.h, and Makefile.
Important: Set maxfact and maxall to the same value; FASTLINK cannot
handle maxfact > 31.
Error Number:9
Message: Error detected reading locus description for system 6.
Illegal number of factors 0
What it means: Very similar to error number 7. Error 7 appears for numbered-allele
loci, while error 9 appears for binary-factor loci.
Error Number:10
Message: Error detected reading locus description for system 6.
Alleles not codominant.
THIS ERROR IS OBSOLETE
Error Number:11
Message: Error detected reading pedigree record 17. Illegal code for sex 8.
What it means: The column for gender is the eighth column in pedin.dat
and the fifth column in the input to MAKEPED. Error 11 can be caused either by
entering the wrong value for gender or by a white-space error
that causes the wrong column to be read as gender.
Be especially careful to have exactly one carriage return after the
entry for each person, and no other carriage returns.
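Because a stray or missing carriage return shifts the columns that are read, it can help to check that every record has the same number of fields before running lsp. The following is a minimal Python sketch, not part of LINKAGE or FASTLINK: `check_field_counts` is a hypothetical helper, and it assumes one person per line, white-space-separated fields, and that the most common field count in the file is the correct one.

```python
from collections import Counter


def check_field_counts(lines):
    """Flag pedigree-file lines whose field count looks wrong.

    Assumes one person per line, fields separated by white space, and that
    every record in the data set should have the same number of fields.
    Returns (expected_count, problems), where problems is a sorted list of
    (line_number, field_count) pairs; blank lines are reported with count 0.
    """
    counts = []
    problems = []
    for lineno, line in enumerate(lines, start=1):
        n = len(line.split())
        if n == 0:
            # A blank line usually means a stray carriage return.
            problems.append((lineno, 0))
        else:
            counts.append((lineno, n))
    if not counts:
        return None, problems
    # Take the most common field count as the expected one.
    expected = Counter(n for _, n in counts).most_common(1)[0][0]
    problems.extend((lineno, n) for lineno, n in counts if n != expected)
    problems.sort()
    return expected, problems


# Example use on a file (hypothetical path):
# with open("pedfile.dat") as f:
#     expected, problems = check_field_counts(f)
```

A line reported with a short count is a good place to start looking for the white-space error behind error 11.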
Error Number:12
Message: Error detected reading pedigree record at pedigree 17.
Maximum number of pedigree records exceeded
What it means: The maximum number of pedigrees is determined by
the constant maxped, which can be changed in commondefs.h,
unknown.c, and Makefile. You may have truly exceeded maxped, or
you may have a white-space error.
Error Number:13
Message: Error detected reading pedigree record 501.
Maximum number of individuals exceeded,
What it means: Similar to error 12. The maximum number of people in a
data set is determined by the constant maxind.
Error Number:14
Message: Error detected reading pedigree record 300. Illegal binary factor
code 2.
What it means: Binary factors must be 0 or 1. Usually this error occurs
because of a white-space problem that causes lsp to look in the
wrong columns.
Error Number:15
Message: Error detected reading pedigree record 300.
No allelic pair for genotype.
THIS ERROR IS OBSOLETE
Error Number:16
Message: Error detected reading pedigree record 300.
Allele number 25 exceeds maxall.
What it means: A numbered allele cannot have a value larger than the
constant maxall. See error 6.
Error Number:17
Message: Error detected reading pedigree record 300.
Illegal allele number -1.
What it means: You have a negative allele number in your input file.
I have not figured out any plausible circumstances under which
this error could occur.
Error Number: 18
Message: Number of systems after factorization 60 exceeds maxsystem
THIS ERROR IS OBSOLETE
Error Number:19
Message: Number of systems after factorization 0 less than minimum of 1.
THIS ERROR IS OBSOLETE
Error Number:20
Message: Number of recombination types 100 exceeds maxrectype
THIS ERROR IS OBSOLETE
Error Number:21
Message: Number of recombination types 0 less than minimum of 1.
THIS ERROR IS OBSOLETE
Error Number: 22
Message: End of file detected in tempdat by procedure
readthg before all data found
THIS ERROR IS OBSOLETE
Error Number: 23
Message: Error detected reading iterated locus in datafile.
Value (7) greater than nlocus
What it means: You are using ILINK to estimate allele
frequencies or something else, and you gave a locus number
that is too high.
Error Number: 24
Message: Error detected reading iterated locus in datafile.
Illegal value (-1)
What it means: Similar to error 23, but this one occurs when
the locus number is negative.
I have not figured out any plausible circumstances under which
this error could occur.
Error Number: 25
Message: Number of iterated parameters greater then maxn.
What it means: The number of parameters that you can simultaneously
estimate in ILINK is determined by the constant maxn, which can
be increased in ildefs.h or Makefile. You have exceeded maxn
in the way your datafile.dat is set up. This could also be caused
by a white-space error.
Error Number: 26
Message: Error detected reading pedigree record 200. Liability class
(9) exceeds nclass.
What it means: When you specify a locus as an affection status locus,
you may specify different liability classes that get numbered
1,2,3... If you assign an individual a class number in the pedigree
file that is higher than the number of liability classes
specified, then error 26 occurs. It is important to remember that
affection status loci get 1 column if no liability classes are used
and 2 columns if classes are used. Therefore, this error can occur
if you specify an affection status locus to have liability classes
in the locus file, but forget to specify the class in the
pedigree file.
Error Number: 27
Message: Error detected reading pedigree record 200. Illegal liability
class (0).
What it means: See error 26. In this case the liability class is being read
as a number that is too low (rather than too high), but the likely
causes are the same as for 26.
Error Number: 28
Message: Error detected reading locus description for system 1.
Liability classes (100) exceed maxliab.
What it means: The maximum number of liability classes at a locus is
determined by the constant maxliab, which can be set in
unknown.c, commondefs.h, or Makefile.
Error Number: 29
Message: Error detected reading locus description for system 2.
Illegal number of liability classes (-1)
What it means: The number of liability classes that you specified for
an affection status locus is too low. This could be a
white-space error.
Error Number: 30
Message: Error detected reading locus description for system 2.
Penetrance out of range
What it means: You specified a penetrance for a liability class of
an affection status locus as a number bigger than 1.0. Probably
a white-space error.
Error Number: 31
Message: Error detected reading locus description for system 2.
Number of traits (17) exceeds maxtrait
What it means: The maximum number of traits for a quantitative trait
locus is determined by the constant maxtrait, which can be
set in unknown.c, commondefs.h, or Makefile.
Error Number: 32
Message: Error detected reading locus description for system 2.
Number of traits out of range (-1)
What it means: Similar to error 31, but now the number of traits is
too low. Probably a white-space error.
Error Number: 33
Message: Error detected reading locus description for system 3.
Variance must be positive
What it means: You specified a variance for a quantitative trait as
0 or less. Almost certainly what happened is that a 0
was read because of a white-space error.
Error Number: 34
Message: Error detected reading locus description for system 2.
Variance multiplier must be positive
What it means: Similar to error 33.
Error Number: 35
Message: Error detected reading locus description for system 1.
Risk allele (17) exceeds nallele
What it means: You are doing a risk assessment and you specified an
allele number that is higher than the number of alleles possible
for that locus.
Error Number: 36
Message: Error detected reading locus description for system 2.
Illegal risk allele (0)
What it means: Similar to 35, but here the risk allele number is 0 or less.
Probably a white-space error.
Error Number: 37
Message: Error detected reading datafile. Risk locus (5) exceeds nlocus
What it means: The locus at which you want to do a risk analysis
is specified as an index that is higher than the number
of loci you specified in the lcp script.
Error Number: 38
Message: Error detected reading datafile.
Illegal value for risk locus (0)
What it means: Similar to 37, but now the risk locus number is too low.
Probably a white-space error.
Error Number: 39
Message: Error detected reading datafile.
Mutation locus (5) exceeds nlocus
What it means: Similar to 37, but this occurs when you are using the
mutation model, rather than risk analysis.
Error Number: 40
Message: Error detected reading datafile.
Illegal value for mutation locus (0)
What it means: Similar to 38, but this occurs when you are using the
mutation model, rather than risk analysis.
Error Number: 41
Message: Error detected reading datafile.
Linkage disequilibrium is not allowed with this program
What it means: You are trying to allow for linkage disequilibrium and trying
to use LODSCORE. Use ILINK instead.
Error Number: 42
Message: Locus 17 in lod score list exceeds nlocus 5
What it means: Essentially the same as error 2, but you get this
one if you use LODSCORE because the lcp script format for
lodscore is different.
Error Number: 43
Message: Illegal locus number 0 in lod score list
What it means: Similar to error 42, but now the locus number is
too low instead of too high.
Warning number: 0
Message: Illegal sex difference parameter 3 Parameter should be 0, 1, or 2
What it means: The first number after the last locus description in
the locus file indicates whether you want male theta and
female theta to be different
Codes are:
0 -- no difference (almost everyone uses this)
1 -- difference, but no females seen yet
2 -- difference (common value for sex difference)
This is probably a white-space error
Warning number: 1
Message: Illegal interference parameter 17 Lack of interference assumed
What it means: The second number after the last locus description in
the locus file indicates whether you want interference (1)
or mapping (2). No interference (the common case) is 0.
Warning Number: 2
Message: Illegal sex difference parameter 1
Parameter must be 0 with sex-linked data
What it means: You are using X-chromosome data and you specified that
male theta should be different from female theta in datain.dat.
This number is the first number after the last locus description
in the locus file. This warning may be harmless.
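The three warnings above all concern the line that follows the last locus description. As a quick pre-check, here is a minimal Python sketch that parses that line and reports the same conditions. `check_sex_interference_line` is a hypothetical helper; it assumes the first two white-space-separated entries are the sex difference and interference codes and ignores anything after them, which is not necessarily how LINKAGE itself reads the file.

```python
def check_sex_interference_line(line, sex_linked=False):
    """Check the line that follows the last locus description.

    Assumes the first two white-space-separated entries are the sex
    difference code and the interference code; anything after them
    (such as a "<<" comment) is ignored. Returns a list of messages.
    """
    fields = line.split()
    if len(fields) < 2:
        return ["expected two numbers: sex difference and interference"]
    try:
        sexdif, interfer = int(fields[0]), int(fields[1])
    except ValueError:
        return ["non-integer entry, possibly a white-space error"]
    messages = []
    if sexdif not in (0, 1, 2):
        messages.append(f"illegal sex difference parameter {sexdif} (warning 0)")
    elif sex_linked and sexdif != 0:
        messages.append(f"sex difference {sexdif} with sex-linked data (warning 2)")
    if interfer not in (0, 1, 2):
        messages.append(f"illegal interference parameter {interfer} (warning 1)")
    return messages
```

For example, `check_sex_interference_line("0 0")` returns an empty list, the common case of no sex difference and no interference.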
Warning Number: 3
Message: Non-standard affection status 6 interpreted as normal
in pedigree record 200
What it means: The affection status of a person can be 0, 1, or 2. You probably
have a white-space error. This warning should not be ignored.
|*| Error Messages Introduced in FASTLINK
Message: WARNING: You are doing an autosomal run but have AUTOSOMAL_RUN set to 0
What it means: Change AUTOSOMAL_RUN to 1 in moddefs.h.
Message: You probably need to run the slower version of this program
What it means: FASTLINK can be configured to use more memory ("fast version")
or less memory ("slow version"). You are using the fast version and have
run out of memory. Recompile to get the slow version instead, with
make installslow.
Message: Problem with malloc, probably not enough space
What it means: You are out of memory; get more swap space.
Message: Your pedigree has more loops than allowed by the constant maxloop
What it means: You must increase maxloop in commondefs.h. Starting
with FASTLINK 3.0P, maxloop is also defined in unknown.c.
You are *strongly encouraged* to read loops.ps.
Message: The program will exit politely to allow you to correct the problem
What it means: I am sparing you a core dump.
Message: Error opening ipedfile.dat and pedfile.dat.
What it means: Something is wrong in your lcp script or your usage of it.
Message: NOTE: attempting to continue previous (unfinished) run
What it means: FASTLINK thinks you want to recover from a crash.
Message: Data recovered
What it means: FASTLINK is recovering from a crash whether you like it
or not.
Message: Illegal instruction (on Suns)
What it means: maxhap is probably too big causing you to blow
out the stack in segdown or segup
Message: The next pedigree appears to have an unbroken loop
What it means: You did not use the loops program properly as part of makeped. See Chapter 7 of Terwilliger and Ott.
|*| Incompatibility Errors in UNKNOWN
One of the main purposes of UNKNOWN is to detect violations of Mendelian rules of inheritance. In LINKAGE and FASTLINK, through version 2.2, error detection was done only for loopless pedigrees and the program would report only the erroneous pedigree/locus pair.
In FASTLINK 2.3P, the loopless error checking was improved so that the program now pinpoints the nuclear family which contains the error. It is not possible for the program to determine automatically whether it is a parent or a child (or both) whose genotype must be changed.
Sometimes, the program will pinpoint multiple nuclear families that are in error in the same pedigree. In this situation, only the first nuclear family is sure to be wrong; the others may be propagated consequences of the first error detected. It may not be possible to determine whether they are separate errors or not without correcting the first error. If you want to see the first error only, change the default value of the constant ONE_ERROR_ONLY to 1.
In FASTLINK 3.0P, UNKNOWN now detects incompatibility errors in looped pedigrees. However, it reports only the pedigree/locus pair. If you wish to have the nuclear families pinpointed, then artificially remove all the loops by replacing every number that is 2 or higher in column 9 of the pedigree file with a 0. Then re-run UNKNOWN. Do not throw away your original pedigree file, since you will want to fix the genotype errors there and use that file for the actual computations.
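The loop-removal trick above can be scripted. This is a minimal Python sketch, assuming a white-space-separated pedigree file with one person per line in which column 9 is the field described above. It writes a copy with every column-9 value of 2 or higher replaced by 0 and never modifies the original. The function names and the output file name are hypothetical, and changed lines are rejoined with single spaces, which assumes the reading program does not rely on column alignment.

```python
def remove_loop_markers(line):
    """Return the line with a column-9 value of 2 or higher replaced by 0.

    Lines with fewer than nine fields, or whose ninth field is not an
    integer, are returned unchanged. Changed lines are rejoined with single
    spaces (assumes the reading program does not rely on column alignment).
    """
    fields = line.split()
    if len(fields) < 9:
        return line
    try:
        value = int(fields[8])  # column 9, counting from 1
    except ValueError:
        return line
    if value < 2:
        return line
    fields[8] = "0"
    return " ".join(fields) + "\n"


def write_unlooped_copy(src, dst):
    """Write a loop-free copy of src to dst; src itself is never modified."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(remove_loop_markers(line))


# Example (hypothetical file names): keep pedfile.dat, check the copy.
# write_unlooped_copy("pedfile.dat", "pedfile_noloops.dat")
```

UNKNOWN still expects its input under its usual name, so one way to use the copy is to run it in a separate directory, keeping the original file for the real computations.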
Here are some UNKNOWN-specific error messages:
Message: Reduce max_vectors_considered to 9999
What it means: You have a looped pedigree, probably with multiple
loops. UNKNOWN is running out of memory keeping track of
all the possible loop breaker vectors. If you reduce the
constant max_vectors_considered, you trade precision for space:
the genotype inference for loops becomes less precise, but
takes less space.
Message: Error opening pedfile.dat in UNKNOWN
What it means: pedfile.dat is not there or you do not have permission
to read it. This error could arise if you are doing
multiple runs in the same directory simultaneously
(this is a no-no for both LINKAGE and FASTLINK) or
your directory permissions are not set up properly.
Message: foundped() found 0 pedigrees - UNKNOWN
What it means: Something is seriously wrong with pedfile.dat.
It's hard to imagine what could cause this, but
the message is in there for safety.
Message: Press <Enter> to continue
What it means: Recent versions of UNKNOWN ask for an interactive response
when errors occur. This was introduced
by Terwilliger and Ott. Press <Enter> if you want
incompatibilities checked for the remaining pedigrees in your
data set. Otherwise, kill the program.
Message: You must increase the constant maxloop
What it means: In FASTLINK 3.0P, maxloop is defined in both unknown.c and
commondefs.h. In both files, the value must be at least
as large as the number of loops in each pedigree. In
previous versions of FASTLINK, maxloop appeared only
in commondefs.h. Edit unknown.c and commondefs.h to
increase maxloop.
Message: One incompatibility involves the family in which person
17 is a parent
What it means: You have a violation of Mendelian rules of inheritance in
the current pedigree. This message will be printed
before the message for the whole pedigree.
Here "family" means "nuclear family", including parents
and children.
The first nuclear family that is pinpointed definitely has
an error (see the introduction to this section). Note that
the individuals are counted starting at 1 with each pedigree, so
17 means the 17th person listed in pedfile.dat for the current
pedigree. Note that if 17 is involved in multiple marriages,
each of these should be checked.
Message: One incompatibility involves the family in which person
9 is a child
What it means: Essentially the same as the previous error message, except that
there are two ways of flagging errors depending on how the
pedigree is traversed.
Message: The next pedigree appears to have an unbroken loop
What it means: The program is getting into an infinite loop, probably because
you have not broken a loop properly. The LINKAGE preprocessor
program MAKEPED can be used to break loops before
running UNKNOWN.
Message: ERROR: Incompatibility detected in this family for locus 2
What it means: This is the overall incompatibility message for a
pedigree. Here "family" means "pedigree".
Note that locus 2 here uses post-lsp locus numbering,
so it means the second locus in your analysis.
Message: ERROR: File empty or inconsistent.
What it means: One of pedfile.dat and datafile.dat is not there or has the wrong permissions.
|*| LSP Error Messages
In many of the following error codes, substituting S for P in the fifth letter means the problem is in the secondary file, rather than the primary file. Almost nobody uses secondary files.
Code: LN1RPR
What it means: The first line of datain.dat does not have 4 numbers on it.
The 4 numbers are:
Number of loci, Risk locus, X-linked, Program code
Code: NOLIPR
What it means: Number of loci is less than 1 or bigger than the
maximum allowed by lsp.
Code: RKLIPR
What it means: Risk locus is < 0 or bigger than the number of loci
Risk locus should be 0 unless you want to do a risk
calculation
Code: XLKIPR
What it means: The X-linked status is something other than 0 (autosomal)
or 1 (X-linked)
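The checks described for NOLIPR, RKLIPR, and XLKIPR can be run on the first line of the locus file before calling lsp. Below is a minimal Python sketch, not lsp's own code: `check_first_line` is a hypothetical helper, `max_loci` is a stand-in for lsp's upper limit (which is not stated here), and the program code is parsed but not validated because the valid codes are not listed in this file. The lsp codes in the messages are only labels for cross-referencing this list.

```python
def check_first_line(line, max_loci=None):
    """Check line 1 of the locus file: loci, risk locus, X-linked, program.

    max_loci stands in for lsp's upper limit, which is not given here.
    The program code is parsed but not checked (see PRGIPR).
    Returns a list of problem strings labelled with the matching lsp code.
    """
    fields = line.split()
    if len(fields) < 4:
        return ["LN1RPR: expected 4 numbers"]
    try:
        nloci, risk, xlinked, _program = map(int, fields[:4])
    except ValueError:
        return ["LN1RPR: non-integer entry"]
    problems = []
    if nloci < 1 or (max_loci is not None and nloci > max_loci):
        problems.append("NOLIPR: number of loci out of range")
    if not 0 <= risk <= nloci:
        problems.append("RKLIPR: risk locus out of range")
    if xlinked not in (0, 1):
        problems.append("XLKIPR: X-linked flag must be 0 or 1")
    return problems
```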
Code: PRGIPR
What it means: Program code is not valid
Code: MPLXPR
What it means: Program code is not valid
Code: NLEXPR
What it means: I wish I knew!
Code: LN2RPR
What it means: There is a problem reading the second line of the locus file
This should have 4 numbers:
Mutation locus, Male mutation rate, Female mutation rate, Disequilibrium
Unless you are a LINKAGE wizard, I *strongly* recommend that this line
should always be:
0 0.0 0.0 0
Code: MTLIPR
What it means: Mutation locus is out of range
Code: MMRIPR
What it means: Male mutation rate is out of range
Code: FMRIPR
What it means: Female mutation rate is out of range
Code: MTMXPS
What it means: Mutation locus index is not 0
Code: DISIPR
What it means: Disequilibrium is not 0 or 1
Code: DENXPR
What it means: Disequilibrium is not 0
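A similar pre-check can be applied to the second line of the locus file, flagging whether it matches the recommended `0 0.0 0.0 0`. This is a minimal Python sketch under stated assumptions: `check_second_line` is hypothetical, the [0.0, 1.0] range for the mutation rates is assumed because lsp's exact limits are not given here, and the disequilibrium test follows DISIPR (0 or 1) rather than the stricter DENXPR.

```python
def check_second_line(line, nloci):
    """Check line 2 of the locus file.

    Entries: mutation locus, male mutation rate, female mutation rate,
    disequilibrium. The [0.0, 1.0] range for the rates is an assumption,
    since lsp's exact limits are not given. Returns (problems, uses_default),
    where uses_default is True for the recommended line 0 0.0 0.0 0.
    """
    fields = line.split()
    if len(fields) < 4:
        return ["LN2RPR: expected 4 numbers"], False
    try:
        mut_locus = int(fields[0])
        male_rate = float(fields[1])
        female_rate = float(fields[2])
        diseq = int(fields[3])
    except ValueError:
        return ["LN2RPR: could not parse entries"], False
    problems = []
    if not 0 <= mut_locus <= nloci:
        problems.append("MTLIPR: mutation locus out of range")
    for code, rate in (("MMRIPR", male_rate), ("FMRIPR", female_rate)):
        if not 0.0 <= rate <= 1.0:
            problems.append(f"{code}: mutation rate out of range")
    if diseq not in (0, 1):
        problems.append("DISIPR: disequilibrium must be 0 or 1")
    uses_default = (mut_locus, male_rate, female_rate, diseq) == (0, 0.0, 0.0, 0)
    return problems, uses_default
```

If `uses_default` is False and you are not deliberately using the mutation model or disequilibrium, the line is worth a second look.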
Code: LN3RPR
What it means: Problem reading the third line of the locus file, which
specifies the locus order. Usually this means that the
number of entries on this line does not match the
number of loci specified in the first line of the locus file.
Code: LN5RPR
What it means: Problems reading line with sex difference and interference
Code: LN6RPR
What it means: Problems reading line with male recombination fractions
Code: LN7RPR
What it means: Problems reading line with female recombination fractions
Code: LCOIPR
What it means: Entry in locus order is not between 1 and the number of
loci specified.
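LN3RPR and LCOIPR, together with main-program error 4, all concern the locus order line. The following minimal Python sketch checks that line's entry count, range, and repeats. `check_locus_order` is hypothetical, and it assumes anything after `<<` on the line is a comment to be dropped.

```python
def check_locus_order(line, nloci):
    """Check the locus order line (line 3 of the locus file).

    Assumes text after "<<" is an annotation and drops it. Checks that
    there are exactly nloci entries, each between 1 and nloci, and that no
    locus is repeated. Returns a list of problem strings.
    """
    text = line.split("<<", 1)[0]  # drop an annotation comment, if any
    fields = text.split()
    if len(fields) != nloci:
        return [f"LN3RPR: {len(fields)} entries but {nloci} loci declared"]
    problems = []
    seen = set()
    for pos, field in enumerate(fields, start=1):
        try:
            locus = int(field)
        except ValueError:
            problems.append(f"LN3RPR: entry {field!r} in position {pos} is not an integer")
            continue
        if not 1 <= locus <= nloci:
            problems.append(f"LCOIPR: locus {locus} in position {pos} out of range")
        elif locus in seen:
            problems.append(f"locus {locus} repeated (position {pos})")
        seen.add(locus)
    return problems
```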
Code: SXDIPR
What it means: Problems reading the sex difference entry in the line
immediately after the last locus, which has two numbers:
Sex difference, Interference
Code: INFIPR
What it means: Problems reading the interference entry, which should be 0, 1
or 2.
Code: MRFIPR
What it means: Male recombination fraction not in the range [0.0, 1.0]
Code: GDRIPR
What it means: Problems reading the sex difference ratio
Code: FRFIPR
What it means: Female recombination fraction not in the range [0.0, 1.0]
Code: PNORPP
What it means: Problems reading column 1 entry in pedigree file.
This is the most common lsp error. It occurs when there
are extra blanks at the end of the file
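Since extra blanks at the end of the file are the usual cause of PNORPP, and each record should end with exactly one carriage return (see error 11), a clean-up step can be sketched in Python. This minimal sketch strips all trailing white space from the end of the file and leaves a single final newline. The function names and file paths are hypothetical, and the result is written to a separate file so the original is kept.

```python
def clean_file_end(text):
    """Return text with trailing white space removed and one final newline.

    An input containing only white space becomes an empty string.
    """
    stripped = text.rstrip()
    return stripped + "\n" if stripped else ""


def fix_pedigree_file(src, dst):
    """Write a cleaned copy of src to dst; return True if the end changed."""
    with open(src) as fin:
        text = fin.read()
    cleaned = clean_file_end(text)
    with open(dst, "w") as fout:
        fout.write(cleaned)
    return cleaned != text


# Example (hypothetical file names):
# fix_pedigree_file("pedin.dat", "pedin_clean.dat")
```

Only the end of the file is touched; blank lines in the middle of the file are left for a field-count check to report.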
Code: IIDRP
What it means: Problems reading column 2 entry in pedigree file.
Code: PIDRP
What it means: Problems reading column 3 entry in pedigree file.
Code: MIDRP
What it means: Problems reading column 4 entry in pedigree file.
Code: FOSRPP
What it means: Problems reading column 5 entry in pedigree file.
Code: NPSRPP
What it means: Problems reading column 6 entry in pedigree file.
Code: NMRSPP
What it means: Problems reading column 7 entry in pedigree file.
Code: SEXRPP
What it means: Problems reading column 8 entry in pedigree file.
Code: PRORPP
What it means: Problems reading column 9 entry in pedigree file.
Code: QANRPP
What it means: Problems reading value for quantitative locus in
pedigree file. Beware of spurious carriage returns
Code: AFFRPP
What it means: Problems reading affection status entry in pedigree file
Beware of spurious carriage returns
Code: BINRPP
What it means: Problems reading binary code entry in pedigree file
Beware of spurious carriage returns
Code: ALERPP
What it means: Problems reading allele entry in pedigree file
Beware of spurious carriage returns
Code: FLDRPR
What it means: Cannot find two entries on the first line of a locus
description. First entry is locus type, meaning
second entry depends on locus type.
Code: LDCIPR
What it means: First entry in a locus description is something other
than 1, 2, 3, or 4
Code: NALIPR
What it means: Second entry of a locus description is < 1
Code: FGFRPR
What it means: Problems finding an allele frequency
Code: GFQIPR
What it means: Allele frequency is not in the open interval (0.0,1.0)
Beware that Genethon publishes some allele frequencies
as 0.0
Code: GFSXPR
What it means: Warning issued if allele frequencies sum to less than 0.95 or
more than 1.05
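Both allele-frequency conditions (GFQIPR and GFSXPR) are easy to check on the numbers you intend to put in the locus file, including frequencies copied from published tables that may contain 0.0. Here is a minimal Python sketch using the thresholds stated above; `check_allele_frequencies` is a hypothetical helper, not part of lsp.

```python
def check_allele_frequencies(freqs):
    """Check one locus's allele frequencies against the GFQIPR/GFSXPR rules.

    Each frequency must lie in the open interval (0.0, 1.0), and the sum
    should be between 0.95 and 1.05. Returns a list of problem strings.
    """
    problems = []
    for i, p in enumerate(freqs, start=1):
        if not 0.0 < p < 1.0:
            problems.append(f"GFQIPR: frequency of allele {i} is {p}, not in (0.0, 1.0)")
    total = sum(freqs)
    if total < 0.95 or total > 1.05:
        problems.append(f"GFSXPR: frequencies sum to {total:.4f}")
    return problems
```

For example, `check_allele_frequencies([0.25, 0.25, 0.5])` returns an empty list.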
Code: NQVRPR
What it means: Problems reading a quantitative trait locus
Code: NQVIPR
What it means: Number of classes for a quantitative trait locus is < 1
Code: GTMRPR
What it means: Problem reading details of a quantitative trait locus
Code: VARRPR
What it means: Problems reading variance for quantitative trait locus
Code: VARIPR
What it means: A variance component is < 0.0
Code: CVMRPR
What it means: Problems reading a covariance component
Code: VMLRPR
What it means: Something to do with a quantitative trait locus,
but I don't know what
Code: VMLIPR
What it means: Something to do with a quantitative trait locus,
but I don't know what
Code: NLCRPR
What it means: Problems reading number of liability classes for affection
status
Code: NLCIPR
What it means: Number of liability classes is < 1
Code: GTPRPR
What it means: Problems reading a penetrance
Code: GTPIPR
What it means: A penetrance is not in the range [0.0, 1.0]
Code: NBFRPR
What it means: Problems reading number of factors for a binary
factors locus
Code: NBFIPR
What it means: Number of factors is < 1
Code: BFCRPR
What it means: Problems reading the meaning of a binary factor combination
Code: BFCIPR
What it means: A binary factor is not 0 or 1
Code: RKAPR
What it means: Problems reading risk allele
Code: RKIPR
What it means: Risk allele is < 1
Code: CMDRCI
What it means: Problems parsing the arguments to lsp
Code: CMDOPN
What it means: Cannot open one of the data files or arguments to
lsp are wrong
Code: PEDRCI
What it means: Not enough arguments to lsp
Code: PEDOPN
What it means: Cannot open one of the data files or arguments to
lsp are wrong
Code: PARRCI
What it means: Not enough arguments to lsp
Code: PAROPN
What it means: Cannot open one of the data files or arguments to
lsp are wrong
Code: NOLRCI
What it means: Not enough arguments to lsp
Code: NOLICI
What it means: Number of loci given to lsp is < 2 or too many
Code: LCORCI
What it means: Not enough arguments to lsp
Code: LCOICI
What it means: Invalid locus number in locus order
Code: INFRCI
What it means: Not enough arguments to lsp
Code: INFICI
What it means: Interference value is not 0, 1, or 2 in call to lsp
Code: SXDRCI
What it means: Not enough arguments to lsp
Code: SXDICI
What it means: Sex difference argument to lsp is not 0, 1, or 2
Code: MRFRCI
What it means: Not enough arguments to lsp
Code: MRFICI
What it means: Male recombination fraction argument to lsp is not
between 0.0 and 1.0
Code: GDRRCI
What it means: Not enough arguments to lsp
Code: GDRICI
What it means: Problems reading genetic distance ratio as argument to lsp
Code: FRFRCI
What it means: Not enough arguments to lsp
Code: FRFICI
What it means: Problems reading a female recombination fraction as
an argument to lsp
Code: CMDPAR
What it means: Too many arguments to lsp
Code: PDFOPN
What it means: Problems opening pedigree file
Code: DTFOPN
What it means: Problems opening data file
Code: LOGOPN
What it means: Problems opening lsp logfile
Code: STMOPN
What it means: Problems opening stream file
Code: LEPIPR
What it means: I wish I knew
Code: LEPRPR
What it means: You cannot do this with LODSCORE or ILINK
Code: GNPIPR
What it means: Problems with iterated parameters
Code: GNPRPR
What it means: You cannot do this with LODSCORE or ILINK
Code: TLCRCI
What it means: Not enough arguments to lsp
Code: TLCIC
What it means: Locus number is < 1 or too high as argument to lsp
Code: STVRCI
What it means: Not enough arguments to lsp
Code: STVICI
What it means: Stop value for moving theta is not between 0.0 and 1.0
Code: GRSRCI
What it means: Not enough arguments to lsp
Code: GRSICI
What it means: Number of evaluations in interval or LINKMAP is < 1
Code: RFVRCI
What it means: Not enough arguments to lsp
Code: RFVICI
What it means: For MLINK, the index of the recombination fraction to vary
is < 1 or > the number of loci
Code: INVRCI
What it means: Not enough arguments to lsp
Code: INVICI
What it means: Increment value for MLINK is <= 0.0
Code: NOERCI
What it means: Not enough arguments to lsp
Code: NOEICI
What it means: Number of additional likelihood evaluations for MLINK is < 0
or > some specified limit.
Code: IRFRCI
What it means: Not enough arguments to lsp
Code: IRFICI
What it means: Initial recombination fraction for MLINK is not in the
range [0.0, 1.0]
Code: INTERR
What it means: Internal error in lsp. Heaven help you if you get this code!
Code: CMDNTF
What it means: Lsp does not understand how to set up for this program
I think you get this if you ask to run a program that is
not one of the LINKAGE main programs.
Code: CMDNTU
What it means: Similar to CMDNTF. I can't tell the difference.
Code: CMDNOD
What it means: Probably some junk characters in input
Code: SPDRCI
What it means: Looking for name of secondary pedigree file
and can't find it
Code: SPDOPN
What it means: Problems opening secondary pedigree file
Code: SPRRCI
What it means: Looking for name of secondary locus file and can't find it
Code: SPROPN
What it means: Problems opening secondary locus file
Code: OPDRCI
What it means: Problems finding the name of output pedigree file
(to use as input to unknown)
Code: OPDOPN
What it means: Problems opening output pedigree file
Code: OPRRCI
What it means: Problems finding the name of output locus file
Code: OPRRCN
What it means: Problems opening output locus file
Code: FTLXSP
What it means: Problems setting up secondary pedigree file
Code: SPEMP
What it means: Individual has index 0
Code: FSKXSP
What it means: Problems with secondary pedigree file
Code: PPDEMP
What it means: Problems reading a pedigree number
Code: PLNRSP
What it means: Problems reading from secondary pedigree file
Code: PNMXPS
What it means: Problems merging primary and secondary pedigree files
Code: INMXPS
What it means: Problems merging primary and secondary pedigree files
Code: FIMXPS
What it means: Problems merging primary and secondary pedigree files
Code: MIMXPS
What it means: Problems merging primary and secondary pedigree files
Code: FOMXPS
What it means: Problems merging primary and secondary pedigree files
Code: NPMXPS
What it means: Problems merging primary and secondary pedigree files
Code: SXMXPS
What it means: Problems merging primary and secondary pedigree files
Code: IIDIPP
What it means: Problems merging primary and secondary pedigree files
Code: PIDIPP
What it means: Problems merging primary and secondary pedigree files
Code: MIDIPP
What it means: Problems merging primary and secondary pedigree files
Code: FOSIPP
What it means: Problems merging primary and secondary pedigree files
Code: NPSIPP
What it means: Problems merging primary and secondary pedigree files
Code: NMSIPP
What it means: Problems merging primary and secondary pedigree files
Code: SEXIPP
What it means: Problems merging primary and secondary pedigree files
Code: PROIPP
What it means: Problems merging primary and secondary pedigree files
|*| LRP Error Messages
Message: Screen width is too small
What it means: If you are using a one-window system, there is not much
you can do. However, if you have control over your windows,
it may help to widen the window in which you run lrp and
start over.
Message: Screen length is too small
What it means: Similar to previous message. Try lengthening your window and
starting over.
Message: Internal Error
What it means: If there is no modifier to describe the Internal Error you
have hit a bug in lrp.
Message: Internal Error - Length of 'lrp_rprt_scrn' exceeded
LRP_MAX_STRING_BUFFER_LENGTH
What it means: You hit a bug in lrp and the authors of the program are
protecting you from a core dump.
Message: Internal Error - Length of 'lrp_hlp1_scrn' exceeded
LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message
Message: Internal Error - Length of 'lrp_hlp2_scrn' exceeded
LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message
Message: Internal Error - Length of 'lrp_hlp3_scrn' exceeded
LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message
Message: Internal Error - Length of 'lrp_help_line' exceeded
LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message
Message: Internal Error - Length of 'lrp_info_line' exceeded
LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message
Message: Internal Error - Length of 'lrp_cmmd_line' exceeded
LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message
Message: Internal Error - Length of 'lrp_wait_line' exceeded
LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message
Message: Internal Error - Length of 'lrp_vers_line' exceeded
LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message
Message: Internal Error - Memory allocation failure What it means: You are out of memory. Look around for other processes that may be using all the memory.
Message: Internal Error - Bad field number What it means: There was a problem in the way you specified the report format
Message: Internal Error - Function FSEEK failed What it means: There was a problem modifying the report file. If your disk is on a different machine, this might be a network problem.
Message Internal Error - Function TMPNAM failed What it means: I do not know
Message: Internal Error - Function FOPEN failed What it Means: Could not open the file that you designated as the report file
Possible reasons include improper permission for the directory
you are working in or a disk problem.
mutl.c: Internal Error - Function LSF_REWIND failed What it Means: Could not read from the stream file that you designated.
Maybe it doesn't exist. Maybe the permission is wrong.
Maybe there is a disk problem.
mutl.c: Internal Error - Function LSF_STATUS_TEXT failed What it Means: While attempting to print out an error message, another error occurred. I cannot figure out why this would happen, though.
mutl.c: Internal Error - Function LSF_INFORMATION failed What it Means: While trying to figure out if the stream file was properly
formatted, an error occurred. This is probably not an
error with the contents of the stream file, but with
access to it.
rful.c: Internal error - LSF_READ error detected What it Means: Problems reading the contents of your stream file.
Although the lsf_read routine reports a diagnostic of the
error, this diagnostic is not used in the error printing
routine.
rloc.c: Internal error - LSF_ALLOCATE error detected What it Means: Memory allocation problem
rloc.c: Internal error - LSF_READ_SET error detected What it Means: Problems reading the contents of the stream file
ulth.c: Must specify temporary file name
What it means: You mangled the file specifications for the input or
output files. Start over.
ulth.c: Must specify report file name
What it means: You mangled the file specifications for the input or
output files. Start over.
ulth.c: Must specify stream file name
What it means: You mangled the file specifications for the input or
output files. Start over.
ulth.c: Must specify report title
What it means: You mangled the file specifications for the input or
output files. Start over.
From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999
FASTLINK, version 2.3P and beyond
This file describes some modifications to the UNKNOWN preprocessor program introduced in FASTLINK 2.3P and beyond. We have improved the error reporting capability and fixed some bugs. A scholarly description of what UNKNOWN does can be found in unknown.ps.
One purpose of UNKNOWN is to catch violations of Mendelian rules of inheritance. Previous versions of UNKNOWN reported an error by identifying the pedigree and locus of the error. The new version tries to identify the nuclear families in which errors occur. When an error occurs, at least one nuclear family will be identified, by either a child or a parent. For any pedigree-locus pair, the first nuclear family identified is guaranteed to contain an error. Subsequent nuclear families identified may or may not contain errors. If you want to see only the first nuclear family with an error, change the constant ONE_ERROR_ONLY from 0 to 1.
I fixed a printing bug that arose when the number of liability classes was greater than 99. As a result, the output of UNKNOWN will be spaced differently than before. Thanks to Margaret Gelder Ehm for reporting the bug.
Many other aspects of UNKNOWN, including subsequent changes to it, are explained in the documents unknown.ps and loops.ps. Both FASTLINK 3.0P and 4.0P introduced fundamental, drastic changes to UNKNOWN that are best explained in a longer, more scholarly document. These changes make the interesting part of UNKNOWN incompatible with previous versions of FASTLINK and LINKAGE. However, UNKNOWN does extra computation to preserve backward compatibility.
Version 3.0P introduced much better inference for looped pedigrees, including, for the first time, the ability to detect Mendelian inconsistencies in looped pedigrees. Version 4.1P improved some of the loop breaker genotype inference algorithms. Information about possible genotypes for different loop breaker vectors is kept in the file loopfile.dat. See README.loopfile for a description of the syntax.
Version 4.0P introduced the ability to improve on the user's choice of
loop breakers. For looped pedigrees, UNKNOWN chooses a provably optimal
loop breaker set. For looped pedigrees with multiple marriages, UNKNOWN
and the main programs can now use one loop breaker to break multiple
loops. In version 4.1P these methods were enhanced with a better
algorithm for pedigrees with multiple marriages, and with the ability to
select loop breakers from scratch without having to rely on
the makeped and LOOPS preprocessor programs.
For a scholarly description of the loop breaker selection methods,
see paper6.ps and paper7.ps.
For a scholarly description of what loop breakers are all about,
see loops.ps.
For a simple practical method to choose loop breakers see
README.lselect.