Linkage Programs

mlink, linkmap, lodscore, ilink and unknown all use the FASTLINK versions of the code. The remaining programs are from version 5.2 of the LINKAGE package.

If the programs provided here are not compiled with the parameters your data require, use the "compilejob" program to recompile them with the required parameters.

If your LINKMAP jobs are getting large, you may want to look at the FASTMAP program and, depending on your problem, the HOMOZ program on this menu.



The mutation model may be useful in any circumstance where one has cause to believe that a substantial fraction of cases are caused by a new mutation. Important special cases mentioned by Dan and Joe include:

There is discussion in the vicinity of page 176 in the Terwilliger and Ott book as to how to select the mutation rate when using the mutation model. The default rate given by PREPLINK is plausible in some cases.


Recompiling Program with the correct maxhap value:

"compilejob" is a script for automatically recompiling the fastlink versions with the required maxhap values.

Maxhap is the product of the number of alleles at each locus. A maxhap of 1024 is big; the current maximum is 1,600. If your job requires a larger maxhap value and you cannot solve this by better problem design, please contact user support.
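As a rough illustration of the rule above, the following sketch multiplies the allele counts of the loci in a run and compares the result with the limit of 1,600 quoted here. The function names maxhap_estimate and maxhap_fits are hypothetical helpers written for this example; they are not part of compilejob or FASTLINK.

```shell
# maxhap_estimate: multiply the allele counts given as arguments
# (one argument per locus). Hypothetical helper, not part of FASTLINK.
maxhap_estimate() {
    product=1
    for n in "$@"; do
        product=$((product * n))
    done
    echo "$product"
}

# Site limit quoted in the text above.
MAXHAP_LIMIT=1600

# maxhap_fits: succeed only if the estimate is within the site limit.
maxhap_fits() {
    [ "$(maxhap_estimate "$@")" -le "$MAXHAP_LIMIT" ]
}

# Example: a 2-allele disease locus and three 8-allele markers.
maxhap_estimate 2 8 8 8
```

With these inputs the estimate is 2 x 8 x 8 x 8 = 1024, which is within the limit; three 10-allele markers in place of the 8-allele ones would give 2000 and exceed it.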

If you so desire, compilejob will allow you to submit a job to the appropriate batch queue. Small linkage jobs will be submitted to the small linkage queue, others will be submitted to the big linkage queue. The decision is currently taken on the value of maxhap.

Running Linkage Programs in Batch:

Programs like those of the linkage package run more efficiently one at a time than when run concurrently. We have one machine dedicated to running one large linkage job at a time. It has 512 MB of memory and a fast single processor. You can submit jobs to run on this machine as below:

batchjob lcp_command_file_name

e.g. batchjob pedin

You will be emailed when this job has started running on an appropriate machine and when it has completed.

Note:

  1. Run only a single linkage job in a directory at a time, even if one of them is in batch mode. The programs make use of intermediate files and would overwrite each other's results.
  2. Please bear in mind that it is very easy to design linkage problems that will be too big even for this beast, so please take care in constructing your linkage problem.
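One way to follow the first note is to give every run its own directory. The sketch below is only an illustration: new_run_dir is a hypothetical helper, and the file names in the usage comment are examples, not a list of what your job needs.

```shell
# new_run_dir: create a fresh directory for one linkage run and copy the
# named input files into it. Hypothetical helper, not part of LINKAGE.
new_run_dir() {
    run_dir=$1
    shift
    # Refuse to reuse a directory that may hold another run's files.
    if [ -e "$run_dir" ]; then
        echo "refusing to reuse existing directory: $run_dir" >&2
        return 1
    fi
    mkdir -p "$run_dir" || return 1
    for f in "$@"; do
        cp "$f" "$run_dir/" || return 1
    done
    echo "$run_dir"
}

# Example (file names are illustrative only):
#   new_run_dir run1 pedin.dat datain.dat pedin
# then start the job from inside run1.
```

Refusing to reuse an existing directory is a deliberate choice here, so that two runs can never share intermediate files by accident.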

Checkpointing

Programs submitted in batch mode using the "batchjob" mechanism are automatically checkpointed using the script-level checkpointing facilities provided by fastlink. If you simply restart your job after a machine crash, it may sometimes recover the data it has already computed. If it has recovered data, please check the output carefully.

It is in your own interests to read the online documentation for details of checkpointing. The following file is of particular interest: /packages/fastlink/doc/README.checkpoint

It is advisable to have only one linkage "run" per lcp command file, because this makes the checkpointing more likely to work.


From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999

FASTLINK, version 4.1P

Each section in each README file starts with the string "|*|". To browse the sections, use your file viewer to search for this unique string. This is the top level README file.

|*| INTRODUCTION

As described in the papers:

  1. R. W. Cottingham Jr., R. M. Idury, and A. A. Schaffer, Faster Sequential Genetic Linkage Computations, American Journal of Human Genetics, 53(1993), pp. 252-263.
  2. A. A. Schaffer, S. K. Gupta, K. Shriram, and R. W. Cottingham, Jr., Avoiding Recomputation in Linkage Analysis, Human Heredity, 44(1994), pp. 225-237.
  3. A. A. Schaffer, Faster Linkage Analysis Computations for Pedigrees with Loops or Unused Alleles, Human Heredity, 46(1996), pp. 226-235.
  4. A. Becker, D. Geiger, and A. A. Schaffer, Automatic Selection of Loop Breakers for Genetic Linkage Analysis, Human Heredity, 48(1998), pp. 49-60.
  5. A. Becker, R. Bar-Yehuda, and D. Geiger, Random Algorithms for the Loop Cutset Problem, Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Sweden, 1999, pp. 49-56.

This directory and its subdirectories contain version 4.1P of faster versions of the general pedigree programs of LINKAGE 5.1. Several of our users of earlier versions 1.0 and 1.1 have dubbed the new programs FASTLINK. A PostScript version of the papers can be found in the files paper1.ps, paper2.ps, paper5.ps, paper6.ps, and paper7.ps. Please cite the first two papers (so that all participants in the FASTLINK project get credit), if you use these programs in a published experiment. You should continue to cite the original papers on LINKAGE, listed below, if you use FASTLINK:

The FASTLINK code is available by anonymous ftp from machines at NIH, Rice University, and EBI. For instructions see the file README.install. In addition to the two papers and this README file, the top level directory contains various other pieces of documentation. Carol Haynes (Duke) suggested that we split up the documentation into smaller pieces. This file is primarily a roadmap to the documentation for FASTLINK.

README.allele -- explains the diagnostic that states that a pedigree or dataset has unused alleles.

README.bugreport -- suggests how to send in a bug report

README.checkpoint -- explains the checkpointing scheme for LODSCORE and ILINK

README.constants -- explains the mysteries of how to properly set some of the constants in FASTLINK

README.diseq -- explains an option introduced in FASTLINK 4.1P for modeling linkage disequilibrium

README.ILINK -- What does the output of ILINK and LODSCORE mean?

README.loopfile -- explains the syntax of the file loopfile.dat used to transmit genotype inferences from unknown to the main program when working with looped pedigrees

README.lselect -- explains the new easy automatic way to select loop breakers

README.mapfun -- explains how the mapping functions described in Chapter 1 of Ott's book actually relate to what is in the code

README.memory -- explains memory requirements

README.scaling -- essay on the output likelihood values from FASTLINK

README.time -- a short essay on estimating the running time of sequential FASTLINK runs.

README.trouble -- LINKAGE/FASTLINK Troubleshooting

README.unknown -- describes modifications to the UNKNOWN preprocessor program first introduced in FASTLINK 2.3P, including improved error reporting and a bugfix.

Send suggestions for other FASTLINK documentation you would like to see to schaffer@helix.nih.gov.

Please let us know if you have problems with the programs, including if you are unhappy with the speedup and are willing to share your data to the extent that we may be able to study the problem. Note that this does not mean you have to tell us anything about what disease you are studying. And of course we will respect any request for confidentiality. We only wish to consider studying problems to see if we can find improvements.

If you read README.updates you will see that lots of the updates are suggested by users who are enthusiastic about FASTLINK, but would like to see it improve. One of the best ways to encourage us to work harder on FASTLINK is to send in your constructive suggestions.

There is a mailing list of over 200 FASTLINK users. If you wish to be on this mailing list, send e-mail to schaffer@helix.nih.gov.


From: ftp-bimas.cit.nih.gov Last mod: January 30, 1996

::::::::::::::
README.allele
::::::::::::::

|*| Diagnostic for extra alleles

This file explains the diagnostic that states that a pedigree or dataset has unused alleles. This diagnostic was implemented by Chris Hoelscher for inclusion in FASTLINK 2.3P and beyond. The renumbering is implemented in version 3.0P and beyond.

The running time of LINKAGE and FASTLINK grows rapidly with the number of alleles specified for each locus used in a run. Therefore, it is important to specify no more alleles than are actually needed for the analysis. Various partial solutions to the "extra allele" problem have been implemented by:

  Ellen Wijsman (in the context of LIPED)
  Jathine Wong and Cathryn Lewis (in the context of LINKAGE/FASTLINK)
  Scott Diehl, Bettie Duke, and Lynn Ploughman (in the context of MENDEL)
  Alan Young (in the context of GAS)

At the end of this essay we briefly describe the partial solution implemented by Wijsman and Diehl-Duke-Ploughman. In the context of FASTLINK, their solution is applicable only to the LINKMAP and MLINK programs. We have not implemented an extension of their solution in FASTLINK 3.0P.

|*| Extra alleles in symbols and an example

Suppose a locus has n alleles, A1 through An, that occur in the population at large. Suppose that in a population to be studied with linkage analysis, only alleles A1 through Ak, with k < n-1, occur. Then one may combine alleles A(k+1) through An into one "catch-all" allele unless one is estimating allele frequencies. The frequency of the catch-all allele is the sum of the frequencies of A(k+1) through An.

A concrete FASTLINK example:

Suppose the general population has the possibilities:

  Allele         1     2     3     4     5     6
  Frequency      .3    .2    .15   .1    .22   .03

and this is encoded in the locus file (datain.dat).

Suppose that the pedigree(s) encoded in the pedigree file (pedin.dat) contain only the alleles 2, 4, and 5. LINKAGE and FASTLINK require that the alleles be numbered consecutively starting at 1. Therefore, in the process of reducing from 6 to 4 alleles it is necessary to renumber the alleles.

  Renumber old allele 2 to be new allele 1 with frequency .2
  Renumber old allele 4 to be new allele 2 with frequency .1
  Renumber old allele 5 to be new allele 3 with frequency .22
  Create catch-all allele 4 with frequency .48 (sum of frequencies of old 1, old 3, old 6)

No person should have the catch-all allele, but it is absolutely wrong to omit the catch-all allele.
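The renumbering in this example can be reproduced with a short script. This is a sketch of the bookkeeping only, assuming the frequencies and the list of observed alleles are passed as two space-separated strings; renumber_alleles is a hypothetical name, and the script does not read or write datain.dat or pedin.dat.

```shell
# renumber_alleles "freq1 freq2 ..." "used1 used2 ..."
# Prints the new number and frequency for each observed allele, followed by
# a catch-all allele whose frequency is the sum of the unobserved ones.
# Hypothetical helper, not part of FASTLINK.
renumber_alleles() {
    awk -v freqs="$1" -v used="$2" 'BEGIN {
        n = split(freqs, f, " ")
        k = split(used, u, " ")
        for (i = 1; i <= k; i++) keep[u[i]] = 1
        newnum = 0
        rest = 0
        for (old = 1; old <= n; old++) {
            if (old in keep) {
                newnum++
                printf "old %d -> new %d freq %.2f\n", old, newnum, f[old]
            } else {
                rest += f[old]      # unobserved alleles go into the catch-all
            }
        }
        printf "catch-all -> new %d freq %.2f\n", newnum + 1, rest
    }'
}

# The six-allele locus from the example, with alleles 2, 4 and 5 observed.
renumber_alleles ".3 .2 .15 .1 .22 .03" "2 4 5"
```

The last line of output is the catch-all allele, numbered 4 with frequency 0.48, matching the worked example.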

Important technical note: the process of renumbering alleles to reduce their number loses no information in a statistical sense, unless one is estimating allele frequencies. Renumbering is distinct from "downcoding", in which multiple alleles that are distinct and do occur in the population are given the same number, in the interest of reducing running time. In general, downcoding loses information, although there are some special situations in which it does not because the frequencies of some different alleles happen to be identical.

|*| Extra alleles and separating pedigrees

The use of extra alleles often arises when the original data had P pedigrees amongst which all n alleles occur, but the population in some analysis with Q < P pedigrees contains only k < n-1 of the alleles.

The MLINK and LINKMAP programs analyze each pedigree one at a time, and sum the values of -2*(log(likelihood)) for each pedigree. Since allele renumbering makes sense on a per pedigree basis, it is valid to renumber alleles for each pedigree in an optimal manner. This requires using a different locus file for each pedigree because the renumbering may assign the same new allele number to different old alleles. One annoyance of doing the analysis for each pedigree separately is that the output values must be summed. The separation of input pedigrees and the combination of output results were automated for LIPED by Ellen Wijsman and for MENDEL by Scott Diehl, Bettie Duke, and Lynn Ploughman.

The above solution does not work for ILINK or LODSCORE.
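For MLINK or LINKMAP runs that have been split by pedigree, the summing step can be sketched as below. It assumes you have already extracted one -2*(log(likelihood)) value per pedigree, all computed at the same parameter values, onto separate lines; extracting them from the program output is not shown, and sum_minus2lnl is a hypothetical name.

```shell
# sum_minus2lnl: add up one -2*ln(likelihood) value per input line.
# Blank lines are skipped. Hypothetical helper, not part of FASTLINK.
sum_minus2lnl() {
    awk 'NF { total += $1 } END { printf "%.4f\n", total }'
}

# Example with three made-up per-pedigree values.
printf '%s\n' 12.5 30.25 7.125 | sum_minus2lnl
```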

|*| FASTLINK diagnostic error message

The main programs in FASTLINK do not know about all the loci in the locus file (datain.dat). They only know about the loci that are actually used in a given analysis. For example, if an analysis uses loci 1, 7, and 12, in *any* order, locus 1 will have index 1, locus 7 will have index 2, and locus 12 will have index 3 when reported in the diagnostic.
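To translate an index reported in the diagnostic back to a locus in datain.dat, sort the loci used in the run and number them from 1. A minimal sketch of that mapping, with diagnostic_index as a hypothetical helper name:

```shell
# diagnostic_index: given the locus numbers used in a run (in any order),
# print the index each one will have in the diagnostic.
# Hypothetical helper, not part of FASTLINK.
diagnostic_index() {
    printf '%s\n' "$@" | sort -n |
        awk '{ printf "locus %d -> index %d\n", $1, NR }'
}

# The example from the text: loci 1, 7 and 12, given out of order.
diagnostic_index 12 1 7
```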


::::::::::::::
README.bugreport
::::::::::::::

From: ftp-bimas.cit.nih.gov Last mod: October 6, 1997

|*| How to Submit a Useful Bug Report

Any bug reports on FASTLINK should be sent to Alejandro Schaffer (schaffer@helix.nih.gov). I have been trying to investigate all bug reports as quickly as possible.

The purpose of this document is to tell you what you should send me, so that I can track down any problem as quickly as possible without having to ask for more information. I am really anxious to find and fix whatever bugs remain in FASTLINK and LINKAGE. All the bug reports I've gotten so far have helped tremendously. However, I would like to speed up the process by guiding you in what I need to know to track down a bug successfully.

In general, there are four categories of bugs:

  1. Compilation Problems
  2. Crash during a run
  3. Results obviously bogus or inconsistent with LINKAGE
  4. Anything else

Here is what you should send me for each type of bug report:

Compilation Problems:

  1. A script showing your compilation attempt and the errors you got.
  2. An indication of which version number of FASTLINK you are using.
  3. Your Makefile.
  4. If the compilation complains that something is not defined, send me your current versions of the files commondefs.h, moddefs.h, and slowmoddefs.h.

Crash or Results obviously bogus or inconsistent with LINKAGE:

  1. Which program and version (number, fast or slow) you are using
  2. Your data files (e.g., pedin.dat, datain.dat)
  3. Whatever shell script you are using to run the programs
  4. An indication of whether the results you are getting are bogus or are plausible but inconsistent with LINKAGE.

Anything else:

  1. Which program and version (number, fast or slow) you are using
  2. Your data files (e.g., pedin.dat, datain.dat)
  3. An explanation with sample scripts of how the behavior deviates from what you expected.

If you run the programs directly rather than using a shell script, then instead of pedin.dat, datain.dat, and script, I would want pedfile.dat, ipedfile.dat, speedfile.dat, datafile.dat.

Beyond the items above, please describe any symptoms that seem relevant to you. I'd rather have too much information than too little. However, you should just report behaviors that you see, not any speculations about the causes of those unexpected behaviors.

All data sent to me will be kept in complete confidence.


::::::::::::::
README.checkpoint
::::::::::::::

From: ftp-bimas.cit.nih.gov Last mod: February 9, 1996

                  Checkpointing in FASTLINK
                  by K. Shriram and A. A. Schaffer
                  Rice University/NIH

This README file is meant to accompany version 2.3P and beyond of FASTLINK. See the top-level README file for a roadmap to all the documentation.

This file describes in detail the checkpointing scheme that was implemented by K. Shriram and A. A. Schaffer. Checkpointing means periodically saving the state of a computation. The purpose of checkpointing is to be able to recover from a crash of the underlying computer that causes one of the FASTLINK programs to stop for a reason that has nothing to do with its computation. Two common causes for such crashes are power failures and lightning hits. Right now checkpointing works only for the sequential versions of FASTLINK on UNIX, and for MLINK and LINKMAP on VMS.

A more cursory but more scholarly description of how the checkpointing works can be found in one section of:

  1. A. A. Schaffer, S. K. Gupta, K. Shriram, and R. W. Cottingham Jr., Avoiding Recomputation in Linkage Analysis, Human Heredity 44(1994), pp. 225-237.

This paper can be found in paper2.ps that comes with the FASTLINK distribution. At the time the paper was written, the checkpointing scheme had been implemented only in LODSCORE and ILINK; these are the two difficult cases for checkpointing and the programs where it is most needed.

After seeing the checkpointing scheme in LODSCORE and ILINK for versions 2.0 and 2.1, several users who had suffered machine crashes during LINKMAP runs clamored for extending the scheme to the other two programs. As of version 2.2, all four programs have checkpointing and crash-recovery.

Through version 1.1, FASTLINK provided the same level of functionality as LINKAGE 5.1. Checkpointing adds new functionality, so we decided to write more detailed documentation about the checkpointing facility. Any questions, comments, or complaints should be directed to Alejandro Schaffer (schaffer@nchgr.nih.gov).

[This README file has been organized with each section starting with the string "|*|". To browse the sections, you can thus use your file viewer to search for this unique string, thus getting from one section to the other without having to read the intervening material.]

Frequent LINKAGE users almost certainly have had the computer crash during a long run, only to have to start the computation again. We have now included a "checkpointing" package in the code that occasionally saves the state of the computation, so that a crashed program can be restarted without much computation lost. The folklore wisdom seems to be that this form of augmentation to programs is the proper mechanism for recovering from crashes. This file briefly describes the checkpointing process and explains the files connected with our implementation.

There are standard packages that do checkpointing of programs for specific operating systems, but we wanted our code to be somewhat portable because LINKAGE is used on a variety of operating systems.

Unless otherwise specified, the descriptions that follow apply equally to all of ILINK, LODSCORE, LINKMAP, and MLINK. To distinguish the programs, we use their names in the filenames. We shall denote this by the string "<>", which should be replaced by the name of the program in question. Thus, for instance, the filename `checkpoint.<>.bak' would denote `checkpoint.ILINK.bak' or `checkpoint.LODSCORE.bak', depending upon context.

Before getting into details, there are three VERY IMPORTANT cautions in using the FASTLINK crash-recovery scheme.

  1. After a crash occurs, if you run the program in the same directory where it was running before, the program will assume that you want to *restart* the crashed run. The only way to have the program start a different run is to delete all the files created by the checkpointing scheme. The files created during checkpointing will have names with one of the following prefixes: checkpoint, script, outf, main. To remove these files, you can use the command:

rm checkpoint* script* outf* main*

Note: extreme care should be taken when removing these files that you don't have other meaningful files in the same directory with any of these prefixes. If "rm" doesn't normally prompt you for each file before removing it, it is probably wiser to delete these files by hand.

  2. The time to save the state to a file is not zero. Therefore, if a crash occurs while the state is being saved, the program may be a little confused on restart. In particular, it may unnecessarily redo one or two likelihood function evaluations. When this happens with LINKMAP or MLINK, it means that duplicate data will show up in the output file because they write out their output after each likelihood function evaluation.

  3. The checkpointing scheme has been extensively tested with simulated crashes, but we do not induce a crash of the whole system in testing. Furthermore, system-wide crashes can have bizarre and unimaginable side-effects. Therefore, user feedback based on what happened during real crashes and real runs will be invaluable in making the checkpointing system more robust.
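In connection with the first caution, it can help to list the candidate files before removing anything. The sketch below only prints the regular files in a directory whose names start with one of the four prefixes; list_checkpoint_files is a hypothetical helper, and deciding which of the listed files are safe to delete remains up to you.

```shell
# list_checkpoint_files [dir]
# Print regular files in dir (default: current directory) whose names start
# with checkpoint, script, outf or main. Nothing is deleted.
# Hypothetical helper, not part of FASTLINK.
list_checkpoint_files() {
    dir=${1:-.}
    for f in "$dir"/checkpoint* "$dir"/script* "$dir"/outf* "$dir"/main*; do
        # An unmatched pattern stays literal and fails this test.
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Review the list first, then remove the files you recognise, for example
# with "rm -i" so that each removal is confirmed.
list_checkpoint_files .
```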

|*| The Process

Most of the discussion below focuses on the programs ILINK and LODSCORE. At the end we explain the much simpler method of checkpointing used in LINKMAP and MLINK.

The programs ILINK and LODSCORE perform checkpointing at two distinct types of locations. A checkpoint is created at the start of each iteration (in the function iterate()); it is also made at the beginning of the functions initialize(), outf(), firststep(), decreaset() and increaset(), and at the beginning of the loops in gforward() and gcentral(). We distinguish between these two types by the terms "iteration-" and "function-checkpoint", respectively; the latter term is used since the program proceeds to make one or more calls to the routine fun() shortly after the location of checkpointing.

In the case of LINKMAP a simple checkpoint is taken after each likelihood function evaluation. MLINK is the same except that we do not checkpoint on the first function evaluation, where the moving marker is unlinked from the others.

The files final.dat and stream.dat (if requested) primarily contain the output, so a checkpointing mechanism must take care to ensure the contents of these files are not altered in any way by the process. In ILINK and LODSCORE all output to these files takes place in the routine outf() (and from the routines it calls); hence, these files are checkpointed before entry into outf(). More details on this follow under the discussion of the actual files created. In MLINK and LINKMAP these files are updated after each function evaluation, so they have to be checkpointed as well.

|*| The Files

The following is a list of the files created for the purposes of checkpointing. All of these files are placed in the working directory of the current run of the program.

It is important to ensure that none of these files are present at the start of a fresh run; however, do not delete any of these after a run has begun, and especially when trying to recover from a previous run.

NOTE: Please note that the file protections set by the program
      may not be what you desire.  These can be changed by altering
      the value of the variable CopyAppendPerms in the file
      checkpointdefs.h, where the value specified should be as given
      to the chmod(1) command.  (The additional leading `0' is
      essential; it causes the value that follows to be treated as
      an octal constant, as required by chmod(1).)

checkpoint.<>                                             text, binary

For ILINK and LODSCORE:
This file is written at two types of places, namely an iteration- and a function-checkpoint. Only three parts of this file are in text mode: the date/time stamp, the type information fields, and the end-marker.

The date/time stamp marks when the current checkpoint was begun. (This is not necessarily the same time as that the system shows for the file.) Of the two type information fields, the first tells us whether this is an iteration- or function-checkpoint (distinguished by the values "0" and "1", respectively); the second stores additional information about location that varies depending upon the type of checkpoint.

Following these are the bytes that constitute the actual values being stored; these are in an architecture-dependent binary format.

Finally, the end-marker provides us with a means of partially checking for the integrity of the data written in the checkpoint.

For LINKMAP and MLINK only some counters indicating how many function evaluations are complete need to be stored in this file.

checkpoint.<>.bak                                         text, binary

When a checkpoint is to be written and a checkpoint file is already found, the existing file is moved to this backup name and the new one is written in its place. The main purpose of doing this is to increase security against crashes: should the crash have damaged the checkpoint file but have left the backup untarnished, the backup may be copied into the checkpoint and computation can be resumed, even if from a slightly earlier stage in the run.

The format of this file is the same as that of the checkpoint file, which is copied into the backup without modification.
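The backup rotation described above can be pictured with the following shell sketch. It is an analogy only: FASTLINK does this inside its C code, and save_checkpoint, restore_from_backup and the file name in the usage comment are hypothetical.

```shell
# save_checkpoint FILE
# Move any existing checkpoint to FILE.bak, then write the new checkpoint
# from standard input. Hypothetical sketch of the rotation described above.
save_checkpoint() {
    ckpt_file=$1
    if [ -f "$ckpt_file" ]; then
        mv "$ckpt_file" "$ckpt_file.bak"
    fi
    cat > "$ckpt_file"
}

# restore_from_backup FILE
# Copy FILE.bak over FILE, as one might after finding FILE damaged.
restore_from_backup() {
    ckpt_file=$1
    [ -f "$ckpt_file.bak" ] || return 1
    cp "$ckpt_file.bak" "$ckpt_file"
}

# Example:
#   echo "state 1" | save_checkpoint checkpoint.DEMO
#   echo "state 2" | save_checkpoint checkpoint.DEMO   # state 1 is now in .bak
```

Because the old file is moved aside before the new one is written, a crash during the write leaves the previous checkpoint intact in the backup.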

outf.LODSCORE.stream.dat                                          text
outf.ILINK.stream.dat                                             text
main.LINKMAP.stream.dat                                           text
main.MLINK.stream.dat                                             text
outf.ILINK.final.dat                                              text
main.LINKMAP.final.dat                                            text
main.MLINK.final.dat                                              text
outf.LODSCORE.recfile.dat                                         text

These files are created by the subroutine outf() or main(). Their purpose is to maintain copies of the files stream.dat and final.dat (for ILINK, LINKMAP, or MLINK) or recfile.dat (for LODSCORE), respectively, so that if recovery needs to take place after these files have been written to, the two files can be restored to the state they had.

script.<>.final.out                                               text
script.<>.stream.out                                              text

Since the standard scripts being used delete the files final.out and stream.out at the start of execution, the program makes a copy of the current state of these files into the names listed. Thus, when recovering in the midst of a script, the files can be restored to their state when the programs were last entered.

main.LODSCORE.stream.dat                                          text
main.LODSCORE.recfile.dat                                         text

Since a crash can occur in the middle of an iteration in LODSCORE and the output of the previous call to outf() would then be lost, these files are created at the start of the loop in main() so as to preserve the old output (which hasn't yet been appended to final.out and stream.out).

When the checkpoint cannot be recovered accurately, the program checks to see whether the backup exists. Depending upon its presence (but not upon its integrity), one of two messages is displayed. In either case, the user is advised of the circumstance, of a possible cause for it, and of what corrective action might be taken to repair the situation as best as possible.

|*| Modifying Scripts and Checkpointing

Our experience shows that some users request multiple runs of a FASTLINK program with one shell script. As a consequence a crash may occur after some (but not all) of the requested runs are complete. When this happens, it would be nice not to lose the results of the completed runs. A user who restarts the crashed script would not like the runs that were completed previously to be redone. We have made a primitive facility to do this type of checkpointing, which we call "script-level checkpointing". However, for users who want to be safe we recommend doing only one run per shell script.

This section applies if you use script-level checkpointing, and wish to modify the scripts in the region surrounding the calls to ILINK, LODSCORE, MLINK, or LINKMAP, or wish to affect operations done to the files final.out and stream.out. We presume that the user is using shell scripts made with the auxiliary program lcp that comes with LINKAGE. It would be impossible to make a script-level checkpointing scheme that could handle arbitrary scripts. We also assume that the user puts output in final.out and stream.out, using the default options of lcp.

The "standard" scripts for which we support script-level checkpointing affect final.out (and stream.out) on each run as follows for each ILINK run (and similarly for LODSCORE, MLINK, and LINKMAP):

lsp [...]
if [ $? = '0' -o $? = '1' ]
then
      cat lsp.log >> final.out
      cat lsp.stm >> stream.out
      unknown
      if [ $? = '0' ]
      then
        ilink
        if [ $? = '0' ]
        then
          cat final.dat  >> final.out
          cat stream.dat >> stream.out
        fi
      fi
fi

To ensure that final.out is in the same state after our program has finished execution as it would be after this piece of script code has run, we have the following code toward the end of ILINK:

copyFile ( "final.out" , ScriptILINKFinalOut ) ;
appendFile ( "final.dat" , ScriptILINKFinalOut ) ;

if ( dostream )
{
      copyFile ( "stream.out" , ScriptILINKStreamOut ) ;
      appendFile ( "stream.dat" , ScriptILINKStreamOut ) ;
}

which simulates the operation of the script. This is necessary since, at the stage where this code is run, the script-level checkpoint routine assumes that the run of ILINK has completed successfully, so that this entire invocation of ILINK will be ignored, and the next invocation will copy final.out and stream.out from the files named by the #define'd names above.

Hence, modifying the scripts in the light of script-level checkpointing requires one to study carefully the operation of the main programs, the scripts, and the program ckpt. In general, it is necessary to mimic in the program that which would be done in the script, so that during recovery it will be indiscernible whether the script was stopped in the first place. However, these mimicking operations must be carefully placed, for if they are placed before the script-level checkpoint file is written to, then the operations would be performed one extra time, which is undesirable.

|*| Using the Script-Level Checkpointing Facility

The program ckpt implements the script-level checkpointing facility (with cooperation from ilink and lodscore, as appropriate). Its primary task is to accept the name of a script to be run, and a specification of which program (ILINK, LODSCORE, LINKMAP, or MLINK) the script is for. A typical invocation might look like this (we use `%' to denote the user's prompt):

% ckpt lodscore aLodscoreScript

or

% ckpt ilink anIlinkScript itsArgument

or

% ckpt linkmap aLinkmapScript itsArgument

or

% ckpt mlink anMlinkScript itsArgument

where the first parameter to ckpt tells it what kind of script it is going to run. The second parameter is the name of the actual script. If there are additional parameters for the script itself, these can be specified after the name of the script, as in the second example (where "itsArgument" is provided). The second run would, hence, be equivalent to running

% anIlinkScript itsArgument

but with the script-level checkpointing facility in action.

The code for ckpt is in the file ckpt.c. To make an executable version, run the command:

make ckpt

|*| Important Caution on Breaking a ckpt Run

The ckpt program executes a system(3) call to invoke a shell in which to run the named script (with its arguments, if any). Hence, if the user decides to abort execution by hitting, say, Control-C (^C), this will certainly stop the invoked shell, but will not necessarily abort the calling process (i.e., ckpt). This has the following deleterious effect: when control returns to ckpt, if ckpt cannot tell that the invoked shell was halted prematurely, it erases its data file, so the next time it is run, it will assume that the previous run exited normally. This is clearly not the desired effect.

Unfortunately, being able to detect premature halting of the invoked shell is dependent upon the value returned by the system() call. This may not work on all operating systems and architectures as desired, making this an unreliable way of stopping execution, should this be desired. It is recommended that, instead, the user do the following:

  1. Suspend the executing process(es), typically by hitting a key like Control-Z (^Z).
  2. Kill the suspended process, usually by typing a command such as "kill %+".

Again, this is not guaranteed to succeed, but should work on most systems. Note, of course, that it requires the shell to support job control and also that the shell was compiled with this feature installed.
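The difficulty of telling a normal exit from an interrupted one can be illustrated at the shell level. In POSIX shells a command terminated by a signal is reported with an exit status greater than 128. The sketch below classifies a command's outcome on that basis. It is an analogy to the check that ckpt has to make on the value returned by system(), not a description of what ckpt actually does, and the function names are hypothetical.

```shell
# killed_by_signal STATUS
# True when an exit status means "terminated by a signal" (status > 128).
killed_by_signal() {
    [ "$1" -gt 128 ]
}

# run_and_classify COMMAND [ARGS...]
# Run a command and report whether it completed, failed, or was interrupted.
# Hypothetical helper, not part of FASTLINK.
run_and_classify() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "completed"
    elif killed_by_signal "$status"; then
        echo "interrupted by signal $((status - 128))"
    else
        echo "failed with status $status"
    fi
}

# Example: a child shell that sends itself SIGTERM.
run_and_classify sh -c 'kill -TERM $$'
```

A wrapper built along these lines would treat only the "completed" case as a finished run and keep its bookkeeping files in the other two cases.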


::::::::::::::
README.constants
::::::::::::::

From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999

FASTLINK, version 4.0P and beyond

This file describes an issue that will affect all users: how to set certain constants that vary from run to run depending on the data set and chosen loci. See the file README for a roadmap to all the FASTLINK documentation.

The definitions for most of the constants that a user wants to change have now been set up in such a way that they can be modified in the Makefile without ever having to edit the code. An important consequence is that it is now possible to edit just the Makefile and be able to compile different versions of the programs with different values of the constants. If you are not an experienced user of the Make utility, consult your system administrator for help in editing the Makefile.

The following .h files contain declarations of constants and data:

commondefs.h             stuff that is common to all 4 main programs
checkpointdefs.h         stuff for checkpointing
gemdefs.h                stuff for GEMINI, common to LODSCORE and ILINK
moddefs.h                stuff specific to fast versions of programs
slowmoddefs.h            stuff specific to slow versions of programs
ildefs.h                 stuff specific to ILINK
lidefs.h                 stuff specific to LINKMAP
lodefs.h                 stuff specific to LODSCORE
mldefs.h                 stuff specific to MLINK
compar.h                 stuff specific to parallel FASTLINK
unknown.h                stuff specific to UNKNOWN

|*| Constant definitions - VERY IMPORTANT!!!

There are at least 2 constants that are defined in moddefs.h that you will want to set before compiling. This means that you can edit the files to put in the appropriate numbers and then compile. The next section explains how to change the constants by editing only the Makefile.

The constants in moddefs.h are

AUTOSOMAL_RUN
SEXDIF_RUN

The user gets a severe warning if either of these constants is set to 0 and should be 1. The program will probably crash after the warning is printed.

AUTOSOMAL_RUN must be 1 if your data is autosomal. It can be 0 if your data is sexlinked. It may be worth it to change it to 0 for a sexlinked run because this will drastically reduce the memory requirements and may make it possible to use the faster versions. In terms of correctness, it is always safe to set AUTOSOMAL_RUN to 1.

SEXDIF_RUN must be 1 if your data is autosomal AND you want to allow the male theta and female theta to be DIFFERENT. From our experience, such runs are rare in practice, so we are distributing the code with SEXDIF_RUN set to 0. It is always safe to have SEXDIF_RUN set to 1, but again you can save a lot of memory by setting SEXDIF_RUN to 0. It is safe to set SEXDIF_RUN to 0 if:

  1. your data is sexlinked or
  2. your data is autosomal and you assume male theta = female theta
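
The rules above for AUTOSOMAL_RUN and SEXDIF_RUN can be summarized in a few lines. The following is only an illustrative sketch in Python; the function name minimal_run_flags and its arguments are hypothetical and are not part of FASTLINK. It returns the smallest (most memory-saving) values that the text above describes as safe.

```python
def minimal_run_flags(sexlinked, sex_specific_theta):
    """Return the smallest safe values of AUTOSOMAL_RUN and SEXDIF_RUN.

    sexlinked: True if the data is sexlinked, False if it is autosomal.
    sex_specific_theta: True if male theta and female theta may differ.
    (Hypothetical helper, not part of the FASTLINK distribution.)
    """
    # AUTOSOMAL_RUN must be 1 for autosomal data; 0 is allowed when sexlinked.
    autosomal_run = 0 if sexlinked else 1
    # SEXDIF_RUN is required only for autosomal data with differing thetas.
    sexdif_run = 1 if (not sexlinked and sex_specific_theta) else 0
    return {"AUTOSOMAL_RUN": autosomal_run, "SEXDIF_RUN": sexdif_run}


print(minimal_run_flags(sexlinked=False, sex_specific_theta=False))
```

Setting either constant to 1 regardless of the data remains the conservative choice; the helper only shows when 0 is permitted.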

Here are some other constants that you might need to change occasionally. There is relatively little harm caused by boosting these constants higher.

maxsys: maximum number of loci used in the run; this is most relevant for LODSCORE where one may do 2-point analysis on many different pairs of loci.

maxlocus: maximum number of loci in one run of one program.

maxall: maximum number of alleles at a numbered-allele or binary-factors locus

maxfact: maximum number of binary factors; should be at least as large as maxall

maxind: maximum number of people in all pedigrees combined

maxped: maximum number of pedigrees

maxchild: maximum number of children of one parent

maxloop: maximum number of loops

fitmodel: is false unless you are estimating some parameter other than theta

ALLELE_SPEED: is 1 if you want allele renumbering to be used; you should keep it at 1, except when estimating allele frequencies.

|*| Setting Constants by Editing only the Makefile

It is now possible to use the -D feature supported by cc, gcc, and most C compilers to change constants during compilation. For example, the default declaration of maxloop now looks like:

#ifndef maxloop
#define maxloop 6
#endif

This tells the C preprocessor that reads the hashed lines: "If maxloop is not already defined, then set maxloop to be 6".

The way you can make maxloop already defined is to include the string -Dmaxloop=<number> in all the compilation commands. For example, if you wanted maxloop to be 8, you would include the flag
-Dmaxloop=8
in your compilation. This overrides the setting of 6 that is in commondefs.h.

See README.Makefile for detailed instructions on how to edit the Makefile to set maxloop and other constants.
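
As a small illustration of the -D convention, the sketch below builds the flag string from a table of constants. It is written in Python purely for illustration; the function name define_flags is hypothetical, and the value chosen for maxall is an arbitrary example. Where the resulting string goes in the Makefile is covered by README.Makefile, not by this sketch.

```python
def define_flags(constants):
    """Build the -D flags for a mapping of constant names to values.

    (Hypothetical helper; it only formats text and does not run a compiler.)
    """
    return " ".join(f"-D{name}={value}" for name, value in constants.items())


# maxloop=8 is the example from the text; maxall=20 is an arbitrary value.
print(define_flags({"maxloop": 8, "maxall": 20}))
```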

|*| Checking how constants are set for a given executable

FASTLINK now includes a -i option (for info) for ILINK, MLINK, LINKMAP, and LODSCORE that summarizes how the various compilation options/variables are set for a given executable. For example, if you run:

linkmap -i

you get a description of how the program is configured, but nothing interesting is computed. A sample output might be:

Program LINKMAP version 5.10 (1-Feb-1991)

FASTLINK (slow) version 3.0P (29-Sep-1995)

LINKMAP has been compiled with the following options:

         CHECKPOINTING is enabled (DOS not defined)
         SLOW version (LESSMEMORY defined)

Program constants are set to the following maxima:

       8 maximum number of loci (maxlocus)
      15 maximum number of alleles at a single locus (maxall)
    1000 maximum number of individuals in a pedigree (maxind)
       6 maximum number of loops (maxloop)
      16 maximum number of children of a parent (maxchild)

This option works for both sequential and parallel versions of FASTLINK.

Flagless runs now also print out "(slow)" with the version number if the given executable is a "slow" version (as seen in the example above).
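
If you want to check the constants of an executable from a script, the maxima section of the -i output can be read mechanically. The sketch below is an illustration in Python that assumes the layout of the sample output shown above (an integer, a description, and the constant name in parentheses); the function name parse_maxima is hypothetical, and other FASTLINK versions may print a different layout.

```python
import re

# Maxima section copied from the sample output above.
SAMPLE = """\
Program constants are set to the following maxima:

       8 maximum number of loci (maxlocus)
      15 maximum number of alleles at a single locus (maxall)
    1000 maximum number of individuals in a pedigree (maxind)
       6 maximum number of loops (maxloop)
      16 maximum number of children of a parent (maxchild)
"""


def parse_maxima(text):
    """Map each constant name in parentheses to the integer starting its line."""
    pattern = re.compile(r"^\s*(\d+)\s+.*\((\w+)\)\s*$")
    maxima = {}
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            maxima[match.group(2)] = int(match.group(1))
    return maxima


print(parse_maxima(SAMPLE))
```

Lines that do not start with a number, such as the version banner, are ignored by the pattern.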


::::::::::::::
README.diseq
::::::::::::::

From: ftp-bimas.cit.nih.gov Last mod: June 28, 1999

Using Conditional Allele Frequencies or Parameterized Disequilibrium in FASTLINK
By Ken Morgan and Alejandro Schaffer

Starting with FASTLINK 4.1, I am exploring the possibility of adding several options to model linkage disequilibrium in FASTLINK. One such option, conditional allele frequencies, has been completed. Conditional allele frequencies means that allele frequencies at markers can depend on the genotype at the disease locus on the same chromosome strand (haplotype). This is slightly more general and flexible than the disequilibrium option currently allowed in LINKAGE/FASTLINK.

|*| Basics of using conditional allele frequencies

The user is assumed to be familiar with FASTLINK usage and estimating allele frequencies with ILINK.

The constant ALLELE_SPEED works as in FASTLINK. It is located in unknown.h and commondefs.h. Set
#define ALLELE_SPEED 1
to achieve greater speed in non-estimation mode, when using conditional allele frequencies.

Set
#define ALLELE_SPEED 0
#define fitmodel true
to allow for estimation of frequencies.

#define ALLELE_SPEED 0
#define fitmodel true
is always safe but may be unnecessarily slow.

"Conditional allele frequencies" means that at a marker, the frequency of an allele may depend conditionally on the allele at the disease locus. E.g., the relative frequencies of the haplotypes

     Disease   1     1
     Marker    1     2

     Disease   2     2
     Marker    1     2

may be quite different.
Estimating conditional allele frequencies is conceptually different from estimating haplotype frequencies because in the former case the disease allele frequencies stay fixed, while in the latter case they do not stay fixed.

To get started it is necessary to make a basic change in the format of datafile.dat.
For each marker locus, put 2 lines of allele frequencies instead of 1.

E.g., instead of:

3 5
0.07000000 0.01000000 0.15000000 0.04000000 0.73000000

put

3 5
0.07000000 0.01000000 0.15000000 0.04000000 0.73000000
0.07000000 0.01000000 0.15000000 0.04000000 0.73000000

So long as the two lines of frequencies are equal and you are not estimating frequencies, the results should be identical to regular FASTLINK. If you want to use conditional allele frequencies, you must similarly double all marker allele frequency lines in datafile.dat. This is a reasonable requirement because if for some marker you do not want the allele frequencies to vary depending on the disease allele, then the two lines can be identical.

Do not change the way in which the disease locus is specified.
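
The doubling of a marker's frequency line can be automated. The following Python sketch is illustrative only: the function name conditional_frequency_lines and its arguments are hypothetical, the eight-decimal formatting simply mimics the example above, and the check that each vector sums to 1 is an added sanity check rather than a documented FASTLINK requirement.

```python
def conditional_frequency_lines(first_line_freqs, second_line_freqs=None, tol=1e-6):
    """Return the two allele frequency lines for one marker locus.

    If second_line_freqs is omitted, the first vector is repeated, which
    should reproduce regular FASTLINK results when nothing is estimated.
    (Hypothetical helper, not part of the FASTLINK distribution.)
    """
    if second_line_freqs is None:
        second_line_freqs = first_line_freqs
    if len(first_line_freqs) != len(second_line_freqs):
        raise ValueError("both lines must list the same number of alleles")
    for freqs in (first_line_freqs, second_line_freqs):
        # Added sanity check: each frequency vector should sum to 1.
        if abs(sum(freqs) - 1.0) > tol:
            raise ValueError("allele frequencies should sum to 1")
    return [" ".join(f"{f:.8f}" for f in freqs)
            for freqs in (first_line_freqs, second_line_freqs)]


for line in conditional_frequency_lines([0.07, 0.01, 0.15, 0.04, 0.73]):
    print(line)
```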

To tell unknown that you are using conditional allele frequencies, use
unknown -c
instead of
unknown

To tell mlink/ilink/linkmap that you are using conditional allele frequencies, use:
mlink -c
ilink -c
linkmap -c

instead of

mlink
ilink
linkmap

To estimate conditional allele frequencies with ILINK, the procedure is similar to regular ILINK.
At the bottom of datafile.dat are two lines that look like:

k
1 1 1 1 ...

or

k
0 1 1 1 ...

where k is the index of the locus for which frequencies are to be estimated, and
the first number of the last line is:
0 if theta stays fixed
1 if theta is to be estimated

the remainder of the last line has (a - 1) 1's where a is the number of alleles at locus k.
Caution: The highest numbered allele, a, must occur or regular ILINK will crash.

For conditional estimation, the last line of datafile.dat should have 2a -1 numbers instead of a numbers. Again the first number may be 0 or 1, and the remaining 2a-2 numbers should be 1's.
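
The counts of numbers on the last line can be checked with a short sketch. This is an illustration in Python; the function name ilink_estimation_line and its arguments are hypothetical. It produces a numbers for regular estimation and 2a - 1 numbers for conditional estimation, as described above.

```python
def ilink_estimation_line(num_alleles, estimate_theta, conditional):
    """Build the last line of datafile.dat for allele frequency estimation.

    num_alleles: a, the number of alleles at the locus being estimated.
    estimate_theta: True to estimate theta (first number 1), else 0.
    conditional: True when ilink -c is used (2a - 2 ones instead of a - 1).
    (Hypothetical helper, not part of the FASTLINK distribution.)
    """
    first = 1 if estimate_theta else 0
    ones = 2 * num_alleles - 2 if conditional else num_alleles - 1
    return " ".join(str(value) for value in [first] + [1] * ones)


print(ilink_estimation_line(4, estimate_theta=True, conditional=False))
print(ilink_estimation_line(4, estimate_theta=False, conditional=True))
```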

When ilink -c is used to estimate frequencies conditionally, part of the output final.dat might look like:

GENE FREQUENCIES :
0.309689 0.424004 0.250168 0.016138
CONDITIONAL (on disease allele) GENE FREQUENCIES :
0.242056 0.303824 0.452248 0.001871

The first line is conditional on the healthy allele at the disease locus. The second line is conditional on the unhealthy allele at the disease locus.

|*| Usage Suggestions

  1. Calculating LOD scores for linkage under each of the linkage disequilibrium (LD; using ILINK -c) and linkage equilibrium (LE; using ILINK) models requires two analyses:

(a) the log-likelihood for estimated marker allele frequencies and the recombination fraction (theta) between the marker and disease loci;
(b) the log-likelihood for estimated marker allele frequencies and fixed theta=0.5; the difference in the log-likelihoods is converted to a lod score.

2. For the test of LD, compare the change in the log-likelihoods under LD and LE where both theta and the marker allele frequencies are jointly estimated.
One may assume that twice this difference is asymptotically distributed as chi-square with k - 1 degrees of freedom (where k = number of distinct alleles of the marker locus in the data). (The P-value may need to be estimated empirically for small samples for the situation where one or more conditional allele frequencies are estimated to be 0.)

3. For a test of linkage allowing for linkage disequilibrium, there are two approaches.
Approach A: Constrain the two vectors of conditional allele frequencies to be equal when theta=0.5.
Approach B: Allow the vectors of allele frequencies to be unconstrained in both cases. Then in the likelihood ratio test for linkage, the allele frequencies (conditional or not) become nuisance parameters, and there is only 1 degree of freedom. Approach B (as a special case of the general idea of likelihood ratio tests with nuisance parameters) is advocated by Joe Terwilliger.
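
The arithmetic behind suggestions 1 and 2 can be sketched as follows. This Python fragment is an illustration under stated assumptions: the inputs are natural-log likelihoods (values reported as -2 log(likelihood) would need to be divided by -2 first), and the function names are hypothetical. It computes the lod score and the likelihood ratio statistic with its degrees of freedom; it does not compute a P-value.

```python
import math


def lod_from_loglik(loglik_alternative, loglik_null):
    """Convert a difference of natural log-likelihoods to a base-10 lod score."""
    return (loglik_alternative - loglik_null) / math.log(10.0)


def ld_test_statistic(loglik_ld, loglik_le, num_marker_alleles):
    """Return (statistic, degrees of freedom) for the LD versus LE comparison.

    The statistic is twice the log-likelihood difference and the degrees of
    freedom are k - 1, where k is the number of distinct marker alleles.
    """
    return 2.0 * (loglik_ld - loglik_le), num_marker_alleles - 1


# Arbitrary example numbers, not taken from any real analysis.
print(lod_from_loglik(-100.0, -103.5))
print(ld_test_statistic(-100.0, -103.5, 4))
```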


::::::::::::::
README.ILINK
::::::::::::::

From: ftp-bimas.cit.nih.gov Last Mod: May 30, 1995

|*| What does the output of ILINK and LODSCORE mean?

This file describes the output that the programs ILINK and LODSCORE print to the screen. For the rest of the text we describe things in terms of ILINK because the output for LODSCORE is very similar. The need for this document was suggested by Marcy Speer (Duke).

ILINK uses the GEMINI optimization procedure to find a locally optimal value of the theta vector of recombination fractions. If you use the default scripts produced by lcp, your initial guess for theta is .1 in every dimension. GEMINI evaluates each theta by its likelihood, seeking to find theta vectors that have a higher pedigree likelihood.

The GEMINI procedure has multiple iterations. Each iteration corresponds to one line of output. Each iteration includes multiple likelihood function evaluations. Each iteration has two phases. In Phase I GEMINI seeks to improve the current best theta. In Phase II, GEMINI estimates the gradient of the likelihood with respect to the current best theta vector. In the first iteration, Phase I only evaluates the likelihood at the initial candidate theta.

When ILINK prints out a line such as:
maxcensor can be reduced to -32767,
it has completed the first likelihood function evaluation. On long runs, this fact can be used to estimate running time. A reasonable rough estimate for the number of function evaluations is 10*(number of dimensions of theta vector). The number of dimensions of the theta vector is one fewer than the number of loci in most cases. If maletheta and femaletheta are allowed to differ (sexdif is set to 1), then the number of dimensions doubles to 2 * (number of loci - 1). Estimating other parameters (with fitmodel set to true) can also increase the number of dimensions.
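
The rough estimate above can be written out as a small calculation. The Python sketch below is illustrative; the function names are hypothetical, it ignores any extra dimensions added when fitmodel is true, and it assumes (as a simplification that is not stated in the text) that each function evaluation takes about as long as the first one.

```python
def theta_dimensions(num_loci, sexdif=False):
    """Number of dimensions of the theta vector in the common cases."""
    dims = num_loci - 1
    return 2 * dims if sexdif else dims


def estimated_evaluations(num_loci, sexdif=False):
    """Rough estimate: 10 likelihood evaluations per theta dimension."""
    return 10 * theta_dimensions(num_loci, sexdif)


def estimated_runtime_seconds(first_eval_seconds, num_loci, sexdif=False):
    """Scale the time of the first evaluation by the estimated count."""
    return first_eval_seconds * estimated_evaluations(num_loci, sexdif)


# Example: 4 loci, first evaluation finished after about 2 minutes.
print(estimated_runtime_seconds(120.0, 4))
```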

After each iteration, ILINK prints out one line with four pieces of information:

ITERATION is a positive integer showing the number of the iteration just completed.

T is an indication of the step size that the GEMINI procedure takes in updating theta. Sometimes, very small T indicates that GEMINI did many updates (and hence the iteration probably took longer than average) each of which requires a likelihood function evaluation.

NFE is a positive integer indicating how many likelihood function evaluations have been done through that iteration.

F is a scaled representation of -2log(likelihood) at the current best theta. Because of the - sign, the value of F decreases until it reaches a local minimum.

After the last printed iteration, ILINK in FASTLINK does one more likelihood function evaluation for the purpose of computing Ott's Generalized LODSCORE which shows up in final.dat (transferred to final.out by the default pedin scripts). Ott's generalized LODSCORE compares -2log(likelihood) at the locally optimal theta to -2log(likelihood) at a theta that is .5 in every component (i.e. each locus unlinked to all the rest). In LINKAGE ILINK more likelihood function evaluations are done after the last printed iteration line, but these likelihood function evaluations are unnecessary (see paper2.ps from the FASTLINK distribution for more details).

Some users run ILINK and LODSCORE with execution scripts that do not delete the output file outfile.dat upon termination. The file outfile.dat is primarily useful in storing information about the values of certain variables at each iteration; these variables are not of interest, except for those who wish to modify the code. Of interest to users is the last thing in outfile.dat which is some description of the condition under which LODSCORE and ILINK terminated. This is a code stored in the variable idg and takes one of 8 values:

  1. Maximum possible accuracy reached
  2. Search direction no longer downhill
  3. Accumulation of rounding error prevents further progress
  4. All significant differences lost through cancellation in conditioning
  5. Specified tolerance on normalized gradient met
  6. Specified tolerance on gradient met
  7. Maximum number of iterations reached
  8. Excessive cancellation in gradient

Under all circumstances it should be emphasized that if ILINK or LODSCORE is used with only a single starting theta, the output value is only a local optimum and not a global optimum. It is a good idea to try with several different starting thetas. It is perfectly valid to compare the local optima from different starting points and choose the one that gives the best value of -2*log(likelihood); the more starting points tried, the more likely that the best value will be a global optimum.

If ILINK or LODSCORE exits with condition 5 or 6, the output value is pretty safe as a local optimum.

If ILINK or LODSCORE exits with condition 7, the output values are completely unsafe. The source code must be modified to increase iterationMultiple, which is #defined in gemdefs.h.

If ILINK or LODSCORE exits with conditions 1, 2, 3, 4, or 8 the situation is more nebulous, but it is a good idea to try more experiments to test how robust the output values are. Try starting from different initial thetas. One might also try increasing the constant tol in gemdefs.h. Increasing tol will have the effect of relaxing the convergence criteria, so that ILINK and LODSCORE may come close to a local optimum, where a smaller tol causes problems.
If increasing tol helps, then one should:

  1. find the local optimum with the higher tol
  2. reset tol to its previous value
  3. restart the program with the first local optimum as the initial value

This experiment will test whether the initial local optimum can be improved by more precise calculations.
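
The guidance on termination conditions can be collected into a lookup for use in scripts that inspect the end of outfile.dat. This Python sketch is illustrative; the names IDG_MESSAGES and assess_exit are hypothetical, and the sketch assumes the idg code has already been extracted from the file, whose exact layout is not described here.

```python
# Termination conditions as listed above, keyed by the idg code.
IDG_MESSAGES = {
    1: "Maximum possible accuracy reached",
    2: "Search direction no longer downhill",
    3: "Accumulation of rounding error prevents further progress",
    4: "All significant differences lost through cancellation in conditioning",
    5: "Specified tolerance on normalized gradient met",
    6: "Specified tolerance on gradient met",
    7: "Maximum number of iterations reached",
    8: "Excessive cancellation in gradient",
}


def assess_exit(idg):
    """Summarize how far the output values can be trusted for an idg code."""
    if idg in (5, 6):
        return "fairly safe as a local optimum"
    if idg == 7:
        return "unsafe: increase iterationMultiple in gemdefs.h and rerun"
    if idg in (1, 2, 3, 4, 8):
        return "uncertain: try other starting thetas or a larger tol"
    raise ValueError("idg must be between 1 and 8")


for code in sorted(IDG_MESSAGES):
    print(code, IDG_MESSAGES[code], "->", assess_exit(code))
```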

ILINK or LODSCORE does not allow the theta values to get down to 0.0. Therefore, if one of the locally optimal thetas is reported as close to 0.0, the situation ought to be explored further using LINKMAP or MLINK, which will allow arbitrarily small values of theta.


::::::::::::::
README.loopfile
::::::::::::::

From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999

              All about loopfile.dat
               by Dylan Cooper and
                  Alejandro A. Schaffer

|*| What is loopfile.dat?

Beginning with version 3.0 of FASTLINK we are making a fundamental change in the way loops are handled. The most important manifestation of the change is that the specifications for the preprocessor program UNKNOWN have changed. In particular, for pedigrees with loops, the new UNKNOWN will produce an extra output file to assist the main program. This applies to ILINK, MLINK, and LINKMAP where the standard scripts call UNKNOWN immediately before calling the main program. The change does not apply to LODSCORE for which the standard scripts do not use UNKNOWN. The main programs will still work correctly if the extra file is not present (in particular, if the old version of UNKNOWN is used) and the extra file gets deleted when the main program exits without a crash.

The file whose name is held in the macro LOOPFILE_NAME (probably "loopfile.dat") is produced by the new unknown.c when LOOPSPEED is defined. This file is used to speed up runs of ILINK, LINKMAP, and MLINK when at least one pedigree contains at least one loop.

The contents of the file and the method by which the speed up was obtained rely on the concept of a loop-breaker vector. A loop-breaker vector is an array of single locus genotypes which assigns one single locus genotype to each loop breaker. To understand more about loop breakers, readers of this file are strongly encouraged to read the FASTLINK documentation files traverse.ps and loops.ps.

|*| Syntax of loopfile.dat

loopfile.dat describes the contents of data structures that are created in unknown.c and used in ILINK, LINKMAP, and MLINK. The entries in the file are as follows:

Pedigree: The pedigree for which the following information pertains. Pedigrees are numbered consecutively from 1.

fewer_vects_size: Used for diagnostic output when a malloc fails.

num_loops_considered: Due to space constraints, the number of loops considered in these data structures is bounded. A noticeable speedup is achieved even when only some of the loops in the pedigree are considered. Reducing the macro 'max_vectors_considered' in unknown.c may reduce the value of this variable.

num_loop_vectors: a table indexed by the locus numbers, holding the number of loopbreaker vectors at that locus

loop_vectors: a table indexed by the locus numbers, holding the loopbreaker vectors at each locus

unknown_poss: a table indexed by person id, locus, loopbreaker vector, and single locus genotype. If the corresponding entry is true, the person may have that genotype at that locus when the loopbreakers have been assigned the single locus genotypes specified in the loopbreaker vector.

Single locus genotypes are encoded in order by allele number, discarding genotypes where the second allele is greater than the first allele. (These allele combinations are discarded because phase is unimportant for the calculations.) For example, if a locus has 4 alleles there are ten possible genotypes:

        allele 1  allele 2  genotype
           1         1         0
           1         2         1
           1         3         2
           1         4         3
           2         2         4
           2         3         5
           2         4         6
           3         3         7
           3         4         8
           4         4         9
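
The numbering in the table can be reproduced with a closed formula. The Python sketch below is an illustration that follows the ordering shown in the table (the smaller allele is listed first, and genotypes are numbered from 0); the function names are hypothetical and the code is not taken from unknown.c.

```python
def genotype_index(allele_a, allele_b, num_alleles):
    """Return the single locus genotype number for an unordered allele pair."""
    low, high = sorted((allele_a, allele_b))
    if not 1 <= low <= high <= num_alleles:
        raise ValueError("alleles must lie between 1 and num_alleles")
    # Count the genotypes whose smaller allele is below 'low'; they come first.
    before = (low - 1) * num_alleles - (low - 1) * (low - 2) // 2
    return before + (high - low)


def genotype_table(num_alleles):
    """List (allele 1, allele 2, genotype) rows in table order."""
    return [(i, j, genotype_index(i, j, num_alleles))
            for i in range(1, num_alleles + 1)
            for j in range(i, num_alleles + 1)]


for row in genotype_table(4):
    print(*row)
```

A locus with n alleles has n(n + 1)/2 such genotypes, which gives the ten rows for n = 4.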

Below is a hypothetical file with comments indicating what each line means. Due to the comments, the placement of white space may be distorted.

Starting in FASTLINK 4.1P, some improvements have been made to the genotype inference code, so that information about some loop breaker vectors that are not consistent (i.e., the assignment of the genotype to each loop breaker causes a violation of Mendelian rules of inheritance) is not printed. As a result, for some multi-loop pedigrees, loopfile.dat will be much shorter than, and different from, the loopfile.dat generated by earlier versions of unknown.

Pedigree: 1                   : This information is for the first pedigree
fewer_vects_size: 800         : Used in error messages
num_loops_considered: 3       : Three loops were considered
num_loop_vectors:
        0 : 6                 : 6 loopbreaker vectors at the locus 0
        1 : 2                 : 2 loopbreaker vectors at the locus 1
        2 : 3                 : 3 loopbreaker vectors at the locus 2
loop_vectors:
        L : 0             : at locus 0
                0 : 1 0 0     :   loopbreaker vector 0 has single locus
                1 : 2 0 0     :      genotype 1, 0, and 0 at loci 0, 1,
                2 : 0 2 0     :      and 2 respectively
                3 : 1 2 0     :   etc
                4 : 2 2 0
                5 : 0 0 1

+
L : 1
0 : 0 0 0
1 : 1 0 0
+
L : 2
0 : 0 0 0
1 : 0 1 0
2 : 0 0 1
+
unknown_poss:

id: 3                   : person 3 is unknown and has children
        L: 0                : at locus 0
                0 : 1          : person 3 can have single locus genotype 1
-                              : - indicates that person 3 is known at locus 0
        L: 1                : at locus 1
                0 : 0 1        : if loopbreakers have vector 1, person 3 can
                1 : 0 1        :    have single locus genotype 0 or 1
+                              : + indicates unknown at locus 1
        L: 2
                0 : 0 1
                1 : 0 1
                2 :            : indicates that no genotypes are possible at
+                                this locus when the loopbreakers are assigned
                                 this loopbreaker vector

id: 4
L: 0
0 : 0 1 2
1 : 0 1 2
2 :
3 : 0 1 2
4 : 0 1 2
5 :
+
L: 1
0 : 0 1 2
-
L: 2
0 : 0 1 2
1 : 0 1 2
2 :
+


::::::::::::::
README.lselect
::::::::::::::

From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999

               A new method of selecting loop breakers
                  Alejandro A. Schaffer

|*| Selecting loop breakers easily and automatically

Exercise 7 on pages 93--96 of Handbook of Human Genetic Linkage by Ott and Terwilliger describes a complicated, interactive method to break loops using the makeped program and the LOOPS program [Xie X, Ott J: Finding all loops in a pedigree. Am J Hum Genet 1992; 51:A205].
As a result of innovations in FASTLINK 4.0P and FASTLINK 4.1P, their method is now obsolete.

The new method is as follows:

  1. When running your pre-makeped file through makeped, ALWAYS say that the pedigrees have no loops, even if they do.
  2. Put the post-makeped pedigree file in pedfile.dat.
  3. Put the locus file in datafile.dat.
  4. Run unknown -l

Note that the flag is the letter 'l', not the number '1'. This will produce a new output file called lpedfile.dat
which has all the loops broken for you.

If your goal is to run an lcp-produced script with pedigree file in pedin.dat and locus file in datain.dat, you then

5. Copy lpedfile.dat to pedin.dat
6. Copy the locus file to datain.dat

and run your script.

You will see diagnostic output showing that unknown is still trying to find a better loop breaker set for you during the running of the lcp-produced script. The reasons are as follows.

  1. Your pedigree file may have dozens of loci while any specific analysis may have only 2 or 3. The initial loop breaker set selected by
    unknown -l
    attempts to be good for all loci, but may not be optimal for any specific locus subset.
  2. The method used during the main run uses a more precise weight function to choose among the loop breaker possibilities than does the preliminary run of
    unknown -l

::::::::::::::
README.mapfun
::::::::::::::

From: ftp-bimas.cit.nih.gov Last Mod: May 24, 1995

|*| Map Functions Used In LINKAGE/FASTLINK

       by Jeremy Buhler
       Rice University

This README file tries to connect the discussion of mapping functions in Chapter 1 of Ott's book[3] to what actually happens in LINKAGE/FASTLINK.

LINKAGE/FASTLINK uses two functions for calculating map distance: Haldane's map function [1] and Kosambi's map function [2]. These functions are implemented as methods for calculating recombination fractions of flanking markers given the fractions between three adjacent markers.

If we have three loci A, B, and C which are present on the chromosome in the order ABC, we say that A and C are flanking markers. We say that A and B, as well as B and C are adjacent markers. If we know the recombination fractions theta(AB) and theta(BC), we would like to determine the fraction theta(AC). One way to determine theta(AC) is to take the sum theta(AB) + theta(BC); this is Morgan's map function, which equates distance on the linkage map to recombination fraction. This approach implicitly assumes only a single crossover between adjacent loci, which is unreasonable for loci which are not linked fairly tightly (theta < 0.1).

Haldane's map function assumes that crossovers follow a Poisson distribution, with no interference between crossovers. Haldane's function x(theta) is given by

x = -1/2 ln(1 - 2 * theta)

or, inversely,

theta = 1/2 [1 - exp(-2x)]

From this formula, we see that the process of adding recombination fractions while accounting for the new crossover distribution is equivalent to the mathematical manipulation:

x(AC) = x(AB) + x(BC) = -1/2(ln(1 - 2 * theta(AB)) + ln(1 - 2 * theta(BC)))
      = -1/2(ln( (1 - 2 * theta(AB)) (1 - 2 * theta(BC)) ))

theta(AC) = 1/2 [1 - exp(-2 x(AC))]
          = 1/2 [ 1 - (1 - 2 * theta(AB)) (1 - 2 * theta(BC))]
          = 1/2 [ 1 - 1 + 2 * theta(AB) + 2 * theta(BC)
                  - 4 * theta(AB) * theta(BC)]

theta(AC) = theta(AB) + theta(BC) - 2 * theta(AB) * theta(BC)

This formula appears throughout LINKAGE/FASTLINK. Moreover, if we wish to use a map-function-derived theta(AC) and a given theta(AB) to derive theta(BC), we can rewrite the addition formula to find that

theta(BC) (1 - 2 * theta(AB)) = theta(AC) - theta(AB)
theta(BC) = (theta(AC) - theta(AB)) / (1 - 2 * theta(AB))

This last formula is used in LINKMAP to recalculate theta(BC) from the known theta(AC) as B is moved incrementally across the gap between A and C.
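
The Haldane formulas above can be checked numerically. The following Python sketch is illustrative only; the function names are hypothetical and the code is not taken from LINKAGE/FASTLINK. It confirms that adding map distances agrees with the addition formula for recombination fractions, and that the rearranged formula recovers theta(BC).

```python
import math


def haldane_distance(theta):
    """Map distance x = -1/2 ln(1 - 2 * theta)."""
    return -0.5 * math.log(1.0 - 2.0 * theta)


def haldane_theta(x):
    """Recombination fraction theta = 1/2 [1 - exp(-2x)]."""
    return 0.5 * (1.0 - math.exp(-2.0 * x))


def haldane_add(theta_ab, theta_bc):
    """theta(AC) from the fractions of the two adjacent intervals."""
    return theta_ab + theta_bc - 2.0 * theta_ab * theta_bc


def haldane_subtract(theta_ac, theta_ab):
    """theta(BC) recovered from theta(AC) and theta(AB)."""
    return (theta_ac - theta_ab) / (1.0 - 2.0 * theta_ab)


theta_ac = haldane_add(0.1, 0.2)
via_distance = haldane_theta(haldane_distance(0.1) + haldane_distance(0.2))
print(theta_ac, via_distance, haldane_subtract(theta_ac, 0.1))
```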

Kosambi's map function is based on a model of chiasmal interference. It is given by

    x = 1/2 arctanh(2 * theta) = 1/4 ln((1 + 2 * theta) / (1 - 2 * theta))

or, inversely,

    theta = 1/2 tanh(2x) = 1/2 (exp(4x) - 1) / (exp(4x) + 1)

Under this mapping function, addition of recombination fractions is equivalent to the following manipulation:

x(AC) = x(AB) + x(BC)
          1    / 1 + 2 * theta(AB) \   1    / 1 + 2 * theta(BC) \
      =   - ln | ----------------- | + - ln | ----------------- |
          4    \ 1 - 2 * theta(AB) /   4    \ 1 - 2 * theta(BC) /

      1    / 1 + 2 * theta(AB) + 2 * theta(BC) + 4 * theta(AB) * theta(BC) \
   =  - ln | ------------------------------------------------------------- |
      4    \ 1 - 2 * theta(AB) - 2 * theta(BC) + 4 * theta(AB) * theta(BC) /

theta(AC) = 1/2 (exp(4x(AC)) - 1) / (exp(4x(AC)) + 1)

        1 + 2 * theta(AB) + 2 * theta(BC) + 4 * theta(AB) * theta(BC)
        ------------------------------------------------------------- - 1
    1   1 - 2 * theta(AB) - 2 * theta(BC) + 4 * theta(AB) * theta(BC)
  = - ---------------------------------------------------------------------
    2   1 + 2 * theta(AB) + 2 * theta(BC) + 4 * theta(AB) * theta(BC)
        ------------------------------------------------------------- + 1
        1 - 2 * theta(AB) - 2 * theta(BC) + 4 * theta(AB) * theta(BC)

     1 4 * theta(AB) + 4 * theta(BC)
  =  - -----------------------------
     2 2 + 8 * theta(AB) * theta(BC)


theta(AC) =  (theta(AB) + theta(BC)) / (1 + 4 * theta(AB) * theta(BC))
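
The Kosambi addition formula can be checked numerically in the same spirit. This Python sketch is illustrative; the function names are hypothetical and the code is not taken from LINKAGE/FASTLINK. It compares the closed-form addition formula with the result of adding map distances and converting back.

```python
import math


def kosambi_distance(theta):
    """Map distance x = 1/2 arctanh(2 * theta)."""
    return 0.5 * math.atanh(2.0 * theta)


def kosambi_theta(x):
    """Recombination fraction theta = 1/2 tanh(2x)."""
    return 0.5 * math.tanh(2.0 * x)


def kosambi_add(theta_ab, theta_bc):
    """theta(AC) = (theta(AB) + theta(BC)) / (1 + 4 theta(AB) theta(BC))."""
    return (theta_ab + theta_bc) / (1.0 + 4.0 * theta_ab * theta_bc)


closed_form = kosambi_add(0.1, 0.2)
via_distance = kosambi_theta(kosambi_distance(0.1) + kosambi_distance(0.2))
print(closed_form, via_distance)
```

For the same pair of adjacent fractions, the Kosambi result is larger than the Haldane result, which reflects the interference assumed by the model.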

If the user specifies that interference is to be included in the model and sets the parameter "independent" (in datain.dat) to 2, then (and only then) is Kosambi's mapping function used instead of Haldane's.

|*| References

[1] Haldane, J.B.S. 1919. "The combination of Linkage values and the calculation of distances between the loci of linked factors." J. Genet. 8:299-309.

[2] Kosambi, D.D. 1944. "The estimation of map distances from recombination values." Ann. Eugen. 12:172-75.

[3] Ott, J. 1991. Analysis of Human Genetic Linkage (Revised Edition). Baltimore: Johns Hopkins U. Press.


::::::::::::::
README.memory
::::::::::::::

From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999

FASTLINK, version 3.0P and beyond

This file discusses memory requirements for FASTLINK. See the top level README file for a roadmap to all FASTLINK documentation.

|*| Memory Requirements

The FASTLINK programs can require large amounts of memory when doing multilocus analysis. Of course the amount of memory required is very dependent on the number of loci and the number of alleles at each locus. However even 100 Mb is not a problem to run under Sun OS for instance, because this is a virtual memory operating system. Ideally one would want to run a program of this size on a machine with 32 Mb of memory, but in our experience it is possible to run on machines with as little as 12 Mb.

Of course it is necessary to have a swap file with sufficient space to run the OS and have enough free space for the program.

To see how much space a program requires in version 2.0 or earlier, it was possible to use the unix command:

size <program name>

for instance using linkmap from FASTLINK, 2.0:

unix> /usr/bin/size linkmap

        text    data    bss     	dec     	hex
        139264	8192	28719480	28866936	1b87978

This value under "dec" is the decimal number of bytes for the whole program. So we see in this case that 28.9 Mbytes is required.

Then compare this with the unix pstat command:

unix> /etc/pstat -s
14880k allocated + 3712k reserved = 18592k used, 169000k available

This indicates that a total of 187592 Kbytes or 187 Mb has been allocated on this system for swap space, and with the current job mix, 18.5 Mb are used and 169 Mb are available. So in this case linkmap will be able to run.
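
The comparison above can be scripted. The Python sketch below is an illustration that assumes the output layouts shown in this file (the "dec" value as the fourth field of the size data line, and a "<number>k available" phrase in the pstat line) and that k means 1024 bytes; both tools print different formats on other systems, and the function names are hypothetical.

```python
import re


def program_bytes(size_line):
    """Return the 'dec' column (fourth field) of a data line printed by size."""
    return int(size_line.split()[3])


def available_kbytes(pstat_line):
    """Return the number in front of 'k available' in the pstat -s output."""
    match = re.search(r"(\d+)k available", pstat_line)
    if match is None:
        raise ValueError("no 'available' figure found")
    return int(match.group(1))


def fits_in_swap(size_line, pstat_line):
    """True if the program size does not exceed the available swap space."""
    return program_bytes(size_line) <= available_kbytes(pstat_line) * 1024


SIZE_LINE = "139264 8192 28719480 28866936 1b87978"
PSTAT_LINE = "14880k allocated + 3712k reserved = 18592k used, 169000k available"
print(fits_in_swap(SIZE_LINE, PSTAT_LINE))
```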

To enlarge the swap space consult your local system administrator. For a single user system running FASTLINK we recommend 150Mb total swap space as a minimum.

Alternatively, use the "slow" versions of the programs. The term slow is a little misleading in that these versions will still be significantly faster than the originals. In the case of linkmap, the version compiled with
make slowlinkmap
with the current constant settings is less than 1 Mb in size. Any unix system should have a swap file large enough for this.

Starting in version 2.1 of FASTLINK a lot of the memory allocation is done dynamically (See README.updates). In version 2.2 and beyond, almost all the large data structures are allocated dynamically. This means that you will not be able to detect before running whether you have enough memory. If you do not have enough, the program should exit politely with an explanation, shortly after startup. The advantage of doing the memory allocation at runtime is that it may be possible to use significantly less memory based on knowledge of certain parameters that are available only at runtime.


::::::::::::::
README.scaling
::::::::::::::

From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999

FASTLINK, version 4.0P

|*| Output values from FASTLINK are scaled

The output log likelihood values printed by both LINKAGE and FASTLINK are scaled on some pedigrees by an additive constant that depends on the pedigree structure and selection of loop breakers, if any. This means that output log likelihood values should be used only by subtracting one from another to obtain a LOD score. This problem first surfaced in the initial release of FASTLINK because FASTLINK/LINKMAP uses a different scaling convention from LINKAGE/LINKMAP. See the next section below.

The scaling issue became fundamental with FASTLINK 4.0P where the change in loop breaker choice means that the raw output is unlikely to match earlier versions on looped pedigrees. The reason is that when FASTLINK 4.0P changes the selection of loop breakers, this has the side effect of changing the scaling constant. Therefore, FASTLINK 4.0P can be compared for correctness to earlier versions only by comparing LOD scores.

More changes in both loop breaker selection and genotype inference for looped pedigrees were made in version 4.1P. So the printed log likelihood values for versions 4.0P and 4.1P may differ on looped pedigrees. In general, the value for 4.1P should be the same or smaller in magnitude, indicating that less time is being wasted exploring unnecessary genotype combinations for the loop breakers.
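
Why only differences of log likelihoods are meaningful can be shown with a short Python sketch. It assumes the two values being compared are natural-log likelihoods from the same run configuration, so they share the same additive scaling constant; if your output is in another form (for example -2 ln(likelihood)), convert it first. The numbers below are hypothetical.

```python
import math

def lod_score(loglike_theta, loglike_unlinked):
    """LOD score (log10 likelihood ratio) from two natural-log likelihoods."""
    return (loglike_theta - loglike_unlinked) / math.log(10)

# Hypothetical log likelihoods at a test theta and at theta = 0.5.
at_theta, at_half = -120.0, -126.9

# Adding the same scaling constant to both values leaves the LOD unchanged.
shift = 37.5
print(round(lod_score(at_theta, at_half), 3))
print(round(lod_score(at_theta + shift, at_half + shift), 3))
```

The two printed values agree because the additive constant cancels in the subtraction, which is why raw values from different versions can differ while the LOD scores still match.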

|*| Scaling discrepancy - IMPORTANT FOR LINKMAP USERS

Prof. Ellen Wijsman (U. Washington) brought to our attention a situation in which FASTLINK versions of LINKMAP print out some values that differ from those printed out by LINKMAP in LINKAGE 5.1. What follows are two explanations for the discrepancy, one short, and one long.

Short Explanation. The different values represent differences in scaling LINKMAP's representation of the likelihood value. If you run the post-processor program which computes odds, the discrepancies will disappear.

Long Explanation. Because LINKAGE computes with very small numbers, these numbers must be scaled to avoid underflow. Any (log) likelihood values that are printed out by any of the LINKAGE programs are actually scaled by some amount that depends on the structure of the input pedigree(s). Various scaling rules can be used. In LINKAGE 5.1, the programs LODSCORE, ILINK, and MLINK all use the same scaling rules, while LINKMAP uses different scaling rules. We could find no internal or external documentation to explain this difference. The difference arises only for some pedigrees that have loops.

To increase the amount of code that the four programs can share in our versions, we have decided to make our LINKMAP use the same scaling rules as the other three programs.

If you would like details on how to modify our LINKMAP to make it consistent with the old LINKMAP contact schaffer@cs.rice.edu. The necessary editing is simple, but you would have to edit the code each time you switch between LINKMAP and one of the other three programs.


::::::::::::::
README.time
::::::::::::::

From: ftp-bimas.cit.nih.gov Last mod: July 13, 1995

FASTLINK, version 3.0P and beyond

How long will a sequential FASTLINK run take? This turns out to be extremely difficult to estimate ahead of time, but relatively easy to estimate once the run is underway.

Each FASTLINK run evaluates the same likelihood function at different candidate theta vectors. For MLINK and LINKMAP the user specifies all the candidate theta vectors. For ILINK and LODSCORE they are generated on the fly. It is reasonably safe to assume that each candidate theta takes roughly the same time to evaluate. Therefore, if you know how many candidate thetas there will be, you can estimate the total running time by multiplying the number of thetas by the running time for one theta.

Caution: This approach will not work on a computer where the load (from other users) is varying significantly during the run.

You can estimate the time for one theta by watching the screen. When the first output gets printed after the header information, one theta is complete. Alternatively, MLINK and LINKMAP take a checkpoint after every theta. Therefore, by comparing the timestamps of the files checkpoint.LINKMAP and checkpoint.LINKMAP.bak (or checkpoint.MLINK and checkpoint.MLINK.bak), you may infer how long one candidate theta takes to evaluate. ILINK and LODSCORE usually take checkpoints every one or two thetas, so you must be more careful in making inferences from the timestamps of those checkpoint files. The timestamp of a file can usually be determined with the command "ls -l".
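
The timestamp comparison can be sketched in Python as follows. This assumes the .bak file is the checkpoint written one theta earlier, as the description above suggests for MLINK and LINKMAP; the function name is illustrative only.

```python
import os

def seconds_per_theta(checkpoint, backup):
    """Estimate the time for one theta as the gap between two checkpoint files."""
    return abs(os.path.getmtime(checkpoint) - os.path.getmtime(backup))

# Only report if both LINKMAP checkpoint files exist in the current directory.
if os.path.exists("checkpoint.LINKMAP") and os.path.exists("checkpoint.LINKMAP.bak"):
    print(seconds_per_theta("checkpoint.LINKMAP", "checkpoint.LINKMAP.bak"))
```

For ILINK and LODSCORE the same difference may span more than one theta, so treat the result as an upper bound in that case.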

The number of thetas for ILINK and LODSCORE cannot be predetermined, but a good estimate is (10 * number of loci) if you have sex-averaged thetas. If male theta and female theta differ, estimate with (20 * number of loci). After each iteration, ILINK and LODSCORE print an update in which the number following the string NFE (number of function evaluations) is the number of candidate thetas already evaluated. See README.ILINK for more details. These NFE numbers can be used to estimate how much more work remains to be done by using the formula:

    (((Number of thetas estimated) / (Number of thetas completed)) 
    * (running time so far)) - (running time so far)
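
The formula above, together with the rule of thumb for the number of thetas, can be written as a short Python sketch. The function names and the example run are hypothetical.

```python
def estimated_thetas(num_loci, sex_averaged=True):
    """Rule-of-thumb theta count for ILINK/LODSCORE: 10 or 20 per locus."""
    return (10 if sex_averaged else 20) * num_loci

def remaining_time(thetas_estimated, thetas_completed, elapsed):
    """Remaining running time, in the same units as `elapsed`."""
    if thetas_completed <= 0:
        raise ValueError("need at least one completed theta (NFE > 0)")
    return (thetas_estimated / thetas_completed) * elapsed - elapsed

# Hypothetical run: 3 loci, sex-averaged thetas, NFE = 12 after 40 minutes.
print(remaining_time(estimated_thetas(3), 12, 40.0))
```

In this hypothetical run, 30 thetas are expected and 12 are done after 40 minutes, so the estimate is about 60 more minutes.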

::::::::::::::
README.trouble
::::::::::::::

From: ftp-bimas.cit.nih.gov Last mod: November 28, 1995

                LINKAGE/FASTLINK Troubleshooting
                Alejandro A. Schaffer

LINKAGE and FASTLINK produce lots of different error messages that may be difficult to understand. This file briefly summarizes the error messages in five groups.

  1. Error messages in LINKAGE that were inherited in FASTLINK. These all have numbers, although the numbers are not printed out.
  2. Error messages new to FASTLINK
  3. Incompatibility errors in UNKNOWN
  4. Error messages in lsp. These error messages are especially cryptic. They all have a 6 letter error code and little other information.
  5. Error messages in lrp. Unfortunately, the error printing routine ignores a lot of useful diagnostic information that is passed to it.

The way to use this file is to note down the error message you got and then use grep to find it in this file to figure out what may be wrong.
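
Because the sample messages in this file contain specific numbers (such as 17) that will differ from the numbers in your own message, a literal search for the full message can miss. The following Python sketch is one possible alternative to grep: it treats every run of digits in your message as a wildcard. The function names are illustrative only.

```python
import re

def message_pattern(message):
    """Build a regex for `message` in which any run of digits matches any number."""
    parts = re.split(r"\d+", message.strip())
    return re.compile(r"\d+".join(re.escape(p) for p in parts))

def find_in_readme(message, readme_text):
    """Return 1-based line numbers of lines in readme_text matching the message."""
    pattern = message_pattern(message)
    return [n for n, line in enumerate(readme_text.splitlines(), 1)
            if pattern.search(line)]

sample = ("Error Number:0\n"
          "Message: Number of loci 17 exceeds the constant maxlocus\n")
print(find_in_readme("Number of loci 23 exceeds the constant maxlocus", sample))
```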

Almost all the common errors in the first and third categories are white-space placement errors. Thus the error message should be interpreted only as a clue to the vicinity of the error in the data files.

|*| Error Messages in LINKAGE Main Programs and UNKNOWN

The main programs and UNKNOWN use the same error routine, although in practice some of the errors can occur only one place or the other. This section describes the errors that have been reported in LINKAGE all along. Other errors may be found in the next two sections.

Error Number:0
Message: Number of loci 17 exceeds the constant maxlocus
What it means: maxlocus is the maximum number of loci that can be used
               simultaneously in a run. You can increase maxlocus
               by changing commondefs.h or Makefile.

Error Number:1
Message: Number of loci read . Less than minimum of 1
What it means: The first number in locus file (datain.dat) or datafile.dat
               is mangled; you probably erred in using preplink to
               prepare the locus file.

Error Number:2
Message: Error detected reading loci order. Locus number 17 in position 5 exceeds number of loci
What it means: The third line of your locus file has no locus 17 on it, but you
               asked lcp to use that locus. This probably occurred
               by using a text editor to add new loci to the locus file
               and forgetting to update the locus order on line 3.

Error Number:3
Message: Error detected reading loci order. Illegal locus number 17 in position 2
What it means: Your lcp script wants to use locus 17, but your locus
              file does not have 17 loci described. This can occur when
              you mix up data sets.

Error Number:4
Message: Error detected reading loci order. Locus number repeated in positions 2 and 3
What it means: You probably made a typo in lcp and used the same locus in two different positions of the fixed locus map.

Error Number:5
Message: Error detected reading locus description. Illegal locus type 7 for locus 6
What it means: The first number in the description of each locus in the locus file must be 1,2,3,or 4.

Error Number:6
Message: Error detected reading locus description for system 7. Number of alleles 25 exceeds maxall
What it means: One of your loci is described as having 25 alleles in the
               locus file. maxall is a constant limiting the maximum
               number of alleles at a locus. You can increase maxall to
               more than 25, by changing unknown.c, commondefs.h, or
               Makefile.

Important: Versions of FASTLINK earlier than 3.0P cannot handle maxall > 31.

Error Number:7
Message: Error detected reading locus description for system 6. Illegal number of alleles 0
What it means: One of your loci is described as having 0 alleles.
               This is likely a white space error in the locus file
               causing the wrong string to be interpreted as the number
               of alleles.

Error Number:8
Message: Error detected reading locus description for system 6. Number of factors 17 exceeds maxfact
What it means: Similar to error number 6. There is a constant maxfact
         that is the maximum number of binary factors allowed at a
         locus of that type. You can change maxfact in unknown.c,
         commondefs.h, and Makefile.
Important: Set maxfact and maxall to the same value; FASTLINK cannot
           handle maxfact > 31.

Error Number:9
Message: Error detected reading locus description for system 6. Illegal number of factors 0
What it means: Very similar to error number 7. 7 appears for numbered allele loci, while 9 appears for binary factors loci.

Error Number:10
Message: Error detected reading locus description for system 6. Alleles not codominant.
THIS ERROR IS OBSOLETE

Error Number:11
Message: Error detected reading pedigree record 17. Illegal code for sex 8.
What it means: The column for gender is the eighth column in pedin.dat and the fifth column in the input to MAKEPED. Error 11 can be caused by entering either the wrong value for the gender or having a white-space error that causes the wrong column to be read as gender. Be especially careful to have exactly one carriage return after the entry for each person, and no other carriage returns.
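
A quick way to hunt for the white-space problems described for error 11 is to check that every pedigree record has the same number of fields and a plausible sex code. The following Python sketch does this under stated assumptions: fields are white-space separated, the first non-blank record is taken as the reference for the field count, and sex is coded 1 or 2 in column 8. The function name and defaults are illustrative only.

```python
def check_pedigree_lines(lines, sex_column=8, valid_sex=("1", "2")):
    """Yield (line_number, problem) for suspicious pedigree records."""
    rows = [(n, ln.split()) for n, ln in enumerate(lines, 1) if ln.strip()]
    if not rows:
        return
    expected = len(rows[0][1])  # field count of the first record (assumed correct)
    for n, fields in rows:
        if len(fields) != expected:
            yield n, f"{len(fields)} fields, expected {expected}"
        elif len(fields) < sex_column:
            yield n, f"fewer than {sex_column} columns"
        elif fields[sex_column - 1] not in valid_sex:
            yield n, f"sex code {fields[sex_column - 1]!r} in column {sex_column}"

# Hypothetical records: the second has a bad sex code, the third lost a field.
records = ["1 1 0 0 3 0 0 1 0 1 2",
           "1 2 0 0 3 0 0 8 0 1 2",
           "1 3 1 2 0 0 0 2 0 1"]
for problem in check_pedigree_lines(records):
    print(problem)
```

If the first record is itself malformed, every other record will be flagged, which is still a useful hint about where to look.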

Error Number:12
Message: Error detected reading pedigree record at pedigree 17. Maximum number of pedigree records exceeded
What it means: The maximum number of pedigrees is determined by
         the constant maxped, which can be changed in commondefs.h,
         unknown.c, and Makefile. You may have truly exceeded maxped or
         you may have a white-space error.

Error Number:13
Message: Error detected reading pedigree record 501. Maximum number of individuals exceeded,
What it means: Similar to error 12. The maximum number of people in a data set is determined by the constant maxind.

Error Number:14
Message: Error detected reading pedigree record 300. Illegal binary factor code 2.
What it means: Binary factors must be 0 or 1. Usually this error occurs
        because of a white-space problem that causes lsp to look in the
        wrong columns.

Error Number:15
Message: Error detected reading pedigree record 300. No allelic pair for genotype.
THIS ERROR IS OBSOLETE

Error Number:16
Message: Error detected reading pedigree record 300. Allele number 25 exceeds maxall.
What it means: A numbered allele cannot have a value larger than the constant maxall. See error 6.

Error Number:17
Message: Error detected reading pedigree record 300. Illegal allele number -1.
What it means: You have a negative allele number in your input file.
               I have not figured out any plausible circumstances under which
               this error could occur.

Error Number: 18
Message: Number of systems after factorization 60 exceeds maxsystem
THIS ERROR IS OBSOLETE

Error Number:19
Message: Number of systems after factorization 0 less than minimum of 1.
THIS ERROR IS OBSOLETE

Error Number:20
Message: Number of recombination types 100 exceeds maxrectype
THIS ERROR IS OBSOLETE

Error Number:21
Message: Number of recombination types 0 less than minimum of 1.
THIS ERROR IS OBSOLETE

Error Number: 22
Message: End of file detected in tempdat by procedure readthg before all data found
THIS ERROR IS OBSOLETE

Error Number: 23
Message: Error detected reading iterated locus in datafile. Value (7) greater than nlocus
What it means: You are using ILINK to estimate allele
         frequencies or something else, and you gave a locus number
         that is too high.

Error Number: 24
Message: Error detected reading iterated locus in datafile. Illegal value (-1)
What it means: Similar to error 23, but this one occurs when
         the locus number is negative.
         I have not figured out any plausible circumstances under which
         this error could occur.

Error Number: 25
Message: Number of iterated parameters greater then maxn.
What it means: The number of parameters that you can simultaneously
        estimate in ILINK is determined by the constant maxn, which can
        be increased in ildefs.h or Makefile. You have exceeded maxn
        in the way your datafile.dat is set up. Could be caused by a
        white-space error.

Error Number: 26
Message: Error detected reading pedigree record 200. Liability class (9) exceeds nclass.
What it means: When you specify a locus as an affection status locus,
         you may specify different liability classes that get numbered
         1,2,3... If you assign an individual a class number in the pedigree
         file that is higher than the number of liability classes
         specified, then error 26 occurs. It is important to remember that
         affection status loci get 1 column if no liability classes are used
         and 2 columns if classes are used. Therefore, this error can occur
         if you specify an affection status locus to have liability classes
         in the locus file, but forget to specify the class in the
         pedigree file.

Error Number: 27
Message: Error detected reading pedigree record 200. Illegal liability class (0).
What it means: See error 26. In this case the liability class is being read
         as a number that is too low (rather than too high), but the likely
         causes are the same as for 26.

Error Number: 28
Message: Error detected reading locus description for system 1. Liability classes (100) exceed maxliab.
What it means: The maximum number of liability classes at a locus is
        determined by the constant maxliab, which can be set in
        unknown.c, commondefs.h, or Makefile.

Error Number: 29
Message: Error detected reading locus description for system 2. Illegal number of liability classes (-1)
What it means: The number of liability classes that you specified for
         an affection status locus is too low. This could be a
         white-space error.

Error Number: 30
Message: Error detected reading locus description for system 2. Penetrance out of range
What it means: You specified a penetrance for a liability class of
         an affection status locus as a number bigger than 1.0. Probably
         a white-space error.

Error Number: 31
Message: Error detected reading locus description for system 2. Number of traits (17) exceeds maxtrait
What it means: The maximum number of traits for a quantitative trait
         locus is determined by the constant maxtrait, which can be
         set in unknown.c, commondefs.h, or Makefile.

Error Number: 32
Message: Error detected reading locus description for system 2. Number of traits out of range (-1)
What it means: Similar to error 31, but now the number of traits is too low. Probably a white-space error.

Error Number: 33
Message: Error detected reading locus description for system 3. Variance must be positive
What it means: You specified a variance for a quantitative trait as
         0 or less. Almost certainly what happened is that a 0
         was read because of a white-space error.

Error Number: 34
Message: Error detected reading locus description for system 2. Variance multiplier must be positive
What it means: Similar to error 33.

Error Number: 35
Message: Error detected reading locus description for system 1. Risk allele (17) exceeds nallele
What it means: You are doing a risk assessment and you specified an
          allele number that is higher than the number of alleles possible
          for that locus.

Error Number: 36
Message: Error detected reading locus description for system 2. Illegal risk allele (0)
What it means: Similar to 35, but here the risk allele number is 0 or less. Probably a white-space error.

Error Number: 37
Message: Error detected reading datafile. Risk locus (5) exceeds nlocus
What it means: The locus at which you want to do a risk analysis
         is specified as an index that is higher than the number
         of loci you specified in the lcp script.

Error Number: 38
Message: Error detected reading datafile. Illegal value for risk locus (0)
What it means: Similar to 37, but now the risk locus number is too low. Probably a white-space error.

Error Number: 39
Message: Error detected reading datafile. Mutation locus (5) exceeds nlocus
What it means: Similar to 37, but this occurs when you are using the mutation model, rather than risk analysis.

Error Number: 40
Message: Error detected reading datafile. Illegal value for mutation locus (0)
What it means: Similar to 38, but this occurs when you are using the mutation model, rather than risk analysis.

Error Number: 41
Message: Error detected reading datafile. Linkage disequilibrium is not allowed with this program
What it means: You are trying to allow for linkage disequilibrium and trying to use LODSCORE. Use ILINK instead.

Error Number: 42
Message: Locus 17 in lod score list exceeds nlocus 5
What it means: Essentially the same as error 2, but you get this
                one if you use LODSCORE because the lcp script format for
                lodscore is different.

Error Number: 43
Message: Illegal locus number 0 in lod score list
What it means: Similar to error 42, but now the locus number is too low instead of too high.

Warning number: 0
Message: Illegal sex difference parameter 3 Parameter should be 0, 1, or 2
What it means: The first number after the last locus description in
               the locus file indicates whether you want male theta and
               female theta to be different.
               Codes are:
               0 -- no difference (almost everyone uses this)
               1 -- difference, but no females seen yet
               2 -- difference (common value for sex difference)
               This is probably a white-space error.

Warning number: 1
Message: Illegal interference parameter 17 Lack of interference assumed
What it means: The second number after the last locus description in
                the locus file indicates whether you want interference (1)
                or mapping (2). No interference (the common case) is 0.

Warning Number: 2
Message: Illegal sex difference parameter 1 Parameter must be 0 with sex-linked data
What it means: You are using X-chromosome data and you specified that
      male theta should be different from female theta in datain.dat.
      This number is the first number after the last locus description
      in the locus file. This warning may be harmless.

Warning Number: 3
Message: Non-standard affection status 6 interpreted as normal in pedigree record 200
What it means: The affection status of a person can be 0,1, or 2. You probably have a white-space error. This warning should not be ignored.

|*| Error Messages Introduced in FASTLINK

Message: WARNING: You are doing an autosomal run but have AUTOSOMAL_RUN set to 0
What it means: Change AUTOSOMAL_RUN to 1 in moddefs.h

Message: You probably need to run the slower version of this program
What it means: FASTLINK can be configured to use more memory ("fast version")
         or less memory ("slow version"). You are using the fast version and have
         run out of memory. Recompile to get the slow version instead, with
          make installslow.

Message: Problem with malloc, probably not enough space
What it means: You are out of memory; get more swap space.

Message: Your pedigree has more loops than allowed by the constant maxloop
What it means: You must increase maxloop in commondefs.h. Starting
      with FASTLINK 3.0P, maxloop also occurs in unknown.c.
      You are *strongly encouraged* to read loops.ps.

Message: The program will exit politely to allow you to correct the problem
What it means: I am sparing you a core dump.

Message: Error opening ipedfile.dat and pedfile.dat.
What it means: Something is wrong in your lcp script or your usage of it.

Message: NOTE: attempting to continue previous (unfinished) run
What it means: FASTLINK thinks you want to recover from a crash.

Message: Data recovered
What it means: FASTLINK is recovering from a crash whether you like it or not.

Message: Illegal instruction (on Suns)
What it means: maxhap is probably too big causing you to blow out the stack in segdown or segup

Message: The next pedigree appears to have an unbroken loop
What it means: You failed to use the loops program properly as part of makeped. See Chapter 7 of Terwilliger and Ott.

|*| Incompatibility Errors in UNKNOWN

One of the main purposes of UNKNOWN is to detect violations of Mendelian rules of inheritance. In LINKAGE and FASTLINK, through version 2.2, error detection was done only for loopless pedigrees and the program would report only the erroneous pedigree/locus pair.

In FASTLINK 2.3P, the loopless error checking was improved so that the program now pinpoints the nuclear family which contains the error. It is not possible for the program to determine automatically whether it is a parent or a child (or both) whose genotype must be changed.

Sometimes, the program will pinpoint multiple nuclear families that are in error in the same pedigree. In this situation, only the first nuclear family is sure to be wrong; the others may be propagated consequences of the first error detected. It may not be possible to determine whether they are separate errors or not without correcting the first error. If you want to see the first error only, change the default value of the constant ONE_ERROR_ONLY to 1.

In FASTLINK 3.0P, UNKNOWN now detects incompatibility errors in looped pedigrees. However, it reports only the pedigree/locus pair. If you wish to have the nuclear families pinpointed, then artificially remove all the loops by replacing every number that is 2 or higher in column 9 of the pedigree file with a 0. Then re-run UNKNOWN. Do not throw away your original pedigree file, since you will want to fix the genotype errors there and use that file for the actual computations.
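
The column 9 replacement described above can be sketched in Python. This is a minimal illustration, assuming white-space separated records with the loop code in column 9. It returns modified copies and leaves the original records untouched; the function name is hypothetical.

```python
def strip_loop_markers(lines, loop_column=9):
    """Return copies of pedigree records with codes >= 2 in column 9 set to 0."""
    cleaned = []
    for line in lines:
        fields = line.split()
        idx = loop_column - 1
        # Only touch records that have the column and hold a number of 2 or more.
        if len(fields) > idx and fields[idx].isdigit() and int(fields[idx]) >= 2:
            fields[idx] = "0"
        cleaned.append(" ".join(fields))  # fields are rejoined with single spaces
    return cleaned

# Hypothetical records: the first carries a loop code of 2 in column 9.
records = ["1 5 1 2 0 0 0 1 2 1 2",
           "1 6 1 2 0 0 0 2 1 1 2"]
for record in strip_loop_markers(records):
    print(record)
```

Write the result to a separate file (not over pedfile.dat) so that the original pedigree file remains available for the actual computations.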

Here are some UNKNOWN-specific error messages:

Message: Reduce max_vectors_considered to 9999
What it means: You have a looped pedigree, probably with multiple
               loops. UNKNOWN is running out of memory keeping track of
               all the possible loop breaker vectors. If you reduce the
               constant max_vectors_considered you trade space for time.
               The genotype inference for loops becomes less precise, but
               takes less space.

Message: Error opening pedfile.dat in UNKNOWN
What it means: pedfile.dat is not there or you do not have permission
               to read it. This error could arise if you are doing
               multiple runs in the same directory simultaneously
               (this is a no-no for both LINKAGE and FASTLINK) or
               your directory permissions are not set up properly.

Message: foundped() found 0 pedigrees - UNKNOWN
What it means: Something is seriously wrong with pedfile.dat.
               It's hard to imagine what could cause this, but
               the message is in there for safety.

Message: Press <Enter> to continue
What it means: Recent versions of UNKNOWN ask for an interactive response
               when errors occur. This was introduced
               by Terwilliger and Ott. Press <Enter> if you want
               incompatibilities checked for the remaining pedigrees in your
               data set. Otherwise, kill the program.

Message: You must increase the constant maxloop
What it means: In FASTLINK 3.0P, maxloop is defined in both unknown.c and
               commondefs.h. In both files, the value must be at least
               as large as the number of loops in each pedigree. In
               previous versions of FASTLINK, maxloop appeared only
               in commondefs.h. Edit unknown.c and commondefs.h to
               increase maxloop.

Message: One incompatibility involves the family in which person 17 is a parent
What it means: You have a violation of Mendelian rules of inheritance in
               the current pedigree. This message will be printed
               before the message for the whole pedigree.
               Here "family" means "nuclear family", including parents
               and children.
               The first nuclear family that is pinpointed definitely has
               an error (see the introduction to this section). Note that
               the individuals are counted starting at 1 with each pedigree, so
               17 means the 17th person listed in pedfile.dat for the current
               pedigree. Note that if 17 is involved in multiple marriages,
               each of these should be checked.

Message: One incompatibility involves the family in which person 9 is a child
What it means: Essentially the same as the previous error message, except that
               there are two ways of flagging errors depending on how the
               pedigree is traversed.

Message: The next pedigree appears to have an unbroken loop
What it means: The program is getting into an infinite loop probably because
               you have not broken a loop properly. The LINKAGE preprocessor
               program MAKEPED can be used to break loops before
               running UNKNOWN.

Message: ERROR: Incompatibility detected in this family for locus 2
What it means: This is the overall incompatibility message for a
               pedigree. Here "family" means "pedigree".
               Note that locus 2 here is post-lsp locus numbering,
               so it means the second locus in your analysis.

Message: ERROR: File empty or inconsistent.
What it means: One of pedfile.dat and datafile.dat is not there or has the wrong permissions.

|*| LSP Error Messages

In many of the following error codes, substituting S for P in the fifth letter means the problem is in the secondary file, rather than the primary file. Almost nobody uses secondary files.

Code: LN1RPR
What it means: First line of datain.dat does not have 4 numbers on it. The 4 numbers are:
               Number of loci, Risk locus, X-linked, Program code

Code: NOLIPR
What it means: Number of loci is less than 1 or bigger than the maximum allowed by lsp.

Code: RKLIPR
What it means: Risk locus is < 0 or bigger than the number of loci.
               Risk locus should be 0 unless you want to do a risk
               calculation.

Code: XLKIPR
What it means: The X-linked status is something other than 0 (autosomal) or 1 (X-linked)

Code: PRGIPR
What it means: Program code is not valid

Code: MPLXPR
What it means: Program code is not valid

Code: NLEXPR
What it means: I wish I knew!

Code: LN2RPR
What it means: There is a problem reading the second line of the locus file.
               This should have 4 numbers:
               Mutation locus, Male mutation rate, Female mutation rate, Disequilibrium
               Unless you are a LINKAGE wizard, I *strongly* recommend that
               this line should always be:
               0 0.0 0.0 0

Code: MTLIPR
What it means: Mutation locus is out of range

Code: MMRIPR
What it means: Male mutation rate is out of range

Code: FMRIPR
What it means: Female mutation rate is out of range

Code: MTMXPS
What it means: Mutation locus index is not 0

Code: DISIPR
What it means: Disequilibrium is not 0 or 1

Code: DENXPR
What it means: Disequilibrium is not 0

Code: LN3RPR
What it means: Problem reading the third line of the locus file, which
               specifies the locus order. Usually this means that the
               number of entries on this line does not match the
               number of loci specified in the first line of the locus file.

Code: LN5RPR
What it means: Problems reading line with sex difference and interference

Code: LN6RPR
What it means: Problems reading line with male recombination fractions

Code: LN7RPR
What it means: Problems reading line with female recombination fractions

Code: LCOIPR
What it means: Entry in locus order is not between 1 and the number of loci specified.

Code: SXDIPR
What it means: Problems reading the sex difference entry in the line
               immediately after the last locus, which has two numbers:
               Sex difference, Interference

Code: INFIPR
What it means: Problems reading the interference entry, which should be 0, 1 or 2.

Code: MRFIPR
What it means: Male recombination fraction not in the range [0.0, 1.0]

Code: GDRIPR
What it means: Problems reading either the sex difference ratio

Code: FRFIPR
What it means: Female recombination fraction not in the range [0.0, 1.0]

Code: PNORPP
What it means: Problems reading column 1 entry in pedigree file.
               This is the most common lsp error. It occurs when there
               are extra blanks at the end of the file.

Code: IIDRP
What it means: Problems reading column 2 entry in pedigree file.

Code: PIDRP
What it means: Problems reading column 3 entry in pedigree file.

Code: MIDRP
What it means: Problems reading column 4 entry in pedigree file.

Code: FOSRPP
What it means: Problems reading column 5 entry in pedigree file.

Code: NPSRPP
What it means: Problems reading column 6 entry in pedigree file.

Code: NMRSPP
What it means: Problems reading column 7 entry in pedigree file.

Code: SEXRPP
What it means: Problems reading column 8 entry in pedigree file.

Code: PRORPP
What it means: Problems reading column 9 entry in pedigree file.

Code: QANRPP
What it means: Problems reading value for quantitative locus in pedigree file. Beware of spurious carriage returns.

Code: AFFRPP
What it means: Problems reading affection status entry in pedigree file. Beware of spurious carriage returns.

Code: BINRPP
What it means: Problems reading binary code entry in pedigree file. Beware of spurious carriage returns.

Code: ALERPP
What it means: Problems reading allele entry in pedigree file. Beware of spurious carriage returns.

Code: FLDRPR
What it means: Cannot find two entries on the first line of a locus
               description. First entry is locus type; the meaning of the
               second entry depends on locus type.

Code: LDCIPR
What it means: First entry in a locus description is something other than 1,2,3,4

Code: NALIPR
What it means: Second entry of a locus description is < 1

Code: FGFRPR
What it means: Problems finding an allele frequency

Code: GFQIPR
What it means: Allele frequency is not in the open interval (0.0,1.0).
               Beware that Genethon publishes some allele frequencies
               as 0.0.

Code: GFSXPR
What it means: Warning if allele frequencies sum to < 0.95 or more than 1.05
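
The two allele frequency checks described for GFQIPR and GFSXPR can be applied to your own numbers before running lsp. The following Python sketch is an illustration of those two rules only; it is not lsp's code, and the function name is hypothetical.

```python
def check_allele_frequencies(freqs, low=0.95, high=1.05):
    """Return a list of problems found in one locus's allele frequencies."""
    problems = []
    for i, f in enumerate(freqs, 1):
        # Each frequency must lie strictly between 0.0 and 1.0 (GFQIPR).
        if not 0.0 < f < 1.0:
            problems.append(f"allele {i}: frequency {f} not in (0.0, 1.0)")
    total = sum(freqs)
    # The sum should be between 0.95 and 1.05 (GFSXPR warning).
    if total < low or total > high:
        problems.append(f"frequencies sum to {total:.4f}")
    return problems

# Hypothetical locus with a published frequency of 0.0 for the first allele.
print(check_allele_frequencies([0.0, 0.6, 0.4]))
```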

Code: NQVRPR
What it means: Problems reading a quantitative trait locus

Code: NQVIPR
What it means: Number of classes for a quantitative trait locus is < 1

Code: GTMRPR
What it means: Problem reading details of a quantitative trait locus

Code: VARRPR
What it means: Problems reading variance for quantitative trait locus

Code: VARIPR
What it means: A variance component is < 0.0

Code: CVMRPR
What it means: Problems reading a covariance component

Code: VMLRPR
What it means: Something to do with a quantitative trait locus, but I don't know what

Code: VMLIPR
What it means: Something to do with a quantitative trait locus, but I don't know what

Code: NLCRPR
What it means: Problems reading number of liability classes for affection status

Code: NLCIPR
What it means: Number of liability classes is < 1

Code: GTPRPR
What it means: Problems reading a penetrance

Code: GTPIPR
What it means: A penetrance is not in the range [0.0, 1.0]

Code: NBFRPR
What it means: Problems reading number of factors for a binary factors locus

Code: NBFIPR
What it means: Number of factors is < 1

Code: BFCRPR
What it means: Problems reading the meaning of a binary factor combination

Code: BFCIPR
What it means: A binary factor is not 0 or 1

Code: RKAPR
What it means: Problems reading risk allele

Code: RKIPR
What it means: Risk allele is < 1

Code: CMDRCI
What it means: Problems parsing the arguments to lsp

Code: CMDOPN
What it means: Cannot open one of the data files or arguments to lsp are wrong

Code: PEDRCI
What it means: Not enough arguments to lsp

Code: PEDOPN
What it means: Cannot open one of the data files or arguments to lsp are wrong

Code: PARRCI
What it means: Not enough arguments to lsp

Code: PAROPN
What it means: Cannot open one of the data files or arguments to lsp are wrong

Code: NOLRCI
What it means: Not enough arguments to lsp

Code: NOLICI
What it means: Number of loci given to lsp is < 2 or too many

Code: LCORCI
What it means: Not enough arguments to lsp

Code: LCOICI
What it means: Invalid locus number in locus order

Code: INFRCI
What it means: Not enough arguments to lsp

Code: INFICI
What it means: Interference value is not 0,1, or 2 in call to lsp

Code: SXDRCI
What it means: Not enough arguments to lsp

Code: SXDICI
What it means: Sex difference argument to lsp is not 0,1, or 2

Code: MRFRCI
What it means: Not enough arguments to lsp

Code: MRFICI
What it means: Male recombination fraction argument to lsp is not between 0.0 and 1.0

Code: GDRRCI
What it means: Not enough arguments to lsp

Code: GDRICI
What it means: Problems reading genetic distance ratio as argument to lsp

Code: FRFRCI
What it means: Not enough arguments to lsp

Code: FRFICI
What it means: Problems reading a female recombination fraction as an argument to lsp

Code: CMDPAR
What it means: Too many arguments to lsp

Code: PDFOPN
What it means: Problems opening pedigree file

Code: DTFOPN
What it means: Problems opening data file

Code: LOGOPN
What it means: Problems opening lsp logfile

Code: STMOPN
What it means: Problems opening stream file

Code: LEPIPR
What it means: I wish I knew

Code: LEPRPR
What it means: You cannot do this with LODSCORE or ILINK

Code: GNPIPR
What it means: Problems with iterated parameters

Code: GNPRPR
What it means: You cannot do this with LODSCORE or ILINK

Code: TLCRCI
What it means: Not enough arguments to lsp

Code: TLCIC
What it means: Locus number is < 1 or too high as argument to lsp

Code: STVRCI
What it means: Not enough arguments to lsp

Code: STVRCI
What it means: Stop value for moving theta is not between 0.0 and 1.0

Code: GRSRCI
What it means: Not enough arguments to lsp

Code: GRSICI
What it means: Number of evaluations in interval or LINKMAP is < 1

Code: RFVRCI
What it means: Not enough arguments to lsp

Code: RFVICI
What it means: For MLINK usage recombination fraction to vary is < 1 or > number of loci

Code: INVRCI
What it means: Not enough arguments to lsp

Code: INVICI
What it means: Increment value for MLINK is <= 0.0

Code: NOERCI
What it means: Not enough arguments to lsp

Code: NOEICI
What it means: Number of additional likelihood evaluations for MLINK is < 0 or > some specified limit.

Code: IRFRCI
What it means: Not enough arguments to lsp

Code: IRFICI
What it means: Initial recombination fraction for MLINK is not in the range [0.0, 1.0]
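
The MLINK range checks described above (RFVICI, INVICI, NOEICI, IRFICI) can be summarized in a short Python sketch. This is an illustration of the documented conditions only, not the lsp source. The function and parameter names are hypothetical, and max_extra is an assumed placeholder for the limit that the NOEICI description leaves unspecified.

```python
def check_mlink_arguments(n_loci, vary, increment, n_extra, initial_theta,
                          max_extra=1000):
    """Return the lsp-style codes that these MLINK settings would trigger.

    max_extra is a hypothetical stand-in for the unspecified NOEICI limit.
    """
    codes = []
    if vary < 1 or vary > n_loci:
        codes.append("RFVICI")   # recombination fraction to vary out of range
    if increment <= 0.0:
        codes.append("INVICI")   # increment must be positive
    if n_extra < 0 or n_extra > max_extra:
        codes.append("NOEICI")   # number of additional evaluations out of range
    if not (0.0 <= initial_theta <= 1.0):
        codes.append("IRFICI")   # initial recombination fraction outside [0.0, 1.0]
    return codes


print(check_mlink_arguments(n_loci=3, vary=1, increment=0.05,
                            n_extra=5, initial_theta=0.0))
```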

Code: INTERR
What it means: Internal error in lsp. Heaven help you if you get this code!

Code: CMDNTF
What it means: Lsp does not understand how to set up for this program. I think you get this if you ask to run a program that is not one of the LINKAGE main programs.

Code: CMDNTU
What it means: Similar to CMDNTF. I can't tell the difference.

Code: CMDNOD
What it means: Probably some junk characters in input

Code: SPDRCI
What it means: Looking for name of secondary pedigree file and can't find it

Code: SPDOPN
What it means: Problems opening secondary pedigree file

Code: SPRRCI
What it means: Looking for name of secondary locus file and can't find it

Code: SPROPN
What it means: Problems opening secondary locus file

Code: OPDRCI
What it means: Problems finding the name of output pedigree file (to use as input to unknown)

Code: OPDOPN
What it means: Problems opening output pedigree file

Code: OPRRCI
What it means: Problems finding the name of output locus file

Code: OPRRCN
What it means: Problems opening output locus file

Code: FTLXSP
What it means: Problems setting up secondary pedigree file

Code: SPEMP
What it means: Individual has index 0

Code: FSKXSP
What it means: Problems with secondary pedigree file

Code: PPDEMP
What it means: Problems reading a pedigree number

Code: PLNRSP
What it means: Problems reading from secondary pedigree file

Code: PNMXPS
What it means: Problems merging primary and secondary pedigree files

Code: INMXPS
What it means: Problems merging primary and secondary pedigree files

Code: FIMXPS
What it means: Problems merging primary and secondary pedigree files

Code: MIMXPS
What it means: Problems merging primary and secondary pedigree files

Code: FOMXPS
What it means: Problems merging primary and secondary pedigree files

Code: NPMXPS
What it means: Problems merging primary and secondary pedigree files

Code: SXMXPS
What it means: Problems merging primary and secondary pedigree files

Code: IIDIPP
What it means: Problems merging primary and secondary pedigree files

Code: PIDIPP
What it means: Problems merging primary and secondary pedigree files

Code: MIDIPP
What it means: Problems merging primary and secondary pedigree files

Code: FOSIPP
What it means: Problems merging primary and secondary pedigree files

Code: NPSIPP
What it means: Problems merging primary and secondary pedigree files

Code: NMSIPP
What it means: Problems merging primary and secondary pedigree files

Code: SEXIPP
What it means: Problems merging primary and secondary pedigree files

Code: PROIPP
What it means: Problems merging primary and secondary pedigree files

|*| LRP Error Messages

Message: Screen width is too small
What it means: If you are using a one-window system, there is not much you can do. However, if you have control over your windows, it may help to widen the window in which you run lrp and start over.

Message: Screen length is too small
What it means: Similar to previous message. Try lengthening your window and starting over.

Message: Internal Error
What it means: If there is no modifier to describe the Internal Error you have hit a bug in lrp.

Message: Internal Error - Length of 'lrp_rprt_scrn' exceeded LRP_MAX_STRING_BUFFER_LENGTH
What it means: You hit a bug in lrp and the authors of the program are protecting you from a core dump.

Message: Internal Error - Length of 'lrp_hlp1_scrn' exceeded LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message

Message: Internal Error - Length of 'lrp_hlp2_scrn' exceeded LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message

Message: Internal Error - Length of 'lrp_hlp3_scrn' exceeded LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message

Message: Internal Error - Length of 'lrp_help_line' exceeded LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message

Message: Internal Error - Length of 'lrp_info_line' exceeded LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message

Message: Internal Error - Length of 'lrp_cmmd_line' exceeded LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message

Message: Internal Error - Length of 'lrp_wait_line' exceeded LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message

Message: Internal Error - Length of 'lrp_vers_line' exceeded LRP_MAX_STRING_BUFFER_LENGTH
What it means: See the previous message

Message: Internal Error - Memory allocation failure
What it means: You are out of memory. Look around for other processes that may be using all the memory.

Message: Internal Error - Bad field number
What it means: There was a problem in the way you specified the report format

Message: Internal Error - Function FSEEK failed
What it means: There was a problem modifying the report file. If your disk is on a different machine, this might be a network problem.

Message: Internal Error - Function TMPNAM failed
What it means: I do not know

Message: Internal Error - Function FOPEN failed
What it means: Could not open the file that you designated as the report file. Possible reasons include improper permission for the directory you are working in or a disk problem.

mutl.c: Internal Error - Function LSF_REWIND failed
What it means: Could not read from the stream file that you designated. Maybe it doesn't exist. Maybe the permission is wrong. Maybe there is a disk problem.

mutl.c: Internal Error - Function LSF_STATUS_TEXT failed
What it means: While attempting to print out an error message, another error occurred. I cannot figure out why this would happen, though.

mutl.c: Internal Error - Function LSF_INFORMATION failed
What it means: While trying to figure out if the stream file was properly formatted, an error occurred. This is probably not an error with the contents of the stream file, but with access to it.

rful.c: Internal error - LSF_READ error detected
What it means: Problems reading the contents of your stream file. Although the lsf_read routine reports a diagnostic of the error, this diagnostic is not used in the error printing routine.

rloc.c: Internal error - LSF_ALLOCATE error detected
What it means: Memory allocation problem

rloc.c: Internal error - LSF_READ_SET error detected
What it means: Problems reading the contents of the stream file

ulth.c: Must specify temporary file name
What it means: You mangled the file specifications for the input or output files. Start over.

ulth.c: Must specify report file name
What it means: You mangled the file specifications for the input or output files. Start over.

ulth.c: Must specify stream file name
What it means: You mangled the file specifications for the input or output files. Start over.

ulth.c: Must specify report title
What it means: You mangled the file specifications for the input or output files. Start over.


::::::::::::::
README.unknown
::::::::::::::

From: ftp-bimas.cit.nih.gov Last mod: June 27, 1999

FASTLINK, version 2.3P and beyond

This file describes some modifications to the UNKNOWN preprocessor program introduced in FASTLINK 2.3P and beyond. We have improved the error reporting capability and fixed some bugs. A scholarly description of what UNKNOWN does can be found in unknown.ps.

One purpose of UNKNOWN is to catch violations of Mendelian rules of inheritance. Previous versions of UNKNOWN reported an error by identifying the pedigree and locus of the error. The new version tries to identify the nuclear family or families in which errors occur. When an error occurs, at least one nuclear family will be identified, either by a child or by a parent. For any pedigree-locus pair, the first nuclear family identified is guaranteed to contain an error. Subsequent nuclear families identified may or may not contain errors. If you want to see only the first nuclear family with an error, then change the constant ONE_ERROR_ONLY from 0 to 1.
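
To illustrate what a Mendelian violation within a nuclear family looks like, here is a minimal Python sketch for a single codominant locus. It is far simpler than what UNKNOWN does (see unknown.ps for the real method) and checks only each child against its two parents. The function names, the genotype representation (a pair of allele numbers, with 0 meaning unknown) and the example family are all assumptions made for illustration.

```python
def child_is_consistent(father, mother, child):
    """True if the child's genotype can be inherited from the two parents.

    Genotypes are (allele, allele) tuples; allele 0 means unknown and
    is treated as compatible with anything.
    """
    def can_give(parent, allele):
        return allele == 0 or 0 in parent or allele in parent

    a, b = child
    # One allele must come from the father and the other from the mother.
    return ((can_give(father, a) and can_give(mother, b)) or
            (can_give(father, b) and can_give(mother, a)))


def inconsistent_children(father, mother, children):
    """Return the IDs of children whose genotypes violate Mendelian rules."""
    return [cid for cid, genotype in children.items()
            if not child_is_consistent(father, mother, genotype)]


# Hypothetical nuclear family: child c2 cannot get allele 2 (or 1) from the mother.
family = {"c1": (1, 3), "c2": (1, 2), "c3": (0, 0)}
print(inconsistent_children((1, 2), (3, 4), family))
```

A pairwise check like this cannot tell whether the child or a parent carries the mistyped genotype, which is consistent with the note above that a family may be identified either by a child or by a parent.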

I fixed a printing bug that arose when the number of liability classes was greater than 99. As a result the output of UNKNOWN will be spaced differently than before. Thanks to Margaret Gelder Ehm for reporting the bug.

Lots of other aspects of UNKNOWN are explained in the documents unknown.ps and loops.ps, including subsequent changes to UNKNOWN. Both FASTLINK 3.0P and 4.0P introduced fundamental, drastic changes to UNKNOWN that are best explained in a longer, more scholarly document. These changes make the interesting part of UNKNOWN incompatible with previous versions of FASTLINK and LINKAGE. However, extra computation is done for the sake of backwards compatibility.

Version 3.0P introduced much better inference for looped pedigrees including for the first time the ability to detect Mendelian inconsistencies in looped pedigrees. Version 4.1P improved some of the loop breaker genotype inference algorithms. Information about possible genotypes for different loop breaker vectors is kept in the file loopfile.dat. See README.loopfile for a description of the syntax.

Version 4.0P introduced the ability to improve on the user's choice of loop breakers. For looped pedigrees UNKNOWN chooses a provably optimal loop breaker set. For looped pedigrees with multiple marriages UNKNOWN and the main programs can now use one loop breaker to break multiple loops. In version 4.1P these methods were enhanced with a better algorithm for pedigrees with multiple marriages, and the ability to select loop breakers from scratch without having to rely on the makeped and LOOPS preprocessor programs. For a scholarly description of the loop breaker selection methods see paper6.ps and paper7.ps.
For a scholarly description of what loop breakers are all about see loops.ps.
For a simple practical method to choose loop breakers see README.lselect.