Tutorial #5 by Ma’ayan Fishelson

Tutorial #5 by Ma’ayan Fishelson. Input Format of Superlink There are 2 input files: –The locus file describes the loci being analyzed and parameters.

Jan 15, 2016


Tutorial #5 by Ma’ayan Fishelson


Input Format of Superlink

• There are 2 input files:

– The locus file describes the loci being analyzed and parameters for the different analysis programs.

– The pedigree file describes the pedigrees being analyzed.

• Locus File: The first 3 lines describe some general parameters of the analysis being performed. Following are lines that describe each locus. The end of the file provides recombination information and program-specific information.

• Pedigree File: Each line in this file describes an individual in one of the pedigrees.


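Locus File

3 0 0 5 0
0 0.0 0.0 0
1 2 3
1 2
0.406000 0.594000
3
0.650000 0.322000 0.487000
0.635000 0.349000 0.203000
0.903000 0.473000 0.945000
3 4
0.132000 0.048000 0.299000 0.521000
3 5
0.138000 0.175000 0.055000 0.272000 0.360000
0 0
0.100000 0.100000
1 0.10000 0.30000

Callouts on the slide:

• # of loci: 3 (first value of the first line).

• Program code: 5 (first line).

• # of disease loci (0 or 2).

• Chromosome order of the loci: 1 2 3.

• Affection Status locus (disease locus): locus type 1 in the line "1 2".

• Number of alleles for 1st locus: 2.

• Number of penetrance classes for Affection Status locus: 3.

• Penetrances: the three lines that follow the number of penetrance classes.

• Numbered Alleles locus (marker): locus type 3 in the lines "3 4" and "3 5".

• Gene frequencies for 2nd locus: 0.132000 0.048000 0.299000 0.521000.

• Recombination values: 0.100000 0.100000.

• Program-specific parameters: 1 0.10000 0.30000.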


Pedigree File

Each line holds the following columns, from left to right:

• Pedigree Number

• Individual’s ID

• Father’s ID

• Mother’s ID

• First child’s ID

• Next paternal sibling’s ID

• Next maternal sibling’s ID

• Sex: 1=male, 2=female

• One column that carries no label on the slide (0 in every row)

• Disease Status: 0=unknown, 1=unaffected, 2=affected

• Penetrance Class

• Marker Alleles (2 alleles per locus): 1st marker, then 2nd marker

1 1 0 0 3 0 0 1 0 2 2 3 2 1 1
1 2 0 0 3 0 0 2 0 2 2 1 4 2 2
1 3 1 2 6 4 4 2 0 0 2 4 2 2 1
1 4 1 2 0 0 0 1 0 0 3 1 3 2 1
1 5 0 0 6 0 0 1 0 0 3 4 4 5 4
1 6 5 3 0 7 7 2 0 0 3 2 4 1 4
1 7 5 3 0 8 8 1 0 0 3 4 4 2 4
1 8 5 3 0 0 0 2 0 2 1 4 4 1 5
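As a rough illustration of this layout, the following Python sketch parses one pedigree line into named fields. The names `Individual` and `parse_pedigree_line` are hypothetical, and the sketch assumes the column order of the example above: the disease locus comes first, and the ninth column (unlabeled on the slide) is skipped.

```python
from typing import List, NamedTuple, Tuple


class Individual(NamedTuple):
    pedigree: int
    ident: int
    father: int
    mother: int
    first_child: int
    next_paternal_sib: int
    next_maternal_sib: int
    sex: int                # 1 = male, 2 = female
    disease_status: int     # 0 = unknown, 1 = unaffected, 2 = affected
    penetrance_class: int
    markers: List[Tuple[int, int]]  # two alleles per marker locus


def parse_pedigree_line(line: str) -> Individual:
    """Parse one pedigree line (assumed layout, see the lead-in)."""
    fields = [int(tok) for tok in line.split()]
    if len(fields) < 11:
        raise ValueError("too few columns in pedigree line")
    head = fields[:8]
    # fields[8] is the column that has no label on the slide; it is ignored.
    status, pen_class, alleles = fields[9], fields[10], fields[11:]
    if len(alleles) % 2:
        raise ValueError("each marker needs exactly two alleles")
    markers = list(zip(alleles[0::2], alleles[1::2]))
    return Individual(*head, status, pen_class, markers)
```

For the third line of the example, `parse_pedigree_line("1 3 1 2 6 4 4 2 0 0 2 4 2 2 1")` yields individual 3 with father 1, mother 2, and marker genotypes 4/2 and 2/1.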


Possible Input Errors

• Incompatibility between the 2 input files (in the number of loci, in the order of specification of the loci,…)

• Errors in Locus File: probabilities don’t sum to 1, impossible values for recombination fractions or other probabilities, incompatibility between number of loci and number of loci descriptions…

• Errors in Pedigree file: no correspondence between child and parent, pointer problems, genotyping errors…
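Some of the locus-file errors listed above can be caught with simple checks. The sketch below is only an illustration: the function names are hypothetical, the tolerance is arbitrary, and recombination fractions are assumed to lie in the usual range [0, 0.5].

```python
import math
from typing import List


def check_frequencies(freqs: List[float], tol: float = 1e-6) -> List[str]:
    """Report problems with one locus's gene frequencies."""
    problems = []
    if any(f < 0.0 or f > 1.0 for f in freqs):
        problems.append("frequency outside [0, 1]")
    if not math.isclose(sum(freqs), 1.0, abs_tol=tol):
        problems.append("frequencies do not sum to 1")
    return problems


def check_recombination(thetas: List[float]) -> List[str]:
    """Report recombination fractions outside the assumed range [0, 0.5]."""
    return [f"recombination fraction {t} outside [0, 0.5]"
            for t in thetas if not 0.0 <= t <= 0.5]
```

Applied to the example locus file, all three frequency lines and both recombination values pass these checks.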


Genotyping Errors

Can be divided into 2 types:

1. Errors that can be detected when observing one marker.

2. Errors that can be detected only when observing several adjacent markers.


PedCheck (Jeffrey O’Connell and Daniel Weeks)

• A program for identification of genotype incompatibilities in linkage analysis.

• Genotype incompatibilities are detected in 4 stages:

1. Level 1: Performs checks on the nuclear family level.

2. Level 2: Uses the Lange-Goradia algorithm to perform genotype elimination.

3. Level 3: Determines “critical genotypes”.

4. Level 4: Determines alternative typing for the critical genotypes, and finds the most likely person to be mistyped.


Example 1a – Level 1 errors

[Pedigree diagram: individuals 1 to 7; genotype labels 4/3, 5/1, 2/1, 4/9 and 4/3.]

List the errors. Assume there are 6 alleles at this marker.


Example 1b – Level 1 errors

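[Two pedigree diagrams, each with individuals 1 to 5. First pedigree: genotype labels 4/3, 4/1 and 2/2. Second pedigree: genotype labels 4/4, 2/1 and 2/2.]

List the errors here.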


Level 1 Errors

• Incompatibility between a child and a parent’s alleles.

• A person is half-typed.

• More than 4 alleles in a sibship.

• More than 3 alleles in a sibship when there is a homozygous child.

• More than 2 alleles in a sibship when there are 2 different homozygous children.

• The allele is out of bounds.
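The checks in this list can be sketched for a single nuclear family as follows. This is an illustrative Python sketch, not PedCheck's implementation: the function names are hypothetical, a genotype is a pair of allele numbers, and 0 is assumed to mark an untyped allele.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # (allele, allele); 0 marks an untyped allele


def is_typed(g: Genotype) -> bool:
    return g[0] != 0 and g[1] != 0


def child_compatible(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if the child can take one allele from each typed parent."""
    def ok(from_father: int, from_mother: int) -> bool:
        return ((not is_typed(father) or from_father in father) and
                (not is_typed(mother) or from_mother in mother))
    return ok(child[0], child[1]) or ok(child[1], child[0])


def level1_errors(father: Genotype, mother: Genotype,
                  children: List[Genotype], n_alleles: int) -> List[str]:
    errors = []
    people = [("father", father), ("mother", mother)]
    people += [(f"child {i + 1}", g) for i, g in enumerate(children)]
    for name, (a, b) in people:
        if (a == 0) != (b == 0):
            errors.append(f"{name} is half-typed")
        if not (0 <= a <= n_alleles and 0 <= b <= n_alleles):
            errors.append(f"{name} has an allele out of bounds")
    typed = [(name, g) for name, g in people[2:] if is_typed(g)]
    for name, g in typed:
        if not child_compatible(g, father, mother):
            errors.append(f"{name} is incompatible with the parents")
    # Sibship limits: 4 alleles, 3 with one homozygous child,
    # 2 with two different homozygous children.
    alleles = {a for _, g in typed for a in g}
    homozygous = {g[0] for _, g in typed if g[0] == g[1]}
    limit = 4 if not homozygous else 3 if len(homozygous) == 1 else 2
    if len(alleles) > limit:
        errors.append(f"{len(alleles)} alleles in the sibship, at most {limit} allowed")
    return errors
```

For instance, `level1_errors((4, 3), (0, 0), [(4, 1), (2, 2)], 6)` reports that child 2 is incompatible with the parents, because a 2/2 child cannot take an allele from a 4/3 father.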


Level 2 Errors

• Performs genotype elimination via an extended version of the Lange-Goradia algorithm for set-recoded genotypes.

• This algorithm recursively uses the nuclear-family relationships to eliminate invalid genotypes in the pedigree. It continues until no more genotypes can be eliminated.

• For each pedigree and locus: identifies the first nuclear family with an error that hasn’t been detected in level 1, and outputs the inferred genotype lists.


Example 2 – Level 2 errors

[Pedigree diagram: individuals 1 to 5; genotype labels 3/1, 3/1, 3/2 and 4/3.]


Genotype Elimination Algorithm

A. For each pedigree member, save only ordered genotypes compatible with his/her phenotype.

B. For each nuclear family:

1. Consider each mother-father genotype pair:

a. Determine which zygotes can arise from this pair.

b. If each child in the nuclear family has one or more of these zygote genotypes among his or her current genotype list, then save the parental genotypes and any child genotype matching one of the created zygote genotypes.

c. If any child has none of these zygote genotypes among his/her genotype list, then don’t save any genotypes.

2. For each person in the nuclear family, exclude any genotypes not saved during step (1).

C. Repeat part (B) until no more genotypes can be excluded.
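Steps A to C can be sketched in Python as below. This is a minimal illustration under stated assumptions, not PedCheck's code: genotypes are ordered (paternal, maternal) pairs, the phenotype at a marker is taken to be an unordered allele pair or `None` for an untyped person, and the names `initial_genotypes` and `eliminate` are hypothetical.

```python
from itertools import product


def initial_genotypes(observed, alleles):
    """Step A: ordered genotypes compatible with the observed typing."""
    if observed is None:                 # untyped: every ordered pair is possible
        return set(product(alleles, repeat=2))
    a, b = observed
    return {(a, b), (b, a)}


def eliminate(genotypes, families):
    """Steps B and C.

    genotypes: dict person -> set of ordered genotypes (modified in place)
    families:  list of (father, mother, [children]) nuclear families
    """
    changed = True
    while changed:                       # step C: repeat until nothing is excluded
        changed = False
        for father, mother, children in families:
            saved = {p: set() for p in (father, mother, *children)}
            for gf, gm in product(genotypes[father], genotypes[mother]):
                # B.1.a: zygotes take one allele from the father, one from the mother
                zygotes = set(product(gf, gm))
                matches = [genotypes[c] & zygotes for c in children]
                if all(matches):         # B.1.b: every child has a matching genotype
                    saved[father].add(gf)
                    saved[mother].add(gm)
                    for child, match in zip(children, matches):
                        saved[child] |= match
                # B.1.c: otherwise nothing is saved for this parental pair
            for person, keep in saved.items():   # B.2: drop unsaved genotypes
                if keep != genotypes[person]:
                    genotypes[person] = keep
                    changed = True
    return genotypes
```

An empty genotype list for any person after the loop signals an inconsistency in the pedigree.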


Genotype Elimination Example

[Pedigree diagram: individuals 1 to 5; phenotype labels O, O and A.]


Complete Genotype-Elimination Algorithm

• A genotype elimination algorithm is complete if it can detect that the set of given genotypes violates Mendelian laws of inheritance.

• If a complete genotype elimination algorithm finds no errors, the genotypes are consistent with Mendelian laws of inheritance.


Genotype Elimination - Another Example

[Pedigree diagram: individuals 1 to 7; genotype labels 2/2, 1/2 and 2/3.]

Is the presented genotype elimination algorithm complete?


Additional Problems

• The inferred genotype lists don’t always permit easy identification of the source of the problem:

– The genotype lists may be long.

– More than one individual may be the error source.

– The error may not be in the nuclear family reported.


Critical Genotypes

• Genotypes of an individual that eliminate the pedigree inconsistency when removed from the data (i.e., treated as unknown).

• Note: a critical genotype isn’t necessarily erroneous.

• Degree n critical genotypes: an n-tuple of genotypes of typed individuals that eliminates the inconsistency when all n are treated as unknown simultaneously.

• The set of erroneous genotypes is a subset of the critical genotypes.


Critical-Genotype Algorithm (Level 3)

• Attempts to identify the critical genotypes, if any, in the pedigree.

• “Untypes” one typed individual at a time, and applies the genotype-elimination algorithm to determine if the inconsistency has been eliminated.

• There may be one or more critical genotypes or there may be none. If there are none, higher-degree critical genotypes can be investigated at a higher cost.

• If only one critical genotype is found, this genotype represents the error.
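The "untype one individual at a time" loop can be sketched as follows. This is an illustration only: `critical_genotypes` and `nuclear_family_consistent` are hypothetical names, and the consistency check is a brute-force stand-in for the genotype-elimination algorithm that works on a single nuclear family. The family in the example is made up for the sketch and is not the one on the slides.

```python
from itertools import product


def nuclear_family_consistent(typings, father, mother, children, alleles):
    """Brute-force Mendelian check for one nuclear family (stand-in for level 2)."""
    def options(person):
        g = typings[person]
        if g is None:                          # untyped: any ordered genotype
            return list(product(alleles, repeat=2))
        return [g, g[::-1]]
    for gf, gm in product(options(father), options(mother)):
        zygotes = set(product(gf, gm))         # one allele from each parent
        if all(any(c in zygotes for c in options(child)) for child in children):
            return True
    return False


def critical_genotypes(typings, is_consistent):
    """Return the typed individuals whose untyping removes the inconsistency."""
    critical = []
    for person, genotype in typings.items():
        if genotype is None:
            continue
        trial = dict(typings)
        trial[person] = None                   # "untype" this individual
        if is_consistent(trial):
            critical.append(person)
    return critical


# Hypothetical family: father 1 (1/1), mother 2 (2/2), children 3 (1/2) and 4 (2/2).
example = {1: (1, 1), 2: (2, 2), 3: (1, 2), 4: (2, 2)}
check = lambda t: nuclear_family_consistent(t, 1, 2, [3, 4], alleles=[1, 2])
result = critical_genotypes(example, check)    # individuals 1 and 4
```

In this made-up family two critical genotypes are found (the father's and child 4's), so level 3 alone cannot say which one is the error.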


Example 3 – Level 3 errors

[Pedigree diagram: individuals 1 to 5; genotype labels 1/2, 1/1, 2/2 and 2/2.]


Dilemma…

Several critical genotypes have been identified at a locus.

There’s no way of deciding a priori which one is most likely to be erroneous.


Odds-Ratio Algorithm (Level 4)

Algorithm Outline:

1. For each individual with a critical genotype, identify valid typings that eliminate the inconsistency.

2. Compute the likelihood L of the pedigree data for each alternative typing at each critical genotype, holding all other critical genotypes at their original value.

3. Let Lmax be the largest likelihood obtained. For each alternative genotype compute the odds ratio Lmax/L.

4. Return each alternative typing together with its odds ratio.

Helps distinguish between alternative critical genotypes.

Based on single-locus likelihoods of the pedigree.
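Steps 3 and 4 of the outline amount to a small computation once the likelihoods are available. The sketch below assumes the single-locus likelihoods have already been computed elsewhere; the function name and the numbers in the usage note are hypothetical.

```python
from typing import Dict, Hashable


def odds_ratios(likelihoods: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Map each alternative typing to Lmax / L.

    likelihoods: alternative typing -> pedigree likelihood L (must be > 0),
    e.g. keyed by (individual, alternative genotype).
    """
    l_max = max(likelihoods.values())
    return {typing: l_max / l for typing, l in likelihoods.items()}
```

With made-up likelihoods `{("3", "1/2"): 0.5, ("5", "1/2"): 0.25, ("5", "2/2"): 0.125}` the odds ratios are 1, 2 and 4, so the alternative with ratio 1 is the most likely typing.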


Example 3 – Level 4

[Pedigree diagram: individuals 1 to 5; genotype labels 1/2, 1/1, 2/2 and 2/2. One callout reads "Only one consistent alternative typing: 1/2"; another reads "Two consistent alternative typings: 1/2 & 2/2".]


Odds-Ratio Algorithm (allele frequencies)

There are 3 variations:

1. User-defined allele frequencies.

2. Assume all alleles are equally frequent.

3. Estimate allele-frequencies from typed individuals (leads to a bigger spread in odds ratio).


2nd Type of Genotyping Errors

• The pedigree data indicates a certain recombination event in an interval where θ=0.

• The pedigree data indicates more (or fewer) recombination events than expected according to the specified recombination fractions.


Error Detection in Merlin

• Calculate L(G | θ) and L(G | θ=0.5).

• For each genotype g:

– Mark it as unknown.

– Calculate L(G\g | θ) and L(G\g | θ=0.5).

– Compute the ratio r_linked = L(G\g | θ) / L(G | θ).

– Compute the ratio r_unlinked = L(G\g | θ=0.5) / L(G | θ=0.5).

– Compute the statistic r = r_linked / r_unlinked.

– Genotypes that cause inconsistency with neighboring markers result in large values of r.
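The statistic itself is a ratio of ratios. The following sketch only illustrates that arithmetic; it is not Merlin's interface, the function and parameter names are hypothetical, and the four likelihoods are assumed to be computed elsewhere.

```python
def error_statistic(l_full_linked: float, l_full_unlinked: float,
                    l_drop_linked: float, l_drop_unlinked: float) -> float:
    """r = r_linked / r_unlinked for one genotype g.

    l_full_*: likelihood of all genotypes G at the given theta / at theta = 0.5
    l_drop_*: likelihood with g marked unknown at the given theta / at theta = 0.5
    """
    r_linked = l_drop_linked / l_full_linked
    r_unlinked = l_drop_unlinked / l_full_unlinked
    return r_linked / r_unlinked
```

With made-up values, `error_statistic(0.125, 0.25, 0.5, 0.5)` gives r_linked = 4, r_unlinked = 2 and r = 2: dropping the genotype helps the linked model more than the unlinked one.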


Genotype Elimination in Superlink

• Superlink’s algorithm is composed of 2 types of algorithms:

– Downward traversal algorithm in which the children are updated according to the parents.

– Upward traversal algorithm in which the parents are updated according to the children.

• Genotypes are stored as 2 lists of alleles:

– Possible paternal alleles.

– Possible maternal alleles.


Downward Traversal Algorithm

• Traverses the pedigree in such a manner that a child is updated by his parent only after the parent has been updated.

• The update is performed as follows:

– If nothing is known about the child’s genotype, add all the possible alleles of the parent to the child’s relevant allele list.

– Else, check for each possible allele of the child if it is possible according to the parent.
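A single downward update step might look like the sketch below. It is a hedged illustration, not Superlink's code: `AlleleLists` and `downward_update` are hypothetical names, each list is modeled as a set, and `None` is assumed to mean that nothing is known yet.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class AlleleLists:
    paternal: Optional[Set[int]] = None   # possible paternal alleles
    maternal: Optional[Set[int]] = None   # possible maternal alleles


def downward_update(child: AlleleLists, parent: AlleleLists,
                    parent_is_father: bool) -> None:
    """Update the child's relevant allele list from one parent."""
    if parent.paternal is None or parent.maternal is None:
        return                             # the parent carries no information yet
    transmissible = parent.paternal | parent.maternal
    slot = "paternal" if parent_is_father else "maternal"
    current = getattr(child, slot)
    if current is None:
        # Nothing known about the child: copy the parent's possible alleles.
        setattr(child, slot, set(transmissible))
    else:
        # Keep only the child's alleles that the parent could have transmitted.
        setattr(child, slot, current & transmissible)
```

For a father whose lists are {1} and {2} and an untyped child, the child's paternal list becomes {1, 2}; the maternal list is left untouched until the mother is processed.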


Example: Downward Update

[Pedigree diagram: parents 1 and 2, child 3; the father (2) has genotype 1 | 2.]

The child 3 can only receive alleles 1 or 2 from his father (2).


Upward Traversal Algorithm

• Traverses the pedigree in such a manner that a parent is updated by his child only after the child has been updated.

• The update is performed as follows:

– All the alleles that a child got from the parent for certain are marked.

– If two alleles have been marked as certain, the rest of the alleles are erased (the genotype has been determined).

– Sometimes the genotype is determined including phase.
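One possible reading of this update is sketched below. It assumes that an allele counts as "certain" when a child's relevant list holds exactly one allele, and it treats the parent's genotype as an unordered set, so phase is not tracked. The function name is hypothetical and the sketch is not Superlink's code.

```python
from typing import Iterable, Set


def upward_update(parent_alleles: Set[int],
                  child_lists: Iterable[Set[int]]) -> Set[int]:
    """Narrow a parent's possible alleles using what the children received.

    child_lists: for each child, the possible alleles received from this parent.
    """
    certain: Set[int] = set()
    for alleles in child_lists:
        if len(alleles) == 1:              # this allele came from the parent for certain
            certain |= alleles
    if not certain <= parent_alleles or len(certain) > 2:
        raise ValueError("children's alleles are inconsistent with the parent")
    if len(certain) == 2:
        return set(certain)                # genotype determined; erase the rest
    return set(parent_alleles)
```

For an untyped father at a 4-allele marker whose two children must have received alleles 3 and 4 from him, `upward_update({1, 2, 3, 4}, [{3}, {4}])` returns {3, 4}.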


Example: Upward Update

[Pedigree diagram: father 1, mother 2, children 3 and 4; genotype labels 1 | 1, 1 | 3, 1 | 4 and 3 | 4.]

The father (1) must have transmitted alleles 3 & 4 to the children.

The mother (2) could only have transmitted allele 1 to the children (3 & 4).