WWW Interface (ZygProb) to Calculate MZ:DZ Zygosity Probability using ECLIPSE2
Important Notes:
1. Nyholt DR (2006) On the probability of dizygotic twins being concordant for two alleles at multiple polymorphic loci. Twin Res Hum Genet 9(2):194-197
Please also use the following reference when reporting results from the ZygProb ECLIPSE2 web interface:
2. Sieberts SK, Wijsman EM, Thompson EA (2002) Relationship inference from trios of individuals, in the presence of typing error. Am J Hum Genet 70(1):170-80
This web interface allows users to simply upload 3 input files to obtain ECLIPSE2 (latest version 1.10) likelihood results for a pair of individuals sharing the uploaded marker alleles. Likelihoods are shown for each pair of individuals being Full Siblings (full) = Dizygotic Twins (DZ), Half Siblings (half), Unrelated (unrel), Monozygotic Twins (MZ), Parent-Offspring (P-O), Grandparent-Grandchild (GP-GC), Avuncular (avunc), and First Cousins (FC). Error model 0 is used for all analyses.
Additionally, the MZ/DZ likelihood ratio is given, which represents the odds in favour of the two individuals being an MZ pair compared to a DZ pair [i.e., odds > 1 indicate the pair is more likely to be MZ].
It is particularly important to take note of the results that incorporate error rates. For example, a pair of individuals sharing both alleles at all but one marker may be more likely to be MZ with a genotyping error than DZ.
The ZygProb interface takes the following 3 input files:
1) an eclipse2 (v1.10) format pedigree file ("eclipse2.pre"),
2) an eclipse2 (v1.0, v1.10) format map file ("eclipse2.map"),
3) an eclipse2 (v1.0, v1.10) format error file ("eclipse2.err").
The following links show example input files for Profiler-Plus Markers in Australian Caucasians (Bagdonavicius et al. 2002):
test.pre
test.map
err.in
To run ECLIPSE2 via ZygProb comparing only individuals within a family:
Utilising Australian Caucasian allele frequency data presented in Bagdonavicius et al. (2002) [J Forensic Sci 47(5):1149-53] and formulae from Li 1996 [Hum Biol 68(2):167-184], I have calculated the average probability of a DZ twin pair sharing both alleles identical by state (IBS) at all markers and resulting probability of correct zygosity assignment, for the following commonly used multiplex systems:
AmpFlSTR COfiler (D3S1358, D16S539, TH01, TPOX, CSF1PO, D7S820)
Average probability (DZ pair are IBS=2 at all loci) | 0.004007323 |
Odds for MZ compared to DZ | 249.54 : 1 |
Average certainty of twin pair being MZ (%) | 99.59926766 |
PowerPlex 1.1 (vWA, D16S539, TH01, TPOX, CSF1PO, D5S818, D13S317, D7S820)
Average probability (DZ pair are IBS=2 at all loci) | 0.000620818 |
Odds for MZ compared to DZ | 1610.78 : 1 |
Average certainty of twin pair being MZ (%) | 99.93791820 |
AmpFlSTR Profiler (D3S1358, vWA, FGA, TH01, TPOX, CSF1PO, D5S818, D13S317, D7S820)
Average probability (DZ pair are IBS=2 at all loci) | 0.000200265 |
Odds for MZ compared to DZ | 4993.38 : 1 |
Average certainty of twin pair being MZ (%) | 99.97997346 |
AmpFlSTR Profiler Plus (D3S1358, vWA, FGA, D8S1179, D21S11, D18S51, D5S818, D13S317, D7S820)
Average probability (DZ pair are IBS=2 at all loci) | 0.000097862 |
Odds for MZ compared to DZ | 10218.50 : 1 |
Average certainty of twin pair being MZ (%) | 99.99021383 |
Note: allele frequency data was not available to complete calculations for other multiplex systems
Download a copy of my MS Excel worksheet enabling easy calculation of exact Random Match Probabilities from allele frequency data, for markers in unrelated and full sibling (DZ) pairs - as used to calculate the above probabilities.
Please use the following reference when reporting the above exact average probabilities of correct zygosity assignment:
1. Nyholt DR (2006) On the probability of dizygotic twins being concordant for two alleles at multiple polymorphic loci. Twin Res Hum Genet 9(2):194-197
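The odds and certainty figures tabulated above appear to follow directly from the average probability p that a DZ pair is IBS=2 at all loci. The minimal Python sketch below assumes odds = 1/p and certainty = 100 × (1 − p); these relations are inferred from the tabulated values rather than stated on this page, and the function names are illustrative only.

```python
# Average probability that a DZ pair is IBS=2 at all loci,
# copied from the tables above.
AVG_P_DZ_IBS2 = {
    "COfiler": 0.004007323,
    "PowerPlex 1.1": 0.000620818,
    "Profiler": 0.000200265,
    "Profiler Plus": 0.000097862,
}


def mz_vs_dz_odds(p):
    """Odds for MZ compared to DZ, assuming odds = 1 / p (inferred relation)."""
    return 1.0 / p


def mz_certainty_percent(p):
    """Certainty (%) of the pair being MZ, assuming 100 * (1 - p) (inferred relation)."""
    return 100.0 * (1.0 - p)


for system, p in AVG_P_DZ_IBS2.items():
    print(f"{system}: odds {mz_vs_dz_odds(p):.2f} : 1, "
          f"certainty {mz_certainty_percent(p):.8f}%")
```

Because the tabulated probabilities are rounded to nine decimal places, the recomputed odds can differ from the tabulated odds in the second decimal place.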
Approximate average probability of correct zygosity assignment:
A paper by Presciuttini et al. [BMC Genet. 2002 Nov 20;3(1):23] showed that the probabilities (z_{i}) of sharing alleles identical by state (IBS) depend on locus heterozygosity (H), and are scarcely affected by variation in the distribution of allele frequencies. This allowed them to obtain empirical curves relating the z_{i} values to H for a series of common relationships, so that the likelihood ratio of a pair of relationships between any two individuals, given their genotypes at a locus, is a function of a single parameter, H. Application to large samples of mother-child and full-sib pairs showed that the statistical power of this method to infer the correct relationship is not much lower than that of the exact method.
Analysis of a large database of short tandem repeat (STR) data proved that locus heterozygosity did not vary significantly among Caucasian populations, apart from special cases, so that the likelihood ratio of the more common relationships between pairs of individuals may be obtained by looking at their tabulated z_{i} values.
The equation relating heterozygosity to the probability of full siblings (or DZ twins) sharing both alleles IBS (Presciuttini et al. 2002) is:
P(IBS=2) = 0.7753 + 0.0358*H - 1.1771*H^{2} + 0.6181*H^{3}
Using the above formula and heterozygosities (H_{E}) from Bagdonavicius et al. (2002) produces the following table:
AmpFlSTR Profiler Plus Loci
STR Locus | Heterozygosity | P(IBS=2) |
D5S818 | 0.7061 | 0.431302388 |
D3S1358 | 0.7898 | 0.373834657 |
D13S317 | 0.7927 | 0.371903183 |
vWA | 0.8082 | 0.361665514 |
D7S820 | 0.8113 | 0.359636078 |
D8S1179 | 0.8152 | 0.357091866 |
D21S11 | 0.8462 | 0.337248686 |
FGA | 0.8646 | 0.325820040 |
D18S51 | 0.8763 | 0.318701523 |
Hence, multiplying together the P(IBS=2) for each locus gives:
Average probability (DZ pair are IBS=2 at all loci) | 0.000097534 |
Odds for MZ compared to DZ | 10252.89 : 1 |
Average certainty of twin pair being MZ (%) | 99.99024665 |
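The formula of Presciuttini et al. (2002) approximates the exact probabilities well: for the Profiler Plus loci the approximate value (0.000097534) is within about 0.3% of the exact value (0.000097862).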
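The approximate calculation above can be reproduced with a short Python sketch. It applies the cubic in H to the tabulated heterozygosities and multiplies the per-locus results; the function and variable names are illustrative and do not come from the Excel worksheet.

```python
import math

# Heterozygosities for the Profiler Plus loci, copied from the table above.
HETEROZYGOSITY = {
    "D5S818": 0.7061,
    "D3S1358": 0.7898,
    "D13S317": 0.7927,
    "vWA": 0.8082,
    "D7S820": 0.8113,
    "D8S1179": 0.8152,
    "D21S11": 0.8462,
    "FGA": 0.8646,
    "D18S51": 0.8763,
}


def p_ibs2_full_sibs(h):
    """Approximate P(IBS=2) for full siblings / DZ twins at a locus
    with heterozygosity h, using the cubic quoted above."""
    return 0.7753 + 0.0358 * h - 1.1771 * h**2 + 0.6181 * h**3


# Per-locus probabilities, then the product over all nine loci.
per_locus = {locus: p_ibs2_full_sibs(h) for locus, h in HETEROZYGOSITY.items()}
p_all_loci = math.prod(per_locus.values())

for locus, p in per_locus.items():
    print(f"{locus}: {p:.9f}")
print(f"P(IBS=2 at all loci) = {p_all_loci:.9f}")
print(f"Odds for MZ compared to DZ = {1.0 / p_all_loci:.2f} : 1")
```

Multiplying across loci treats the markers as independent, which is the same assumption made by the product in the table above.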
Download a copy of my MS Excel worksheet enabling easy calculation of approximate Random Match Probabilities from heterozygosities, for markers in full sibling (DZ) pairs - as used to calculate the above probabilities.
Please use the following reference when reporting the above approximate average probabilities of correct zygosity assignment:
1. Nyholt DR (2006) On the probability of dizygotic twins being concordant for two alleles at multiple polymorphic loci. Twin Res Hum Genet 9(2):194-197
Diallelic Markers:
In the special case of diallelic markers, commonly termed single nucleotide polymorphisms (SNPs), it should be noted that the equation relating heterozygosity to the probability of full siblings (or DZ twins) sharing both alleles IBS is exact, where:
P(IBS=2) = 1-H(1-3H/8)
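As a check on the diallelic formula, the Python sketch below compares it with a brute-force enumeration over parental alleles and transmissions. It assumes Hardy-Weinberg proportions, random mating and H = 2p(1 − p) for an allele of frequency p; the function names are illustrative.

```python
from itertools import product


def p_ibs2_closed_form(h):
    """P(IBS=2) for full siblings at a diallelic marker with heterozygosity h."""
    return 1.0 - h * (1.0 - 3.0 * h / 8.0)


def p_ibs2_enumerated(p):
    """Same probability by enumeration: draw four parental alleles with
    frequencies p and 1 - p, then let each sibling inherit one allele
    from each parent with equal probability."""
    freq = {"A": p, "B": 1.0 - p}
    total = 0.0
    for f1, f2, m1, m2 in product("AB", repeat=4):
        p_parents = freq[f1] * freq[f2] * freq[m1] * freq[m2]
        father, mother = (f1, f2), (m1, m2)
        # i, j index the alleles passed to sibling 1; k, l to sibling 2.
        for i, j, k, l in product((0, 1), repeat=4):
            sib1 = sorted((father[i], mother[j]))
            sib2 = sorted((father[k], mother[l]))
            if sib1 == sib2:
                total += p_parents / 16.0
    return total


for p in (0.1, 0.3, 0.5):
    h = 2.0 * p * (1.0 - p)
    print(p, p_ibs2_closed_form(h), p_ibs2_enumerated(p))
```

The two columns should agree to within floating-point rounding for any allele frequency between 0 and 1.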
Conclusion regarding the average probability of correct zygosity assignment:
Whether one uses the exact or approximate methods outlined above, the average probability of correct zygosity assignment increases with both the number and the heterozygosity of the markers.
However, one should keep in mind the possibility for genotyping errors and spontaneous mutations when determining zygosity.
For example, the probability of a DZ pair being IBS=2 at 8 of the 9 Profiler Plus loci ranges from 0.00023 to 0.00030 (odds of around 1 in 4300 to 1 in 3300), depending on which locus is discordant. This is well below a realistic genotyping error rate of 0.25% (P=0.0025; 1 in 400). Hence, it is far more likely that the pair is MZ with a genotyping error/mutation than DZ.
To this end, one can either calculate the overall probability for the observed number of loci for which the pair of individuals are IBS=2 and compare this to an assumed error rate(s), or use a fully parametric approach such as that implemented in ECLIPSE2. Given a wide range of researchers may be interested in the latter approach, I believe many will appreciate the convenience of the ZygProb Web interface.
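The first option, comparing the probability for the observed IBS=2 loci with an assumed error rate, can be sketched in Python as follows. The sketch uses the approximate per-locus P(IBS=2) values for the Profiler Plus loci tabulated earlier and assumes that the probability for 8 of the 9 loci is the product over the eight concordant loci. That interpretation is an assumption, and the names are illustrative.

```python
import math

# Approximate per-locus P(IBS=2) for DZ pairs (Profiler Plus loci),
# copied from the heterozygosity table earlier on this page.
P_IBS2 = {
    "D5S818": 0.431302388,
    "D3S1358": 0.373834657,
    "D13S317": 0.371903183,
    "vWA": 0.361665514,
    "D7S820": 0.359636078,
    "D8S1179": 0.357091866,
    "D21S11": 0.337248686,
    "FGA": 0.325820040,
    "D18S51": 0.318701523,
}

ERROR_RATE = 0.0025  # assumed genotyping error rate (1 in 400)


def p_ibs2_at_all_but(excluded_locus):
    """Probability that a DZ pair is IBS=2 at every locus except the
    excluded (discordant) one, treating loci as independent."""
    return math.prod(p for locus, p in P_IBS2.items() if locus != excluded_locus)


leave_one_out = {locus: p_ibs2_at_all_but(locus) for locus in P_IBS2}
lowest = min(leave_one_out.values())
highest = max(leave_one_out.values())

print(f"DZ pair IBS=2 at 8 of 9 loci: {lowest:.6f} to {highest:.6f}")
print(f"Assumed error rate: {ERROR_RATE}")
print(f"Error rate / highest DZ probability: {ERROR_RATE / highest:.1f}")
```

With the approximate values this gives a range close to the one quoted above; small differences are expected because the quoted range may have been derived from the exact per-locus probabilities. The sketch does not model the discordant locus itself, which the fully parametric approach in ECLIPSE2 handles through its error model.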
Other Zygosity-determination-related Links:
Link to download page for ECLIPSE2 program described in Sieberts SK, Wijsman EM, Thompson EA (2002) Relationship inference from trios of individuals, in the presence of typing error. Am J Hum Genet 70(1):170-80.
Link to download page for TWIN.EXE program described in Zhao JH and Sham PC (1998) A method for calculating probability convolution using ternary numbers with application in the determination of twin zygosity. Computational Statistics and Data Analysis 28:225-232.
Link to FORENSIC SCIENCE COMMUNICATIONS article detailing "Genotype Profiles for Six Population Groups at the 13 CODIS Short Tandem Repeat Core Loci and Other PCR Based Loci".
Link to Short Tandem Repeat DNA Internet DataBase, for loads of information regarding STR systems.
Link to Law Offices of Kim Kruglick, for loads of information regarding Forensics.
Page last updated March 15, 2007.
Special thanks to David Smyth for assisting with the development of this web interface.
Tel: +61-7-3362 0258
Email: daleN@qimr.edu.au