Articles
Common and low-frequency variants associated with genome-wide recombination rate
npg
© 2014 Nature America, Inc. All rights reserved.
Augustine Kong1,2, Gudmar Thorleifsson1, Michael L Frigge1, Gisli Masson1, Daniel F Gudbjartsson1,2, Rasmus Villemoes1, Erna Magnusdottir3, Stefania B Olafsdottir1, Unnur Thorsteinsdottir1,3 & Kari Stefansson1,3

Meiotic recombination contributes to genetic diversity by yielding new combinations of alleles. Individuals vary with respect to the genome-wide recombination counts in their gametes. Exploiting data resources in Iceland, we compiled a data set consisting of 35,927 distinct parents and 71,929 parent-offspring pairs. Within this data set, we called over 2.2 million recombination events and imputed variants with sequence-level resolution from 2,261 whole genome–sequenced individuals into the parents to search for variants influencing recombination rate. We identified 13 variants in 8 regions that are associated with genome-wide recombination rate, 8 of which were previously unknown. Three of these variants associate with male recombination rate only, seven associate with female recombination rate only and three affect both. Two are low-frequency variants with large effects, one of which is estimated to increase the male and female genetic maps by 111 and 416 cM, respectively. This variant, located in an intron, would not be found by exome sequencing.

Some features of meiotic recombination are known to differ between individuals. One is the location of the crossover events1,2, and another is the number of events across the genome3,4; the latter is the main focus here. Three common sequence variants (minor allele frequency (MAF) >10%) in two regions have been conclusively shown to associate with genome-wide recombination rate. Two of these are SNPs in a region harboring RNF212 (refs. 5–7): rs3796619, which associates with male recombination rate, and rs1670533, which associates with female recombination rate. The third is an inversion on chromosome 17 that associates with recombination rate in females8.
In addition, variants in the region harboring PRDM9 have been reported to influence genome-wide recombination rate1,9. One of our aims was to find other variants that influence genome-wide recombination rate, including low-frequency variants with large effects.

RESULTS
Study overview
We previously demonstrated that, by using the Icelandic genealogy, a large number of Icelandic samples genotyped with SNP arrays and a method of long-range phasing10,11, recombination events could be reliably called, requiring only genotype data on parent-offspring pairs12. Here we applied the same approach (Online Methods) to a much larger sample. The current data set consists of 35,927 distinct Icelandic parents (15,253 fathers and 20,674 mothers) (Supplementary Tables 1–3), including a total of 71,929 meioses (30,184 father-child pairs and 41,745 mother-child pairs) and 2,264,323 recombination events (580,970 from father-child pairs and 1,683,353 from mother-child pairs; Supplementary Data Set).
This study includes over ten times as many distinct parents and seven times as many informative meioses as the largest previous genome scan for variants influencing genome-wide recombination rate5. Furthermore, we expanded the genetic information available by imputing sequence variants into the parents using data from 2,261 whole genome–sequenced Icelanders (>10× coverage)13. We examined these variants for associations with recombination characteristics, which enhanced the resolution of the association signals and allowed the discovery of low-frequency variants with large effects.

Extraneous factors can affect the number of apparent recombination events (Online Methods). To limit noise, every parent in the analyzed parent-offspring pairs has all four grandparents listed in the Icelandic genealogy. Moreover, we adjusted raw recombination counts by the number of genotyped grandparents in the parent-offspring pair (Supplementary Table 2). The r2 value between the raw and adjusted counts was 0.997 for both father-offspring and mother-offspring pairs. This adjustment is not important for identifying variants that influence recombination rate but has a meaningful impact on variance decompositions.

We treated the unique parents as probands in the search for sequence variants that influence recombination characteristics. The number of informative children per proband ranged from 1 to 14 (Supplementary Table 1). The phenotype assigned to a proband was the number of recombination events across the genome averaged over the children. To preserve symmetry between males and females, we used only recombination events in the autosomal genome as the phenotype for the genome-wide scan. However, we included sequence variants on the X chromosome in the association tests. We subsequently examined recombination events on the X chromosome for females.

1deCODE Genetics/Amgen, Inc., Reykjavik, Iceland. 2Faculty of Physical Sciences, School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland. 3Faculty of Medicine, University of Iceland, Reykjavik, Iceland. Correspondence should be addressed to A.K. (kong@decode.is) or K.S. (kari.stefansson@decode.is).
Received 18 March; accepted 31 October; published online 24 November 2013; doi:10.1038/ng.2833
Nature Genetics VOLUME 46 | NUMBER 1 | JANUARY 2014

Table 1 Sequence variants associated with genome-wide recombination rate

Chr. | Position (b) | Gene | Variant | Annotation | Alleles | Frequency (%) | Male effect (c), individual | Male P, individual | Female effect (c), individual | Female P, individual | Male effect (c), joint (a) | Male P, joint (a) | Female effect (c), joint (a) | Female P, joint (a)
4 | 1,080,625 | RNF212 | rs4045481 | p.Thr58Thr | A/G | 66.8 | 63 | 1.2 × 10−48 | −57 | 3.3 × 10−16 | 64 | 9.1 × 10−50 | – | –
4 | 1,074,723 | RNF212 | rs658846 | Intronic | G/C | 22.1 | −57 | 5.7 × 10−32 | 105 | 3.1 × 10−41 | – | – | 95 | 6.3 × 10−33
4 | 1,048,276 | – | rs12233733 | Intergenic | C/T | 79.4 | 14 | 4.5 × 10−3 | 75 | 1.3 × 10−20 | – | – | 47 | 1.0 × 10−8
4 | 789,924 | CPLX1 | rs74434767 | Intronic | C/T | 0.66 | 124 | 2.4 × 10−6 | 386 | 1.0 × 10−20 | 111 | 1.6 × 10−5 | 416 | 9.9 × 10−25
14 | 19,854,558 | CCNB1IP1 | rs1132644 | 5′ UTR (d) | T/G | 51.5 | 16 | 6.5 × 10−5 | 57 | 1.1 × 10−18 | 17 | 3.3 × 10−5 | 57 | 6.4 × 10−19
14 | 59,973,510 | C14orf39 | rs1254319 | p.Leu524Phe | A/G | 30.5 | −5 | 0.29 | 72 | 9.4 × 10−25 | – | – | 72 | 9.3 × 10−26
14 | 90,994,780 | SMEK1 | rs10135595 | 3′ UTR | G/T | 59.8 | 1 | 0.84 | 73 | 2.3 × 10−28 | – | – | 75 | 5.0 × 10−31
20 | 1,158,647 | RAD21L | rs450739 | p.Cys90Arg | T/C | 48.9 | 48 | 8.2 × 10−33 | 27 | 3.1 × 10−5 | 49 | 2.2 × 10−34 | 29 | 5.6 × 10−6
1 | 76,118,411 | MSH4 | rs5745459 | p.Tyr589Cys | G/A | 1.6 | −2 | 0.89 | 157 | 1.7 × 10−9 | – | – | 155 | 1.1 × 10−9
17 | 41,130,280 | Inversion | rs56162163 | – | T/A | 18.0 | −4 | 0.39 | 60 | 1.5 × 10−12 | – | – | 63 | 4.0 × 10−14
17 | 40,121,588 | CCDC43 | rs75502650 | Intronic | A/G | 9.7 | −5 | 0.44 | 68 | 9.1 × 10−10 | – | – | 76 | 2.2 × 10−12
5 | 23,568,400 | PRDM9 | rs6889665 | Upstream | T/C | 96.8 | 51 | 6.8 × 10−6 | −2 | 0.92 | 29 | 0.049 | – | –
5 | 23,579,164 | PRDM9 | rs150798754 | Intergenic | A/G | 98.7 | 81 | 3.6 × 10−6 | −29 | 0.31 | 54 | 0.016 | – | –

(a) Between the results from simple and multiple regressions, the biggest differences are for the variants in the telomeric region of chromosome 4, the two PRDM9 variants and, to a lesser extent, the two chromosome 17 variants, because of linkage disequilibrium among them. For the other five variants, even though the estimated effects change very little, the P values consistently become smaller, which can be attributed to a reduction in noise that increases the signal-to-noise ratio. Dashes in the joint columns indicate that the variant was not included in the multiple regression for that sex.
(b) Positions of the variants correspond to NCBI build 36.
(c) Effects are shown as recombination rate changes in cM and are the estimated effect of the first allele (left) in the Alleles column relative to the second (right) allele.
(d) For most transcripts, this variant is located in the 5′ UTR, but for one transcript, ENST00000556563, it is a missense (p.Pro20Thr) variant.
Chr., chromosome.

After excluding the 5-Mb regions at the ends of the chromosomes (defined relative to SNP coverage), where the determination of recombination events is less reliable, the analyses focused on 2,547 Mb of the autosomal genome (Supplementary Table 3). The genetic lengths of the studied regions were 1,925 and 3,868 cM for males and females, respectively.

We performed association tests separately for fathers and mothers and used weighted regression, with the weight set as the number of informative offspring, to evaluate the significance of the associations. We used an additive model for the two alleles at a locus and applied genomic control14 to adjust for relatedness among probands. Many variants in the same region often show strong associations of similar significance, usually because of linkage disequilibrium. We considered a variant to be a member of the equivalent set that included the most significant variant if neither variant showed significant association (P < 0.05) after adjustment for the other in multiple regression. For presentation, we selected one variant to represent each equivalent set (Supplementary Table 4). Selection within an equivalent set was based primarily on sequence context and annotations; we chose the variant that we judged most likely to be functional.

We tested about 30.3 million variants with MAF > 0.1% and used a conservative threshold for genome-wide significance of P < 1.65 × 10−9 (0.05/30,300,000) to select variants for further investigation. Quantile-quantile plots of the scans and Manhattan plots of a few highlighted regions are shown in Supplementary Figures 1 and 2. After detailed analysis, we found that 13 different variants in 8 regions associated with genome-wide recombination rate (Table 1). Three variants associated with male recombination only, seven variants associated with female recombination only and three variants affected both.
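As a rough illustration of the scan described above, the following Python sketch tests a single variant with a weighted additive regression, applies genomic control and compares the result with the Bonferroni threshold. It is a minimal sketch under stated assumptions, not the study's pipeline: the tested allele is coded as a 0/1/2 dosage, significance is taken from a Wald chi-square statistic, the inflation factor λ is estimated as the median 1-df chi-square statistic divided by 0.4549 and floored at 1 (a common convention), and the function names and simulated data are hypothetical. Effects come out in crossovers per meiosis; multiplying by 100 gives cM, the unit used in Table 1.

```python
import math

import numpy as np

# Median of the chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = 0.4549364

# Bonferroni threshold for about 30.3 million tested variants (about 1.65e-9).
GENOME_WIDE_ALPHA = 0.05 / 30_300_000


def weighted_additive_test(dosage, mean_count, n_children):
    """Weighted least-squares fit of mean_count = a + b * dosage (hypothetical helper).

    dosage     -- copies (0, 1, 2) or imputed dosage of the tested allele
    mean_count -- proband's crossover count averaged over informative children
    n_children -- number of informative children, used as the regression weight
    Returns (b, chi2): effect per allele in crossovers per meiosis and the
    1-df Wald chi-square statistic for b.
    """
    dosage = np.asarray(dosage, dtype=float)
    y = np.asarray(mean_count, dtype=float)
    w = np.asarray(n_children, dtype=float)
    x = np.column_stack([np.ones_like(dosage), dosage])
    xtw = x.T * w  # equals X^T W without building the n-by-n weight matrix
    cov_unscaled = np.linalg.inv(xtw @ x)
    beta = cov_unscaled @ (xtw @ y)
    resid = y - x @ beta
    sigma2 = np.sum(w * resid**2) / (len(y) - 2)  # residual variance per unit weight
    se = math.sqrt(sigma2 * cov_unscaled[1, 1])
    return float(beta[1]), float((beta[1] / se) ** 2)


def genomic_control(chi2_stats):
    """Divide 1-df chi-square statistics by lambda = median / 0.4549, floored at 1."""
    chi2_stats = np.asarray(chi2_stats, dtype=float)
    lam = max(1.0, float(np.median(chi2_stats)) / CHI2_1DF_MEDIAN)
    return chi2_stats / lam, lam


def chi2_1df_pvalue(chi2):
    """Upper-tail P value of a 1-df chi-square statistic, P(|Z| > sqrt(chi2))."""
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    dosage = rng.binomial(2, 0.3, size=n)
    n_children = rng.integers(1, 6, size=n)  # 1 to 5 informative children
    # Hypothetical truth: each copy of the allele adds 1 crossover per meiosis (100 cM).
    total = rng.poisson(n_children * (40.0 + 1.0 * dosage))
    effect, chi2 = weighted_additive_test(dosage, total / n_children, n_children)
    p = chi2_1df_pvalue(chi2)
    print(f"effect = {100 * effect:.0f} cM per allele, P = {p:.2e}")
    print("genome-wide significant:", p < GENOME_WIDE_ALPHA)
```

Weighting by the number of children reflects that a proband's averaged count is less noisy when it rests on more meioses. In a real scan, `genomic_control` would be applied to the statistics of all tested variants before comparing P values with the threshold; the sketch also ignores the parental variance component discussed under variance decomposition.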
At least 8 of the 13 variants, including 2 low-frequency variants (MAF = 0.66% and 1.6%), have not been reported previously. Association results obtained when we evaluated variants individually using simple (weighted) regression are shown in Table 1. In addition, for males and females separately, we show results from multiple regression that included all variants that were considered significant when evaluated jointly.

The telomeric region of chromosome 4
In the region harboring RNF212 (Fig. 1), for fathers, rs3796619, a previously discovered intronic SNP5, remained a member of the equivalent set that showed the strongest association with recombination rate. However, to represent the equivalent set, we chose rs4045481, a highly correlated (r2 > 0.99) synonymous single-nucleotide variant (SNV) (p.Thr58Thr)6. The A allele of rs4045481 (frequency, 66.8%) is estimated to increase male recombination rate by 63 cM relative to the G allele (P = 1.2 × 10−48). For association with recombination rates of mothers, the previously reported variant rs1670533 remained highly significant (P = 1.4 × 10−40), but it was surpassed by the strongly correlated (r2 = 0.96) SNP rs658846 (P = 3.1 × 10−41). The G allele of rs658846 (frequency, 22.1%) is estimated to increase female recombination rate by 105 cM relative to the C allele. In addition, we discovered a separate association with rs12233733, which is located less than 27 kb from rs658846. This SNP is in an intergenic region between RNF212 and FGFRL1 (Fig. 1), with the C allele associating with increased female recombination rate (P = 1.3 × 10−20 when evaluated individually; Table 1).

rs4045481, rs658846 and rs12233733 are correlated with each other (Table 2), and a proper interpretation of their contributions requires a joint analysis. rs4045481[A] has a substantial negative correlation with rs658846[G] (r = −0.76). Consequently, when evaluated individually, rs4045481[A] is associated with reduced female recombination rate (P = 3.3 × 10−16). After adjusting for rs658846, rs4045481[A] becomes positively associated with female recombination rate (P = 8.6 × 10−4). However, this effect is apparently due to rs4045481[A] being positively correlated with rs12233733[C] (r = 0.16). After adjusting for both rs658846 and rs12233733, rs4045481[A] no longer associates with female recombination rate (P > 0.05). Similarly, whereas rs658846 and rs12233733 are significantly associated with male recombination rate when evaluated individually, the associations become nonsignificant (P > 0.05) after adjusting for rs4045481.
Thus, for the three common variants in or close to RNF212, rs4045481 alone can account for the association with recombination rate in males, and rs658846 and rs12233733 can jointly account for the association with recombination rate in females. However, the fact that rs4045481[A] and rs658846[G] are negatively correlated implies that a change in haplotype frequencies in a population (because of selection or drift) that increases recombination rate in one sex would be partially balanced by a reduced recombination rate in the other sex.
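The way rs4045481's apparent female effect changes once correlated variants are included is what one expects when a variant is in linkage disequilibrium with a causal one. The hypothetical simulation below (not the study's data) illustrates this with ordinary least squares: allele b has no effect of its own but tends to sit on the opposite haplotype to causal allele a, giving a genotype correlation near −0.77, similar in size to the rs4045481[A]–rs658846[G] correlation in Table 2. Evaluated alone, b appears to lower the phenotype; in a joint model its coefficient falls to about zero. The haplotype frequencies, effect size and noise level are arbitrary assumptions.

```python
import numpy as np


def ols_slopes(y, *predictors):
    """Ordinary least squares with an intercept; returns one slope per predictor."""
    x = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    return coef[1:]


def simulate(n=20000, seed=0):
    """Two biallelic variants in negative LD; only variant a affects the phenotype."""
    rng = np.random.default_rng(seed)
    a = rng.random((n, 2)) < 0.3           # allele a on each of two haplotypes
    opposite = rng.random((n, 2)) < 0.8    # with probability 0.8, b is the complement of a
    b = np.where(opposite, ~a, rng.random((n, 2)) < 0.5)
    g_a = a.sum(axis=1).astype(float)      # genotype = allele count per person
    g_b = b.sum(axis=1).astype(float)
    y = 1.0 * g_a + rng.normal(0.0, 3.0, size=n)
    return g_a, g_b, y


g_a, g_b, y = simulate()
r = np.corrcoef(g_a, g_b)[0, 1]
(marginal_b,) = ols_slopes(y, g_b)          # b evaluated individually
joint_a, joint_b = ols_slopes(y, g_a, g_b)  # both variants evaluated jointly
print(f"r = {r:.2f}")
print(f"individually: b = {marginal_b:.2f}")
print(f"jointly: a = {joint_a:.2f}, b = {joint_b:.2f}")
```

The individual slope for b is roughly the causal effect of a scaled by the regression of a on b, which is why a strong negative correlation produces a spurious negative association that disappears after adjustment.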
We also discovered association with a rare variant, rs74434767, that is located approximately 300 kb telomeric of rs658846 (Fig. 1). When evaluated individually, relative to allele T, the minor allele C (0.66%) is estimated to increase the female recombination rate by 386 cM (P = 1.0 × 10−20) and to increase the male recombination rate by 124 cM (P = 2.4 × 10−6). rs74434767 is weakly correlated with the three common variants around RNF212 (Table 2). From the multiple regression performed for female recombination rate when a set of ten variants were evaluated jointly (Table 1), the estimated effect of rs74434767[C] is 416 cM (P = 9.9 × 10−25). That is more than fourfold the estimated effect of rs658846 (95 cM, P = 6.3 × 10−33) and is over 10% of the average female genome-wide recombination rate. Four other variants are highly correlated with rs74434767, but this SNP belongs to its own equivalent set when the male and female results are combined (Online Methods). Notably, on the basis of data from the 1000 Genomes Project, rs74434767[C] has not been seen in individuals of European ancestry but is observed at low frequencies in Africans (1%) and Asians (0.35%).

rs74434767 is located in the second intron of CPLX1 (encoding complexin 1), which is known for its role in modulating neurotransmitter release but does not have a known function in the recombination process15. However, there are other genes located nearby, such as PCGF3 and GAK (Fig. 1), as well as RNF212, which is slightly further away, that take part in cellular processes that can potentially affect recombination. PCGF3 (encoding polycomb group ring finger 3) is a part of the polycomb group of complexes that are involved in cell-cycle checkpoints and DNA repair pathways16. GAK (encoding cyclin-G–associated kinase) encodes a serine/threonine kinase that forms an active complex with cyclin-G, which was recently shown to be involved in meiotic recombination repair in Drosophila17. RNF212 encodes a protein with homology to Zip3 in Saccharomyces cerevisiae and to ZHP-3 in Caenorhabditis elegans, both of which belong to the class of ZMM proteins that are essential for meiotic crossing over and for coordinating recombination and assembly of the synaptonemal complex18,19. In mice, Rnf212 is a dosage-sensitive regulator of crossing over during meiosis20. Further investigations will be required to determine through which gene the large effect of rs74434767 is mediated and how this region, less than 400 kb in length (0.7–1.1 Mb on chromosome 4), has such a major role in the diversity of recombination rate for both males and females. Inspection of the ENCODE21 data around the noncoding variants rs74434767 and rs658846, as well as around the synonymous variant rs4045481, did not provide much additional insight (Supplementary Note), which could be because the tissue types examined were not ideal.

[Figure 1 graphic: genes in the region between 600,000 and 1,200,000 bp on chromosome 4 (including SPON1, CPLX1, SLC26A1, MYL5, FGFRL1, TMED11P, MFSD7, LOC100129917, GAK, PCGF3, RNF212, TMEM175, PDE6B, ATP5I, DGKQ and IDUA) and the positions of rs74434767, rs12233733, rs658846 and rs4045481.]
Figure 1 Telomeric region of chromosome 4 harboring four separate variants influencing genome-wide recombination rate.

Table 2 Pairwise correlation of four variants in the telomeric region of chromosome 4

Variant | rs4045481[A] | rs658846[G] | rs12233733[C] | rs74434767[C]
rs4045481[A] | 1 | | |
rs658846[G] | −0.76 | 1 | |
rs12233733[C] | 0.16 | 0.27 | 1 |
rs74434767[C] | 0.05 | −0.04 | 0.03 | 1

Six new variants and the inversion on chromosome 17
We discovered new associations between recombination rates and common variants in three genes on chromosome 14. Allele T of rs1132644, located in the 5′ UTR of CCNB1IP1, increases female recombination with high significance (effect of 57 cM relative to the G allele, P = 6.4 × 10−19) and has a weaker effect in males (effect of 17 cM, P = 3.3 × 10−5). CCNB1IP1 (cyclin B1–interacting protein 1) is essential for successful meiotic crossing over in mice22 and is structurally and functionally related to the S. cerevisiae Zip3 ZMM protein23. The second variant is rs1254319, a nonsynonymous SNV (p.Leu524Phe) in C14orf39, which is of unknown function. Allele A is associated with higher female recombination (effect of 72 cM, P = 9.3 × 10−26) but does not show an effect in males. A third variant, rs10135595, is located in the 3′ UTR of SMEK1 (encoding suppressor of MEK null 1). The G allele is associated with higher recombination rate in females (effect of 75 cM, P = 5.0 × 10−31) but has no effect in males. SMEK1 is a member of the PP2A subfamily that has a known role in DNA repair24,25. It is possible that SMEK1 affects meiotic recombination through that process.

On chromosome 20, we discovered new associations between rs450739, a nonsynonymous SNV (p.Cys90Arg) in RAD21L, and recombination rates. The T allele had a highly significant effect in males (effect of 49 cM, P = 2.2 × 10−34) and a weaker effect in females (effect of 29 cM, P = 5.6 × 10−6). RAD21L is a meiosis-specific member of the α-kleisin protein family in mammals that is expressed in both spermatocytes and oocytes26–28. The p.Cys90Arg alteration is located in its conserved N-terminal region. The three kleisins (RAD21L, RAD21 and REC8) are subunits of protein cohesin complexes that join together the sister chromatids in the replication process and have a role in the formation of the meiotic chromosome axis, which is essential for interhomolog synapsis29. Rad21l-deficient mice are defective in full synapsis of homologous chromosomes at meiotic prophase I, which leads to total azoospermia and infertility in males, whereas Rad21l-deficient females are fertile but develop age-dependent sterility30. Sequence variants in REC8 have been reported to influence male recombination in cattle31.

Another newly discovered association is with rs5745459, a nonsynonymous SNV (p.Tyr589Cys) in MSH4 on chromosome 1. The G allele, with a frequency of 1.6%, is estimated to increase female recombination rate by 155 cM (P = 1.1 × 10−9), an effect that ranks second to that of rs74434767. We observed no effect in males. MSH4 (mutS protein homolog 4) is one of the ZMM proteins, and the p.Tyr589Cys alteration is located within its MutS domain III. ZMM proteins are known to have a role in determining crossover formation in humans18; the discovery that variants in MSH4, RNF212 and CCNB1IP1 associate with recombination rate further illustrates their importance. In S. cerevisiae, MSH4 is required for meiotic recombination, and msh4-null mutants have substantially decreased crossover interference32,33. In mice, disruption of Msh4 results in male and female infertility because of meiotic failure34.
[Figure 2 panels: standardized effect plotted by chromosome for males (joint linear predictor, rs4045481 and rs450739) and for females (joint linear predictor, rs658846, rs1132644, rs1254319, rs10135595 and rs74434767).]
Figure 2 Chromosome-specific effects. Shown are estimates of standardized effects. The calculation of the corresponding 95% confidence intervals is described in the Online Methods. Depending on whether the standardized effect is above or below 1, the effect on a chromosome is either above or below average, with the genetic length of the chromosome taken into account. A negative effect would mean that the effect on the chromosome has the opposite sign to the effect of the predictor or variant on the rest of the genome.
When a 900-kb common inversion at 17q21.31 was discovered8, the minor orientation, H2, was reported to increase recombination rate in females. In our current data, among the many SNPs in the region that were strongly correlated with H2 (r > 0.95), the T allele of rs56162163 (frequency, 18.0%) showed the strongest association with female recombination rate, increasing it by 63 cM relative to the A allele (P = 4.0 × 10−14). We observed no effect in males. The genomic structure in this region is complex35,36, and we cannot determine whether it is H2 itself or a strongly correlated variant or subtype that has a direct impact on recombination. Notably, a set of SNPs represented by rs75502650, about 1 Mb centromeric to rs56162163 and at least 500 kb from the known centromeric boundary of the inversion (Supplementary Fig. 2), appears as an independent association signal. The minor A allele (9.7%) of rs75502650, located in an intron of CCDC43, was estimated to increase recombination rate by 76 cM (P = 2.2 × 10−12), but, as with rs56162163, the effect was limited to females. rs75502650[A] and rs56162163[T] were negatively correlated (r = −0.052), and, consequently, the estimated effects of both were stronger in the joint analysis (Table 1).

The PRDM9 polymorphism and hot spots
Locations of crossovers constitute a phenotype that is distinct from the number of recombination events. Previously, for males and females separately, we partitioned the genome into 10-kb bins and classified those with estimated recombination rates greater than ten times the genome average as hot spots12. We then studied the fraction of recombination events occurring in hot-spot bins as a quantitative hot-spot phenotype. PRDM9, which is known to have an important role in hot spots in Icelanders, has zinc fingers that vary in number from 12 to 15. When reduced to two composite alleles (12 or 13 repeats and 14 or 15 repeats), the 12–13 composite allele was shown to be positively correlated with the hot-spot phenotype. Applying the same approach to all the variants in the PRDM9 region found in our current data, we found the strongest associations (P < 1 × 10−300 for males and females individually; Supplementary Table 5) for members of a group of very strongly correlated (r2 > 0.99) variants that includes rs6889665, which was reported previously to have the strongest association with hot-spot activity9. However, after adjusting for rs6889665, a low-frequency intergenic SNP (MAF = 0.4%) at position 23,522,725 and the 12–13 composite allele remained highly significant (Supplementary Table 5).

For genome-wide recombination rate, a weakly significant association (P = 0.04) for rs6889665 was previously reported, with a large estimated effect of 130 cM9. When evaluated individually, we observed a stronger association (P = 6.8 × 10−6; Table 1) with rs6889665, but the estimated effect was limited to males and was substantially smaller (51 cM). In the PRDM9 region, rs150798754 (which is nearly perfectly correlated with an intronic SNP, rs147347993) had the strongest association with male genome-wide recombination rate (P = 3.6 × 10−6; Table 1). rs6889665 and rs150798754 were correlated (r2 = 0.41), but both remained nominally significant after adjusting for each other (Table 1). We further noted that rs150798754 was associated with hot spots for both males and females with high significance when evaluated individually, but after adjusting for rs6889665, this SNP was not significant in males and was barely significant in females (Supplementary Table 5). These results highlight the diversity of variants in the PRDM9 region and the complexity of their impact on various aspects of recombination.

By comparison, the H2 inversion was also associated with a higher fraction of recombination events occurring in hot spots (P = 3.1 × 10−14), but, as with recombination counts, the effect was restricted to females. Notably, the neighboring variant rs75502650 also showed association in females (P = 4.6 × 10−6), but, unlike H2 and the PRDM9 polymorphisms, the allele that associated with higher genome-wide recombination count was associated with a lower fraction of recombination events in hot spots. Examination of the other variants displayed in Table 1 showed that they had little to no effect on hot spots (Supplementary Table 6). Those that displayed nominally significant associations (males, P = 0.0027 for rs450739; females, P = 0.0064 and 0.0052 for rs1254319 and rs5745459, respectively) behaved similarly to rs75502650.

Associations with other variants
Variants with P < 1 × 10−7 that are not located in already-described regions (four for males and two for females) are shown in Supplementary Table 7. One is a low-frequency insertion or deletion (indel) (0.57%) in an intron of GRM8 that was associated with male recombination rate (P = 9.2 × 10−10). We considered this result strongly suggestive but not definitive because the role of GRM8 in recombination is not obvious and we saw no association between the variant and female recombination rate. Another notable variant (P = 6.9 × 10−8) for male recombination rate is a common missense alteration (p.Glu50Gln) in CTCFL, which encodes a zinc-finger protein that has a role in spermatogenesis37,38. Excluding variants in the regions around RNF212, the chromosome 17 inversion and PRDM9, we examined 43 variants previously reported in two studies6,7 as having suggestive associations (21 for males and 22 for females). One gave a P value of 0.011, whereas all the others had P values >0.05 (Supplementary Table 8). Formulas for calculating the power of our study to detect effects of various sizes are given in the Supplementary Note.

Effects on individual chromosomes
For the variants found, we examined chromosome-specific effects, which for females include those for the X chromosome (Online Methods and Supplementary Table 9). A subset of these results is displayed in Figure 2. For males, results are given for the linear predictor incorporating the six variants shown to be associated with genome-wide recombination rate, with coefficients obtained from the multiple-regression fit. We also show the individual results for rs4045481 and rs450739, the two variants with the strongest associations. For females, we show results for the linear predictor incorporating ten variants and the individual results for the five strongest variants. Displayed are standardized effects, calculated as $(\text{effect}_C/\text{effect}_T)/(L_C/L_T)$, where $\text{effect}_C$ and $\text{effect}_T$ denote, respectively, the effect of the variant on an individual chromosome and the total effect, and $L_C$ and $L_T$ denote, respectively, the genetic length of that chromosome and the total length of the regions examined. Thus, a uniform multiplicative effect over the genome would correspond to all standardized effects being 1. For males, apart from chromosome 16, the estimated effects for the linear predictor were all significantly positive; that is, their 95% confidence intervals did not include 0. Although this indicates that the variants indeed influence recombination rate across the genome, the effect is not completely uniform. Specifically, the standardized effects for chromosomes 10, 12 and 22 were significantly higher than 1, whereas those for chromosomes 13, 14, 16 and 17 were significantly below 1.
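The standardized effect defined above can be computed directly from per-chromosome effects and genetic lengths. The short Python function below is a sketch that assumes the total effect and the total length are the sums over the chromosomes supplied; the function name, chromosome labels and numbers in the example are hypothetical, chosen to show that an effect proportional to genetic length gives 1 everywhere, while a skewed effect does not.

```python
from typing import Dict


def standardized_effects(effect_by_chrom: Dict[str, float],
                         length_by_chrom: Dict[str, float]) -> Dict[str, float]:
    """Return (effect_C / effect_T) / (L_C / L_T) for every chromosome C.

    effect_by_chrom -- estimated effect on each chromosome, in cM
    length_by_chrom -- genetic length of the examined region of each chromosome, in cM
    effect_T and L_T are taken as the sums over the chromosomes supplied.
    """
    effect_total = sum(effect_by_chrom.values())
    length_total = sum(length_by_chrom.values())
    return {
        chrom: (effect / effect_total) / (length_by_chrom[chrom] / length_total)
        for chrom, effect in effect_by_chrom.items()
    }


# Hypothetical three-chromosome genome (lengths in cM).
lengths = {"A": 200.0, "B": 150.0, "C": 50.0}

# An effect proportional to length: every standardized effect equals 1.
print(standardized_effects({"A": 20.0, "B": 15.0, "C": 5.0}, lengths))

# A skewed effect: above 1 on A, below 1 on B and 0 on C.
print(standardized_effects({"A": 30.0, "B": 10.0, "C": 0.0}, lengths))
```

Because each chromosome's share of the effect is compared with its share of the map, an effect concentrated on some chromosomes gives values above 1 there and below 1 elsewhere, which is the pattern reported for chromosomes 10, 12 and 22 versus 13, 14, 16 and 17 in males.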
There were also significant differences between rs4045481 and rs450739, most strikingly for chromosome 19, where the estimated standardized effect for rs4045481 was negative (its 95% confidence interval barely included 0), whereas that for rs450739 was significantly above 1. For females, the estimated standardized effects for the linear predictor were above 0 for all 23 chromosomes, significantly so for all except chromosome 21. However, the standardized effects varied, being significantly above 1 for chromosomes 1 and 2 and significantly below 1 for chromosomes 18, 20 and 21. Although these deviations are not striking, there did appear to be differences between the effects of the variants. For example, the standardized effect for chromosome X was significantly above 1 for rs1254319 but significantly below 1 for rs10135595.

Variance decomposition and variance explained
We decomposed the total variance of the number of recombination events observed in an offspring into three components: the parental effect, the gamete effect and the random effect (Online Methods). The gamete effect measures the systematic difference that exists between siblings: for example, a sibling who has a higher recombination count than another sibling for one chromosome tends also to have a higher recombination count for another chromosome. For maternal recombination events, we estimated the mother effect, the gamete effect and the random effect to account for 12.72%, 44.34% and 42.94% of the total variation, respectively (Online Methods). For paternal recombination events, the corresponding variance decomposition gave 8.70%, 16.05% and 75.25% of the total variation, respectively. The ten variants that associated with female recombination rate accounted for 3.15% of the total variation in maternal recombination count, or 3.15/12.72 = 24.8% of the mother component. The six variants found to associate with male recombination rate accounted for 2.52% of the total variation in paternal recombination count, or 2.52/8.70 = 29.0% of the paternal component. Although it is probably not genetic in nature, the large gamete effect is striking, particularly for females. Although the underlying mechanism is not well understood, the number of double-strand breaks (DSBs) has been observed to be highly variable among sex cells at the early stages of meiosis39. At later stages, when a fraction of the DSBs become crossovers, this variation is reduced, but the reduction is smaller for females than for males39,40, which is consistent with our data.

DISCUSSION
We assembled a data set with a very large number of informative meioses to search for sequence variants that influence genome-wide recombination rate. Combining these data with information obtained by sequencing the whole genomes of 2,261 Icelanders, we refined the previously discovered association signals at the RNF212 locus, found eight other previously unreported variants and provided additional insights into how variants in the PRDM9 region affect hot spots and genome-wide recombination rate. Among our discoveries are coding variants in genes known to have a role in recombination. One notable exception is a low-frequency variant with an extraordinarily large effect in an intron of CPLX1 that would not be found by exome sequencing. Strikingly, many of the variants with a large impact on genome-wide recombination rate appear to have little to no impact on whether recombination events occur in hot spots, suggesting that they act more by influencing the propensity of DSBs to become crossovers than by affecting DSBs directly. However, in addition to the impact of the variants as a group differing across chromosomes, significant differences can be observed between the variants. Thus, sequence variants can have effects on the locations of recombination events that are not easily described in terms of hot spots.
Further investigation, including study of the biological pathways involving the identified variants, will be necessary to understand to what degree the variation in effect among chromosomes is sequence related. We know that local sequence variation cannot by itself explain all variation in recombination rate, as differences between males and females are seen in every aspect of recombination behavior that has been investigated. For example, 10 of the 13 variants listed in Table 1 have effects limited to one sex. It is also notable that the PRDM9 variants that affect hot spots tend to have comparable effects in the two sexes, whereas association with genome-wide recombination rate is observed only in males.
Methods
Methods and any associated references are available in the online version of the paper.
Note: Any Supplementary Information and Source Data files are available in the online version of the paper.
Acknowledgments
We thank the three referees for comments that led to an improved version of the manuscript.
AUTHOR CONTRIBUTIONS
A.K. and K.S. planned and directed the research. A.K. and G.T. wrote the first draft of the paper and, with K.S., M.L.F. and U.T., wrote most of the final version. Phasing was performed by D.F.G. and M.L.F., assisted by software from R.V. that called the recombination events. D.F.G. and G.M. processed the whole-genome sequencing data and performed imputation. G.T. performed the initial association analyses and also carried out the literature search for associated genes. A.K. performed most of the final analyses and calculations, including the study of chromosome-specific effects and the decomposition of variance. S.B.O. assisted in the imputation of the PRDM9 zinc-finger polymorphism, and E.M. carried out the investigation involving Encyclopedia of DNA Elements (ENCODE) data.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details are available in the online version of the paper.
1. Baudat, F. et al. PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science 327, 836–840 (2010).
2. Coop, G., Wen, X., Ober, C., Pritchard, J.K. & Przeworski, M. High-resolution mapping of crossovers reveals extensive variation in fine-scale recombination patterns among humans. Science 319, 1395–1398 (2008).
3. Kong, A. et al. A high-resolution recombination map of the human genome. Nat. Genet. 31, 241–247 (2002).
4. Broman, K.W., Murray, J.C., Sheffield, V.C., White, R.L. & Weber, J.L. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet. 63, 861–869 (1998).
5. Kong, A. et al. Sequence variants in the RNF212 gene associate with genome-wide recombination rate. Science 319, 1398–1401 (2008).
6. Chowdhury, R., Bois, P.R., Feingold, E., Sherman, S.L. & Cheung, V.G. Genetic analysis of variation in human meiotic recombination. PLoS Genet. 5, e1000648 (2009).
7. Fledel-Alon, A. et al. Variation in human recombination rates and its genetic determinants. PLoS ONE 6, e20321 (2011).
8. Stefansson, H. et al. A common inversion under selection in Europeans. Nat. Genet. 37, 129–137 (2005).
9. Hinch, A.G. et al. The landscape of recombination in African Americans. Nature 476, 170–175 (2011).
10. Kong, A. et al. Detection of sharing by descent, long-range phasing and haplotype imputation. Nat. Genet. 40, 1068–1075 (2008).
11. Kong, A. et al. Parental origin of sequence variants associated with complex diseases. Nature 462, 868–874 (2009).
12. Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467, 1099–1103 (2010).
13. Jonsson, T. et al. A mutation in APP protects against Alzheimer’s disease and age-related cognitive decline. Nature 488, 96–99 (2012).
14. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).
15. Reim, K. et al. Complexins regulate a late step in Ca2+-dependent neurotransmitter release. Cell 104, 71–81 (2001).
16. Sauvageau, M. & Sauvageau, G. Polycomb group proteins: multi-faceted regulators of somatic stem cells and cancer. Cell Stem Cell 7, 299–313 (2010).
17. Nagel, A.C., Fischer, P., Szawinski, J., Rosa, M.K. & Preiss, A. Cyclin G is involved in meiotic recombination repair in Drosophila melanogaster. J. Cell Sci. 125, 5555–5563 (2012).
18. Lynn, A., Soucek, R. & Borner, G.V. ZMM proteins during meiosis: crossover artists at work. Chromosome Res. 15, 591–605 (2007).
19. Shinohara, M., Oh, S.D., Hunter, N. & Shinohara, A. Crossover assurance and crossover interference are distinctly regulated by the ZMM proteins during yeast meiosis. Nat. Genet. 40, 299–309 (2008).
20. Reynolds, A. et al. RNF212 is a dosage-sensitive regulator of crossing-over during mammalian meiosis. Nat. Genet. 45, 269–278 (2013).
21. Rada-Iglesias, A. et al. A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283 (2011).
22. Ward, J.O. et al. Mutation in mouse hei10, an E3 ubiquitin ligase, disrupts meiotic crossing over. PLoS Genet. 3, e139 (2007).
23. Chelysheva, L. et al. The Arabidopsis HEI10 is a new ZMM protein related to Zip3. PLoS Genet. 8, e1002799 (2012).
24. Chowdhury, D. et al. A PP4-phosphatase complex dephosphorylates γ-H2AX generated during DNA replication. Mol. Cell 31, 33–46 (2008).
25. Nakada, S., Chen, G.I., Gingras, A.C. & Durocher, D. PP4 is a γH2AX phosphatase required for recovery from the DNA damage checkpoint. EMBO Rep. 9, 1019–1026 (2008).
26. Gutiérrez-Caballero, C. et al. Identification and molecular characterization of the mammalian α-kleisin RAD21L. Cell Cycle 10, 1477–1487 (2011).
27. Ishiguro, K., Kim, J., Fujiyama-Nakamura, S., Kato, S. & Watanabe, Y. A new meiosis-specific cohesin complex implicated in the cohesin code for homologous pairing. EMBO Rep. 12, 267–275 (2011).
28. Lee, J. & Hirano, T. RAD21L, a novel cohesin subunit implicated in linking homologous chromosomes in mammalian meiosis. J. Cell Biol. 192, 263–276 (2011).
29. Llano, E. et al. Meiotic cohesin complexes are essential for the formation of the axial element in mice. J. Cell Biol. 197, 877–885 (2012).
30. Herrán, Y. et al. The cohesin subunit RAD21L functions in meiotic synapsis and exhibits sexual dimorphism in fertility. EMBO J. 30, 3091–3105 (2011).
31. Sandor, C. et al. Genetic variants in REC8, RNF212, and PRDM9 influence male recombination in cattle. PLoS Genet. 8, e1002854 (2012).
32. Novak, J.E., Ross-Macdonald, P.B. & Roeder, G.S. The budding yeast Msh4 protein functions in chromosome synapsis and the regulation of crossover distribution. Genetics 158, 1013–1025 (2001).
33. Ross-Macdonald, P. & Roeder, G.S. Mutation of a meiosis-specific MutS homolog decreases crossing over but not mismatch correction. Cell 79, 1069–1080 (1994).
34. Kneitz, B. et al. MutS homolog 4 localization to meiotic chromosomes is required for chromosome pairing during meiosis in male and female mice. Genes Dev. 14, 1085–1097 (2000).
35. Boettger, L.M., Handsaker, R.E., Zody, M.C. & McCarroll, S.A. Structural haplotypes and recent evolution of the human 17q21.31 region. Nat. Genet. 44, 881–885 (2012).
36. Steinberg, K.M. et al. Structural diversity and African origin of the 17q21.31 inversion polymorphism. Nat. Genet. 44, 872–880 (2012).
37. Suzuki, T. et al. Expression of a testis-specific form of Gal3st1 (CST), a gene essential for spermatogenesis, is regulated by the CTCF paralogous gene BORIS. Mol. Cell. Biol. 30, 2473–2484 (2010).
38. Sleutels, F. et al. The male germ cell gene regulator CTCFL is functionally different from CTCF and binds CTCF-like consensus sites in a nucleosome composition-dependent manner. Epigenetics Chromatin 5, 8 (2012).
39. Cole, F. et al. Homeostatic control of recombination is implemented progressively in mouse meiosis. Nat. Cell Biol. 14, 424–430 (2012).
40. Lenzi, M.L. et al. Extreme heterogeneity in the molecular events leading to the establishment of chiasmata during meiosis I in human oocytes. Am. J. Hum. Genet. 76, 112–127 (2005).
VOLUME 46 | NUMBER 1 | JANUARY 2014 Nature Genetics
ONLINE METHODS
Study samples. Using the Icelandic genealogy database, we identified all parent-offspring pairs that had previously been genotyped on a genome-wide Illumina BeadChip (HumanHap300, HumanHap300-Duo, HumanCNV370-Duo, Human610-Quad, Human1M or Human1M-Duo) and phased for parental origin. After removing pairs that did not exhibit Mendelian inheritance and keeping only those in which all four grandparents of the parent in the parent-offspring pair were listed in the Icelandic genealogy, 71,929 parent-offspring relationships (41,745 mother-offspring pairs and 30,184 father-offspring pairs) were available to estimate autosomal recombination across 690,421 phased autosomal SNPs (17,101 SNPs were phased for estimating X-chromosome recombination). Among these pairs, some parents have more than one offspring, and some offspring are paired individually with both parents. These families are described in Supplementary Table 1. All biological samples used in this study were obtained according to protocols approved by the Data Protection Commission of Iceland and the National Bioethics Committee of Iceland. Informed consent was obtained from all participants, and all personal identifiers were encrypted with a code that is held by the Data Protection Commission of Iceland.
Phasing sequence variants, imputation and recombination resolution. Using methods previously described13, we identified approximately 30.3 million SNP sequence variants from whole-genome sequencing of 2,261 Icelanders. We then imputed these variants into the phased haplotypes of 95,085 Icelanders using the same model as that used by IMPUTE12,13,41. When resolving the location of an observed recombination for the hot-spot analyses, the probability of each event was spread over its region of uncertainty12.
Because of the asymmetry implicit in phasing haplotypes for a parent-offspring pair, this resolution was affected by the coverage of the Illumina chip genotypes observed in the offspring. The Illumina chips essentially form two groups: the hhap chips (HumanHap300, HumanHap300-Duo and HumanCNV370-Duo), for which 304,937 genotyped SNPs were phased, and the omni chips (Human610-Quad, Human1M and Human1M-Duo), for which 564,196 genotyped SNPs were phased.
Examining and adjusting for extraneous factors that could affect the recombination events observed. SNP arrays. The samples were genotyped with different Illumina chips carrying different numbers of markers. Although in theory this should not affect the number of recombinations called, as a precaution we examined the empirical results. No association was observed between the number of recombination events called and the four SNP array combinations (omni-omni, omni-hhap, hhap-omni and hhap-hhap) for parent-offspring pairs. The SNP array used to type the offspring in a parent-offspring pair was expected to have some impact on the resolution of the recombination events called and hence on the estimated fraction of recombination events occurring in hot-spot bins. This was confirmed empirically: the estimated fraction of recombination events occurring in hot spots is higher for parent-offspring pairs in which the offspring is typed on an omni chip rather than a hhap chip (whereas, as expected, the chip used to type the parent has no detectable effect). The effect, however, is very small. For females, it is not statistically significant. For males, the effect is statistically significant (P < 1 × 10−5), but its size is still very small, explaining less than 0.1% of the variance. We adjusted for this in the analyses, but it had no meaningful impact on any of the conclusions.
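Why offspring chip coverage can shift the hot-spot fraction follows from spreading each event's probability over its region of uncertainty: a wider region places more of the event's mass outside a nearby hot-spot bin. The sketch below illustrates this under stated assumptions: the probability is spread uniformly over the interval (the exact spreading rule of ref. 12 is not reproduced here), hot-spot bins are non-overlapping intervals, and the names `Event`, `overlap` and `hot_spot_fraction` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One called crossover, localized only to an uncertainty region (bp)."""
    start: float
    end: float


def overlap(a0, a1, b0, b1):
    """Length of the intersection of intervals [a0, a1) and [b0, b1)."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def hot_spot_fraction(events, hot_bins):
    """Expected fraction of events falling in hot-spot bins.

    Each event carries probability 1, spread uniformly (an assumption)
    over its uncertainty region; hot_bins are non-overlapping intervals.
    """
    if not events:
        raise ValueError("need at least one event")
    total = 0.0
    for ev in events:
        width = ev.end - ev.start
        if width <= 0:
            raise ValueError("uncertainty region must have positive width")
        # Mass of this event that lands inside any hot-spot bin.
        in_hot = sum(overlap(ev.start, ev.end, b0, b1) for b0, b1 in hot_bins)
        total += in_hot / width
    return total / len(events)


if __name__ == "__main__":
    bins = [(10_000, 20_000)]
    sharp = [Event(12_000, 14_000)]   # well-resolved event inside the bin
    blurred = [Event(0, 40_000)]      # same event, poorly resolved
    print(hot_spot_fraction(sharp, bins), hot_spot_fraction(blurred, bins))  # 1.0 0.25
```

The tightly resolved event contributes its full unit of probability to the hot spot, while the poorly resolved one contributes only the overlapping quarter, which is the direction of the chip effect reported above.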
Number of grandparents typed. Although the combination of long-range phasing and the Icelandic genealogy allowed us to phase parents without genotyped grandparents, the results are expected to be somewhat less noisy when genotype data for at least one grandparent are available. Among the 41,745 mother-offspring pairs, 11,221 (corresponding to 6,808 distinct mothers) have at least one grandparent typed. Empirically, the number of observed recombination events for these mother-offspring pairs was lower by 0.80 (P < 1 × 10−10). Although highly statistically significant, this accounted for only 0.22% of the variance. Among the 30,184 father-offspring pairs, 7,948 (corresponding to 4,813 distinct fathers) have at least one grandparent typed (Supplementary Table 2). The number of observed recombination events for these father-offspring pairs was lower by 0.52. This effect accounted for 0.30% of the variance.
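The adjustment used for the association analyses (regressing raw recombination counts on the number of grandparents genotyped and the total length of the concordant segments, then keeping the residuals) can be sketched as an ordinary least-squares fit. This is a minimal illustration, assuming both covariates enter linearly with an intercept; `adjust_counts` is a hypothetical name, and the simulated data are for illustration only and are not the study's data.

```python
import numpy as np


def adjust_counts(raw_counts, n_grandparents, concordant_length):
    """Residuals of an OLS fit of raw recombination counts on an intercept,
    the number of genotyped grandparents and the total length of concordant
    segments (covariates entered linearly: an assumption)."""
    y = np.asarray(raw_counts, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                             # intercept
        np.asarray(n_grandparents, dtype=float),     # grandparents genotyped
        np.asarray(concordant_length, dtype=float),  # concordant-segment length
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


# Simulated data for illustration only (hypothetical values and units).
rng = np.random.default_rng(0)
n = 1_000
gp = rng.integers(0, 3, size=n)                # 0, 1 or 2 grandparents typed
length = rng.normal(3_500.0, 50.0, size=n)
raw = (42.0 - 0.8 * (gp > 0) + 0.01 * (length - 3_500.0)
       + rng.normal(0.0, 7.5, size=n))

adjusted = adjust_counts(raw, gp, length)
r2 = np.corrcoef(raw, adjusted)[0, 1] ** 2
print(f"r^2 between raw and adjusted counts: {r2:.4f}")
```

Because the covariates explain only a small share of the variance, the residuals stay very highly correlated with the raw counts, mirroring the r² values above 0.99 reported for the real data.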
doi:10.1038/ng.2833
As noted in the main text, for the association analyses we used recombination counts adjusted for the number of grandparents genotyped (i.e., we regressed raw recombination counts on the number of grandparents genotyped and the total length of the concordant segments and used the residuals from the fit for subsequent analyses). The r² value between the raw and adjusted counts is 0.9974 for mother-offspring pairs and 0.9969 for father-offspring pairs. Because the correlations are so high, whether we used the raw or the adjusted counts had no meaningful impact on identifying the sequence variants that affect recombination rate. However, for the epidemiological study described in the variance decomposition section below, whether at least one grandparent was genotyped did have a meaningful impact when we decomposed the total variation of recombination counts into its components. Genomic control and the equivalent sets. Applying genomic control to the association results corresponds to multiplying the standard errors of the estimated effects by 1.0971 and 1.1544 for males and females, respectively. Apart from the inversion on chromosome 17 (where there are too many variants with r² > 0.9 to list) and the variants in the PRDM9 gene region (where the region is complex and we have no power for refinement), Supplementary Table 4 lists, for each of the other ten variants shown in Table 1, all other sequence variants we found that are highly correlated (r² > 0.9) with the highlighted variant. Supplementary Table 4 further indicates which of these variants are in the same (statistical) equivalent set as the highlighted variant, i.e., variants that would no longer be significant at P < 0.05 when adjusted for the highlighted variant. Three of the ten highlighted variants have effects on both male and female recombination rates.
In those cases, results from males and females were combined to define the equivalent set. For the seven variants that show an effect in a single sex, we used only the data for that sex to define the equivalent set. Only a modest fraction of the highly correlated variants belong to the same equivalent set as the highlighted variant, because with the large sample size used here there is often power to distinguish between highly correlated variants. However, even if a highly correlated variant is not a member of the top equivalent set, it could still be the functional variant, although the probability of this is smaller. Hot spots and PRDM9 variants. Hot-spot bins, determined for males and females separately, were defined as described in the main text, using the same definition as previously12. For an individual father or mother, the hot-spot phenotype, also as defined previously12, is the fraction of that parent's observed recombination events that occur in hot-spot bins. For reference, the population averages of this fraction are 36.67% (0.3667) and 29.83% (0.2983) for paternal and maternal recombination events, respectively. Supplementary Table 5 displays the association of four PRDM9 variants, individually and jointly, with the hot-spot phenotype; the reasons these four variants are highlighted are given in the main text. We emphasize that these results are not meant to be a full exploration of the association between hot spots and the variants in the PRDM9 region, which is extremely complex, although we believe they add insight beyond what is already known. Supplementary Table 6 displays the estimated effects of the variants in Table 1, excluding the PRDM9 variants, on the hot-spot phenotype. Chromosome-specific effects and estimated effects expressed in standardized scale.
For the ten variants for females and six variants for males shown in Table 1 that we found to associate with genome-wide maternal and paternal recombination rates, respectively, Supplementary Table 9 gives their estimated effects for each chromosome individually. For females, the X chromosome is also included. In addition to the estimated effects and associated standard errors in the original scale, standardized effects and associated standard errors are also given. Specifically, for a variant, the standardized effect for a chromosome is defined as $(\mathrm{effect}_C/\mathrm{effect}_T)/(L_C/L_T)$, where $\mathrm{effect}_C$ and $\mathrm{effect}_T$ denote, respectively, the effect of the variant on an individual chromosome and its total genome effect, and $L_C$ and $L_T$ (Supplementary Table 3) denote, respectively, the genetic length of that chromosome and the total length of the regions examined. Correspondingly, the standard errors in the standardized scale were calculated by multiplying the standard errors
in the original scale by $(1/\mathrm{effect}_T)/(L_C/L_T)$. The standardized estimates and corresponding standard errors were used to construct Figure 2. For male recombination events, the two PRDM9 variants are combined (a weighted average with weights taken from the multiple regression fit) because, even with the genome-wide recombination counts, the effects of the two variants individually are only marginally significant when adjusted for each other.
Variance decomposition of the genome-wide recombination count. As noted earlier, the number of grandparents genotyped has a very modest effect on the number of recombination events called. In that context, however, the concern was the bias it could create in the number of recombination events called if not properly adjusted for. For that analysis, the reduced informativeness of parent-offspring pairs with no grandparent genotyped adds 'white' noise to their observed recombination counts; as long as this noise is not too large, its main effect is a modest reduction in power and precision. For the epidemiological study, however, it is particularly important to keep this noise to a minimum, because the amount of true variation in the recombination counts (as opposed to variation added by measurement noise) is precisely what is of interest. Hence, for the variance decomposition described here, we used only the parent-offspring pairs for which at least one grandparent was genotyped. (It would be even better to have both grandparents genotyped, but limiting the analysis to such pairs would make the sample size too small.) The total variation of the number of paternal or maternal recombination events in an offspring can first be decomposed into two components:

$$\sigma^2 = \sigma_P^2 + \sigma_O^2 \qquad (1)$$

where $\sigma_P^2$ is the parent component and $\sigma_O^2$ represents other variation not accounted for by the parent effect. For paternal and maternal recombination events, the $\sigma^2$ values are estimated to be 16.272 and 56.264 (the sample variances), respectively. Separating $\sigma_P^2$ and $\sigma_O^2$ requires families in which recombination events are measured for two or more siblings. Specifically, with two siblings, the average of their recombination counts will have variance:
$$\sigma_P^2 + \frac{\sigma_O^2}{2} \qquad (2)$$

If (1) is estimated by A and (2) is estimated by B, then 2(A − B) estimates $\sigma_O^2$, and 2B − A estimates $\sigma_P^2$. Using the sibling pairs available in our data, (2) is estimated to be 8.844 for paternal recombination events and 31.710 for maternal recombination events. Combining these with the estimates of $\sigma^2$, the $\sigma_P^2$ values are estimated to be 1.415 and 7.157 for paternal and maternal recombination events, respectively, or 8.70% and 12.72% of the corresponding total variance, and the $\sigma_O^2$ values are estimated to be 14.857 and 49.107, respectively. It has been noted that, even for children of the same parents, there is a systematic difference between their recombination counts, as reflected by the substantial positive correlation between recombination counts on different chromosomes. This has been called the gamete effect. We can hence further decompose $\sigma_O^2$ into

$$\sigma_O^2 = \sigma_G^2 + \sigma_R^2$$

where $\sigma_G^2$ denotes the gamete effect and $\sigma_R^2$ denotes the remaining random variation. Most likely, some gamete effect exists even within the recombination counts of a single chromosome. In practice, however, the gamete effect and the random effect are completely (or nearly completely) confounded for a single chromosome (because, owing to interference, we cannot assume the random component to be Poisson distributed). We therefore define the gamete effect to be zero within a single chromosome and let it capture only the correlation across chromosomes. This leads, for individual chromosome i, i = 1,…,22, to

$$\sigma_i^2 = \sigma_{iP}^2 + \sigma_{iO}^2 = \sigma_{iP}^2 + \sigma_{iR}^2, \qquad \sigma_R^2 = \sum_{i=1}^{22} \sigma_{iR}^2.$$

By applying the same method used to estimate $\sigma_P^2$ and $\sigma_O^2$ to the recombination counts for individual chromosomes, we obtained estimates of $\sigma_{iR}^2$; summing them gives an estimate of $\sigma_R^2$. Values for the latter are 12.244 and 24.160 for paternal and maternal recombination events, respectively, or 75.25% and 42.94% of the corresponding total variance. This also means that the gamete effect is estimated as 16.05% and 44.34% of the total male and female variance, respectively. Because the gamete effect as estimated does not take within-chromosome effects into account, it may be considered a conservative estimate of the actual gamete effect. Note that the method used here to estimate $\sigma_P^2$ fully captures the contributions of variants and factors whose effects are not uniform over the genome; such effects could be limited to a specific chromosome or even a single locus. In a previous study5, the mother and father components were estimated as 11.1% and 6.6%, respectively, under the assumption that the parental effect is uniform across the genome. The current estimates are larger, mainly because they were calculated in a manner that incorporates nonuniform effects; the larger parent components in turn lower the estimated proportions of those components explained by the variants.
Specifically, in the previous study, we estimated the parent component by measuring the correlation of recombination counts between two approximately equal halves of the autosomal genome: the odd chromosomes and the even chromosomes. That estimate would capture the full effect of a variant or factor that influences the genome uniformly. However, if the effect is nonuniform, it would only be captured partially. If the effect is limited to a single chromosome, it would not be captured at all (because it would not generate correlation between recombinations on odd and even chromosomes). That is the main reason that the previous estimates are lower than the current estimates of the parent effect. In view of the results presented in Figure 2, we moved to the current method of estimation. However, even though the current method has the advantage that it could capture fully the contributions of genetic variants and other genuine factors that affect recombination rate, it could also end up capturing more of the extraneous noise. Other estimates regarding genetic components of recombination rate can be found elsewhere7.
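The sibling-based estimator in equations (1) and (2), followed by the gamete/random split, reduces to a few lines of arithmetic. The sketch below is a minimal illustration, assuming B is the sample variance of two-sibling averages (families with more than two siblings are not handled) and that the per-chromosome $\sigma_{iR}^2$ estimates have already been summed into $\sigma_R^2$; it plugs in the summary values reported above rather than raw data. The names `sib_mean_variance`, `decompose`, `VarianceShares` and `share_of_parent_component` are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass
class VarianceShares:
    parent: float   # sigma_P^2 as % of sigma^2
    gamete: float   # sigma_G^2 as % of sigma^2
    random: float   # sigma_R^2 as % of sigma^2


def sib_mean_variance(sib1_counts, sib2_counts):
    """Sample variance of the average count of two siblings (B, estimating (2))."""
    means = [(a + b) / 2 for a, b in zip(sib1_counts, sib2_counts)]
    return statistics.variance(means)


def decompose(total_var, sib_mean_var, random_var):
    """total_var = A (estimate of (1)); sib_mean_var = B (estimate of (2));
    random_var = sigma_R^2, the sum of per-chromosome sigma_iR^2 estimates."""
    parent = 2 * sib_mean_var - total_var       # 2B - A estimates sigma_P^2
    other = 2 * (total_var - sib_mean_var)      # 2(A - B) estimates sigma_O^2
    gamete = other - random_var                 # sigma_G^2 = sigma_O^2 - sigma_R^2

    def pct(v):
        return 100.0 * v / total_var

    return VarianceShares(pct(parent), pct(gamete), pct(random_var))


def share_of_parent_component(variant_pct, parent_pct):
    """Percentage of the parent component explained by a set of variants."""
    return 100.0 * variant_pct / parent_pct


# Summary values reported in the text (A, B, sigma_R^2).
paternal = decompose(16.272, 8.844, 12.244)
maternal = decompose(56.264, 31.710, 24.160)
print(paternal)
print(maternal)
print(share_of_parent_component(3.15, 12.72), share_of_parent_component(2.52, 8.70))
```

With the reported inputs this recovers the paternal (8.70%, 16.05%, 75.25%) and maternal (12.72%, 44.34%, 42.94%) splits to within rounding of the published values, and the last line gives roughly 24.8% and 29.0% for the shares of the mother and father components explained by the associated variants.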
41. Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39, 906–913 (2007).