PLoS Epigenetics 2010 Collection by PLoS

Epigenetics 2010: A Collection from the PLoS Journals

www.ploscollections.org/epigenetics2010

Image Credit: http://commons.wikimedia.org/wiki/File:Nucleosome_1KX5_2.png

Produced with support from New England Biolabs. The PLoS journal editors have sole responsibility for the content of this collection.

UNDERSTANDING CHANGE

New tools to advance epigenetics research

For over 35 years, New England Biolabs has been committed to understanding the mechanisms of restriction and methylation of DNA. This expertise in enzymology has recently led to the development of a suite of validated products for epigenetics research. These unique solutions to study DNA methylation are designed to address some of the challenges of current methods. EpiMark™ validated reagents simplify epigenetics research and expand the potential for biomarker discovery.

EpiMark™ validated products include:
• Newly discovered methylation-dependent restriction enzymes
• A novel kit for 5-hmC and 5-mC analysis and quantitation
• Methyltransferases
• Histones
• Genomic DNAs

To learn how these products can help you to better understand epigenetic changes, visit neb.com/epigenetics.

[Figure: Simplify DNA methylation analysis with MspJI. Digests of HeLa, plant (maize), and yeast genomic DNA without (–) or with (+) MspJI; the 32-bp band is labeled.]

MspJI recognizes methylated and hydroxymethylated DNA and cleaves out 32-bp fragments for sequencing analysis. Overnight digestion of 1 µg of genomic DNA from various sources with or without MspJI is shown. Note: yeast DNA does not contain methylated DNA; therefore, no 32-mer is detected.


www.neb.com

Primer

Centromeres Convert but Don’t Cross

Paul B. Talbert, Steven Henikoff*

Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America

A long-standing problem in chromosome biology concerns the dynamic nature of centromeres. These chromosomal sites assemble the protein machines called kinetochores that connect chromosomes to the spindle microtubules for segregation to daughter cells during mitosis and meiosis. In multicellular eukaryotes, centromeres are typically composed of highly homogeneous tandem repeats that evolve rapidly despite their highly conserved function [1]. For tandem repeats to evolve, a mutation must spread by some recombinational process, but a persistent dogma is that centromeres do not undergo homologous chromosome recombination (the shuffling of genetic segments between chromosomal pairs). New evidence [2] challenges this dogma and addresses the problem of rapidly evolving centromeres.

The Role of Crossing Over in Meiosis

Centromeres do not act alone in orchestrating chromosome segregation. In order for sister kinetochores to properly disjoin (separate) and segregate chromosomes equally to daughter cells in mitosis, their sister chromatids must be linked so that the pulling forces from the two halves of the spindle generate tension to correctly orient the kinetochores, stabilize kinetochore attachments, and signal that kinetochores are ready to disjoin. Centromeres in multicellular eukaryotes are typically embedded in heterochromatin, the permanently condensed chromatin found around centromeres, in contrast to the euchromatic chromosome arms, which decondense between mitoses. Heterochromatin has been implicated in facilitating cohesion of sister chromatids around the centromere. This cohesion is mediated by cohesins, proteins that link the sisters together and that are enriched around centromeres [3], and possibly also by catenation (interlocking) of DNA threads observed between sister centromeres [4].

In most eukaryotes, homologs become physically linked during meiosis through the recombinational process of “crossing over”: the breakage and reciprocal reunion of homologous chromatids, resulting in a chiasma, the point where recombinant chromatids cross over each other (Figure 1). Failure to cross over is a major source of non-disjunction (improper segregation) at the first meiotic division in animals [5,6], underscoring the importance of chiasmata for segregation of homologs. As early as 1930, observations on the distribution of chiasmata along chromosomes led Karl Sax to predict that crossing over (and hence genetic recombination) is reduced around the centromere [7], and this “centromere effect” was verified in the fruit fly Drosophila melanogaster soon afterward [8]. Suppression of crossing over around or in centromeres has since been verified in several animals [9,10], plants [11–14], and fungi [15,16], with estimates of crossover suppression ranging from 5-fold to >200-fold in different organisms.

Why is crossing over suppressed around centromeres? In Drosophila [5], humans [6], and budding yeast (Saccharomyces cerevisiae) [17], non-disjunction events at the second meiotic division are enriched in centromere-proximal crossovers. This suggests that crossovers that are too close to the centromere disrupt pericentric sister chromatid cohesion, leading to premature separation of sister chromatids, which then segregate randomly. Thus, selective pressure to reduce crossing over near the centromere is likely to be strong. Crossing over within the centromere itself could be even more deleterious, leading to attachment of the centromere to both halves of the spindle, resulting in chromosome breakage and loss.

Citation: Talbert PB, Henikoff S (2010) Centromeres Convert but Don’t Cross. PLoS Biol 8(3): e1000326. doi:10.1371/journal.pbio.1000326

Published March 9, 2010

Copyright: © 2010 Talbert, Henikoff. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Abbreviations: CenH3, centromeric histone H3 variant; CRM2, centromere-specific retrotransposon of maize 2; LD, linkage disequilibrium

* E-mail: steveh@fhcrc.org

PLoS Biology | www.plosbiology.org
March 2010 | Volume 8 | Issue 3 | e1000326

Figure 1. Chromosome connections in meiosis. Kinetochores attach homologous chromosomes to opposite halves of the spindle. Homologs are held together by chiasmata, in which recombinant chromatids cross each other. Sisters are held together by cohesins and possibly by catenation of centromeric DNA threads, which have been observed in human mitosis. Cohesion is released in two steps: on chromosome arms to resolve chiasmata and separate homologs in the first meiotic division; and around centromeres to separate sisters in the second meiotic division. doi:10.1371/journal.pbio.1000326.g001

Figure 2. Unequal exchange in satellite arrays. Identical tandem satellite repeats become diversified over time by mutation. Unequal exchange results in gain or loss of tandem repeats. Repeated exchange can lead to homogenization of satellite repeats (left). If the unit of exchange consists of multiple diverged monomers, higher-order repeats are generated (right). doi:10.1371/journal.pbio.1000326.g002

Centromeres, Heterochromatin, and Crossover Suppression

How is crossing over suppressed at centromeres? The location of centromeres in heterochromatin raises the possibility that the crossover suppression seen at the centromere may simply be a property of the surrounding heterochromatin. Early attempts to separate heterochromatin from the centromere used inversions of pericentric heterochromatin on the Drosophila X chromosome and suggested that the centromere can suppress recombination independently of its flanking heterochromatin [18]. Subsequent work confirmed that heterochromatin also suppresses crossing over [19], consistent with its proposed role in facilitating cohesion. An increase in crossovers in Drosophila mutants that affect heterochromatin structure supports the role of heterochromatin in suppressing pericentric crossovers [20]. Crossover suppression in plants also appears to be a feature of both centromeres and flanking heterochromatin. In Arabidopsis thaliana, crossing over is reduced >200-fold in the 2.3-Mb centromere region of Chromosome I, and 10- to 50-fold in the 1-Mb heterochromatic flanking regions [12].

At the molecular level, centromeres are distinguished from both heterochromatin and euchromatin by specialized nucleosomes containing the centromere-specific histone H3 variant known as CENP-A or CenH3, which is necessary to form the kinetochore. Occasionally, functional CenH3-containing centromeres can arise on DNA that was previously non-centromeric and be faithfully transmitted (neocentromeres), indicating that centromere inheritance is epigenetic: it depends on the presence of CenH3 nucleosomes, not on specific DNA sequences (reviewed in [1]). Despite the apparent irrelevance of centromeric DNA sequence to kinetochore function, natural centromeres in plants and animals are usually composed of Mb-sized tandem arrays of short (150–180 bp) noncoding “satellite” repeats. These arrays may also be rich in transposon insertions, probably because suppression of crossing over prevents their elimination through recombination. The same or similar repeats make up the flanking pericentric heterochromatin, underscoring the epigenetic specification of centromeres by CenH3 nucleosomes.

Although both centromeres and pericentric heterochromatin are rich in repetitive elements, repeats per se do not appear to be necessary for crossover suppression. For example, centromere 8 of rice (Oryza sativa), which has very little satellite DNA, lacks detectable crossovers in a 2.3-Mb span around the 750-kb centromere region that contains discontinuous blocks of CenH3-containing nucleosomes. Remarkably, there is little difference in gene activity, transposon composition, or abundance of common histone modifications between this recombination-free region and adjacent recombining regions [21], suggesting that crossover suppression does not depend on DNA sequence but instead is epigenetic.

A clearer separation of centromere and heterochromatin effects can be found in budding yeast, which is unusual in having “point” centromeres that are only ~120 bp in length [22], harbor a single CenH3 nucleosome [23], and lack surrounding heterochromatin. Suppression of crossing over at yeast centromeres is modest, estimated at only 3- to 6-fold, and extends over only about 10 kb or less [15,24], although this represents as much as 80 times the length of the centromere itself. This suppression is eliminated by a point mutation in the centromere that renders it unable to assemble a functional kinetochore [25], strongly suggesting that the kinetochore mediates suppression.
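The fold values quoted for crossover suppression are ratios of an expected crossover rate to an observed one, and the yeast comparison is a ratio of lengths. A minimal Python sketch of that arithmetic follows. The 5 cM/Mb background rate is a hypothetical placeholder, not a figure from the text, and the function names are illustrative only.

```python
def fold_suppression(expected_rate: float, observed_rate: float) -> float:
    """Fold reduction in crossing over: expected rate divided by observed rate.

    Both rates must use the same units (for example cM/Mb).
    """
    if observed_rate <= 0:
        # With zero observed crossovers (as in the rice centromere 8 region),
        # only a lower bound on the fold reduction can be stated.
        raise ValueError("observed_rate must be positive for a finite fold value")
    return expected_rate / observed_rate


def map_length_cm(region_mb: float, background_cm_per_mb: float, fold: float = 1.0) -> float:
    """Genetic length (cM) of a region, given a background rate and a fold reduction."""
    return region_mb * background_cm_per_mb / fold


def span_ratio(suppressed_bp: float, centromere_bp: float) -> float:
    """How many centromere lengths a zone of crossover suppression covers."""
    return suppressed_bp / centromere_bp


# Yeast: suppression over ~10 kb around a ~120-bp point centromere.
print(round(span_ratio(10_000, 120)))  # 83, i.e. on the order of 80 centromere lengths

# Hypothetical background rate of 5 cM/Mb (an assumption, not from the text)
# for a 2.3-Mb region, without and with a 200-fold reduction.
print(map_length_cm(2.3, 5.0), map_length_cm(2.3, 5.0, fold=200))

print(fold_suppression(10.0, 2.0))  # 5.0
```

The function refuses a zero observed rate instead of returning infinity, because a region with no detected crossovers only supports a lower bound on suppression.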

Satellite Arrays and Recombination

Although crossing over is suppressed around centromeres, the tandem satellite array structure that is typical for most centromeres is best explained by extensive and repeated recombination. The generation of such arrays has been modeled as a recombinational process of random unequal exchange [26]. Acting on the variation that mutation introduces among individual satellite monomers, unequal exchange can lead to the expansion of new repeat variants and/or the formation of higher-order repeats (Figure 2), as well as to the elimination of variation among monomers (homogenization). In the human X chromosome, the CenH3-containing chromatin is found centrally in the most recent and most homogeneous higher-order repeats of the human alpha satellite array, whereas the older and more diversified satellite monomers make up the flanking pericentric heterochromatin [27]. Analysis of the CentO satellites in rice centromeres revealed segmental duplications, insertions and deletions, inversions, and reshuffling of variant satellite monomers [28]. Unequal exchange occurs at high frequency between sister centromeres in mitotically cycling mouse (Mus musculus) chromosomes and is negatively regulated by DNA methylation, without which repeats are lost [29]. However, it is unknown whether these recombination events can be transmitted through meiosis to the next generation. These observations provide evidence of extensive recombination in centromeres over evolutionary time scales; they underscore both the instability of repeat arrays to recombination and the necessity of suppressing crossing over to maintain centromere structure. How can this evidence for recombination in centromeres be reconciled with crossover suppression?
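Unequal exchange between misaligned tandem arrays can be illustrated with a toy model in the spirit of the random unequal-exchange model cited above [26]. This is a simplified sketch under stated assumptions, not the published model: monomers are integer or letter labels, mutation is omitted, breakpoints are drawn uniformly, and the number of rounds and the random seed are arbitrary.

```python
import random


def unequal_exchange(a, b, i, j):
    """Reciprocal exchange between two tandem arrays at misaligned breakpoints.

    Array a is cut before repeat i and array b before repeat j. When i != j,
    one product gains |i - j| repeats and the other loses the same number.
    """
    return a[:i] + b[j:], b[:j] + a[i:]


def sister_exchange_round(array, rng):
    """One unequal exchange between two identical sister arrays; one product is kept at random."""
    n = len(array)
    i = rng.randrange(1, n)  # breakpoints are never at the array ends,
    j = rng.randrange(1, n)  # so every product keeps at least 2 repeats
    left, right = unequal_exchange(array, array, i, j)
    return rng.choice([left, right])


def simulate(n_monomers=20, rounds=200, seed=1):
    """Start with n distinct monomer variants and apply repeated sister exchanges.

    No new mutations are introduced, so variants can only be lost or copied,
    which is the homogenizing side of the process.
    """
    rng = random.Random(seed)
    array = list(range(n_monomers))
    for _ in range(rounds):
        array = sister_exchange_round(array, rng)
    return array


# A 6-repeat array misaligned by two repeats: one product duplicates C-D, the other loses C-D.
gain, loss = unequal_exchange(list("ABCDEF"), list("ABCDEF"), 4, 2)
print("".join(gain), "".join(loss))  # ABCDCDEF ABEF

final = simulate()
print(len(final), "repeats,", len(set(final)), "distinct variants remain")
```

Adding a small per-round mutation step that introduces new labels would recover the diversification side shown in the figure; it is left out here so the homogenizing effect is easy to see.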

Figure 3. Gene conversion. In a popular model for gene conversion [41], recombination begins with a double-strand break in one chromosome (red) and resection (chewing back) of the 5′ ends of the break. A free 3′ end invades the homolog (blue), forming a D-loop and heteroduplex DNA. Non-reciprocal DNA synthesis fills in the missing DNA (dashed arrows), forming two Holliday junctions, which may be resolved as either crossovers or noncrossovers, depending on which strands are cut (green and orange arrows). Gene conversion between homologs takes place in meiosis (bottom right), generating both crossovers and noncrossovers. Centromeres might undergo noncrossover conversion in mitotically cycling cells during growth and development (bottom left and center) as part of double-strand break repair. Conversion between homologs would be necessary to repair breaks prior to replication, when there is no cohering sister centromere to use as a repair template. doi:10.1371/journal.pbio.1000326.g003

Conversion in Centromeres

In the same year that Sax predicted the centromere effect on crossing over, a new model of recombination, called gene conversion, was proposed to explain non-reciprocal recombination events in mosses and basidiomycetes [30]. Gene conversion is now thought to be a normal part of the homologous recombination pathway, in which a programmed double-strand break in the DNA is repaired by copying a short (usually ~2 kb or less) stretch of the homologous chromosome. The resulting conversion event may then be resolved into either a crossover or a noncrossover (Figure 3). Could noncrossover gene conversions contribute to recombination in centromeres in the absence of crossing over?

The localized nature of gene conversion makes it much harder to detect than crossing over. A key problem is the need for numerous closely spaced unique markers in the highly repetitive sequences of the centromere and pericentromere. Consequently, this question has been most thoroughly addressed in budding yeast, which lacks centromeric and pericentric repeats. Most studies have concluded that gene conversion is moderately suppressed (4- to 7-fold) at yeast centromeres, along with crossing over [24,25]. However, initiating double-strand breaks are not found within the point centromeres, but rather nearby [31,32]. One study reported that when nearby conversion events were examined, the conversion tract frequently included part or all of the centromere, and concluded that conversion rates at centromeres were no different from those in non-centromeric regions [33]. Thus, because yeast centromeres are so small, the relationship between the kinetochore and suppression of gene conversion has remained ambiguous.

To determine whether gene conversion events can occur within large centromeres and provide the recombination events underlying both satellite homogenization and centromere diversity, a new report by Shi et al. [2] studied events within the centromeres of maize (Zea mays).
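The difference between the two resolutions of a conversion event can be made concrete with parental-origin labels on an ordered set of markers. In this minimal sketch (the marker count and breakpoints are arbitrary assumptions), a crossover switches parental origin once along the chromosome, whereas a noncrossover conversion copies a short donor tract non-reciprocally and leaves an isolated donor marker embedded in the recipient's background. A double crossover flanking a single marker would give the same pattern as the conversion product.

```python
def crossover(a, b, pos):
    """Reciprocal exchange: both products switch parental origin at marker index pos."""
    return a[:pos] + b[pos:], b[:pos] + a[pos:]


def noncrossover_conversion(recipient, donor, start, end):
    """Non-reciprocal copy of donor[start:end] into the recipient; the donor is unchanged."""
    return recipient[:start] + donor[start:end] + recipient[end:]


def parental_switches(haplotype):
    """Number of adjacent marker pairs whose parental origin differs."""
    return sum(1 for x, y in zip(haplotype, haplotype[1:]) if x != y)


parent_a = ["A"] * 8  # each marker labelled by its parent of origin
parent_b = ["B"] * 8

co, _ = crossover(parent_a, parent_b, 5)
gc = noncrossover_conversion(parent_a, parent_b, 3, 4)

print("".join(co), parental_switches(co))  # AAAAABBB 1
print("".join(gc), parental_switches(gc))  # AAABAAAA 2
```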
They developed 238 centromeric markers based on insertion polymorphisms of the centromere-specific transposon CRM2 that map to all ten maize centromeres. To verify their centromeric location, centromeric chromatin was immunoprecipitated with an anti-CenH3 antibody. CenH3 is distributed discontinuously in maize centromeres [34], and only about 30% of CRM sequences can be immunoprecipitated with anti-CenH3 [34–36]. Markers were then assessed in two parental lines and in 94 recombinant inbred lines derived from their progeny. As expected, no crossovers were observed. However, in two cases a single marker from one parent was gained in a centromere that otherwise carried all the markers of the other parent, indicating a conversion event. The formal possibility that these events represent double crossovers is unlikely given the failure to find any single crossovers. Shi et al. then assessed their marker set in 53 highly diverse inbred lines representing the diversity of maize and found widespread evidence for marker recombination since the origin of maize, perhaps 9,000 years ago [37]. They could distinguish between crossovers and noncrossover conversions by determining the linkage disequilibrium (LD), or tendency of markers in a population to occur together on the same chromosome. With crossing over, LD decreases with distance, whereas the short conversion tracts of noncrossovers show no relationship between LD and distance, because the conversion of one marker ordinarily has no effect on the coinheritance of its neighbors. No correlation was found between distance and LD in centromere 2, which has been fully sequenced [36], consistent with noncrossover conversion. Two population genetic methods gave similar estimates of the conversion rate, >1×10⁻⁵ conversions per marker per generation, a rate comparable to one estimate for the conversion rate on the chromosome arms [38].

These results are significant both for understanding the regulation of recombination in maize and for understanding the evolution of centromeres. Except in yeasts, studying recombination in centromeres has hitherto been largely a matter of inferring the occurrence of ancient events from present-day sequences. The results of Shi et al. show that it is possible to study centromeric recombination in action in a multicellular eukaryote. They also confirm that such recombination can take place between homologs and not solely between sisters, with implications for the creation and spread of new centromere variants.

Meiotic recombination involves complete end-to-end pairing of homologs, whereas a gene conversion event requires only a local homologous interaction, so it is possible that the observed conversion events occurred during mitotic development rather than during meiosis (Figure 3). For example, the mitotic threads seen to connect human sister centromeres [4] might sometimes be resolved via breakage events that initiate repair by homologous recombination. By this scenario, the surprisingly high level of genetic exchange observed by Shi et al. might be a consequence of the many mitoses that occur for each meiotic generation within a maize lineage.

Widespread gene conversion might be a general feature of centromeres of multicellular eukaryotes. Human centromeres are composed of higher-order alpha satellite repeat arrays [27], and evidence for their periodic homogenization suggests an underlying gene conversion mechanism [39]. As is the case for unequal exchange between sisters, which is the most attractive explanation for the large expansions and contractions of alpha satellite repeat arrays, centromeric gene conversion challenges the widely held perception of centromeres as genetically stable regions of the genome. The actions of gene conversion and unequal exchange provide variation that makes possible Darwinian competition between centromeres, which may lead to their rapid diversification [40]. Thus, the problem of both homogenization and diversification of centromeres in the absence of crossovers can be resolved.
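The LD-versus-distance argument can be sketched numerically. The version below uses r², one common LD statistic; which statistic and which population genetic methods Shi et al. actually used are not stated in this Primer, so treat this choice as an illustrative assumption. Alleles are coded 0/1, and the haplotypes and marker positions in the usage example are invented toy data.

```python
from itertools import combinations


def r_squared(haplotypes, i, j):
    """r^2 between biallelic markers i and j (alleles coded 0/1) across haplotypes.

    Returns None when either marker is monomorphic, because r^2 is undefined then.
    """
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return None
    d = p_ij - p_i * p_j  # the LD coefficient D
    return d * d / denom


def ld_by_distance(haplotypes, positions):
    """(distance, r^2) for every pair of informative markers."""
    pairs = []
    for i, j in combinations(range(len(positions)), 2):
        r2 = r_squared(haplotypes, i, j)
        if r2 is not None:
            pairs.append((abs(positions[j] - positions[i]), r2))
    return pairs


def pearson(xs, ys):
    """Pearson correlation coefficient (undefined if either series is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Toy data (hypothetical): four haplotypes, three markers at positions in kb.
haps = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
pairs = ld_by_distance(haps, [0, 10, 200])
print(pairs)  # [(10, 1.0), (200, 0.0), (190, 0.0)]
print(pearson([d for d, _ in pairs], [r for _, r in pairs]))
```

In this toy case LD falls with distance, the pattern expected from crossing over; a correlation near zero across many marker pairs is what short noncrossover conversion tracts would predict.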

References
1. Malik HS, Henikoff S (2009) Major evolutionary transitions in centromere complexity. Cell 138: 1067–1082.
2. Shi J, Wolf SE, Burke JM, Presting GG, Ross-Ibarra J, et al. (2010) Widespread gene conversion in centromere cores. PLoS Biol 8(3): e1000327. doi:10.1371/journal.pbio.1000327.
3. Gartenberg M (2009) Heterochromatin and the cohesion of sister chromatids. Chromosome Res 17: 229–238.
4. Wang LH, Schwarzbraun T, Speicher MR, Nigg EA (2008) Persistence of DNA threads in human anaphase cells suggests late completion of sister chromatid decatenation. Chromosoma 117: 123–135.
5. Koehler KE, Boulton CL, Collins HE, French RL, Herman KC, et al. (1996) Spontaneous X chromosome MI and MII nondisjunction events in Drosophila melanogaster oocytes have different recombinational histories. Nat Genet 14: 406–414.
6. Lamb NE, Sherman SL, Hassold TJ (2005) Effect of meiotic recombination on the production of aneuploid gametes in humans. Cytogenet Genome Res 111: 250–255.
7. Sax K (1930) Chromosome structure and the mechanism of crossing over. J Arnold Arb 11: 193–220.
8. Beadle GW (1932) A possible influence of the spindle fibre on crossing-over in Drosophila. Proc Natl Acad Sci U S A 18: 160–165.
9. Mahtani MM, Willard HF (1998) Physical and genetic mapping of the human X chromosome centromere: Repression of recombination. Genome Res 8: 100–110.
10. Rahn MI, Solari AJ (1986) Recombination nodules in the oocytes of the chicken, Gallus domesticus. Cytogenet Cell Genet 43: 187–193.
11. Sherman JD, Stack SM (1995) Two-dimensional spreads of synaptonemal complexes from solanaceous plants. VI. High-resolution recombination nodule map for tomato (Lycopersicon esculentum). Genetics 141: 683–708.
12. Haupt W, Fischer TC, Winderl S, Fransz P, Torres-Ruiz RA (2001) The centromere1 (CEN1) region of Arabidopsis thaliana: Architecture and functional impact of chromatin. Plant J 27: 285–296.
13. Harushima Y, Yano M, Shomura A, Sato M, Shimano T, et al. (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population. Genetics 148: 479–494.
14. Anderson LK, Doyle GG, Brigham B, Carter J, Hooker KD, et al. (2003) High-resolution crossover maps for each bivalent of Zea mays using recombination nodules. Genetics 165: 849–865.
15. Lambie EJ, Roeder GS (1986) Repression of meiotic crossing over by a centromere (CEN3) in Saccharomyces cerevisiae. Genetics 114: 769–789.
16. Nakaseko Y, Adachi Y, Funahashi S, Niwa O, Yanagida M (1986) Chromosome walking shows a highly homologous repetitive sequence present in all the centromere regions of fission yeast. EMBO J 5: 1011–1021.
17. Rockmill B, Voelkel-Meiman K, Roeder GS (2006) Centromere-proximal crossovers are associated with precocious separation of sister chromatids during meiosis in Saccharomyces cerevisiae. Genetics 174: 1745–1754.
18. Mather K (1939) Crossing over and heterochromatin in the X chromosome of Drosophila melanogaster. Genetics 24: 413–435.
19. Slatis HM (1955) A reconsideration of the brown-dominant position effect. Genetics 40: 246–251.
20. Westphal T, Reuter G (2002) Recombinogenic effects of suppressors of position-effect variegation in Drosophila. Genetics 160: 609–621.
21. Yan H, Jin W, Nagaki K, Tian S, Ouyang S, et al. (2005) Transcription and histone modifications in the recombination-free region spanning a rice centromere. Plant Cell 17: 3227–3238.
22. Carbon J, Clarke L (1984) Structural and functional analysis of a yeast centromere (CEN3). J Cell Sci Suppl 1: 43–58.
23. Furuyama S, Biggins S (2007) Centromere identity is specified by a single centromeric nucleosome in budding yeast. Proc Natl Acad Sci U S A 104: 14706–14711.
24. Chen SY, Tsubouchi T, Rockmill B, Sandler JS, Richards DR, et al. (2008) Global analysis of the meiotic crossover landscape. Dev Cell 15: 401–415.
25. Lambie EJ, Roeder GS (1988) A yeast centromere acts in cis to inhibit meiotic gene conversion of adjacent sequences. Cell 52: 863–873.
26. Smith GP (1976) Evolution of repeated DNA sequences by unequal crossover. Science 191: 528–535.
27. Schueler MG, Dunn JM, Bird CP, Ross MT, Viggiano L, et al. (2005) Progressive proximal expansion of the primate X chromosome centromere. Proc Natl Acad Sci U S A 102: 10563–10568.
28. Ma J, Wing RA, Bennetzen JL, Jackson SA (2007) Plant centromere organization: A dynamic structure with conserved functions. Trends Genet 23: 134–139.
29. Jaco I, Canela A, Vera E, Blasco MA (2008) Centromere mitotic recombination in mammalian cells. J Cell Biol 181: 885–892.
30. Winkler H (1930) Die Konversion der Gene. Jena, Germany: Gustav Fischer.
31. Blitzblau HG, Bell GW, Rodriguez J, Bell SP, Hochwagen A (2007) Mapping of meiotic single-stranded DNA reveals double-stranded-break hotspots near centromeres and telomeres. Curr Biol 17: 2003–2012.
32. Buhler C, Borde V, Lichten M (2007) Mapping meiotic single-strand DNA reveals a new landscape of DNA double-strand breaks in Saccharomyces cerevisiae. PLoS Biol 5(12): e324. doi:10.1371/journal.pbio.0050324.
33. Symington LS, Petes TD (1988) Meiotic recombination within the centromere of a yeast chromosome. Cell 52: 237–240.
34. Jin W, Melo JR, Nagaki K, Talbert PB, Henikoff S, et al. (2004) Maize centromeres: Organization and functional adaptation in the genetic background of oat. Plant Cell 16: 571–581.
35. Zhong CX, Marshall JB, Topp C, Mroczek R, Kato A, et al. (2002) Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell 14: 2825–2836.
36. Wolfgruber TK, Sharma A, Schneider KL, Albert PS, Koo DH, et al. (2009) Maize centromere structure and evolution: Sequence analysis of centromeres 2 and 5 reveals dynamic loci shaped primarily by retrotransposons. PLoS Genet 5(11): e1000743. doi:10.1371/journal.pgen.1000743.
37. Ranere AJ, Piperno DR, Holst I, Dickau R, Iriarte J (2009) The cultural and chronological context of early Holocene maize and squash domestication in the Central Balsas River Valley, Mexico. Proc Natl Acad Sci U S A 106: 5014–5018.
38. Yandeau-Nelson MD, Zhou Q, Yao H, Xu X, Nikolau BJ, et al. (2005) MuDR transposase increases the frequency of meiotic crossovers in the vicinity of a Mu insertion in the maize a1 gene. Genetics 169: 917–929.
39. Brown SD, Dover GA (1980) Conservation of segmental variants of satellite DNA of Mus musculus in a related species: Mus spretus. Nature 285: 47–49.
40. Henikoff S, Ahmad K, Malik HS (2001) The centromere paradox: Stable inheritance with rapidly evolving DNA. Science 293: 1098–1102.
41. Szostak JW, Orr-Weaver TL, Rothstein RJ, Stahl FW (1983) The double-strand-break repair model for recombination. Cell 33: 25–35.


Primer

Genomic Responses to Abnormal Gene Dosage: The X Chromosome Improved on a Common Strategy

Xinxian Deng1, Christine M. Disteche1,2*

1 Department of Pathology, University of Washington, Seattle, Washington, United States of America
2 Department of Medicine, University of Washington, Seattle, Washington, United States of America

Mechanisms to guard genomic integrity are critical to ensuring the welfare and survival of an organism. Disruptions of genomic integrity can result in aneuploidy, a large-scale genomic imbalance caused by either extra or missing whole chromosomes (chromosomal aneuploidy) or chromosome segments (segmental aneuploidy). A change in dosage of a single gene may not compromise the well-being of an organism, but the combined altered dosage of many genes due to aneuploidy disturbs the overall balance of gene expression networks, resulting in decreased fitness and mortality [1,2]. Chromosomal aneuploidy is a common cause of birth defects—Down syndrome is caused by an extra copy of Chromosome 21, and Turner syndrome by a single copy of the X chromosome in females. Furthermore, methods that detect segmental aneuploidy have uncovered small deletions or duplications of the genome in association with many disorders, such as mental retardation. Chromosomal and segmental aneuploidies are also frequent in cancer cells in which changes in copy number paradoxically increase cell fitness but are unfavorable to survival of the organism. A fundamental issue in biology and medicine is to understand the effects of aneuploidy on gene expression and the mechanisms that alleviate aneuploidy-induced imbalance of the genome. Chromosomal aneuploidy is caused by non-disjunction of chromosomes in meiosis or mitosis, while segmental aneuploidy involves breakage and ligation of DNA. In contrast, the sex chromosomes provide an example of a naturally occurring ‘‘aneuploidy’’ caused by the evolution of a specific set of chromosomes for sex determination that often differ in their copy number between males and females. For example, in mammals and in flies, females have two X chromosomes and males have one X chromosome and a Y chromosome, resulting in X monosomy in males. How does a cell or an organism respond to such different types of aneuploidy, abnormal or natural? 
It turns out that the overall expression level of a given gene is not necessarily directly proportional to its copy number. Strategies have evolved to deal with abnormal gene dosage and alleviate the effects of aneuploidy by dampening changes in expression levels. Moreover, the X chromosome has evolved sophisticated mechanisms to achieve complete dosage compensation, which is not surprising, since the copy number difference between males and females has been evolving for a long time.
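To make the contrast between expression that tracks copy number and expression that is dampened concrete, here is a minimal arithmetic sketch in Python. Doses use the symbolic diploid wild-type value of 2. The single `compensation` parameter, which interpolates linearly between output proportional to dose (0.0) and output held at the wild-type level (1.0), is a simplifying assumption for illustration, not a model taken from the article.

```python
def relative_expression(dose: float, wt_dose: float = 2.0, compensation: float = 0.0) -> float:
    """Expression relative to wild type for a given gene dose.

    compensation = 0.0 -> output proportional to dose (no compensation)
    compensation = 1.0 -> output equal to wild type (complete compensation)
    values in between  -> partial compensation (linear interpolation, an assumption)
    """
    if not 0.0 <= compensation <= 1.0:
        raise ValueError("compensation must be between 0 and 1")
    proportional = dose / wt_dose
    return proportional + compensation * (1.0 - proportional)


for dose in (1, 2, 3):  # loss, wild type, gain
    print(dose, relative_expression(dose), relative_expression(dose, compensation=0.5))
# 1 0.5 0.75
# 2 1.0 1.0
# 3 1.5 1.25
```

Real buffering and feedback are non-linear and gene-specific, so a single interpolation weight is only a convenient way to place an observed expression ratio between the two extremes.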

Gene Expression Responses to Altered Dosage in Aneuploidy

There are two main outcomes of altered gene dosage in aneuploidy in terms of transcript levels: either levels directly correlate with gene dosage (primary dosage effect), or they are unchanged or only partially changed (complete or partial dosage compensation) [3]. In the first scenario, a reduction of the normal gene dosage in a wild-type (WT) diploid cell from a symbolic dose value of 2 to a value of 1 after a chromosomal loss or deletion would produce half as many gene products, while an increase in gene dosage from 2 to 3, due to a chromosomal gain or duplication, would produce 1.5 times as many products (Figure 1). In the second scenario, the amount of product from the altered gene dosage would equal or nearly equal that in WT cells, owing to complete or partial compensation (Figure 1). Gene expression analyses of aneuploid cells or tissues in human, mouse, fly, yeast, and plant provide examples of both primary dosage effects and dosage compensation. Thus, changes in expression levels due to chromosomal aneuploidy do not affect all genes in the same manner. For example, in Down syndrome, 29% of transcripts from human Chromosome 21 are overexpressed (22% in proportion to gene dosage and 7% at levels higher than dosage alone predicts), while the remaining genes are either partially compensated (56%) or highly variable among individuals (15%) [4]. Interestingly, dosage-sensitive genes, such as those encoding transcription factors or ribosomal proteins, are more likely to be compensated, which avoids harmful network imbalances [1,5]. This basal dynamic dosage compensation could be due to buffering, feedback regulation, or both, depending on the gene and the organism [4,6–9]. Buffering, a passive absorption of gene dose perturbations, arises from inherent non-linear properties of the transcription system. In contrast, feedback regulation is an active mechanism that detects abnormal transcript abundance and adjusts transcription levels.

Sex Chromosome-Specific Dosage Compensation

Sex chromosome-specific dosage compensation evolved in response to the dose imbalance between autosomes and sex chromosomes in the heterogametic sex, which arises because the two sexes carry different numbers of sex chromosomes: for example, a single X chromosome and a gene-poor Y chromosome in males and two X chromosomes in females. Compensatory mechanisms that restore balance both between the sex chromosomes and autosomes and between the sexes vary among species [10,11]. In Drosophila melanogaster (fruit fly), expression from the single X

Citation: Deng X, Disteche CM (2010) Genomic Responses to Abnormal Gene Dosage: The X Chromosome Improved on a Common Strategy. PLoS Biol 8(2): e1000318. doi:10.1371/journal.pbio.1000318

Published February 23, 2010

Copyright: © 2010 Deng, Disteche. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by National Institutes of Health grants GM079537 and GM046883 (to CMD). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. Abbreviations: ES, embryonic stem; MOF, males absent on the first; MSL, malespecific lethal; WT, wild-type * E-mail: cdistech@u.washington.edu

February 2010 | Volume 8 | Issue 2 | e1000318

chromosomeâ&#x20AC;&#x201C;specific mechanism. The beauty of their experimental system, the S2 cell line derived from a male fly, is that it has a defined genome with numerous segmental aneuploid regions, both autosomal and X-linked. Thus, genomic responses to aneuploidy could be queried both on autosomes and on the X chromosome, the latter being associated to the MSL complex. Using secondgeneration DNA- and RNA-sequencing, the authors carefully examined the relationship between gene copy number and gene expression in S2 cells before and after induced depletion of the MSL complex. By this approach the effects of the MSL complex on the genome have effectively been separated from those triggered by a basal response to aneuploidy. What Zhang et al. have found is that partial dosage compensation of both autosomal and X-linked regions occurs even in the absence of the MSL complex. This provides strong evidence that basal dosage compensation mediated by buffering and feedback pathways allows dosage compensation across the whole genome. In the presence of the MSL complex, X-linked genes, but not autosomal genes, become subject to an additional level of regulation, which increases expression independent of gene copy or expression levels. This feed-forward regulation of the X chromosome by the MSL complex ensures a highly stable doubling of expression specific to this chromosome. Note that this feed-forward regulation results in precise dosage compensation only when X dose is half of the autosome dose, while insufficient or excessive X-linked gene expression occurs at lower or higher X dose. Excessive X expression has also been reported when ectopic expression of MSL2 is induced in Drosophila females, which leads to binding of the MSL complex to both X chromosomes and lethality [16]. The new findings by Zhang et al. 
implicate two levels of regulation of the X chromosome: one basal mechanism that can regulate both the X and the autosomes in the event of aneuploidy; and a second feed-forward mechanism specific to the X and regulated by the MSL complex to ensure doubling of X-linked gene expression (Figure 2). The new study proposes that the basal compensation mechanism provides a 1.5-fold increase in gene expression and the feed-forward mechanism, another 1.35-fold, resulting in a precise two-fold increase in expression of X-linked genes. The specificity of the MSL-mediated mechanism to double X-linked gene expression is ensured by the existence of DNA sequence motifs specifically enriched on the X chromosome to recruit the MSL complex only to this chromosome [14]. Autosomal aneuploidy would only trigger a response of the basal dosage compensation pathway, which would result in a 1.5-fold increase in expression of genes located within a monosomic segment (Figure 2). It should be noted that since gene expression levels were measured relative to whole genome expression (due to normalization) a fold change in expression of genes in an aneuploid segment could also be interpreted as a fold change in expression of the rest of the genome. How did such a precise mechanism evolve to ensure appropriate expression of sex-linked genes? The feed-forward process mediated by the MSL complex is a highly stable epigenetic modification selected and maintained during the evolution of heteromorphic sex chromosomes (Figure 2). Heteromorphic sex chromosomes have arisen from an ancestral pair of autosomes, following inhibition of recombination between the proto-Y chromosome that carries the male determinant and its counterpart, the proto-X chromosome [13]. Gradual loss of Ylinked genes due to lack of recombination could have happened gene-by-gene or on a chromosomal segment-by-segment basis. 
The human Y chromosome apparently evolved by a series of large inversions leading to a rapid loss of large chromosomal segments [17]. If the lost Y segments contained dosage sensitive

Figure 1. Expression levels change in response to altered gene dose in aneuploidy. The transcript output from a given pair of chromosomes in normal WT diploid cells is set as a value of 2. In case of aneuploidy (monosomy or trisomy), the amount of transcript would be strictly correlated with gene dose in the absence of a dosage compensation mechanism (No DC). In the presence of partial DC, the expression level per copy would be partially increased in monosomy or partially decreased in trisomy, relative to the diploid level. In the presence of complete DC, expression levels would be adjusted so that the amount of transcripts is the same in monosomic or trisomic cells compared to diploid cells. doi:10.1371/journal.pbio.1000318.g001

chromosome is specifically enhanced two-fold in males, while no such upregulation occurs in females. X upregulation also occurs in Caenorhabditis elegans (round worm) and in mammals but in both sexes [6,12]. Silencing of one X chromosome in mammalian females and partial repression of both X chromosomes in C. elegans hermaphrodites have been adapted to avoid too high an expression level of X-linked genes in the homogametic sex. A unified theme in these diverse mechanisms of sex chromosome dosage compensation is coordinated upregulation of most Xlinked genes approximately two-fold to balance their expression with that of autosomal genes present in two copies. This process utilizes both genetic and epigenetic mechanisms to increase expression of an X-linked gene once it has lost its Y-linked partner during evolution. While the mechanisms of X upregulation in mammals and worms are not clear, Drosophila X upregulation is mediated by the male-specific lethal (MSL) complex [10,13]. The MSL complex binds hundreds of sites along the male X chromosome and modifies its chromatin structure by MOF (males absent on the first)â&#x20AC;&#x201C;mediated acetylation of histone H4 at lysine 16. Other histone modifications and chromatin-associated proteins, including both activating and silencing factors, are also involved in the two-fold upregulation of the Drosophila male X chromosome [14]. How these modifications coordinately work to fine-tune a doubling of gene expression is still not well understood. Moreover, the basal dynamic dosage compensation response observed in studies of autosomal aneuploidy could also play a role in Drosophila X upregulation [3]. An important question is how much this basal response to the onset of aneuploidy contributes to sex chromosomeâ&#x20AC;&#x201C;specific dosage compensation.

Fine-Tuning of the Drosophila X Chromosome Adds a Special Layer of Regulation above a Genome-Wide Response to Aneuploidy In this issue of PLoS Biology, Zhang et al. [15] report that the exquisitely precise X chromosome upregulation in Drosophila utilizes both a basal response to aneuploidy and an X PLoS Biology | www.plosbiology.org

February 2010 | Volume 8 | Issue 2 | e1000318

Figure 2. Evolutionary model of sex chromosome dosage compensation compared to the basal compensation response of an autosome after a deletion. After the proto-Y chromosome evolved a gene with a male-determining function (green bar), it became subject to gradual gene loss on a gene-by-gene or segment-by-segment basis due to lack of recombination between the proto-sex chromosomes. If the lost region on the proto-Y chromosome contained dosage sensitive genes such as those that encode transcriptional factors (yellow bars), this would have triggered a basal dosage compensation response (yellow faucet) on the proto-X chromosome and led to a partial (1.5-fold) increase of expression (small arrows). The same basal dosage compensation process would also modify a deleted region on an autosome (A) in an abnormal cell. Dosageinsensitive genes (black bars) may escape this process. When broader regions were lost on the proto-Y chromosome, the collective imbalance effects of multiple aneuploid genes would have become highly deleterious and the increased load of aneuploidy could have stressed the basal mechanism of dosage compensation. Survival was achieved by recruiting regulatory complexes such as the MSL complex (red faucet) to aneuploid X segments (red regions), to further increase gene expression (big arrows) and rescue the X monosomy. This feed-forward sex chromosome–specific regulation would provide 1.35-fold increase in expression, which together with the basal dosage compensation (1.5-fold increase) would achieve the approximate two-fold upregulation of most genes on the present day X chromosome. In contrast, large-scale deleterious autosomal aneuploidy would be lost due to lack of a specific sex-driven compensatory mechanism. doi:10.1371/journal.pbio.1000318.g002

genes, this would probably have triggered a basal dosage compensation response as observed in autosomal aneuploidy (Figure 2). However, this type of dosage compensation is dynamic and incomplete, as it is probably mediated by buffering or feedback mechanisms. An organism might tolerate partial imbalances as long as those were small, but extensive gene loss from the Y chromosome would eventually have caused a deleterious collective imbalance for multiple X-linked genes. A progressive increase in the size of aneuploid X regions could have reached a threshold of unsustainable stress on the basal dosage compensation process. To relieve this stress and survive X aneuploidy, specific mechanisms of dosage compensations targeted to the X chromosome would be desirable. Such mechanisms probably derived by recruiting pre-existing regulatory complexes, for example in the making of the MSL complex in Drosophila. Indeed, one of the components of this complex is MOF, a histone acetyltransferase also involved in autosomal gene regulation [10,13]. Homologues of Drosophila MSL proteins also exist in other organisms where they are involved in gene regulation and DNA replication and repair but do not appear to associate with the X chromosome, suggesting that the PLoS Biology | www.plosbiology.org

components of X chromosome–specific complexes may differ between organisms [18]. In conclusion, two mechanisms apparently collaborate to achieve the approximate two-fold upregulation of the Drosophila X chromosome: a dynamic basal dosage compensation mechanism probably mediated by buffering and feedback processes; and a feed-forward, sex chromosome–specific regulation chiefly mediated by the MSL complex. In mammals, upregulation of the X chromosome may also result from a combination of more than one mechanism, some applicable to aneuploidy that may arise anywhere in the genome and others that evolved to control the X chromosome. High X-linked gene expression in mammalian cells with two active X chromosomes—undifferentiated female embryonic stem (ES) cells [19] and human triploid cells [20]— suggests that X upregulation does not default in these cells. Thus, in mammals, X upregulation may also be mediated by a highly stable feed-forward mechanism that acts on top of a basal aneuploidy response. In contrast, the sex chromosomes of birds and silkworms, ZZ in males and ZW in females, seem to lack a precise dosage compensation mechanism of the Z chromosome, possibly due to the absence of a feed-forward process [21,22]. The 3

February 2010 | Volume 8 | Issue 2 | e1000318

nisms underlying the basal and feed-forward regulatory pathways should help to fully understand the role of these processes in different organisms, both in response to the acute onset of aneuploidy and in evolution of sex-specific traits. Loss or dysregulation of dosage compensation mechanisms could be important in birth defects and in diseases, such as cancer, where aneuploidy is common; exploring approaches to enhance dosage compensation may be useful to relieve aneuploidy-related diseases.

Z chromosome could have a biased paucity of dosage-sensitive regulatory genes, or else selection for sexual traits may have favored the retention of gene expression imbalances between males and females. Male and female mammals display significant expression differences of a subset of genes that escape X inactivation and thus have higher expression in females [23]. Whether such genes play a role in female-specific functions is unknown. Future work to uncover the actual molecular mecha-

References
1. Birchler JA, Riddle NC, Auger DL, Veitia RA (2005) Dosage balance in gene regulation: biological implications. Trends Genet 21: 219–226.
2. Veitia RA, Bottani S, Birchler JA (2008) Cellular reactions to gene dosage imbalance: genomic, transcriptomic and proteomic effects. Trends Genet 24: 390–397.
3. Zhang Y, Oliver B (2007) Dosage compensation goes global. Curr Opin Genet Dev 17: 113–120.
4. Ait Yahya-Graison E, Aubert J, Dauphinot L, Rivals I, Prieur M, et al. (2007) Classification of human chromosome 21 gene-expression variations in Down syndrome: impact on disease phenotypes. Am J Hum Genet 81: 475–491.
5. Edger PP, Pires JC (2009) Gene and genome duplications: the impact of dosage-sensitivity on the fate of nuclear genes. Chromosome Res 17: 699–717.
6. Gupta V, Parisi M, Sturgill D, Nuttall R, Doctolero M, et al. (2006) Global analysis of X-chromosome dosage compensation. J Biol 5: 3.
7. FitzPatrick DR (2005) Transcriptional consequences of autosomal trisomy: primary gene dosage with complex downstream effects. Trends Genet 21: 249–253.
8. Stenberg P, Lundberg LE, Johansson AM, Ryden P, Svensson MJ, et al. (2009) Buffering of segmental and chromosomal aneuploidies in Drosophila melanogaster. PLoS Genet 5: e1000465.
9. Makarevitch I, Phillips RL, Springer NM (2008) Profiling expression changes caused by a segmental aneuploid in maize. BMC Genomics 9: 7.
10. Straub T, Becker PB (2007) Dosage compensation: the beginning and end of generalization. Nat Rev Genet 8: 47–57.
11. Cheng MK, Disteche CM (2006) A balancing act between the X chromosome and the autosomes. J Biol 5: 2.
12. Nguyen DK, Disteche CM (2006) Dosage compensation of the active X chromosome in mammals. Nat Genet 38: 47–53.
13. Vicoso B, Bachtrog D (2009) Progress and prospects toward our understanding of the evolution of dosage compensation. Chromosome Res 17: 585–602.
14. Gelbart ME, Kuroda MI (2009) Drosophila dosage compensation: a complex voyage to the X chromosome. Development 136: 1399–1410.
15. Zhang Y, Malone JH, Powell SK, Periwal V, Spana E, et al. (2010) Expression in aneuploid Drosophila S2 cells. PLoS Biol 8(2): e1000320. doi:10.1371/journal.pbio.1000320.
16. Kelley RL, Solovyeva I, Lyman LM, Richman R, Solovyev V, et al. (1995) Expression of msl-2 causes assembly of dosage compensation regulators on the X chromosomes and female lethality in Drosophila. Cell 81: 867–877.
17. Lahn BT, Page DC (1999) Four evolutionary strata on the human X chromosome. Science 286: 877–879.
18. Rea S, Xouri G, Akhtar A (2007) Males absent on the first (MOF): from flies to humans. Oncogene 26: 5385–5394.
19. Lin H, Gupta V, Vermilyea MD, Falciani F, Lee JT, et al. (2007) Dosage compensation in the mouse balances up-regulation and silencing of X-linked genes. PLoS Biol 5: e326.
20. Deng X, Nguyen DK, Hansen RS, Van Dyke DL, Gartler SM, Disteche CM (2009) Dosage regulation of the active X chromosome in human triploid cells. PLoS Genet 5: e1000751.
21. Arnold AP, Itoh Y, Melamed E (2008) A bird’s-eye view of sex chromosome dosage compensation. Annu Rev Genomics Hum Genet 9: 109–127.
22. Zha X, Xia Q, Duan J, Wang C, He N, et al. (2009) Dosage analysis of Z chromosome genes using microarray in silkworm, Bombyx mori. Insect Biochem Mol Biol 39: 315–321.
23. Prothero KE, Stahl JM, Carrel L (2009) Dosage compensation and gene expression on the mammalian X chromosome: one plus one does not always equal two. Chromosome Res 17: 637–648.
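The dose arithmetic behind Figure 1 of the Deng and Disteche article, and the two-layer model of Drosophila X upregulation it discusses (a 1.5-fold basal response multiplied by a 1.35-fold MSL-dependent increase), can be made concrete in a few lines of Python. This is a minimal illustrative sketch, not a model from the article: the function name `expression_output`, the `compensation` parameter (0 for no DC, 1 for complete DC, values in between for partial DC), and the linear interpolation between the uncompensated and fully compensated output are assumptions. Only the dose values (1, 2, 3), the diploid output of 2, and the two fold changes come from the text.

```python
def expression_output(dose: int, compensation: float, diploid_dose: int = 2) -> float:
    """Total transcript output for a gene present at `dose` copies (hypothetical model).

    Output uses the units of Figure 1, where the diploid (WT) output is 2.
    compensation = 0.0 -> no DC: output strictly proportional to dose
    compensation = 1.0 -> complete DC: output restored to the diploid level
    values in between  -> partial DC (linear interpolation, an assumption)
    """
    if not 0.0 <= compensation <= 1.0:
        raise ValueError("compensation must lie between 0 and 1")
    uncompensated = float(dose)      # primary dosage effect: one unit per copy
    diploid = float(diploid_dose)    # WT reference output
    return uncompensated + compensation * (diploid - uncompensated)


for dose, label in [(1, "monosomy"), (2, "diploid"), (3, "trisomy")]:
    row = [expression_output(dose, c) for c in (0.0, 0.5, 1.0)]
    print(f"{label:<9} no DC={row[0]:.2f}  partial DC={row[1]:.2f}  complete DC={row[2]:.2f}")

# Two-layer model of Drosophila X upregulation, using the fold changes quoted in the text
BASAL_FOLD = 1.5           # genome-wide buffering/feedback response
FEED_FORWARD_FOLD = 1.35   # MSL-dependent, X-specific increase
x_fold = BASAL_FOLD * FEED_FORWARD_FOLD   # fold changes multiply
print(f"combined X upregulation: {x_fold:.3f}-fold")
```

With `compensation=0.5`, a monosomic segment yields 1.5 units instead of 1, which happens to match the 1.5-fold basal increase described for monosomic segments; the interpolation itself is only a convenient way to draw the partial-DC case of Figure 1.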


Research in Translation

Epigenetic Epidemiology of Common Complex Disease: Prospects for Prediction, Prevention, and Treatment Caroline L. Relton1*, George Davey Smith2 1 Human Nutrition Research Centre, Institute of Human Genetics, Newcastle University, Newcastle upon Tyne, United Kingdom, 2 MRC Centre for Causal Analyses in Translational Epidemiology, School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom

Introduction There is considerable anticipation of future improvements in disease prevention and treatment following recent advances in genomics [1]. One aspect of genomics receiving particular interest is epigenetics: the regulatory processes that control the transcription of information encoded in the DNA sequence into RNA, prior to its translation into protein. Programmed developmental changes and the ability of the genome to register, signal, and perpetuate environmental cues are subsumed under the epigenetic banner [2]. Genes are packaged into chromatin, and dynamic chromatin remodeling, which alters the accessibility of gene promoters and regulatory regions, is required for the initial step in gene expression (transcription) [3]. Epigenetic factors are responsible for this regulatory process, the major components of which are DNA methylation, histone modifications, and the action of small non-coding RNAs (Figure 1). Unlike DNA sequence, which is largely fixed throughout the lifecourse, epigenetic patterns not only vary from tissue to tissue but also change with advancing age and are sensitive to environmental exposures [4–7]. It is this propensity for change that makes epigenetic processes the focus of such interest, as they lie at the interface between the environment and co-ordinated transcriptional control. In rare developmental disorders, the role of aberrant epigenetic processes is well established [8]. Our focus here, however, is on the potential role of epigenetic processes in common complex disease. Tumor-specific changes in epigenetic patterns are a hallmark of numerous cancers, with analysis of the epigenetic machinery beginning to feature prominently in emerging cancer diagnostics and therapies [9–11]. There is an increasing body of evidence that epigenetic patterns are altered by environmental factors known to be associated with disease risk (e.g., diet, smoking, alcohol intake, environmental toxicants, stress) [7,8]; however, an important question remains unresolved: which epigenetic changes are a secondary outcome of either exposure or disease, and which lie on the causal pathway linking the two? Without proven causality, interventions to prevent or treat common complex diseases based upon epigenetic mechanisms will not be fruitful. Conversely, regardless of causality, defining a robust prospective relationship between epigenetic patterns and phenotypic traits may have application in diagnostics or in identifying high-risk individuals for non-epigenetic-based interventions.

Citation: Relton CL, Davey Smith G (2010) Epigenetic Epidemiology of Common Complex Disease: Prospects for Prediction, Prevention, and Treatment. PLoS Med 7(10): e1000356. doi:10.1371/journal.pmed.1000356 Published October 26, 2010 Copyright: © 2010 Relton, Davey Smith. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: The authors received no specific funding for this article. Competing Interests: The authors have declared that no competing interests exist.

Research in Translation discusses health interventions in the context of translation from basic to clinical research, or from clinical evidence to practice.

Abbreviations: CAD, coronary artery disease; CpG, cytosine guanine dinucleotide; DNMT, DNA methyltransferase; HDAC, histone deacetylase; HNSCC, head and neck squamous cell carcinoma; LDL-C, low density lipoprotein-cholesterol; miRNA, microRNA; SNP, single nucleotide polymorphism * E-mail: c.l.relton@ncl.ac.uk Provenance: Commissioned; externally peer reviewed.

PLoS Medicine | www.plosmedicine.org

October 2010 | Volume 7 | Issue 10 | e1000356

Summary Points

• The epigenome records a variety of dietary, lifestyle, behavioral, and social cues, providing an interface between the environment and the genome.
• Epigenetic variation, whether genetically or environmentally determined, contributes to inter-individual variation in gene expression and thus to variation in common complex disease risk.
• Interventions based upon epigenetic agents, including DNA methyltransferase inhibitors and histone deacetylase inhibitors, have been in clinical use for many years, but their role outside the treatment of specific cancers is not established. Epigenetic therapies will only be fruitful if epigenetic mechanisms are causally related to the disease being treated.
• Evidence linking epigenetic variation to specific disease phenotypes is lacking to date. Epidemiological approaches can be applied to help separate causal from non-causal associations. We propose the development of a Mendelian randomization approach (“genetical epigenomics”), which could help overcome the problems of confounding and reverse causation (when an association between epigenetic patterns and disease phenotype is observed but it is unknown whether the disease is causing changes to the epigenome or epigenetic changes are causal in disease pathogenesis).

Measurement of Epigenetic Patterns Epigenetic patterns, including histone modifications, microRNA (miRNA), and DNA methylation, can be assessed in a range of tissue types. Because DNA methylation assays on stored DNA samples are straightforward, DNA methylation has been the most extensively studied of these marks [12]. Histone modification analysis requires that DNA is maintained as intact chromatin, whereas analysis of miRNA requires a source of RNA. Both therefore require planned prospective sample collection, and both are costly to undertake on sizable sample sets. The N-terminal tails of the four core histones (H2A, H2B, H3, and H4) commonly carry post-translational modifications, including acetylation, methylation, or phosphorylation [13]. These histone modifications can be analysed by precipitating chromatin with an antibody to a specific modification, e.g., methylation of histone 3, lysine 9 (H3-K9). miRNA expression levels can be measured using the same principles and methods as regular transcriptomic analysis (miRNA array or qPCR). DNA methylation can be assayed through genome-wide approaches, when the investigator is interested in global changes or in identifying regions of interest [14], or through targeted approaches that focus on DNA methylation at a particular locus or at loci associated with genes in a specific pathway [15]. These technologies are reviewed in detail elsewhere [16]. The tissue specificity of epigenetic patterns is well established, with variation between tissues within individuals being greater than variation between individuals [5]. Furthermore, epigenetic dysregulation with advancing age has been shown to be highly tissue dependent [17]. Extrapolating epigenetic information gleaned from DNA from accessible sources such as peripheral white blood cells or buccal cells to other tissue types is therefore problematic. The correlation between methylation patterns in different tissues is complex and locus dependent, but emerging data suggest that epigenetic signatures in easily accessible material such as circulating cells have potential utility as biomarkers of exposure or disease risk [18]. Epigenetic patterns are heritable across cell divisions (mitosis) [19] but undergo comprehensive, incompletely understood reprogramming during meiosis [20]. There is evidence that environmental exposures can act across generations to influence epigenetic patterns in offspring [21]; for example, maternal exposure to famine during the perinatal period influences offspring DNA methylation in adulthood [22,23]. The quantitative importance of such intergenerational epigenetic transmission remains uncertain, and it may have been over-emphasized in comparison with intra-generational epigenetic influences, which are theoretically less challenging and probably more tractable and important [24].

Environmental Influences on Epigenetic Patterns Several factors beyond tissue type and age [4,5,17,25,26] are believed to influence epigenetic patterns. Nutritional factors modulate epigenetic marks in both animal models and humans (reviewed in [27]); the most studied are dietary sources of methyl groups that are required for DNA methylation, including folate, choline, betaine, methionine, and serine [28,29]. In animal and human studies, these nutrients modulate epigenetic patterns in both disease and non-disease settings. Other dietary components with evidence for an effect on epigenetic patterns relevant to the pathogenesis of common complex diseases include a high-fat diet, which influences DNA methylation [30], and various dietary modifiers of histone deacetylase (HDAC) activity such as isothiocyanates, butyrate, and diallyl disulfide [31,32]. miRNA levels have also been observed to change following dietary modulation, with miRNA expression in human muscle increasing after a dietary challenge of essential amino acids [33]. The most widely studied lifestyle influence on epigenetic patterns is smoking. In head and neck squamous cell carcinoma (HNSCC), smoking has been associated with global DNA hypomethylation [34] as well as gene-specific hypermethylation [35] in tumor tissue. Animal models suggest that epigenetic changes arise in lung tissue following short-term exposure to tobacco smoke condensate [36] and precede histopathological changes. Exposure to tobacco smoke is also believed to alter expression of DNA methyltransferase (DNMT) enzymes [37,38] and to modulate histone modifications, including acetylation and methylation [39]. In addition, miRNAs have been proposed as modulators of smoking-induced changes in gene expression in human airway epithelium [40], and studies in rodent models have demonstrated that chemopreventive agents can protect lung tissue from smoke exposure-induced changes in miRNA expression [41]. Maternal cigarette smoking during pregnancy influences DNA methylation patterns in offspring [42,43], pointing to a vulnerability of the epigenome to environmental exposures during the intrauterine period. Animal studies have shown that chronic alcohol consumption is associated with reduced genomic DNA methylation in the colon [44], although evidence from human studies is equivocal. Alcohol-induced shifts in DNA methylation patterns could arise through perturbation of one-carbon metabolism and interference with methyl group donation (reviewed in [45]). The molecular actions of ethanol are also thought to involve site-specific changes to histone modifications, as exemplified by a recent study of alcohol exposure during adolescence [46]. Epigenetic processes could also influence patterns of alcohol drinking, with emerging evidence suggesting that alcohol-sensitive miRNAs control the development of tolerance and subsequent alcohol addiction [47]. These alcohol-related miRNA responses may in turn reflect alcohol-induced changes in DNA methylation [48]. Exposure to air pollutants such as particulate matter and airborne benzene has been associated with changes in DNA methylation in genes involved in inflammation and carcinogenesis [49,50]. Endocrine disruptors (vinclozilin, bisphenol A) and various heavy metals (arsenic, mercury, cadmium) are among other environmental compounds that have been implicated in epigenetic changes, including altered histone methylation [21]. Most epigenetic studies of environmental toxins have focused on the potential of DNA methylation patterns as biological markers of exposure rather than on establishing epigenetic mechanisms as causally related to a specific disease. Studies have, however, suggested a role for miRNAs in mediating the effects of exposure to black carbon on disease [51].
Several infectious agents, including Helicobacter pylori [52] and Epstein-Barr virus [53], have been shown to induce epigenetic changes, either directly or secondary to inflammation. Epigenetic modulation is recognized as an aetiological component in chronic inflammatory diseases such as rheumatoid arthritis and multiple sclerosis [54]. Inflammation also plays an important role in a wide range of diseases such as cancers, obesity, and atopic disorders, and epigenetic changes may be causal in disease pathogenesis [54]. There is increasing evidence that epigenetic mechanisms contribute to the transcriptional regulation of inflammatory responses [55].

Figure 1. Epigenetic modifications. Chromosomes are composed of chromatin, consisting of DNA wrapped around units of eight histone proteins. Each DNA-bound histone octamer is a nucleosome. Histone tails protruding from histone proteins are decorated with modifications, including phosphorylation (Ph), methylation (Me), and acetylation (Ac). DNA molecules are methylated by the addition of a methyl group to carbon position 5 of cytosine bases that are immediately followed (5′ to 3′) by a guanine base (CpG sites), a reaction catalyzed by DNA methyltransferase enzymes. DNA methylation helps maintain genes in a repressed state. Transcription, the copying of DNA into messenger RNA (mRNA), is generally repressed by DNA methylation and histone deacetylation. mRNA is translated into a protein product, but this process can be repressed by binding of microRNA (miRNA) to mRNA. Each miRNA can bind the mRNAs of up to 200 target genes. miRNAs can also be involved in establishing DNA methylation and may influence chromatin structure by regulating histone modifiers. doi:10.1371/journal.pmed.1000356.g001
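The CpG definition in this legend (a cytosine immediately followed by a guanine) is easy to make concrete. The Python sketch below locates candidate CpG sites in a sequence string and, for a single site, computes a methylation level as methylated / (methylated + unmethylated) read counts. That read-count summary is a common convention assumed here for illustration, not a method described in the article, and both function names are hypothetical.

```python
def cpg_sites(sequence: str) -> list[int]:
    """Return 0-based positions of the C in each CpG dinucleotide on this strand.

    Because "CG" is its own reverse complement, every site reported here is
    also a CpG on the opposite strand.
    """
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def methylation_level(methylated_reads: int, unmethylated_reads: int) -> float:
    """Fraction of reads in which a given CpG cytosine is methylated (0.0 to 1.0)."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("no reads cover this CpG site")
    return methylated_reads / total


print(cpg_sites("ACGTTCGA"))     # [1, 5]
print(methylation_level(3, 1))   # 0.75
```

A level near 0 or 1 means the site is almost uniformly unmethylated or methylated across the sampled cells; intermediate values can reflect a mixture of cell types, which is one reason the tissue-specificity caveats discussed above matter when such values come from blood or buccal DNA.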

Perhaps the most widely celebrated example of the influence of environmental conditions (other than diet) on the epigenome relates to maternal postnatal nurturing and epigenetically mediated alterations to the hypothalamic-pituitary-adrenal response to stress [56]. Variations in maternal signals alter gene expression and complex behavioral phenotypes in rodent offspring through a well-defined mechanism involving the epigenetic regulation of the glucocorticoid receptor gene within the target tissue. A further example of modulation of epigenetic patterns in a target tissue is increased histone acetylation in human muscle biopsy tissue following exercise [57], providing evidence that chromatin remodeling might be important in mediating longer-term responses to exercise. miRNA involvement in exercise-induced changes to gene expression has also been reported [58].

Genetic Influences on Epigenetic Patterns

Twin- and family-based studies have demonstrated that variation in epigenetic patterns, including both chromatin states [59] and DNA methylation [25,60,61], is heritable. Much inter-individual variation in epigenetic patterns can be explained by common genetic variation [62], with a recent study estimating that 6.5% of the variance in methylation at the IGF2 (insulin-like growth factor 2) locus could be explained by five single nucleotide polymorphisms (SNPs) [63]. A genome-wide association study treating DNA methylation in human brain tissue as a quantitative trait identified both cis and trans genetic effects on DNA methylation at cytosine-guanine dinucleotide (CpG) sites, the predominant influences being in cis, defined as SNPs influencing methylation at CpG sites within 1 Mb of themselves [64]. Similar cis effects have been reported in whole blood DNA [25].

October 2010 | Volume 7 | Issue 10 | e1000356
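The cis/trans distinction used in [64] is a simple distance rule. As a hedged illustration (not code from the cited study), the Python sketch below labels a SNP-CpG pair as cis when both lie on the same chromosome within 1 Mb of each other, and as trans otherwise. The `Locus` type, the function name and the coordinates are hypothetical, and treating exactly 1 Mb as cis is an assumption.

```python
from dataclasses import dataclass

# Window quoted in the text for cis effects [64]; treating the boundary as
# inclusive is an assumption of this sketch.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Locus:
    chrom: str  # chromosome name, e.g. "chr1" (coordinates below are hypothetical)
    pos: int    # base-pair position


def classify_pair(snp: Locus, cpg: Locus, window: int = CIS_WINDOW_BP) -> str:
    """Return 'cis' if the SNP is on the CpG site's chromosome and within
    `window` bp of it; otherwise return 'trans'."""
    if snp.chrom == cpg.chrom and abs(snp.pos - cpg.pos) <= window:
        return "cis"
    return "trans"


if __name__ == "__main__":
    cpg = Locus("chr1", 5_000_000)
    for snp in (Locus("chr1", 4_400_000), Locus("chr1", 7_000_000), Locus("chr2", 5_000_000)):
        print(snp, classify_pair(snp, cpg))
```

Real analyses would also need a consistent genome build for both coordinate sets before distances are compared.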

Figure 2. Defining the causal relationship between epigenetic patterns and phenotype. Analysis of the respective relationships between DNA methylation (CpG), body mass index (BMI), and cardiovascular disease (CVD) can help to inform the direction of causality. Observed associations between BMI and CpG methylation, and between CpG methylation and CVD, cannot by themselves distinguish which of the depicted scenarios applies. doi:10.1371/journal.pmed.1000356.g002

Greater knowledge of the genetic determinants of DNA methylation, histone modifications, and miRNA activity will transform our understanding of the mechanisms involved in the establishment and maintenance of epigenetic patterns, with such genetic influences undoubtedly contributing to observed inter-individual differences in gene expression [65]. Despite the relatively large body of evidence that disease-related environmental exposures are associated with epigenetic alterations, there are still few compelling data supporting a link between epigenetic variation and common complex disease phenotypes (other than cancer). Investigation of parent-of-origin effects on risk of common complex disease has suggested a role for perturbed DNA methylation [66]. Adequately powered studies relating epigenetic profiles to both exposure and disease are in their infancy, but it is highly likely that a myriad of such associations will be identified, and the major issue will be identifying meaningful and useful associations within this tsunami of data. Epigenetic measures are phenotypic, not genotypic, and as with phenotypic measures in general, non-causal associations will be the rule rather than the exception [67]. As with conventional epidemiological investigations, separating causal from non-causal associations will become an important task (Figure 2).

“Genetical Epigenomics”: Identifying Causal Relationships between Exposure, Epigenetic Patterns, and Disease

Using germ-line genetic variation as a proxy for environmental exposures provides a route to strengthening causal inference within observational data [68–70]. The rationale is that genetic variants are not, in general, related to the socioeconomic, behavioral, and physiological factors that confound associations in conventional observational epidemiology [67], nor are they altered by disease processes, so they are not subject to reverse causation. The Mendelian randomization approach can be extended to interrogate epigenetic variation as a potential mediator of the influence of a modifiable exposure on disease outcomes, and thus as an appropriate target for disease prevention. Mendelian randomization methods can be applied to many categories of environmentally modifiable exposures to help define whether their relationship with phenotype is causal. For example, with respect to behavioral factors, the approach has been used in a proof-of-principle manner to demonstrate associations of alcohol intake with esophageal [71] and head and neck cancers [72], as well as to considerably strengthen evidence on the association of alcohol intake with blood pressure [73].

The method has particular promise when applied to circulating intermediate phenotypes, the manipulation of which can potentially prevent disease. Again, as proof-of-principle, an increasing number of genetic variants that are associated with low density lipoprotein-cholesterol (LDL-C) level are also associated with coronary artery disease (CAD) risk [67,74–76] (Figure 3). In a similar fashion, genetic variants related to body mass index and obesity have been shown to influence a wide variety of metabolic, cardiovascular, and bone-related traits, strengthening evidence on the causal influence of adiposity in these cases [77–80]. Conversely, genetic variants associated with C-reactive protein (CRP) level have not been found to predict insulin resistance [80] or coronary heart disease [81], casting doubt on the causal role of CRP with respect to these conditions.

Figure 3. Applying Mendelian randomization to define the causal relationship between phenotype and disease. An example based upon the report of Linsel-Nitschke et al. (2008) [74] of the association of a variant in the LDLR gene with decreased low density lipoprotein-cholesterol (LDL-C) levels and with a reduced risk of coronary artery disease (CAD). The variant can be used in a Mendelian randomization approach to test the causal relationship between LDL-C and CAD. If LDL-C has a causal role in CAD, an association between the LDLR gene variant and disease risk would be seen (red dashed arrow). If LDL-C levels are correlated with CAD risk but not causal, then the gene variant will not show an association with CAD risk. Because the genotype is not altered by disease and is largely unrelated to lifestyle, this helps establish whether reverse causation is at play and removes the potential confounding influence of factors such as smoking and nutritional status. doi:10.1371/journal.pmed.1000356.g003
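For a single genetic instrument, the test sketched in Figure 3 is commonly summarized by the Wald ratio estimator. The expressions below are the standard textbook form, not taken from the article or from [74]: $G$ is the genotype (here, the LDLR variant), $X$ the intermediate phenotype (LDL-C), and $Y$ the outcome (CAD); $\hat{\beta}_{G \to X}$ and $\hat{\beta}_{G \to Y}$ are the estimated gene-exposure and gene-outcome associations, and the standard error is the usual first-order approximation that ignores uncertainty in the denominator.

```latex
\hat{\beta}_{X \to Y} = \frac{\hat{\beta}_{G \to Y}}{\hat{\beta}_{G \to X}},
\qquad
\operatorname{se}\bigl(\hat{\beta}_{X \to Y}\bigr) \approx
\frac{\operatorname{se}\bigl(\hat{\beta}_{G \to Y}\bigr)}{\bigl|\hat{\beta}_{G \to X}\bigr|}
```

The estimate is meaningful only if the variant is associated with $X$, is independent of confounders, and affects $Y$ only through $X$. Under those assumptions, no causal effect of LDL-C on CAD implies $\beta_{G \to Y} = 0$, which is why the absence of a variant-disease association is informative. For a binary outcome such as CAD, $\hat{\beta}_{G \to Y}$ is typically a log odds ratio.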


In the field of gene expression studies, identifying causal processes within a multitude of associations is at least as problematic as in observational epidemiological studies. For example, a majority of gene expression signatures in adipose tissue, and a substantial proportion (up to 10%) of those in blood, have been found to be related to obesity [82]. Methods equivalent to the Mendelian randomization approach we propose here (sometimes called “genetical genomics” [83] in the context of gene expression studies) have been applied to separate causal transcription effects from those generated by reverse causation [82]. This is facilitated by strong cis effects on gene expression, which allow isolation of specific loci influencing transcript level. The identification of strong cis effects in a genome-wide association study of methylation patterns [64] provides encouragement that these methods can be extended to investigate the causal influences of epigenetic signatures in what could be called “genetical epigenomics”. As a hypothetical example of how this approach could be applied, we consider alcohol intake and head and neck squamous cell carcinoma (HNSCC). It is likely that alcohol intake would be associated with a wide range of epigenetic changes, although at least some (and probably many) of these associations could reflect confounding by the many other factors related to alcohol consumption. Similarly, HNSCC could be related to a multitude of epigenetic changes, which could arise through reverse causation (the disease influencing the epigenetic patterns) or confounding (factors associated with HNSCC risk influencing the epigenetic patterns). If the epigenetic processes are to be targeted as a component of disease prevention, they must be causally associated with HNSCC, and for them to mediate the effect of alcohol intake on HNSCC risk, they need to be responsive to changes in alcohol intake. Observational data demonstrating an association of alcohol intake with a particular epigenetic profile exist, but an observed association of this profile with HNSCC risk would not, of course, establish causality. As depicted in Figure 4, Mendelian randomization approaches could be applied to this scenario.
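To make the two-step logic of Figure 4 concrete, the Python sketch below simulates hypothetical data in which an unmeasured confounder (standing in for socioeconomic position) raises alcohol intake, CpG methylation and disease liability together. SNP `g1` plays the role of the ADH1B-like alcohol proxy and SNP `g2` the cis-acting methylation SNP, and the single-instrument Wald ratio is used as the estimator. All effect sizes, allele frequencies and function names are illustrative assumptions, and a continuous liability stands in for binary HNSCC status.

```python
import numpy as np


def wald_ratio(g, x, y):
    """Single-instrument IV estimate of the effect of x on y: cov(g, y) / cov(g, x)."""
    return np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]


def ols_slope(x, y):
    """Naive observational slope of y on x; biased when a confounder is ignored."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def simulate(n=200_000, seed=1):
    """Hypothetical data for the Figure 4 scenario; all coefficients are illustrative."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                 # unmeasured confounder (e.g. socioeconomic position)
    g1 = rng.binomial(2, 0.3, size=n)      # alcohol-proxy SNP (ADH1B-like), 0/1/2 allele count
    g2 = rng.binomial(2, 0.4, size=n)      # SNP with a cis effect on the CpG site
    alcohol = 0.5 * g1 + 1.0 * u + rng.normal(size=n)
    cpg = 0.3 * alcohol + 0.5 * g2 + 0.8 * u + rng.normal(size=n)    # true alcohol -> CpG = 0.3
    # true CpG -> risk = 0.4; the 0.2 * alcohol term is a direct path (dashed line in Figure 4)
    risk = 0.4 * cpg + 0.2 * alcohol + 1.0 * u + rng.normal(size=n)
    return g1, g2, alcohol, cpg, risk


if __name__ == "__main__":
    g1, g2, alcohol, cpg, risk = simulate()
    print(f"alcohol -> CpG: observational {ols_slope(alcohol, cpg):.2f}, "
          f"IV via g1 {wald_ratio(g1, alcohol, cpg):.2f} (simulated 0.30)")
    print(f"CpG -> risk:    observational {ols_slope(cpg, risk):.2f}, "
          f"IV via g2 {wald_ratio(g2, cpg, risk):.2f} (simulated 0.40)")
```

In this simulation the naive regressions overstate both effects because of the shared confounder, whereas the instrument-based ratios recover the simulated values up to sampling error. Real analyses would additionally have to justify each instrument, for example by ruling out a pleiotropic path from `g2` to disease that bypasses methylation.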

Figure 4. Incorporating epigenetic information in a Mendelian randomization framework. (A) Alcohol exposure is associated with risk of head and neck squamous cell carcinoma (HNSCC), and this may be mediated by altered DNA methylation (CpG). The relationship between alcohol exposure and HNSCC is potentially confounded by factors such as socio-economic position, which correlate with both exposure and disease. A common variant in ADH1B can be used as an unconfounded, genetic proxy for alcohol exposure, and if this SNP is associated with CpG (either locally or more widely across the genome), it would lend support to the hypothesis that alcohol intake causally influences DNA methylation. However, showing associations of these epigenetic measures with HNSCC does not demonstrate causality of either alcohol or CpG on HNSCC, as either or both associations (alcohol→HNSCC and CpG→HNSCC) could be confounded, or alcohol could influence HNSCC through another pathway (dashed line). (B) To investigate this, another Mendelian randomization experiment could be undertaken using an SNP known to have a cis influence on locus-specific DNA methylation. If an association were observed between this SNP and both CpG and HNSCC, this would support a role for DNA methylation in the causation of HNSCC. doi:10.1371/journal.pmed.1000356.g004

Epigenomic Modifiers and the Prospects for Future Treatments

It can be argued that mitotically stable changes in gene expression are very likely to underlie the development of virtually all disease (in the same way as they are an essential component in the process of the development of an organism [84]), and as definitions of epigenetics incorporate such changes, they automatically fall within the field’s remit. Once epigenetic mechanisms, even if only contributory, are unequivocally implicated in disease pathogenesis, epigenomic-based therapies become a realistic possibility. A wide range of pharmacological agents that target the epigenome, including DNMT inhibitors and HDAC inhibitors, are used in clinical practice, largely as anti-cancer treatments [11]. However, these agents require further development to improve their specificity, given their pleiotropic effects, and evaluation of their efficacy in a non-cancer setting is essential. Combination therapies in which DNMT inhibitors or HDAC inhibitors are used with other agents are an active avenue of inquiry. miRNAs are also emerging as a promising technology in drug development, following an increasing understanding of their biogenesis and function. Evidence linking miRNA expression to common complex disease is growing, providing greater impetus to pursue miRNAs as tools for the targeted modulation of gene regulation. As with other epigenetic signatures, their utility might also lie in disease diagnosis and prognosis [85].

Five Key Papers in the Field

Weaver IC, Cervoni N, Champagne FA, D’Alessio AC, Sharma S, et al. (2004) Epigenetic programming by maternal behaviour. Nat Neurosci 7: 847–854. This landmark paper demonstrated that the epigenomic state of a gene can be altered through behavioural programming and that this environmentally induced modification is potentially reversible.

Fraga MF, Ballestar E, Paz MF, Ropero S, Setien F, et al. (2005) Epigenetic differences arise during the lifetime of monozygotic twins. Proc Natl Acad Sci U S A 102: 10604–10609. This article describes how epigenetic patterns in monozygotic twins become more discordant with advancing age. This epigenetic drift is postulated to be driven by differences in environmental exposures.

Bjornsson HT, Sigurdsson MI, Fallin MD, Irizarry RA, Aspelund T, et al. (2008) Intra-individual change over time in DNA methylation with familial clustering. JAMA 299: 2877–2883. This study showed methylation changes of greater than 10% over time, with individuals within families showing both gains and losses of methylation, and familial clustering of this change indicative of a genetic basis.

Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462: 315–322. This paper reports the first genome-wide, single-base-resolution map of methylated cytosines in a mammalian genome, from human embryonic stem cells and fetal fibroblasts, showing widespread differences between the two cell types.

Zhang D, Cheng L, Badner JA, Chen C, Chen Q, Luo W, et al. (2010) Genetic control of individual differences in gene-specific methylation in human brain. Am J Hum Genet 86: 411–419. This study demonstrated that DNA methylation is a heritable trait, determined in part by common genetic variation. The vast majority of genetically determined variation was observed to act in cis (SNPs within 1 Mb of the CpG site), with only a handful of SNPs influencing methylation in trans (distant regulatory effects).

Conclusion

Through examining the role of environmental factors in causing variation in epigenetic patterns (exposure/epigenotype) and ultimately exploring the causal impact of epigenotype on disease outcomes (epigenotype/disease) using genetical epigenomics and other methods, progress towards epigenetic interventions can be made. As genome-wide association studies and other approaches identify robust associations between genetic variants and epigenetic patterns, the possibilities for elucidating causal pathways and predicting the effect of manipulation, through environmental (including lifestyle) modification or pharmacotherapeutic means, are considerable. In this way, epigenetic markers may become targets for modification as well as biomarkers for exposure and disease risk. The International Human Epigenome Consortium is poised to invest millions of dollars to map 1,000 reference epigenomes in a range of normal tissues and define the level of variation that exists between individuals [86]. The field of epigenetics in relation to common complex disease will undoubtedly continue to be the focus of much attention, and its progress, now that it has passed the starting line, will be followed with considerable interest.

Acknowledgments

The authors would like to thank Prof Debbie A Lawlor and Dr Nick Embleton for their helpful comments on the manuscript.

Author Contributions

ICMJE criteria for authorship read and met: CLR GDS. Agree with the manuscript’s results and conclusions: CLR GDS. Contributed to the writing of the paper: CLR GDS.

References

1. Feero WG, Guttmacher AE, Collins FS (2010) Genomic medicine–An updated primer. New Engl J Med 362: 2001–2011.
2. Bird A (2007) Perceptions of epigenetics. Nature 447: 396–398.
3. Vaissiere T, Sawan C, Herceg Z (2008) Epigenetic interplay between histone modifications and DNA methylation in gene silencing. Mutat Res 659: 40–48.
4. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462: 315–322.
5. Byun HM, Siegmund KD, Pan F, Weisenberger DJ, Kanel G, et al. (2009) Epigenetic profiling of somatic tissues from human autopsy specimens identifies tissue- and individual-specific DNA methylation patterns. Hum Mol Genet 18: 4808–4817.
6. Aguilera O, Fernandez AF, Munoz A, Fraga MF (2010) Epigenetics and environment: a complex relationship. J Appl Physiol, Apr 8 [Epub ahead of print].
7. Meaney MJ (2010) Epigenetics and the biological definition of gene x environment interactions. Child Dev 81: 41–79.
8. Nicholls RD (2000) The impact of genomic imprinting for neurobehavioural and developmental disorders. J Clin Invest 105: 413–418.
9. Sharma S, Kelly TK, Jones PA (2010) Epigenetics in cancer. Carcinogenesis 31: 27–36.


10. Laird PW (2003) The power and the promise of DNA methylation markers. Nat Rev Cancer 3: 253–266.
11. Piekarz RL, Bates SE (2009) Epigenetic modifiers: Basic understanding and clinical development. Clin Cancer Res 15: 3918–3926.
12. Beck S, Rakyan VK (2008) The methylome: approaches for global DNA methylation profiling. Trends Genet 24: 231–237.
13. Jenuwein T, Allis CD (2001) Translating the histone code. Science 293: 1074–1080.
14. Feinberg AP (2009) Genome-scale approaches to the epigenetics of common human disease. Virchows Arch 456: 13–21.
15. Campion J, Milagro FI, Martinez JA (2009) Individuality and epigenetics in obesity. Obes Rev 10: 383–392.
16. Tollefsbol TO (2004) Methods of epigenetic analysis. In: Tollefsbol TO, editor. Epigenetics protocols. Secaucus (New Jersey): Springer Science & Business Media. pp 1–8.
17. Thompson RF, Atzmon G, Gheorghe C, Liang HQ, Lowes C, et al. (2010) Tissue specific dysregulation of DNA methylation in aging. Aging Cell, May 22 [Epub ahead of print].
18. Talens RP, Boomsma DI, Tobi EW, Kremer D, Jukema JW, et al. (2010) Variation, patterns and temporal stability of DNA methylation: considerations for epigenetic epidemiology. FASEB J 24: 3135–3144.

19. Kim JK, Samaranayake M, Pradhan S (2009) Epigenetic mechanisms in mammals. Cell Mol Life Sci 66: 596–612.
20. Reik W, Dean W, Walter J (2001) Epigenetic reprogramming in mammalian development. Science 293: 1089–1093.
21. Bollati V, Baccarelli A (2010) Environmental epigenetics. Heredity 105: 105–112.
22. Heijmans BT, Tobi EW, Stein AD, Putter H, Blauw GJ, et al. (2008) Persistent epigenetic differences associated with prenatal exposure to famine in humans. Proc Natl Acad Sci U S A 105: 17046–17049.
23. Tobi EW, Lumey LH, Talens RP, Kremer D, Putter H, et al. (2009) DNA methylation differences after exposure to prenatal famine are common and timing- and sex-specific. Hum Mol Genet 18: 4046–4053.
24. Haig D (2007) Weismann rules! OK? Epigenetics and the Lamarckian temptation. Biol Philos 22: 415–428.
25. Boks MP, Derks EM, Weisenberger DJ, Strengman E, Janson E, et al. (2009) The relationship of DNA methylation with age, gender and genotype in twins and healthy controls. PLoS ONE 4: e6767. doi:10.1371/journal.pone.0006767.
26. Calvanese V, Lara E, Kahn A, Fraga MF (2009) The role of epigenetics in ageing and age-related diseases. Ageing Res Rev 8: 268–276.


27. Ferguson LR (2009) Epigenetic variation and customising nutritional intervention. Curr Pharmacogenomics Person Med 7: 115–124.
28. Kim KC, Friso S, Choi SW (2009) DNA methylation, an epigenetic mechanism connecting folate to healthy embryonic development and aging. J Nutr Biochem 20: 917–926.
29. Waterland RA (2006) Assessing the effects of high methionine intake on DNA methylation. J Nutr 136 (6 Suppl): 1706S–1710S.
30. Widiker S, Karst S, Wagener A, Brockmann GA (2010) High fat diet leads to a decreased methylation of the Mc4r gene in the obese BFMI and the lean B6 mouse lines. J Appl Genet 51: 193–197.
31. Delage B, Dashwood RH (2008) Dietary manipulation of histone structure and function. Annu Rev Nutr 28: 347–366.
32. Myzak MC, Dashwood RH (2006) Histone deacetylases as targets for dietary cancer preventive agents: lessons learned with butyrate, diallyl disulfide and sulforaphane. Curr Drug Targets 7: 443–452.
33. Drummond MJ, Glynn EL, Fry CS, Dhanani S, Volpi E, et al. (2009) Essential amino acids increase miRNA-499, -208b and -23 in human skeletal muscle. J Nutr 139: 2279–2284.
34. Hsiung DT, Marsit CJ, Houseman EA, Eddy K, Furniss CS, et al. (2007) Global DNA methylation level in whole blood as a biomarker in head and neck squamous cell carcinoma. Cancer Epidemiol Biomarkers Prev 16: 108–114.
35. Kaur J, Demokan S, Tripathi SC, Macha MA, Begum S, et al. (2010) Promoter hypermethylation in Indian primary oral squamous cell carcinoma. Int J Cancer. E-pub ahead of print. 5 April 2010. doi:10.1002/ijc.25377.
36. Philips JM, Goodman JI (2009) Inhalation of cigarette smoke induces regions of altered DNA methylation (RAMs) in SENCAR mouse lung. Toxicology 260: 7–15.
37. Launay JM, Del Pino M, Chironi G, Callebert J, Peoc’h K, et al. (2009) Smoking induces long-lasting effects through a monoamine-oxidase epigenetic regulation. PLoS ONE 4: e7959. doi:10.1371/journal.pone.0007959.
38. Liu H, Zhou Y, Boggs SE, Belinsky SA, Liu J (2007) Cigarette smoke induces demethylation of prometastatic oncogene synuclein-gamma in lung cancer cells by downregulation of DNMT3B. Oncogene 26: 5900–5910.
39. Hussain M, Rao M, Humphries AE, Hong JA, Liu F, et al. (2009) Tobacco smoke induces polycomb-mediated repression of Dickkopf-1 in lung cancer cells. Cancer Res 69: 3570–3578.
40. Schembri F, Sridhar S, Perdomo C, Gustafson AM, Zhang X, et al. (2009) MicroRNAs as modulators of smoking-induced gene expression changes in human airway epithelium. Proc Natl Acad Sci U S A 106: 2319–2324.
41. Izzotti A, Larghero P, Cartiglia C, Longobardi M, Pfeffer U, et al. (2010) Modulation of microRNA expression by budesonide, phenethyl isothiocyanate and cigarette smoke in mouse liver and lung. Carcinogenesis 31: 894–901.
42. Guerrero-Preston R, Goldman LR, Brebi-Mieville P, Ili-Ganga C, Lebron C, et al. (2010) Global hypomethylation is associated with in utero exposure to cotinine and perfluorinated alkyl compounds. Epigenetics, Aug 14 [Epub ahead of print].
43. Breton CV, Byun HM, Wenten M, Pan F, Yang A, et al. (2009) Prenatal tobacco smoke exposure affects global and gene-specific DNA methylation. Am J Respir Crit Care Med 180: 462–467.
44. Sauer J, Jang H, Zimmerly EM, Kim KC, et al. (2010) Ageing, chronic alcohol consumption and folate are determinants of genomic DNA methylation, p16 promoter methylation and the expression of p16 in the mouse colon. Br J Nutr 104: 24–30.


45. Seitz HK, Stickel F (2007) Molecular mechanisms of alcohol-mediated carcinogenesis. Nat Rev Cancer 7: 599–612.
46. Pascual M, Boix J, Felipo V, Guerri C (2009) Repeated alcohol administration during adolescence causes changes in the mesolimbic dopaminergic and glutamatergic systems and promotes alcohol intake in the adult rat. J Neurochem 108: 920–931.
47. Miranda RC, Pietrzykowski AZ, Tang Y, Sathyan P, Mayfield D, et al. (2010) MicroRNAs: master regulators of ethanol abuse and toxicity? Alcohol Clin Exp Res 34: 575–587.
48. Tarantini L, Bonzini M, Apostoli P, Pegoraro V, Bollati V, et al. (2009) Effects of particulate matter on genomic DNA methylation content and iNOS promoter methylation. Environ Health Perspect 117: 217–222.
49. Bollati V, Baccarelli A, Hou L, Bonzini M, Fustinoni S, et al. (2007) Changes in DNA methylation patterns in subjects exposed to low-dose benzene. Cancer Res 67: 876–880.
50. Baccarelli A, Wright RO, Bollati V, Tarantini L, Litonjua AA, et al. (2009) Rapid DNA methylation changes after exposure to traffic particles. Am J Respir Crit Care Med 179: 572–578.
51. Wilker EH, Baccarelli A, Suh H, Vokonas P, Wright RO, et al. (2010) Black carbon exposures, blood pressure and interactions with single nucleotide polymorphisms in microRNA processing genes. Environ Health Perspect 118: 943–948.
52. Shin CM, Kim N, Jung Y, Park JH, Kang GH, et al. (2010) The role of Helicobacter pylori infection in aberrant DNA methylation along multistep gastric carcinogenesis. Cancer Sci. E-pub ahead of print. 18 February 2010. doi:10.1111/j.1349-7006.2010.01535.x.
53. Tsai CN, Tsai CL, Tse KP, Chang HY, Chang YS (2002) The Epstein-Barr virus oncogene product, latent membrane protein 1, induces the downregulation of E-cadherin gene expression via activation of DNA methyltransferases. Proc Natl Acad Sci U S A 99: 10084–10089.
54. Backdahl L, Bushell A, Beck S (2009) Inflammatory signalling as mediator of epigenetic modulation in tissue-specific chronic inflammation. Int J Biochem Cell Biol 41: 176–184.
55. Medzhitov R, Horng T (2009) Transcriptional control of the inflammatory response. Nat Rev Immunol 9: 692–703.
56. Weaver IC, Cervoni N, Champagne FA, D’Alessio AC, Sharma S, et al. (2004) Epigenetic programming by maternal behaviour. Nat Neurosci 7: 847–854.
57. McGee SL, Fairlee E, Graham AP, Hargreaves M (2009) Exercise-induced histone modifications in human skeletal muscle. J Physiol 587: 5951–5981.
58. Radom-Aizik S, Zaldivar FP Jr, Oliver SR, Galassetti PR, Cooper DM (2010) Evidence for microRNA involvement in exercise-associated neutrophil gene expression changes. J Appl Physiol 109: 252–261.
59. Kadota M, Yang HH, Hu N, Wang C, Hu Y, et al. (2007) Allele-specific chromatin immunoprecipitation studies show genetic influence on chromatin state in human genome. PLoS Genet 3: e81. doi:10.1371/journal.pgen.0030081.
60. Wong CC, Caspi A, Williams B, Craig IW, Houts R, et al. (2010) A longitudinal study of epigenetic variation in twins. Epigenetics 5: 516–526.
61. Bjornsson HT, Sigurdsson MI, Fallin MD, Irizarry RA, Aspelund T, et al. (2008) Intra-individual change over time in DNA methylation with familial clustering. JAMA 299: 2877–2883.
62. French HJ, Attenborough R, Hardy K, Shannon F, Williams RBH (2009) Interindividual variation in epigenomic phenomena in humans. Mamm Genome 20: 604–611.
63. Heijmans BT, Kremer D, Tobi EW, Boomsma DI, Slagboom PE (2007) Heritable rather than age-related and stochastic factors dominate variation in DNA methylation of the human IGF2/H19 locus. Hum Mol Genet 16: 547–554.
64. Zhang D, Cheng L, Badner JA, Chen C, Chen Q, et al. (2010) Genetic control of individual differences in gene-specific methylation in human brain. Am J Hum Genet 86: 411–419.
65. Dimas AS, Dermitzakis ET (2009) Genetic variation of regulatory systems. Curr Opin Genet Dev 19: 586–590.
66. Kong A, Steinthorsdottir V, Masson G, Thorleifsson G, Sulem P, et al. (2009) Parental origin of sequence variants associated with complex diseases. Nature 462: 868–874.
67. Davey Smith G, Lawlor DA, Harbord R, Timpson N, Day I, et al. (2007) Clustered environments and randomized genes: a fundamental distinction between conventional and genetic epidemiology. PLoS Med 4: e352. doi:10.1371/journal.pmed.0040352.
68. Davey Smith G, Ebrahim S (2003) ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 32: 1–22.
69. Davey Smith G (2010) Mendelian randomization for strengthening causal inference in observational studies: applications to gene by environment interaction. Perspect Psychol Sci. In press.
70. Sheehan NA, Didelez V, Burton PR, Tobin MD (2008) Mendelian randomization and causal inference in observational epidemiology. PLoS Med 5: e177. doi:10.1371/journal.pmed.0050177.
71. Lewis SJ, Davey Smith G (2005) Alcohol, ALDH2 and esophageal cancer: a meta-analysis which illustrates the potentials and limitations of a Mendelian randomization approach. Cancer Epidemiol Biomarkers Prev 14: 1967–1971.
72. Boccia S, Hashibe M, Galli P, De Feo E, Asakage T, et al. (2009) Aldehyde dehydrogenase 2 and head and neck cancer: a meta-analysis implementing a Mendelian randomization approach. Cancer Epidemiol Biomarkers Prev 18: 248–254.
73. Chen L, Davey Smith G, Harbord R, Lewis S (2008) Alcohol intake and blood pressure: a systematic review implementing a Mendelian randomization approach. PLoS Med 5: e52. doi:10.1371/journal.pmed.0050052.
74. Linsel-Nitschke P, Götz A, Erdmann J, Braenne I, Braund P, et al. (2008) Lifelong reduction of LDL-cholesterol related to a common variant in the LDL-receptor gene decreases the risk of coronary artery disease–a Mendelian randomization study. PLoS ONE 3: e2986. doi:10.1371/journal.pone.0002986.
75. Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466: 707–713.
76. Shuldiner AR, Pollin TI (2010) Variation in blood lipids. Nature 466: 703–704.
77. Freathy RM, Timpson NJ, Lawlor DA, Pouta A, Ben-Shlomo Y, et al. (2008) Common variation in the FTO gene alters diabetes-related metabolic traits to the extent expected, given its effect on BMI. Diabetes 57: 1419–1426.
78. Timpson N, Harbord R, Davey Smith G, Zacho J, Tybjaerg-Hansen A, et al. (2009) Does greater adiposity increase blood pressure and hypertension risk? Mendelian randomization using Fto/Mc4r genotype. Hypertension 54: 84–90.
79. Timpson NJ, Sayers A, Davey Smith G, Tobias JH (2009) How does body fat influence bone mass in childhood? A Mendelian randomisation approach. J Bone Miner Res 24: 522–533.
80. Timpson NJ, Lawlor DA, Harbord RM, Gaunt TR, Day INM, et al. (2005) C-reactive protein and its role in metabolic syndrome: Mendelian randomisation study. Lancet 366: 1954–1959.
81. Zacho J, Tybjaerg-Hansen A, Jensen JS, Grande P, Sillesen H, et al. (2008) Genetically elevated C-reactive protein and ischaemic vascular disease. New Engl J Med 359: 1897–1908.


82. Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, et al. (2008) Genetics of gene expression and its effect on disease. Nature 452: 423–428.
83. Li H, Lu L, Manly KF, Chesler EJ, Bao L, et al. (2005) Inferring gene transcriptional modulatory relations: a genetical genomics approach. Hum Mol Genet 14: 1119–1125.
84. Gilbert SF, Epel D (2009) Ecological developmental biology: Integrating epigenetics, medicine and evolution. MA, USA: Sinauer Associates Inc.

85. Liu Z, Sall A, Yang D (2008) MicroRNA: an emerging therapeutic target and intervention tool. Int J Mol Sci 9: 978–999.
86. Abbott A (2010) Project set to map marks on genome. Nature 463: 596–597.


Spatial Epigenetic Control of Mono- and Bistable Gene Expression

János Z. Kelemen, Prasuna Ratna, Simone Scherrer, Attila Becskei*

Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland

Abstract

Bistability in signaling networks is frequently employed to promote stochastic switch-like transitions between cellular differentiation states. Differentiation can also be triggered by antagonism of activators and repressors mediated by epigenetic processes that constitute regulatory circuits anchored to the chromosome. Their regulatory logic has remained unclear. A reaction–diffusion model reveals that the same reaction mechanism can support both graded monostable and switch-like bistable gene expression, depending on whether recruited repressor proteins generate a single silencing gradient or two interacting gradients that flank a gene. Our experiments confirm that chromosomal recruitment of activator and repressor proteins permits a plastic form of control; the stability of gene expression is determined by the spatial distribution of silencing nucleation sites along the chromosome. The unveiled regulatory principles will help us understand the mechanisms of variegated gene expression, design synthetic genetic networks that combine transcriptional regulatory motifs with chromatin-based epigenetic effects, and control cellular differentiation.

Citation: Kelemen JZ, Ratna P, Scherrer S, Becskei A (2010) Spatial Epigenetic Control of Mono- and Bistable Gene Expression. PLoS Biol 8(3): e1000332. doi:10.1371/journal.pbio.1000332
Academic Editor: Andre Levchenko, Johns Hopkins University, United States of America
Received October 14, 2009; Accepted February 9, 2010; Published March 16, 2010
Copyright: © 2010 Kelemen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work is supported by the Swiss National Foundation and by the UZH-URPP. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
Abbreviations: GA, gene activation
* E-mail: attila.becskei@imls.uzh.ch

Introduction

Graded and switch-like responses reflect fundamental aspects of the functioning of regulatory networks. A graded, monostable response enables the faithful propagation of a signal; it is often the default response of simple pathways, but regulatory mechanisms can improve the linearity and the dynamic range of the graded response [1,2]. Conversely, when the signal strength reaches a threshold value, a switch-like response often manifests as ON and OFF states within a cell population. This binary response can be induced by positive feedback loops capable of generating bistability, but many other mechanisms can support it by rendering the underlying processes more nonlinear and stochastic [3–9]. Positive feedback loops in transcriptional or protein kinase networks have been increasingly recognized as a driving force of cellular differentiation [10,11]. The components of these networks are dissolved in the cytoplasm or nucleoplasm, and typically have a spatially homogeneous distribution.

In contrast, inhomogeneously distributed regulatory components are frequently observed in eukaryotic transcriptional regulation. Binding of eukaryotic transcription factors (activators and repressors) to the DNA can lead to recruitment of enzymes and structural proteins with opposing functions, which induce structural changes and covalent modifications of chromatin, exemplified by acetylation and methylation [12,13]. This leads to a spatially inhomogeneous distribution of regulators along the DNA, constituting the epigenetic code. Activators loosen the chromatin structure. Conversely, the compaction of chromatin and heterochromatin formation are typically induced by repressors or repressor-recruiting DNA sequences that act or interact over long distances, variously termed long-range repressors, silencing proteins, and silencers in different systems and organisms [14–17]. Genes exposed to the antagonism of activators and repressors, or lying in silenced chromosomal regions, have frequently been observed to display a binary response [13,18–21]. Although the regulatory principles underlying the graded and binary responses generated by networks with spatially homogeneously distributed components have been increasingly elucidated, the quantitative aspects of the behavior of epigenetic circuits anchored to the chromosome have remained unclear.

We examined whether the spatial distribution of activator and repressor binding sites influences whether gene expression is monostable or bistable, focusing on long-range interactions between these sites. Because long intervening DNA sequences can receive signals from endogenous cellular pathways, we used heterologous synthetic gene expression systems to preclude pleiotropic cellular effects. Synthetic networks have been instrumental in reconstituting nonmonotonic responses and in revealing the basic principles of binary response and bistability in transcriptional regulatory networks based on feedback or competition of activators and repressors [19,22–26]. We identified a concise nonlinear reaction–diffusion equation that explains gene expression of a large number of genetic constructs with different configurations. We found that a binary response is not inherent to repressor proteins exhibiting synergy over long distances: both graded and binary responses can arise, depending on the spatial distribution of the repressors’ binding sites along the DNA.

PLoS Biology | www.plosbiology.org

March 2010 | Volume 8 | Issue 3 | e1000332

Chromosomal Epigenetic Regulatory Circuits

Author Summary
In the simplest scenario, a gene is expressed when an activator protein binds to its regulatory sequence and is silenced when the regulatory sequence is bound by a repressor. Many genes are regulated by both activators and repressors, with the response determined by the combined influence of both factors. When the response is monostable and graded, expression is finely tuned to a level that reflects the ratio of bound activator to bound repressor. Monostable graded systems allow cells to respond precisely to stimuli. If the response is bistable, the outcome in each cell depends on whether the activator or the repressor wins. Bistable regulation results in the same gene being expressed in some cells and silenced in others, an outcome that promotes cellular differentiation. It remains unclear, however, how different genetic regulatory structures encode monostable graded and bistable responses. We mathematically modeled the behavior of repressors as they bind to genes and spread their inhibitory effect along them, and found that the spatial distribution of the binding sites determines which response is chosen. If repressors bind both upstream and downstream of the coding sequence, the response is bistable. If they bind on only one side of the coding sequence, the response is monostable. We confirmed our theoretical findings using synthetic genetic constructs in yeast. These findings help explain how variations in the location of regulatory elements can lead to cellular differentiation and adaptation to varying environments.

Results
Bistable Synergistic Interaction of Silencing Gradients
Silencing is efficiently induced when multiple silencers interact [14]. To mimic this architecture, we inserted binding sites for the silencing protein Sir3p (in the form of a fusion protein) both upstream and downstream of a reporter gene construct in the model organism Saccharomyces cerevisiae. When recruited to these dual recruitment constructs, Sir3p evoked variegated GFP expression at intermediate levels of gene activation (GA), with a bimodal distribution of cellular fluorescence (Figure 1A and 1B). When GA was increased, all of the cells switched from the OFF to the ON expression state, so that the ON state was affected only by residual repression (Figure 1A). Thus, a small change in the input generated a large change in the output. The ON and OFF cell populations represent a simple form of cellular differentiation. To understand the principles of this form of differentiation, we built a mathematical model based on realistic molecular processes. Because these processes are complex and incompletely described, we sought to identify the key mechanisms that can account for bistability in the dual recruitment constructs. The change in the concentration of the silencing protein at a given point in space and time, c(x, t), is governed by a source term s(x), a reaction term r(c), and a nonlinear diffusion term (Figure 1C, Table S1, and Text S1):

$$\frac{\partial c}{\partial t} = r(c) + s(x) + \frac{\partial}{\partial x}\left(D(x,c)\,\frac{\partial c}{\partial x}\right) \qquad (1)$$

The nucleation term, s(x), represents the recruitment of the silencing proteins and is a rectangular function. Its width, $s_w$, is proportional to the number of tet operators, while the height of the rectangle, $s_h$, is proportional to the amount of silencing protein recruited to the operators. Thus, $s_h$ is a function of the doxycycline concentration. Constant nucleation of silencing proteins is necessary for the establishment of steady-state concentration profiles of silencing proteins around the nucleation sites (Figure S1). Silencing proteins and their cofactors spread along the chromosome, where nonspecific protein–DNA interactions can facilitate their sliding, a process described by one-dimensional diffusion [18,27–30]. The diffusivity, D(x, c), is itself a variable, because the silencing proteins, in particular Sir3p, can bridge neighboring DNA segments and condense the chromatin in a concentration-dependent manner, leading to heterochromatin formation [27]. Consequently, the superimposed concentration gradient becomes steeper, accelerating the flux of silencing proteins. Thus, D(x, c) was approximated by $D_A c$, so that the diffusion term was expressed as

$$\frac{\partial}{\partial x}\left(D_A\, c\, \frac{\partial c}{\partial x}\right).$$

This non-Fickian diffusion term arises in models that describe diffusional clustering or condensation of particles [31,32]. The reaction term represents an autocatalytic loop based on processes encompassing the cooperative binding of Sir3p and Sir4p, mutual binding of Sir2p, Sir3p, and Sir4p, deacetylation of chromatin by Sir2p creating higher-affinity sites for Sir3p and Sir4p, and polymerization of Sir3p proteins [18,27,33–36]:

$$r(c) = L\,\frac{c^n}{K + c^n} - k_d\, c + b$$

It is assumed that the autocatalytic association of the silencing proteins is superimposed onto a basal, nonspecific association occurring at a rate b. The former is represented by a Hill function, where L stands for the maximal association rate in the limit $c \to \infty$. The dissociation of the silencing proteins is a linear process and occurs at a rate $k_d$. Initial conditions with uniformly distributed low and high concentrations were used to reflect biochemical fluctuations in the initial accumulation of the silencing proteins (Figure 1D). Simulation of the reaction–diffusion model (Equation 1) revealed that when two silencing nucleation sites were positioned in sufficient proximity, the two initial conditions gave rise to two distinct solutions representing two concentration profiles (Figures 1D and S2). The low-concentration profile was composed of two isolated gradients around the silencing nucleation sites. The high-concentration profile represented a synergistic interaction of the two nucleation sites (Figure 1E).

Stability Diagram of Gene Expression as a Function of Transcriptional Activation
The coexistence of two concentration profiles for the same parameter values is in accord with the co-occurrence of ON and OFF cells at intermediate GA (Figures 1A, 1E, and S3). For a more detailed analysis of bistability, gene expression has to be calculated from the concentration profiles. Gene expression is determined jointly by transcriptional activation and silencing. Quantitatively, gene expression is defined as the product of GA and the fold inhibition due to silencing (see also Materials and Methods). Transcriptional activators not only induce gene expression but also reduce the spreading of silencing proteins, because activators recruit enzymes that relax the structure of chromatin, diminishing the slope of the superimposed concentration gradient [37]. Furthermore, the recruited histone acetyltransferases decrease the number of available high-affinity binding sites for the silencing proteins [18,33]. Therefore, the diffusivity was made a decreasing function of GA, $D_A = D_0 K_{GA}/(K_{GA} + GA)$. Fold inhibition was equated with the concentration of silencing proteins at the gene regulatory region,


Figure 1. Reaction–diffusion model of bistable repression. (A) In the dual recruitment construct, tetR-Sir3p, denoted as R, binds to the [tetO]2 and [tetO]4 operators upstream and downstream of the reporter gene, respectively, in the absence of doxycycline. Repression is relieved after addition of d = 2 µM doxycycline, which dissociates tetR from the tet operators. Gene expression is activated by the estradiol (e)-inducible GEV, denoted as A. The fluorescence value represents the mean of the fitted Gaussian distribution of cell fluorescence. The area of the circle reflects the proportions of ON and OFF cells when the distribution was bimodal. (B) Merged fluorescence and DIC images of cells expressing [tetO]2-GFPT-YFP-[tetO]4 regulated by tetR-Sir3p. Cells were induced by e = 11 nM in the absence of doxycycline. (C) The steps involved in the reaction–diffusion model (from top to bottom): nucleation, autocatalytic recruitment, and nonlinear diffusion. The S-shaped distortion of the DNA symbolizes the aggregation of the silencing proteins. (D) Evolution of the simulated concentration distributions of silencing proteins along a DNA segment nucleated at two sites. The top and bottom panels show the convergence of the profiles to the steady states representing the low and high silencing states, respectively. The corresponding initial conditions were c(x, 0) = 2 and 4. The following parameters were used for Equation 1: L = 5, K = 7, n = 2, $k_d$ = 1, b = 0.01, $D_A$ = 1, $s_h$ = 4, and $s_w$ = 0.057. The internucleation distance was 1.2 kb. (E) The low (gray continuous line) and high (red dashed line) concentration profiles represent the long-term solution (200 time units after initiation) of the model as specified in (D), reflecting the steady state. The blue lines denote the nucleation sites. (F) When silencing is nucleated at a single site, the two solutions, calculated as in (E), overlap, indicating that the solution is monostable (gray-red dashed line).
doi:10.1371/journal.pbio.1000332.g001
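The reaction–diffusion model (Equation 1, with $D(x,c) = D_A c$ and the Hill-type r(c)) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' solver: the legend above supplies the kinetic parameters, but the 4-kb segment, the grid spacing, the time step, the zero-flux boundaries, and the semi-implicit scheme (explicit reaction and source, implicit diffusion with the diffusivity frozen at the previous step) are our own choices. The nucleation sites sit at −0.6 and 0.6 kb, matching the 1.2-kb internucleation distance. Whether the two starting conditions settle into distinct profiles under these numerical choices should be checked by running the code, not assumed.

```python
# Parameter values quoted in the Figure 1 legend (positions in kb, model time units).
PARAMS = {"L": 5.0, "K": 7.0, "n": 2, "kd": 1.0, "b": 0.01,
          "DA": 1.0, "sh": 4.0, "sw": 0.057}


def reaction(c, L, K, n, kd, b):
    """r(c) = L*c^n/(K + c^n) - kd*c + b: Hill-type autocatalysis, linear loss, basal binding."""
    return L * c**n / (K + c**n) - kd * c + b


def source(x, centers, sh, sw):
    """Rectangular nucleation term s(x): height sh over a window of width sw at each center."""
    return sh if any(abs(x - x0) <= sw / 2 for x0 in centers) else 0.0


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system; lower[0] and upper[-1] are ignored."""
    m = len(diag)
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, m):
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom if i < m - 1 else 0.0
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / denom
    out = [0.0] * m
    out[-1] = dp[-1]
    for i in range(m - 2, -1, -1):
        out[i] = dp[i] - cp[i] * out[i + 1]
    return out


def simulate(c0, centers=(-0.6, 0.6), x_min=-2.0, x_max=2.0, dx=0.02,
             dt=0.05, t_end=200.0, p=PARAMS):
    """Integrate dc/dt = r(c) + s(x) + d/dx(DA*c*dc/dx) from the uniform start c(x, 0) = c0.

    Assumed scheme: reaction and source explicit, diffusion implicit with the
    diffusivity DA*c frozen at the previous step, zero-flux boundaries at both ends.
    """
    n_pts = int(round((x_max - x_min) / dx)) + 1
    xs = [x_min + i * dx for i in range(n_pts)]
    s = [source(x, centers, p["sh"], p["sw"]) for x in xs]
    c = [float(c0)] * n_pts
    lam = dt / dx**2
    for _ in range(int(round(t_end / dt))):
        # Diffusivity on the faces between grid points: D_{i+1/2} = DA*(c_i + c_{i+1})/2.
        face = [p["DA"] * 0.5 * (c[i] + c[i + 1]) for i in range(n_pts - 1)]
        lower, diag, upper, rhs = [], [], [], []
        for i in range(n_pts):
            d_left = face[i - 1] if i > 0 else 0.0        # no flux through the left end
            d_right = face[i] if i < n_pts - 1 else 0.0   # no flux through the right end
            lower.append(-lam * d_left)
            upper.append(-lam * d_right)
            diag.append(1.0 + lam * (d_left + d_right))
            r = reaction(c[i], p["L"], p["K"], p["n"], p["kd"], p["b"])
            rhs.append(c[i] + dt * (r + s[i]))
        c = solve_tridiagonal(lower, diag, upper, rhs)
    return xs, c
```

For example, `simulate(2.0)` and `simulate(4.0)` integrate 200 time units from the two initial conditions of panel (D), and `simulate(2.0, centers=(0.6,))` gives a single-site setting comparable to panel (F). Because it is pure Python, the default run may take a few seconds; a vectorized or stiff solver would be the natural upgrade.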


assuming a linear relation between them. Since repression from the upstream and downstream sites interacts multiplicatively [38],

$$\text{fold inhibition} - 1 = \big(c(x_u) + 1\big)\big(c(x_d) + 1\big) - 1 \qquad (2)$$

where $x_u$ and $x_d$ correspond to the positions −0.38 kb and 0 kb, respectively (Figure 2A). When $D_A$ was high due to weak GA, simulations initiated with both conditions converged to the synergistically interacting

Figure 2. Prediction of gene expression based on the concentration profiles of silencing proteins. The values of the parameters are given in Table S1, unless otherwise indicated. (A) Inhibition of gene expression, expressed as fold inhibition − 1, was calculated from the values of the silencing concentration gradient at the positions $x_u = -0.38$ kb and $x_d = 0$ kb (yellow dots), which span the transcriptional regulatory region of the gene (Equation 2). The upstream point, $x_u$, corresponds approximately to the region of the activator binding sites, while the downstream point, $x_d$, corresponds to the transcriptional initiation site. These points were chosen as plausible sites of action of silencing proteins. The silencing nucleation sites are positioned at −0.6 and 0.6 kb in the dual nucleation setting. (B) The upward and downward arrows represent the solutions initiated with low ($c(x,0) \le 2$) and high ($c(x,0) \ge 4$) starting concentrations for the [O]2-Gene-[O]4 setting. When the solutions converge, the two arrows merge into an arrow with two arrowheads (monostable region). Double arrows represent weighted mean values of the two solutions, reflecting the population average in the bistable region. The red and blue arrows represent solutions with $D_A = D_0 K_{GA}/(K_{GA} + GA)$ and $D_A = D_0 K_{GA}/[1.36\,(K_{GA} + GA)]$, respectively. The reduced diffusivity for the blue arrows reflects the effect of the transcriptional activators bound to the downstream sites that do not contribute to GA. (C) GA is the ratio of expression at the applied estradiol concentration to that at maximal induction (200 nM estradiol), in the absence of repression (d = 2 µM). Fold inhibition − 1 at the applied estradiol concentration reflects the change in gene expression when the repressor binds to the recruitment site (see Materials and Methods). Fold inhibition − 1 was measured for the [tetO]2-GFP-[tetO]4 (red symbols) and the [tetO]2-GFP-GALUAS-[tetO]4 (blue symbols) constructs when the fluorescence distributions were unimodal (o) or bimodal (‘). The insertion of the GALUAS did not increase the maximal expression of the construct relative to the control constructs (unpublished data). (D) Calculations performed for the Gene-[O]4 setting as in (B). (E) Fold inhibition − 1 was measured for the GFP-[tetO]4 (red symbols) and the GFP-GALUAS-[tetO]4 (blue symbols) constructs, as in (B). doi:10.1371/journal.pbio.1000332.g002
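Equation 2 and the GA-dependent diffusivity used in panel (B) translate directly into code. The helpers below are a hypothetical sketch, not code from the study. They read a sampled concentration profile at $x_u = -0.38$ kb and $x_d = 0$ kb by linear interpolation (our assumption; the profile could equally be evaluated on grid points) and combine the two readings multiplicatively. $D_0$ and $K_{GA}$ are left as inputs because their values are in Table S1, not in this section. Likewise, the fraction of cells used to weight the two solutions in the bistable region is an input, since this legend does not say how it was chosen.

```python
import bisect


def diffusivity(ga, d0, k_ga, reduction=1.0):
    """D_A = D0*K_GA / (reduction*(K_GA + GA)); reduction=1.36 mimics the blue-arrow case."""
    return d0 * k_ga / (reduction * (k_ga + ga))


def value_at(xs, cs, x):
    """Linearly interpolate a sampled profile cs(xs) at position x (xs ascending)."""
    j = bisect.bisect_left(xs, x)
    if j == 0:
        return cs[0]          # left of the sampled range: clamp to the first value
    if j == len(xs):
        return cs[-1]         # right of the sampled range: clamp to the last value
    w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return (1.0 - w) * cs[j - 1] + w * cs[j]


def fold_inhibition_minus_one(xs, cs, x_u=-0.38, x_d=0.0):
    """Equation 2: repression read at x_u and x_d combines multiplicatively."""
    return (value_at(xs, cs, x_u) + 1.0) * (value_at(xs, cs, x_d) + 1.0) - 1.0


def population_average(low_value, high_value, fraction_high):
    """Weighted mean of the two steady states in the bistable region."""
    return (1.0 - fraction_high) * low_value + fraction_high * high_value
```

Any profile sampled on ascending positions `xs` can be passed as `xs, cs`; the interpolation simply clamps to the end values outside the sampled segment.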


concentration profiles. Correspondingly, gene expression was strongly inhibited. In contrast, the inhibition was weak when GA was strong (Figure 2B). At intermediate activation, the strongly and weakly inhibited states co-occurred. In summary, increasing GA is accompanied by a transition from the monostable OFF state to the monostable ON state through a bistable region, creating a characteristic bifurcation diagram (Figure 2B); here, mono- and bistable refer to the number of steady states. The bifurcation diagram was in accordance with the transitions observed for the silenced gene when GA was varied experimentally, recapitulating a classical binary response (Figures 1A and 2C). The model can be tested further by inserting additional activator binding sites between the two silencing nucleation sites in such a way that they do not contribute to gene expression (Figure 2C). In this case, the model predicted that the bifurcation diagram would not change qualitatively; only the stability regions would shift toward lower GA levels, since the diffusion of silencing proteins is further diminished (Figure 2B). We tested this prediction by inserting activator binding sites between the terminator of the reporter gene and the tet operators, where they do not activate gene expression (Figure 2C). Indeed, bimodal expression was observed over a lower range of GA (Figures 2C and S4). In the above model, the reduction of $D_A$ between the silencing nucleation sites was spatially uniform. We compared this simple model with a more complex one, in which the reduction of $D_A$ was more pronounced near the activator binding sites. The solutions of the two models were in qualitative agreement (Figure S5).

Lateral Amplification of Silencing Gradients
Whereas the predicted concentration gradient is strongly amplified between the two nucleation sites, a moderate amplification was also predicted outside the internucleation segment (Figure 1E). To test this lateral amplification, we compared the inhibition of gene expression when Sir3p was recruited downstream of GFP either to a single site or to two sites separated by a 1-kb-long transcription unit expressing Cherry (Figure 3A). Indeed, inhibition was about three times stronger for the dual recruitment construct than for the single recruitment construct (Figure 3B), suggesting that the model adequately describes the shape of the gradient. The lateral amplification is predicted to be stronger when $D_A$ is high (compare Figures 1E, 1F, and S5). The detection of lateral amplification in the convergent transcription constructs (Figure 3A) may have been facilitated by the two terminators separating the GFP and Cherry genes, because silencing, and possibly the spreading of silencing proteins, can be enhanced by transcriptional terminators [38,39].

Critical Nucleation Lengths Are Required for Synergistic Bistable Response
Bistable systems can undergo bifurcations with respect to multiple parameters. Therefore, we explored the stability of predicted gene expression as a function of the width of the nucleation sites. The above simulations represented systems with two operators upstream and four operators downstream of the reporter gene (Figure 2B). When the width of the downstream nucleation site was halved, the bistable response persisted: the synergistic monostable, the bistable, and the low monostable concentration profiles alternated as gene expression increased (Figure 4A). Indeed, experiments with the [tetO]2-GFP-[tetO]2 construct revealed bimodal gene expression at intermediate GA and strong average repression (Figure 4C). When the width of both nucleation segments was halved relative to the previous setting, bistability collapsed, and only the low-concentration profiles were seen over the entire range of GA (Figure 4B). In the corresponding experiments, the number of tet

Figure 3. Lateral amplification of silencing gradients. (A) The lateral amplification of silencing gradients can be read out with constructs in which GFP expression is repressed either by a single downstream cluster of recruitment sites, [tetO]2, or by two downstream clusters of recruitment sites separated by a transcription unit, [tetO]2-Cherry-[tetO]2. (B and C) Fold inhibition − 1 was measured for GFP expression for the GFP-[tetO]2 and the GFP-[tetO]2-Cherry-[tetO]2 constructs. The ratio of the inhibition strengths (see Materials and Methods) of the dual recruitment constructs to that of the single recruitment constructs was 3.2 ± 0.31 and 1.77 ± 0.31 for Sir3p (B) and Sum1p (C), respectively. doi:10.1371/journal.pbio.1000332.g003


Figure 4. Stability of gene expression and inhibition strength as a function of the number and distribution of nucleation sites. (A and B) Concentration profiles calculated for the [O]2-Gene-[O]2 (A) and [O]1-Gene-[O]1 (B) settings. The red dashed and gray continuous lines represent the solutions initiated with the two initial conditions. The two solutions overlap when GA is either weak or strong (thin and thick red-gray dashed lines). At intermediate GA, two distinct solutions (medium red dashed and gray dashed lines) evidence bistability for [O]2-Gene-[O]2. (C) Inhibition strength at single (upstream or downstream) and dual recruitment constructs. The inhibition strength is the average value of fold inhibition − 1 in the [0.06, 0.6] interval of GA. The total number of tet operators is indicated for each dual recruitment construct: [tetO]1-GFP-[tetO]1 (n = 2), [tetO]1-GFP-[tetO]2 (n = 3), [tetO]2-GFP-[tetO]2 (n = 4), and [tetO]2-GFP-[tetO]4 (n = 6). Empty symbols stand for constructs that display bimodal gene expression. doi:10.1371/journal.pbio.1000332.g004
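The inhibition strength defined in panel (C), the average of fold inhibition − 1 over GA values between 0.06 and 0.6, and the dual-to-single ratios reported for Figure 3 amount to simple arithmetic. The helpers below are a hypothetical sketch that takes a plain unweighted mean of the measured points inside the interval. The Materials and Methods may interpolate or weight the data differently, and the numbers in the usage line are invented for illustration, not measurements.

```python
def inhibition_strength(ga_values, fi_minus_one, ga_lo=0.06, ga_hi=0.6):
    """Average of (fold inhibition - 1) over measurements with ga_lo <= GA <= ga_hi."""
    inside = [f for g, f in zip(ga_values, fi_minus_one) if ga_lo <= g <= ga_hi]
    if not inside:
        raise ValueError("no measurement falls inside the GA interval")
    return sum(inside) / len(inside)


def strength_ratio(dual, single):
    """Ratio of the inhibition strengths of a dual and a single recruitment construct.

    Each argument is a (ga_values, fi_minus_one) pair of equal-length sequences.
    """
    return inhibition_strength(*dual) / inhibition_strength(*single)
```

For instance, `strength_ratio(([0.1, 0.3], [6.0, 3.0]), ([0.1, 0.3], [2.0, 1.0]))` returns 3.0 for these made-up values; points with GA outside [0.06, 0.6] are simply ignored.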

operators was reduced. The resulting [tetO]1-GFP-[tetO]1 construct displayed weak silencing and monostable gene expression (Figure 4C), confirming that synergistic interaction of gradients occurs only when the nucleation widths reach a certain threshold.

The Bistable Response Is Conserved for Repressors Exhibiting Long-Range Synergy
A model of a biological dynamical system can be corroborated by replacing a network component with a functionally similar component. For this purpose, we tested the Sum1p repressor, which binds to the E silencer of the HML heterochromatic locus and contributes to gene silencing [40]. Its cofactor, Hst1p, is a homolog of the silencing protein Sir2p [41]. When Sum1p was recruited to tet operators as a tetR-Sum1p fusion protein, it inhibited expression of GFP regardless of whether the tet operators were positioned upstream or downstream of the reporter gene (Figure 5A). When bound to both of these sites, Sum1p inhibited gene expression strongly and synergistically (Figure 5A). Synergistic interaction over long distances is a phenomenon typical of silencers and repressors acting at heterochromatic loci [14,42]. At intermediate GA, expression of GFP was bimodal (Figure 5C), similar to the observations with Sir3p. The bimodal expression was observed up to 8 h after induction of gene expression (Figure S6). We also examined a well-characterized mutant form of Sum1p, Sum1-1p. This variant was identified by its ability to efficiently substitute for Sir-dependent silencing, and it can induce pronounced heterochromatin formation [43,44]. Indeed, Sum1-1p displayed stronger synergy than Sum1p (Figure S7), and bimodal expression was observed even up to 16 h after induction (Figure S6). We examined whether Sir3p and Sum1p interacted synergistically with the native HML I silencer. The Sir proteins are recruited to both the E and I silencers, which flank the heterochromatic HML genes, whereas Sum1p is recruited to the E silencer only [40]. The I silencer alone did not have an inhibitory effect on gene expression (Figure S8) [42]. When the reporter gene was flanked by an upstream I silencer and by downstream tet operators, both tetR-Sir3p and tetR-Sum1p induced bimodal gene expression at intermediate GA (Figures 5D, 5E, and S9). When the reporter gene was lengthened in the dual recruitment constructs, the synergistic and bistable inhibition of gene expression by Sum1p was abolished (Figure S10). This confirms that, in addition to the critical nucleation strength, the two nucleation sites have to be within a critical distance to generate synergistic interaction of the silencing gradients (Figure S5). In summary, we observed similar responses for four different combinations of silencers and repressor proteins (Figures 2B, 3, and 5), suggesting that they follow the same regulatory principle


Figure 5. Repression by Sum1p displays long-range synergy and evokes bimodal gene expression in the dual recruitment constructs. The symbols in the fold inhibition plots correspond to those used in Figure 2. (A) tetR-Sum1p was recruited to [tetO]2-GFP, GFP-[tetO]4, and [tetO]2-GFP-[tetO]4 constructs. The gray dashed line represents the calculated multiplicative interaction of repression from upstream and downstream sites. Fold inhibition − 1 at GA = 0.2 was 4.8 times higher for the dual recruitment construct than the multiplicative prediction, confirming strong synergy. (B) Calculations performed for the [O]2-Gene, Gene-[O]4, and [O]2-Gene-[O]4 settings as in Figure 2B. (C) tetR-Sum1p is recruited to the dual recruitment construct [tetO]2-GFP-[tetO]4. The fluorescence value represents the mean of the fitted Gaussian distribution of cell fluorescence. The area of the circle reflects the proportions of ON and OFF cells when the distribution was bimodal. (D) The inhibition strength at the I silencer-GFP-[tetO]4 constructs was 5.91 ± 0.91 and 7.34 ± 2.37 times higher for tetR-Sum1p and tetR-Sir3p, respectively, than that at the parent GFP-[tetO]4 constructs. (E) Cellular fluorescence distributions due to expression of the I silencer-GFP-[tetO]4 construct repressed by tetR-Sum1p. Dots are experimental data obtained after adaptive binning, while the lines are fits using two Gaussian distributions. The cells were induced by 1.5, 5.8, 8, 11, and 200 nM estradiol (black, blue, green, orange, and red, respectively), d = 0. AU, arbitrary units. doi:10.1371/journal.pbio.1000332.g005
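The gray dashed line in panel (A) and the 4.8-fold synergy follow from applying the multiplicative rule of Equation 2 to the single recruitment constructs. A minimal sketch with hypothetical function names, in which each argument is a measured fold inhibition − 1 taken at the same GA:

```python
def multiplicative_prediction(fi1_upstream, fi1_downstream):
    """Fold inhibition - 1 expected if upstream and downstream repression simply multiply."""
    return (fi1_upstream + 1.0) * (fi1_downstream + 1.0) - 1.0


def synergy_factor(fi1_dual, fi1_upstream, fi1_downstream):
    """How many times the dual construct exceeds the multiplicative prediction."""
    return fi1_dual / multiplicative_prediction(fi1_upstream, fi1_downstream)
```

A synergy factor of 1 means the dual construct behaves exactly as the multiplicative prediction; values above 1, such as the 4.8 reported at GA = 0.2, indicate synergy.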

that associates the synergistic interaction of repressors over large distances with bistability (Figure 5B).

Synergistic Repressors Generate Monostable Graded Response When Their Binding Sites Are Clustered in a Single Chromosomal Segment
Surprisingly, when the silencing proteins were nucleated at a single segment, only one solution emerged with the same parameter values that generated bistability in the dual nucleation setting (Figures 1F and S2). The gradient generated by the single nucleation site was identical to the nonsynergistic solution of the dual nucleation setting (Figure 1E and 1F). Even when the single nucleation segment was broadened, the concentration profiles rose but remained monostable over the entire range of GA (Figure 6A–6C). Indeed, expression was monostable and responded in a graded way to the binding of Sir3p to upstream regions of promoters containing up to seven operators (Figures 4C and 6D). A monostable graded response was also observed over the entire range of GA when tetR-Sum1p and tetR-Sir3p bound to four sites downstream of reporter genes (Figures 2E and 5A). The insertion of activator binding sites between the terminator of the reporter gene and the downstream operators alleviated the inhibition of gene expression (Figure 2D and 2E), as for the dual recruitment constructs (Figure 2B and 2C). None of the above single recruitment constructs, with operators clustered in a single chromosomal segment, displayed bimodal gene expression. However, they all inhibited gene expression less than the dual recruitment constructs displaying synergistic inhibition (Figure 4C). Thus, we hypothesized that bistability was not observed because the inhibition strength did not reach a critical value. In other words, the possibility cannot be excluded


Figure 6. A single cluster of silencing nucleation sites generates a graded, monostable response. (A) The concentration profiles were calculated with GA set to 0.022 and silencing nucleated at −0.6 kb. The nucleation site comprised one, two, four, or seven operators. The blue lines denote the width of the [O]7 nucleation site. The gray continuous and red dashed lines represent the simulated solutions initiated with low and high concentrations, c(x, 0), respectively. When they overlap, the system is monostable (red-gray dashed lines). (B) The concentration profiles were calculated for [O]7 as in (A), but GA was varied. (C) Gene expression was calculated from (B) by setting the maximal value of unrepressed gene expression to 1 (see Materials and Methods); the black, blue, and red lines correspond to GA values of 0.01, 0.15, and 0.43, respectively. A lognormal distribution was assigned to each calculated mean value. (D) Cellular fluorescence distributions due to the expression of [tetO]7-GFP repressed by tetR-Sir3p (YSSD227.4). The cells were induced by 2.9, 5.7, 11, 22, 32, and 200 nM estradiol in the absence of doxycycline. doi:10.1371/journal.pbio.1000332.g006
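Panel (C) assigns a lognormal distribution to each calculated mean expression level. For a lognormal with log-scale spread $\sigma$, the arithmetic mean is $e^{\mu + \sigma^2/2}$, so matching a given mean $m$ requires $\mu = \ln m - \sigma^2/2$. The sketch below uses Python's standard `random.lognormvariate`; the value of $\sigma$ is not given in this section, so it is a free, hypothetical parameter here, and the function names are illustrative.

```python
import math
import random


def lognormal_mu(mean, sigma):
    """Location mu of a lognormal whose arithmetic mean equals `mean` for log-scale sigma."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return math.log(mean) - 0.5 * sigma**2


def sample_cells(mean, sigma, n_cells, seed=0):
    """Draw single-cell expression values scattered around a calculated mean expression."""
    rng = random.Random(seed)
    mu = lognormal_mu(mean, sigma)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_cells)]
```

For example, `sample_cells(0.15, 0.3, 10000)` draws 10,000 cells around a calculated mean of 0.15 with an arbitrarily chosen σ = 0.3; a histogram of the result plays the role of one curve in panel (C).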

that if silencing nucleated at a single cluster inhibited gene expression strongly enough, the response would be binary. Therefore, we searched for single recruitment constructs with strong inhibitory potential. Fortuitously, when the tet operators were inserted between the activator binding sites and the TATA box, strong inhibition of expression by both Sum1p and Sir3p was observed. In particular, Sum1p inhibited gene expression more strongly when bound to these intercalated operators than when it repressed gene expression synergistically in the dual recruitment constructs (Figure 7A). However, gene expression responded in a graded way over a broad range of activator and repressor binding when Sum1p or Sir3p bound to the intercalated operators (Figure 7B, 7C, and S11). In contrast, the dual recruitment constructs displayed bimodal gene expression when the binding of the activator and repressor was balanced (Figure 7C). The region of bistability was broader for Sir3p than for Sum1p (Figure 7C), in accordance with the stronger synergistic repression and lateral amplification of the gradient by Sir3p (Figure 3B and 3C) [38]. Thus, our experiments confirmed the predictions of the reaction–diffusion model, revealing that the same mechanism can support both graded and binary gene expression, depending on the spatial distribution of silencing nucleation sites. Monostable graded expression was characteristic of single nucleation constructs, whereas binary expression was found when two nucleation sites flanked a gene. The OFF and ON cells reflect the effect of the synergistically interacting and isolated silencing gradients, respectively (Figures 1E, 1F, and 4A). Thus, when GA is strong, the ON cells are inhibited to a degree comparable to the repression of single nucleation constructs (Figure 5A–5C), whereas the OFF cells are inhibited synergistically. Further exploration of the model revealed a high degree of plasticity of system behavior depending on the parameter values. In particular, the dual nucleation setting generated a graded response when the cooperativity of binding of silencing proteins was reduced (Figure S12). Furthermore, the single nucleation setting displayed bistability when the ratio of the diffusivity to the nucleation width was reduced. In the latter case, however, the silencing proteins did not propagate over long distances because of the low diffusivity, and consequently they may have little or no impact on gene expression (Figure S13). It remains to be determined whether there are epigenetic silencing processes that assume such parameter values and display behaviors reproducing the above predictions.
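Deciding whether a fluorescence distribution is unimodal or bimodal, and reading off the ON and OFF fractions, rests on fitting two Gaussians to the cell population. The figures fit Gaussians to adaptively binned histograms; the sketch below instead uses expectation-maximization on individual cell values (for example, log-transformed fluorescence, an assumption), a swapped-in but standard technique. It returns the mixing weights, which correspond to the ON/OFF proportions, together with the means and standard deviations. The criterion for calling a distribution bimodal is not stated in this section and is left to the user.

```python
import math


def _normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=500, tol=1e-8, min_sd=1e-3):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.

    Returns (weights, means, sds) sorted by mean: index 0 is the lower (OFF-like)
    component and index 1 the higher (ON-like) component.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    mu = [xs[n // 4], xs[(3 * n) // 4]]              # start from the quartiles
    sd = [max(min_sd, (xs[-1] - xs[0]) / 4.0)] * 2
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of the second component for every point.
        resp = []
        for x in xs:
            p0 = w[0] * _normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * _normal_pdf(x, mu[1], sd[1])
            resp.append(p1 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component is empty; keep the previous estimates
        # M-step: weighted means, standard deviations and mixing weights.
        new_mu = [sum((1.0 - r) * x for r, x in zip(resp, xs)) / n0,
                  sum(r * x for r, x in zip(resp, xs)) / n1]
        var0 = sum((1.0 - r) * (x - new_mu[0]) ** 2 for r, x in zip(resp, xs)) / n0
        var1 = sum(r * (x - new_mu[1]) ** 2 for r, x in zip(resp, xs)) / n1
        sd = [max(min_sd, math.sqrt(var0)), max(min_sd, math.sqrt(var1))]
        w = [n0 / n, n1 / n]
        converged = max(abs(a - b) for a, b in zip(new_mu, mu)) < tol
        mu = new_mu
        if converged:
            break
    order = sorted(range(2), key=lambda k: mu[k])
    return [w[k] for k in order], [mu[k] for k in order], [sd[k] for k in order]
```

Here `weights[1]` estimates the fraction of ON cells; a component with negligible weight, or two heavily overlapping components, points toward a unimodal distribution.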

Figure 7. Graded responses can be generated by both Sum1p and Sir3p even when they strongly repress gene expression. (A) PGAL1tetO2 corresponds to the GAL1 promoter in which the Mig1p binding sites, positioned between the GALUAS and the TATA box, were replaced by tet operators. Fold inhibition of gene expression of the respective constructs was obtained for unimodal (o) and bimodal (‘) distributions. (B) Gaussians were fit to fluorescence distributions induced by 3.75, 7.5, 15, 30, and 200 nM estradiol in the absence of doxycycline. AU, arbitrary units. (C) The means of the fitted Gaussians are color coded. When the distributions were bimodal, the squares were split into two triangles of different colors. The cells were induced by 3.75, 7.5, 15, 30, and 200 nM estradiol and 0, 10, 20, 40, 80, 160, and 2,000 nM doxycycline. doi:10.1371/journal.pbio.1000332.g007

Discussion
Eukaryotic transcriptional cis regulation governs developmental and differentiation programs [45]. Long-range interaction between transcription factors makes deciphering the logic of this regulation difficult [16,17,46]. Whereas long-range interactions can occur even in prokaryotes through looping of the intervening DNA sequences, the long-range effects of eukaryotic activators (enhancers) and repressors (silencers) are often mediated by cofactors that spread along the chromatin, modifying its composition and conformation. Therefore, understanding the logic of eukaryotic transcriptional cis regulation requires complex spatiotemporal models.

We have devised a concise reaction–diffusion model that captures the important molecular aspects of long-range synergistic repression: autocatalytic recruitment of proteins and their spreading along the DNA, accompanied by aggregation and condensation of chromatin. We presented a number of experimental tests that confirmed the model predictions. The central result of the model is that the response type depends on the distribution of silencing nucleation sites. When two clusters of nucleation sites flank a gene, the system is bistable. For the corresponding genetic constructs, stochastic gene expression with ON and OFF cells was observed. On the other hand, a monostable graded response was generated when silencing was nucleated at a single cluster, even if that cluster was relatively long. Both types of distributions of recruitment clusters for repressors and silencing proteins are found in the genome. An increasing number of promoters have been identified that are dynamically regulated by a single group of binding sites for long-range repressors, even within euchromatic regions [41,47,48]. In such cases, repressors that follow the regulatory mechanisms we identified are expected to generate monostable graded expression. On the other hand, the synergistic interaction of two or more silencers scattered through telomeric and subtelomeric regions is thought to be required for efficient heterochromatin formation in a broad range of contexts, from yeasts to the mammalian X chromosome [14]. The identification of such silencers is hampered by the fact that in isolation they lose their silencing capability or may even activate gene expression, so a large number of protosilencers may be hidden in the genome [14]. Genes flanked by two or more silencers are expected to display stochastic binary expression. Indeed, genes located in subtelomeric domains frequently display bimodal and stochastic gene expression in response to environmental stimuli [20,21,49]. For example, genes encoding cell adhesion proteins are located in subtelomeric domains and are expressed in a variegated way. This phenotypic diversity may enhance the survival and virulence of fungal cells [20,21]. Conversely, position-effect variegation, a phenomenon characterized by stochastic bimodal expression of a gene positioned within silenced chromosomal domains, can arise from chromosomal aberrations and lead to developmental abnormalities and diseases [50–52].
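The reaction–diffusion model summarized at the start of this Discussion can be explored with a small simulation. The sketch below integrates a one-dimensional reaction–diffusion equation with explicit Euler finite differences on a grid of nucleosome-sized cells (0.16 kb, as in Figure S2) and runs it from low and high starting concentrations, the test used in Figure S3 to detect bistability. The functional form is an assumption: the paper's Equation 1 is not reproduced in this section, so the Hill-type recruitment term, all parameter values, the site positions, and the helper names `nucleation_profile` and `simulate` are hypothetical and illustrative only.

```python
# Hypothetical 1D reaction-diffusion sketch of silencing-protein spreading.
# Assumed model (NOT the paper's Equation 1):
#   dc/dt = D * d2c/dx2 + s(x) + b + L * c^n / (K^n + c^n) - kd * c


def nucleation_profile(n_points, sites, strength):
    """Source term s(x): `strength` on each half-open index range in `sites`."""
    s = [0.0] * n_points
    for start, stop in sites:
        for i in range(start, stop):
            s[i] = strength
    return s


def simulate(sources, D=0.64, dx=0.16, dt=0.01, t_end=30.0,
             b=0.01, L=5.0, K=3.0, n=2.0, kd=1.0, c_init=0.0):
    """Integrate the assumed model with explicit Euler finite differences.

    dx = 0.16 kb mimics one nucleosome per grid cell; zero-flux boundaries
    are imposed by reflecting the edge value. Returns c(x) at t_end.
    """
    r = D * dt / dx ** 2
    # Explicit Euler is stable and order-preserving only if this holds.
    if 1.0 - 2.0 * r - dt * kd < 0.0:
        raise ValueError("time step too large for the explicit scheme")
    c = [c_init] * len(sources)
    last = len(c) - 1
    for _ in range(int(round(t_end / dt))):
        new = []
        for i, ci in enumerate(c):
            left = c[i - 1] if i > 0 else ci
            right = c[i + 1] if i < last else ci
            diffusion = D * (left - 2.0 * ci + right) / dx ** 2
            recruitment = L * ci ** n / (K ** n + ci ** n)  # autocatalytic term
            new.append(ci + dt * (diffusion + sources[i] + b
                                  + recruitment - kd * ci))
        c = new
    return c


if __name__ == "__main__":
    N = 40                      # 40 cells x 0.16 kb = 6.4 kb of chromatin
    GENE = 16                   # hypothetical promoter position between sites
    single = nucleation_profile(N, [(12, 14)], strength=6.0)
    dual = nucleation_profile(N, [(12, 14), (20, 22)], strength=6.0)
    for label, s in (("single", single), ("dual", dual)):
        low = simulate(s, c_init=0.0)
        high = simulate(s, c_init=10.0)
        print(f"{label}: low start {low[GENE]:.3f}, high start {high[GENE]:.3f}")
```

When the step-size check passes, each update is non-decreasing in every current concentration and in the source term, so the scheme preserves order: a run started high never falls below a run started low, and adding the second nucleation site can only raise the profile. Whether the low- and high-start runs converge at the gene depends on the assumed parameters, so the printed values are an exploration aid, not a reproduction of the paper's results.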
Interaction between multiple silencing gradients can also contribute to correlations in the stochastic fluctuations of expression of genes ordered along the chromosome [53,54]. Components or mechanisms employed in silencing are often conserved between yeast and higher organisms [33]. Long-range repression and heterochromatin formation can be efficiently reconstituted by tethering the appropriate proteins (or RNA) to the chromosome in different organisms [19,34,55,56]. Therefore, well-defined genetic systems comparable to ours can be employed to examine whether the regulatory logic we unveiled is evolutionarily conserved.

Our results highlight a difference between signal transducers dissolved in the cell protoplasm and regulatory circuits anchored to the chromosome. Dissolved kinases or transcription factors produce either a monostable or a bistable response in a single cell, depending on whether they are constitutively regulated or embedded in feedback loops (Figure 8A). In contrast, the same long-range repressor can evoke a monostable graded response at one gene but induce stochastic transitions between ON and OFF states at another gene (Figure 8B). The outcome is determined by the distribution and density of the recruitment sites of silencing proteins and activators. The dissolved cellular regulatory networks and the spatially inhomogeneous chromosomal epigenetic circuits will jointly determine gene expression and the stability of cellular differentiation states [54,57–59]. Knowing the regulatory principles of the latter will help to decipher their interaction and to understand how they shape cellular function.

Materials and Methods

Strain Construction and Growth Conditions

The expression of GFP from chromosomally integrated constructs was activated by GEV, an estradiol (e)-inducible transcriptional activator, when bound to the GALUAS, and was repressed by tetR fusion proteins (Tables S2 and S3). tetR dissociates from the tet operators in the presence of doxycycline (d); repression was relieved at d = 2 µM. Five copies of GEV were integrated into the MRP7 locus, resulting in strain YSSH208. The plasmids containing the tetR-Sir3p and tetR-Sum1p constructs were integrated into the RET2 locus. The GFP reporter constructs were integrated into the YFR054c locus unless otherwise specified. Cells containing inducible gene expression constructs were grown in minimal media for 4 h after induction, until they reached a cell density of OD600 = 0.4–0.8.

Figure 8. Control modes of dissolved and anchored regulatory circuits. (A) A regulator dissolved in the protoplasm under constitutive or autocatalytic control can trigger either a graded or binary response in a cell population. (B) A regulator anchored to the chromosome can trigger both graded and binary responses at different genes (black-green rectangles) within a single cell. doi:10.1371/journal.pbio.1000332.g008


Analysis of Mean Expression Values

Cellular fluorescence, $F_{e,d}$, of at least 5,000 cells was measured by flow cytometry. To compare cells of similar size, 5% to 15% of the total cell population was gated in the forward-scatter versus side-scatter plot before GFP fluorescence was measured. GA is the uninhibited expression at a given estradiol concentration normalized by the maximally induced uninhibited expression (e = 0.2 µM, d = 2 µM):

$$GA_{e,2} = \frac{F_{e,2} - F_C}{F_{0.2,2} - F_C}$$

where $F_C$ is the background fluorescence of the cells. Fold inhibition is the ratio of the unrepressed expression to the repressed expression (typically at d = 0) at a given degree of activation:

$$FI_{e,d} = \frac{F_{e,2} - F_C}{F_{e,d} - F_C}$$

Thus, normalized gene expression at given concentrations of estradiol and doxycycline is the product of GA and the inverse of fold inhibition, $GA_{e,2}\,FI_{e,d}^{-1} = (F_{e,d} - F_C)/(F_{0.2,2} - F_C)$. Typically, the OFF cells had fluorescence levels very close to the cellular background fluorescence, so values of fold inhibition − 1 calculated for the OFF cells after histogram fitting carry large measurement errors. For this reason, we calculated fold inhibition − 1 for the entire cell population, which has a higher fluorescence value. The inhibition strength is the average value of fold inhibition − 1 in the interval GA = [0.06, 0.6]. Error bars represent standard deviations calculated from three experiments.

Histogram Fitting and Bimodality Detection

The logarithmic cellular fluorescence intensities of more than 30,000 cells were extracted from list mode files. The data were subjected to an adaptive binning algorithm [60] to determine the number of bins, and hence a sampled function of the distribution. A mixture of two Gaussians (Equation 3) was then fitted to each discrete distribution using nonlinear regression:

$$m(x) = \frac{a}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} + \frac{b}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}} \tag{3}$$

Finally, the data were transformed from log space into linear space. To detect bimodality systematically, we performed the following procedure. The fluorescence distribution was first normalized to a mean of zero, $\mu_M = 0$, and a standard deviation of one, $\sigma_M = 1$, and then subjected to binning and regression as described above. We then considered three metapopulations. The first corresponded to the measured events (M), with $\mu_M = 0$ and $\sigma_M = 1$, since the distribution had been normalized; its population size was set to 100. The two remaining metapopulations, A and B, represented the two fitted Gaussian components (Equation 3), with the mean and variance parameters $(\mu_i, \sigma_i^2)$ obtained from the nonlinear regression, whereas the population sizes $n_a$ and $n_b$ resulted from normalizing the coefficients a and b to a sum of 100, $n_a + n_b = 100$. Thus, the sample sizes of M, A, and B were set empirically to correspond to percentages. Next, we compared the means of the metapopulations using a two-sample t-test with unequal variance. When the difference $\mu_M - \mu_1$ ($\mu_1 < \mu_2$) was significant ($\alpha = 10^{-4}$), the distribution m(x) was considered bimodal.

Supporting Information

Figure S1 Simulated evolution of the concentration of silencing proteins in the absence of persistent nucleation, sh = 0. An initial pulse was provided in the form c(x, 0) = 6 within the segment −0.6 < x < 0.6 kb; DA = 0.64. The initiated accumulation of silencing proteins dissipates after about 15 time units, indicating that a constant source of silencing proteins is needed to maintain the concentration profiles for the parameter values used in our simulations. Found at: doi:10.1371/journal.pbio.1000332.s001 (0.12 MB TIF)

Figure S2 Simulated concentration distribution of silencing proteins along a DNA segment with coarse spatial discretization. To account for the compartmental nature of chromatin, we used the finite difference method to simulate the model (Equation 1). For the Euler discretization of space and time, the space steps were sized according to the length of the nucleosome (0.16 kb); to ensure the numerical stability of the procedure, the time step was considerably smaller than the space step. The simulation was run for 200 time units, similar to the simulations employing the FEM. The concentration profiles are comparable to those in Figure 1E and 1F, using the same kinetic parameters except for D0 = 0.5 and sh = 4; sw had to be extended to 0.16 kb, because this is the minimal nucleation width with the coarse spatial discretization. The steady-state concentration profiles were drawn by extending the data points to lines (as in a zero-order hold) to better illustrate the coarseness of the spatial resolution. Found at: doi:10.1371/journal.pbio.1000332.s002 (0.19 MB TIF)

Figure S3 Parameter dependence of the switch-like transition. The surface represents the bistable region, which separates the ON and OFF expression states. L, K, and n were varied in the ranges [0.5, 10], [0.5, 10], and [1, 3], respectively, in steps of 0.5. The remaining parameters were kept at the values used for the dual nucleation model in Figure 1E. Two long-term solutions were calculated, using low and high initial conditions, to determine the occurrence of bistability. The surface was extrapolated from the points corresponding to parameter triplets (L, K, n) that give rise to bistability. Note that for n = 1 (lack of cooperativity), bistability did not occur. Found at: doi:10.1371/journal.pbio.1000332.s003 (0.24 MB TIF)
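The normalization and bimodality criteria defined under Materials and Methods (GA, fold inhibition, inhibition strength, and the unequal-variance t-test of the measured population against the lower fitted component) can be sketched as follows. This is a minimal sketch under stated assumptions: all function names are hypothetical, the inhibition strength is a plain average of measured points rather than an interpolation, and the two-sided p-value is computed from the regularized incomplete beta function (a standard identity for Student's t) so that no third-party statistics library is needed. In SciPy, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` performs the same Welch test.

```python
import math

# Hypothetical helpers sketching the Materials and Methods calculations.
# f_* are mean fluorescence values; f_bg plays the role of F_C.


def ga(f_unrep, f_max_unrep, f_bg):
    """GA_{e,2} = (F_{e,2} - F_C) / (F_{0.2,2} - F_C)."""
    return (f_unrep - f_bg) / (f_max_unrep - f_bg)


def fold_inhibition(f_unrep, f_rep, f_bg):
    """FI_{e,d} = (F_{e,2} - F_C) / (F_{e,d} - F_C)."""
    return (f_unrep - f_bg) / (f_rep - f_bg)


def inhibition_strength(points, lo=0.06, hi=0.6):
    """Mean of (FI - 1) over (GA, FI) pairs with lo <= GA <= hi.

    Simplification (assumption): averages measured points, no interpolation.
    """
    values = [fi - 1.0 for g, fi in points if lo <= g <= hi]
    return sum(values) / len(values)


def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd coefficients of the continued fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided Student-t p-value: I_{df/(df+t^2)}(df/2, 1/2)."""
    return betainc_reg(df / 2.0, 0.5, df / (df + t * t))


def welch_test(m1, s1, n1, m2, s2, n2):
    """Unequal-variance (Welch) t-test from summary statistics -> (t, df, p)."""
    v1, v2 = s1 * s1 / n1, s2 * s2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 * v1 / (n1 - 1.0) + v2 * v2 / (n2 - 1.0))
    return t, df, t_two_sided_p(t, df)


def is_bimodal(mu1, sigma1, mu2, sigma2, a, b, alpha=1e-4):
    """M (mean 0, sd 1, n = 100) versus the lower fitted Gaussian component."""
    na = 100.0 * a / (a + b)          # n_a + n_b = 100
    nb = 100.0 - na
    mu, sd, n = (mu1, sigma1, na) if mu1 <= mu2 else (mu2, sigma2, nb)
    if n <= 1.0:                      # assumption: too few events to test
        return False
    _, _, p = welch_test(0.0, 1.0, 100.0, mu, sd, n)
    return p < alpha


if __name__ == "__main__":
    g = ga(510.0, 1010.0, 10.0)
    fi = fold_inhibition(510.0, 110.0, 10.0)
    print(g, fi, g / fi)
    print(is_bimodal(-1.5, 0.3, 1.2, 0.5, 0.5, 0.5))
```

For example, with a background of 10, a maximal unrepressed signal of 1010, and unrepressed and repressed signals of 510 and 110 at one estradiol level, GA = 0.5 and fold inhibition = 5, so the normalized expression is 0.5/5 = 0.1.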

Figure S4 Cellular fluorescence distributions due to the expression of the [tetO]2-GFP-GALUAS-[tetO]4 construct repressed by tetR-Sir3. The cells (PRY524.1) were induced by 2.1, 4.1, 8, 16, and 200 nM estradiol in the absence of doxycycline. Found at: doi:10.1371/journal.pbio.1000332.s004 (0.19 MB TIF)

Figure S5 Comparison of the concentration profiles with uniform and nonuniform diffusivities within the [O]2-Gene-[O]4 setting. GA reduces the spreading of the silencing proteins, which can be mediated by histone acetylation and by activator-induced transcription that disrupts heterochromatin. The former process is expected to reduce diffusivity around the activator binding sites, whereas the latter reduces diffusivity along the entire gene. In the main simulations, the diffusion coefficient was reduced uniformly in the segment flanked by the nucleation sites to imitate reduction of diffusivity along the entire gene (A, C, and E). For comparison, we simulated concentration profiles when the diffusivity was reduced nonuniformly, around the activator binding sites (B, D, and F). The two approaches give comparable results. (A) DA was reduced uniformly between the nucleation sites as GA was increased, whereas outside this region D0 = 0.64. Curves represent DA = 0.52, 0.36, and 0.24. (B) The nonuniform distribution is given by DA(x) = D0·(1 + fσ·N(μ, σ²))⁻¹, where N(μ, σ²) denotes the Gaussian distribution with mean μ and variance σ². μ was set to −0.38 kb, which corresponds to the activator binding site, while σ equals the internucleation distance divided by four. D0 = 0.64. GA was increased by setting f to 1.5, 6, and 12. (C and D) The red dashed and gray continuous lines represent the solutions initiated with low and high starting concentrations. The internucleation distance was 1.2 kb. (E and F) Simulations as in (C) and (D), but with the internucleation distance increased to 1.5 kb. Consequently, the synergistic interaction between the two gradients was abolished. Found at: doi:10.1371/journal.pbio.1000332.s005 (0.55 MB TIF)

Figure S6 Long-term changes in the cellular fluorescence distributions due to the expression of the [tetO]2-GFP-[tetO]4 construct repressed by Sum1p or Sum1-1p. The cells were induced by 0, 8, 11.3, 22, and 200 nM estradiol (black, blue, green, orange, and red, respectively) in the absence of doxycycline. Cells were grown exponentially for the indicated period (8 h or 16 h). Bimodal expression can be seen 16 h after induction by 11.3 nM estradiol due to silencing by Sum1-1p. Found at: doi:10.1371/journal.pbio.1000332.s006 (0.45 MB TIF)

Figure S7 Synergy of repression by Sum1-1p. Sum1-1p is the T988I mutant form of Sum1p. tetR-Sum1-1p was recruited to the [tetO]2-GFP (DHS43), GFP-[tetO]4 (DHS44), and [tetO]2-GFP-[tetO]4 (DHS45) constructs. The gray dashed line represents the calculated multiplicative interaction of repression from upstream and downstream sites. Fold inhibition − 1 at GA = 0.2 was 13.1 times higher for the dual recruitment construct than the multiplicative effect, indicating very strong synergy (see also Figure 5A). Found at: doi:10.1371/journal.pbio.1000332.s007 (0.16 MB TIF)

Figure S8 The I silencer alone does not repress the reporter gene. The expression induced by GEV at the I silencer-GFP-[tetO]4 construct (PRY544.1, 2545.1) was not lower than that at the GFP-[tetO]4 construct (YJK15) under nonrepressive conditions (tetR-Sum1p and tetR-Sir3p do not repress expression in the presence of 2 µM doxycycline). Thus, the I silencer alone does not repress the reporter gene; rather, it has a weak activating potential. Found at: doi:10.1371/journal.pbio.1000332.s008 (0.23 MB TIF)

Figure S9 Cellular fluorescence distributions due to the expression of the I-silencer-GFP-tetO construct repressed by tetR-Sir3. The cells (PRY544.1) were induced by 1.5, 5.8, 8, 11, and 200 nM estradiol in the absence of doxycycline. Found at: doi:10.1371/journal.pbio.1000332.s009 (0.19 MB TIF)

Figure S10 Collapse of bimodal expression as the distance between the recruitment sites for tetR-Sum1 is increased. (A) Sum1p was recruited to the dual recruitment constructs enclosing reporter genes of varying lengths (GFP, [GFP]2, GFP-T-YFP, GFP-T-lacZ integrated within the respective strains: YJKD-16, 23.4, 23.5, 23.6). The relative inhibition denotes the inhibition strength (see Materials and Methods) of the dual recruitment constructs normalized using the [tetO]2-GFP construct. The inhibition strength is the average value of fold inhibition − 1 interpolated on the interval GA = [0.06, 0.6]. Error bars represent standard deviations calculated from three experiments. (B and C) Cellular fluorescence distributions due to the expression of the [tetO]2-[GFP]2-[tetO]4 (B) and [tetO]2-GFP-T-lacZ-[tetO]4 (C) constructs repressed by Sum1p. The cells were induced by 0, 4.1, 5.8, 16, and 200 nM estradiol in the absence of doxycycline. No bimodal response was detected for the [tetO]2-GFP-T-lacZ-[tetO]4 construct. Found at: doi:10.1371/journal.pbio.1000332.s010 (0.36 MB TIF)

Figure S11 Cellular fluorescence distributions when expression is repressed by Sum1p. The cells (YJKD21.2.2 and YJK16) were induced by 3.75, 7.5, 15, 30, and 200 nM estradiol in the absence of doxycycline. Found at: doi:10.1371/journal.pbio.1000332.s011 (0.24 MB TIF)

Figure S12 Monostable concentration profiles arise when cooperativity in the positive feedback loop is small. The Hill coefficient was reduced from 2 to n = 1.5. The following parameters were used for the simulations: sh = 6, L = 5, K = 5, b = 0.01, and kd = 1. The internucleation distance was 1.2 kb for the [O]2-Gene-[O]2 setting. (A) The red dashed and gray continuous lines represent the solutions initiated with low and high starting concentrations. The blue lines delimit the nucleation sites. Where the two concentration profiles overlap, red–gray dashed lines are visible. Monostable concentration profiles were obtained even at intermediate GA. (B) Inhibition of gene expression, expressed as fold inhibition − 1, was calculated from the values of the silencing concentration gradients. Even though there is no bistability at intermediate GA, a sigmoidal change in fold inhibition can be seen in this range. Found at: doi:10.1371/journal.pbio.1000332.s012 (0.25 MB TIF)

Figure S13 Bistable concentration profiles are confined to the proximity of the nucleating segment when diffusivity is low relative to the nucleation width. The following parameters were used for the simulations: sh = 0.3, L = 5, K = 7, b = 0.01, and kd = 1 for an [O]20-Gene setting. DA was set uniformly to the indicated values between the boundaries of the simulation. The blue lines delimit the nucleation segment, sw = 0.741 kb. Widening the nucleation segment and reducing the diffusivity make the spatial aspect of the reaction–diffusion system less pronounced. Consequently, the behavior of the system approximates that of a simple (nonspatial) positive feedback loop that generates bistability. The yellow dots denote the concentrations at −0.38 and 0 kb, which determine the level of GA. (A and B) The red dashed and gray continuous lines represent the solutions initiated with low and high starting concentrations with DA = 0.2 (A) and 0.6 (B). A bistable solution is obtained for the lower diffusivity, DA = 0.2. The silencing proteins do not propagate over long distances relative to the width of the nucleation segment, and their concentrations at the gene regulatory region (yellow dots) are low even for the high-concentration profile. Thus, they affect gene expression only in the vicinity of the nucleating segment. (C) Magnified low-concentration profiles for DA = 0.2 (thin line) and 0.6 (thick line). The concentration profile obtained for the lower diffusivity is more square-like. Found at: doi:10.1371/journal.pbio.1000332.s013 (0.29 MB TIF)
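Figures S3 and S13 tie bistability to cooperativity (no bistability at n = 1) and to the limit in which the system behaves like a simple, nonspatial positive feedback loop. That limit can be sketched as a well-mixed loop whose steady states are found by scanning dc/dt for sign changes. The Hill-type form, the helper names, and the parameter values (in particular K = 1) are illustrative assumptions, not the paper's Equation 1 or its fitted constants.

```python
# Hypothetical well-mixed (nonspatial) positive feedback loop; the Hill form
# and parameter values are illustrative assumptions, not the paper's model.


def rate(c, s=0.0, b=0.01, L=5.0, K=1.0, n=2.0, kd=1.0):
    """dc/dt = s + b + L * c^n / (K^n + c^n) - kd * c."""
    return s + b + L * c ** n / (K ** n + c ** n) - kd * c


def steady_states(c_max=20.0, step=1e-3, **params):
    """Find steady states on [0, c_max] from sign changes of dc/dt on a grid."""
    roots = []
    prev_c, prev_f = 0.0, rate(0.0, **params)
    for i in range(1, int(round(c_max / step)) + 1):
        c = i * step
        f = rate(c, **params)
        if prev_f == 0.0:
            roots.append(prev_c)
        elif prev_f * f < 0.0:
            # linear interpolation inside the bracketing interval
            roots.append(prev_c - prev_f * (c - prev_c) / (f - prev_f))
        prev_c, prev_f = c, f
    return roots


def is_bistable(**params):
    """Three steady states (low, unstable threshold, high) signal bistability."""
    return len(steady_states(**params)) == 3


if __name__ == "__main__":
    for n in (1.0, 2.0):
        roots = steady_states(n=n)
        print(f"n = {n}: steady states {[round(r, 3) for r in roots]}, "
              f"bistable = {is_bistable(n=n)}")
```

With these assumed values, the cooperative loop (n = 2) has a low and a high stable state separated by an unstable threshold, whereas the non-cooperative loop (n = 1) has a single steady state, qualitatively mirroring the n = 1 observation in Figure S3.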


Table S1 Constants used in the equations. Found at: doi:10.1371/journal.pbio.1000332.s014 (0.04 MB DOC)

Table S2 Strains. Found at: doi:10.1371/journal.pbio.1000332.s015 (0.06 MB DOC)

Table S3 Plasmids. Found at: doi:10.1371/journal.pbio.1000332.s016 (0.05 MB DOC)

Text S1 Supporting text and references. Found at: doi:10.1371/journal.pbio.1000332.s017 (0.05 MB DOC)

Acknowledgments

We thank Melanie Anding for technical help, Walter Schaffner for helpful discussion, and Bernhard Dichtl for comments on the manuscript.

Author Contributions

The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: AB. Performed the experiments: JZK PR. Analyzed the data: JZK AB. Contributed reagents/materials/analysis tools: SS. Wrote the paper: JZK AB.

References

1. Nevozhay D, Adams RM, Murphy KF, Josic K, Balazsi G (2009) Negative autoregulation linearizes the dose-response and suppresses the heterogeneity of gene expression. Proc Natl Acad Sci U S A 106: 5123–5128.
2. Takahashi S, Pryciak PM (2008) Membrane localization of scaffold proteins promotes graded signaling in the yeast MAP kinase cascade. Curr Biol 18: 1184–1191.
3. Ferrell JE, Jr., Bhatt RR (1997) Mechanistic studies of the dual phosphorylation of mitogen-activated protein kinase. J Biol Chem 272: 19008–19016.
4. Blake WJ, Balazsi G, Kohanski MA, Isaacs FJ, Murphy KF, et al. (2006) Phenotypic consequences of promoter-mediated transcriptional noise. Mol Cell 24: 853–865.
5. Paliwal S, Iglesias PA, Campbell K, Hilioti Z, Groisman A, et al. (2007) MAPK-mediated bimodal gene expression and adaptive gradient sensing in yeast. Nature 446: 46–51.
6. Kim SY, Ferrell JE, Jr. (2007) Substrate competition as a source of ultrasensitivity in the inactivation of Wee1. Cell 128: 1133–1145.
7. Burnett JC, Miller-Jensen K, Shah PS, Arkin AP, Schaffer DV (2009) Control of stochastic gene expression by host factors at the HIV promoter. PLoS Pathog 5: e1000260. doi:10.1371/journal.ppat.1000260.
8. Ansel J, Bottin H, Rodriguez-Beltran C, Damon C, Nagarajan M, et al. (2008) Cell-to-cell stochastic variation in gene expression is a complex genetic trait. PLoS Genet 4: e1000049. doi:10.1371/journal.pgen.1000049.
9. Kalmar T, Lim C, Hayward P, Munoz-Descalzo S, Nichols J, et al. (2009) Regulated fluctuations in Nanog expression mediate cell fate decisions in embryonic stem cells. PLoS Biol 7: e1000149. doi:10.1371/journal.pbio.1000149.
10. Muzzey D, van Oudenaarden A (2006) When it comes to decisions, myeloid progenitors crave positive feedback. Cell 126: 650–652.
11. Macarthur BD, Ma’ayan A, Lemischka IR (2009) Systems biology of stem cell fate and cellular reprogramming. Nat Rev Mol Cell Biol 10: 672–681.
12. Rice KL, Hormaeche I, Licht JD (2007) Epigenetic regulation of normal and malignant hematopoiesis. Oncogene 26: 6697–6714.
13. Hutchins AS, Mullen AC, Lee HW, Sykes KJ, High FA, et al. (2002) Gene silencing quantitatively controls the function of a developmental trans-activator. Mol Cell 10: 81–91.
14. Fourel G, Lebrun E, Gilson E (2002) Protosilencers as building blocks for heterochromatin. Bioessays 24: 828–835.
15. Tiwari VK, McGarvey KM, Licchesi JD, Ohm JE, Herman JG, et al. (2008) PcG proteins, DNA methylation, and gene repression by chromatin looping. PLoS Biol 6: e306. doi:10.1371/journal.pbio.0060306.
16. Martinez CA, Arnosti DN (2008) Spreading of a corepressor linked to action of long-range repressor hairy. Mol Cell Biol 28: 2792–2802.
17. Nibu Y, Zhang H, Levine M (2001) Local action of long-range repressors in the Drosophila embryo. EMBO J 20: 2246–2253.
18. Talbert PB, Henikoff S (2006) Spreading of silent chromatin: inaction at a distance. Nat Rev Genet 7: 793–803.
19. Rossi FM, Kringstein AM, Spicher A, Guicherit OM, Blau HM (2000) Transcriptional control: rheostat converted to on/off switch. Mol Cell 6: 723–728.
20. Halme A, Bumgarner S, Styles C, Fink GR (2004) Genetic and epigenetic regulation of the FLO gene family generates cell-surface variation in yeast. Cell 116: 405–415.
21. Domergue R, Castano I, De Las Penas A, Zupancic M, Lockatell V, et al. (2005) Nicotinic acid limitation regulates silencing of Candida adhesins during UTI. Science 308: 866–870.
22. Becskei A, Seraphin B, Serrano L (2001) Positive feedback in eukaryotic gene networks: cell differentiation by graded to binary response conversion. EMBO J 20: 2528–2535.
23. Yeh BJ, Lim WA (2007) Synthetic biology: lessons from the history of synthetic organic chemistry. Nat Chem Biol 3: 521–525.
24. Buetti-Dinh A, Ungricht R, Kelemen JZ, Shetty C, Ratna P, et al. (2009) Control and signal processing by transcriptional interference. Mol Syst Biol 5: 300.


25. Greber D, Fussenegger M (2007) Mammalian synthetic biology: engineering of sophisticated gene networks. J Biotechnol 130: 329–345.
26. Tan C, Marguet P, You L (2009) Emergent bistability by a growth-modulating positive feedback circuit. Nat Chem Biol 5: 842–848.
27. Adkins NL, McBryant SJ, Johnson CN, Leidy JM, Woodcock CL, et al. (2009) Role of nucleic acid binding in Sir3p-dependent interactions with chromatin fibers. Biochemistry 48: 276–288.
28. Sedighi M, Sengupta AM (2007) Epigenetic chromatin silencing: bistability and front propagation. Phys Biol 4: 246–255.
29. Biebricher A, Wende W, Escude C, Pingoud A, Desbiolles P (2009) Tracking of single quantum dot labeled EcoRV sliding along DNA manipulated by double optical tweezers. Biophys J 96: L50–52.
30. McKinney K, Mattia M, Gottifredi V, Prives C (2004) p53 linear diffusion along DNA requires its C terminus. Mol Cell 16: 413–424.
31. Bodnar M, Velazquez JJL (2005) Derivation of macroscopic equations for individual cell-based models: a formal approach. Math Methods Appl Sci 28: 1757–1779.
32. Murray JD (2007) Mathematical biology: I. An introduction. New York (New York): Springer. 555 p.
33. Buhler M, Gasser SM (2009) Silent chromatin at the middle and ends: lessons from yeasts. EMBO J 28: 2149–2161.
34. Chou CC, Li YC, Gartenberg MR (2008) Bypassing Sir2 and O-acetyl-ADP-ribose in transcriptional silencing. Mol Cell 31: 650–659.
35. Fourel G, Magdinier F, Gilson E (2004) Insulator dynamics and the setting of chromatin domains. Bioessays 26: 523–532.
36. King DA, Hall BE, Iwamoto MA, Win KZ, Chang JF, et al. (2006) Domain structure and protein interactions of the silent information regulator Sir3 revealed by screening a nested deletion library of protein fragments. J Biol Chem 281: 20107–20119.
37. Fourel G, Boscheron C, Revardel E, Lebrun E, Hu YF, et al. (2001) An activation-independent role of transcription factors in insulator function. EMBO Rep 2: 124–132.
38. Ratna P, Scherrer S, Fleischli C, Becskei A (2009) Synergy of repression and silencing gradients along the chromosome. J Mol Biol 387: 826–839.
39. Vasiljeva L, Kim M, Terzi N, Soares LM, Buratowski S (2008) Transcription termination and RNA degradation contribute to silencing of RNA polymerase II transcription within heterochromatin. Mol Cell 29: 313–323.
40. Irlbacher H, Franke J, Manke T, Vingron M, Ehrenhofer-Murray AE (2005) Control of replication initiation and heterochromatin formation in Saccharomyces cerevisiae by a regulator of meiotic gene expression. Genes Dev 19: 1811–1822.
41. Xie J, Pierce M, Gailus-Durner V, Wagner M, Winter E, et al. (1999) Sum1 and Hst1 repress middle sporulation-specific gene expression during mitosis in Saccharomyces cerevisiae. EMBO J 18: 6448–6454.
42. Boscheron C, Maillet L, Marcand S, Tsai-Pflugfelder M, Gasser SM, et al. (1996) Cooperation at a distance between silencers and proto-silencers at the yeast HML locus. EMBO J 15: 2184–2195.
43. Klar AJ, Kakar SN, Ivy JM, Hicks JB, Livi GP, et al. (1985) SUM1, an apparent positive regulator of the cryptic mating-type loci in Saccharomyces cerevisiae. Genetics 111: 745–758.
44. Yu Q, Elizondo S, Bi X (2006) Structural analyses of Sum1-1p-dependent transcriptionally silent chromatin in Saccharomyces cerevisiae. J Mol Biol 356: 1082–1092.
45. Bolouri H (2008) Embryonic pattern formation without morphogens. Bioessays 30: 412–417.
46. Halfon MS (2006) (Re)modeling the transcriptional enhancer. Nat Genet 38: 1102–1103.
47. Zhang Y, Lin N, Carroll PM, Chan G, Guan B, et al. (2008) Epigenetic blocking of an enhancer region controls irradiation-induced proapoptotic gene expression in Drosophila embryos. Dev Cell 14: 481–493.
48. Schwartz YB, Pirrotta V (2008) Polycomb complexes and epigenetic states. Curr Opin Cell Biol 20: 266–273.


49. Choi JK, Hwang S, Kim YJ (2008) Stochastic and regulatory role of chromatin silencing in genomic response to environmental changes. PLoS ONE 3: e3002. doi:10.1371/journal.pone.0003002.
50. Xu EY, Zawadzki KA, Broach JR (2006) Single-cell observations reveal intermediate transcriptional silencing states. Mol Cell 23: 219–229.
51. Rando OJ, Paulsson J (2006) Noisy silencing of chromatin. Dev Cell 11: 134–136.
52. Saveliev A, Everett C, Sharpe T, Webster Z, Festenstein R (2003) DNA triplet repeats mediate heterochromatin-protein-1-sensitive variegated gene silencing. Nature 422: 909–913.
53. Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S (2006) Stochastic mRNA synthesis in mammalian cells. PLoS Biol 4: e309. doi:10.1371/journal.pbio.0040309.
54. Yin S, Wang P, Deng W, Zheng H, Hu L, et al. (2009) Dosage compensation on the active X chromosome minimizes transcriptional noise of X-linked genes in mammals. Genome Biol 10: R74.
55. Kagansky A, Folco HD, Almeida R, Pidoux AL, Boukaba A, et al. (2009) Synthetic heterochromatin bypasses RNAi and centromeric repeats to establish functional centromeres. Science 324: 1716–1719.
56. Buhler M, Verdel A, Moazed D (2006) Tethering RITS to a nascent transcript initiates RNAi- and heterochromatin-dependent gene silencing. Cell 125: 873–886.
57. Bruggeman FJ, Oancea I, van Driel R (2008) Exploring the behavior of small eukaryotic gene networks. J Theor Biol 252: 482–487.
58. Benecke A (2006) Chromatin code, local non-equilibrium dynamics, and the emergence of transcription regulatory programs. Eur Phys J E Soft Matter 19: 353–366.
59. Hnisz D, Schwarzmuller T, Kuchler K (2009) Transcriptional loops meet chromatin: a dual-layer network controls white-opaque switching in Candida albicans. Mol Microbiol 74: 1–15.
60. Shimazaki H, Shinomoto S (2007) A method for selecting the bin size of a time histogram. Neural Comput 19: 1503–1527.


Widespread Gene Conversion in Centromere Cores

Jinghua Shi1, Sarah E. Wolf1,2, John M. Burke1, Gernot G. Presting3, Jeffrey Ross-Ibarra4, R. Kelly Dawe1,2*

1 Department of Plant Biology, University of Georgia, Athens, Georgia, United States of America
2 Department of Genetics, University of Georgia, Athens, Georgia, United States of America
3 Molecular Biosciences and Bioengineering, University of Hawaii, Honolulu, Hawaii, United States of America
4 Department of Plant Sciences, University of California, Davis, California, United States of America

Abstract

Centromeres are the most dynamic regions of the genome, yet they are typified by little or no crossing over, making it difficult to explain the origin of this diversity. To address this question, we developed a novel CENH3 ChIP display method that maps kinetochore footprints over transposon-rich areas of centromere cores. A high level of polymorphism made it possible to map a total of 238 within-centromere markers using maize recombinant inbred lines. Over half of the markers were shown to interact directly with kinetochores (CENH3) by chromatin immunoprecipitation. Although classical crossing over is fully suppressed across CENH3 domains, two gene conversion events (i.e., non-crossover marker exchanges) were identified in a mapping population. A population genetic analysis of 53 diverse inbreds suggests that historical gene conversion is widespread in maize centromeres, occurring at a rate >1×10⁻⁵/marker/generation. We conclude that gene conversion accelerates centromere evolution by facilitating sequence exchange among chromosomes.

Citation: Shi J, Wolf SE, Burke JM, Presting GG, Ross-Ibarra J, et al. (2010) Widespread Gene Conversion in Centromere Cores. PLoS Biol 8(3): e1000327. doi:10.1371/journal.pbio.1000327

Academic Editor: Harmit S. Malik, Fred Hutchinson Cancer Research Center, United States of America

Received October 5, 2009; Accepted February 3, 2010; Published March 9, 2010

Copyright: © 2010 Shi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This study was supported by grants from the National Science Foundation (0421671, 0421619, 0607123). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Abbreviations: CENH3, Centromeric Histone H3; ChIP, chromatin immunoprecipitation; FISH, fluorescent in situ hybridization; IDPs, insertion-deletion polymorphisms; LD, linkage disequilibrium

* E-mail: kelly@plantbio.uga.edu

Introduction

In spite of their highly conserved function as the site of kinetochore assembly and spindle attachment, centromeres are the most dynamic regions of complex genomes. The components, copy number, and structural organization of centromeric DNA are highly divergent even among closely related species [1,2,3]. This apparent conflict between essentiality and sequence dispensability remains one of the major unresolved paradoxes in genetics. It has been hypothesized that the rapid evolution of centromeric DNA is primarily the result of an arms race in which meiotic drive sweeps novel centromeric repeats to fixation while centromeric proteins adapt to suppress this behavior [4]. Alternatively, some authors have argued that the role of selection is minimal and that the observed variation can be explained by stochastic events such as mutation and genetic exchange [5,6,7]. Both proposals lack strong empirical support, as centromere drive has only rarely been documented [8], and mutational events are difficult to document in complex repetitive areas.

Centromeres are specified epigenetically by the presence of a centromere-specific histone H3 variant, CENH3, which organizes the overlying kinetochores [4]. Kinetochores affect the function and behavior of centromeric DNA in pronounced ways. Perhaps most notable is their effect on crossing over. Cytogeneticists have long known that centromeres severely repress meiotic crossing over [9], and this result has since been confirmed in all species studied [10,11,12]. As a consequence, centromeres are often defined as regions where the frequency of crossovers approaches zero [12,13,14]. Nevertheless, it is not accurate to presume that centromeres never experience genetic exchange. Empirical studies have revealed evidence for recombination between sister centromeres [15,16], gene conversion events have been inferred from sequence analysis of mammalian centromeres [17,18,19], and large intrachromosomal rearrangements have been documented in rice centromeres [20,21]. However, despite the extensive circumstantial evidence for genetic exchange among centromeres, the frequency and nature of the recombination have been difficult to measure.

Maize centromeres contain a 156 bp tandem repeat known as CentC and an abundant class of Ty3/Gypsy-like transposons [22]. Several subfamilies of these so-called Centromeric Retroelements (CR elements, known as CRM in maize; [23]) exist, with CRM2 being the most abundant in the maize genome [24]. Over time, CR elements insert in and around each other, resulting in a nested arrangement [25,26]. Such insertion sites have a high probability of being unique and are generally polymorphic among lines, thereby providing an excellent tool for the genetic analysis of centromeres [27,28]. Here we used transposon display [29] of CRM2 to generate centromere-specific markers in maize. Analysis of segregation in a mapping population, combined with CENH3 ChIP, allowed us to map the functional region of each maize centromere and provide direct evidence for conversion-type genetic exchanges within centromere cores. An analysis of haplotype variation and linkage disequilibrium in a broad panel of maize lines revealed further evidence for a high rate of gene conversion across all centromeres studied, consistent with an important role for stochastic processes in centromere evolution.

PLoS Biology | www.plosbiology.org

March 2010 | Volume 8 | Issue 3 | e1000327

Centromere Evolution

Author Summary

Centromeres, which harbor the attachment points for microtubules during cell division, are characterized by repetitive DNA, a paucity of genes, and almost complete suppression of crossing over. The repetitive DNA within centromeres, however, appears to evolve much faster than would be expected for genetically inert regions. Current explanations for this rapid evolution tend to be theoretical. On the one hand, there are arguments that subtle forms of selection on selfish repeat sequences can explain the rapid rate of change; on the other hand, it seems plausible that some form of accelerated neutral evolution is occurring. Here, we address this question in maize, which is known for its excellent genetic mapping resources. We first developed a method for identifying hundreds of single-copy markers in centromeres and confirmed that they lie within functional domains by using a chromatin immunoprecipitation assay for the kinetochore protein CENH3. All markers were mapped in relation to each other. The data show that, whereas classical crossing over is suppressed, there is extensive genetic exchange in the form of gene conversion (by which short segments of one chromosome are copied onto the other). These results were confirmed by demonstrating that similar short exchange tracts are common among the centromeres from multiple diverse inbred lines of maize. Our study suggests that centromere diversity can be at least partially attributed to a high rate of previously "hidden" genetic exchange within the core kinetochore domains.

Results

Generating Unique Centromeric Markers Using CRM2 Display

Maize centromeres contain hundreds of retrotransposons of the CRM family, with clearly orthologous subfamilies present in rice [30]. Elements of the CRM2 subfamily account for a large proportion of these and exhibit very low transposition rates, as judged by the small proportion of elements with insertion times in the past 75,000 years [30]. CRM2 thus has the features of an excellent genetic marker, being conserved enough to identify easily while still providing substantial polymorphism. Transposon display (TD; see [29]) makes it possible to capture such transposon-induced polymorphisms. By pairing a transposon-specific primer with a restriction site adapter, the presence or absence of a particular insertion can be scored by resolving PCR products on a polyacrylamide gel. When we used TD to display all the CRM2 elements in the maize genome, we found that the number of products exceeded the resolution of our gel assays. To make the results manageable, we therefore added three selective bases to the adapter primer such that only 1/64 of the total number of bands was amplified in any given experiment. The resulting data suggest that 80.3% of the CRM2 bands are polymorphic between B73 and Mo17 (74 of 376 observed bands did not segregate).

To map CRM2 polymorphisms within centromeric regions, we scored a total of 257 CRM2 markers in 93 recombinant inbred lines from the maize IBM mapping population [31]. Of these, 238 mapped to 10 positions, each corresponding to a different maize centromere. The remaining 19 mapped at least one centimorgan outside of a centromere cluster and were classified as pericentromeric. The final data set revealed that the distribution of CRM2 markers is non-uniform among centromeres: there are 30 independent CRM2 markers on B73 centromere 2, for example, but only one marker on centromere 9. This result might be expected, as prior evidence has suggested repeat variation among maize centromeres [32]. An analysis of a B73/Mo17 hybrid line by fluorescent in situ hybridization (FISH) supports the interpretation that there is a rough correspondence between the number of markers recovered by CRM2 display and the intensity of the CRM2 hybridization signal (Figure 1).

Recombinant inbred lines should be homozygous for markers from only one parent at the vast majority of loci. However, we also detected lines that contained markers characteristic of both (27 centromeres) or neither (6 centromeres) of the parental centromeres. The former could be the result of residual heterozygosity, whereas the latter was presumed to represent contamination during the propagation of the lines. A combination of flanking centromeric markers and FISH (Figure S1) allowed us to confirm these expectations and remove the heterozygous and/or contaminant centromeres from consideration (Table S1). Overall centromeric heterozygosity was 2.15%, in line with expectations (2.5%) from a 66 self-crossed population.

Figure 1. Correspondence between CRM2 marker number and CRM2 FISH intensity. Metaphase chromosomes from a B73/Mo17 hybrid line (from a single cell). CRM2 LTR and telomeres are shown in green, CentC and the knob 180 bp repeat are shown in red, and chromosomes are shown in blue. The lower panel shows the CRM2 FISH signal (in white), and beneath each centromeric region is the total number of CRM2 TD markers recovered from that centromere. doi:10.1371/journal.pbio.1000327.g001

CRM2 Markers Interact with CENH3

CENH3 chromatin is not continuously distributed over centromeric domains, and any assay of common centromere repeats will thus provide only a partial view of the functional centromere/kinetochore regions. To identify CRM2 markers that lie within functional regions, we added a chromatin immunoprecipitation (ChIP) step to the protocol (Figure 2). Centromeric chromatin was precipitated with anti-CENH3 antibodies, the DNA was purified from its associated chromatin, and the sample was further processed for CRM2 display. Of 212 markers scored by ChIP, 122 were precipitated with CENH3 (57.5%), 40 were not precipitated with CENH3, and 50 gave inconsistent results among replicates. As expected, none of the 19 known pericentromeric bands was immunoprecipitated by CENH3 antibodies. These results are consistent with prior work showing that roughly 30% of maize CRM sequences can be immunoprecipitated by CENH3 antisera [23] and that a visible proportion of the CRM elements in maize are not associated with CENH3 [33].

Figure 2. ChIP display. The image shows CRM2 elements labeled with 33P on a polyacrylamide gel. The left panel shows results from chromatin immunoprecipitation with controls: pos, B73 nuclei used for the ChIP experiment; sup, supernatant that did not bind to CENH3 antibodies; C, CENH3-bound markers; neg, no-antibody control (shows non-specific binding to the Sepharose beads used for precipitation). The right panel shows an annotated comparison between the sup and C lanes. The chromosomal locations of the precipitated bands are indicated. The dashes next to the S lane denote non-precipitated bands. doi:10.1371/journal.pbio.1000327.g002

Sequence Conversion Events within Centromeres

The IBM population presents a unique opportunity for identifying rare genetic exchanges within centromere cores. Since crossing over is suppressed in centromeres, the markers from a single centromere haplotype should always be inherited as a unit. While this is true for the great majority of centromeres, we also detected aberrant inheritance patterns. These fell into two categories: loss of a marker from a known centromere haplotype, and gain or transfer of a marker from one haplotype to another (Figure 3). Marker loss is a negative result and difficult to confirm; such events may in principle represent deletions but could also represent technical errors, and they were therefore not pursued further. In contrast, there are several definitive ways to confirm the gain of a marker in our scoring system, and we focused further analyses on these markers. There were four cases of marker gain, each potentially representing a genetic exchange event. We first cloned and sequenced each affected band from its parental line. We then performed a new round of TD using sequence-specific primers. In two cases, the originally scored gained bands were not observed using the sequence-specific primers, indicating that these bands likely represent new polymorphisms that happened to comigrate with one of the mapped markers. The other two bands, B73_8_ACC165 and Mo17_5_TCG264, were confirmed by sequence to represent the parental markers. At least one of these markers (B73_8_ACC165) lies within the functional CENH3 core as assayed by ChIP display. The second marker (Mo17_5_TCG264) did not precipitate with CENH3 antisera in our hands, though we note that a negative ChIP result does not necessarily imply that the marker is not centromeric. An analysis of flanking markers revealed that no crossing over was associated with either B73_8_ACC165 or Mo17_5_TCG264, ruling out the possibility that they represent crossing over at the edge of the affected centromeres and indicating that they represent gene conversion, double crossover, or similar sequence exchange events (Figure S2). It is also possible (though much less likely) that these events represent exchange between non-homologous centromeres. Although we have not demonstrated that the observed marker exchanges are mechanistically gene conversion in the strictest sense, we refer to them as conversion events throughout. Based on these observations, we estimate that the IBM lines sustained a centromeric gene conversion rate of 1.86 × 10⁻⁴ conversion events per marker per generation (see Materials and Methods).

Linkage Disequilibrium (LD) in Maize Centromeres

Direct observation of marker exchange in our mapping population confirms the existence of conversion events, but population genetic data are required to assess the historical impact that such processes may have had on maize centromeres. To this end, we genotyped a set of CRM2 TD markers in a panel of 53 inbred lines, including a 50-line core set representative of a broad base of maize genetic diversity [34]. Each line was genotyped with 75 markers derived from 10 centromeres (B73 centromeres 1, 2, 3, 5, 6, and 8, and Mo17 centromeres 4, 7, 8, and 9; Figure 4). When scoring CRM2 markers in diverse inbreds, there is a possibility that unrelated bands might co-migrate with the B73- or Mo17-derived bands and thus be scored as false positives. To investigate this possibility, we confirmed all bands for a set of 12 sequenced markers on centromere 2 [24] in a second round of genotyping with 4 bp selective base primers. The data revealed that 98.2% of the genotypes (556 of 566) from centromere 2 had been scored correctly. The remaining data are reported as originally called with 3 bp primers and interpreted with an assumed false positive rate of 1.8% (Figure 4). Because all of the assayed lines are inbred, it is reasonable to interpret our multi-locus genotypes as haplotypes for population genetic analysis, even though the markers are genetically dominant. Initial investigation of average pairwise LD among markers, as measured by the ZnS statistic [35], revealed that the observed haplotype configurations at 7 of the 9 centromeres cannot be explained by a model lacking historic genetic exchange (Table 1). To further test for evidence of genetic exchange, we applied the four-gamete test [36] to estimate the minimum number of genetic exchanges (Rmin) required to explain the observed data (assuming no recurrent mutation). As shown in Table 1, all nine centromeres were estimated to have nonzero Rmin (mean = 5.6), providing strong evidence for some form of genetic exchange.

These Rmin values, moreover, are likely underestimates of the actual number of exchanges that have occurred at each centromere, as our markers cover only a small region of each centromere and Rmin is an inherently conservative statistic [36]. Genetic exchanges such as those measured by Rmin can be caused by either crossing over or gene conversion. These two types of exchange make different predictions about the relationship between LD and physical distance. Crossing over produces a negative correlation between LD and distance; for instance, LD on maize chromosome arms decays to negligible levels within 2 kb [37]. In contrast, because gene conversion tracts are usually short [38] and do not affect flanking markers, gene conversion is not expected to produce a relationship between marker distance and linkage. We measured the relationship between LD and distance on centromere 2 (Figure 5), which has been fully sequenced [24]. Pairwise LD estimates reveal a block of high LD involving 3 markers spanning the only region of CentC repeats on this centromere ([24]; marked as a box in Figure 5B), but the data reveal no evidence for a correlation between LD and distance (Pearson's correlation coefficient of 0.11 does not differ from randomly permuted datasets; p = 0.32). This pattern differs dramatically from what has been observed in the rest of the genome (Figure 5, inset) [37]. Moreover, forcing the data to fit a model of nonlinear decay [37] results in an estimate of crossing over of 3.94 × 10⁻¹² per bp per generation, so low as to be inconsequential. These results are thus inconsistent with the observed genetic exchange being the result of canonical crossing over.

We therefore proceeded to estimate the rate of gene conversion on each centromere using two independent methods (Table 1). The first is based on the premise that gene conversion will increase the number of multilocus haplotypes in a sample. Coalescent simulations (see Materials and Methods; Figure 6) were used to estimate the gene conversion rate required to achieve the observed number of haplotypes. The resulting data suggest a mean estimate of 3.7 × 10⁻⁵ conversion events per marker per generation and allow us to statistically reject a model with no gene conversion for all nine centromeres at p < 0.05. Second, we used a composite likelihood method [39] to directly estimate gene conversion rates for each centromere. This second approach reveals similar rates of conversion across all nine centromeres, averaging ~1 × 10⁻⁵ conversion events per marker per generation.

Discussion

Our data indicate that gene conversion is common within centromeres and may play a fundamental role in determining the dynamics and distribution of centromere repeats. This conclusion is based on three primary lines of evidence. First, our mapping data provide what is, to our knowledge, the only experimental evidence for centromeric gene conversion: two independent conversion events were identified in 93 recombinant inbred lines using a set of 238 CRM2 markers, corresponding to a rate of 1.86 × 10⁻⁴ exchanges per marker per generation. The second line of evidence comes from LD analysis of 75 markers typed in a set of 53 diverse inbred lines. These data show patterns consistent with genetic exchange, including unusually low LD and the clear presence of recombinant haplotypes (nonzero Rmin), but show no decay of LD with distance, as would be expected in the presence of crossing over. Finally, two independent population genetic methods were used to directly estimate centromeric gene conversion, resulting in remarkably similar rates of ~1 × 10⁻⁵ conversions per marker per generation. It is too early to tell how rates of gene conversion in centromeres compare to those in other regions of the maize genome, but one estimate of gene conversion at the maize anthocyaninless1 locus (~3 × 10⁻⁵/marker/generation [40]) suggests they may be of a similar order of magnitude.

It has been hypothesized that centromere evolution in eukaryotes with asymmetric meiosis has been primarily governed by an arms race in which meiotic drive occasionally sweeps novel centromeric repeats to fixation [4]. While the extreme LD observed around a short tract of CentC on centromere 2 may hint at an evolutionary history consistent with these ideas (Figure 5B), our finding of widespread gene conversion explains how high levels of diversity may be observed even in yeast, where meiotic drive is a less likely explanation [7]. Sequence data from mammalian centromeres are further consistent with this view, suggesting in several studies that gene conversion has contributed to extant centromere variation and the production of novel higher order repeat arrays [17,18,19]. If centromeric gene conversion is indeed common in maize, yeast, and humans, it seems reasonable to hypothesize that gene conversion is an important process within the centromere cores of all eukaryotes.

Materials and Methods

Genetic Stocks

A ninety-four-line IBM DNA Kit, provided by the Maize Genetics Cooperation Stock Center (http://www.maizemap.org/94_ibm.htm), was used for CRM2 display. IBM3 was excluded from the analysis because seven of its centromeres were heterozygous. Additional accessions of IBM lines used for confirmation and further ChIP and FISH analysis were obtained from the Maize Genetics COOP Stock Center (http://www.maizegdb.org/stock.php). A set of 53 maize inbred lines, including the majority of a 50-line core set [34] with additional lines from the NAM (nested association mapping) founder lines [41], was chosen to represent the genetic diversity for LD analysis. The inbreds assayed were B73, Mo17, A441, A632, B37, B57, B96, B97, C103, CI.7, CML5, CML52, CML61, CML69, CML77, CML103, CML220, CML228, CML247, CML254, CML261, CML277, CML311, CML321, CML322, CML328, CML333, F2, Hi27, HP301, I137TN, IDS28, IL14H, K55, Ki3, Ki11, KY21, M37w, Mo18w, Ms71, Nc304, Nc360, Nc348, Nc358, Oh7B, Oh43, Os420, P39, Tx303, Tzi8, Tzi9, Va85, and W401. All were obtained from the North Central Regional Plant Introduction Station in Ames, Iowa. DNA was extracted from 3-week-old seedlings using a modified CTAB protocol [42].

Figure 3. The B73_8_ACC165 gene conversion event. This figure illustrates marker gain as primary data; see also Figure S2 for a visualization of how the data are interpreted. Panels show gel images acquired using fluorescent (FAM) labeling and capillary electrophoresis (images produced by GeneMarker software). IBM10 contains all Mo17 markers from centromere 8 as well as the centromere 8 B73_8_ACC165 marker (B73 markers are labeled in blue and Mo17 markers are labeled in red). IBM11 and IBM12 contain normal Mo17 and B73 centromeres, respectively. Only a subset of the 30 markers for centromere 8 is shown; see Figure S2 for the complete list. doi:10.1371/journal.pbio.1000327.g003

Genetically Mapping CRM2 Markers

Mapping data were initially sent to a community IBM mapping service (CIMDE), which constructed a linkage map using a two-point mapping method from a framework of 580 loci. After obtaining rough positions, we constructed a finer centromere map for each chromosome using MapMaker version 3.0 [43]. For each centromere map, mapping scores for 20 flanking markers from the IBM2 2008 Neighbors linkage interpretation (www.maizegdb.org) were added to the file containing the CRM2 marker scores. The closest IBM2 core bin markers were added as the first and last markers of each centromere map. In addition, we included as many "skeleton" markers (ISU map4, [13]) as possible. The CRM2 markers were then placed into the centromere framework using a multipoint method (the "try" MapMaker command).

Identifying CENH3-Associated Markers by ChIP Display

Native ChIP was carried out as described previously [44] with minor modifications. Chromatin was extracted from young leaves (~8–15 cm) or young roots (~1 wk after germination). RNase-free DNase I (Promega, Madison, WI, USA) was used for chromatin digestion. Chromatin was digested to ~300–3,000 bp fragments as judged by agarose electrophoresis. After immunoprecipitation with anti-CENH3 antisera [23], the supernatant (unbound) and IP (bound) fractions were purified with a PCR purification kit (Invitrogen, Carlsbad, CA, USA) and used for CRM2 transposon display. Input DNA (before adding antibodies) was used as a positive control, and a treatment without antibodies (no IgG) was used as a negative control (Figure 2). ChIP display was replicated three times for both B73 and Mo17; bands that were amplified in the immunoprecipitated DNA from all three experiments were considered to be associated with centromere cores.
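The replicate rule above (a band counts as CENH3-associated only if it is amplified in all three IP replicates) can be written as a small classifier. This Python sketch is illustrative only: the function name and the band calls are hypothetical, and treating a band that is absent from all three IP lanes as "not precipitated" (with every mixed pattern "inconsistent") is an assumption about how the three categories reported in the Results were assigned.

```python
from collections import Counter


def classify_chip_band(ip_calls):
    """Classify one TD band from its presence (True) / absence (False) in replicate IP lanes."""
    calls = list(ip_calls)
    if not calls:
        raise ValueError("at least one ChIP replicate is required")
    if all(calls):
        return "CENH3-associated"   # amplified in every IP replicate (rule stated in Methods)
    if not any(calls):
        return "not precipitated"   # assumption: absent from every IP replicate
    return "inconsistent"           # present in some replicates but not in others


# Hypothetical calls for four bands across three ChIP replicates
bands = {
    "band_1": (True, True, True),
    "band_2": (False, False, False),
    "band_3": (True, False, True),
    "band_4": (True, True, True),
}
labels = {name: classify_chip_band(calls) for name, calls in bands.items()}
tally = Counter(labels.values())
print(labels)
print(tally["CENH3-associated"] / len(bands))   # fraction precipitated: 0.5
```

Applied to the 212 markers scored in the study, the same fraction is 122/212, or about 57.5%, which matches the percentage reported in the Results.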

CRM2 Transposon Display

Transposon display was carried out as described elsewhere [24,29]. In this method, DNA is digested with BfaI, and the samples are PCR-amplified using CRM2 primers and adapter primers designed to anneal to the cleaved BfaI site. The method involves primary and selective amplification steps, with different (nested) CRM2 primers used in each step. The primers for primary amplification were CRM2_R1 (5′-GAGGTGGTGTATCGGTTGCT) and BfaI + 0 (5′-GACGATGAGTCCTGAGTAG), and those for selective amplification were 33P- or FAM-labeled CRM2_R2 (5′-CTACAGCCTTCCAAAGACGC) and BfaI + 3 selective bases (where different bases were added to the BfaI + 0 primer). A 58°C annealing temperature was used for the selective amplification. 33P-labeled PCR products were separated on 6% polyacrylamide gels, and FAM-labeled PCR products were separated by capillary electrophoresis and interpreted using GeneMarker software (SoftGenetics, LLC).
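Adding selective bases splits one display reaction into many smaller ones: with three bases there are 4³ = 64 primer variants, each expected to amplify about 1/64 of the BfaI fragments. The Python sketch below enumerates the BfaI + 3 primers (and the BfaI + 4 primers used for the confirmation round described in the Results). It assumes, as in AFLP-style protocols, that the selective bases are appended to the 3′ end of the BfaI + 0 primer; the function name `selective_primers` is hypothetical. Marker names such as B73_8_ACC165 appear to encode the selective bases used, but that reading is also an assumption.

```python
from itertools import product

# BfaI + 0 adapter primer, copied from the Methods text (written 5'->3').
BFAI_PLUS_0 = "GACGATGAGTCCTGAGTAG"


def selective_primers(n_bases=3, base_primer=BFAI_PLUS_0):
    """Map each selective-base suffix to its full adapter primer.

    Assumption (AFLP-style design): the selective bases are appended to the
    3' end of the BfaI + 0 primer. With n_bases selective bases there are
    4**n_bases primers, each expected to amplify ~1/4**n_bases of the fragments.
    """
    primers = {}
    for bases in product("ACGT", repeat=n_bases):
        suffix = "".join(bases)
        primers[suffix] = base_primer + suffix
    return primers


three_base = selective_primers(3)
four_base = selective_primers(4)
print(len(three_base))      # 64 primers, ~1/64 of bands each
print(len(four_base))       # 256 primers for a 4 bp confirmation round
print(three_base["ACC"])    # GACGATGAGTCCTGAGTAGACC
```

Each added base cuts the expected band count per lane by a factor of four, which is why a 4 bp primer can separate a true marker from an unrelated co-migrating band.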

Recovery and Sequencing of CRM2 Markers

Sixty-four CRM2 bands were excised from TD gels and reamplified with primer set BfaI + 0 and CRM2_R2. The PCR products were purified using a QIAGEN (Valencia, CA) Gel Purification kit and were either directly sequenced or cloned into a TOPO TA vector (Invitrogen, Carlsbad, CA) and then sequenced. As controls for the ChIP display method, 31 bands were cloned from both genomic DNA and ChIP display (IP) lanes, and the resulting sequences were found to be identical. All sequenced markers are available in GenBank as accessions GF099546–GF099610. Markers that were shown to interact with CENH3 are annotated with the statement "this sequence interacts with Centromeric Histone H3 (CENH3) and is within the functional centromere core." We note that a subset of the sequenced markers was also used to construct the physical map of centromeres 2 and 5 [24].

Figure 4. CRM2 marker data from a set of diverse inbreds. Panels A and B together represent the entire data set. Columns show the 53 inbreds scored, while rows show the presence (black) or absence (white) of 75 CRM2 TD markers for the indicated centromeres. The columns containing B73 and Mo17 reference data are highlighted in grey. For centromere 2, only sequence-confirmed data are shown, whereas all other data were interpreted with a presumed false positive rate of 1.8%. doi:10.1371/journal.pbio.1000327.g004

Table 1. Linkage disequilibrium and gene conversion rates.

Centromere   Markers   N   Rmin   ZnS        Gene conversion rate¹
                                             Simulation   Likelihood
1            …         …   …      0.586      1.04         0.35
2            …         …   …      0.386**    1.04         0.91
3            …         …   …      0.326**    5.09         1.40
4            …         …   …      0.379**    4.09         1.42
5            …         …   …      0.487      0.461        0.90
6            …         …   …      0.320**    8.48         0.85
7            …         …   …      0.282**    8.18         1.36
8_B73        …         …   …      0.445*     2.12         0.36
8_Mo17       …         …   …      0.325**    3.64         0.91
8²           19        …   …      0.249**    3.64         0.90
9            …         …   …      0.312**    1.62         0.93

N = number of haplotypes. ¹Rates presented as conversions per 10⁵ markers. ²All centromere 8 data combined. *p < 0.05, **p < 0.001. doi:10.1371/journal.pbio.1000327.t001
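The N, Rmin, and ZnS columns of Table 1 are standard summaries of a haplotype matrix. The authors computed them with libsequence; the Python sketch below is an independent, simplified re-implementation for illustration only, not their code. It treats each inbred's presence/absence calls as a haplotype (as justified in the Results), assumes markers are listed in map order, ignores the 1.8% false-positive correction, and computes Rmin with the Hudson and Kaplan interval algorithm for the four-gamete test. The function names and the example matrix are hypothetical.

```python
from itertools import combinations


def site_columns(haplotypes):
    """Transpose haplotype rows (0/1 marker calls) into per-marker columns."""
    return [tuple(h[k] for h in haplotypes) for k in range(len(haplotypes[0]))]


def r_squared(x, y):
    """Squared correlation between two biallelic (0/1) markers."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    d = pxy - px * py
    return d * d / (px * (1 - px) * py * (1 - py))


def zns(haplotypes):
    """Kelly's ZnS: mean r^2 over all pairs of segregating markers."""
    cols = [c for c in site_columns(haplotypes) if 0 < sum(c) < len(c)]
    if len(cols) < 2:
        raise ValueError("ZnS needs at least two segregating markers")
    pairs = list(combinations(cols, 2))
    return sum(r_squared(x, y) for x, y in pairs) / len(pairs)


def rmin(haplotypes):
    """Hudson-Kaplan lower bound on exchanges, from the four-gamete test.

    Columns must be in map order; no recurrent mutation is assumed.
    """
    cols = site_columns(haplotypes)
    # Every marker pair showing all four gametes (00, 01, 10, 11) needs an
    # exchange somewhere between the two markers.
    intervals = [(i, j) for i, j in combinations(range(len(cols)), 2)
                 if len(set(zip(cols[i], cols[j]))) == 4]
    # The largest set of non-overlapping intervals is the minimum count.
    count, last_end = 0, -1
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            count, last_end = count + 1, end
    return count


def n_haplotypes(haplotypes):
    """Number of distinct multilocus haplotypes (N in Table 1)."""
    return len({tuple(h) for h in haplotypes})


# Hypothetical presence (1) / absence (0) calls: rows = inbreds, columns = markers
lines = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
    (1, 1, 0),
]
print(n_haplotypes(lines), rmin(lines), round(zns(lines), 3))   # 4 2 0.028
```

In this toy matrix every marker pair fails the four-gamete test, so at least two exchanges are needed, while ZnS stays low; testing whether such a ZnS is unusually low relative to a no-exchange model would still require the coalescent simulations described below.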

Identifying and Confirming Heterozygous Centromeres in IBM Lines

Heterozygous centromeres were first identified as cases where markers from both parents were present for a single centromere. A total of 27 such cases were identified. Seven heterozygous centromeres were found in a single line (IBM3), which was subsequently removed as a recent outcross contaminant. We made an effort to confirm as many of the remaining 20 heterozygous centromeres as possible, either by using codominant insertion-deletion polymorphisms (IDPs; [13]) to confirm heterozygosity at closely linked flanking markers (16 centromeres) or by FISH of CentC content (one centromere, Figure S1). We were also able to eliminate as contaminants six centromeres that lacked markers from either parent and were together responsible for all of the nonparental bands observed on TD gels. Although they lacked B73 or Mo17 markers, four of the contaminant centromeres were shown to contain abundant CentC and CRM, and one line segregated for knobs not present in either parent (Figure S1). The IDPs scored were IDP3936, IDP592, and IDP825 (chromosome 2); IDP3945 and IDP1433 (chromosome 3); IDP642, IDP476, and IDP625 (chromosome 4); IDP1408, IDP359, and IDP1607 (chromosome 5); IDP3788, IDP3799, IDP2581, IDP680, and IDP3887 (chromosome 6); IDP3795, IDP3810, and IDP3994 (chromosome 7); IDP334, IDP327, IDP811, and IDP88 (chromosome 8); and IDP4151, IDP8457, and IDP4017 (chromosome 9).
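The first-pass screen above, together with the marker-gain criterion used in the Results, can be sketched as a single classifier over the set of markers scored at one RIL centromere. This is a hypothetical Python illustration, not the authors' pipeline: the marker names are invented, and the `max_gain` cutoff that separates a candidate marker gain from residual heterozygosity is an assumption. In the study, candidates were resolved by sequencing, IDPs, or FISH rather than by a fixed threshold.

```python
def classify_ril_centromere(observed, b73, mo17, max_gain=1):
    """Classify one RIL centromere from the CRM2 TD markers scored in it.

    observed  -- set of marker names present in the RIL at this centromere
    b73, mo17 -- sets of markers defining each parental haplotype
    max_gain  -- hypothetical cutoff: at most this many markers from the other
                 parent on top of a complete haplotype is treated as a candidate
                 marker gain (exchange) rather than residual heterozygosity
    """
    from_b73 = observed & b73
    from_mo17 = observed & mo17
    if not from_b73 and not from_mo17:
        return "neither parent (candidate contaminant)"
    if not from_mo17:
        return "B73" if from_b73 == b73 else "B73 with marker loss"
    if not from_b73:
        return "Mo17" if from_mo17 == mo17 else "Mo17 with marker loss"
    if from_b73 == b73 and len(from_mo17) <= max_gain:
        return "B73 with candidate marker gain"
    if from_mo17 == mo17 and len(from_b73) <= max_gain:
        return "Mo17 with candidate marker gain"
    return "both parents (candidate heterozygote)"


# Hypothetical parental haplotypes for one centromere
B73_CEN = {"B1", "B2", "B3", "B4"}
MO17_CEN = {"M1", "M2", "M3", "M4"}
print(classify_ril_centromere({"M1", "M2", "M3", "M4", "B2"}, B73_CEN, MO17_CEN))
print(classify_ril_centromere({"B1", "B3", "M2", "M4"}, B73_CEN, MO17_CEN))
print(classify_ril_centromere({"X9"}, B73_CEN, MO17_CEN))
```

The first call mirrors the IBM10 pattern (a complete Mo17 haplotype plus one B73 marker). The third mirrors a contaminant centromere that carries only nonparental bands.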

Figure 5. Linkage disequilibrium in centromere 2. (A) Pairwise LD plotted against distance, fit to a decay function [37,49] using a value of r = 8.81 × 10⁻⁷. Inset shows the decay over the first 5 kb (in black) and the same function fit using the genome-wide median of r (in grey) [51]. (B) Heatmap of pairwise LD. Lighter colors show higher LD. The black box demarcates three markers that show high LD and flank the only cluster of CentC repeats on this centromere (the 180 kb region between positions 89.88 and 90.06 Mb on the physical map). doi:10.1371/journal.pbio.1000327.g005
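Figure 5A fits pairwise LD to a decay function, and the Results test the LD-versus-distance correlation against permuted datasets. The Python sketch below shows both steps under stated assumptions. It uses the Hill and Weir (1988) expectation of r² as the decay function, which is an assumption about the form used in [37,49]. It fits the single recombination parameter by a simple grid search rather than by the authors' procedure, and it computes a two-sided permutation p-value by shuffling r² values against distances. All function names and the example data are hypothetical.

```python
import math
import random


def hill_weir_r2(distance_bp, rho, n):
    """Expected r^2 at a given distance (Hill and Weir 1988 approximation).

    rho -- population recombination parameter per bp (assumed constant)
    n   -- number of sampled chromosomes/lines
    """
    c = rho * distance_bp
    base = (10 + c) / ((2 + c) * (11 + c))
    correction = 1 + ((3 + c) * (12 + 12 * c + c * c)) / (n * (2 + c) * (11 + c))
    return base * correction


def fit_rho(distances, r2_values, n, grid):
    """Grid-search least-squares fit of rho to observed pairwise r^2."""
    def sse(rho):
        return sum((obs - hill_weir_r2(d, rho, n)) ** 2
                   for d, obs in zip(distances, r2_values))
    return min(grid, key=sse)


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test(distances, r2_values, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the r^2-versus-distance correlation."""
    rng = random.Random(seed)
    observed = pearson(distances, r2_values)
    shuffled = list(r2_values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # break any pairing of r^2 with distance
        if abs(pearson(distances, shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical pairwise data for one centromere (distance in bp, r^2)
dist = [2_000, 15_000, 40_000, 90_000, 160_000, 250_000]
r2 = [0.31, 0.05, 0.42, 0.12, 0.28, 0.09]
grid = [10 ** (k / 10) for k in range(-120, -19)]   # rho from 1e-12 to 1e-2
print(fit_rho(dist, r2, n=53, grid=grid))
print(permutation_test(dist, r2))
```

With real data, `dist` and `r2` would hold all marker pairs on centromere 2. Because the fitting procedure here is only a stand-in, its estimate need not match the value given in the Figure 5 caption.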

Confirming Gene Conversion Events Two gene conversion events identified by B73_8_ACC165 and Mo17_5_TCG264 were confirmed in several experiments using different DNA samples and primers. The most definitive experiment for marker B73_8_ACC165 involved a highly specific primer with 11 selective bp. With this primer, the segregation was identical to the original observation, such that RIL IBM10, which contains the complete Mo17 centromere 8 haplotype, also 7

March 2010 | Volume 8 | Issue 3 | e1000327

Centromere Evolution

FISH FISH on mitotic cells was performed as described previously [32]. The following four repetitive DNA sequences were included in the probe cocktail: subtelomeric 4-12-1 (FITC labeled), CRM2 LTR (FITC labeled), CentC (Texas Red labeled), and knob180 (Texas Red labeled). The clones of 4-12-1, CentC, and knob180 were generously provided by Dr. James Birchler (University of Missouri). The CRM2 LTR was PCR amplified from genomic DNA using the following primer set: forward, 59-TCGTCAACTCAACCATCAGGT, and reverse, 59-GCAAGTAGCGAGAGCTAAACTTGA. All images were captured and processed using a Zeiss Axio Imager microscope and SlideBook 4.0 software (Intelligent Imaging Innovations, Denver, CO, USA).

Estimation of Gene Conversion Rate in IBM Lines Assuming that all markers have equal likelihood of being involved in an exchange event, and taking into account the decrease in heterozygosity during the 11 generations involved in preparing the mapping population, we can estimate the rate of x , where x is the observed number of gene exchange as M G exchanges, M the total number of markers, and G the effective number of generations available for exchange. We observed two exchange events, and scored 238 markers in each of the 93 lines remaining after removing contamination. A further 696 markers were removed because of contamination or inconsistent banding patterns, such that the total number of markers was M = 21,438. In a randomly mating population, all 11 generations would provide opportunities for exchange. But as RILs are inbred, each generation possesses less heterozygosity and thus fewer opportunities to observe an exchange event. Correcting for this, the P 1=2Ă&#x17E;n , and the Ă° effective number of generations is G~1z 11 n~1 total rate is 1.8661024 exchanges per marker per generation.

LD and Simulation Calculation of Rmin, pairwise r2, and ZnS utilized code from the analysis and msstats packages of the libsequence C++ library [46]. We modeled the decay of LD with distance [37] and tested the significance of the association between râ&#x20AC;&#x2DC;2 and distance along centromere 2 with 1,000 pairwise permutations. The significance of the ZnS statistic for each centromere was compared to results from 1,000 coalescent simulations under a bottleneck model (similar to [47]) with no recombination. Simulations were performed in ms [48] with the command line: ms 53 1000 -t 500 -r 0 1000000 -c c 1000 -eN 0.00556 0.00544 -eN 0.00611 1.

Figure 6. Haplotype estimation of gene conversion. Shown is the expected number of haplotypes observed under varying levels of gene conversion (c) from coalescent simulations of a maize domestication bottleneck. The solid line indicates the mean number of haplotypes, and the shaded region encloses the empirical 95% confidence intervals. Horizontal dotted lines represent the number of haplotypes observed from the centromeres indicated (m is the number of markers in that centromere). The most probable gene conversion rates occur where the dotted lines intersect with the solid lines. The last panel shows the outcome if all centromere 8 data are considered together (from both B73 and Mo17, such that m = 19). doi:10.1371/journal.pbio.1000327.g006

contains marker B73_8_ACC165 from B73 centromere 8. For marker Mo17_5_TCG264, we directly sequenced the aberrantly scored bands in the affected RILs IBM24 and IBM54. Both lines contain the complete B73 centromere 5 haplotype as well as the Mo17_5_TCG264 marker from Mo17 centromere 5. Using our established centromere map positions [24], we ruled out the possibility that a crossover had occurred coincidentally with marker gain. For centromere 5 we used the following markers: umc40, mmp60, rz87 - Cent5 - umc1591, umc2302, and umc1060. For centromere 8 we used bnlg1834, umc1157, umc1904 - Cent8 - AY110113, gpm572b, and IDP334. Map scores for the flanking gene markers have been previously published [13,45] and were obtained from maizegdb.org. PLoS Biology | www.plosbiology.org

March 2010 | Volume 8 | Issue 3 | e1000327

Centromere Evolution

Estimation of Gene Conversion in Diverse Inbreds We used two independent methods to estimate gene conversion rates. First, composite likelihood methods [39], as implemented in the program maxhap (http://home.uchicago.edu/~rhudson1/source/maxhap.html), were used to estimate the population gene conversion rate $c = 4N_e g$, where $g$ is the gene conversion rate per bp per generation. We assumed a gene conversion tract length of 1 kb, a population recombination rate of $\rho = 4N_e r = 10^{-5}$ per kb, where $r$ is the recombination rate per bp per generation, and that markers were evenly spaced across the centromere. Centromere sizes were based on map estimates [24]. Physical map positions from centromere 2 were used to verify that assumptions of order and distance had little effect on the final rate estimation (unpublished data). Using maxhap, we calculated the likelihood of different rates across a grid of 10,000 values of $c/\rho$ from 1 to $10^6$ per kb, reporting the value of $c$ that maximized the likelihood for each centromere. Our second estimator of gene conversion compared the number of multilocus haplotypes present in a sample of centromere markers to coalescent simulations under a demographic model of maize domestication. We simulated chromosomes nearly devoid of recombination across a grid of gene conversion rates, performing 1,000 coalescent simulations for each value investigated. Our model closely followed prior work [47] in assuming an ancestral diploid population size of 450,000 that underwent a domestication bottleneck of 2,450 individuals, starting 11,000 years ago and lasting 1,000 years. Simulations were performed in MaCS [50] using the following command line: macs 53 10e6 -t 10e-3 -r 10e-6 -c c 1000 -eN 0.00556 0.00544 -eN 0.00611 1 -h 10e5. Custom programs built using the libsequence C++ library [46] were used for four tasks: to ascertain markers using a scheme mirroring our TD methods, to choose a random subset of markers for comparison to different centromeres, to incorporate a false positive error rate of 1.8% (i.e., randomly change marker absence to marker presence with a probability of 1.8%), and to count haplotypes from the resulting simulated data. In both cases, to extract the rate $g$ from our estimates of $c$, we calculated the effective population size $N_e$ from the mean genome-wide nucleotide diversity in maize [51], assuming a mutation rate of $3 \times 10^{-8}$ [52]. To calculate conversion rates on a per-marker basis, we assumed the average tract length to be 1 kb and the average CRM2 marker to be 200 bp long.
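The -eN arguments in the ms and MaCS command lines follow from the demographic model once years are converted to the coalescent time unit of 4N0 generations. The sketch below reproduces them under one assumption not stated in the text: maize has one generation per year. It also encodes the relations c = 4Ne·g and π ≈ 4Ne·μ, which are used to turn population-scaled estimates into per-generation rates. The function names are hypothetical, and the inputs in the tests are arbitrary numbers, not values from the study.

```python
N0 = 450_000            # ancestral diploid population size (from the text)
N_BOTTLENECK = 2_450    # bottleneck population size (from the text)
START_YEARS_AGO = 11_000
DURATION_YEARS = 1_000
GENERATIONS_PER_YEAR = 1.0  # assumption: annual plant, one generation per year


def coalescent_time(years_ago, n0=N0, gens_per_year=GENERATIONS_PER_YEAR):
    """ms/MaCS measure time backwards in units of 4*N0 generations."""
    return years_ago * gens_per_year / (4 * n0)


def bottleneck_en_args():
    """(time, relative size) pairs for the two -eN switches, most recent first."""
    end_years_ago = START_YEARS_AGO - DURATION_YEARS
    return [
        # Going back in time, the population shrinks to the bottleneck size here.
        (coalescent_time(end_years_ago), N_BOTTLENECK / N0),
        # Further back, it returns to the ancestral size N0.
        (coalescent_time(START_YEARS_AGO), 1.0),
    ]


def ne_from_diversity(pi, mu):
    """Neutral expectation pi ~ 4*Ne*mu, rearranged for Ne."""
    return pi / (4 * mu)


def rate_from_scaled(scaled_rate, ne):
    """Invert c = 4*Ne*g; the result keeps the length unit of `scaled_rate`."""
    return scaled_rate / (4 * ne)


for t, size in bottleneck_en_args():
    print(f"-eN {t:.5f} {size:.5f}")
```

Running it prints `-eN 0.00556 0.00544` and `-eN 0.00611 1.00000`, which match the values in both command lines. The maxhap grid was expressed per kb, so a rate obtained with `rate_from_scaled` is also per kb unless c is first converted to per-bp units.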
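The marker-processing steps applied to the simulated data can be sketched as follows: subsample markers, add false positives, and count haplotypes. The 1.8% false-positive rate comes from the text. Everything else is a hypothetical illustration, including the function names, the toy marker matrix, and the seeded random generator. It is not the authors' libsequence-based programs, and the coalescent simulation step itself is not shown.

```python
import random

FALSE_POSITIVE_RATE = 0.018  # absence -> presence with probability 1.8% (from the text)


def subsample_markers(haplotypes, k, rng):
    """Keep a random subset of k marker columns, preserving their order."""
    cols = sorted(rng.sample(range(len(haplotypes[0])), k))
    return [tuple(h[i] for i in cols) for h in haplotypes]


def add_false_positives(haplotypes, rate, rng):
    """Turn each absent marker (0) into a present one (1) with probability `rate`.

    Present markers are never turned into absences.
    """
    return [
        tuple(1 if (allele == 0 and rng.random() < rate) else allele for allele in hap)
        for hap in haplotypes
    ]


def count_haplotypes(haplotypes):
    """Number of distinct multilocus presence/absence haplotypes."""
    return len(set(haplotypes))


rng = random.Random(1)
toy = [(1, 0, 0, 1), (1, 0, 0, 1), (0, 1, 0, 1), (0, 1, 1, 0)]  # hypothetical markers
noisy = add_false_positives(subsample_markers(toy, 3, rng), FALSE_POSITIVE_RATE, rng)
print(count_haplotypes(toy), count_haplotypes(noisy))
```

Repeating this over many simulated samples at each gene conversion rate yields the expected haplotype counts and empirical intervals summarized in Figure 6.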

Supporting Information Figure S1 Confirmation of centromere heterozygosity and contamination by FISH. (A) A chromosome spread from IBM85, showing centromere heterozygosity at chromosome 4. Note the differing amount of red (CentC) signal on the circled chromosomes. (B) A gel image showing that IBM47 and IBM85 are heterozygous in centromere 4 flanking regions. These data show the results for the IDP476 marker. Molecular weights of the size standards (in bp) are also indicated. (C) A chromosome spread from a cross between IBM58 and B73, showing a chromosomal feature (a knob, in red) on chromosome 2 that is not present in either B73 or Mo17. CentC (faint) and the knob 180 bp repeat are shown in red, CRM2 LTR and telomeres are shown in green, and chromosomes are shown in blue. Found at: doi:10.1371/journal.pbio.1000327.s001 (1.66 MB TIF)

Figure S2 A complete list of markers from centromere 8 covering the bnlg1834 to IDP334 interval and the genotypes of IBM10, 11, and 12. Map scores for the six flanking gene markers have been previously published [13,45] and were obtained from maizegdb.org. The distances in centromere-flanking regions are shown in IBM cM units, which equate to roughly one fourth the size of a standard cM. The seven Mo17 within-centromere markers and 23 B73 within-centromere markers are distributed randomly and are not meant to convey actual distance or order relative to each other (all 30 markers map genetically to the same location). For each of the IBM genotypes, B73 polymorphisms are represented by the letter B and Mo17 polymorphisms are represented by the letter M. Found at: doi:10.1371/journal.pbio.1000327.s002 (0.55 MB TIF)

Table S1 Heterozygosity, contamination, and gene conversion in IBM lines. ¹ het = heterozygous; / = contaminant centromere; gc = gene conversion. ² IBM3 was removed. Found at: doi:10.1371/journal.pbio.1000327.s003 (0.23 MB DOC)

Acknowledgments We thank Katrien Devos for patient guidance in mapping methodologies, and A. J. Eckert, S. Still, G. Coop, and J. van Heerwaarden for comments on an earlier version of the manuscript.

Author Contributions The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: JS RKD. Performed the experiments: JS SEW. Analyzed the data: JS JMB JRI RKD. Contributed reagents/materials/analysis tools: GGP RKD. Wrote the paper: JS JMB JRI RKD.

References
1. Murphy WJ, Larkin DM, Everts-van der Wind A, Bourque G, Tesler G, et al. (2005) Dynamics of mammalian chromosome evolution inferred from multispecies comparative maps. Science 309: 613–617.
2. O’Neill RJ, Eldridge MD, Metcalfe CJ (2004) Centromere dynamics and chromosome evolution in marsupials. J Hered 95: 375–381.
3. Lee HR, Zhang W, Langdon T, Jin W, Yan H, et al. (2005) Chromatin immunoprecipitation cloning reveals rapid evolutionary patterns of centromeric DNA in Oryza species. Proc Natl Acad Sci U S A 102: 11793–11798.
4. Henikoff S, Ahmad K, Malik HS (2001) The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293: 1098–1102.
5. Smith GP (1976) Evolution of repeated DNA sequences by unequal crossover. Science 191: 528–535.
6. Charlesworth B, Sniegowski P, Stephan W (1994) The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371: 215–220.
7. Bensasson D, Zarowiecki M, Burt A, Koufopanou V (2008) Rapid evolution of yeast centromeres in the absence of drive. Genetics 178: 2161–2167.
8. Fishman L, Saunders A (2008) Centromere-associated female meiotic drive entails male fitness costs in monkeyflowers. Science 322: 1559–1562.
9. Beadle GW (1932) A possible influence of the spindle fibre on crossing-over in drosophila. Proc Natl Acad Sci U S A 18: 160–165.
10. Lambie EJ, Roeder GS (1986) Repression of meiotic crossing over by a centromere (CEN3) in Saccharomyces cerevisiae. Genetics 114: 769–789.
11. Mahtani MM, Willard HF (1998) Physical and genetic mapping of the human X chromosome centromere: repression of recombination. Genome Res 8: 100–110.
12. Copenhaver GP, Nickel K, Kuromori T, Benito M, Kaul S, et al. (1999) Genetic definition and sequence analysis of Arabidopsis centromeres. Science 286: 2468–2474.
13. Fu Y, Wen TJ, Ronin YI, Chen HD, Guo L, et al. (2006) Genetic dissection of intermated recombinant inbred lines using a new genetic map of maize. Genetics 174: 1671–1683.
14. Yan H, Jin W, Nagaki K, Tian S, Ouyang S, et al. (2005) Transcription and histone modifications in the recombination-free region spanning a rice centromere. Plant Cell 17: 3227–3238.
15. Liebman SW, Symington LS, Petes TD (1988) Mitotic recombination within the centromere of a yeast chromosome. Science 241: 1074–1077.
16. Jaco I, Canela A, Vera E, Blasco MA (2008) Centromere mitotic recombination in mammalian cells. J Cell Biol 181: 885–892.
17. Schindelhauer D, Schwarz T (2002) Evidence for a fast, intrachromosomal conversion mechanism from mapping of nucleotide variants within a homogeneous alpha-satellite DNA array. Genome Res 12: 1815–1826.
18. Roizes G (2006) Human centromeric alphoid domains are periodically homogenized so that they vary substantially between homologues. Mechanism and implications for centromere functioning. Nucleic Acids Res 34: 1912–1924.
19. Pertile MD, Graham AN, Choo KH, Kalitsis P (2009) Rapid evolution of mouse Y centromere repeat DNA belies recent sequence stability. Genome Res 19: 2202–2213.
20. Ma J, Bennetzen JL (2006) Recombination, rearrangement, reshuffling, and divergence in a centromeric region of rice. Proc Natl Acad Sci U S A 103: 383–388.
21. Ma J, Jackson SA (2006) Retrotransposon accumulation and satellite amplification mediated by segmental duplication facilitate centromere expansion in rice. Genome Res 16: 251–259.
22. Jiang J, Birchler JA, Parrott WA, Dawe RK (2003) A molecular view of plant centromeres. Trends Plant Sci 8: 570–575.
23. Zhong CX, Marshall JB, Topp C, Mroczek R, Kato A, et al. (2002) Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell 14: 2825–2836.
24. Wolfgruber TK, Sharma A, Schneider KL, Albert PS, Koo DH, et al. (2009) Maize centromere structure and evolution: sequence analysis of centromeres 2 and 5 reveals dynamic Loci shaped primarily by retrotransposons. PLoS Genet 5: e1000743. doi:10.1371/journal.pgen.1000743.
25. SanMiguel P, Gaut B, Tikhonov A, Nakajima Y, Bennetzen J (1998) The paleontology of intergene retrotransposons of maize. Nat Genet 20: 43–45.
26. Nagaki K, Song J, Stupar R, Parokonny AS, Yuan Q, et al. (2003) Molecular and cytological analyses of large tracks of centromeric DNA reveal the structure and evolutionary dynamics of maize centromeres. Genetics 163: 759–770.
27. Devos KM, Ma J, Pontaroli AC, Pratt LH, Bennetzen JL (2005) Analysis and mapping of randomly chosen bacterial artificial chromosome clones from hexaploid bread wheat. Proc Natl Acad Sci U S A 102: 19243–19248.
28. Luce AC, Sharma A, Mollere OS, Wolfgruber TK, Nagaki K, et al. (2006) Precise centromere mapping using a combination of repeat junction markers and chromatin immunoprecipitation-polymerase chain reaction. Genetics 174: 1057–1061.
29. Casa AM, Brouwer C, Nagel A, Wang L, Zhang Q, et al. (2000) Inaugural article: the MITE family heartbreaker (Hbr): molecular markers in maize. Proc Natl Acad Sci U S A 97: 10083–10089.
30. Sharma A, Presting GG (2008) Centromeric retrotransposon lineages predate the maize/rice divergence and differ in abundance and activity. Mol Genet Genomics 279: 133–147.
31. Lee M, Sharopova N, Beavis WD, Grant D, Katt M, et al. (2002) Expanding the genetic map of maize with the intermated B73×Mo17 (IBM) population. Plant Mol Biol 48: 453–461.
32. Kato A, Lamb JC, Birchler JA (2004) Chromosome painting using repetitive DNA sequences as probes for somatic chromosome identification in maize. Proc Natl Acad Sci U S A 101: 13554–13559.
33. Jin W, Melo JR, Nagaki K, Talbert PB, Henikoff S, et al. (2004) Maize centromeres: organization and functional adaptation in the genetic background of oat. Plant Cell 16: 571–581.
34. Liu K, Goodman M, Muse S, Smith JS, Buckler E, et al. (2003) Genetic structure and diversity among maize inbred lines as inferred from DNA microsatellites. Genetics 165: 2117–2128.
35. Kelly JK (1997) A test of neutrality based on interlocus associations. Genetics 146: 1197–1206.
36. Hudson RR, Kaplan NL (1985) Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111: 147–164.
37. Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, et al. (2001) Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc Natl Acad Sci U S A 98: 11479–11484.
38. Jeffreys AJ, May CA (2004) Intense and highly localized gene conversion activity in human meiotic crossover hot spots. Nat Genet 36: 151–156.
39. Hudson RR (2001) Two-locus sampling distributions and their application. Genetics 159: 1805–1817.
40. Yandeau-Nelson MD, Zhou Q, Yao H, Xu X, Nikolau BJ, et al. (2005) MuDR transposase increases the frequency of meiotic crossovers in the vicinity of a Mu insertion in the maize a1 gene. Genetics 169: 917–929.
41. Yu JM, Holland JB, McMullen MD, Buckler ES (2008) Genetic design and statistical power of nested association mapping in maize. Genetics 178: 539–551.
42. Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemistry Bulletin 19: 11–15.
43. Lander ES, Green P, Abrahamson J, Barlow A, Daly MJ, et al. (1987) MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1: 174–181.
44. Topp CN, Zhong CX, Dawe RK (2004) Centromere-encoded RNAs are integral components of the maize kinetochore. Proc Natl Acad Sci U S A 101: 15986–15991.
45. Sharopova N, McMullen MD, Schultz L, Schroeder S, Sanchez-Villeda H, et al. (2002) Development and mapping of SSR markers for maize. Plant Mol Biol 48: 463–481.
46. Thornton K (2003) Libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19: 2325–2327.
47. Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, et al. (2005) The effects of artificial selection on the maize genome. Science 308: 1310–1314.
48. Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337–338.
49. Hill WG, Weir BS (1988) Variances and covariances of squared linkage disequilibria in finite populations. Theor Popul Biol 33: 54–78.
50. Chen GK, Marjoram P, Wall JD (2009) Fast and flexible simulation of DNA sequence data. Genome Res 19: 136–142.
51. Gore MA, Chia JM, Elshire RJ, Sun Q, Ersoz ES, et al. (2009) A first-generation haplotype map of maize. Science 326: 1115–1117.
52. Clark RM, Tavare S, Doebley J (2005) Estimating a nucleotide substitution rate for maize from polymorphism at a major domestication locus. Mol Biol Evol 22: 2304–2312.

Expression in Aneuploid Drosophila S2 Cells Yu Zhang1, John H. Malone1, Sara K. Powell2, Vipul Periwal3, Eric Spana4, David M. MacAlpine2, Brian Oliver1* 1 Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America, 2 Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, North Carolina, United States of America, 3 Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America, 4 Department of Biology, Duke University, Durham, North Carolina, United States of America

Abstract Extensive departures from balanced gene dose in aneuploids are highly deleterious. However, we know very little about the relationship between gene copy number and expression in aneuploid cells. We determined copy number and transcript abundance (expression) genome-wide in Drosophila S2 cells by DNA-Seq and RNA-Seq. We found that S2 cells are aneuploid for >43 Mb of the genome, primarily in the range of one to five copies, and show a male genotype (∼two X chromosomes and four sets of autosomes, or 2X;4A). Both X chromosomes and autosomes showed expression dosage compensation. X chromosome expression was elevated in a fixed-fold manner regardless of actual gene dose. In engineering terms, the system “anticipates” the perturbation caused by X dose, rather than responding to an error caused by the perturbation. This feed-forward regulation resulted in precise dosage compensation only when X dose was half of the autosome dose. Insufficient compensation occurred at lower X chromosome dose and excessive expression occurred at higher doses. RNAi knockdown of the Male Specific Lethal complex abolished feed-forward regulation. Both autosome and X chromosome genes show Male Specific Lethal–independent compensation that fits a first order dose-response curve. Our data indicate that expression dosage compensation dampens the effect of altered DNA copy number genome-wide. For the X chromosome, compensation includes fixed and dose-dependent components.

Citation: Zhang Y, Malone JH, Powell SK, Periwal V, Spana E, et al. (2010) Expression in Aneuploid Drosophila S2 Cells. PLoS Biol 8(2): e1000320. doi:10.1371/journal.pbio.1000320 Academic Editor: Peter B. Becker, Adolf Butenandt Institute, Germany Received September 23, 2009; Accepted January 20, 2010; Published February 23, 2010 This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. Funding: This work was supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Intramural Research Program, National Human Genome Research Institute (NHGRI) extramural grant HG004279, and a Whitehead Foundation Scholar Award. The funders had no role in study design, data collection and analysis, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. Abbreviations: CGH, comparative genome hybridization; ChIP, chromatin immunoprecipitation; CPA, Bayesian change point analysis; dsRNA, double stranded RNA; MSL, male specific lethal; RPKM, reads per kb per million reads; RNAi, RNA interference * E-mail: oliver@helix.nih.gov

Author Summary While it is widely recognized that mutations in protein coding genes can have harmful consequences, one can also have too much or too little of a good thing. Except for the sex chromosomes, genes come in sets of two in diploid organisms. Extra or missing copies of genes or chromosomes result in an imbalance that can lead to cancers, miscarriages, and disease susceptibility. We have examined what happens to gene expression in Drosophila cells with the types of gross copy number changes that are typical of cancers. We have compared the response of autosomes and sex chromosomes and show that there is some compensation for copy number change in both cases. One response is universal and acts to correct copy number changes by changing transcript abundance. The other is specific to the X chromosome and acts to increase expression regardless of gene dose. Our data highlight how important gene expression balance is for cell function.

PLoS Biology | www.plosbiology.org

February 2010 | Volume 8 | Issue 2 | e1000320

Expression in Aneuploid Cells

Introduction The somatic cells of multicellular animals are almost exclusively diploid, with haploidy restricted to post-meiotic germ cells. Having two copies of every gene has an obvious advantage. Mutations arise de novo within cells of an organism and within organisms in populations, such that haploid genomes free of deleterious mutations are extremely rare. Wild type alleles tend to be dominant to recessive loss-of-function alleles, providing a degree of redundancy that allows diploid organisms to survive even with a substantial genetic load of deleterious mutations in each haplotype. While the dose of most individual genes is of little consequence to the organism, larger scale genomic imbalance, or aneuploidy, is detrimental [1–4]. Chromosomal aneuploidy occurs when whole chromosomes are lost or duplicated, and segmental aneuploidy results from deletions, duplications, and unbalanced translocations. In Drosophila, a systematic genome-wide segmental aneuploidy study [5] demonstrated that of all genes (now known to be about 15,000 [6]), only about 50 are haploinsufficient and just one gene is triplo-lethal. However, these same experiments showed that large deletions and duplications result in reduced viability and fertility that depends on the extent of aneuploidy, and not on any particular region or gene [5]. This indicates that the detrimental effect of aneuploidy is a collective function of multiple small effects, not a function of particular genes. Interestingly, while aneuploidy results in inviability at the organism level, aneuploid cells can out-compete diploid cells for growth in vivo or in vitro. Human cancer cells are a good example of proliferating cells characterized by aneuploidy [7]. Most tumors are nearly diploid or tetraploid with extra or lost chromosomes. Even tumors with a normal number of chromosomes contain other rearrangements that result in segmental aneuploidy. It is likely that aneuploidy results in a systems or gene interaction defect. Given that a deleterious effect of aneuploidy is likely to occur at the level of genome balance, understanding the response to aneuploidy requires the exploration of general control mechanisms that operate at the network level.

We have turned to widely used Drosophila S2 tissue culture cells as an aneuploid model [8,9]. These cells are generally tetraploid [9], and studies of gene expression and X chromosome dosage compensation indicate that they are male [10]. As a natural consequence of chromosomal sex determination in Drosophila, females have two X chromosomes and two sets of autosomes (2X;2A), and males have a single X chromosome with two sets of autosomes (1X;2A) [11]. Therefore, male cells can be thought of as naturally occurring chromosomal aneuploids. The response to altered gene dose probably occurs at multiple levels, but transcription is an early step in the flow of information from the genome and is a likely site for control. For example, X chromosome dosage compensation clearly occurs at the transcriptional level [12] and is exquisitely precise [13]. The Male Specific Lethal (MSL) complex regulates the balanced expression of X chromosomes in wild type 1X;2A male flies. MSL is composed of at least four major proteins (Msl1, Msl2, Msl3, and Mof) and two non-coding RNAs (RoX1 and RoX2) [11]. Mof is an acetyltransferase responsible for acetylating H4K16 [11,14,15]. Mof is highly enriched on the male X chromosome as a component of the MSL complex. However, Mof also associates with a more limited repertoire of autosomal genes independently of MSL [16]. H4K16ac is associated with increased transcription in many systems [17]. Therefore, it is widely believed that this acetylation results in increased expression of the X chromosome [11], although an alternative hypothesis suggests that MSL sequesters Mof from the autosomes to drive down autosome expression [18]. Determining which of these mechanisms occurs is complicated by the very nature of sampling experiments when much of the transcriptome is altered: the number of X chromosome transcripts sampled from the transcriptome depends on the relative abundance of the X chromosome and autosome transcripts. The salient feature of both models is balanced X chromosome and autosome expression. While the term dosage compensation is used to describe X chromosome expression, dosage compensation is not restricted to X chromosomes in Drosophila. Autosomes also show significant, but much less precise, dosage compensation at the expression level [13,19–21], suggesting that there is a general dose response genome-wide.

Despite the clear role of MSL in X chromosome dosage compensation, the control system rules for MSL function and the contribution of global compensation mechanisms to the specific case of the X chromosome are poorly understood. There are three basic transcript control mechanisms that could modify the effect of gene dose: buffering, feedback, and feed-forward [22]. Here we define buffering as the passive absorption of gene dose perturbations by inherent system properties. For example, if transcription obeys mass-action kinetics and the gene/transcription complex is considered an enzyme [23], then one would not expect a one-to-one relationship between mRNA and gene copy because of the small effect of a change in enzyme concentration at steady-state [24]. In addition to the enzymatic properties of transcription, more than a generation of molecular biologists has elegantly described extensive transcriptional regulation networks controlling key phenotypes [25]. These regulatory motifs are sensitive to changes in gene dose [26]. Feedback is an outstanding error-controlled regulator that detects deviations from the norm and implements corrective action. Feed-forward regulation differs in that it anticipates the possible effect of perturbations on the system rather than correcting the perturbation after the deviation occurs. This could operate if cells detect copy number and correct transcription levels before a quantitative error in transcript abundance is evident. In male embryos, the sex determination hierarchy detects X chromosome number and leads to association of the MSL complex with the X chromosome before zygotic transcription is activated [27], as expected for a feed-forward regulator. However, MSL is selectively bound to transcribed genes [28], which is also consistent with feedback regulation. By examining the response of X chromosome genes to dose in the presence and absence of MSL, we show that X chromosome dosage compensation results from a combination of MSL-dependent feed-forward regulation based on anticipated effects from unbalanced gene dose and a more general and dynamic response to perceived gene dose. The latter could be due to negative feedback, buffering, or both.

Results Segmental Aneuploidy in S2 Cells To determine the extent of aneuploidy in S2 cells, we performed next generation sequencing (DNA-Seq) and comparative genome hybridization (CGH). These data confirmed the predicted male genotype of S2 cells, as the average sequence depth of the X chromosome (reads per kb per million reads, RPKM) was 54% of the autosome RPKM (Figures 1 and 2A). We also found that S2 cells exhibit numerous large regions of segmental aneuploidy (Figure 1, Figure S1, Table S1). Stepwise deviations from expected dose covered ∼42% (∼40.0 Mb) of the autosomes and ∼17% (∼3.8 Mb) of the X chromosome (Figure S1). The vast majority of the aneuploid segments showed an extra or lost copy. There was high congruence between DNA-Seq and CGH methods. For example, we determined that >93% of calls for copy numbers between one and five made by DNA-Seq analysis were confirmed by CGH, even when comparing different lots of cells grown under slightly different conditions (Figure S2, Table S2). These data suggest that S2 cells are highly aneuploid but show a reasonably stable genotype. There was much more variability when copy number was greater than five (30% agreement between methods and cultures). This could be due to failure to call short segmental duplications or to repeat expansion/retraction in different cultures. Regardless of cause, we decided to focus our subsequent expression analyses on the high-confidence one to five copy genes (Table S3).
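As a rough illustration of how read density relates to copy number, the sketch below computes RPKM for a fixed-size window. It then scales that density against a reference density assumed to represent four copies, the autosomal baseline in tetraploid S2 cells. This is a deliberate simplification: the copy-number calls in the text came from Bayesian change point analysis and CGH, not from this rounding rule. The function names and example counts are hypothetical.

```python
import math


def rpkm(reads_in_window: int, window_bp: int, total_mapped_reads: int) -> float:
    """Reads per kb of window per million mapped reads."""
    return reads_in_window / ((window_bp / 1_000) * (total_mapped_reads / 1_000_000))


def naive_copy_number(window_density: float, reference_density: float,
                      reference_copies: int = 4) -> int:
    """Scale a density to a reference of known copy number and round to the
    nearest integer (round-half-up, avoiding Python's banker's rounding)."""
    estimate = reference_copies * window_density / reference_density
    return math.floor(estimate + 0.5)


# A 1 kb window with 50 reads in a library of 10 million mapped reads.
print(rpkm(50, 1_000, 10_000_000))    # 5.0
# X chromosome density at 54% of the autosomal reference (4 copies).
print(naive_copy_number(0.54, 1.0))   # 2
```

The second call mirrors the observation that X chromosome depth was 54% of the autosomal depth. Against a four-copy baseline, that rounds to two copies, consistent with 2X;4A.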

Genome-Wide Compensation We observed striking differences in DNA-Seq read density among chromosome arms due to segmental aneuploidy (Figure 2A, p < 10^-15, KS test). To determine if these DNA differences are also associated with similar changes at the transcript level, we profiled transcript expression by next generation sequencing (RNA-Seq). We validated RNA-Seq data by microarray profiling and found outstanding agreement (rs = 0.87, p = 0). Expression analysis revealed striking dosage compensation. Even though copy number values significantly differed at the chromosome level (Figure 2A), we found that expression from autosome arms and the X chromosome was similar inter se (Figure 2B). In no case was the expression of a chromosome arm significantly different from all other arms (p > 10^-2, KS test), indicating that dosage compensation occurs genome-wide, not just on the X chromosome. To examine the precision of dosage compensation, we determined the relationship between expression and copy number. Compensation was not perfect, as expression increased with copy number (Figure 2C, p < 10^-4, KS test). This imperfect compensation resulted in a sublinear relationship between copy number and gene expression, such that per copy expression values decreased with increased copy number on the autosomes and especially on the X chromosome (Figure 2D). This inverse relationship between copy number and expression per copy indicates that partial dosage compensation occurs genome-wide.

Figure 1. S2 cell DNA copy number. (A–D) DNA density and copy number profiles of the X chromosome (A, B) and chromosome 2L (C, D), showing copy number of aneuploid segments along chromosome length. The RPKM DNA-Seq density in nonoverlapping 1 kb windows was plotted against the chromosome coordinates and the final deduced copy number is indicated (color key). The copy number was determined by Bayesian change point analysis (CPA) (A, C) and CGH (B, D). The CGH results are projected onto the DNA-Seq data. The average DNA densities of each aneuploid segment between predicted breakpoints (black lines) are shown. doi:10.1371/journal.pbio.1000320.g001

Figure 2. Expression at varying copy numbers. (A, B) Boxplots showing the distribution of DNA-Seq read densities (in non-overlapping 1 kb windows) mapped to chromosome arms in S2 cells (A) and the distribution of RNA-Seq expression values at the gene-level (B). Pie charts (A, B) show the distributions of copy numbers on each chromosome arm (for expressed genes only). See Figure 1 for copy number color key. The X chromosome is in red. (C, D) Boxplots showing the distribution of RNA-Seq expression values by copy number (C) and expression per copy (D). Equivalent expression medians for two copies on the X and four copies on the autosomes are indicated (dashed line). For all boxplots, the 25th to 75th percentiles (boxes), medians (lines in boxes), and ranges (whiskers, 1.5 times the interquartile range extended from both ends of the box) are shown. Asterisks indicate significant differences from all other chromosome arms (A, B) or from the 2X or 4A baseline (C). doi:10.1371/journal.pbio.1000320.g002
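The sublinear relationship between copy number and expression can be summarized by the slope of log expression against log copy number. A slope of 1 means no compensation, 0 means complete compensation, and values in between mean partial compensation. The sketch below fits that slope by ordinary least squares and also computes expression per copy. The power-law form and the toy values are illustrative assumptions, not the model or data used in the study.

```python
import math


def log_log_slope(copies, expression):
    """Least-squares slope of log(expression) on log(copy number)."""
    xs = [math.log(c) for c in copies]
    ys = [math.log(e) for e in expression]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def per_copy(copies, expression):
    """Expression divided by copy number, as in a per-copy plot."""
    return [e / c for c, e in zip(copies, expression)]


# Toy values: expression grows as copies**0.5 (hypothetical partial compensation).
copies = [1, 2, 3, 4, 5]
expression = [10 * c ** 0.5 for c in copies]
print(round(log_log_slope(copies, expression), 3))  # 0.5
print(per_copy(copies, expression))  # decreases as copy number rises
```

Any slope below 1 makes per-copy expression fall as copy number rises, which is the qualitative pattern described for Figure 2D.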

The X Chromosome X chromosome dosage compensation was of particular interest. In wild type males, X chromosome dose (1X) is 50% of autosomal dose (2A). In S2 cells this relationship occurred at 2X;4A due to tetraploidy. The precision of X chromosome dosage compensation in S2 cells was revealed by the indistinguishable expression of two copy X chromosome genes and four copy autosome genes (Figure 2C, p = 0.15, KS test). Thus X chromosome dosage compensation shows similar efficacy in diploid 1X;2A flies and in aneuploid 2X;4A tissue culture cells. 4

February 2010 | Volume 8 | Issue 2 | e1000320

Expression in Aneuploid Cells

The aneuploid S2 cells also allowed us to examine the effect of X chromosome dosage compensation when the X chromosome dose was greater or less than 50% of the autosomal dose. Precise X chromosome dosage compensation did not occur at these other gene doses (Figure 2C, p < 10^-9, KS test). For example, when we compared expression from three-copy genes on the X chromosome and autosomes, X chromosome gene expression per copy was higher despite identical copy number (Figure 2D). Thus, we suggest that X chromosome dosage compensation is error generating when the underlying X chromosome gene dose is equivalent to the autosomal gene dose. Similarly, we found under-compensated X chromosome expression when there was a single copy of an X chromosome segment. These data indicate that X chromosome expression is determined by the anticipated X chromosome copy number, which implements the sex and dosage compensation pathway, and not by the actual X chromosome dose. This error generation following perturbation is a property of feed-forward regulation [22].

MSL Complex

To evaluate the effect of the MSL complex on appropriate and error-generating X chromosome dosage compensation in S2 cells, we performed RNA interference (RNAi) experiments to knock down expression of two genes encoding key MSL components, msl2 and mof. If MSL operates via feedback regulation, then knockdown should alter expression differently depending on dose, whereas if MSL is a feed-forward regulator, the effect of MSL on expression should be X chromosome specific but dose independent. We selected double-stranded RNAs (dsRNAs) targeting msl2 and mof that resulted in greater than 90% knockdown at the mRNA (not shown) and protein levels (Figure 3A). MSL is a chromatin-modifying machine, so we also determined whether RNAi altered X chromatin. The X chromosome showed high levels of acetylation at expressed genes (Figure 3B and 3C), and both msl2 and mof RNAi resulted in markedly reduced H4K16ac levels on the X chromosome, as determined by chromatin immunoprecipitation on microarray (ChIP-chip; Figure 3B, 3D, and 3E). RNAi against mof also resulted in decreased autosomal H4K16ac (Figure 3B and 3E). All these data suggest that the RNAi treatments were effective. We then measured the effect of msl2 and mof RNAi on expression by RNA-Seq. As in the previous experiments, we validated expression by microarray expression profiling and found outstanding agreement (rs = 0.87–0.89, p = 0, Figure S3). We observed decreased expression of X chromosome genes following either RNAi treatment (Figure 4, p < 10^-2, KS test), consistent with the role of MSL in promoting expression of X chromosome genes relative to autosomes. For example, in mof RNAi cells we observed a median expression of 26.4 RPKM for autosomal genes present at four copies and only 18.6 RPKM for X chromosome genes present at two copies (p < 10^-15, KS test). The msl2 and mof RNAi treatments thus broke the precise equilibration of 2X with 4A expression.

We observed 1.35-fold greater X chromosome expression attributable to wild-type Msl2 or Mof (average RNAi/mock expression ratio = 0.74, p < 10^-15, KS test), with little to no effect on autosomal expression (Figure 5A and 5B). If MSL acts as a strict feed-forward regulator, then MSL should have the same fold effect on all populations of X chromosome genes, irrespective of their actual copy number. Indeed, we observed a similar fold effect on the expression of X chromosome genes with different copy numbers (Figure 5C and 5D; 0.58 < p < 0.89 in msl2 RNAi, 0.21 < p < 0.91 in mof RNAi, KS test). These data clearly indicate that MSL acts on expression according to whether a gene lies on the X chromosome, rather than by monitoring actual copy number.

Drosophila X chromosomes are dosage compensated over the full range of gene expression values. Given that MSL is bound selectively to expressed genes, we also asked whether there is a relationship between expression levels and dosage compensation. We found that the RNAi treatments had the same effect on X chromosome gene expression regardless of expression level (Figure 5E and 5F). Interestingly, these experiments also showed only a modest effect of mof RNAi on autosomal expression, suggesting that the proposed autosomal function of Mof [16] is subtle. The effect of Mof on autosomes was expression-level dependent, as we observed a greater fold effect at low expression levels. However, the most overt effect of wild-type Msl2 or Mof was a 1.35-fold increase in X chromosome expression at all expression values. These data indicate that MSL acts as a feed-forward multiplier causing a fixed-fold effect on X chromosome expression regardless of gene copy number and basal gene expression value.

Genome-Wide Sublinear Expression Response to Gene Dose

X chromosome dosage compensation is 2-fold, but we observed only a 1.35-fold effect of MSL. If MSL were the only contributor to X chromosome dosage compensation and knockdown were complete, we would expect X chromosome and autosomal genes with the same copy number to show the same expression levels following msl2 or mof RNAi. However, following either msl2 or mof RNAi, three-copy genes on the X chromosome were still 1.19-fold over-expressed relative to three-copy genes on autosomes (Figure 6A, p < 0.01, KS test). This difference between expected and observed expression could be due to residual MSL activity alone, or to a combination of residual MSL activity and an MSL-independent component of X chromosome dosage compensation. The MSL-independent compensation could be the same as that observed on the autosomes. If the fixed-fold properties of MSL also apply to residual activity, then any over-expression of X chromosome genes following RNAi that is due to residual MSL should also be a fixed-fold effect. We observed significantly increased variance in the expression ratios between the X chromosome and autosomes following RNAi (p < 10^-2, F test, Figure 6B). This supports the idea that much of the unexplained X chromosome dosage compensation is not due to a fixed-fold effect on expression. It is possible that there are MSL-dose-dependent effects on X chromosome expression due to variable affinity, although the fixed-fold effect of MSL knockdown on the population of genes makes this less likely. These data suggest that there is an MSL-independent component of X chromosome dosage compensation. To determine whether the MSL-independent component is the same dosage compensation system that operates on autosomes, we characterized the sublinear expression response to gene dose for the X chromosome and autosomes with or without RNAi treatment.
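The variance argument turns on spread rather than location: a residual fixed-fold MSL effect would shift X:autosome expression ratios uniformly, whereas an additional dose-responsive component would make them more variable. The F test compares two sample variances; the sketch below computes the statistic and its degrees of freedom for two invented sets of ratios. Turning F into a p-value requires the F distribution (for example SciPy's `scipy.stats.f.sf`), which this standard-library sketch does not implement.

```python
from __future__ import annotations

from statistics import variance


def f_statistic(test_ratios: list[float], ref_ratios: list[float]) -> tuple[float, int, int]:
    """Variance-ratio F statistic and its (numerator, denominator) degrees of freedom.

    statistics.variance uses the n - 1 denominator. For a one-sided test of
    increased spread, put the group expected to be more variable in the numerator.
    """
    f = variance(test_ratios) / variance(ref_ratios)
    return f, len(test_ratios) - 1, len(ref_ratios) - 1


# Invented X:autosome expression ratios: a pure fixed-fold shift keeps them tight,
# an additional dose-responsive component spreads them out.
fixed_fold = [0.74, 0.75, 0.73, 0.74]
dose_responsive = [0.60, 0.90, 0.70, 1.00]
print(f_statistic(dose_responsive, fixed_fold))
```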
There were three distinct trend lines for the relationship between copy number and expression: one for the autosomes and one each for the X chromosome with and without RNAi treatment (Figure 6A). There are infinitely many possible sublinear curves. If the nature of the dose response on the X chromosome differed from that on the autosomes, or depended on the presence or absence of MSL, then scaling should not produce a common fit. However, if the three dose-response curves are the result of a common dosage compensation mechanism, then they should scale to yield a single curve that fits all three of the absolute dose-response curves.


Figure 3. msl2 and mof RNAi. (A) Western analysis showing changes in MSL protein abundance following RNAi for msl2 and mof in S2 cells. (B) K-means clustering (k = 3) of H4K16ac ChIP/input ratios for expressed genes on the X chromosome and chromosome 3R in RNAi and mock treated S2 cells. Genes enriched (yellow) and depleted (blue) for H4K16ac are indicated. (C) Boxplots showing the distribution of H4K16ac ChIP/input ratios in mock treated cells for expressed genes on different chromosome arms. (D–E) Boxplots showing the distribution of H4K16ac ChIP ratios between msl2 RNAi cells (D) or mof RNAi cells (E) and mock treated cells for expressed genes on different chromosome arms. Significant differences (p < 10^-2) among chromosome arms (C) and between RNAi and mock treated cells (D, E) are indicated by asterisks. doi:10.1371/journal.pbio.1000320.g003
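Panel B groups genes by their H4K16ac ChIP/input ratios with k-means (k = 3, Euclidean distance); the Methods state that Cluster 3.0 was used, with each gene described by its ratios across RNAi and mock samples. As a sketch of what that clustering does, the standard-library Python below runs Lloyd's algorithm from fixed starting centroids. The deterministic initialization and the example ratios are assumptions for illustration; Cluster 3.0's own initialization and run settings may differ.

```python
from __future__ import annotations

from math import dist


def kmeans(points: list[tuple[float, ...]], initial: list[tuple[float, ...]],
           max_iter: int = 100) -> tuple[list[int], list[tuple[float, ...]]]:
    """Lloyd's k-means with Euclidean distance; returns (labels, centroids).

    Deterministic: starts from the given centroids (an assumption for this sketch);
    a cluster that loses all members keeps its previous centroid.
    """
    centroids = [tuple(c) for c in initial]
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each gene goes to its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Invented (mock, RNAi) H4K16ac ChIP/input ratios for eight genes.
genes = [(2.0, 1.1), (2.2, 1.0), (1.9, 1.2), (1.0, 1.0),
         (1.1, 0.9), (0.9, 1.0), (0.4, 0.5), (0.5, 0.4)]
labels, centers = kmeans(genes, initial=[genes[0], genes[3], genes[6]])
print(labels)  # prints [0, 0, 0, 1, 1, 1, 2, 2]
```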

Figure 4. Expression following msl2 or mof RNAi. Boxplots showing the distribution of expression RPKM values at indicated copy number on the X chromosome (left) and autosomes (right) in RNAi and mock treated S2 cells. Equivalent expression of two-copy X chromosome genes and four-copy autosomal genes in mock treated cells is shown (dashed line). See Figure 2 for boxplot format. Asterisks indicate significant expression decrease in RNAi cells compared to mock treated cells. doi:10.1371/journal.pbio.1000320.g004

We scaled the data so that, for both copy number and expression, the medians at 2X and 4A were set to 1.0 (Figure 6C). We found that the X chromosome and autosomes show remarkably similar fold changes in expression relative to fold changes in copy number. Additionally, after scaling, the relationship between X chromosome expression and copy number is independent of MSL. These data suggest that, like the autosomes, the X chromosome is subject to dosage compensation based on actual gene dose. The gene dose to expression response fits a one-parameter model, y = x(EC50 + 1)/(EC50 + x), where y is transcript abundance, x is DNA copy number expressed as a ratio relative to wild type, and EC50 is the copy number required for half-maximal expression (r² > 0.99). This indicates that gene expression is a saturating function of gene dose regardless of chromosome location or the presence of MSL.
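The scaling step and the one-parameter model can be sketched together. Below, each curve's copy numbers and median expression values are divided by their values at the reference dose (2 copies for X, 4 for autosomes), and EC50 is fitted by least squares with a golden-section search. The search bounds, helper names, and example medians are assumptions for illustration: the example data are generated from the model itself (EC50 = 2) rather than taken from Figure 6, and the authors' fitting procedure is not described in the text.

```python
from __future__ import annotations

import math


def dose_response(x: float, ec50: float) -> float:
    """y = x (EC50 + 1) / (EC50 + x): equals 1 at x = 1, saturates at EC50 + 1,
    and reaches half of that maximum at x = EC50."""
    return x * (ec50 + 1) / (ec50 + x)


def scale(copies: list[float], medians: list[float], reference_copy: float):
    """Divide copy numbers and medians by their values at the reference dose (2X or 4A)."""
    ref = medians[copies.index(reference_copy)]
    return [c / reference_copy for c in copies], [m / ref for m in medians]


def fit_ec50(xs: list[float], ys: list[float], lo: float = 0.01, hi: float = 100.0,
             tol: float = 1e-9) -> float:
    """Least-squares EC50 by golden-section search; assumes one minimum in [lo, hi]."""
    def sse(ec50: float) -> float:
        return sum((dose_response(x, ec50) - y) ** 2 for x, y in zip(xs, ys))

    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if sse(c) < sse(d):   # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                 # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2


# Invented autosomal medians (RPKM) at 2-6 copies, generated from the model with EC50 = 2.
copies = [2, 3, 4, 5, 6]
medians = [26.0 * dose_response(c / 4, 2.0) for c in copies]
xs, ys = scale(copies, medians, 4)
print(round(fit_ec50(xs, ys), 3))  # prints 2.0
```

Because scaling divides out each curve's absolute level, X and autosomal curves with different RPKM magnitudes can be pooled into one fit, which is the sense in which a single scaled curve can describe all three absolute dose-response curves.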

Discussion

Our data indicate that the MSL complex and general compensation mechanisms independently contribute to male X chromosome dosage compensation. The MSL complex recognizes active X chromosome genes [28–31]. We have shown that MSL then acts as a simple unidirectional multiplier of expression regardless of the actual gene dose and gene expression level. In contrast, buffering and feedback are dose sensitive and absorb the expression perturbations caused by unbalanced dose. We suggest that all these mechanisms are critical for proper X chromosome dosage compensation.

Some rough accounting illustrates the composite nature of X chromosome dosage compensation. In the Drosophila genus, dosage compensation results in a 2.0- to 2.2-fold increase in X chromosome expression in males relative to autosomes [13,32]. Similarly, in S2 cells we observed a 2.08-fold increase in X chromosome expression. The fixed-fold effect of MSL resulted in at least a 1.35-fold increase in X chromosome expression. Dose-responsive compensation also acted to increase X chromosome expression and was independent of MSL function. We can estimate the contribution of dose-responsive compensation from work performed on whole flies and on S2 cells. Autosomal dosage compensation increases per-copy expression by 1.4- to 1.6-fold in diploid flies with a single copy of tens of genes [13,19]. In agreement with those reported values, we can project that a 2-fold change in scaled DNA dose in S2 cells results in about a 1.5-fold increase in scaled gene expression. Thus, at face value, the layered effect of dose-responsive compensation and feed-forward dosage compensation may explain all of the final increase in S2 cell X chromosome expression (1.50-fold × 1.35-fold = 2.03-fold).

While most work on dosage compensation focuses on the X chromosome [2,11], other organisms also show dosage compensation on autosomes [33]. For example, mammalian trisomies show only about a 1.3-fold increase in gene expression as a result of a 1.5-fold change in gene dose [34,35]. Compensation is likely to be a universal property of biological systems that enables cells to avoid deleterious effects of genetic load and other perturbations.
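The accounting above can be cross-checked against the one-parameter dose-response model from the Results. Assuming (an interpretation, not a statement in the text) that the projected 1.5-fold response to a 2-fold scaled dose is read off that model, the projection fixes EC50 in scaled units, and the layered estimate is the product of the two factors:

$$
y(x) = \frac{x\,(\mathrm{EC}_{50}+1)}{\mathrm{EC}_{50}+x},\qquad
y(2) = \frac{2\,(\mathrm{EC}_{50}+1)}{\mathrm{EC}_{50}+2} = 1.5
\;\Longrightarrow\; 2\,\mathrm{EC}_{50}+2 = 1.5\,\mathrm{EC}_{50}+3
\;\Longrightarrow\; \mathrm{EC}_{50} = 2
$$

$$
\underbrace{1.50}_{\text{dose-responsive}} \times \underbrace{1.35}_{\text{MSL, fixed-fold}} = 2.025 \approx 2.03,
\qquad \text{compared with the observed } 2.08\text{-fold}
$$

Under the same reading, per-copy expression at twice the scaled dose would be y(2)/2 = 0.75 of the reference level.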

Materials and Methods

Cell Strains and Media

Drosophila S2 cells [9] (a.k.a. SL2) were obtained from the Drosophila RNAi Screening Center (DRSC, Harvard Medical School, Boston, MA) and were grown at 25°C in Schneider's Drosophila Medium (Invitrogen, Carlsbad, CA) supplemented with 10% fetal bovine serum (SAFC Biosciences, Lenexa, KS) and penicillin-streptomycin (Invitrogen, Carlsbad, CA). These cells were used for all experiments except CGH, for which S2-DRSC cells were obtained from the Drosophila Genomics Resource Center (#181, Bloomington, IN).

Figure 5. Mof and Msl2 effects on expression. (A, B) Boxplots showing the distribution of expression ratios between msl2 RNAi cells (A) or mof RNAi cells (B) and mock treated cells by chromosome arm. The expected fold decrease in X chromosome expression after RNAi treatment is indicated (red dashed line). (C, D) Boxplots showing the expression ratios for msl2 (C) and mof (D) RNAi treated cells at indicated gene copy numbers. The X chromosome (left) and autosomes (right) are shown separately. (E, F) The relation between gene expression and fold expression change in msl2 (E) and mof (F) RNAi treated cells plotted as a moving average (20 genes/window). doi:10.1371/journal.pbio.1000320.g005

Figure 6. Characterization of dose-response curves. (A, C) Median expression RPKM values plotted against DNA copy number for X chromosome and autosomal genes in RNAi and mock treated S2 cells based on absolute (A) or scaled (C) data. Fitted trend lines for the X chromosome (red) and autosomes (black) following mock (solid), msl2 (dashed), and mof (dotted) RNAi treatment are indicated. (B) Boxplots and table showing the distribution of expression ratios among different copy numbers. Expression fold change values were calculated based on real median RPKM values (bold) or projected expression values. Asterisks indicate significant variation for the expression fold change between X chromosome and autosomal genes at an equivalent dose in RNAi cells (p < 10^-2). doi:10.1371/journal.pbio.1000320.g006

Sequencing

We extracted S2 cell genomic DNA using a genomic DNA kit (Qiagen, Valencia, CA). Approximately 2 µg of purified genomic DNA was randomly fragmented to less than 1,000 bp by 30 min sonication at 4°C, with cycles of 30 s pulses at 30 s intervals, using the Bioruptor UCD 200 and a refrigerated circulation bath RTE-7 (Diagenode, Sparta, NJ). Sonicated chromatin (see ChIP protocol) was purified by phenol/chloroform extraction. We extracted S2 cell total RNA with Trizol (Invitrogen, Carlsbad, CA) and isolated mRNA using Oligotex poly(A) (Qiagen, Valencia, CA). The number of cells used for each extraction was counted using a haemocytometer. The quality of mRNA was examined by RNA 6000 Nano chip on a Bioanalyzer 2100 (Agilent, Santa Clara, CA) according to the manufacturer's protocol. One hundred ng of the extracted mRNA was then fragmented in fragmentation buffer (Ambion, Austin, TX) at 70°C for exactly 5 min. First-strand cDNA was then synthesized by reverse transcriptase using the cleaved mRNA fragments as template and a high concentration (3 µg) of random hexamer primers (Invitrogen, Carlsbad, CA). After first-strand synthesis, second-strand cDNA synthesis was performed using 50 U DNA polymerase I and 2 U RNase H (Invitrogen, Carlsbad, CA) at 16°C for 2.5 h. Deep sequencing of both DNA and short cDNA fragments was performed [36,37]. Libraries were prepared according to the instructions for the genomic DNA sample preparation kit (Illumina, San Diego, CA). The library concentration was measured on a Nanodrop spectrophotometer (NanoDrop products, Wilmington, DE), and 4 pM of adaptor-ligated DNA was hybridized to the flow cell. DNA clusters were generated using the Illumina cluster station, followed by 36 cycles of sequencing on the Illumina Genome Analyzer, in accordance with the manufacturer's protocols. Two technical replicate libraries were constructed for each DNA-Seq sample. Two libraries were prepared from two biological replicates of each RNA material (RNAi or mock treated).

RNAi

dsRNA for RNAi treatment [38] was produced by in vitro transcription of a PCR-generated DNA template from Drosophila genomic DNA containing the T7 promoter sequence on both ends. Target sequences were scanned to exclude any complete 19-mer homology to other genes [39]. The dsRNAs were generated using the MEGAscript T7 kit (Ambion, Austin, TX) and purified using the RNeasy kit (Qiagen, Valencia, CA). Two different primer sets were used for each target gene, and the one with better RNAi efficiency was used for downstream experiments. The selected primer sequences for generation of the msl2 dsRNA template by PCR were as follows: forward, 5′-taatacgactcactatagggTTGCTCCGACTTCAAGACCT-3′, and reverse, 5′-taatacgactcactatagggGCATCACGTAGGAGACAGCA-3′. The selected primer sequences for generation of the mof dsRNA template were as follows: forward, 5′-taatacgactcactatagggGACGGTCATCACAACAGGTG-3′, and reverse, 5′-taatacgactcactatagggTGCGGTCGCTGTAGTCATAG-3′. For RNAi treatment, S2 cells were resuspended in serum-free medium at 2 × 10^6 cells/ml. Twenty µg dsRNA was added to 1 ml of cell suspension and incubated for 45 min at room temperature. Cells given the same serum-free medium treatment but without added dsRNA were used as mock treated controls. After the incubation, 3 ml complete medium was added and the cells were cultured for another 4 d. Cells were collected and split into three aliquots for mRNA extraction, chromatin immunoprecipitation, and western analysis.

ChIP

For ChIP [40], 5–10 × 10^6 S2 cells were fixed with 1% formaldehyde in tissue culture medium for 10 min at room temperature. Glycine was added to a final concentration of 0.125 M to stop cross-linking. After 5 min of additional incubation and two washes with ice-cold PBS, cells were collected and resuspended in cell lysis buffer (5 mM pH 8.0 PIPES buffer, 85 mM KCl, 0.5% Nonidet P40, and protease inhibitor cocktail from Roche, Basel, Switzerland) for 10 min and then resuspended in nuclei lysis buffer (50 mM pH 8.1 Tris-HCl, 10 mM EDTA, 1% SDS, and protease inhibitors) for 20 min at 4°C. The nuclear extract was sheared to 200–1,000 bp by sonication on ice for 8 min (pulsed 8 times for 30 s with 30 s intervals using a Misonix Sonicator 3000; Misonix, Inc., Farmingdale, NY). The chromatin solution was then clarified by centrifugation at 14,000 rpm for 10 min at 4°C. Five µl anti-H4AcK16 (Millipore, Billerica, MA) was incubated with the chromatin for 2 h and then bound to protein A agarose beads at 4°C overnight. The beads were washed three times with 0.1% SDS, 1% Triton, 2 mM EDTA, 20 mM pH 8.0 Tris, and 150 mM NaCl; three times with 0.1% SDS, 1% Triton, 2 mM EDTA, 20 mM pH 8.0 Tris, and 500 mM NaCl; and twice with 10 mM pH 8.1 Tris, 1 mM EDTA, 0.25 M LiCl, 1% NP40, and 1% sodium deoxycholate. The immunoprecipitated DNA was eluted from the beads in 0.1 M NaHCO3 and 1% SDS and incubated at 65°C overnight to reverse cross-linking. DNA was purified by phenol-chloroform extraction and ethanol precipitation. The immunoprecipitated DNA was amplified using a ligation-mediated PCR (LM-PCR) protocol from FlyChip [41]. ChIP was performed on triplicate biological samples.
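The RNAi protocol excludes dsRNA targets that share any complete 19-mer with another gene. A minimal sketch of such a screen follows, assuming sequences are available as plain uppercase DNA strings and that a target passes only if no non-target gene shares a 19-mer with either strand of the target. The function names, gene names, and sequences are hypothetical; the cited screen [39] may apply additional criteria.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 19) -> set[str]:
    """All length-k substrings of an uppercase DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def off_target_hits(target: str, other_genes: dict[str, str], k: int = 19) -> dict[str, int]:
    """Count k-mers a dsRNA target region shares with each non-target gene.

    Long dsRNA yields siRNAs from both strands, so the target's reverse complement
    is screened too. Genes sharing no k-mer are left out of the result; a target
    passes the screen when the result is empty.
    """
    probe = kmers(target, k) | kmers(reverse_complement(target), k)
    hits = {}
    for name, seq in other_genes.items():
        shared = len(probe & kmers(seq, k))
        if shared:
            hits[name] = shared
    return hits
```

Using a set of k-mers keeps each comparison a set intersection, so the cost grows with total sequence length rather than with the number of pairwise alignments.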

Microarrays

Six hundred ng of amplified DNA (ChIP-enriched DNA or input DNA) was labeled using 6 µg Cy3- or Cy5-labeled random nonamers (Trilink Biosciences, San Diego, CA) with 50 U Klenow (New England Biolabs, Ipswich, MA) and 2 mM dNTPs. The labeled DNA was purified and hybridized to FlyGEM microarrays [42]. Arrays were scanned on an Axon 4000B scanner (Molecular Devices Corporation, Sunnyvale, CA) and signal was extracted with GenePix v.5.1 image acquisition software (Molecular Devices Corporation). Two hundred ng aliquots of the same extracted mRNA used for RNA-Seq were labeled as described [42] and hybridized to NimbleGen custom 12-plex microarrays at 42°C using a MAUI hybridization station (BioMicro Systems, Salt Lake City, UT) according to the manufacturer's instructions (NimbleGen Systems, Madison, WI). Arrays were scanned on an Axon 4200AL scanner (Molecular Devices Corporation, Sunnyvale, CA) and data were captured using NimbleScan 2.1 (NimbleGen Systems, Madison, WI).

Western Analysis

Cell lysates were prepared from cells 4 d after dsRNA or mock treatment by boiling for 5 min in NuPAGE LDS sample buffer (Invitrogen, Carlsbad, CA). Samples were run by SDS-PAGE using a 4%–12% Bis-Tris gel (Invitrogen, Carlsbad, CA) and transferred to PVDF membrane. Blots were incubated with anti-MSL2 antibody (1:500), anti-MOF antibody (1:3,000; both gifts of M. Kuroda), or anti-α-tubulin antibody (1:10,000; Sigma, St. Louis, MO) and then with HRP-conjugated secondary antibodies in PBS buffer with 0.1% Tween 20. Protein signals were detected with Pierce SuperSignal West Dura Extended Duration Substrate (Thermo Fisher Scientific, Rockford, IL). Images were captured using a Fuji LAS3000 Imager and quantified using Image Gauge software (Fuji Film, Tokyo, Japan).

Data Processing

Both DNA-Seq and RNA-Seq sequence reads were compiled using a manufacturer-provided computational pipeline (Version 0.3) including the Firecrest and Bustard applications [36]. Sequence reads were then aligned to the Drosophila melanogaster assembly (BDGP Release 5, dm3) [6,43] using Eland. Only uniquely mapped reads with fewer than two mismatches were used. For DNA-Seq data, we counted the number of reads in non-overlapping 1 kb windows along each chromosome using all sequenced reads from the two technical DNA-Seq libraries and calculated the read density as the number of uniquely mapped reads per kb per million mapped reads (RPKM) [37]. The breakpoint positions of aneuploid segments were identified using the Bayesian analysis of change point (bcp) package in R [44]. Because some reads mapped to multiple positions in the genome and thus inappropriately lowered the deduced copy number in regions with low sequence complexity, we removed all 1 kb windows with RPKM lower than 2 (the RPKM value of one copy = 2.29) prior to change point analysis. Breakpoints with posterior probability > 0.95 were used. Copy number was assigned to segments based on the average RPKM value of each segment between breakpoints (2.29 ± 1.15 RPKM = 1 copy, 4.58 ± 1.15 RPKM = 2 copies, etc.). Genes spanning two segments were not used in gene expression analysis. For RNA-Seq data, we counted the number of uniquely mapped reads within all unique exons of the Drosophila FlyBase [45] Release 5.12 annotation (Oct. 2008) and calculated, for each annotated gene, the total number of reads in all unique exons per kb of total unique exon length per million mapped reads (RPKM). The RPKM calculation was done for individual RNA-Seq libraries separately, and RPKM values were then averaged across biological replicates (r² = 0.98 between replicates). Non-expressed genes are not useful for ratiometric analysis and were therefore excluded. We used RPKM values for intergenic regions to determine expression thresholds. For intergenic regions, RPKM values were calculated from the total number of reads between adjacent gene model pairs. Only 5% of intergenic regions in S2 cells have an RPKM value greater than or equal to 4. Therefore, we called genes with RPKM values of at least 4 in S2 cells expressed, with an estimated type I error rate of 5%. All microarray data (except CGH) and statistical tests were processed and analyzed in R/Bioconductor [46]. For the ChIP-chip experiments, we used quantile normalization based on the input channel. The distributions of raw and normalized intensities were checked to make sure that normalization was appropriate (i.e., that the skew was maintained). We used the average ChIP/input ratio from biological replicates (r² = 0.40–0.54 between replicates). The ChIP/input ratios in RNAi and mock treated cells were used for K-means clustering analysis with 3 nodes using a Euclidean similarity metric; genes on the X chromosome and autosomes were clustered separately using Cluster 3.0 and then visualized using TreeView [47]. For expression profiling, we normalized using loess within each 12-plex and quantile normalization between 12-plexes. Average probeset log2 intensities were calculated in both channels for each gene. Correlations between array intensities and RPKM values were estimated by Spearman's rank correlation coefficient.

Comparisons of the distributions of DNA densities or expression values among different chromosomes and different copy numbers were performed using two-sample Kolmogorov-Smirnov tests (KS tests). Normalization is inherently problematic when a large fraction of the genome changes expression, as in the RNAi experiments. Because 20% of the genome is encoded on the X chromosome (X) and 80% on autosomes (A), and an expression profile is generated by sampling transcripts from a total mRNA pool, a halving of X chromosome expression with no change in autosomal expression means that autosomal transcripts are over-sampled in the experiment. Conversely, if autosomal expression is doubled, X chromosome transcripts are under-sampled. While it is imprudent to formally state the precise contributions of X chromosome and autosomal expression changes to MSL-mediated dosage compensation, we can determine which makes the larger contribution based on the RPKM, total mRNA, and cell count measurements. Using this information, we calculated the log-likelihood value for two hypotheses:

$$H_0:\; A_{\mathrm{RNAi}} = A_{\mathrm{WT}},\quad X_{\mathrm{RNAi}} = \tfrac{1}{2}\,X_{\mathrm{WT}}$$

$$H_1:\; A_{\mathrm{RNAi}} = 2\,A_{\mathrm{WT}},\quad X_{\mathrm{RNAi}} = X_{\mathrm{WT}}$$

Here hypothesis H0 states that the expression of autosomes (A) remains the same and the expression of the X chromosome (X) decreases by half after RNAi treatment. Hypothesis H1 states that the expression of autosomes (A) increases 2-fold after RNAi treatment and the expression of the X chromosome (X) remains the same. The expected total expression in RNAi treated cells is therefore 90% of wild type under H0 (0.8 + 0.2/2) and 180% under H1 (2 × 0.8 + 0.2). E is the measured mRNA per cell. In the duplicate RNA-Seq experiments, we obtained mRNA yields of 0.16 and 0.17 pg/cell from mock treated, 0.15 and 0.19 pg/cell from Msl2 knockdown, and 0.14 and 0.20 pg/cell from Mof knockdown S2 cells.
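Two points in this argument can be made concrete. First, relative measures such as RPKM cannot separate H0 from H1, because both predict the same X share of the mRNA pool (about 11%); only the absolute amount of mRNA per cell (0.9 versus 1.8 times wild type) differs. Second, the comparison of the two hypotheses is a Gaussian log-likelihood of the per-cell yields. The Python sketch below does both, using the yields reported above. It assumes each scale term (α in the log-likelihood expressions) is the sample standard deviation of that condition's duplicates, which the text does not specify, so its log-likelihood difference (about 28 under this assumption) favours H0 as reported but is not expected to reproduce the published 26.4 exactly.

```python
from __future__ import annotations

from statistics import mean, stdev

X_FRACTION = 0.20  # share of wild type transcript output from the X (20% of the genome)

# mRNA yields (pg/cell) from the duplicate RNA-Seq experiments; "mock" is WT in the equations.
YIELDS = {"mock": [0.16, 0.17], "Msl2": [0.15, 0.19], "Mof": [0.14, 0.20]}


def rnai_pool(x_fold: float, a_fold: float) -> tuple[float, float]:
    """(total mRNA relative to wild type, X share of the pool) after RNAi."""
    x = X_FRACTION * x_fold
    a = (1 - X_FRACTION) * a_fold
    return x + a, x / (x + a)


def gaussian_term(values: list[float], expected: float, scale: float) -> float:
    """Sum of -1/2 ((E_i - expected) / scale)^2 (normalizing constants omitted)."""
    return sum(-0.5 * ((e - expected) / scale) ** 2 for e in values)


def log_likelihood(total_fold: float) -> float:
    """log L when each knockdown is predicted to yield total_fold x the mean mock yield.

    Assumption: each scale term is the sample standard deviation of that
    condition's duplicate measurements.
    """
    wt_mean = mean(YIELDS["mock"])
    ll = gaussian_term(YIELDS["mock"], wt_mean, stdev(YIELDS["mock"]))
    for knockdown in ("Msl2", "Mof"):
        values = YIELDS[knockdown]
        ll += gaussian_term(values, total_fold * wt_mean, stdev(values))
    return ll


total_h0, x_share_h0 = rnai_pool(x_fold=0.5, a_fold=1.0)  # H0: X halves, A unchanged
total_h1, x_share_h1 = rnai_pool(x_fold=1.0, a_fold=2.0)  # H1: X unchanged, A doubles
print(f"H0: total {total_h0:.2f}, X share {x_share_h0:.3f}")
print(f"H1: total {total_h1:.2f}, X share {x_share_h1:.3f}")
print(f"log L(H0) - log L(H1) = {log_likelihood(total_h0) - log_likelihood(total_h1):.1f}")
```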

The log-likelihoods of the two hypotheses are

$$\log L(H_0) = \sum_{i=1}^{n} -\frac{1}{2}\left[\left(\frac{E_{\mathrm{WT},i}-\bar{E}_{\mathrm{WT}}}{\alpha_{\mathrm{WT}}}\right)^{2} + \left(\frac{E_{\mathrm{Msl2},i}-0.9\,\bar{E}_{\mathrm{WT}}}{\alpha_{\mathrm{Msl2}}}\right)^{2} + \left(\frac{E_{\mathrm{Mof},i}-0.9\,\bar{E}_{\mathrm{WT}}}{\alpha_{\mathrm{Mof}}}\right)^{2}\right]$$

$$\log L(H_1) = \sum_{i=1}^{n} -\frac{1}{2}\left[\left(\frac{E_{\mathrm{WT},i}-\bar{E}_{\mathrm{WT}}}{\alpha_{\mathrm{WT}}}\right)^{2} + \left(\frac{E_{\mathrm{Msl2},i}-1.8\,\bar{E}_{\mathrm{WT}}}{\alpha_{\mathrm{Msl2}}}\right)^{2} + \left(\frac{E_{\mathrm{Mof},i}-1.8\,\bar{E}_{\mathrm{WT}}}{\alpha_{\mathrm{Mof}}}\right)^{2}\right]$$

The log-likelihood of H0 minus the log-likelihood of H1 is 26.4, which suggests that the X chromosome expression change contributes more than the autosomal expression change to the observed differences in expression between wild type and RNAi treated cells.

Comparative Genomic Hybridization (CGH)

DNA was isolated from Drosophila S2-DRSC cells obtained from the Drosophila Genomics Resource Center (#181, Bloomington, IN) and from w1118 0–2 h embryos as described previously [48]. The isolated cell line and embryonic DNA were labeled with either Cy5- or Cy3-conjugated dUTP and subsequently hybridized to a custom Agilent genomic tiling array (GEO; GPL7787). Changes in copy number along each of the Drosophila chromosome arms were detected by a dynamic programming algorithm that divided each arm into the optimal number of copy number segments [49].

Supporting Information

Figure S1 Copy number determination by Bayesian Change Point Analysis of DNA-Seq read density. Found at: doi:10.1371/journal.pbio.1000320.s001 (1.12 MB PDF)

Figure S2 DNA-Seq densities of each copy number defined by DNA-Seq copy number calls or CGH copy number calls. Found at: doi:10.1371/journal.pbio.1000320.s002 (0.07 MB PDF)

Figure S3 RNA-Seq and array expression profiling. Found at: doi:10.1371/journal.pbio.1000320.s003 (2.01 MB PDF)

Table S1 Copy number segments based on DNA-Seq. Found at: doi:10.1371/journal.pbio.1000320.s004 (0.04 MB XLS)

Table S2 Copy number validation by DNA-Seq and CGH. Found at: doi:10.1371/journal.pbio.1000320.s005 (0.09 MB DOC)

Table S3 The number of genes in each copy number category. Found at: doi:10.1371/journal.pbio.1000320.s006 (0.03 MB DOC)

Acknowledgments

We thank members of the Oliver laboratory and Carson Chow for helpful discussion and comments on the manuscript, Mathias Beller for help with cell culture and RNAi experiments, David Clark for help with ChIP experiments, Mitzi Kuroda for anti-Msl2 and anti-Mof reagents, and the NIDDK genomics core for assistance with Illumina sequencing.

Author Contributions The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: YZ SKP ES DMM BO. Performed the experiments: YZ SKP. Analyzed the data: YZ JHM SKP VP DMM BO. Contributed reagents/materials/analysis tools: JHM ES. Wrote the paper: YZ BO.

Accession Numbers All Seq and array data sets are available at GEO under accession number GSE16344. The CGH data set is available at modENCODE submission ID 596.

References 13. Gupta V, Parisi M, Sturgill D, Nuttall R, Doctolero M, et al. (2006) Global analysis of X-chromosome dosage compensation. J Biol 5: 3. 14. Kelley RL, Solovyeva I, Lyman LM, Richman R, Solovyev V, et al. (1995) Expression of msl-2 causes assembly of dosage compensation regulators on the X chromosomes and female lethality in Drosophila. Cell 81: 867–877. 15. Akhtar A, Becker PB (2000) Activation of transcription through histone H4 acetylation by MOF, an acetyltransferase essential for dosage compensation in Drosophila. Mol Cell 5: 367–375. 16. Kind J, Vaquerizas JM, Gebhardt P, Gentzel M, Luscombe NM, et al. (2008) Genome-wide analysis reveals MOF as a key regulator of dosage compensation and gene expression in Drosophila. Cell 133: 813–828. 17. Ruthenburg AJ, Li H, Patel DJ, Allis CD (2007) Multivalent engagement of chromatin modifications by linked binding modules. Nat Rev Mol Cell Biol 8: 983–994. 18. Bhadra MP, Bhadra U, Kundu J, Birchler JA (2005) Gene expression analysis of the function of the male-specific lethal complex in Drosophila. Genetics 169: 2061–2074. 19. Stenberg P, Lundberg LE, Johansson AM, Ryden P, Svensson MJ, et al. (2009) Buffering of segmental and chromosomal aneuploidies in Drosophila melanogaster. PLoS Genet 5: e1000465. doi:10.1371/journal.pgen.1000465. 20. Birchler JA, Hiebert JC, Paigen K (1990) Analysis of autosomal dosage compensation involving the alcohol dehydrogenase locus in Drosophila melanogaster. Genetics 124: 679–686. 21. Devlin RH, Holm DG, Grigliatti TA (1982) Autosomal dosage compensation Drosophila melanogaster strains trisomic for the left arm of chromosome 2. Proc Natl Acad Sci U S A 79: 1200–1204. 22. Heylighen F, Joslyn C (2001) Cybernetics and second-order cybernetics. In: Meyers RA, ed. Encyclopedia of physical science & technology (3rd ed). New York: Academic Press. pp 1–23.

1. Henrichsen CN, Chaignat E, Reymond A (2009) Copy number variants, diseases and gene expression. Hum Mol Genet 18: R1–R8. 2. Payer B, Lee JT (2008) X chromosome dosage compensation: how mammals keep the balance. Annu Rev Genet 42: 733–772. 3. Vanneste E, Voet T, Le Caignec C, Ampe M, Konings P, et al. (2009) Chromosome instability is common in human cleavage-stage embryos. Nat Med 15: 577–583. 4. Veitia RA, Bottani S, Birchler JA (2008) Cellular reactions to gene dosage imbalance: genomic, transcriptomic and proteomic effects. Trends Genet 24: 390–397. 5. Lindsley DL, Sandler L, Baker BS, Carpenter AT, Denell RE, et al. (1972) Segmental aneuploidy and the genetic gross structure of the Drosophila genome. Genetics 71: 157–184. 6. Hoskins RA, Carlson JW, Kennedy C, Acevedo D, Evans-Holm M, et al. (2007) Sequence finishing and mapping of Drosophila melanogaster heterochromatin. Science 316: 1625–1628. 7. Weaver BA, Cleveland DW (2006) Does aneuploidy cause cancer? Curr Opin Cell Biol 18: 658–667. 8. Cherry S (2008) Genomic RNAi screening in Drosophila S2 cells: what have we learned about host-pathogen interactions? Curr Opin Microbiol 11: 262–270. 9. Schneider I (1972) Cell lines derived from late embryonic stages of Drosophila melanogaster. J Embryol Exp Morphol 27: 353–365. 10. Copps K, Richman R, Lyman LM, Chang KA, Rampersad-Ammons J, et al. (1998) Complex formation by the Drosophila MSL proteins: role of the MSL2 RING finger in protein complex assembly. Embo J 17: 5409–5417. 11. Lucchesi JC, Kelly WG, Panning B (2005) Chromatin remodeling in dosage compensation. Annu Rev Genet 39: 615–651. 12. Belote JM, Lucchesi JC (1980) Control of X chromosome transcription by the maleless gene in Drosophila. Nature 285: 573–575.

PLoS Biology | www.plosbiology.org

February 2010 | Volume 8 | Issue 2 | e1000320

Expression in Aneuploid Cells

36. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, et al. (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59. 37. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628. 38. Caplen NJ, Fleenor J, Fire A, Morgan RA (2000) dsRNA-mediated gene silencing in cultured Drosophila cells: a tissue culture model for the analysis of RNA interference. Gene 252: 95–105. 39. Kulkarni MM, Booker M, Silver SJ, Friedman A, Hong P, et al. (2006) Evidence of off-target effects associated with long dsRNAs in Drosophila melanogaster cellbased assays. Nat Methods 3: 833–838. 40. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, et al. (2000) Genomewide location and function of DNA binding proteins. Science 290: 2306–2309. 41. Birch-Machin I, Gao S, Huen D, McGirr R, White RA, et al. (2005) Genomic analysis of heat-shock factor targets in Drosophila. Genome Biol 6: R63. 42. Johnston R, Wang B, Nuttall R, Doctolero M, Edwards P, et al. (2004) FlyGEM, a full transcriptome array platform for the Drosophila community. Genome Biol 5: R19. 43. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, et al. (2000) The genome sequence of Drosophila melanogaster. Science 287: 2185–2195. 44. Erdman C, Emerson JW (2008) A fast Bayesian change point analysis for the segmentation of microarray data. Bioinformatics 24: 2143–2148. 45. Wilson RJ, Goodman JL, Strelets VB (2008) FlyBase: integration and improvements to query tools. Nucleic Acids Res 36: D588–D593. 46. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5: R80. 47. Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95: 14863–14868. 48. 
MacAlpine DM, Rodriguez HK, Bell SP (2004) Coordination of replication and transcription along a Drosophila chromosome. Genes Dev 18: 3094–3105. 49. Huber W, Toedling J, Steinmetz LM (2006) Transcript mapping with highdensity oligonucleotide tiling arrays. Bioinformatics 22: 1963–1970.

23. Darzacq X, Shav-Tal Y, de Turris V, Brody Y, Shenoy SM, et al. (2007) In vivo dynamics of RNA polymerase II transcription. Nat Struct Mol Biol 14: 796–806. 24. Kacser H, Burns JA (1981) The molecular basis of dominance. Genetics 97: 639–666. 25. Ptashne M (2004) A genetic switch: phage lambda revisited. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press. xiv, 154 p. 26. Mileyko Y, Joh RI, Weitz JS (2008) Small-scale copy number variation and large-scale changes in gene expression. Proc Natl Acad Sci U S A 105: 16659–16664. 27. Franke A, Dernburg A, Bashaw GJ, Baker BS (1996) Evidence that MSLmediated dosage compensation in Drosophila begins at blastoderm. Development 122: 2751–2760. 28. Alekseyenko AA, Larschan E, Lai WR, Park PJ, Kuroda MI (2006) Highresolution ChIP-chip analysis reveals that the Drosophila MSL complex selectively identifies active genes on the male X chromosome. Genes Dev 20: 848–857. 29. Kind J, Akhtar A (2007) Cotranscriptional recruitment of the dosage compensation complex to X-linked target genes. Genes Dev 21: 2030–2040. 30. Gilfillan GD, Konig C, Dahlsveen IK, Prakoura N, Straub T, et al. (2007) Cumulative contributions of weak DNA determinants to targeting the Drosophila dosage compensation complex. Nucleic Acids Res 35: 3561–3572. 31. Straub T, Grimaud C, Gilfillan GD, Mitterweger A, Becker PB (2008) The chromosomal high-affinity binding sites for the Drosophila dosage compensation complex. PLoS Genet 4: e1000302. doi:10.1371/journal.pgen.1000302. 32. Sturgill D, Zhang Y, Parisi M, Oliver B (2007) Demasculinization of X chromosomes in the Drosophila genus. Nature 450: 238–241. 33. Zhang Y, Oliver B (2007) Dosage compensation goes global. Curr Opin Genet Dev 17: 113–120. 34. Altug-Teber O, Bonin M, Walter M, Mau-Holzmann UA, Dufke A, et al. (2007) Specific transcriptional changes in human fetuses with autosomal trisomies. Cytogenet Genome Res 119: 171–184. 35. Laffaire J, Rivals I, Dauphinot L, Pasteau F, Wehrle R, et al. 
(2009) Gene expression signature of cerebellar hypoplasia in a mouse model of Down syndrome during postnatal development. BMC Genomics 10: 138.


2-D Structure of the A Region of Xist RNA and Its Implication for PRC2 Association

Sylvain Maenner1, Magali Blaud1, Laetitia Fouillen2, Anne Savoye1, Virginie Marchand1¤, Agnès Dubois3, Sarah Sanglier-Cianférani2, Alain Van Dorsselaer2, Philippe Clerc3, Philip Avner3, Athanase Visvikis1, Christiane Branlant1*

1 AREMS, Nancy Université, UMR 7214 CNRS-UHP 1, Faculté des Sciences et Techniques, BP 70239, Vandoeuvre-lès-Nancy, France, 2 Laboratoire de Spectrométrie de Masse BioOrganique, Institut Pluridisciplinaire Hubert Curien, Département des Sciences Analytiques, Université de Strasbourg, CNRS UMR 7178, ECPM, Strasbourg, France, 3 Génétique Moléculaire Murine, CNRS2578, Institut Pasteur, Paris, France

Abstract
In placental mammals, inactivation of one of the X chromosomes in female cells ensures sex chromosome dosage compensation. The 17 kb non-coding Xist RNA is crucial to this process and accumulates on the future inactive X chromosome. The most conserved Xist RNA region, the A region, contains eight or nine repeats separated by U-rich spacers. It is implicated in the recruitment of late inactivated X genes to the silencing compartment and likely in the recruitment of the PRC2 complex. Little is known about the structure of the A region and, more generally, about Xist RNA structure. Knowledge of its structure is restricted to an NMR study of a single A repeat element. Our study is the first experimental analysis of the structure of the entire A region in solution. Using chemical and enzymatic probes and FRET experiments with oligonucleotides carrying fluorescent dyes, we resolved problems linked to sequence redundancies and established a 2-D structure for the A region that contains two long stem-loop structures, each including four repeats. Interactions formed between repeats, and between repeats and spacers, stabilize these structures. Conservation of the spacer terminal sequences allows formation of such structures in all sequenced Xist RNAs. By a combination of RNP affinity chromatography, immunoprecipitation assays, mass spectrometry, and Western blot analysis, we demonstrate that the A region can associate with components of the PRC2 complex in mouse ES cell nuclear extracts. Whilst a single four-repeat motif is able to associate with components of this complex, recruitment of Suz12 is clearly more efficient when the entire A region is present. Our data, with their emphasis on the importance of inter-repeat pairing, fundamentally change our conception of the 2-D structure of the A region of Xist RNA and support its possible implication in recruitment of the PRC2 complex.

Citation: Maenner S, Blaud M, Fouillen L, Savoye A, Marchand V, et al. (2010) 2-D Structure of the A Region of Xist RNA and Its Implication for PRC2 Association. PLoS Biol 8(1): e1000276. doi:10.1371/journal.pbio.1000276

Academic Editor: Kathleen Hall, Washington University School of Medicine, United States of America

Received August 3, 2009; Accepted November 25, 2009; Published January 5, 2010

Copyright: © 2010 Maenner et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This study was supported by the French Centre National de la Recherche Scientifique (CNRS, http://www.cnrs.fr/), the French Ministry of ‘‘Enseignement Superieur et la Recherche’’ (http://www.enseignementsup-recherche.gouv.fr/), the French National Agency for Research (ANR, http://www.agence-nationale-recherche.fr/) (contract Number ANR 07 BLAN 0047 102), Institut Pasteur, European Union contract from Epigenome NoE, Region Alsace, and AliX. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.
Abbreviations: CMCT, N-cyclohexyl-N′-(2-morpholinoethyl)-carbodiimide metho-p-toluolsulfonate; DMS, dimethylsulfate; Eed, embryonic ectoderm development; ES cell, embryonic stem cell; Ezh2, enhancer of zeste homolog 2; FRET, Förster resonance energy transfer; HIV, human immunodeficiency virus; HOTAIR, HOX antisense intergenic RNA; Lnx3, ligand of numb-protein X 3; NMR, nuclear magnetic resonance; PRC2, Polycomb repressive complex 2; PTB, polypyrimidine tract-binding protein; Rbap46–48, retinoblastoma-binding protein p46–p48; RNP, ribonucleoprotein particle; RRM, RNA recognition motif; Suz12, suppressor of zeste 12 protein homolog; XCI, X chromosome inactivation; Xist, X (inactive)-specific transcript

* E-mail: christiane.branlant@maem.uhp-nancy.fr

¤ Current address: Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany

Author Summary
In placental mammal females, Xist RNA is crucial for inactivation of one of the two X chromosomes in order to maintain proper X chromosome dosage. It is known that the conserved A region of Xist RNA, which contains eight or nine repeated elements, plays an essential role in this process; however, little is known about its structure and mechanism of action. By using chemical and enzymatic probes, as well as FRET experiments, we performed the first experimental analysis of the solution structure of the entire Xist A region. Both mouse and human A regions were found to form two long stem-loop structures, each containing four repeats. In contrast to previous predictions, interactions take place both between repeats and between repeats and spacers. Affinity purification of RNA-protein complexes formed by incubation of RNA in mouse ES cell nuclear extract, followed by mass spectrometry and antibody-based analyses of their protein contents, showed that the isolated four-repeat structures from the A region can recruit components of the PRC2 complex, which is needed for X chromosome inactivation. However, association of one component of this complex, Suz12, was more efficient when the entire A region was used.

Introduction
In mammals, the transcriptional silencing of one of the two X chromosomes in female cells (X chromosome inactivation, XCI) ensures sex chromosome dosage compensation [1]. Once acquired early in development, the inactivated state is faithfully inherited through successive cell divisions. XCI initiation is associated with increased Xist RNA transcription. Xist RNA is first retained near its transcription site and then spreads along the entire X chromosome from which it is transcribed [2–5], while a series of epigenetic marks, including the repressive histone modifications H3K27me3 and H3K9me3, is recruited to the presumptive inactive X chromosome. Xist RNA is a long non-coding RNA (17 kb in length in the mouse), which is capped, spliced, and polyadenylated. Little is known about its structure and mechanism of action. The Xist gene has a complex origin. It includes degenerated pieces of an ancient protein-coding gene, Lnx3, as well as genomic repeat elements derived at least in part from transposon integration events [6,7]. The most conserved Xist RNA regions correspond to repeat elements (denoted A to E in mouse [8]), which are organized as tandem arrays. The A region (positions 292 to 713 in mouse, accession no. gi|37704378|ref|NR_001463.2| [2], and 350 to 770 in human, accession no. gi|340393|gb|M97168.1| [5]) is the most highly conserved of the repeat regions and is critical for initiation of XCI. The observation that female mouse embryos carrying a mutated XistΔA gene inherited from males are selectively lost during embryogenesis underlines the importance of this element [9]. Recent data have shown that an early event in silencing is the formation of a Xist RNA compartment and that the A region, whilst not necessary for formation of this compartment, is needed for relocation of X-linked genes into this territory [10]. Over-expression of a XistΔA RNA in transgenic mouse ES cells indicates that the A region, whilst not necessary for Xist coating, is implicated in the recruitment of the PRC2 complex [11–16]. The PRC2 complex contains the Suz12, Eed, Ezh2, and Rbap46–48 proteins [17,18]. Eed and Suz12 have been proposed to bind nucleic acids [19,20], whereas Rbap46–48 may interact with nucleosome protein components [17]. Lysine 27 trimethylation of histone H3 is catalysed by Ezh2 [12,14], and both Eed and Suz12 are required for this activity [20]. Recently, a short 1,600-nucleotide RNA that contains the A region at its 5′ end was suggested to be expressed early in XCI initiation and to bind the PRC2 complex [21]. Since the function of Xist RNA is expected to depend on its 2-D structure, establishing the 2-D structure of the Xist A region is of considerable interest. Based on the nucleotide sequence of the A region and computer prediction, Wutz and colleagues proposed that each repeat forms two short stem-loop structures [11]. Recent NMR analysis confirmed that one of these stem-loop structures can be formed in vitro by an RNA molecule bearing a single copy of the mouse repeat A sequence [22]. In Xist RNA, however, the repeat sequences are separated by long spacer regions (21 to 48 nt long in mouse). Since current models fail to take account of this sequence complexity, an experimental analysis of the entire A region was likely to provide valuable information on its structure. As conventional probing experiments are hindered by the presence of the repeated sequences and long U tracks, we applied a combined approach exploiting both chemical and enzymatic probing of RNA structure in solution and FRET experiments using fluorescent oligonucleotide probes complementary to different parts of the A region. Using this dual approach, we could show that repeats in the A region interact with each other to form long irregular stem-loop structures. Such inter-repeat interactions appear to be required for the binding of the various components of the PRC2 complex. We identified the minimal number of repeats necessary for such binding. The implications of our results within the wider context of X-inactivation and of the XCI mechanism(s) underlying silencing are discussed.

January 2010 | Volume 8 | Issue 1 | e1000276

Xist A Region Structure and PRC2 Association

Results
Probing of Mouse and Human A Region 2-D Structures
We analysed the entire mouse and human A regions in parallel, as the sequence divergence of the inter-repeat linking sequences between mouse and human was expected to provide insight into how the spacer regions might influence the A repeat structure. The specific primers for the two A regions are listed in Table S1. To test whether the A region interacts with neighbouring Xist RNA sequences, an RNA that contained only the mouse A region (positions 277 to 760 in mouse Xist RNA) and a larger RNA including the sequence from positions 1 to 1137 were studied in parallel by limited enzymatic digestion. Very similar digestion profiles (Figure 1A and 1B) were obtained for the two RNAs when digestions were performed on the T7 RNA transcripts after folding under the conditions outlined in Materials and Methods. We conclude that the A region probably folds on itself without major interaction with upstream and downstream Xist sequences. Hence, our subsequent analyses of the 2-D structure of the Xist A region were carried out in the absence of flanking sequences (positions 227 to 760 and 330 to 796 for the mouse and human A regions, respectively). Each enzymatic digestion and chemical modification assay was carried out in duplicate using different transcript preparations, and each extension analysis was repeated two or three times for each primer. Representative examples of the primer extension analyses are provided in Figure 1 and Figures S1A–S1E.

Figure 1. Probing of mouse Xist A region RNA structure alone and inside the 5′-terminal region. The two RNAs (A, the 5′-terminal 1137-nt RNA; B, the A region) were transcribed in vitro and renatured as described in Materials and Methods, before being subjected to limited digestion with T1, T2, or V1 RNases under the conditions described in Materials and Methods. Extension analyses were performed using oligonucleotide 3866 (Table S1) as the primer. The resulting cDNAs were fractionated by electrophoresis on a 7% denaturing polyacrylamide gel. Lanes U, G, C, and A correspond to the sequencing ladder obtained with the same primer. Lanes marked Contr correspond to primer extension analysis of undigested RNA transcripts. Nucleotide numbering on the left-hand side of the autoradiogram takes the first residue of mouse Xist RNA as residue 1. The sequences corresponding to repeats 3 and 4 are indicated by vertical bars on the right-hand side of the autoradiograms. doi:10.1371/journal.pbio.1000276.g001

M-Fold Assisted Modelling of the Mouse A Region 2-D Structure
The structure proposed by Wutz and colleagues [11], in which each repeat folds into a double stem-loop structure, could not explain the numerous RNase V1 cleavages that we observed in both the mouse and human A regions (Figure 2). We explored the possibility that each repeat folds into a single longer stem-loop structure. Such folding was similarly unable to explain the V1 RNase cleavages (Figure S2). We conclude that the 2-D structure may involve interactions between repeats and spacers as well as between repeats. There are, however, many potential ways for duplexes to form between repeats (Figures 3–5, Models 1–3). Our design of the putative structure was guided by the detection of six successive strong V1 RNase cleavages in the central poly A sequence (positions 550 to 555), suggesting that this segment is involved in a helical structure. Strong DMS modification of the sequence immediately downstream (positions 555 to 561) indicated a single-stranded state. One possible explanation for these data was the formation of a central stem-loop structure, called SLS2M, with a U track on one strand and an A track on the other (Figure 5). Formation of this central stem-loop structure was subsequently imposed as a constraint when exploring the possible folding of the mouse A region. This excluded structures in which two successive repeats interact with each other (Model 1, Figure 3), since in that case the entire poly A region would interact with the poly U track located upstream of repeat 3, which did not agree with the probing data (Figure 3). Another possible structure involved an interaction between the 5′ and 3′ halves of the A region. This would generate a very long irregular stem-loop structure with an A-rich terminal loop (Model 2, Figure 4). An alternative structure involved the folding on themselves of the 5′ and 3′ parts of the A region, with SLS2M in between (Model 3, Figure 5). Several other alternative pairings of the repeats were also explored; none fitted the chemical and enzymatic data perfectly. The notion of independent folding of the 5′ part of the A region (positions 318 to 521 in mouse RNA) was supported by M-fold analysis of this segment, which identified a highly stable long irregular stem-loop structure, SLS1M (ΔG = −41.96 kcal/mol at 0°C in 3 M NaCl), in which repeat 1 interacts with repeat 4 and repeat 2 interacts with repeat 3. It is the most thermodynamically stable structure proposed for this 5′ segment and was predicted irrespective of whether the experimental data were introduced as a constraint in the M-fold search. In SLS1M, each repeat interacts both with another repeat and with a spacer segment, increasing the stability of the overall structure. Similarly, M-fold analysis of the 3′ part of the mouse A region suggested that repeat 5 interacts with repeat 8 and a spacer region, and repeat 6 with repeat 7 and a spacer region. The resulting SLS3M structure was predicted as the most favourable structure by M-fold when the experimental data were incorporated as a constraint. The overall predicted three stem-loop structure (Model 3) has a low calculated free energy (ΔG = −77.76 kcal/mol) and fits the experimental data better than Models 1 and 2 (Figures 3 and 4), suggesting that, in solution, Model 3 is the most likely structure among the numerous possibilities.

Figure 2. Representation of experimental data on the previously proposed 2-D structure of the Xist A region. Each of the seven repeats, as well as the eighth half repeat, from the mouse A region was folded according to the previously predicted two stem-loop structure [11]. T1, T2, and V1 RNase cleavages are represented by arrows surmounted by circles, triangles, and squares, respectively. Nucleotides modified by DMS or CMCT are circled. Colours of circles and arrows indicate the yields of modifications and cleavages: red, yellow, and green for strong, medium, and low modification or cleavage, respectively. The V1 RNase cleavages and chemical modifications that cannot be explained by the two stem-loop structure model are encircled by blue lines. doi:10.1371/journal.pbio.1000276.g002

The Possibility to Form Structure 3 Is Phylogenetically Conserved
If a structure has biological relevance, it is generally conserved throughout evolution. We therefore tested whether the most favourable structures identified for the mouse A region were relevant to the human A region in solution. The human A region differs from that of mouse by the presence of an additional repeat 5 and the absence of a long central poly A region. Experimental data (Figures 6 and 7 and Figure S3A–S3E) suggested that the central repeat 5 forms a central stem-loop structure (SLS2H). Based on this, structures similar to mouse Models 2 and 3 could be proposed for the human A region, involving either a long irregular stem-loop structure including all the repeats (Model 2) or a three stem-loop structure (Model 3), with repeats 1 to 4 forming a first stem-loop structure (SLS1H), repeat 5 folded alone in a second stem-loop (SLS2H), and repeats 6 to 9 involved in a third stem-loop structure (SLS3H). As for the mouse A region: (i) a 2-D structure in which each repeat interacts with its immediate downstream repeat (repeats 1, 3, 6, and 8 with repeats 2, 4, 7, and 9, respectively; Model 1) was not supported by the probing data (segment 688 to 696) (Figure S4); (ii) M-fold analysis of the 5′ portion of the human A region (positions 370 to 530), either with or without the experimental data as a constraint, identified SLS1H as the most stable structure (ΔG = −42.70 kcal/mol) (Figure 7); (iii) SLS3H was retained as the most stable structure for the 3′ part of the A region when the experimental data were added as a constraint to an M-fold search; and (iv) the three stem-loop structure corresponding to Model 3 (ΔG = −86.6 kcal/mol) had the best fit with the probing data compared to the other 2-D models. Further support for Model 3 was provided by our observation of identical patterns of enzymatic cleavage for the entire human A region and for the isolated SLS1H portion (Figure 6).

The maintenance of interactions between spacers and repeats during mammalian evolution of the A region implies that the nucleotide sequences involved in these interactions were either conserved or subjected to compensatory base changes. This was confirmed by alignment of the mouse, human, orangutan, baboon, lemur, dog, rabbit, cow, and elephant A region sequences (Figure S5). For the majority of the repeats, nucleotide sequence conservation extends beyond the repeats themselves, allowing formation of the SLS1 and SLS3 structures in all sequenced Xist RNAs (Figure S6).
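The argument that repeat–spacer pairings were maintained by conservation or compensatory base changes can be checked mechanically on an alignment. The Python sketch below is a simplified illustration, not the analysis behind Figures S5 and S6. Given a consensus structure in dot-bracket notation and gap-free aligned sequences (an assumption), it reports for each base pair whether every species can form a Watson–Crick or G·U pair, and whether more than one pair type occurs across species, which we loosely treat as a sign of compensatory change. The species names and sequences are invented.

```python
"""Hypothetical sketch: check base-pair conservation across aligned sequences."""

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(dot_bracket: str) -> list[tuple[int, int]]:
    """List (i, j) base pairs, i < j, from a balanced dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return sorted(pairs)


def pair_support(dot_bracket: str, alignment: dict[str, str]):
    """For each pair, report (i, j, formable_in_all, varies_between_species)."""
    seqs = [s.upper().replace("T", "U") for s in alignment.values()]
    report = []
    for i, j in base_pairs(dot_bracket):
        observed = {(s[i], s[j]) for s in seqs}
        formable = observed <= CANONICAL  # any mismatch or gap makes this False
        report.append((i, j, formable, formable and len(observed) > 1))
    return report


# Invented toy alignment; these are NOT Xist sequences.
toy = {"species_a": "GCAAGC", "species_b": "GUAAAC", "species_c": "GCAAGA"}
for row in pair_support("((..))", toy):
    print(row)
```

In the toy case the outer pair fails because one species carries a G·A mismatch, while the inner pair is formable everywhere and switches between C–G and U–A, the pattern a compensatory change would leave.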

Figure 3. Representation of experimental data on the possible 2-D structure 1 of the Xist A region. In Model 1, the various stem-loop structures all involve two successive repeats. The repeats are indicated by red lines and are numbered from 1 to 8. Representations of chemical and enzymatic data are as in Figure 2. In each of the three panels, segments in which DMS and/or CMCT modifications were identified are indicated on a schematic drawing of the 2-D structure (solid and dotted lines for DMS and CMCT, respectively). In segments analysed with both chemical reagents, unmodified nucleotides are boxed. Note, however, that G residues are poorly modified by CMCT under the mild conditions that we used. The free energy of each stem-loop structure at 0°C in 3 M NaCl was calculated with the M-fold software. doi:10.1371/journal.pbio.1000276.g003

Figure 4. Representation of experimental data on the possible 2-D structure 2 of the Xist A region. In Model 2, repeats in the 5′ half of the A region interact with repeats in the 3′ half of this region. Enzymatic cleavages and chemical modifications are represented as in Figures 2 and 3. doi:10.1371/journal.pbio.1000276.g004

FRET Experiments Bring Additional Data in Favour of Model 3
Three oligonucleotide pairs (P1–P5, P2–P4, and P6–P7) were selected to test Model 3 by FRET experiments (Figure 8). This model predicts that the P1–P5 and P2–P4 pairs of oligonucleotides interact with the single-stranded segments that border the helix formed by repeats 1 and 4, whilst the P6–P7 pair should interact with the single-stranded segments bordering the helix formed by repeats 5 and 8. A marked FRET effect would therefore be expected for these three oligonucleotide pairs if the A region were folded as in Model 3. Conversely, the distance between the fluorophores of these three pairs would be expected to be much larger if the A region were folded as in structures 1 or 2. Although tertiary interactions might decrease these distances, a lower level of FRET would still be expected for the three pairs if the A region were folded according to structures 1 or 2 (Figure 8). The P1 and P6 oligonucleotides bind to two single-stranded segments that flank the helix formed by repeats 1 and 8 in structure 2. A strong FRET effect for P1 and P6 would therefore be expected if the A region were folded according to structure 2. Upon binding to the A region, oligonucleotide P7 partially disrupts the base-pair interactions formed by the central poly A stretch. However, as similar levels of destabilization are expected for the three possible structures, binding of this oligonucleotide was not expected to favour one structure over the other two. The same is true for oligonucleotide P5, which binds to the partner U stretch of the poly A sequence. To calibrate the level of FRET obtained for oligonucleotides bordering a helix, we used the short R1–2 transcript, which contains repeats 1 and 2 and their bordering sequences and adopts a single, unique 2-D structure, together with the P1–P3′ oligonucleotide pair (Figure S7). Other controls exploited the P3–P6 and P3–P5 pairs, which were not expected to be in close proximity in any of the three proposed structures (Figure 8).
The oligonucleotide pairs used are shown in Figure 8, along with examples of typical fluorescence intensity spectra recorded in FRET experiments for the P2–P4 and P3–P6 pairs (Figure 8D). High FRET signals, in the range of 50%, were obtained for the P1–P5, P2–P4, and P7–P6 oligonucleotide pairs, whilst lower FRET signals were observed for the P1–P6 pair (35%) and especially the P3–P6 and P3–P5 pairs (25% and 22%, respectively). This is compatible with a large proportion of the molecules being folded in solution into structure 3. Folding of a large number of molecules into structure 2 would have led to a strong FRET signal for the P1–P6 pair and lower signals for the five other pairs, which was not observed. The strong FRET effects obtained for the P1–P5, P2–P4, and P7–P6 pairs argue strongly against folding according to structure 1. Based on our FRET data, we conclude that the A region predominantly folds according to Model 3.
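To relate the reported percentages to distances, the standard Förster relation is E = 1/(1 + (r/R0)^6), where r is the donor–acceptor distance and R0 is the Förster radius of the dye pair. The Python sketch below inverts that relation and also computes efficiency from donor quenching (E = 1 − F_DA/F_D). It is illustrative only. It assumes that the reported FRET signals can be treated as efficiencies, and R0 = 5.0 nm is a placeholder because the dye pair and its Förster radius are not specified here.

```python
"""Illustrative sketch: convert FRET efficiencies to apparent distances."""


def fret_efficiency(f_donor_with_acceptor: float, f_donor_alone: float) -> float:
    """Efficiency from donor quenching: E = 1 - F_DA / F_D."""
    if f_donor_alone <= 0:
        raise ValueError("donor-only intensity must be positive")
    return 1.0 - f_donor_with_acceptor / f_donor_alone


def apparent_distance(efficiency: float, r0: float) -> float:
    """Invert E = 1 / (1 + (r / R0)**6); r is returned in the units of r0."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


R0_NM = 5.0  # placeholder Förster radius (nm); the real value depends on the dyes

# Signals quoted in the text, treated here as efficiencies (an assumption).
reported = {"P1-P5": 0.50, "P1-P6": 0.35, "P3-P6": 0.25, "P3-P5": 0.22}
for pair, e in reported.items():
    print(f"{pair}: E = {e:.2f} -> r ~ {apparent_distance(e, R0_NM):.2f} nm")
```

Because of the sixth-power dependence, going from about 50% to 22% corresponds to only a roughly 1.23-fold increase in apparent distance (at E = 0.50, r equals R0). These values are therefore best read as a ranking of pairs rather than as precise distances.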

Recruitment of the PRC2 Complex by the A Region
Previous studies have shown that the PRC2 complex interacts with the Xi [12,14,15,23], and the A region has been proposed to recruit the PRC2 complex through the Ezh2 subunit, which would act as an RNA-binding subunit [21]. We wished to explore further the binding of the PRC2 complex to the A region in light of our structural data. In particular, we were interested in determining how many A region repeats are required to bind the individual Eed, Ezh2, RbAp46, RbAp48, and Suz12 components of the PRC2 complex.


Figure 5. Representation of experimental data on the possible 2-D structure 3 of the Xist A region. In Model 3, two stem-loop structures, each containing four repeats, are separated by a small stem-loop formed by the poly A and poly U sequences. Enzymatic cleavages and chemical modifications are represented as in Figures 2 and 3. doi:10.1371/journal.pbio.1000276.g005


Figure 6. Probing of the 2-D structures of the entire human A region and of its 5′ half. Legend as in Figure 1, except that the transcripts correspond to the entire human A repeat region (human A RNA) and the 5′ half of the human A region (SLS1H RNA), respectively. doi:10.1371/journal.pbio.1000276.g006

We initiated a proteomic approach based on affinity chromatography purification of complexes formed upon incubation of in vitro transcribed Xist A region RNAs with nuclear extracts, followed by protein identification by mass spectrometry and Western blot analysis. Mouse ES cells are a widely exploited model for the study of XCI initiation. We reasoned that, as Xist RNA acts as an initiator of XCI, proteins that must interact with this RNA to ensure early Xist functions should already be present in the nuclear extract of female mouse ES cells prior to differentiation. We used a control RNA containing only the three MS2 protein binding sites and tested four RNAs containing different segments of the mouse Xist A region, flanked at their 3′ end by the three MS2 binding sites (Figure 9A). These RNAs, denoted 1R/MS2, 2R/MS2, 4R/MS2, and Aregion/MS2, contained, respectively, repeat 1 without any neighbouring sequence, repeats 3 and 4 and their bordering spacers (positions 401 to 552 in mouse Xist RNA), the SLS1M stem-loop structure, and the entire A region. HIV/MS2, which contained a fragment of HIV-1 RNA (positions 5338 to 5514 in the BRU RNA), was used as a negative control. To identify proteins capable of associating with the entire A region, the proteins bound to purified complexes formed on the Aregion/MS2 RNA were analysed by mass spectrometry. Among the numerous proteins detected were PTB and components of the PRC2 complex (Ezh2, RbAp46, RbAp48, and Suz12) (Figure S8). We then evaluated by Western blot the relative amounts of each PRC2 component in RNPs formed by the various RNAs tested. Whilst Eed, Ezh2, and PTB were detected in complexes formed on RNAs containing two or more repeats, binding of RbAp46 and RbAp48 was detected only with RNAs containing at least four repeats, and binding of Suz12 only when the entire A region was used (Figure 9C). The control HIV-1 RNA bound none of these proteins (Figure 9B and 9C). To further explore these data, we performed a series of experiments in which fragments of the A region were transcribed in vitro as radiolabelled RNAs without MS2 fusion and incubated with mouse ES nuclear extracts. Three distinct RNAs (the complete A region, 4R, and 2R RNAs; Figure 9D and 9E) were used for these experiments. Consistent with a possible interaction of Eed with an RNA containing only two repeats, trace amounts of 2R RNA were retained on the beads when an anti-Eed antibody was used. Only complexes containing the 4R RNA or the entire A region were retained when anti-Suz12, anti-Ezh2, anti-RbAp46, and anti-RbAp48 antibodies were used. These observations confirmed the importance of the corresponding regions for association of these proteins. Clearly, however, higher amounts of the entire A region than of 4R RNA were bound when the anti-Suz12 and anti-Ezh2 antibodies were used. We conclude that whilst some segments of the A region allow the binding of particular PRC2 components, the entire A region is required for efficient association of the entire complex.

Figure 7. Representation of experimental data on the possible structure 3 of the human Xist A region. Repeat 5 is located in a short central stem-loop structure flanked by two larger stem-loop structures that each contain four repeats. Enzymatic cleavages and chemical modifications are represented as in Figures 2 and 3. Segments in which DMS and/or CMCT modifications were identified are indicated as in Figure 3. The free energies of the stem-loop structures were calculated with the Mfold software. doi:10.1371/journal.pbio.1000276.g007

Figure 8. Steady-state fluorescence studies provide additional support for Model 3 of the A region. In (A, B, and C), the binding sites of oligonucleotides P1 to P6 used in the FRET experiments are shown on the three 2-D structures of the A region corresponding to Models 1, 2, and 3. The chromophore carried by each oligonucleotide (donor Cy3 or acceptor Cy5) is indicated in green and blue, respectively. Cy3- and Cy5-labeled oligonucleotides were purchased from Eurogentec. As illustrated in (D), emission fluorescence spectra from 530 to 745 nm were collected for the donor oligonucleotide bound alone to the RNA (green curve) and in the presence of both the donor and acceptor oligonucleotides (violet curve). No energy transfer between Cy3- and Cy5-labeled oligonucleotides was detected in solution. The FRET efficiency for each pair of oligonucleotides was defined as the decrease in donor fluorescence at 564 nm in the presence of the acceptor. Two representative examples of FRET assays (oligonucleotide pairs P2/P4 and P3/P6) are shown in (D). The FRET efficiencies measured for the six pairs of oligonucleotides are provided in (E) (mean values of three independent experiments); standard deviations (σ) are shown. The relative FRET efficiencies obtained for each oligonucleotide pair are represented schematically in (A, B, and C) by lines joining the oligonucleotides; the thickness of each line reflects the FRET efficiency. doi:10.1371/journal.pbio.1000276.g008
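As a worked illustration of the efficiency defined in this legend, the Python sketch below computes the fractional decrease of donor emission at 564 nm from a donor-only spectrum and a donor-plus-acceptor spectrum. The optional correction for partial donor binding is an assumption about how the bound/unbound ratio of the Cy3-oligonucleotide mentioned in Materials and Methods could be applied (it assumes that unbound donor is not quenched and emits like bound donor); it is not presented as the authors' formula. The function names and the spectra in the usage lines are hypothetical.

```python
from typing import Sequence


def intensity_at(wavelengths: Sequence[float], intensities: Sequence[float],
                 target_nm: float = 564.0) -> float:
    """Return the intensity recorded at the wavelength closest to target_nm."""
    if not wavelengths or len(wavelengths) != len(intensities):
        raise ValueError("spectra must be non-empty and of equal length")
    closest = min(range(len(wavelengths)),
                  key=lambda i: abs(wavelengths[i] - target_nm))
    return intensities[closest]


def fret_efficiency(f_donor: float, f_donor_acceptor: float,
                    bound_fraction: float = 1.0) -> float:
    """Fractional decrease of donor emission upon addition of the acceptor.

    f_donor: donor emission with the Cy3 oligonucleotide bound alone.
    f_donor_acceptor: donor emission after the Cy5 oligonucleotide is added.
    bound_fraction (hypothetical correction): fraction b of donor bound to
        the RNA. If only bound donor is quenched, F_DA = F_D * (1 - b * E),
        hence E = (1 - F_DA / F_D) / b.
    """
    if f_donor <= 0:
        raise ValueError("donor-only intensity must be positive")
    if not 0.0 < bound_fraction <= 1.0:
        raise ValueError("bound_fraction must lie in (0, 1]")
    apparent = 1.0 - f_donor_acceptor / f_donor
    return apparent / bound_fraction


if __name__ == "__main__":
    # Hypothetical spectra sampled every 2 nm around the Cy3 emission maximum.
    wl = [560.0, 562.0, 564.0, 566.0]
    donor_only = [90.0, 95.0, 100.0, 96.0]
    donor_plus_acceptor = [60.0, 63.0, 65.0, 64.0]
    f_d = intensity_at(wl, donor_only)
    f_da = intensity_at(wl, donor_plus_acceptor)
    print(round(fret_efficiency(f_d, f_da), 3))                      # 0.35
    print(round(fret_efficiency(f_d, f_da, bound_fraction=0.7), 3))  # 0.5
```

Dividing by the bound fraction only makes sense if the bound/unbound ratio is measured independently, as was done here by non-denaturing gel electrophoresis; with full binding (b = 1) the function reduces to the plain definition given in the legend.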

Figure 9. The PRC2 complex assembles on fragments of the A region containing at least four repeats. (A) Representation of the fusion RNAs used for formation of RNP complexes with ES cell nuclear extracts. Retention of RNAs containing three MS2 coat protein binding sites (MS2) on amylose beads was mediated by the MS2-MBP fusion protein [31]. The protein content of the RNP complexes formed on the A region/MS2 RNA was analysed by mass spectrometry (Figure S8). (B and C) Western blot assays using antibodies specific for the PTB, Suz12, RbAp46, RbAp48, Eed, and Ezh2 proteins were used to evaluate the relative amounts of these proteins in the purified complexes formed with the various RNAs shown in (A). Antibodies were purchased from Santa Cruz (sc), Abcam (ab), and Calbiochem (cb): anti-Ezh2 (anti-ENX-1 H-80, sc-25383), anti-RbAp46 (ab3535), anti-RbAp48 (ab488), anti-Eed (ab4469), anti-Suz12 (ab12073), and anti-PTB (cb NA63). (D and E) Test of the association of radiolabelled fragments of the A region with components of the PRC2 complex present in nuclear extracts of ES cells. The three RNAs tested are represented in (D). RNAs bound to the Protein G-Sepharose beads were fractionated by electrophoresis on 7% denaturing gels. Autoradiograms of the gels are shown in (E). Input corresponds to 10% of the material incubated with the beads. doi:10.1371/journal.pbio.1000276.g009

Discussion

The A region of Xist RNA plays an essential role in the X inactivation process. Here, we show that in vitro the repeated elements in the A region of both mouse and human Xist RNAs interact with each other and with the intervening spacer regions. This leads to the formation of unusual long, irregular stem-loop structures containing four repeats and long U-rich terminal and internal loops. Our proteomic analysis suggests that these four-repeat structures may correspond to functional modules initiating the assembly of the PRC2 complex.

A New Conception of the 2-D Structure of the A Region

Until now, both computational [11] and experimental analyses [22] of the possible 2-D structure of the A region of Xist RNA have treated the individual repeat as the unit of folding. However, the long spacer sequences between the repeats suggest that these spacers may participate in 2-D structure formation, and point to the potential inadequacy of previous models. Our detailed chemical and enzymatic probing of the A region in solution, which involved the design of specific primers for reverse transcriptase extension analysis, enabled us to identify for the first time the double-stranded and single-stranded segments making up the A region structure in solution. The data obtained clearly demonstrate that the repeats do not fold on themselves but rather fold with one another. Chemical and enzymatic probing of an RNA structure in solution often allows a unique 2-D structure model consistent with the experimental data to be built. Studies on the A region were, however, complicated by its high degree of sequence redundancy. A recently proposed biophysical approach based on FRET assays [24] helped overcome these difficulties by providing information on the relative distances between the sequences flanking the various repeats. To our knowledge, this approach has so far only been used to confirm the 2-D structure model of telomerase RNA [24]. This method, which uses oligonucleotides carrying donor and acceptor fluorescent dyes that are complementary to single-stranded segments of the RNA under study, proved particularly well suited to the A region, since our probing data identified several long single-stranded segments able to bind the oligonucleotide probes. Among the possible 2-D structures for the A region, only one, structure 3, showed perfect agreement with the FRET data.

Structure 3, which contains two long irregular stem-loop structures each involving four repeats (four-repeat structures), also shows the best agreement with the chemical and enzymatic data. The two long stem-loop structures are separated by a short stem-loop structure corresponding to a region that diverges between mouse and human Xist RNAs. One repeat in this segment is common to all the sequenced Xist RNAs except mouse RNA (Figure S5). In mouse RNA, it is replaced by a poly A sequence forming a short stem-loop with a poly U sequence. Interestingly, nucleotide sequence conservation in the A region extends to the spacer extremities, which contribute to the possibility of forming the four-repeat structures (Figure S5). Although the presence of large internal and terminal loops decreases the stability of stem-loop structures, the predicted free energies of the two four-repeat structures in both mouse and human RNAs are strongly negative (between −33 and −45 kcal/mol), explaining why they are proposed by the Mfold software. In addition, these four-repeat structures may be stabilized in cellulo by RNA-protein interactions. Interestingly, protein PTB, which contains four RNA recognition motifs (RRMs), each able to interact with UCUU(C), UUCUCU, or CUCUCU sequences, showed high affinity for the A region in nuclear extract binding experiments (Figure 9) [25]. As UCUU motifs are present on each side of the large internal loops in the four-repeat structures and in the terminal loop, interactions of the RRMs of a single PTB molecule with these various segments might stabilize the four-repeat structure, as suggested by previously proposed models of PTB-RNA interaction [26]. Although model 3 fits all the data better than the other models, this does not exclude some local dynamics in small areas of the SLS1 and SLS3 structures. More precisely, the instability of a few base-pair interactions can explain the presence of both V1 RNase cleavages and chemical modifications in a limited number of very small segments of the A repeat region.
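To make the motif argument concrete, the Python sketch below lists every occurrence, including overlapping ones, of the pyrimidine motifs attributed above to the PTB RRMs (UCUU(C), UUCUCU, and CUCUCU) in an RNA sequence. It is a plain string search for illustration only: it does not model PTB binding affinity or RNA structure, and the test sequence is a made-up fragment, not the Xist A region.

```python
import re

# Motifs recognised by the PTB RRMs according to the text; "UCUU" also
# covers the UCUUC variant.
PTB_MOTIFS = ("UCUU", "UUCUCU", "CUCUCU")


def find_motifs(rna: str,
                motifs: tuple[str, ...] = PTB_MOTIFS) -> list[tuple[int, str]]:
    """Return sorted (1-based start, motif) pairs for all occurrences.

    DNA input is accepted (T is read as U). A zero-width lookahead is used
    so that overlapping occurrences, common in pyrimidine tracts, are all
    reported.
    """
    seq = rna.upper().replace("T", "U")
    hits = []
    for motif in motifs:
        pattern = re.compile(f"(?={re.escape(motif)})")
        hits.extend((m.start() + 1, motif) for m in pattern.finditer(seq))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GCUCUUAUUCUCUAA"  # hypothetical sequence, not from Xist
    print(find_motifs(toy))  # [(3, 'UCUU'), (8, 'UUCUCU')]
```

Mapping such hits onto the loops of a folded model would still require the 2-D structure itself, for example from the probing-constrained Mfold predictions described in Materials and Methods.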

Possible Functional Implication of the Four-Repeat Structure

Our adaptation of affinity chromatography purification, originally developed for purifying spliceosome complexes [27], to complexes formed upon incubation of different fragments of the A region with nuclear extracts prepared from undifferentiated mouse ES cells, coupled with mass spectrometry and Western blot analyses, proved powerful. Together with immunoselection assays performed on assembled RNP complexes, it revealed that four components of the PRC2 complex can associate with an RNA corresponding to one of the four-repeat structures formed by the A region. However, our observation that the entire A region is needed for efficient association of the Suz12 protein suggests an additional functional role for the entire A region, either in binding Suz12 or in stabilising the binding of Suz12 to the four-repeat structure. It is too early to give a convincing molecular explanation of this observation, and further experiments are needed to understand why Suz12 displays association properties different from those of the other members of the PRC2 complex. Whilst UV cross-linking of the RNP complexes formed with ES nuclear extract on the entire A region confirmed the direct binding of PTB to this RNA region, direct binding of components of the PRC2 complex was not detected (unpublished data). Neither Ezh2 nor Eed, which were previously proposed to be recruited by Xist in an A region-dependent manner [21], was cross-linked in significant amounts, suggesting that their association with the A region is mediated by other nuclear proteins. Therefore, the unusual SLS1 and SLS3 structures in the A region may be needed to recruit nuclear proteins that have an affinity for components of the PRC2 complex, or to reinforce the affinity of the RNA for these components.
Mass spectrometry analysis of RNP complexes formed with the entire A region showed that, in addition to components of the PRC2 complex, a large number of other nuclear proteins can associate with this RNA region. In further studies, it will be important to identify which of these proteins are required for PRC2 association and which ones bind directly to the A region structure. Our finding that Suz12 requires the entire A region, or, more simply, more than four repeats, for efficient association with the RNA is in good agreement with the observation of Wutz and colleagues (2002) that at least 5.5 repeats are needed to initiate inactivation [11]. Additional support for the functional significance of the four-repeat model comes from a reanalysis of data obtained by Wutz and colleagues, who tested the effect of a series of mutations within the A region on XCI initiation [11]. Our structural analysis shows that all the variants classed by Wutz et al. as active (XR, XSR, and XCR) are able to form the four-repeat structure, whereas the two inactive variants (XS1 and XNX) cannot (Figure S9). Although several lines of evidence argue in favour of a major role of the four-repeat structure in A repeat activity, we cannot exclude a role for alternative structures, for instance in modulating A repeat activity.
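Whether a variant can or cannot adopt the four-repeat structure ultimately depends on whether particular segments can base-pair with one another. As a deliberately simplified illustration (not the Mfold free-energy analysis used in this work), the Python sketch below tests whether two RNA segments, both written 5′→3′, can form a contiguous antiparallel helix of Watson-Crick pairs and, optionally, G·U wobble pairs. The example segments are short GC-rich fragments chosen for illustration; they are not claimed to be actual pairing partners in Figure S9.

```python
# Watson-Crick pairs plus the G-U wobble pair.
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(strand_a: str, strand_b: str, allow_wobble: bool = True) -> bool:
    """True if strand_a and strand_b (both 5'->3') form a perfect antiparallel helix.

    Position i of strand_a is paired with position -(i + 1) of strand_b,
    i.e. strand_b is read 3'->5'. Bulges and mismatches are not allowed.
    """
    a = strand_a.upper().replace("T", "U")
    b = strand_b.upper().replace("T", "U")
    if not a or len(a) != len(b):
        return False
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    return all((x, y) in allowed for x, y in zip(a, reversed(b)))


if __name__ == "__main__":
    print(can_pair("GCCC", "GGGC"))                  # True
    print(can_pair("GCCC", "CGGG"))                  # False
    print(can_pair("GG", "UC", allow_wobble=False))  # False
```

Real folding predictions weigh stacking energies, loops, and bulges rather than requiring perfect complementarity, which is why the structural conclusions above rely on Mfold constrained by probing data.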

Possible Implication of PRC2 and Other Nuclear Protein Association with the A Region

Although it is clear that the A region is essential for the X inactivation process, its precise role and mechanism of action remain unclear. Its deletion was shown to block silencing but not coating of the X chromosome by Xist [11], an observation consistent with a possible role of the A region in PRC2 recruitment. PRC2 is needed for the deposition of some, but not all, of the epigenetic marks that are specific features of silenced chromatin in general and of the inactive X in particular (methylation of histone H3 at lysine 27) [20]. Association of PRC2 with long ncRNAs before transfer of the complex to chromatin may be a general mechanism for chromatin silencing processes that depend on long ncRNAs: both the HOTAIR and Kcnq1ot1 long ncRNAs, which are involved in gene silencing, were recently found to bind the PRC2 complex [19,28]. Recruitment of PRC2 is a relatively early event in X inactivation [14], consistent with a possible early association of this complex with Xist RNA before extensive Xist coating of chromatin. PRC2 might become associated with chromatin upon Xist coating through its interaction with proteins bound to Xist RNA. Alternatively, coating by the Xist RNP may facilitate PRC2 transfer to chromatin through interaction of some of the RNP components with chromatin proteins. Lee and colleagues recently reported a 1,600-nucleotide RNA, RepA, that carries the A region at its 5′ end, may be expressed before the entire Xist RNA, and was reported to recruit the PRC2 complex at a very early step of XCI [21]. Independent confirmation of these findings will be of major importance to the field.

Screening the numerous other proteins that we found by mass spectrometry to associate with the entire A region, for a specific involvement in the recruitment of genes to the X inactivation domain [10] or in other early events characterising the onset of X inactivation and silencing, will be of potentially major importance to our understanding of X inactivation.

Materials and Methods

RNA Preparation

DNA fragments coding for the entire A regions of mouse and human Xist RNAs and their subfragments were PCR amplified using mouse or HeLa cell genomic DNA and cloned into plasmid pUC18 under the control of a T7 promoter. RNAs were generated by run-off transcription with T7 RNA polymerase as previously described [29]. DNA templates were digested with RNase-free DNase I, and RNA transcripts were purified on denaturing 3% to 8% polyacrylamide gels.

Enzymatic and Chemical Probing of RNA Secondary Structure

RNA 2-D structures in solution were probed as follows [29]: 200 ng of transcripts dissolved at an 80 nM concentration in buffer D (20 mM Hepes-KOH, pH 7.9, 100 mM KCl, 0.2 mM EDTA pH 8.0, 0.5 mM DTT, 0.5 mM PMSF, 20% (vol/vol) glycerol) were renatured by heating for 10 min at 65°C, followed by slow cooling to room temperature, with the addition of 1 µl of 62.5 mM MgCl2 to a final concentration of 3.25 mM MgCl2. After a 10 min preincubation at room temperature, RNase T1 (0.02 or 0.0375 U/µl) or T2 (0.025 or 0.0375 U/µl) was added under conditions in which it cleaved single-stranded segments. RNase V1 (2.5 × 10⁻³ or 5 × 10⁻³ U/µl) was used to cleave double-stranded and stacked residues. DMS (1 µl of a 1/4 or 1/8 (vol/vol) DMS/ethanol solution) was used to modify single-stranded A and C residues, and CMCT (4 or 5 µl of a 180 mg/ml solution) to modify single-stranded U and, to a lesser extent, G residues. Reactions were stopped as described in [29]. Cleavage and modification positions were identified by primer extension [29]. Stable secondary structures with the best fit to the experimental data were identified with the Mfold software, version 8.1 [30], with the probing data introduced as constraints in the search.

Steady-State Fluorescence Measurements

Fluorescence spectra were recorded at 4°C, with an excitation wavelength of 515 nm and scanning from 500 to 750 nm (excitation and emission bandwidths of 3 nm). The procedure was derived from [24]. The RNA and Cy3-oligonucleotide were mixed at a 1:1 molar ratio in 160 µl of 150 mM NaCl, 3.25 mM MgCl2, and 15 mM Na citrate (pH 7.0) to a final concentration (0.38 µM) above the Kd, incubated at 85°C for 5 min, and slowly cooled at room temperature for 15 min. After 4 h of incubation at 4°C, the yield of oligonucleotide association was determined by electrophoresis on a non-denaturing gel, and fluorescence in the gel was measured with a Typhoon 9410 scanner (GE Healthcare). When a satisfactory association yield was obtained, the emission of the Cy3-labeled complex was measured on a flux spectrofluorometer (SAFAS), and ten spectra were averaged. The Cy5-labeled oligonucleotide was then added at a 1:1 molar ratio, and incubation was carried out at 4°C for 4 h. Ten spectra were recorded, and the fluorescence resonance energy transfer (FRET) for the Cy3–Cy5 pair was calculated taking into account the bound/unbound ratio of the Cy3-oligonucleotide. Each FRET experiment was repeated three times using different batches of transcripts.

Purification of RNP by MS2 Affinity Selection

The entire mouse A region and several fragments were cloned 3′ to a T7 promoter and 5′ to the MS2 tag present in plasmid pAdML3 [27,31]. Nuclear extracts from undifferentiated female ES cells (LF2) were prepared according to [32] and dialyzed against buffer D. One hundred pmol of MS2-tagged RNAs were denatured and renatured as described above, and incubated with a 5-fold molar excess of purified MS2-MBP fusion protein [31] at 4°C for 15 min. The RNA-MS2-MBP complexes formed were incubated with amylose beads (40 µl, GE Healthcare) equilibrated in buffer D for 2 h at 4°C. After three washes with 500 µl of buffer D, 1 mg of nuclear extract in 150 µl of buffer D containing 5 µM yeast tRNAs was added. After 15 min of incubation at 4°C with constant agitation, three successive washes were performed in buffer D, and RNP complexes were eluted by incubation with 80 µl of buffer D containing 10 mM maltose (30 min at 4°C). Half of the eluted RNP complex formed with the entire A region was fractionated by 10% SDS-PAGE for mass spectrometry analysis. For all the purified RNP complexes, 10% of the eluted material was used for Western blot analysis, performed according to [33].
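Two steps of the downstream mass spectrometry analysis, in-gel trypsin digestion searched with at most one missed cleavage and concatenation of the protein database with reversed copies of all sequences, can be sketched in Python as follows. This is an illustrative approximation, not MASCOT's implementation: it assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P) and a hypothetical "REV_" prefix for decoy entries, and the sequences in the usage lines are made up.

```python
import re

# Simplified trypsin rule: cut C-terminal to K or R unless followed by P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(protein: str, max_missed: int = 1) -> list[str]:
    """In silico digest allowing up to max_missed missed cleavages."""
    # Splitting on a zero-width match requires Python 3.7 or later;
    # the empty fragment produced by a C-terminal K/R is discarded.
    fragments = [f for f in TRYPSIN_SITE.split(protein.upper()) if f]
    peptides = []
    for start in range(len(fragments)):
        stop_limit = min(start + max_missed + 1, len(fragments))
        for stop in range(start, stop_limit):
            peptides.append("".join(fragments[start:stop + 1]))
    return peptides


def with_reversed_decoys(database: dict[str, str],
                         prefix: str = "REV_") -> dict[str, str]:
    """Return the target database concatenated with reversed (decoy) copies."""
    combined = dict(database)
    for name, sequence in database.items():
        combined[prefix + name] = sequence[::-1]
    return combined


if __name__ == "__main__":
    print(tryptic_peptides("MAKPLRGSK"))
    # ['MAKPLR', 'MAKPLRGSK', 'GSK']
    print(with_reversed_decoys({"toy": "MAKPLRGSK"}))
    # {'toy': 'MAKPLRGSK', 'REV_toy': 'KSGRLPKAM'}
```

Matches to reversed sequences cannot correspond to real proteins, so their frequency gives a rough estimate of false identifications at a given score threshold, such as the Mascot ion score cut-off of 35 used here.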

Mass Spectrometry Analysis

Each lane of the SDS-PAGE gel was cut into 2 mm sections, and proteins were subjected to in-gel trypsin digestion. Extracted peptides were analysed by nano-LC-MS/MS on a CapLC capillary LC system coupled to a QTOF2 mass spectrometer (Waters) according to standard protocols (Figure S8). The MS/MS data were analysed using the MASCOT 2.2.0 algorithm (Matrix Science) to search an in-house protein database composed of Rattus and Mus protein sequences downloaded from UniProtKB http://beta.uniprot.org/ (August 07, 2008) and sequences of known contaminant proteins such as porcine trypsin and human keratins, concatenated with reversed copies of all sequences. Spectra were searched with a mass tolerance of 0.3 Da for MS and MS/MS data, allowing a maximum of one missed trypsin cleavage, with carbamidomethylation of cysteines, oxidation of methionines, and protein N-terminal acetylation specified as variable modifications. Protein identifications were validated when at least one peptide had a Mascot ion score above 35. Evaluations were performed using the peptide validation software Scaffold (Proteome Software).

Immunoselection of RNP

RNA transcripts were dephosphorylated, 5′-end labelled with [γ-32P]ATP (3,000 Ci/mmol), purified, and quantified according to [34]. About 70 pmol of RNA were denatured and renatured as described above, and incubated with 30 mg of nuclear extract for 30 min at room temperature with constant agitation. About 40 µl of Protein G-Sepharose bead suspension, blocked with BSA (2 mg) and coated for 2 h at 4°C with 10 µl of each antibody, were incubated with the RNP complexes for 2 h at 4°C in 300 µl of immunoselection buffer (150 mM NaCl, 10 mM Tris-HCl, pH 8.0, 0.1% NP40). Beads were washed three times for 10 min at 4°C with 750 µl of immunoselection buffer containing 0.5% NP40. RNAs were phenol extracted, ethanol precipitated, fractionated on 7% polyacrylamide gels, and analysed by autoradiography.

Supporting Information

Figure S1 Identification of enzymatic cleavages and chemical modifications of the mouse A region by primer extension. The A region of mouse Xist RNA was transcribed in vitro and renatured as described in Materials and Methods, before being submitted to limited digestion with the T1, T2, or V1 RNases under the conditions described in Materials and Methods. Primer extension analyses were performed using oligonucleotides 3760 (A), 3973 (B), 3866 (C), 3758 (D), 3971 (E), or 3757 (F) (Table S1) as primers. The resulting cDNAs were fractionated by electrophoresis on 7% denaturing polyacrylamide gels. Lanes U, G, C, and A correspond to the sequencing ladder obtained with the corresponding primers. Lanes marked Contr correspond to primer extension analysis of undigested RNA. Nucleotide numbering on the left side of the autoradiograms is calculated taking the first residue of mouse Xist RNA as residue 1. The positions of the repeats are indicated by vertical bars on the right-hand side of the autoradiograms. Two different analyses of CMCT modifications by primer extension with oligonucleotide 3757 are illustrated in (F); the autoradiogram on the right side of the panel was exposed for a longer time. Found at: doi:10.1371/journal.pbio.1000276.s001 (10.09 MB TIF)

Figure S2 Representation of experimental data on a 2-D structure in which repeats form individual stem-loop structures. Each of the seven repeats, as well as the eighth half repeat, in the mouse A region was folded into a unique stem-loop structure with an internal loop. T1, T2, and V1 RNase cleavages are represented by arrows surmounted by circles, triangles, and squares, respectively. Nucleotides modified by DMS or CMCT are circled. The colours of circles and arrows indicate the modification and cleavage yields, with red, yellow, and green corresponding, respectively, to strong, medium, and low modification or cleavage. The V1 RNase cleavages and chemical modifications that cannot be explained by this secondary structure model are circled in blue. Found at: doi:10.1371/journal.pbio.1000276.s002 (0.73 MB TIF)

Figure S3 Identification of enzymatic cleavages and chemical modifications in the human A region by primer extension analysis. The A region of human XIST RNA was treated as described for the mouse A region in the legend to Figure S1, except that the primers used for extension analyses were oligonucleotides 4563 (A), 4564 (B), 4622 (C), 4565 (D), and 4242 (E) (Table S1). Found at: doi:10.1371/journal.pbio.1000276.s003 (7.04 MB TIF)

Figure S4 Representation of experimental data on the possible structure 1 of the human Xist A region. In this model, stem-loop structures involve two successive repeats. The repeats are indicated by red lines and numbered from 1 to 9. Chemical and enzymatic data are represented as in Figures 2 and 3. The free energies of each stem-loop structure at 0°C and in 3 M NaCl were calculated with the Mfold software. Found at: doi:10.1371/journal.pbio.1000276.s004 (0.74 MB TIF)

Figure S5 Conservation of sequences surrounding the repeats in vertebrate A regions. Sequence alignment of the mouse, human, orangutan, baboon, lemur, dog, rabbit, cow, horse, and elephant A regions illustrating the degree of species conservation. Identical nucleotides are indicated in red. Repeats are numbered from 1 to 9 and shown as red rectangles. Sequences: mouse (gi|37704378|ref|NR_001463.2|), human (gi|340393|gb|M97168.1|), orangutan (http://www.ensembl.org/index.html, 292L3-1,185272), baboon (http://www.ensembl.org/index.html, 157F22-1,190936), lemur (http://www.ensembl.org/index.html, 176F24-1,134555), dog (http://genome.ucsc.edu/, canFam2 assembly, canFam1_dna range = chrX:60100000–60735000), rabbit (gi|1575009|gb|U50910.1|OCU50910), cow (gi|10181229|gb|AF104906.5|), horse (gi|1575005|gb|U50911.1|), and elephant (BROADE1:scaffold_119260:3220:3899:21 ENSEMBL). Found at: doi:10.1371/journal.pbio.1000276.s005 (0.20 MB DOC)

Figure S6 The ability to form the four-repeat stem-loop structures (SLS1 and SLS3) is conserved in vertebrates. SLS1 and SLS3 in mouse, dog, human, rabbit, and elephant are folded according to the mouse SLS1 structure (Model 3). The name of each species is indicated below the structure. Sequence variations compared to the mouse A sequence are indicated in green. Found at: doi:10.1371/journal.pbio.1000276.s006 (1.21 MB TIF)

Figure S7 Control FRET experiment performed with two oligonucleotides bordering one helix. (A) Schematic representation of the transcripts used in the control experiment. (B) Fluorescence spectra obtained with the donor P1 oligonucleotide bound to naked 2R/RNA (green curve) and with oligonucleotides P1/P39 bound to the RNA (violet curve). See the legend of Figure 8 for details. Found at: doi:10.1371/journal.pbio.1000276.s007 (0.18 MB TIF)

Figure S8 Identification of components of the PRC2 complex by mass spectrometry. Peptides that served for each protein identification are summarized in a table with the corresponding MS/MS spectra. (A) Identification of Enhancer of zeste homolog 2 (Ezh2). (B) Identification of Polycomb protein Suz12 (Suz12). (C) Identification of Retinoblastoma binding protein 4 (RbAp46). (D) Identification of Retinoblastoma binding protein 7 (RbAp48). (E) Detailed standard protocols for proteomic analyses. Found at: doi:10.1371/journal.pbio.1000276.s008 (3.01 MB DOC)

Figure S9 Folding capacities of the synthetic A region sequences whose activity was evaluated in [11]. Sequence XR corresponds to the positive control; XSR: replacement of GCCCAUCGCGGG by CGGGAUCGGCCC; XCR: small U-rich spacer regions; XS1: deletion of the GG dinucleotides in the second element of each repeat; XNX: replacement of GGGCAUCGGGGC by GCGCAUCGGAGC. The silencing properties of Xist RNAs containing these synthetic A region variants are indicated in the right-hand side panel of the Table S1. Found at: doi:10.1371/journal.pbio.1000276.s009 (0.54 MB TIF)

Table S1 Oligonucleotides used in this study. The name, sequence, and use of each oligonucleotide are given. Nucleotide positions of the A region are numbered according to GenBank accession no. gi|37704378|ref|NR_001463.2| (mouse Xist gene) [2] and no. gi|340393|gb|M97168.1| (human XIST gene) [5]. Restriction sites and fluorescent dyes introduced by the oligonucleotides are indicated. Found at: doi:10.1371/journal.pbio.1000276.s010 (0.05 MB DOC)

Acknowledgments

Professor R. Lührmann (Max Planck Institut, Goettingen) is thanked for his generous gift of plasmids pMBP-MS2 and pAdML3 and for help in the use of the MS2-MBP RNP purification approach. Professor I. Motorine and C. Aigueperse (AREMS, Nancy) are acknowledged for their advice in implementing this approach. V. Ségault (AREMS, Nancy) is thanked for advice concerning the immunopurification assays. S. Mazeres (IPBS, Toulouse) is thanked for helping us to define the FRET experimental protocol.

Author Contributions The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: SM MB VM SSC AVD PC PA AV CB. Performed the experiments: SM MB LF AS AV. Analyzed the data: SM MB LF AS VM SSC PA AV CB. Contributed reagents/materials/analysis tools: AD. Wrote the paper: SM PA AV CB.

References

1. Lyon MF (1961) Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 190: 372–373.
2. Brockdorff N, Ashworth A, Kay GF, McCabe VM, Norris DP, et al. (1992) The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell 71: 515–526.
3. Borsani G, Tonlorenzi R, Simmler MC, Dandolo L, Arnaud D, et al. (1991) Characterization of a murine gene expressed from the inactive X chromosome. Nature 351: 325–329.
4. Cohen HR, Panning B (2007) XIST RNA exhibits nuclear retention and exhibits reduced association with the export factor TAP/NXF1. Chromosoma 116: 373–383.
5. Brown CJ, Hendrich BD, Rupert JL, Lafreniere RG, Xing Y, et al. (1992) The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell 71: 527–542.
6. Duret L, Chureau C, Samain S, Weissenbach J, Avner P (2006) The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science 312: 1653–1655.
7. Shevchenko AI, Zakharova IS, Elisaphenko EA, Kolesnikov NN, Whitehead S, et al. (2007) Genes flanking Xist in mouse and human are separated on the X chromosome in American marsupials. Chromosome Res 15: 127–136.
8. Brockdorff N (2002) X-chromosome inactivation: closing in on proteins that bind Xist RNA. Trends Genet 18: 352–358.
9. Hoki Y, Kimura N, Kanbayashi M, Amakawa Y, Ohhata T, et al. (2009) A proximal conserved repeat in the Xist gene is essential as a genomic element for X-inactivation in mouse. Development 136: 139–146.
10. Chaumeil J, Le Baccon P, Wutz A, Heard E (2006) A novel role for Xist RNA in the formation of a repressive nuclear compartment into which genes are recruited when silenced. Genes Dev 20: 2223–2237.
11. Wutz A, Rasmussen TP, Jaenisch R (2002) Chromosomal silencing and localization are mediated by different domains of Xist RNA. Nat Genet 30: 167–174.
12. Silva J, Mak W, Zvetkova I, Appanah R, Nesterova TB, et al. (2003) Establishment of histone h3 methylation on the inactive X chromosome requires transient recruitment of Eed-Enx1 polycomb group complexes. Dev Cell 4: 481–495.
13. Plath K, Talbot D, Hamer KM, Otte AP, Yang TP, et al. (2004) Developmentally regulated alterations in Polycomb repressive complex 1 proteins on the inactive X chromosome. J Cell Biol 167: 1025–1035.
14. Plath K, Fang J, Mlynarczyk-Evans SK, Cao R, Worringer KA, et al. (2003) Role of histone H3 lysine 27 methylation in X inactivation. Science 300: 131–135.
15. Mak W, Baxter J, Silva J, Newall AE, Otte AP, et al. (2002) Mitotically stable association of polycomb group proteins eed and enx1 with the inactive x chromosome in trophoblast stem cells. Curr Biol 12: 1016–1020.
16. Kohlmaier A, Savarese F, Lachner M, Martens J, Jenuwein T, et al. (2004) A chromosomal memory triggered by Xist regulates histone methylation in X inactivation. PLoS Biol 2: E171. doi:10.1371/journal.pbio.0020171.
17. Cao R, Wang L, Wang H, Xia L, Erdjument-Bromage H, et al. (2002) Role of histone H3 lysine 27 methylation in Polycomb-group silencing. Science 298: 1039–1043.
18. Kuzmichev A, Nishioka K, Erdjument-Bromage H, Tempst P, Reinberg D (2002) Histone methyltransferase activity associated with a human multiprotein complex containing the Enhancer of Zeste protein. Genes Dev 16: 2893–2905.
19. Rinn JL, Kertesz M, Wang JK, Squazzo SL, Xu X, et al. (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129: 1311–1323.
20. Cao R, Zhang Y (2004) The functions of E(Z)/EZH2-mediated methylation of lysine 27 in histone H3. Curr Opin Genet Dev 14: 155–164.
21. Zhao J, Sun BK, Erwin JA, Song JJ, Lee JT (2008) Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322: 750–756.
22. Duszczyk MM, Zanier K, Sattler M (2008) A NMR strategy to unambiguously distinguish nucleic acid hairpin and duplex conformations applied to a Xist RNA A-repeat. Nucleic Acids Res 36: 7068–7077.
23. Kalantry S, Mills KC, Yee D, Otte AP, Panning B, et al. (2006) The Polycomb group protein Eed protects the inactive X-chromosome from differentiation-induced reactivation. Nat Cell Biol 8: 195–202.
24. Gavory G, Symmons MF, Krishnan Ghosh Y, Klenerman D, Balasubramanian S (2006) Structural analysis of the catalytic core of human telomerase RNA by FRET and molecular modeling. Biochemistry 45: 13304–13311.
25. Perez I, McAfee JG, Patton JG (1997) Multiple RRMs contribute to RNA binding specificity and affinity for polypyrimidine tract binding protein. Biochemistry 36: 11881–11890.
26. Auweter SD, Allain FH (2008) Structure-function relationships of the polypyrimidine tract binding protein. Cell Mol Life Sci 65: 516–527.
27. Deckert J, Hartmuth K, Boehringer D, Behzadnia N, Will CL, et al. (2006) Protein composition and electron microscopy structure of affinity-purified human spliceosomal B complexes isolated under physiological conditions. Mol Cell Biol 26: 5528–5543.
28. Pandey RR, Mondal T, Mohammad F, Enroth S, Redrup L, et al. (2008) Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol Cell 32: 232–246.
29. Mougin A, Gregoire A, Banroques J, Segault V, Fournier R, et al. (1996) Secondary structure of the yeast Saccharomyces cerevisiae pre-U3A snoRNA and its implication for splicing efficiency. RNA 2: 1079–1093.

PLoS Biology | www.plosbiology.org

January 2010 | Volume 8 | Issue 1 | e1000276

Xist A Region Structure and PRC2 Association

30. Jaeger JA, Turner DH, Zuker M (1989) Improved predictions of secondary structures for RNA. Proc Natl Acad Sci U S A 86: 7706–7710. 31. Zhou Z, Sim J, Griffith J, Reed R (2002) Purification and electron microscopic visualization of functional human spliceosomes. Proc Natl Acad Sci U S A 99: 12203–12207. 32. Dignam JD, Martin PL, Shastry BS, Roeder RG (1983) Eukaryotic gene transcription with purified components. Methods Enzymol 101: 582–598.

PLoS Biology | www.plosbiology.org

33. Jacquenet S, Mereau A, Bilodeau PS, Damier L, Stoltzfus CM, et al. (2001) A second exon splicing silencer within human immunodeficiency virus type 1 tat exon 2 represses splicing of Tat mRNA and binds protein hnRNP H. J Biol Chem 276: 40464–40475. 34. Sambrook J, Fritsch EF, Maniatis T (1989) Molecular cloning. A laboratory manual. New York: Cold Spring Harbor Laboratory Press.

January 2010 | Volume 8 | Issue 1 | e1000276

Poised Transcription Factories Prime Silent uPA Gene Prior to Activation

Carmelo Ferrai1,2., Sheila Q. Xie2., Paolo Luraghi1¤a, Davide Munari1¤b, Francisco Ramirez3, Miguel R. Branco2¤c, Ana Pombo2*, Massimo P. Crippa1*

1 Laboratory of Molecular Dynamics of the Nucleus, Division of Genetics and Cell Biology, S. Raffaele Scientific Institute, Milan, Italy; 2 Medical Research Council Clinical Sciences Centre, Imperial College School of Medicine, Hammersmith Hospital Campus, London, United Kingdom; 3 South Ruislip, Middlesex, United Kingdom

Abstract

The position of genes in the interphase nucleus and their association with functional landmarks correlate with active and/or silent states of expression. Gene activation can induce chromatin looping from chromosome territories (CTs) and is thought to require de novo association with transcription factories. We identify two types of factory: “poised transcription factories,” containing RNA polymerase II phosphorylated on Ser5, but not Ser2, residues, which differ from “active factories” associated with phosphorylation on both residues. Using the urokinase-type plasminogen activator (uPA) gene as a model system, we find that this inducible gene is predominantly associated with poised (S5p+S2p−) factories prior to activation and localized at the CT interior. Shortly after induction, the uPA locus is found associated with active (S5p+S2p+) factories and loops out from its CT. However, the levels of gene association with poised or active transcription factories, before and after activation, are independent of locus positioning relative to its CT. RNA-FISH analyses show that, after activation, the uPA gene is transcribed with the same frequency at each CT position. Unexpectedly, prior to activation, the uPA loci internal to the CT are seldom transcriptionally active, while the smaller number of uPA loci found outside their CT are transcribed as frequently as after induction. The association of inducible genes with poised transcription factories prior to activation is likely to contribute to the rapid and robust induction of gene expression in response to external stimuli, whereas gene positioning at the CT interior may be important to reinforce silencing mechanisms prior to induction.

Citation: Ferrai C, Xie SQ, Luraghi P, Munari D, Ramirez F, et al. (2009) Poised Transcription Factories Prime Silent uPA Gene Prior to Activation. PLoS Biol 8(1): e1000270. doi:10.1371/journal.pbio.1000270

Academic Editor: Tom Misteli, National Cancer Institute, United States of America

Received July 14, 2009; Accepted November 12, 2009; Published January 5, 2010

Copyright: © 2009 Ferrai et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The work in AP’s laboratory was funded by the Medical Research Council (UK). The work in MPC’s laboratory was supported by grants from Istituto Superiore di Sanità, Italy, Programma Malattie Rare, and Ministero dell’Istruzione, dell’Università della Ricerca, Italy, Progetto Oncologia. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Abbreviations: CAMK2G, calcium calmodulin-dependent protein kinase II gamma; CT, chromosome territory; CTD, carboxy-terminal domain; FISH, fluorescence in situ hybridization; MN, micrococcal nuclease; MN-ChIP, MN-coupled Chromatin ImmunoPrecipitation; RNAP, RNA Polymerase II; TPA, tetradecanoyl phorbol acetate; uPA, urokinase-type Plasminogen Activator; VCL, vinculin.

* E-mail: ana.pombo@csc.mrc.ac.uk (AP); crippa.massimo@hsr.it (MPC)

¤a Current address: Institute for Cancer Research and Treatment (IRCC), Torino, Italy
¤b Current address: FIRC Institute of Molecular Oncology (IFOM), Milan, Italy
¤c Current address: The Babraham Institute, Babraham Research Campus, Cambridge, United Kingdom

. These authors contributed equally to this work.

Poised Transcription Factories

January 2010 | Volume 8 | Issue 1 | e1000270

Author Summary

The spatial organization of the genome inside the cell nucleus is important in regulating gene expression and in the response to external stimuli. Examples of changing spatial organization are the repositioning of genes outside chromosome territories during the induction of gene expression, and the gathering of active genes at transcription factories (discrete foci enriched in active RNA polymerase). Recent genome-wide mapping of RNA polymerase II has identified its presence at many genes poised for activation, raising the possibility that such genes might associate with poised transcription factories. Using an inducible mammalian gene, urokinase-type plasminogen activator (uPA), and a system in which this gene is poised for expression, we show that uPA associates with poised transcription factories prior to activation. Gene activation induces two independent events: repositioning towards the exterior of its chromosome territory and association with active transcription factories. Surprisingly, genes in the interior of the chromosome territory prior to activation are less likely to be actively transcribed, suggesting that positioning at the territory interior has a role in gene silencing.

Introduction

The spatial folding of chromatin within the mammalian cell nucleus, from the level of whole chromosomes down to single genomic regions, is thought to contribute to the expression status of genes [1–3]. Mammalian chromosomes occupy discrete domains called chromosome territories (CTs) and have preferred spatial arrangements within the nuclear landscape in specific cell types, which are conserved through evolution [1–3]. Subchromosomal regions containing inducible genes, such as the MHC class II or Hox gene clusters, relocate outside their CTs upon transcriptional activation or when constitutively expressed [4,5]. Genes can preferentially associate with specific nuclear domains according to their expression status. Most notably, gene associations with the nuclear lamina largely correlate with silencing [6–9], whereas gene associations with transcription factories, discrete clusters containing many RNA polymerase II (RNAP) enzymes, have been observed only when genes are actively transcribed, but not during the intervening periods of inactivity [2]. Although CTs do not represent general barriers to the transcriptional machinery [10,11] and transcription can occur inside CTs [3,12–14], the large-scale movements of chromatin observed in response to gene induction have often been interpreted as favouring gene associations with compartments permissive for transcription [15–17]. However, inducible genes frequently display an active chromatin configuration and are primed by initiation-competent RNAP complexes prior to induction [18–21]. Complex phosphorylation events at the C-terminal domain (CTD) of the largest subunit of RNAP correlate with the initiation and elongation steps of the transcription cycle and are crucial for chromatin remodelling and RNA processing [22,23]. The mammalian CTD is composed of 52 repeats of a heptad consensus sequence, Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7; phosphorylation on Ser5 residues (S5p) is associated with transcription initiation and priming, whereas phosphorylation on Ser2 (S2p) correlates with transcriptional elongation [22,23].

To determine whether primed genes are associated with discrete RNAP sites enriched in RNAP-S5p, and to assess the functional relevance of large-scale gene repositioning in promoting associations with the transcription machinery during gene activation, we examined the expression levels, epigenetic status, nuclear position, and association with RNAP factories of an inducible gene, the urokinase-type plasminogen activator (uPA or PLAU; GeneID 5328), before and after activation. We use antibodies that specifically detect different phosphorylated forms of RNAP to investigate the association of the inducible uPA gene with transcription factories. Prior to induction, most uPA alleles are positioned inside their CT and extensively associated with RNAP sites marked by S5p. Transcriptional activation leads to looping out of the uPA locus from its CT and increased association with active transcription factories marked by both S5p and S2p. However, the extent of gene association with factories, before and after activation, is independent of the uPA position relative to its CT. Unexpectedly, we find that the majority of uPA genes positioned at the CT interior prior to activation are seldom transcribed, whereas the few uPA genes located outside the CT are active with the same frequency as the fully induced uPA genes.

Results/Discussion

Transcriptional Induction of the uPA Locus Promotes Relocation Outside Its CT

The uPA gene encodes a serine protease that promotes cell motility, and its overexpression is known to correlate with cancer malignancy and tumor invasion [24–26]. It is a 6.4 kb gene with 10 introns, and its regulatory regions have been extensively characterized [24]. uPA is located on human chromosome 10, separated from the upstream and downstream flanking genes CAMK2G (calcium calmodulin-dependent protein kinase II gamma; GeneID 818) and VCL (vinculin; GeneID 7414) by ~40 and ~80 kb, respectively (Figure 1A). In HepG2 cells, where the uPA gene is present as a single copy, its transcription can be induced through various stimuli, including treatment with phorbol esters [27]. Tetradecanoyl phorbol acetate (TPA) induces its expression by ~100-fold in HepG2 cells after 3 h of treatment (Figure 1B). The induction of the uPA gene within this short activation time occurs in all cells of the population, as shown by immunofluorescence detection of uPA protein in single cells (Figure 1C); low levels of uPA protein are detected in a small proportion (7%) of the cell population prior to activation.

We first investigated whether transcriptional induction of the uPA gene was associated with large-scale repositioning relative to its CT, using a whole chromosome 10 probe together with a BAC probe containing the uPA locus (Figure 1A). We performed fluorescence in situ hybridization on ultrathin (~150 nm) cryosections (cryoFISH), a method that preserves chromatin structure and the organisation of transcription factories. Cells are fixed using a formaldehyde fixation that is improved in comparison with standard 3D-FISH, which is particularly important for the preservation of chromatin structure and RNAP distribution [14,28]. CryoFISH also provides high detection sensitivity and high spatial resolution, especially in the z axis [14,29,30]. We find that, in the inactive state, the uPA locus is preferentially localized at the CT interior (60% of loci inside or at the inner-edge, n = 166 loci) and relocates to the exterior upon activation (55% of loci at the outer-edge or outside, n = 208 loci; χ² test, p<0.0001; Figure 1D), concomitant with the 100-fold induction of mRNA levels determined by qRT-PCR (Figure 1B). Thus, we observed a striking change in the position of the uPA locus relative to its CT upon TPA activation, which correlates with a major increase in mRNA and protein expression across the whole population of cells.

Induction of the uPA Locus and Its Nuclear Relocation Are Independent of Local Chromatin Decondensation

Chromatin repositioning in response to gene activation has often been associated with changes in chromatin structure and degree of condensation [15,17]. To establish whether the large-scale relocation of the uPA locus during its transcriptional activation was accompanied by a switch from a closed to an open chromatin conformation, we next assessed the chromatin structure of the uPA gene before and after TPA treatment (Figure 2). Micrococcal nuclease (MN) digestion of crosslinked, sonicated chromatin yields a decreasing nucleosomal ladder before and after TPA activation (Figure 2A and images unpublished, respectively; [31]). Systematic PCR amplification at and around the uPA regulatory regions revealed two populations of genomic DNA fragments that resist processive cleavage at long digestion times (50 min; see also [31]). At the enhancer, the fragments are typically mononucleosomal in size (~150 bp; fragment E1; Figure 2B, 2C). At the promoter, the protected fragments are larger (>300 bp; fragments P and Px; Figure 2B, 2C). This feature is consistent with the presence of RNAP-containing complexes at the promoter, which were previously observed at the transcriptionally active uPA gene in constitutively expressing cells but were absent after α-amanitin treatment [31]. The same population of larger promoter fragments was also detected in uninduced cells (Figure 2C), showing that the uPA gene displays transcription-associated features before activation. This was supported by an investigation of the epigenetic status of chromatin before and after activation. High-resolution, MN-coupled chromatin immunoprecipitation (MN-ChIP; [31]) using antibodies specific for histone modifications associated with closed (H3K9me2) or open (H3K4me2, H3K9ac, H3K14ac) chromatin [32] showed the presence of active, but not silent, chromatin marks at the promoter and enhancer of the uPA gene, both before and after TPA induction (Figure 2D). Positive detection of H3K9me2 was confirmed at the imprinted H19 gene (GeneID 283120; Figure S1). Upon activation, the larger promoter fragment (P) is no longer detected with H3K14ac antibodies, although this mark is still present at the smaller promoter fragments (uP and dP), and an enrichment of H3K9ac at the E5 fragment on the enhancer was also detected (Figure 2D). These changes are likely to reflect the presence of different populations of resistant fragments at the uPA regulatory regions upon induction. Taken together, these findings show that the uPA gene adopts an open chromatin state before transcriptional activation, which is maintained after induction. The large-scale chromatin repositioning of the uPA locus relative to its CT (Figure 1D) therefore cannot be explained by a change from closed to open chromatin conformation.

Figure 1. Activation of the uPA gene by TPA treatment induces large-scale chromatin repositioning in the nucleus of HepG2 cells. (A) Diagram illustrating the genomic context of the uPA locus and the genomic region detected by the BAC probe (RP11-417O11; ~228 kb) used for FISH experiments. CAMK2G, calcium-calmodulin-dependent protein kinase II gamma; VCL, vinculin. Arrows indicate the 5′–3′ transcription direction. (B) Kinetics of the transcriptional induction of the uPA gene with TPA. uPA RNA expression was assessed by quantitative RT-PCR after treatment with TPA for different times. Values were normalized to 18S rRNA and expressed relative to uninduced cells. (C) Induction of uPA protein expression with TPA. Indirect immunofluorescence analyses of uPA protein expression (red) before and after TPA treatment (3 h). Nuclei were counterstained with DAPI (blue). The proportion of uPA-expressing cells is indicated (n = 188 and 144 for untreated and treated cells, respectively). Bar: 20 µm. (D) TPA induces large-scale repositioning of the uPA locus (green) to the exterior of its own chromosome 10 territory (CT10; red). The position of the uPA locus relative to CT10 was determined in HepG2 cells, before and after TPA activation for 3 h, by cryoFISH using a whole chromosome 10 paint (red) and the digoxigenin-labelled BAC probe (green) represented in (A). Nucleic acids were counterstained with TOTO-3 (blue). Arrowheads indicate uPA loci. The positions of uPA loci relative to the CT were scored as “inside,” “inner-edge,” “edge,” “outer-edge,” and “outside”; error bars represent standard deviations. Bars: 2 µm. doi:10.1371/journal.pbio.1000270.g001
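Figure 1B reports uPA RNA levels normalized to 18S rRNA and expressed relative to uninduced cells, but the text does not state the calculation. A widely used option for this kind of normalization is the comparative Ct (2^−ΔΔCt) method, which assumes equal, near-100% amplification efficiency for the target and reference amplicons. The sketch below illustrates that method only, under that assumption; the function name and the Ct values in the example are hypothetical, not data from this study.

```python
def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_calib: float, ct_ref_calib: float,
                        efficiency: float = 2.0) -> float:
    """Fold change of a target transcript by the comparative Ct method.

    ct_target, ct_ref: Ct of the target (e.g. uPA) and of the reference
        (e.g. 18S rRNA) in the treated sample.
    ct_target_calib, ct_ref_calib: the same two Ct values in the
        calibrator (e.g. uninduced cells).
    efficiency: assumed fold amplification per cycle (2.0 = 100%),
        taken to be identical for target and reference.
    """
    delta_sample = ct_target - ct_ref               # normalize to reference
    delta_calib = ct_target_calib - ct_ref_calib    # same for calibrator
    delta_delta = delta_sample - delta_calib        # relative to calibrator
    return efficiency ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold ~6.64 cycles earlier
# after treatment while the reference is unchanged.
fold = relative_expression(ct_target=20.36, ct_ref=10.0,
                           ct_target_calib=27.0, ct_ref_calib=10.0)
print(round(fold))  # prints 100
```

Because each cycle doubles the product under this assumption, a ~100-fold change corresponds to a shift of log2(100) ≈ 6.64 cycles; if efficiencies differ between amplicons, an efficiency-corrected model would be needed instead.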

Figure 2. The regulatory regions of the uPA locus display an open chromatin conformation before and after TPA activation. (A) Micrococcal nuclease (MN) digestion progressively cleaves cross-linked, sonicated bulk chromatin to mononucleosomes. Material from the 50 min digestion time point was used for MN-ChIP experiments. (B) Diagram illustrating the regulatory region of the uPA gene, including the enhancer (E), promoter (P), and the position of amplified genomic fragments at and around the enhancer and promoter regions. Roman numerals indicate exons in the coding region (white and black boxes represent untranslated and translated regions, respectively). (C) Chromatin-associated features of the uPA enhancer and promoter regions as revealed by PCR amplification patterns of MN-digested chromatin DNA before and after transcriptional activation. HepG2 cells were grown ±TPA for 3 h before chromatin preparation. Enhancer fragment 1 (E1), but not E2 or E3, is resistant to MNase cleavage after 50 min of digestion. Two promoter fragments larger than a single nucleosome (P and Px, 320 bp and 464 bp, respectively) are detected at the same digestion time point in the promoter region, consistent with protection due to additional bound proteins such as transcription factors and RNAP [31]. (D) MN-ChIP experiments identify histone modifications associated with open chromatin at the enhancer and promoter regions before and after TPA activation. Chromatin was cross-linked, sonicated, and treated with MN for 50 min, prior to immunoprecipitation with antibodies against specific histone modifications associated with open (H3K4me2, H3K9ac, H3K14ac) or closed (H3K9me2) chromatin. Control immunoprecipitations were performed with polyclonal anti-uPA receptor (uPAR) antibodies (unrelated). To improve the resolution of MN-ChIP in the promoter region, two smaller, overlapping genomic fragments spanning the entire P fragment (uP and dP, 140 and 199 bp, respectively) were also amplified. doi:10.1371/journal.pbio.1000270.g002

Inactive uPA Genes Are Associated with Poised Transcription Factories Prior to Induction

The presence of RNAP phosphorylated on Ser5 residues at the promoter regions of silent genes defines them as paused or poised genes [21–23]. To determine whether the inactive uPA gene was associated with RNAP prior to induction, we used MN-ChIP and antibodies specific for phosphorylated forms of RNAP that can discriminate between active and paused/poised RNAP complexes [21–23]. Antibody specificity has been extensively characterized previously [21,33] and was confirmed in HepG2 cells using Western blotting and immunofluorescence (Figure S2). MN-ChIP detected the initiating (S5p) but not the elongating (S2p) form of RNAP at the promoter and enhancer regions of the uPA gene prior to induction (Figure 3A), demonstrating that the uPA gene is primed with RNAP before activation. The detection of RNAP-S5p at the enhancer (Figure 3A) can be explained by an interaction with RNAP bound at the promoter, as previously observed in constitutively active uPA genes [31].

Previous studies describing the presence of RNAP at the promoters of paused or poised genes did not investigate a possible association with transcription factories marked specifically by the S5p modification [18,19,21,34]. Using immuno-cryoFISH [14], we asked whether the association of primed uPA loci with RNAP-S5p could be detected at the single-cell level and whether it occurs within specific nuclear substructures. Using a BAC probe covering a genomic region centred on the uPA gene (Figure 1A) in combination with immunolabelling of the S5p or S2p forms of RNAP, we found that the vast majority of uPA loci were associated with sites containing RNAP-S5p prior to activation (87%±9%, n = 165 loci; Figure 3B, 3C). A significantly lower proportion was associated with RNAP-S2p foci (31%±5%, n = 170 loci; χ² test, p<0.0001; Figure 3C). The primed uPA loci are therefore preferentially associated with a subpopulation of RNAP factories that contain RNAP-S5p, but not RNAP-S2p, prior to TPA activation. We call these sites “poised,” or S5p+S2p−, transcription factories.

Scoring criteria for gene association with RNAP sites, typically used in the analyses of 3D-FISH results, often rely on proximity criteria that do not require true physical association and are sensitive to the limited z-axis resolution (>500 nm) of standard confocal microscopes.
This is particularly important when analysing highly abundant structures such as transcription factories, which can exist at densities of 20/µm³ [35]. Although the use of ultrathin (~150 nm) cryosections mostly detects single factories [29], we were still concerned that the extent of uPA gene association with transcription factories marked by S5p or S2p observed experimentally (Figure 3C) could reflect the different abundance of the two modifications and might be explained, at least in part, by random processes. To assess the impact of these two constraints, we generated one simulated uPA signal for each experimental image, with the same number of pixels as the experimental signal but positioned at random coordinates within the nucleoplasm. Next, we measured the frequency of association of the randomly positioned loci with RNAP-S5p or -S2p sites (Figure S3B, S3C). The association of randomly positioned BAC signals with S5p was 54%±8% (Figure S3C; n = 68 loci), significantly lower than the experimental value of 87%±9% for the uPA locus (Figure 3C; χ² test, p<0.0001). In contrast, the association of randomly positioned signals with S2p was 39%±5% (n = 69 loci), similar to the experimental value of 31%±5% (χ² test, p = 0.29; Figure S3C). One caveat of these analyses is that observing similar levels of association for experimental and simulated loci with transcription factories prior to activation cannot be used to argue that this association is not specific, but only that it is as low as it would be if loci were positioned randomly. In summary, our results show that the association of a large proportion of uPA loci with poised, S5p+S2p− factories before activation is specific, although the nuclear environment where the uPA loci are located is not devoid of active, S5p+S2p+ transcription factories and therefore seems to be permissive for transcription.
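The randomization control described above can be sketched as follows. This is not the authors' image-analysis code: it assumes binary 2D masks for the nucleoplasm and for the RNAP sites, represents each locus as a set of pixel coordinates, moves the locus shape by rigid translation, and applies the scoring rule given in the Figure 3 legend (a locus is "associated" if it overlaps an RNAP site by at least one pixel). The function names and the toy masks are hypothetical.

```python
import random
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def is_associated(locus: Set[Pixel], sites: Set[Pixel]) -> bool:
    """Scoring rule: associated if the two signals share at least one pixel."""
    return not locus.isdisjoint(sites)


def random_placement(locus: Set[Pixel], nucleus: Set[Pixel],
                     rng: random.Random, max_tries: int = 10_000) -> Set[Pixel]:
    """Translate the locus shape to a random position fully inside the nucleus.

    The simulated signal keeps the same number of pixels as the real one.
    """
    min_r = min(r for r, _ in locus)
    min_c = min(c for _, c in locus)
    shape = [(r - min_r, c - min_c) for r, c in locus]
    anchors = sorted(nucleus)  # candidate positions for the shape origin
    for _ in range(max_tries):
        r0, c0 = rng.choice(anchors)
        moved = {(r0 + dr, c0 + dc) for dr, dc in shape}
        if moved <= nucleus:  # reject placements that leave the nucleoplasm
            return moved
    raise RuntimeError("could not place locus inside the nucleus")


def simulated_association(loci: List[Set[Pixel]], nuclei: List[Set[Pixel]],
                          site_masks: List[Set[Pixel]], seed: int = 0) -> float:
    """Fraction of randomly placed loci (one per image) that overlap a site."""
    rng = random.Random(seed)
    hits = 0
    for locus, nucleus, sites in zip(loci, nuclei, site_masks):
        hits += is_associated(random_placement(locus, nucleus, rng), sites)
    return hits / len(loci)


# Toy example: 20x20 "nucleus", RNAP sites covering its left half,
# and a 4-pixel locus; the same image is reused 200 times.
nucleus = {(r, c) for r in range(20) for c in range(20)}
locus = {(5, 5), (5, 6), (6, 5), (6, 6)}
sites = {(r, c) for r in range(20) for c in range(10)}
frac = simulated_association([locus] * 200, [nucleus] * 200, [sites] * 200)
print(round(frac, 2))
```

In this toy geometry 190 of the 361 valid placements touch the left half, so the printed fraction should lie near 10/19 ≈ 0.53, with sampling noise. Comparing such a fraction against the experimental one (for example with a χ² test) is what distinguishes specific association from the background expected from site abundance.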

The uPA Gene Associates with Active Transcription Factories upon Activation

To investigate the active state of the uPA gene and the engagement of the locus with active factories following transcriptional induction, we repeated the MN-ChIP and immuno-cryoFISH analyses for TPA-treated cells (Figure 3D–F). MN-ChIP showed that the enhancer and the promoter of the uPA gene are associated with the elongating (S2p) form of RNAP, either together with RNAP-S5p (at the promoter) or exclusively (at the enhancer; Figure 3D). The absence of RNAP-S5p at the enhancer fragments analysed, in the presence of S2p, suggests that the enhancer may maintain an association with RNAP as it moves through the coding region during elongation, where S5p is known to decrease and S2p to increase [23]. Immuno-cryoFISH after TPA induction (Figure 3E, 3F) showed that the activated uPA locus now becomes associated with RNAP-S2p sites (72%±7% of loci, n = 183), while maintaining an association with RNAP-S5p (71%±8%, n = 140 loci; χ² test, p = 0.98), consistent with the MN-ChIP results. The more than 2-fold increase in association with RNAP-S2p, from 31% before to 72% after TPA treatment, was highly statistically significant (χ² test, p<0.0001). Evaluation of simulated uPA loci positioned at random coordinates in the same experimental images showed that the increased association of the uPA locus with S2p observed after activation (72%) cannot be explained by random processes, as the frequency of association of simulated loci remained at 38%±2% (n = 75 loci; χ² test, p<0.0001; Figure S3B, S3C). An increased association with S2p sites upon activation has also recently been described for the Hoxb1 gene in mouse embryonic stem cells, albeit at lower frequency [36]. Taken together, we show that the activation of the uPA gene and the large-scale repositioning of the locus relative to its CT coincide with the acquisition of the S2p modification of RNAP without major changes in chromatin structure.
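Each χ² test in this section compares two association frequencies measured on separate sets of loci. A minimal sketch of a Pearson χ² test on a 2×2 table (associated vs. not associated, condition A vs. condition B), without continuity correction, is shown below; for one degree of freedom the upper-tail p-value equals erfc(√(χ²/2)). The paper reports percentages rather than raw counts, so the counts in the example are reconstructed by rounding percentage × n and are approximate; whether the authors used a continuity correction is not stated.

```python
import math


def chi2_2x2(a_hits: int, a_n: int, b_hits: int, b_n: int) -> tuple[float, float]:
    """Pearson chi-square test on a 2x2 table (1 d.f., no continuity correction).

    Compares the proportion a_hits/a_n with b_hits/b_n and returns
    (chi-square statistic, upper-tail p-value).
    """
    observed = [[a_hits, a_n - a_hits], [b_hits, b_n - b_hits]]
    total = a_n + b_n
    row_sums = [a_n, b_n]
    col_sums = [a_hits + b_hits, total - a_hits - b_hits]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def counts_from_percent(percent: float, n: int) -> int:
    """Approximate number of associated loci from a reported percentage."""
    return round(percent / 100 * n)


# uPA association with S2p sites: 31% of 170 loci before TPA vs 72% of 183 after.
before = counts_from_percent(31, 170)   # 53
after = counts_from_percent(72, 183)    # 132
chi2, p = chi2_2x2(before, 170, after, 183)
print(f"chi2 = {chi2:.1f}, p = {p:.1e}")
```

For these reconstructed counts the statistic comes out close to 59 and p is far below 0.0001, in line with the reported result. The approximation is reasonable here because every expected cell count is well above 5.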

Figure 3. Inactive uPA loci associate with poised transcription factories rich in the initiating form of RNAP (S5p), prior to activation. (A, D) MN-ChIP analyses detect the initiating (S5p) form of RNAP bound at the uPA promoter and enhancer before TPA activation (A). Following activation, the enhancer is associated with the elongating form of RNAP (S2p), while the promoter is associated with both forms (S5p and S2p) of the enzyme (D). HepG2 cells were grown ±TPA for 3 h before chromatin preparation and MN-ChIP with antibodies specific for S5p and S2p forms of RNAP. Control (unrelated) antibodies were polyclonal anti-uPA receptor antibodies. (B, E) The position of the uPA locus (red) relative to S5p and S2p sites (green) was determined by immuno-cryoFISH before (B) and after (E) TPA activation for 3 h, using a rhodamine-labelled BAC probe containing the uPA locus and antibodies specific for RNAP phosphorylated on S5 or S2 residues. The association of uPA genes with RNAP (S5p or S2p) was scored as “associated” (signals overlap by at least 1 pixel) or “separated” (signals do not overlap or are adjacent; see Figure S8 for additional examples). Arrowheads indicate the position of uPA loci. Nucleic acids were counterstained with TOTO-3 (blue). Bars: 2 µm. (C, F) Frequency of association of uPA loci with RNAP-S5p or -S2p sites before (C) and after (F) TPA activation. The decrease in association with S5p factories and the increase in association with S2p factories observed after activation were both statistically significant (χ² tests, p = 0.0006 and p<0.0001, respectively). doi:10.1371/journal.pbio.1000270.g003

Factories Containing S2p Are Also Marked by the S5p Modification and Are Less Abundant Than Sites Marked by S5p

The striking agreement between the numbers of uPA loci associated with RNAP factories marked by S5p and by S2p upon activation, together with the co-existence of the two RNAP modifications detected by MN-ChIP at the promoter, suggests that active factories contain both modifications, as expected from concomitant initiation and elongation events at the promoter and coding regions of highly active genes. To further investigate whether poised transcription factories marked by S5p alone are distinct from the active factories marked by S2p, we compared the numbers of RNAP-S5p and RNAP-S2p sites in HepG2 cells before and after induction (Figure 4A–C). RNAP sites marked by S5p are significantly more abundant than RNAP sites marked by S2p (28% excess) both before and after activation (Student t test, p<0.0001 in both cases; Figure 4A–C), suggesting that a considerable number of transcription factories adopt the poised state. The excess number of sites containing S5p in the absence of S2p (Figure 4A–C) is consistent with recent reports identifying an abundance of primed genes [34,37–39] marked by RNAP-S5p and not RNAP-S2p [21] in embryonic stem cells or differentiated cells.

To investigate to what extent S2p sites are also marked by S5p, we used an antibody-blocking assay ([30,40]; Figure 4D–G), in which sections were first incubated with antibodies against RNAP-S5p before incubation with antibodies against RNAP-S2p. Simultaneous incubation with the two antibodies resulted in ~65% quenching of the detection of RNAP-S2p (images unpublished), due to more efficient binding of 4H8, an IgG, in comparison with H5, an IgM. The rationale of antibody-blocking experiments is that binding of the first antibody prevents binding by the second if the two epitopes are located within the distance corresponding to the size of the first bound antibody complex (Ser2 and Ser5 residues are separated by two amino acids, whereas IgGs are large proteins that measure ~9 nm). After pre-incubation with the specific S5p antibody (4H8), the overall intensity of RNAP-S2p sites was significantly reduced throughout the nucleoplasm, except in discrete interchromatin domains (Figure 4E, 4G), as compared to sections not incubated with this antibody (Figure 4D, 4G) or to sections incubated with an unrelated (anti-biotin) antibody (Figure 4F, 4G). A transcriptionally silent population of RNAP-S2p complexes is known to accumulate stably in splicing speckles [33,41], which are nuclear domains enriched in splicing machinery and polyA+ RNAs and may be important for post-transcriptional splicing of complex RNAs [42]. The reverse antibody-blocking experiment also confirmed the colocalisation between S2p and S5p sites but produced lower levels of signal depletion (unpublished), as expected from the greater abundance of S5p sites (Figure 4C). The results from antibody-blocking experiments suggest that most nucleoplasmic S2p sites outside interchromatin clusters also contain the S5p modification, as expected for simultaneous initiation and elongation events on the same gene during cycles of active transcription. Furthermore, S5p-containing structures are in excess of the active factories, demonstrating the presence of discrete sites marked solely by S5p, which represent poised, S5p+S2p− transcription factories.
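Two quantities in this section are simple ratios: the excess of S5p over S2p sites (reported as 28%) and the loss of S2p signal after pre-incubation with the S5p antibody. The text does not give the formulas, so the sketch below assumes that the excess is (mean S5p count − mean S2p count) / mean S2p count per nucleus, and that depletion is the fractional drop in mean S2p intensity relative to control sections. It also computes the pooled-variance Student t statistic for per-nucleus counts; converting t to a p-value requires the t distribution (for example from SciPy), which is not used here. All numbers in the example are invented for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def percent_excess(s5p_counts: Sequence[float], s2p_counts: Sequence[float]) -> float:
    """Excess of mean S5p site number over mean S2p site number, in %.

    Assumption: the excess is expressed relative to the S2p mean.
    """
    return 100 * (mean(s5p_counts) - mean(s2p_counts)) / mean(s2p_counts)


def percent_depletion(control: Sequence[float], blocked: Sequence[float]) -> float:
    """Drop in mean signal intensity after antibody pre-incubation, in %."""
    return 100 * (mean(control) - mean(blocked)) / mean(control)


def student_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Student t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))


# Invented per-nucleus site counts and intensities, for illustration only.
s5p = [64, 66, 62, 64]
s2p = [50, 52, 48, 50]
print(percent_excess(s5p, s2p))                   # 28.0
print(percent_depletion([100, 100], [35, 35]))    # 65.0
print(round(student_t(s5p, s2p), 2))              # 12.12
```

Expressing the excess relative to the S2p mean is a choice: relative to the S5p mean, the same counts give a smaller figure (about 22%), so the denominator matters when comparing with the reported 28%.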

uPA Flanking Genes, CAMK2G and VCL, Are Transcriptionally Active and Associated with Active Transcription Factories Independently of TPA Activation

We have shown large-scale repositioning of the uPA locus following TPA treatment of HepG2 cells (Figure 1D). The uPA gene lies only a short genomic distance from its neighbouring genes, CAMK2G and VCL (40 kb and 80 kb, respectively; Figure 5A). This led us to investigate in more detail how TPA treatment affects the transcriptional state of the three genes, by comparing the levels of unprocessed transcripts and their association with S5p and S2p factories (Figure 5). The levels of primary transcripts produced before and after TPA treatment were determined by qRT-PCR with primers that amplify the exon1-intron1 junction, using total RNA extracted from HepG2 cells (Figure 5B). In parallel, cells were treated with α-amanitin, an inhibitor of RNAP transcription, to discriminate newly made from stable transcripts. Abundant detection of primary transcripts above α-amanitin levels shows that CAMK2G and VCL are actively transcribed prior to TPA activation, whereas uPA is only weakly transcribed (Figure 5B; see also Figure 1B, 1C). Upon TPA treatment, CAMK2G primary transcripts decrease by 2.8-fold and VCL primary transcripts increase by 1.5-fold, whereas uPA primary transcripts increase by ~11-fold (Mann-Whitney U test, p = 0.05 for the three genes; n = 3 independent replicates).

Analysis of spliced transcripts of CAMK2G and VCL confirms their active state prior to TPA induction and demonstrates similar effects upon activation (unpublished). Interestingly, low levels of uPA primary transcripts sensitive to α-amanitin treatment are detected prior to activation (Figure 5B). This is consistent with the detection of uPA protein in a small percentage of HepG2 cells before TPA treatment (Figure 1C). The small and disparate changes in the RNA levels of the two genes flanking uPA are in line with a recent investigation of the Hoxb cluster in mouse ES cells [36]. In that study, the Cbx1 gene, 400 kb downstream of the Hoxb cluster, does not change expression levels in spite of increased chromatin repositioning relative to the CT. Here, however, the changes occur over much shorter genomic distances. The behaviour of the uPA flanking genes also agrees with a broader analysis of expression changes across a whole 300 kb region that undergoes repositioning in response to transgenic integration of the β-globin locus-control region in mice, where the expression levels of many genes do not change between the two states [43]. The levels of primary transcripts at each gene in the locus before and after TPA induction may depend on complex parameters, such as the frequency and speed of RNAP elongation, the stability of unprocessed transcripts, and the rate of intron splicing. We therefore investigated whether TPA activation influenced the levels of association of each gene with S5p and S2p sites, using fosmid probes that cover ~42–46 kb of genomic sequence (Figure 5A). Measurements of signal diameters yielded average values of 353 nm for the uPA fosmid probe, in comparison with 586 nm for the BAC probe, which demonstrates a significant improvement in spatial resolution. We find that CAMK2G and VCL are extensively associated with both S5p and S2p sites, to a similar extent, irrespective of TPA treatment (association frequency between 62% and 82%; Figure 5C, 5D).

Importantly, the relatively small changes in the levels of CAMK2G and VCL primary transcripts upon TPA treatment (Figure 5B) are not reflected by detectable changes in their association with either S5p or S2p sites. This suggests that TPA activation does not influence the extent of CAMK2G and VCL association with the transcription machinery. Their state of activity is thus unlikely to have a major role in the relocation of the uPA locus from its territory. Similar analyses of uPA gene association with S5p and S2p sites using a fosmid probe (Figure 5C, 5D) confirm the results obtained with the larger BAC probe (Figure 3). Prior to induction, the gene is extensively associated with S5p sites (79% ± 5%; n = 91; Figure 5C), but not with S2p sites (36% ± 6%; n = 254; χ² test, p < 0.0001; Figure 5D). Upon activation, the uPA fosmid probe associates with S5p and S2p sites to a similar extent (75% ± 9% and 71% ± 5%, n = 93 and 225, respectively; χ² test, p = 0.48; Figure 5C, 5D) and at the same levels observed with the BAC probe (~70%; Figure 3F). Analyses of simulated fosmid signals (Figure S3D, S3E) support the notion that random processes do not explain the association of fosmid signals with S5p sites prior to induction, or with both S5p and S2p sites after activation. The χ² test comparisons between experimental and simulated association gave p(S5p, −TPA) = 0.0007, p(S5p, +TPA) = 0.014, and p(S2p, +TPA) < 0.0001. By contrast, random processes can explain the association with S2p sites prior to induction (p(S2p, −TPA) = 0.62).

PLoS Biology | www.plosbiology.org

January 2010 | Volume 8 | Issue 1 | e1000270

Poised Transcription Factories

Figure 4. Factories containing S2p are also marked by the S5p modification and are less abundant than sites marked by S5p. (A–C) RNAP-S5p sites are more abundant than RNAP-S2p sites. Cryosections (~140 nm thick) from HepG2 cells grown ±TPA for 3 h were indirectly immunolabelled with antibodies specific for RNAP-S5p or -S2p (green), as indicated (A, B). Nuclei were counterstained with TOTO-3 (red). Representative images from TPA-treated cells are shown. RNAP-S5p and -S2p detection was optimised by using the highest antibody concentrations that gave little detectable background in sections treated with alkaline phosphatase (see Figure S2E, S2F). Bars: 2 µm. Measurement of the number of S5p and S2p sites per unit area in the nucleoplasm (C) reveals a larger population of S5p than S2p sites, both before and after activation (Student t-test, p < 0.0001 for both cases). The numbers of nuclear sections analysed were n = 45 and 41 for S5p, and n = 38 and 46 for S2p, for − and +TPA, respectively. The decrease in S5p sites after TPA activation is statistically significant (p = 0.012), whereas no statistically significant difference was observed in the number of S2p sites (p = 0.44). (D–F) Most S2p sites also contain the S5p modification. Cryosections were first indirectly immunolabelled in the absence (D) or presence (E) of an antibody against RNAP-S5p (4H8; red), or in the presence of an unrelated anti-biotin antibody (F; red) that detects mitochondria in the cytoplasm (F; arrowheads). After formaldehyde cross-linking to preserve the first immunocomplex, sections were indirectly immunolabelled with an antibody against RNAP-S2p (H5; green). Nucleic acids were counterstained with TOTO-3 (insets), and confocal images were collected using the same settings without signal saturation. Bars: 2 µm. Pre-incubation with 4H8 reduces the intensity of the S2p signal throughout the nucleoplasm, except at interchromatin regions (E, arrows), in comparison to control samples incubated in the absence of 4H8 (D). Incubation with the anti-biotin control antibody before labelling with the H5 antibody (F) has no effect on S2p distribution throughout the nucleoplasm. (G) Measurements of average S2p intensity across the nucleoplasm show a ~3-fold decrease in S2p detection after blocking with 4H8. Omission of 4H8 or pre-incubation with unrelated antibodies does not affect the level of the S2p signal in the nucleoplasm. The dotted line indicates the background intensity in the S2p channel, measured from sections incubated with all antibodies except H5 (−H5; images unpublished). The number of nuclear profiles was >20 for each sample. doi:10.1371/journal.pbio.1000270.g004
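Panel C compares the number of S5p and S2p sites per unit nucleoplasmic area across groups of nuclear sections using a Student t-test. The sketch below computes the classical pooled-variance (Student) two-sample t statistic in plain Python. The density values are made-up placeholders. Converting t to a p-value requires the t distribution, so that step is left out.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Student's two-sample t statistic with pooled (equal) variance, and its df."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, n_a + n_b - 2


# Hypothetical densities (sites per square micrometre), one value per nuclear section.
s5p_density = [4.1, 3.8, 4.5, 4.0, 4.3]
s2p_density = [2.2, 2.6, 2.0, 2.4, 2.3]
t_stat, df = pooled_t(s5p_density, s2p_density)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=True)` returns the same statistic together with a two-sided p-value.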

We were surprised to find that, prior to induction, the smaller fosmid probe associates with S2p sites to the same extent as the larger BAC probe, which also covers the active flanking genes CAMK2G and VCL (36% and 31%, respectively; χ² test, p = 0.26). Simultaneous detection of fosmid and BAC probes in combination with S2p detection (Figure S4B, S4C) confirmed that fosmid and BAC probes associate with S2p sites to a similar extent prior to activation (37% ± 9% and 28% ± 9%, respectively; n = 46; χ² test, p = 0.37; Figure S4C). Unexpectedly, whilst performing these analyses, we observed that a small proportion of uPA loci detected with the fosmid probe had looped out from the signal labelled by the BAC probe, independently of TPA activation (15% ± 3% and 20% ± 8%, n = 47 and 41 for − and +TPA, respectively; χ² test, p = 0.57; Figure S4D). This behaviour is reminiscent of loci looping out from their CTs, but on a much smaller genomic length scale ([4,5]; Figure 1D). This mechanism provides a rationale for the independent behaviour of neighbouring genes with respect to their association with specific nuclear landmarks, such as the association with specific RNAP structures shown here. The fosmid-based analyses allowed us to confirm at higher spatial resolution that the uPA gene is preferentially associated with a subpopulation of RNAP factories which, prior to induction, contain RNAP-S5p but not RNAP-S2p. After induction, the uPA locus is highly associated with both S5p- and S2p-containing RNAP sites, consistent with its active state.
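Most of the frequency comparisons in this part of the study are χ² tests on 2×2 tables of counts (associated versus separated, before versus after TPA, or one probe versus another). The sketch below computes the Pearson χ² statistic without continuity correction, together with its p-value for one degree of freedom. It relies on the identity P(χ²₁ > x) = erfc(√(x/2)). The counts are hypothetical. Because the published percentages are rounded, counts back-calculated from them need not reproduce the reported p-values.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no Yates correction) for the 2x2 table [[a, b], [c, d]].

    Rows: conditions (e.g. -TPA, +TPA); columns: associated, separated.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    # Closed form of sum((observed - expected)^2 / expected) for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, chi2 = z^2, so the upper tail probability is erfc(sqrt(chi2 / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts of loci associated / separated before and after induction.
chi2, p = chi2_2x2(a=30, b=70, c=60, d=40)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

For reference, `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default (`correction=True`), which yields a smaller statistic. The text does not say which variant was used here.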

TPA Induction of Gene Expression Is Associated with an Increase in the Number of Active uPA Alleles

To investigate whether the extent of uPA gene association with active transcription factories reflects its transcriptional activity, we combined detection of the uPA locus by DNA-FISH with visualisation of uPA transcripts by RNA-FISH. The RNA-FISH used five tagged oligoprobes mapping to introns 4, 5, and 10, and exons 8 and 9. We find that the locus is already transcriptionally active prior to activation, with 13% ± 8% of uPA alleles associated with RNA-FISH signals (n = 200 loci; Figure 6A). This is consistent with the detection of uPA protein and transcripts before induction (Figures 1C and 5B, respectively). RNase control experiments confirmed the specificity of the discrete RNA-FISH signals observed within the nucleus (Figure S5). The frequency of active alleles increases 2-fold, to 27% ± 10%, after TPA treatment (n = 216; χ² test comparison for − and +TPA, p = 0.0022; Figure 6A). This agrees with the 2-fold increase observed in the extent of uPA association with RNAP-S2p (Figure 4B). The extent of uPA gene association with RNA signals after activation (27%) is consistently smaller than its association with S2p sites (70%; Figure 5D). However, the efficiency of detecting newly made transcripts at the site of transcription depends on several factors: the abundance of RNAP loading at each single gene, the stability of newly made transcripts at the site of synthesis, and the rate of splicing. It is therefore likely to provide a lower estimate for the frequency of gene activity. Introns in the uPA gene are at most ~900 bp long, and small introns can be removed within seconds of synthesis. In contrast, the level of uPA gene association with S2p sites can provide a higher estimate. The caveat is that these levels of association may also partly reflect indirect colocalisation of uPA loci with transcription factories associated with the flanking genes.

We next asked whether detection of newly synthesized uPA transcripts at the uPA locus occurs concomitantly with its association with active, S2p factories. To address this, we performed triple labelling experiments that simultaneously detected the uPA locus, uPA transcripts, and active S2p factories (Figure 6B). Most uPA loci associated with an RNA signal are also associated with S2p sites, both before and after TPA treatment (76% and 71%, n = 70 and 75, respectively; χ² test, p = 0.80; Figure 6B). This confirms that S2p sites are active sites of transcription. The uPA gene associates more frequently with S2p sites than with RNA-FISH signals. As expected from this difference, only half of all uPA loci associated with S2p sites (~70%) are also associated with uPA transcripts (unpublished; see also [36]). This difference is likely to reflect technical limitations in detecting transcripts of short genes that contain only small introns. Our analyses of uPA gene association with different phosphorylated forms of RNAP and with newly made transcripts show that, prior to activation, the vast majority of uPA alleles are associated with poised S5p+S2p− transcription factories. We also identify a small population of alleles that are transcribed at low levels prior to activation and are predominantly associated with sites marked by S2p (Figures 5B and 6B). Upon activation, uPA alleles become associated with RNAP sites marked by both S2p and S5p. We consistently find a 2-fold increase in the association of the uPA gene with S2p sites or uPA transcripts (Figures 3C, 3F, 5D, and 6A), indicating an increased frequency of transcription upon TPA induction.

Figure 5. CAMK2G and VCL genes are transcriptionally active and associated with active transcription factories independently of TPA activation. (A) Diagram depicting the uPA locus and the location of the fosmid probes used to detect the CAMK2G (covering ~46 kb of genomic sequence), uPA (~44 kb), and VCL (~42 kb) genes in cryoFISH experiments. Arrows indicate the 5′-3′ transcription direction. (B) Detection of primary transcripts for the CAMK2G, uPA, and VCL genes in HepG2 cells ±TPA, incubated in the presence or absence of the RNAPII inhibitor α-amanitin. Total RNA was extracted and the levels of primary transcripts were determined by qRT-PCR using primers that amplify the exon1-intron1 junctions. Error bars represent the standard deviation of three independent replicates. (C, D) Frequency of association of CAMK2G, uPA, and VCL with RNAP-S5p (C) or -S2p (D) sites before and after TPA treatment for 3 h. The position of each fosmid signal relative to S5p and S2p sites was determined by immuno-cryoFISH, using rhodamine-labelled fosmid probes and antibodies specific for RNAP phosphorylated on S5 or S2 residues (images unpublished). The association of uPA and its flanking genes (CAMK2G and VCL) with RNAP (S5p or S2p) was scored as “associated” (signals overlap by at least 1 pixel) or “separated” (signals do not overlap or are adjacent, as in Figure 3B, 3E). Before and after activation, all genes across the locus associate with S5p sites to a similar extent (C; n(uPA) = 91 and 93; n(CAMK2G) = 95 and 95; n(VCL) = 90 and 92, for − and +TPA, respectively). CAMK2G and VCL are also associated with S2p sites independently of TPA treatment (D; n(CAMK2G) = 104 and 121; n(VCL) = 79 and 86, for − and +TPA, respectively). In contrast, the association of the uPA gene with S2p sites increases specifically upon TPA induction (D; n(uPA) = 254 and 225). This increase in association with S2p factories after activation was statistically significant (χ² test, p < 0.0001).
doi:10.1371/journal.pbio.1000270.g005

Figure 6. The induction of uPA gene expression is associated with an increase in the number of active alleles. (A) TPA induction of uPA transcripts at the transcription site. uPA gene transcription was detected in HepG2 cells ±TPA by combining RNA- and DNA-FISH in the same cryosection. Sections were first hybridized with a mixture of five Cy3-labelled 50-mer oligonucleotide probes mapping to introns 4, 5, and 10, and exons 8 and 9, and the signal was amplified with fluorescent antibodies. After cross-linking the immunocomplexes detecting uPA RNA, sections were hybridized with the uPA fosmid probe. uPA DNA and RNA signals were scored as “active” (signals overlap or are adjacent to each other) or “inactive” (signals do not overlap) to determine the activity state of uPA alleles in cells ±TPA. Arrowheads indicate the position of uPA-DNA (green) associated with or separated from uPA-RNA (red) signals. The use of exon probes also results in the detection of specific cytoplasmic RNA signals. Nucleic acids were counterstained with TOTO-3 (blue). Bars: 2 µm. TPA treatment increases the frequency of uPA allele association with uPA RNA signals from 13% to 27% (n = 200 and 216, respectively; p = 0.002). (B) The frequency of co-association of the uPA gene (blue) with RNAP-S2p (green) and uPA-RNA (red) was determined by triple labelling. The labelling used the uPA fosmid probe, antibodies specific for RNAP phosphorylated at S2 residues (H5), and Cy3-labelled oligonucleotide probes. The association of active uPA alleles with S2p sites was scored as “associated” (signals overlap by at least 1 pixel) or “separated” (signals do not overlap or are adjacent). The arrowhead indicates the position of uPA-DNA (blue) and uPA-RNA (red) relative to S2p (green). Nucleic acids were counterstained with TOTO-3 (inset). Bar: 2 µm. Most actively transcribed uPA genes are also associated with RNAP-S2p, confirming that this RNAP modification marks active factories.
doi:10.1371/journal.pbio.1000270.g006

Figure 7. Increased association of the uPA locus with active transcription factories upon TPA activation is independent of the large-scale repositioning of the uPA locus relative to its chromosome territory. (A) The position of the uPA locus (uPA-DNA, red) relative to chromosome 10 (CT10, blue) and to S5p and S2p sites (green) was determined by immuno-cryoFISH in HepG2 cells ±TPA for 3 h. The labelling used a digoxigenin-labelled BAC probe containing the uPA locus, a whole chromosome 10 paint, and antibodies specific for RNAP phosphorylated on S5 or S2 residues. Arrowheads indicate the position of uPA loci. Nucleic acids were counterstained with DAPI (images unpublished). Bars: 2 µm. (B) The frequencies of uPA loci associated with or separated from RNAP-S5p (left-hand column) or RNAP-S2p (right-hand column) were measured at each position relative to the CT (“inside,” “inner-edge,” “edge,” “outer-edge,” and “outside”) before and after TPA induction. uPA locus association with S5p and S2p sites was detected at all positions analysed. (C) The proportion of uPA loci that associate with RNAP-S5p (left-hand column) or RNAP-S2p (right-hand column) before and after TPA activation was determined at each CT position. Before and after activation, association of the locus with S5p or S2p sites is independent of its position relative to the CT. (D) Chromosome 10 (CT10, blue), the uPA gene (uPA-DNA, green), and uPA RNA (uPA-RNA, red) were detected together by RNA- and DNA-FISH. The labelling used a whole chromosome paint, the uPA fosmid probe, and Cy3-conjugated oligonucleotide probes. Nucleic acids were counterstained with TOTO-3 (insets). Arrowheads indicate the position of uPA-RNA and uPA-DNA signals. Bar: 2 µm. (E) The proportion of uPA genes that associate with uPA RNA before and after TPA activation was calculated at each CT position. Prior to activation, uPA genes at the CT interior are transcribed less frequently than loci outside the territory. After TPA treatment, the frequency of transcriptional events is the same at all CT positions. doi:10.1371/journal.pbio.1000270.g007

uPA Gene Association with Poised or Active Factories Occurs Irrespective of the Locus Position Relative to Its CT

We next investigated whether the increased frequency of uPA gene transcription, or of its association with transcription factories, depended on the position of the locus relative to its CT. We examined this both before and after activation, when the locus is preferentially located at the CT interior and exterior, respectively. We performed triple labelling cryoFISH experiments for chromosome 10, the uPA locus, and transcription factories, before and after activation (Figure 7A–C). Analyses were initially performed with BAC probes and then confirmed with fosmid probes (Figure S6). Before activation, the uPA locus was preferentially located at the CT interior and associated with S5p transcription factories (Figure 7B), as previously observed in double labelling experiments (Figure 1D). TPA activation induced the relocation of most uPA loci to the CT exterior and an association with factories marked with both S5p and S2p (Figure 7B). We then calculated the proportion of uPA loci associated with RNAP at each CT position (Figure 7C), to determine whether association with poised or active factories depended on locus position relative to the CT. For simplicity, the data for the three regions around the edge of the CT (“inner-edge,” “edge,” and “outer-edge”) were pooled into a single region, but analyses of the five regions gave similar results. Surprisingly, we found that the uPA locus associated with S5p at a similar frequency at all locations relative to the CT, independently of TPA activation (n = 90 and 134 loci for − and +TPA, respectively; logistic regression analysis, p = 0.20; Figure 7C). This shows that the CT interior is accessible to the transcription machinery and does not preclude the interaction of a primed gene with poised, S5p+S2p− factories. In the case of S2p, we also found that the uPA locus associates with active transcription factories with similar frequencies across the different CT regions before and after TPA activation (n = 151 and 104 for − and +TPA, respectively; logistic regression analysis, p = 0.10). The increase in association of uPA loci with active factories marked by S2p after TPA is statistically significant (p < 0.0001). However, the effect of TPA on the level of association is the same across all positions relative to the CT (p = 0.62; Figure 7C). These results show that the uPA gene associates with poised or active transcription factories with similar frequencies across the different CT regions, both before and after transcriptional activation. Therefore, looping of the uPA locus out of its CT is not required for the association of the uPA gene with active transcription factories.

We also investigated whether the large-scale chromatin movements that accompany TPA induction of the uPA gene affect the association of the flanking genes, CAMK2G and VCL, with active (S2p) factories. We performed triple labelling cryoFISH experiments for chromosome 10, the CAMK2G or VCL locus (detected with fosmid probes), and active (S2p) factories (Figure S7). The association of CAMK2G or VCL with S2p also occurs at similar frequency at all locations relative to the CT, both before and after TPA activation (logistic regression analysis, p = 0.84 and 0.64 for CAMK2G and VCL, respectively; n(CAMK2G, −TPA) = 108, n(CAMK2G, +TPA) = 128, n(VCL, −TPA) = 147, n(VCL, +TPA) = 134). These results show that the association of the uPA, CAMK2G, and VCL genes with RNAP-S2p occurs independently of CT position. A recent study inserted a strong (β-globin) enhancer into a gene-rich region and also found no CT-position effect on the frequency of locus association with active transcription factories [43]. That region, however, preferentially localises at the CT edge. In the case of the murine Hoxb locus, upon retinoic acid treatment, Hoxb1 and its flanking genes show a small preferential association with active transcription factories outside the CT [36]. Different mechanisms of gene regulation may act on different genes. They may also depend on the kinetics of induction, which is much shorter for the uPA gene (3 h of TPA treatment) than for Hox genes (several days of retinoic acid treatment). Finally, to investigate whether the CT position of the uPA gene influences its transcriptional activity, we labelled the uPA gene, chromosome 10, and uPA transcripts simultaneously (Figure 7D, 7E).
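The CT-position analyses use logistic regression to ask whether the effect of TPA on association depends on locus position. The text does not give the exact model specification, so the framing below is our assumption. One plausible approach is a likelihood-ratio test that compares an additive model (position + TPA) with the saturated model containing the position × TPA interaction. With the three pooled CT regions used here, the interaction has two degrees of freedom, and for two degrees of freedom the χ² upper tail has the closed form exp(−x/2). The sketch below fits the additive model by Newton-Raphson on grouped counts. All function names and counts are hypothetical.

```python
from math import exp, log


def solve(matrix, vector):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def predict(beta, design):
    """Fitted probabilities from the logistic link."""
    return [1.0 / (1.0 + exp(-sum(b * x for b, x in zip(beta, row)))) for row in design]


def loglik(groups, probs):
    """Binomial log-likelihood (without the constant term) for grouped counts."""
    total = 0.0
    for (hits, misses), p in zip(groups, probs):
        if hits:
            total += hits * log(p)
        if misses:
            total += misses * log(1.0 - p)
    return total


def fit_logistic(design, groups, iterations=25):
    """Newton-Raphson maximum-likelihood fit to grouped (associated, separated) counts."""
    beta = [0.0] * len(design[0])
    for _ in range(iterations):
        probs = predict(beta, design)
        grad = [0.0] * len(beta)
        info = [[0.0] * len(beta) for _ in beta]   # Fisher information X'WX
        for row, (hits, misses), p in zip(design, groups, probs):
            n = hits + misses
            for i, xi in enumerate(row):
                grad[i] += (hits - n * p) * xi
                for j, xj in enumerate(row):
                    info[i][j] += n * p * (1.0 - p) * xi * xj
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta, predict(beta, design)


def interaction_lr_test(counts):
    """Likelihood-ratio test of a CT position x TPA interaction.

    counts maps (position, tpa) -> (associated, separated), with positions
    0 = inside, 1 = edge (pooled), 2 = outside and tpa 0/1 for -TPA/+TPA.
    """
    cells = sorted(counts)
    groups = [counts[cell] for cell in cells]
    # Additive model: intercept, edge and outside indicators, TPA indicator.
    design = [[1.0, float(pos == 1), float(pos == 2), float(tpa)] for pos, tpa in cells]
    _, additive = fit_logistic(design, groups)
    # The saturated (interaction) model reproduces each cell's observed proportion.
    saturated = [hits / (hits + misses) for hits, misses in groups]
    lr = max(0.0, 2.0 * (loglik(groups, saturated) - loglik(groups, additive)))
    # The interaction has (3 - 1) * (2 - 1) = 2 df; the chi-square(2) upper tail is exp(-x/2).
    return lr, exp(-lr / 2.0)


# Hypothetical counts in which TPA doubles the odds of association at every position.
same_effect = {
    (0, 0): (10, 10), (0, 1): (20, 10),   # inside
    (1, 0): (5, 10), (1, 1): (10, 10),    # edge
    (2, 0): (20, 5), (2, 1): (16, 2),     # outside
}
lr_stat, p_value = interaction_lr_test(same_effect)
```

In this example, TPA has the same odds ratio at every position, so the additive model fits exactly and the test gives no evidence of an interaction. A binomial GLM from a statistics package would give the same likelihood-ratio statistic. The hand-written Newton-Raphson loop only keeps the sketch dependency-free.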
We used the uPA fosmid probe for the highest spatial resolution. We find that, upon TPA activation, the uPA gene is transcribed with the same frequency irrespective of its CT position (logistic regression analysis, p = 0.74; n = 100). These results differ from those for the murine Igf2bp1 and Cbx1 genes flanking the Hoxb cluster [36]. Those genes are also transcriptionally active at all CT positions, independently of Hoxb induction, but they are preferentially active outside the CT. Difficulties in detecting Hoxb1 transcripts precluded a similar analysis of allelic transcription upon induction [36]. Such an analysis would help establish how general the correlation is between gene positioning outside the CT and transcriptional state. Prior to activation, we unexpectedly found that uPA loci inside the CT, which make up the largest fraction, are less likely to be transcriptionally active (logistic regression analysis, p = 0.0001; n = 103). By contrast, the smaller proportion of uPA loci located elsewhere is transcribed at the same frequency as upon TPA induction (Figure 7E). These results suggest that, prior to induction, internal CT positioning has a silencing effect on the primed uPA locus that helps prevent transcript elongation or interferes with transcript stability. This reveals unexpected properties of locus positioning within the nuclear landscape.

In summary, our analyses of the uPA gene prior to induction showed that it was (a) preferentially positioned at the interior of its CT; (b) in a poised state, characterized by an open chromatin configuration and the presence of RNAP-S5p at regulatory regions; and (c) preferentially associated with poised, S5p+S2p− transcription factories. Transcriptional activation induces large-scale relocation of the gene towards the CT exterior and a preferred association with factories containing both RNAP modifications (S5p and S2p), as expected in the active state. The correlation between looping out of the CT and the change in RNAP configuration suggested that the external position might favour transcriptional activation. However, triple-labelling experiments showed that the position of the uPA locus relative to its CT and its association with poised or active transcription factories are independent events. RNA-FISH experiments confirm that, after TPA induction, both external and internal positions of the uPA gene relative to its CT are equally competent for transcription. Nevertheless, positioning of the uPA locus inside the CT before activation may help control the levels of transcription, as uPA genes found outside the CT before TPA treatment are more likely to be transcribed (Figure 7E). Our findings reinforce the idea that the interior of CTs does not repress the association of genes with the transcription machinery. Large-scale chromatin movements are therefore unlikely to be necessary for genes to find transcription factories, although they may influence the extent of association for specific subsets of genes.
This study expands current models of gene regulation by showing that silent genes can be associated with poised transcription factories and that factory association and gene position relative to the CT can be independent factors. Our results are also compatible with the notion that poised transcription factories represent a sub-population of specialized sites that may allow primed genes to respond rapidly and efficiently to specific activation signals.

Materials and Methods

A detailed description of the experimental procedures is given in Text S1.

Cell Culture, RNA Detection, and Western Blotting

HepG2 cells were cultured in the absence or presence of 100 ng/ml TPA (Sigma) for the indicated times, as previously described [27]. HepG2 cells were treated with 1 µM flavopiridol (1 h; Sanofi-Aventis) to inhibit CDK9-mediated RNAP S2 phosphorylation, and with 75 µg/ml α-amanitin (5 h; Sigma) to inhibit RNAP transcription. For the quantification of mature and unprocessed transcript levels of the uPA, CAMK2G, or VCL genes, total RNA was extracted and amplified by RT-PCR. Western blotting was performed using total HepG2 protein extracts and antibodies specific to different RNAP phosphoforms. Experimental details and information about the antibodies used can be found in Text S1.

(~150 nm thick) from HepG2 cells were treated ± AP prior to immunolabelling with phosphorylation-dependent RNAP antibodies. Sections were indirectly immunolabelled with antibodies against RNAP-S5p (4H8; C, E) or RNAP-S2p (H5; D, F). The absence of signal after pre-treatment of cryosections with AP (E, F) shows that the 4H8 and H5 antibodies bind specifically to phosphorylated epitopes and do not detect unphosphorylated RPB1. Nucleic acids were counterstained with TOTO-3 (insets). Bar: 2 µm. Found at: doi:10.1371/journal.pbio.1000270.s002 (8.10 MB TIF)

MN-ChIP and PCR Reactions

Chromatin cross-linking, MNase digestion, and immunoprecipitation were performed as described previously [31]. See Text S1 for primer sequences (Table S1), antibodies used, and experimental details.

Figure S3 Frequency of association of simulated uPA loci with RNAP-S5p and RNAP-S2p sites. (A) Diagram of the genomic location of the uPA gene and the regions covered by the BAC (RP11-417O11; ~228 kb) and fosmid (G248P85612C10; ~44 kb) probes used for FISH experiments. Arrows indicate the 5′-3′ transcription direction. (B, D) We analysed how often a simulated uPA locus, positioned at random coordinates, associates with RNAP-S5p or -S2p sites. For this purpose, we generated a new image containing three signals. These were the original experimental S5p (B, D; green) or S2p (images unpublished) distribution, the experimental uPA signal (Exp-uPA; blue; arrowheads), and an additional simulated uPA signal (Siml-uPA; red; arrows). The simulated signal had the same number of pixels as the experimental signal but was positioned at random nucleoplasmic coordinates. This analysis was performed for both the BAC (B) and fosmid (D) experiments presented in Figures 3B, 3C, 3E, 3F and 5C, 5D, respectively. Nucleic acids were counterstained with TOTO-3 (insets). Bars: 2 µm. (C, E) Frequency of association of experimental and simulated uPA loci with RNAP-S5p and RNAP-S2p in the same experimental images of HepG2 cells treated ±TPA. Experimental uPA loci associate with S5p sites more frequently than simulated loci positioned at random nucleoplasmic coordinates. This holds both before and after TPA treatment, for both BAC (C) and fosmid (E) probes. In contrast, experimental BAC or fosmid loci associate with S2p sites at levels similar to simulated (random) loci before, but not after, TPA activation. This confirms that the increased association of the uPA gene with S2p sites following activation is not due to random processes and is not affected by the size of the probe used. The numbers of simulated sites were n(BAC, S5p) = 68 and 62; n(BAC, S2p) = 69 and 75; n(fosmid, S5p) = 47 and 40; n(fosmid, S2p) = 50 and 46, for − and +TPA, respectively. Found at: doi:10.1371/journal.pbio.1000270.s003 (8.58 MB TIF)
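The random-placement control in Figure S3 estimates how often a signal of the same size would hit RNAP sites purely by chance. The legend describes one simulated signal per image. The Python sketch below extends the idea to many random placements so that a chance frequency can be estimated directly. It assumes that the nucleoplasm and the RNAP channel are already segmented into pixel sets. It also preserves the experimental signal's shape by translating it to random positions. Shape-preserving translation, the function name, and the toy inputs are hypothetical choices, not the authors' implementation.

```python
import random


def random_overlap_fraction(signal, nucleoplasm, rnap, trials=1000, seed=0):
    """Estimate how often a copy of `signal` placed at random overlaps RNAP pixels.

    signal:      set of (row, col) pixels of the experimental locus signal
    nucleoplasm: set of (row, col) pixels where a simulated copy may be placed
    rnap:        set of (row, col) pixels segmented as RNAP-S5p or -S2p sites
    """
    rng = random.Random(seed)
    ref_r, ref_c = min(signal)                       # reference pixel of the shape
    offsets = [(r - ref_r, c - ref_c) for r, c in signal]
    candidates = sorted(nucleoplasm)
    hits = placed = attempts = 0
    while placed < trials:
        attempts += 1
        if attempts > 100 * trials:
            raise ValueError("signal shape rarely fits inside the nucleoplasm")
        r0, c0 = rng.choice(candidates)
        moved = {(r0 + dr, c0 + dc) for dr, dc in offsets}
        if not moved <= nucleoplasm:                 # whole copy must lie in the nucleoplasm
            continue
        placed += 1
        if moved & rnap:                             # shares >= 1 pixel: 'associated'
            hits += 1
    return hits / trials


# Toy 20 x 20 "nucleus" with RNAP signal covering the top half.
nucleus = {(r, c) for r in range(20) for c in range(20)}
rnap_sites = {(r, c) for r in range(10) for c in range(20)}
chance = random_overlap_fraction({(3, 3)}, nucleus, rnap_sites)
print(f"chance association: {chance:.2f}")
```

The experimental association frequency would then be compared with this chance level. In the study, that comparison was made with χ² tests against the simulated loci.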

Immunofluorescence, Ultracryosectioning, and cryoFISH

uPA protein expression was detected with a specific rabbit antiserum. For high-resolution imaging using cryoFISH, ultrathin cryosections (~140–150 nm thick) were immunolabelled and/or labelled by fluorescence in situ hybridization (FISH), essentially as described before [14]. RNA-FISH was performed using oligonucleotide probes (http://www.singerlab.org/protocols). See Text S1 for information about the antibodies and probes used, and for experimental details.

Microscopy, Quantitative Image Analyses, and Statistics

Images were acquired by confocal microscopy and analysed quantitatively. Statistical analyses were performed using the χ² test, logistic regression analysis, ANOVA, Student t-test, or Mann-Whitney U test. See Text S1 for further details.
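The qRT-PCR comparisons (Figure 5B) rest on n = 3 replicates per condition. With samples this small, an exact Mann-Whitney U test can be computed by enumerating every possible assignment of the pooled ranks to the two groups. The sketch below assumes no tied values and is not the authors' code. With three values per group there are only C(6,3) = 20 assignments, so the smallest attainable one-sided p-value is 1/20 = 0.05.

```python
from itertools import combinations


def mann_whitney_exact(x, y):
    """Exact Mann-Whitney U test for small samples without ties.

    Returns (U for sample x, one-sided p for 'x tends to be smaller', two-sided p).
    """
    pooled = sorted(list(x) + list(y))
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch assumes no tied values")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_x, n_y = len(x), len(y)
    shift = n_x * (n_x + 1) / 2          # U = rank sum of x minus its minimum possible value
    u_obs = sum(rank[v] for v in x) - shift
    # Under the null hypothesis every set of n_x ranks is equally likely to belong to x.
    null = [sum(c) - shift for c in combinations(range(1, n_x + n_y + 1), n_x)]
    p_less = sum(u <= u_obs for u in null) / len(null)
    p_greater = sum(u >= u_obs for u in null) / len(null)
    return u_obs, p_less, min(1.0, 2 * min(p_less, p_greater))


# Hypothetical primary-transcript levels (arbitrary units), three replicates per condition.
untreated = [1.0, 1.2, 0.9]
treated = [10.5, 12.1, 11.0]
u, p_one_sided, p_two_sided = mann_whitney_exact(untreated, treated)
print(u, p_one_sided, p_two_sided)
```

Recent SciPy releases offer the same exact computation through `scipy.stats.mannwhitneyu(x, y, method="exact")`.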

Supporting Information Figure S1 H3K9me2 histone modification is present at H19 gene promoter, but not the uPA gene promoter. Cross-linked, sonicated chromatin from 6TPA-treated HepG2 cells was digested with MN for 50 min before immunoprecipitation with antibodies that recognize lysine 9 dimethylated histone H3 (H3K9me2), associated with closed chromatin. Control (‘‘unrelated’’) antibodies were polyclonal anti-uPAR antibodies. Immunoprecipitated DNA was amplified using primers spanning the 59 portion of the imprinted H19 gene and the uP fragment of the uPA gene (see scheme in Figure 2D). Found at: doi:10.1371/journal.pbio.1000270.s001 (0.72 MB TIF)

Figure S4 The uPA gene loops out of its chromatin domain. (A) Diagram illustrating the genomic location of the uPA gene and the regions covered by the BAC (RP11-417O11; ,228 kb, blue) and fosmid (G248P85612C10; ,44 kb, red) probes used for FISH experiments. Arrows indicate the 59-39 transcription direction. (B) The association of BAC and fosmid signals (arrowhead) relative to RNAP-S2p sites (green) was determined simultaneously by immuno-cryoFISH before (unpublished image) and after (B) TPA activation for 3 h, using digoxigenin-labelled BAC (blue) and rhodamine-labelled fosmid (red) probes. High magnification images show examples of the coassociation of both BAC- and fosmid-uPA signals with S2p sites (top) or the association of fosmid-uPA signal, but not the BAC signal with S2p sites (bottom). Nucleic acids were counterstained with DAPI (inset). Bar: 2 mm. (C) Frequency of the association of BAC or fosmid signals with S2p sites is similar between probes (x2 test, p = 0.37 and p = 0.81, n = 46 and 40, for 2 and +TPA, respectively). Error bars are standard deviations from two replicate experiments. (D) Fosmid signals (red) can loop out of BAC foci (green). Arrowheads indicate the position of BAC and fosmid signals. Insets show higher magnification images. Nucleic acids were counterstained with TOTO-3 (blue). Bar: 2 mm. Frequency of co-localisation of BAC foci with fosmid-uPA signals show that

Figure S2 Characterization of antibodies against differ-

ent phosphorylated forms of RNAP. (A, B) Reactivity of different RNAP antibodies against hyper- (IIO) and hypophosphorylated (IIA) forms of the largest subunit of RNAP (RPB1) was assessed by Western blotting using total protein extracts from HepG2 cells treated for 1 h in the absence (A) or presence (B) of 1 mM flavopiridol, a specific inhibitor of CDK9, the Ser2 kinase. Both IIO and IIA bands are detected by antibody N-20 (A), raised against the amino-terminus of RPB1, which binds independently of phosphorylation. Antibodies against S5p (4H8 and H14) or S2p (H5) only detect the IIO band (A, B). Treatment of Western blots with alkaline phosphatase (AP; A) prior to immunolabelling reveals the specificity of 4H8, H14, and H5 antibodies for phosphorylated epitopes, and has no effect on the binding of an antibody to the N terminus of RPB1. The specificity of H5 antibodies to the S2p modification is shown by loss of binding in flavopiridol-treated samples (B). Binding of 4H8 and H14 antibodies to IIO band is insensitive to flavopiridol treatment in these conditions, consistent with their specificity for the Ser5 modification (S5p) catalyzed by CDK7, as previously shown (B and [21]). Protein loading was controlled using histone H2B antibodies. (C–F) Cryosections PLoS Biology | www.plosbiology.org

January 2010 | Volume 8 | Issue 1 | e1000270

Poised Transcription Factories

15%–20% of uPA alleles detected with the fosmid probe loop out from the BAC signals. The difference between the levels of fosmid looping 6TPA was not statistically significant (x2 test, p = 0.57; n = 47 and 41, for 2 and +TPA, respectively). Found at: doi:10.1371/journal.pbio.1000270.s004 (6.78 MB TIF)
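The chance baseline used in Figure S3 (a simulated locus with the same number of pixels, placed at random nucleoplasmic coordinates and scored against the real RNAP image) can be sketched on binary masks as below, using the one-pixel overlap rule of Figure S8 as the association criterion. This is a minimal NumPy illustration under stated assumptions, not the authors' image-analysis code: the function names, the translation-only placement that rejects draws leaving the nucleoplasm, and the toy masks are all hypothetical.

```python
import numpy as np


def associates(locus_mask: np.ndarray, rnap_mask: np.ndarray) -> bool:
    """Associated = the two binary signals overlap by at least one pixel."""
    return bool(np.any(locus_mask & rnap_mask))


def simulate_random_locus(locus_mask: np.ndarray, nucleoplasm_mask: np.ndarray,
                          rng: np.random.Generator, max_tries: int = 10_000) -> np.ndarray:
    """Place a copy of the locus shape (same pixel count) at random nucleoplasmic coordinates.

    Hypothetical placement scheme: the shape is translated by a random offset, and the
    draw is rejected unless every pixel lands inside the nucleoplasm mask.
    """
    ys, xs = np.nonzero(locus_mask)
    ys, xs = ys - ys.min(), xs - xs.min()          # shape relative to its bounding box
    h, w = nucleoplasm_mask.shape
    for _ in range(max_tries):
        dy = rng.integers(0, h - ys.max())         # upper bound is exclusive
        dx = rng.integers(0, w - xs.max())
        if nucleoplasm_mask[ys + dy, xs + dx].all():
            simulated = np.zeros(nucleoplasm_mask.shape, dtype=bool)
            simulated[ys + dy, xs + dx] = True
            return simulated
    raise RuntimeError("could not place the locus inside the nucleoplasm")


def random_association_frequency(locus_mask, rnap_mask, nucleoplasm_mask,
                                 n_sim: int = 1000, seed: int = 0) -> float:
    """Fraction of randomly placed copies of the locus that associate with RNAP sites."""
    rng = np.random.default_rng(seed)
    hits = sum(
        associates(simulate_random_locus(locus_mask, nucleoplasm_mask, rng), rnap_mask)
        for _ in range(n_sim)
    )
    return hits / n_sim


# Toy 20 x 20 image: whole field is nucleoplasm, one 2 x 2 RNAP site, one 2 x 2 locus.
nucleoplasm = np.ones((20, 20), dtype=bool)
rnap = np.zeros((20, 20), dtype=bool)
rnap[5:7, 5:7] = True
locus = np.zeros((20, 20), dtype=bool)
locus[6:8, 6:8] = True                             # shares pixel (6, 6) with the site

print(associates(locus, rnap))                     # True
print(random_association_frequency(locus, rnap, nucleoplasm, n_sim=2000))
```

Figure S3 uses one simulated locus per image; averaging over many random placements, as in this sketch, is a design choice that gives a smoother estimate of the random-association frequency against which the experimental loci are compared.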

Figure S5 Control experiments for cryo-RNA-FISH. (A–D) Cryosections (~150 nm thick) of HepG2 cells were hybridised with Cyanine3-labelled uPA oligonucleotide probes before (A) and after (B–D) TPA activation. Inspection of nucleoplasmic regions identifies frequent uPA-RNA signals in TPA-treated cells (B, arrowheads). Pre-incubation of sections with RNase A (C) or omission of oligonucleotide probes (D) abolishes most uPA-RNA signals within the nucleoplasm, demonstrating their specificity. Nucleic acids were counterstained with TOTO-3 (red). Bars: 2 µm. Found at: doi:10.1371/journal.pbio.1000270.s005 (8.38 MB TIF)

Figure S6 Detection of the uPA gene with a fosmid probe recapitulates the CT looping and position-independent association with S5p and S2p factories observed with the BAC probe. (A) The position of the uPA locus (fosmid-uPA, green) relative to the chromosome 10 territory (CT10, red) was determined in HepG2 cells, before and after TPA activation for 3 h, by cryoFISH using a whole chromosome 10 paint and a digoxigenin-labelled uPA fosmid probe. The positions of uPA loci were scored as “inside,” “inner-edge,” “edge,” “outer-edge,” and “outside” relative to the CT, as in Figure 1D. Nucleic acids were counterstained with TOTO-3 (blue). Arrowheads indicate uPA loci. Bar: 2 µm. The histogram shows that the fosmid-uPA probe recapitulates the TPA-induced CT looping observed with the BAC probe (Figure 1D), as expected. In the inactive state, the locus is preferentially localized at the CT interior (61% of loci inside or at the inner edge, n = 234 loci), and it relocates to the exterior upon activation (58% of loci at the outer edge or outside, n = 230 loci; χ² test, p < 0.0001). (B) The proportion of uPA loci detected with the fosmid probe that associate with RNAP-S5p before and after TPA activation was calculated at each CT position (inside, edge, outside), as for the BAC probe (Figure 7C). Association of the locus with S5p sites before and after activation is independent of its position relative to the CT (logistic regression analysis; p = 0.18 and p = 0.26 before and after TPA, n = 68 and 71, respectively). Overall, no effect of TPA treatment on the association of the uPA gene with S5p was detected (logistic regression analysis, p = 0.54). Association of fosmid-detected uPA loci with S2p sites before and after activation is also independent of their position relative to the CT (logistic regression analysis, p = 0.27 and p = 0.79 before and after TPA, n = 113 and 140, respectively). The same analysis detected an increased association of the uPA gene with S2p sites upon activation (logistic regression analysis, p = 0.0004). Found at: doi:10.1371/journal.pbio.1000270.s006 (5.63 MB TIF)

Figure S7 uPA-flanking genes, CAMK2G and VCL, associate with S2p factories independently of CT position or TPA activation. The association of CAMK2G or VCL loci with RNAP-S2p was determined relative to the chromosome 10 territory (CT10) in HepG2 cells, before and after TPA activation (3 h), by cryoFISH using a whole chromosome 10 paint and digoxigenin-labelled CAMK2G and VCL fosmid probes. (A) Examples of CAMK2G loci (red, arrowheads) that co-localise with RNAP-S2p (green) inside (left) or outside (right) CT10 (blue). Nucleic acids were counterstained with DAPI (insets). Bar: 2 µm. (B) The proportion of CAMK2G or VCL loci that associate with RNAP-S2p was determined at each CT position (inside, edge, outside), before and after TPA activation. Association of either gene with S2p is independent of its position relative to the CT and of TPA treatment. Found at: doi:10.1371/journal.pbio.1000270.s007 (5.98 MB TIF)

Figure S8 Examples of classification criteria for the association of uPA loci with RNAP-S5p and RNAP-S2p sites. The position of the uPA locus (red) relative to S5p and S2p sites (green) was determined by immuno-cryoFISH using a rhodamine-labelled BAC probe containing the uPA locus and antibodies specific for RNAP phosphorylated at residues S5 or S2 of the CTD. uPA loci were classified as associated (co-localised) with S5p or S2p sites if the two signals overlap by at least a single pixel, and as separated if they do not overlap; separated loci include those that touch an RNAP site without signal overlap. Found at: doi:10.1371/journal.pbio.1000270.s008 (4.11 MB TIF)

Table S1 MN-ChIP primers. List of primers used in MN-ChIP analyses, in 5′ to 3′ orientation. Found at: doi:10.1371/journal.pbio.1000270.s009 (0.06 MB DOC)

Text S1 Supplementary information. Found at: doi:10.1371/journal.pbio.1000270.s010 (0.10 MB DOC)

Acknowledgments
We thank Stefano Biffo, Marco E. Bianchi, André Möller, Julie K. Stock, and Emily Brookes for critically reading the manuscript; Alessandro Marcello and Peter R. Cook for advice on RNA-FISH protocols and probe design; and C. Covino (A.L.E.M.B.I.C., Milano, Italy) for help with confocal microscopy. Flavopiridol was generously provided by Sanofi Aventis and the National Cancer Institute, National Institutes of Health.

Author Contributions
The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: CF AP MPC. Performed the experiments: CF SQX PL DM. Analyzed the data: CF SQX FR AP MPC. Contributed reagents/materials/analysis tools: CF SQX MRB AP. Wrote the paper: CF SQX AP MPC. Performed the experiments on uPA expression (mRNA and protein), MN digestion of chromatin, and MN-ChIP: CF PL. Performed immuno-cryoFISH experiments and analyses: SQX. Performed statistical analysis: FR AP.

References
1. Misteli T (2007) Beyond the sequence: cellular organization of genome function. Cell 128: 787–800.
2. Fraser P, Bickmore W (2007) Nuclear organization of the genome and the potential for gene regulation. Nature 447: 413–417.
3. Pombo A, Branco MR (2007) Functional organisation of the genome during interphase. Curr Opin Genet Dev 17: 415–455.
4. Chambeyron S, Bickmore WA (2004) Chromatin decondensation and nuclear reorganization of the HoxB locus upon induction of transcription. Genes Dev 18: 1119–1130.
5. Volpi EV, Chevret E, Jones T, Vatcheva R, Williamson J, et al. (2000) Large-scale chromatin organization of the major histocompatibility complex and other regions of human chromosome 6 and its response to interferon in interphase nuclei. J Cell Sci 113: 1565–1576.
6. Finlan LE, Sproul D, Thomson I, Boyle S, Kerr E, et al. (2008) Recruitment to the nuclear periphery can alter expression of genes in human cells. PLoS Genet 4: e1000039. doi:10.1371/journal.pgen.1000039.
7. Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, et al. (2008) Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453: 948–951.
8. Kumaran RI, Spector DL (2008) A genetic locus targeted to the nuclear periphery in living cells maintains its transcriptional competence. J Cell Biol 180: 51–65.
9. Reddy KL, Zullo JM, Bertolino E, Singh H (2008) Transcriptional repression mediated by repositioning of genes to the nuclear lamina. Nature 452: 243–247.
10. Kimura H, Sugaya K, Cook PR (2002) The transcription cycle of RNA polymerase II in living cells. J Cell Biol 159: 777–782.


11. Becker M, Baumann C, John S, Walker DA, Vigneron M, et al. (2002) Dynamic behavior of transcription factors on a natural promoter in living cells. EMBO Rep 3: 1188–1194.
12. Verschure PJ, van Der Kraan I, Manders EM, van Driel R (1999) Spatial relationship between transcription sites and chromosome territories. J Cell Biol 147: 13–24.
13. Abranches R, Beven AF, Aragon-Alcaide L, Shaw PJ (1998) Transcription sites are not correlated with chromosome territories in wheat nuclei. J Cell Biol 143: 5–12.
14. Branco MR, Pombo A (2006) Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol 4: e138. doi:10.1371/journal.pbio.0040138.
15. Heard E, Bickmore W (2007) The ins and outs of gene regulation and chromosome territory organisation. Curr Opin Cell Biol 19: 311–316.
16. Ragoczy T, Bender MA, Telling A, Byron R, Groudine M (2006) The locus control region is required for association of the murine beta-globin locus with engaged transcription factories during erythroid maturation. Genes Dev 20: 1447–1457.
17. Ragoczy T, Telling A, Sawado T, Groudine M, Kosak ST (2003) A genetic analysis of chromosome territory looping: diverse roles for distal regulatory elements. Chromosome Res 11: 513–525.
18. Boehm AK, Saunders A, Werner J, Lis JT (2003) Transcription factor and polymerase recruitment, modification, and movement on dhsp70 in vivo in the minutes following heat shock. Mol Cell Biol 23: 7628–7637.
19. Spilianakis C, Kretsovali A, Agalioti T, Makatounakis T, Thanos D, et al. (2003) CIITA regulates transcription onset via Ser5-phosphorylation of RNA Pol II. EMBO J 22: 5125–5136.
20. Gomes NP, Bjerke G, Llorente B, Szostek SA, Emerson BM, et al. (2006) Gene-specific requirement for P-TEFb activity and RNA polymerase II phosphorylation within the p53 transcriptional program. Genes Dev 20: 601–612.
21. Stock JK, Giadrossi S, Casanova M, Brookes E, Vidal M, et al. (2007) Ring1-mediated ubiquitination of H2A restrains poised RNA polymerase II at bivalent genes in mouse ES cells. Nat Cell Biol 9: 1428–1435.
22. Saunders A, Core LJ, Lis JT (2006) Breaking barriers to transcription elongation. Nat Rev Mol Cell Biol 7: 557–567.
23. Phatnani HP, Greenleaf AL (2006) Phosphorylation and functions of the RNA polymerase II CTD. Genes Dev 20: 2922–2936.
24. Crippa MP (2007) Urokinase-type plasminogen activator. Int J Biochem Cell Biol 39: 690–694.
25. Look MP, Foekens JA (1999) Clinical relevance of the urokinase plasminogen activator system in breast cancer. APMIS 107: 150–159.
26. Van Veldhuizen PJ, Sadasivan R, Cherian R, Wyatt A (1996) Urokinase-type plasminogen activator expression in human prostate carcinomas. Am J Med Sci 312: 8–11.
27. Ibañez-Tallon I, Caretti G, Blasi F, Crippa MP (1999) In vivo analysis of the state of the human uPA enhancer following stimulation by TPA. Oncogene 18: 2836–2845.
28. Guillot PV, Xie SQ, Hollinshead M, Pombo A (2004) Fixation-induced redistribution of hyperphosphorylated RNA polymerase II in the nucleus of human cells. Exp Cell Res 295: 460–468.


29. Pombo A, Hollinshead M, Cook PR (1999) Bridging the resolution gap: Imaging the same transcription factories in cryosections by light and electron microscopy. J Histochem Cytochem 47: 471–480.
30. Pombo A, Jackson DA, Hollinshead M, Wang Z, Roeder RG, et al. (1999) Regional specialization in human nuclei: visualization of discrete sites of transcription by RNA polymerase III. EMBO J 18: 2241–2253.
31. Ferrai C, Munari D, Luraghi P, Pecciarini L, Cangi M, et al. (2007) A transcription-dependent MNase-resistant fragment of the uPA promoter interacts with the enhancer. J Biol Chem 282: 12537–12546.
32. Nightingale KP, O’Neill LP, Turner BM (2006) Histone modifications: signalling receptors and potential elements of a heritable epigenetic code. Curr Opin Genet Dev 16: 125–136.
33. Xie SQ, Martin S, Guillot PV, Bentley DL, Pombo A (2006) Splicing speckles are not reservoirs of RNA polymerase II, but contain an inactive form, phosphorylated on serine2 residues of the C-terminal domain. Mol Biol Cell 17: 1723–1733.
34. Zeitlinger J, Stark A, Kellis M, Hong JW, Nechaev S, et al. (2007) RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat Genet 39: 1512–1516.
35. Jackson DA, Iborra FJ, Manders EM, Cook PR (1998) Numbers and organization of RNA polymerases, nascent transcripts, and transcription units in HeLa nuclei. Mol Biol Cell 9: 1523–1536.
36. Morey C, Kress C, Bickmore WA (2009) Lack of bystander activation shows that localization exterior to chromosome territories is not sufficient to up-regulate gene expression. Genome Res e-published ahead of print April 23, 2009. doi:10.1101/gr.089045.108.
37. Guenther MG, Levine SS, Boyer LA, Jaenisch R, Young RA (2007) A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130: 77–88.
38. Muse GW, Gilchrist DA, Nechaev S, Shah R, Parker JS, et al. (2007) RNA polymerase is poised for activation across the genome. Nat Genet 39: 1507–1511.
39. Core LJ, Waterfall JJ, Lis JT (2008) Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322: 1845–1848.
40. Iborra FJ, Escargueil AE, Kwek KY, Akoulitchev A, Cook PR (2004) Molecular cross-talk between the transcription, translation, and nonsense-mediated decay machineries. J Cell Sci 117: 899–906.
41. Mintz PJ, Patterson SD, Neuwald AF, Spahr CS, Spector DL (1999) Purification and biochemical characterization of interchromatin granule clusters. EMBO J 18: 4308–4320.
42. Johnson C, Primorac D, McKinstry M, McNeil J, Rowe D, et al. (2000) Tracking COL1A1 RNA in osteogenesis imperfecta: splice-defective transcripts initiate transport from the gene but are retained within the SC35 domain. J Cell Biol 150: 417–432.
43. Noordermeer D, Branco MR, Splinter E, Klous P, van Ijcken W, et al. (2008) Transcription and chromatin organization of a housekeeping gene cluster containing an integrated beta-globin locus control region. PLoS Genet 4: e1000016. doi:10.1371/journal.pgen.1000016.


Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human

James R. Wagner1, Bing Ge2, Dmitry Pokholok3, Kevin L. Gunderson3, Tomi Pastinen2,4, Mathieu Blanchette1*

1 School of Computer Science, McGill University, Montreal, Quebec, Canada, 2 McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada, 3 Illumina, San Diego, California, United States of America, 4 Department of Human and Medical Genetics, McGill University Health Centre, McGill University, Montreal, Quebec, Canada

Abstract
Allelic imbalance (AI) is a phenomenon where the two alleles of a given gene are expressed at different levels in a given cell, either because of epigenetic inactivation of one of the two alleles or because of genetic variation in regulatory regions. Recently, Ge et al. have described the use of genotyping arrays to assay AI at high resolution (~750,000 SNPs across the autosomes). In this paper, we investigate computational approaches to analyze these data and identify genomic regions with AI in an unbiased and statistically robust manner. We propose two families of approaches: (i) a statistical approach based on z-score computations, and (ii) a family of machine learning approaches based on Hidden Markov Models. Each method is evaluated using previously published experimental data sets as well as by permutation testing. When applied to whole-genome data from 53 HapMap samples, our approaches reveal that allelic imbalance is widespread (most expressed genes show evidence of AI in at least one of our 53 samples) and that most AI regions in a given individual are also found in at least a few other individuals. While many AI regions identified in the genome correspond to known protein-coding transcripts, others overlap with recently discovered long non-coding RNAs. We also observe that genomic regions with AI include not only complete transcripts with consistent differential expression levels, but also more complex patterns of allelic expression, such as alternative promoters and alternative 3′ ends. The approaches developed not only shed light on the incidence and mechanisms of allelic expression, but will also help map the genetic causes of allelic expression and identify cases where this variation may be linked to disease.

Citation: Wagner JR, Ge B, Pokholok D, Gunderson KL, Pastinen T, et al. (2010) Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human. PLoS Comput Biol 6(7): e1000849. doi:10.1371/journal.pcbi.1000849

Editor: Wyeth W. Wasserman, University of British Columbia, Canada

Received December 15, 2009; Accepted June 2, 2010; Published July 8, 2010

Copyright: © 2010 Wagner et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was funded in part by Genome Canada (www.genomecanada.ca), Genome Quebec (www.genomequebec.com), and the Natural Sciences and Engineering Research Council of Canada (www.nserc.ca). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: blanchem@mcb.mcgill.ca

PLoS Computational Biology | www.ploscompbiol.org

July 2010 | Volume 6 | Issue 7 | e1000849

Whole-Genome Differential Allelic Expression

Author Summary
Measuring gene expression, and searching the genome for the regulatory regions responsible for differences in expression levels, is one of the key research paths used to identify disease-causing genes and to explain differences between healthy individuals. Typically, experiments have measured and compared gene expression in multiple individuals and used this information to attempt to map the responsible regulatory regions. Differences in environment between individuals can, however, cause differences in gene expression unrelated to the underlying regulatory sequence. New genotyping technologies make it possible to measure the expression of each of the two copies of a gene separately, at loci that are heterozygous in a given individual. Comparing the two copies within the same individual acts as an internal control: environmental factors should affect the expression of both copies roughly equally, so differences in their expression are more likely to be explained by differences in the regulatory regions specific to each copy. Such regulatory differences are expected to lead to differences in the expression of the two copies (the two alleles) of a gene, also known as allelic imbalance. We describe a set of signal processing methods for the reliable detection of allelic imbalance across the genome.

Introduction
In a diploid cell, each gene is present in two copies. The vast majority of microarray-based or RNA sequencing-based gene expression studies do not distinguish between the two copies and measure the sum of the expression of the two alleles. This hides the fact that the two alleles are not necessarily expressed at equal levels, a phenomenon called allelic imbalance (AI) [1]. The complete shutdown of one allele results in monoallelic expression (ME). The most drastic example of ME is X-chromosome inactivation, where, in females, one of the two copies of the X chromosome is inactivated and packaged into heterochromatin [2]. Less drastic is random monoallelic expression, whereby a randomly selected copy of a gene or chromosomal region is silenced by epigenetic mechanisms (e.g. methylation). In contrast, imprinting results in parent-of-origin-specific inactivation of the maternal or paternal allele, depending on the locus. While monoallelic expression completely silences one of the two alleles, less drastic allelic expression differences can result from a heterozygous Aa regulatory site. For example, allele A of a transcription factor binding site may allow binding and result in normal expression of the target gene on that chromosome, while allele a may disrupt the binding site, resulting in lower expression. While the lower expression of allele a may be compensated by an increased transcription rate at allele A in heterozygous individuals, this may not be the case for individuals who are homozygous aa, which may result in phenotypic variation. Researchers have tried to identify causative regulatory variants by measuring the total expression (i.e. the expression of both copies) of a particular gene across multiple individuals, treating it as a quantitative trait, and mapping the nearby cis-regulatory regions that influence it (expression quantitative trait locus, or eQTL, mapping; reviewed in [3]). A key problem with this type of approach is that environmental differences across individuals can affect gene expression, making the mapping problem very challenging. Instead, focusing on the relative expression of the two alleles within the same cell has been suggested as a way to factor out environmental sources of variation, allowing more sensitive and specific detection of epigenetic and genetic phenomena related to the local control of gene expression [4]. By combining AI measurements obtained from a set of individuals with genotyping information about the same individuals, one can map cis-regulatory variants [5–8] or detect epigenetic variation in allelic expression [9,10].

Past studies aiming to detect AI have typically relied upon panels of SNPs with relatively low density, located in only a subset of the transcribed genes of the genome [10–12]. A simple threshold on the ratio of expression of the two alleles at a heterozygous locus is usually chosen (e.g. 1.5- or 2-fold), and a gene is called imbalanced depending on whether or not the SNP(s) within it exceed this threshold. Optimal genome-wide AI profiling would require high-density sampling of expressed heterozygous sites across the genome.

We recently generated the first large-scale, high-resolution assay of allelic expression [13]. In that study, Illumina genotyping arrays were used to measure differential allelic expression at 755,284 polymorphic sites in lymphoblastoid cell lines (LCL) derived from 53 CEU samples included in the HapMap project [14]. Because single-point AI measurements made at each heterozygous locus are noisy, sophisticated analytical methods are required to make the most of these data. In this paper, we develop signal processing approaches for the accurate identification and delineation of transcripts with allelic imbalance, either in one individual at a time or in a collection of samples. To our knowledge, no hypothesis-free computational approaches have been proposed for the analysis of this type of data. Detection of AI in Ge et al. [13] relied heavily upon RefSeq, Vega, and UCSC gene annotations: SNPs were first partitioned into windows corresponding to these annotated regions as well as to intergenic regions, and windows with significant AI were reported. Sophisticated bioinformatics approaches have been developed in the past for a related but simpler problem, that of detecting Copy Number Variants (CNV) or Loss Of Heterozygosity (LOH) in cancer cells using array-based Comparative Genomic Hybridization (CGH) [15–18] or genotyping arrays [19–25]. These include the PennCNV program [26] and the QuantiSNP program [27], which use a Hidden Markov Model related to one of the approaches considered here. However, CNV or LOH regions have properties that make them easier to detect than regions of allelic imbalance: (i) the signal, coming from genomic DNA, is generally quite strong, whereas gene expression can be very low; (ii) the number of copies of an allele is a small integer, whereas the allelic expression ratio is a real number; (iii) the regions affected are typically quite large, whereas AI can affect a single, short gene, or even only part of a gene. The approaches listed above are thus not easily applicable to the detection of AI in gene expression. An alternative family of statistical approaches, called changepoint methods, has been proposed for segmenting array CGH data into regions exhibiting consistent signals [28,29]. These non-parametric, model-free approaches have the benefit of segmenting real-valued data without enforcing discretization. However, they are difficult to generalize to a situation like ours, where signals come from a mixture of discrete (sites with no expression, sites with expression but no imbalance) and continuous (sites with real-valued imbalance) state spaces.

In this paper, we introduce a family of signal processing approaches for the analysis of AI data obtained from genotyping arrays. We consider both statistical approaches (Z-score computation) and machine learning approaches (Hidden Markov Models) to identify transcripts that show AI and to quantify the imbalance. We introduce a new type of left-to-right HMM for the joint prediction of allelic imbalance in the 53 samples considered. Our algorithms are evaluated using permutation testing and succeed at identifying regions with known AI. Our approaches reveal that more than 25% of transcripts (coding or non-coding) are subject to differential expression between the two alleles and that patterns of AI are varied and complex. The tools and data sets described here will help biologists and geneticists to identify regions of allelic imbalance, understand the mechanisms at play, identify the genetic or epigenetic causative agents, and associate expression polymorphisms with disease susceptibility.
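The simple per-SNP threshold rule that past studies relied on (a 1.5- or 2-fold allelic ratio, as described above) can be sketched as follows. This is a hypothetical illustration: the function names, the input format (per-allele expression values assumed already normalized against the genomic-DNA allele ratio), and the option of calling a gene when any versus all of its SNPs pass the threshold are assumptions, not a description of any specific earlier pipeline.

```python
import math
from typing import Sequence, Tuple


def snp_fold_change(expr_a1: float, expr_a2: float) -> float:
    """Fold difference between the more and the less expressed allele (always >= 1)."""
    hi, lo = max(expr_a1, expr_a2), min(expr_a1, expr_a2)
    if lo == 0:
        # One silent allele gives an unbounded ratio; no expression at all gives no imbalance.
        return math.inf if hi > 0 else 1.0
    return hi / lo


def call_gene_imbalanced(snps: Sequence[Tuple[float, float]], fold_threshold: float = 2.0,
                         require_all: bool = False) -> bool:
    """Threshold-based AI call for one gene from its heterozygous SNPs.

    snps: (allele-1 expression, allele-2 expression) for each heterozygous SNP in the gene
    (hypothetical input format). A SNP passes if its fold change is at least fold_threshold.
    """
    passes = [snp_fold_change(a1, a2) >= fold_threshold for a1, a2 in snps]
    if not passes:
        return False                      # no informative SNP: no call
    return all(passes) if require_all else any(passes)


gene = [(120.0, 55.0), (80.0, 70.0)]      # hypothetical normalized allele intensities
print(call_gene_imbalanced(gene, fold_threshold=2.0))                    # True
print(call_gene_imbalanced(gene, fold_threshold=2.0, require_all=True))  # False
```

Because each call rests on single-point ratios, one noisy SNP can flip the outcome; this is the weakness that pooling information across neighbouring SNPs is meant to address.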

Methods

Allelic Imbalance Data
Allelic imbalance was assayed using Illumina Infinium Human1M/Human1M-Duo SNP bead microarrays. These arrays, originally designed for genotyping, have probes for approximately 1.1 million polymorphic sites from HapMap, of which 755,284 were used in this study. Each probe estimates the abundance of each of the two possible alleles in the sample. Normally, genomic DNA is hybridized onto the chip, and the genotypes are easily inferred from the probe intensities. We have previously described how this technology can be used to measure allelic expression in a high-resolution, genome-wide manner [13]. Briefly, total RNA is extracted and cDNAs are synthesized using a protocol for heteronuclear RNA, allowing us to measure unspliced primary transcripts [8]. The cDNA sample is hybridized onto the array, and each probe estimates the abundance of each of the two alleles in the sample. In parallel, genomic DNA from the same cell line is hybridized, which provides the basis for normalizing the cDNA hybridization and also gives the genotype of each sample. Details of the full experimental process for obtaining the raw imbalance information, as well as the sample information, can be found in [13]. Data obtained from technical replicates show that although the total expression level (the sum of RNA abundance for both alleles) measured at a given SNP is highly reproducible (R² = 0.864), single-point allelic expression ratios are much noisier (R² = 0.632), especially at low expression levels (see 9). This suggests that careful data analysis is required to extract as much information as possible.

Let $a_i = \{a_{i1}, a_{i2}\}$ be the set of two alleles present at polymorphic site $i$ in the population, for $i = 1, \ldots, n$ (the rare cases where three or more alleles exist at the same site are ignored in this study). For notational simplicity, we assume that the genome consists of a single pair of chromosomes. In reality, the analysis that follows is repeated separately for each autosome. Genotype phasing consists of the decomposition of the genotype of an individual into its two homologous chromosomes. For individual $k$, let $x^k = x^k_1, x^k_2, \ldots, x^k_n$ and $y^k = y^k_1, y^k_2, \ldots, y^k_n$ be these two chromosomes, where $x^k_i, y^k_i \in a_i$. Phasing remains a computationally and statistically challenging problem [30]. For HapMap individuals, phased genotypes are available, although they are not error-free. Removing the SNPs not phased in CEU HapMap release R22 left the 755,284 SNPs used in our study.

Let $X^k_{\mathrm{DNA}}(a_{i1})$ and $X^k_{\mathrm{DNA}}(a_{i2})$ be the intensity readouts obtained from the probes interrogating site $i$ when hybridizing the genomic DNA of individual $k$. If individual $k$ is heterozygous at site $i$ (i.e. $x^k_i \neq y^k_i$), then we expect both $X^k_{\mathrm{DNA}}(a_{i1})$ and $X^k_{\mathrm{DNA}}(a_{i2})$ to be large. When it is homozygous, say for $a_{i1}$ (i.e. $x^k_i = y^k_i = a_{i1}$), we expect $X^k_{\mathrm{DNA}}(a_{i1})$ to be large and $X^k_{\mathrm{DNA}}(a_{i2})$ to be small. The genotype of an individual can thus be deduced from the ratio of the two measurements.

Consider now $X^k_{\mathrm{RNA}}(a_{i1})$ and $X^k_{\mathrm{RNA}}(a_{i2})$, the intensity readouts obtained from the probes interrogating site $i$ when hybridizing cDNA obtained from whole-cell RNA extraction. When heterozygous site $i$ lies in a transcribed region with no allelic imbalance, both $X^k_{\mathrm{RNA}}(a_{i1})$ and $X^k_{\mathrm{RNA}}(a_{i2})$ will be relatively large. Any difference between the two may indicate allelic imbalance. Regions that are not transcribed will obtain low values for both alleles. We consider the following pair of observations at each site $i$:

$$E^k_i = \log \frac{X^k_{RNA}(a_{i1}) + X^k_{RNA}(a_{i2})}{X^k_{DNA}(a_{i1}) + X^k_{DNA}(a_{i2})},$$

which measures the total transcript abundance, and

$$R^k_i = \log \frac{X^k_{RNA}(a_{i1}) \,/\, X^k_{DNA}(a_{i1})}{X^k_{RNA}(a_{i2}) \,/\, X^k_{DNA}(a_{i2})},$$

which measures the fold imbalance between the expression of the two alleles. Normalization with the DNA sample, which, for heterozygous sites, is known to be balanced, corrects for probe sensitivity and biases. Values for E and R were collected at 755284 sites. Those sites are not uniformly distributed in the genome, with genic regions (exonic and intronic) having roughly 1.3 times the SNP density of intergenic regions (one SNP per 3.5 kb in genic regions, one SNP per 4.5 kb in intergenic regions). Figure 1(a) shows the distribution of E over all genic and intergenic positions. The distribution of expression levels in gene regions is clearly bimodal: a good fraction of genes are not transcribed in LCL, and most but not all intergenic sites are not transcribed. Assuming that 50% of genes and 10% of intergenic sites are expressed, we can deconvolve these distributions to obtain the distribution of E for expressed and non-expressed regions (Figure 1(b)). For two individuals, experiments were done in triplicate. As seen in Figure S1(a) and (b), the technical noise in the measurement of both E and R is quite significant. As expected, R values are particularly noisy at low expression levels.

Identification of Transcripts with Allelic Imbalance

The main problem addressed in this study is the statistically robust identification of genomic regions with significant and consistent allelic imbalance. We start by noting that the data are too noisy to accurately call imbalance based on each SNP individually (e.g. by simply using $R^k_i$), especially for regions whose expression level is relatively low. We thus consider approaches that take advantage of the fact that most regions with AI are relatively long and are expected to contain more than one SNP. Four main approaches were designed, implemented and compared. Each method aims to robustly assign a score AI(i) to each SNP i, so that SNPs that belong to transcripts with significant allelic imbalance obtain large (positive or negative) scores. In all our AI detection algorithms, AI is detected without reference to any kind of gene annotation, in contrast to the annotation-driven approach used by Ge et al. [13]; this allows us to identify regions of AI whose boundaries do not necessarily correspond to annotated genes. The first three approaches consider data from each sample individually, while the last considers data from all samples jointly in order to improve the detection of AI in individual samples. The four approaches considered are first summarized below and then described in detail. The code implementing each algorithm is available at http://www.mcb.mcgill.ca/~blanchem/AI/code.zip.
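Before turning to the four approaches, note that the per-site observations $E^k_i$ and $R^k_i$ defined at the start of the Methods reduce to a few array operations. The sketch below is illustrative only: it assumes NumPy arrays of raw probe intensities for one individual, uses the natural logarithm (the text does not state the base), and the function name `total_expression_and_log_ratio` is hypothetical rather than taken from the authors' code.

```python
import numpy as np


def total_expression_and_log_ratio(rna_a1, rna_a2, dna_a1, dna_a2):
    """Compute E (total expression) and R (allelic log-ratio) for each site.

    Hypothetical helper. Inputs are probe intensities for one individual k:
    rna_* from the cDNA hybridization and dna_* from the genomic DNA
    hybridization, for alleles a_i1 and a_i2 at every site i.
    The natural log is an assumption; the text does not give the base.
    """
    rna_a1, rna_a2, dna_a1, dna_a2 = (
        np.asarray(x, dtype=float) for x in (rna_a1, rna_a2, dna_a1, dna_a2)
    )
    # E: total RNA signal normalized by total DNA signal at the same site
    E = np.log((rna_a1 + rna_a2) / (dna_a1 + dna_a2))
    # R: per-allele RNA/DNA ratios compared between the two alleles; dividing
    # by the DNA signal cancels allele-specific probe sensitivity
    R = np.log((rna_a1 / dna_a1) / (rna_a2 / dna_a2))
    return E, R


# A balanced heterozygous site whose allele-1 probe is twice as sensitive
E, R = total_expression_and_log_ratio([4.0], [2.0], [2.0], [1.0])
```

In the example call, the allele-1 probe reads twice as high in both the RNA and the DNA channel, so $R$ comes out as 0: the DNA normalization removes the probe bias, as described above. At homozygous sites $R$ carries no allelic information and would be ignored downstream.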


Simple smoothing refers to the approach where the allelic imbalance log-ratio of a SNP is taken as the average of its own log-ratio and that of the $m$ surrounding SNPs on either side. The Z-Score approach involves binning SNPs based on their expression level, assigning each SNP a Z-Score based on its own allelic imbalance ratio, and then determining the Z-Scores of windows of consecutive SNPs and assigning this score to each SNP within the window. The ergodic HMM approach models the AI data in a given individual as being generated by a Hidden Markov Model whose states correspond to different levels of total expression and allelic ratios. The left-to-right HMM approach is an extension of the ergodic model that uses the AI data from all individuals to assess the frequency of AI at each site, and then uses those frequencies as site-specific priors on the transition probabilities to predict AI regions separately for each individual, but in the context of the data from other individuals.

Figure 1. Distribution of E values. (a) Distribution over genic/intergenic regions. (b) Deconvolution into expressed/non-expressed regions. doi:10.1371/journal.pcbi.1000849.g001

PLoS Computational Biology | www.ploscompbiol.org

Simple Smoothing Approach

Consider heterozygous site $i$ and define the window $W(i,m)$ to be the set consisting of $m$ heterozygous sites to the left of $i$, $m$ heterozygous sites to the right of $i$, and $i$ itself. The simple smoothing approach estimates $AI_{smoothing}(i) = \sum_{j \in W(i,m)} R_j / (2m+1)$. Any site $i$ with $|AI_{smoothing}(i)| > t_{smoothing}$ would then be reported as having imbalance, for some appropriate threshold $t_{smoothing}$. Based on False Discovery Rate assessment (described below), $m = 4$ was determined to be the optimal window size and was used for all results reported.

Z-Score Approach

At sites with no allelic imbalance, the value of $R_i$ is modeled adequately using a normal distribution centered at 0. However, the variance is inversely correlated with the total expression $E_i$, as AI is difficult to estimate when the total expression is low (see Figure S1b). The range of possible values of E is subdivided into 100 bins of equal size, and the mean $\mu_b$ and variance $\sigma^2_b$ of R values are determined for SNPs belonging to every expression level bin $b$. A site-specific Z-Score is assigned to heterozygous site $i$ as $Z(i) = (R_i - \mu_{bin(E_i)}) / \sigma_{bin(E_i)}$. Homozygous sites, being uninformative with respect to allelic ratios, are excluded from the analysis. Consider now a collection of $w$ consecutive heterozygous SNPs $i_1, i_2, \ldots, i_w$ (ignoring possibly intervening homozygous sites). We define the regional Z-score as

$$Z(i_1, i_2, \ldots, i_w) = \frac{\sum_{k=1}^{w} Z(i_k)}{\sqrt{w}}.$$

Assuming normality of the noise in $R_i$ measurements, $Z(i_1, i_2, \ldots, i_w)$ follows a Normal(0,1) distribution under the null hypothesis of absence of allelic imbalance. Regional Z-Scores are first computed for every possible window of $w = 1, \ldots, 50$ heterozygous sites. The region with the highest regional Z-score (in absolute value), $Z_{max}$, is selected first, and we set $AI_{zscore}(i) = Z_{max}$ for all heterozygous sites $i$ within the region. This region is then masked out and the next highest-scoring non-overlapping window is selected. The process is repeated until all heterozygous sites have a Z-Score assigned. Because $AI_{zscore}(i)$ is obtained from the best window that contains site $i$, a complex multiple-testing issue arises: this measure will not follow a Normal(0,1) distribution under the null hypothesis (i.e. absence of AI). Consequently, one cannot easily translate $AI_{zscore}(i)$ into a p-value.

We also considered a variant of the Z-Score approach where each SNP is assigned the Z-Score of the fixed-size window centered around it. This approach, which can be seen as an improved version of our simple smoothing approach, indeed improves on the latter (based on permutation testing and comparison to transcripts with known AI; see below), but is far less accurate than the proposed Z-Score approach, because it leads to bleeding edges at transcript boundaries. We also investigated a version of the Z-Score approach where SNPs are not binned by expression level prior to Z-Score computation; this resulted in a small but significant decrease in accuracy, showing that appropriately modeling the dependency between the noise in the allelic ratio and the total expression level is an important feature of our approach.

Single-Sample Ergodic Hidden Markov Model Approach

The linear nature of the data in question lends itself well to a Hidden Markov Model (HMM) in which each data point corresponds to a particular SNP, the hidden states correspond to qualitative descriptions of the allelic imbalance (e.g. positive imbalance, negative imbalance, no imbalance), and emissions correspond to the total expression $E_i$ and the allelic log-ratio $R_i$ observed at site $i$. We built an HMM consisting of a total of eight hidden states (see Figure 2a). Seven of these states correspond to SNPs that belong to expressed transcripts in the LCL sample in question, with various levels of imbalance: $S = \{S_{+++}, S_{++}, S_{+}, S_0, S_{-}, S_{--}, S_{---}\}$, corresponding to strongly positive imbalance ($S_{+++}$), moderately positive imbalance ($S_{++}$), slightly positive imbalance ($S_{+}$), balance ($S_0$), slightly negative imbalance ($S_{-}$), moderately negative imbalance ($S_{--}$) and strongly negative imbalance ($S_{---}$). There is also a state ($S_N$) that corresponds to SNPs located in regions that are predicted not to be transcribed, and for which allelic imbalance is meaningless. The emission probability for each state $s \in S$ is modeled with a pair of normal distributions for the E and R values, with parameters $(\mu_{E,s}, \sigma^2_{E,s})$ and $(\mu_{R,s}, \sigma^2_{R,s})$ respectively. Whereas both total expression E and allelic imbalance R are observed at heterozygous sites, only the expression is measured at homozygous sites. In the latter case, the imbalance data is left unobserved (i.e. all 8 states are equally likely to have generated the R observation). Homozygous SNPs can thus be included in model training and prediction, and can help delineate regions based on expression levels. An HMM with a realistic correspondence to the data can in principle be built with $2K+2$ states, where $K \geq 1$ represents the number of levels of positive (and negative) imbalance that the model represents. Larger values of K should in principle be favorable, as they allow a finer discretization of allelic ratios. Models with $K \in \{1,2,3,4\}$ were trained, and their false discovery rates were measured and compared (see False-Discovery Rate Estimation below). We found that $K = 3$ performed better than $K = 1$ and $K = 2$, and similarly to $K = 4$ (Figure S2), so this value was used for both the ergodic and left-to-right models.

Certain parameters of the HMM are trained using the Baum-Welch algorithm, while others are fixed. For $S_N$, the emission probability distribution for E is modeled non-parametrically by the histogram of Figure 1(b) (black curve), whereas all expressing states share the same total expression distribution from Figure 1(b) (red curve). These emission probability distributions are kept constant during the training procedure. The Baum-Welch algorithm [31] is used to find maximum likelihood estimators for $\mu_{R,s}$ and $\sigma^2_{R,s}$, for $s \in S$, as well as all transition probabilities and the initial state probabilities. The Baum-Welch algorithm is an expectation-maximization (EM) [32] approach that alternates between the Expectation step (E-step), in which the posterior probability over states is computed for each site using the Forward-Backward algorithm, and the Maximization step (M-step), in which the parameters of the emission and transition probability distributions are adjusted to best reflect the observed data given these posterior probabilities. Formulas for updating the

emission probability parameters and transition probabilities are adapted straightforwardly from Mitchell [33]. We considered training one HMM per individual (which would allow the flexibility to model inter-experiment variation in noise, for example) or training a single HMM on the data from all individuals (which would have the benefit of being based on more data). The latter option produced slightly better results, and this is the strategy we used for the rest of the study. We also considered filtering out sites with low total expression, as their allelic expression ratio may be less reliable. However, slightly better results were obtained without any filtering (allowing non-expressed SNPs to naturally be classified as belonging to state $S_N$). Training on the whole data set took fewer than 20 Baum-Welch iterations and less than 3 hours to converge on a standard desktop computer (convergence is defined as two consecutive iterations where no parameter or transition probability changed by more than $10^{-5}$ or 1% of its value). Restarts from different initial values converged to nearly the same values. The Viterbi algorithm [34] can then be used to identify, in each individual, predicted regions of different levels of positive or negative imbalance. The Forward-Backward algorithm [35] yields an estimate of the posterior probability of each state at each site. In the latter case, a useful summary score for each site is the posterior expected allelic expression log-ratio, which we use as AI predictor:

$$AI_{ergodic}(i) = \sum_{s \in S} \Pr[S_i = s \mid E_{1..n}, R_{1..n}] \cdot \mu_{R,s}.$$

Until now we have assumed homogeneous transition probabilities, regardless of the distance in base pairs between consecutive SNPs along the chromosome. However, a more accurate model would factor in the distance between neighboring SNPs, to increase the probability of self-loops (i.e. staying in the same state) when the two sites are nearby and to increase the probability of a state change for two distant sites. Such an approach has been used previously in HMMs designed to detect CNVs [27]. We obtained a unit transition probability matrix $T$ as the $d$-th root of the transition matrix obtained via Baum-Welch training of the homogeneous model, where $d$ is the average distance (in base pairs) between two consecutive SNPs in our data. Then, the transition probability matrix used for a pair of sites separated by $l$ base pairs is $T^l$, which is efficiently computed using the eigenvalue decomposition of $T$. To ensure that our training procedure was not subject to overfitting, we used 2-fold cross-validation (dividing the 53 samples into one 26-sample data set and one 27-sample data set) and trained our 8-state ergodic HMM separately on each half of the samples. The parameters and transition probabilities obtained were nearly identical, and so were the FDR estimates obtained by running each HMM on the complementary data set, indicating that overfitting is not an issue.

Figure 2. Architecture of the two Hidden Markov models used in this study. (a) Ergodic HMM architecture. HistoExp and HistoNoExp refer to the distributions depicted in Figure 1(b). For readability, states $S_{+++}$ and $S_{---}$ are not shown. (b) Multi-sample left-to-right HMM architecture. States $S_{+++}$, $S_{++}$, $S_{--}$, and $S_{---}$ are not shown for clarity. Only transition probabilities are trained. All copies of a given state have the same emission probability distribution, described on their left. doi:10.1371/journal.pcbi.1000849.g002
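As a rough illustration of how Forward-Backward posteriors yield the posterior expected allelic log-ratio $AI_{ergodic}(i)$ of the ergodic HMM, here is a minimal scaled Forward-Backward pass. It is a sketch under simplifying assumptions, not the authors' implementation: both the E and R emissions are Gaussian (the text instead models E non-parametrically with the Figure 1(b) histograms), transitions are homogeneous (no distance correction), homozygous sites are encoded as `R = NaN`, and all function and parameter names are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mu, var):
    """Normal density, broadcast over (site, state)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def posterior_expected_log_ratio(E, R, start, trans, mu_E, var_E,
                                 mu_R, var_R, expressed):
    """Hypothetical sketch: AI(i) = sum over expressed s of P(S_i = s | data) * mu_R[s].

    E, R       : length-n arrays; R[i] is NaN at homozygous sites
    start      : initial state probabilities, shape (K,)
    trans      : homogeneous transition matrix, shape (K, K)
    mu_*/var_* : per-state Gaussian emission parameters (assumed), shape (K,)
    expressed  : boolean mask over states, False for the non-expressed S_N
    """
    E = np.asarray(E, dtype=float)[:, None]   # shape (n, 1)
    R = np.asarray(R, dtype=float)[:, None]
    emit = gaussian_pdf(E, mu_E, var_E)       # shape (n, K)
    # Homozygous sites carry no allelic information: the R factor is 1 for
    # every state there, so only E contributes to the emission.
    emit = emit * np.where(np.isnan(R), 1.0, gaussian_pdf(R, mu_R, var_R))

    n, K = emit.shape
    alpha = np.zeros((n, K))
    beta = np.ones((n, K))
    scale = np.zeros(n)

    # Scaled forward pass
    alpha[0] = start * emit[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for i in range(1, n):
        alpha[i] = (alpha[i - 1] @ trans) * emit[i]
        scale[i] = alpha[i].sum()
        alpha[i] /= scale[i]

    # Scaled backward pass, reusing the forward scaling factors
    for i in range(n - 2, -1, -1):
        beta[i] = trans @ (emit[i + 1] * beta[i + 1]) / scale[i + 1]

    posterior = alpha * beta
    posterior /= posterior.sum(axis=1, keepdims=True)
    # Expected log-ratio over expressed states only; S_N contributes nothing
    return posterior @ np.where(expressed, mu_R, 0.0)
```

With a three-state toy model (one positive, one negative and one non-expressed state), a run of sites with $R \approx 1$ yields $AI(i)$ close to 1, including at an interior homozygous site, because the evidence from neighboring sites is carried through the transition probabilities.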

Multi-Sample Left-to-Right HMM Approach

The previous HMM is called ergodic because it models an ergodic, homogeneous Markov chain over the state space (i.e. the set of transition probabilities is independent of the position along the genome). One limitation of this HMM is that it does not take full advantage of the fact that data exist for multiple individuals and that, while not all individuals are expected to have AI in exactly the same regions, one does expect AI hotspots where a significant fraction of the individuals would have imbalance. That would be the case, for example, for genes where one allele is commonly or always silenced via epigenetic mechanisms, or when AI is due to a common regulatory variant. The approach proposed in this section aims at predicting AI regions separately in each individual, while taking into consideration the data observed in all individuals. In doing so, we still want to be able to identify AI regions that are unique to a given individual, but we hope to improve the detection of regions with common AI. For example, AI regions containing only a few SNPs, or those where the imbalance is only moderate, may be missed when present in a single individual, but may be detectable if present in a large fraction of the population. In addition, we may be able to detect the boundaries of AI regions more accurately when they are shared among individuals. The approach used to address this is termed the left-to-right HMM [35] (see Figure 2(b)), similar to profile HMMs [36]. Each site has its own copy of the set of states, and transitions can only occur between states associated with neighboring sites, from left to right. Each copy of a given state shares the same emission probability distributions, which are modeled the same way as in the ergodic HMM. However, transition probabilities will vary across positions, making the model non-homogeneous (in contrast to our ergodic HMM approach). This configuration allows for finer tuning at the level of each individual SNP or region, though at the cost of a substantially larger set of transition probabilities to be learned.

The training of our left-to-right HMM is a two-stage process. In the first stage, emission probabilities, transition probabilities, and start probabilities are estimated for the ergodic version of the HMM using the Baum-Welch algorithm described above, using all available individuals. The parameters of the emission probabilities of the states in the left-to-right HMM are set to those obtained in the ergodic training and are not re-estimated. The resulting ergodic, non-homogeneous, distance-corrected transition probabilities are used as priors for those of the left-to-right HMM. In the second stage, we switch to learning the transition probabilities of the left-to-right HMM. We assume that the data set from each individual is the result of an independent run of the HMM:

$$\Pr((E^1,R^1),(E^2,R^2),\ldots,(E^k,R^k) \mid HMM) = \prod_{j=1}^{k} \Pr(E^j,R^j \mid HMM),$$

and we seek the set of transition probabilities of the left-to-right HMM that maximizes this joint likelihood. Consider a site $i$ that is not imbalanced in any individual but where site $i+1$ is positively imbalanced in a large fraction of the individuals. The maximum likelihood estimator for the transition from state $S_0(i)$ to state $S_{+}(i+1)$ will be higher than at other positions where few individuals enter an imbalanced region. Now consider an individual where there is only weak evidence of AI starting at position $i+1$. When using an ergodic HMM for our predictions, the weak AI region will probably not be detected. However, in the left-to-right HMM, with the increased transition probability, the AI path becomes more likely, so provided that there is sufficient imbalance, the most likely path may now go through one of the imbalanced states.

Estimating transition probabilities between two sites separated by $l$ base pairs is done using a simple modification of the standard Baum-Welch algorithm, where the update rule for transitions is

$$t'_{i,i+1}(a,b) = \frac{\sum_{j=1}^{k} \Pr(S^j_i = a, S^j_{i+1} = b) + W \cdot T^l(a,b)}{\sum_{j=1}^{k} \Pr(S^j_i = a) + W},$$

where $T^l$ is the $l$-th power of the unit transition probability matrix obtained previously and $W$ is a pseudocount weight. The regularization obtained by using the ergodic transition probabilities as a prior reduces the risk of overfitting while improving the convergence of the training procedure. In practice, based upon permutation tests and the resulting FDR scores, a value of $W = 1$ was determined to be optimal (data not shown). Once the left-to-right HMM is trained using the data from all 53 individuals (which took 161 Baum-Welch iterations, less than 4 hours on a standard desktop computer), the standard Viterbi or Forward-Backward algorithms are used to identify AI regions separately for each individual. As with the ergodic HMM, we use the posterior expected allelic expression log-ratio $AI_{LtoR}(i)$ to summarize AI evidence at SNP $i$. Overfitting is a possible issue with our left-to-right HMM, as the number of parameters estimated is much larger than for the ergodic HMM. We performed 5-fold cross-validation, training on 4/5 of the data and predicting on 1/5. Thanks to our regularization procedure, the predictions obtained were very similar to those obtained by training and testing on the full data set, with only a marginal decrease in FDR.
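The regularized transition update of the left-to-right HMM can be written compactly. The sketch below assumes that the pairwise posteriors $\Pr(S^j_i = a, S^j_{i+1} = b)$ have already been computed by a Forward-Backward pass for each individual $j$; the function name and array layout are hypothetical. It computes $T^l$ with `numpy.linalg.matrix_power` (repeated squaring) rather than the eigenvalue decomposition mentioned in the text; for integer $l$ both give the same matrix up to floating-point error.

```python
import numpy as np


def regularized_transition_update(pair_posteriors, unit_T, distance_bp, W=1.0):
    """Hypothetical sketch of one M-step update of t'_{i,i+1}(a, b).

    pair_posteriors : array (k, K, K); entry [j, a, b] is the posterior
                      Pr(S^j_i = a, S^j_{i+1} = b) for individual j
    unit_T          : per-base-pair ("unit") transition matrix T, shape (K, K)
    distance_bp     : integer distance l (bp) between sites i and i+1
    W               : pseudocount weight given to the ergodic prior T^l
    """
    prior = np.linalg.matrix_power(unit_T, distance_bp)    # T^l
    expected_counts = pair_posteriors.sum(axis=0)          # sum over individuals j
    # Pr(S^j_i = a) is the row sum of individual j's pair posterior,
    # here summed over all individuals as well
    occupancy = pair_posteriors.sum(axis=(0, 2))
    return (expected_counts + W * prior) / (occupancy + W)[:, None]
```

Because each row of $T^l$ sums to 1, each row of the updated matrix also sums to 1. At sites where no individual has posterior mass in state $a$, the row falls back to the ergodic prior $T^l$. This is the regularizing behavior described above.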

Cross-Hybridization

Upon study of some of the regions where AI was predicted in most or all individuals but where no known imprinted regions existed, we found that nearly half were likely artifacts of cross-hybridization. All these suspicious regions were the result of a segmental duplication, where a fragment of a gene was duplicated. Because the fragments still match the genic region, sites within them will appear to be expressed (as they match the transcript of the paralogous region), and polymorphisms will cause mismatches between the probe and the true transcript, which will result in apparent AI. We thus used the human Blastz self-alignment from the UCSC Genome Browser [37,38] to filter out regions corresponding to recent duplications. A possible alternative approach would consist of using the results of the genomic DNA hybridization to identify probes that match more than one location in the genome, with the added benefit of potentially detecting DNA copy-number variation.

False-Discovery Rate Estimation

Due to the relatively small number of “gold standard” regions known to exhibit AI, the best available option for comparing the various models is through permutation tests. The goal was to preserve some of the structure of the genome such that only SNPs with approximately equal expression levels and heterozygosity would be swapped, i.e., the only factor that is swapped freely is the allelic imbalance ratio. Permuted data sets were generated as follows. Sites were partitioned into five levels based on the number of individuals in which they are heterozygous. Five bins were also assigned based on the average level of expression seen across all individuals. Each SNP was then assigned to one of 25 bins, one for each of the possible combinations of heterozygosity frequency and expression level. Sites were randomly permuted within each bin, preserving the correspondence between sites in different individuals (in the case of the left-to-right HMM, the first stage of training of global HMM parameters was done on non-permuted data, and then the second stage of model training was done on permuted data). Preserving expression levels and heterozygosity is important to create permuted data sets that are as realistic as possible, in particular with respect to the fact that expressed sites are found in contiguous genomic regions rather than dispersed randomly in the genome. Each of the prediction methods described produces one AI score per site and per individual. For each method $M$, the numbers of regions of consecutive SNPs exceeding a given score threshold $t$, $N_{real}(t,M)$ and $N_{perm}(t,M)$, were determined in the real and permuted data respectively, resulting in a False-Discovery Rate of

$$FDR(t,M) = \frac{N_{perm}(t,M)}{N_{real}(t,M)}.$$
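The permutation scheme and the FDR ratio can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The text specifies five heterozygosity levels and five expression bins but not how their boundaries are chosen, so quantile-based bins are assumed here. Only the allelic scores are shuffled. A "region" is taken to be a maximal run of consecutive sites whose absolute score exceeds $t$. All function names are hypothetical.

```python
import numpy as np


def quantile_bins(values, n_levels):
    """Assign each value to one of n_levels quantile-based levels (assumed binning)."""
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_levels + 1)[1:-1])
    return np.digitize(values, edges)


def permute_within_bins(R, het_count, mean_expr, n_levels=5, seed=0):
    """Shuffle whole site columns of R among sites sharing a joint bin.

    R         : (n_individuals, n_sites) allelic scores
    het_count : number of heterozygous individuals at each site
    mean_expr : average total expression of each site across individuals
    Moving whole columns preserves the correspondence between sites in
    different individuals.
    """
    rng = np.random.default_rng(seed)
    joint = quantile_bins(het_count, n_levels) * n_levels + quantile_bins(mean_expr, n_levels)
    order = np.arange(R.shape[1])
    for b in np.unique(joint):
        idx = np.flatnonzero(joint == b)
        order[idx] = rng.permutation(idx)   # permute sources within this bin only
    return R[:, order]


def count_regions(scores, t):
    """Number of maximal runs of consecutive sites with |score| > t, over all rows."""
    above = np.abs(np.atleast_2d(scores)) > t
    prev = np.zeros_like(above)
    prev[:, 1:] = above[:, :-1]
    return int((above & ~prev).sum())       # count run starts


def fdr(real_scores, perm_scores, t):
    """FDR(t, M) = N_perm(t, M) / N_real(t, M); undefined (NaN) if nothing is called."""
    n_real = count_regions(real_scores, t)
    return count_regions(perm_scores, t) / n_real if n_real else float("nan")
```

In use, each method would be run on both the real and the permuted matrices, and `fdr` evaluated over a range of thresholds to trace curves like those in Figure 5.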

Results

Each of our four approaches was applied to the data set, and the AI predictions for each individual are available at http://www.mcb.mcgill.ca/~blanchem/AI/AIPredictions.zip.

Illustrative Case Studies

We use two examples to highlight the features of the data and the methods developed. Figure 3 gives a sample of the raw data and the predictions made by each method in the BLK locus. BLK is a gene that has previously been described as allelically imbalanced in LCL [13]. Interestingly, in this individual, two other neighboring genes have strong allelic imbalance, with FAM167A showing expression on the opposite allele compared to BLK, and GATA4 also producing strong and consistent signals. Although in this example the boundaries of allelic expression domains align nicely with known gene boundaries, this is not the case in general. As is obvious from the figure, the raw expression and allelic ratio data are quite noisy. The simple smoothing approach succeeds at identifying the main regions of allelic imbalance but does so much less reliably and precisely than the other three approaches. Notice that this individual has no heterozygous sites in the 5′ end of FAM167A. This results in different behaviors for each method. The ergodic approach assigns gradually decreasing expected allelic log-ratios in that region, while the Z-Score approach only predicts imbalance in the 3′ end of the gene. However, the left-to-right HMM has the benefit of considering data from other individuals, which have some heterozygous sites in the 5′ region of the gene; this allows it to predict strong and consistent negative allelic log-ratios over the whole gene, and a sharp transition entering the BLK transcript. A similar phenomenon is observed for GATA4.

Figure 4 shows the set of predictions made by the Viterbi algorithm using the left-to-right HMM on the extended GATA3 locus, in all 53 samples. The region exhibits a large diversity of patterns of AI. In some cases, the region of AI closely matches an annotated gene (e.g. SFMBT2 in several individuals). Often, AI regions do not overlap any known gene (e.g. the region located upstream of SFMBT2). Such regions, especially when they abut an annotated gene, may reflect the presence of alternative allele-dependent promoters. They may also represent completely novel unannotated transcripts. Another frequently observed pattern is the presence of AI within annotated transcripts, near the 5′ or 3′ end (e.g. the 3′ end of the ITIH5 gene). Finally, AI regions often encompass one or more complete genes (e.g. GATA3 and NM_207423), possibly because of epigenetic modification of one of the two alleles. Based on the analysis done in [13], we note that SFMBT2 and ITIH5 show evidence of heritable allelic expression, whereas GATA3 does not show correlation with common genetic variants and could represent epigenetic modification of expression in LCLs.

Figure 3. Raw data and predictions. Example of a genomic region with allelic imbalance. From top to bottom: raw allelic log-ratio; simple smoothing predictions; Z-score predictions; ergodic 8-state HMM predictions (expected allele log-ratio); left-to-right 8-state HMM predictions (expected allele log-ratio); raw total expression; UCSC known genes track. Data shown are for HapMap individual NA11840. Note: allelic ratios at homozygous sites are not shown. doi:10.1371/journal.pcbi.1000849.g003

Evaluation and Validation

The accuracy of the AI predictions made by each method was evaluated using both permutation testing (in order to assess the false discovery rate) and comparison to previously characterized AI transcripts.

Permutation Testing

We first estimated the false-discovery rate (FDR) of each method using a permutation test where genomic sites are randomly permuted, subject to some constraints (preservation of heterozygosity and expression level; see Methods). This randomized data set preserves the level of imbalance observed at each site, but randomly disperses sites in such a way that few regions are expected to exhibit strong and consistent allelic ratios over several consecutive sites (as real AI transcripts should). For each algorithm, the number of genomic regions with AI score above some threshold $t$ in the real data was compared to the corresponding number in the permuted data; the ratio of these two numbers is an estimate of the FDR of the algorithm (note that the FDR could also be estimated at the individual SNP level, rather than at the region level; the conclusions are the same). Figure 5 shows the FDR curves obtained for each method, as a function of the number of predictions made. All methods are able to detect the most obvious cases of AI (roughly 200 regions per individual, where all methods have near-zero FDR). However, as the threshold decreases and the number of regions predicted increases, the performance of the four approaches becomes quite different. Setting 5% as an acceptable FDR, the simple smoothing, Z-Score, ergodic HMM, and left-to-right HMM approaches result in 360, 622, 662, and 954 predicted regions with AI, respectively. In other words, at that FDR level, the best approach, the left-to-right HMM, is approximately 160% more sensitive than the simple smoothing approach and approximately 45% more sensitive than the second-best approach, the ergodic HMM. Similar observations hold for other FDR thresholds. Therefore, the information obtained from the total expression levels, as well as the added site-specific transition probabilities, is beneficial for obtaining reliable AI predictions. This is particularly noteworthy for regions with weaker AI (those ranking between 500th and 1000th per individual), for which the FDR remains quite low with the left-to-right HMM but quickly increases with all other methods.

Figure 4. Allelic imbalance in 53 HapMap individuals in the GATA3 locus. Each row reports the sites where AI has been predicted by the 8-state left-to-right HMM with the Viterbi algorithm. Each AI SNP is marked with a vertical black line; the impression of gray levels is an artifact of SNP density. Genes from RefSeq [44] are illustrated below. doi:10.1371/journal.pcbi.1000849.g004

Figure 5. False discovery rates (FDR) obtained by permutation testing at thresholds resulting in different numbers of AI regions being predicted. doi:10.1371/journal.pcbi.1000849.g005
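The sensitivity comparison in the Permutation Testing results is simple arithmetic on the region counts reported at 5% FDR. Reading "more sensitive" as the relative increase in the number of regions called at the same FDR (our interpretation), the counts give about 165% over simple smoothing and about 44% over the ergodic HMM. These values are in line with the rounded figures in the text. The helper name below is hypothetical.

```python
# Regions predicted per individual at a 5% FDR, as reported in the text
regions_at_5pct_fdr = {
    "simple smoothing": 360,
    "Z-Score": 622,
    "ergodic HMM": 662,
    "left-to-right HMM": 954,
}


def relative_gain(best, other):
    """Hypothetical helper: percentage by which `best` exceeds `other`."""
    return 100.0 * (best / other - 1.0)


best = regions_at_5pct_fdr["left-to-right HMM"]
for name, n in regions_at_5pct_fdr.items():
    if name != "left-to-right HMM":
        print(f"{name}: {relative_gain(best, n):.0f}% more regions")
# simple smoothing: 165% more regions
# Z-Score: 53% more regions
# ergodic HMM: 44% more regions
```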

Comparison to Known AI Transcripts Although no comprehensive set of validated AI transcripts exists to date, a set of 62 imprinted genes (containing 1099 SNPs in our data

Figure 6. Enrichment for SNPs called as allelically imbalanced in imprinted and AI genes. (a) Overlap with regions experimentally verified to be imprinted. (b) Overlap with experimentally validated imbalanced genes from Verlaan et al. [8]. doi:10.1371/journal.pcbi.1000849.g006

PLoS Computational Biology | www.ploscompbiol.org

July 2010 | Volume 6 | Issue 7 | e1000849

Whole-Genome Differential Allelic Expression

set) have been collected from the literature and posted on www.geneimprint.com. Most imprinted regions are easily detected by most methods, as they affect relatively large genomic regions and their allelic expression ratios are extremely large. Figure 6 (a) shows how the enrichment of the overlap between imprinted genes and the predictions made by each of the four methods varies as a function of the number of sites predicted to have AI. (The enrichment of the overlap between a set of predicted AI regions and a set of annotated regions is the ratio of the size of the overlap to the expected size of the overlap if AI regions had been selected randomly in the genome.) Imprinted SNPs are enriched 5- to 20-fold among the top predictions made by each algorithm (except the Z-Score approach, which assigns high scores to other types of regions). Focusing on the left-to-right HMM AI predictions at a 5% FDR threshold (roughly 40,000 SNPs per individual), we find that 67% (resp. 35%) of SNPs in imprinted regions are predicted to have AI in at least one (resp. five) individual(s). Manual inspection of imprinted genes that were not detected by any of our methods reveals genes that are short, contain few heterozygous SNPs, or are expressed at very low levels in LCLs. Allelic imbalance resulting from cis-regulatory variation typically produces allele ratios less extreme than those of imprinted genes and is thus more difficult to detect. A set of 61 transcripts (containing 1596 SNPs in our data set) with AI resulting from cis-regulatory variation in LCLs has been identified and validated by Verlaan et al. [8]. Figure 6 (b) shows the fold-enrichment of these SNPs among those predicted as AI SNPs by each of our methods. Here, the predictions made by the two types of HMMs perform significantly better than the Z-Score and smoothing approaches, detecting approximately 50% and 100% more validated SNPs. Overall, our best approach is again the left-to-right HMM, which predicts 87% (resp. 70%) of the 1596 validated SNPs as imbalanced in at least one (resp. five) individual(s). Inspection of the validated AI genes that went undetected showed that they exhibited little evidence of allelic imbalance in our data (see Figure S3). These likely represent false positives in the earlier study, as well as more localized effects that were supported by few independent AI measurements yet drove the association tests in previous analyses [13].
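The fold-enrichment statistic defined in parentheses above can be made concrete with a short sketch. It assumes, purely for illustration, that predictions and annotations are sets of SNP identifiers drawn from a common universe of assayed SNPs, and that "selected randomly" means drawing the same number of SNPs uniformly from that universe, so the expected overlap is k·m/N for k predictions, m annotated SNPs and N assayed SNPs. The function names and toy data are hypothetical and are not the authors' code; `enrichment_curve` only mimics how Figure 6 plots enrichment against the number of top-ranked predictions.

```python
def fold_enrichment(predicted: set, annotated: set, universe: set) -> float:
    """Observed overlap divided by the overlap expected if the predicted SNPs
    had been drawn uniformly at random from the universe of assayed SNPs."""
    predicted = predicted & universe
    annotated = annotated & universe
    observed = len(predicted & annotated)
    # Drawing k SNPs at random out of N, of which m are annotated, gives an
    # expected overlap of k * m / N (the mean of the hypergeometric distribution).
    expected = len(predicted) * len(annotated) / len(universe)
    return observed / expected if expected > 0 else float("nan")


def enrichment_curve(scores: dict, annotated: set, ks: list) -> list:
    """Fold enrichment of the top-k scoring SNPs, for each k in ks."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    universe = set(scores)
    return [fold_enrichment(set(ranked[:k]), annotated, universe) for k in ks]


# Hypothetical toy data: 1,000 assayed SNPs, the first 20 lie in imprinted
# regions, and the scoring ranks rs0 highest, rs1 next, and so on.
scores = {f"rs{i}": 1000.0 - i for i in range(1000)}
imprinted = {f"rs{i}" for i in range(20)}
print(enrichment_curve(scores, imprinted, [10, 100, 1000]))  # -> [50.0, 10.0, 1.0]
```

Taking all 1,000 SNPs always yields an enrichment of 1, which is why such curves are read at small numbers of top predictions.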

Distribution of AI in the Genome and Across Individuals

Our predictions allow a first glimpse into the diversity of allelic expression patterns in the human genome, although a comprehensive analysis of AI regions is beyond the scope of this study. We first observe that AI in LCL samples is widespread: on average, 9.7% (resp. 5.6%) of an individual’s genes have at least one (resp. all) of their SNPs imbalanced, using the left-to-right HMM with a threshold corresponding to an FDR of 5%. Considered in total, 54.4% of genes show at least one imbalanced SNP in at least one individual, and 45.6% of genes have all of their SNPs showing allelic imbalance in at least one individual. Note that only approximately 50% of genes in total are detectably expressed in LCLs [39], and hence are candidates for being allelically imbalanced. Thus, the majority of expressed genes show AI in one or more individuals. Figure 7 reports the distribution of AI regions across categories of genomic regions. While a substantial fraction (19%) of AI regions closely match annotated gene boundaries, most exhibit more complex relationships to annotated protein-coding transcripts. A larger portion of AI regions (28%) lie within annotated genes but cover only a fraction of the transcript. In nearly half of those, imbalanced expression is found toward the 3′ end of the gene, possibly because of allele-specific transcription termination or mRNA degradation, or because of an allele-specific alternative transcription start site within the annotated gene. AI regions at the 5′ end of the transcript appear somewhat less frequent. Finally, 22% of AI regions have little or no overlap with protein-coding genes, although this fraction is enriched for other types of transcripts such as large intergenic noncoding RNAs (lincRNAs) [40].

Our data set affords a first glimpse into the commonality of allelic imbalance at a given site across individuals. We calculated the number of individuals showing AI at each site (based on the Viterbi predictions; see Figure 8). The very long tail of this distribution indicates that much of the AI is shared across a substantial portion of the population. In fact, approximately 65% of an individual’s AI regions are found in at least 10 other individuals. Allelic imbalance, whether of genetic or epigenetic origin, is thus highly structured in the human population. On the other hand, rare AI, defined as AI seen in at most 10% of our individuals, constitutes approximately 20% of an individual’s AI regions, while 4% are unique to that individual. However, because AI regions found in a large number of samples are easier to detect than those that are less common in the population, we may underestimate the proportion of AI found in a small number of individuals. In addition, the left-to-right HMM predictions used for this analysis are potentially biased towards over-predicting sites with common AI and under-predicting those with rare AI. We thus repeated the analysis with the ergodic HMM approach, which does not suffer from this bias. The results were very similar, with only a very slight shift toward less frequent AI.

Figure 7. Classification of AI regions based on their overlap with annotated protein-coding genes. The classification of an AI region is done based on a set of simple rules that allow for a sizable margin of error in the boundaries of the AI regions. Intergenic: little or no overlap with annotated genes. Multiple transcripts: overlaps several genes. Exact transcript: the left and right boundaries of the AI region match gene boundaries within 20 kb. 5′ (resp. 3′) end of transcript: AI region is at the 5′ end (resp. 3′ end) of the gene only. Intronic: AI region is within the gene but away from the gene boundaries. Extended 5′ (resp. 3′): AI region extends upstream (resp. downstream) of the gene. doi:10.1371/journal.pcbi.1000849.g007

Figure 8. Commonality of allelic imbalance. Number of SNPs in AI regions, as a function of the number of individuals with AI at the same site. doi:10.1371/journal.pcbi.1000849.g008

Discussion

The recent development of a genome-wide high-density assay of allelic imbalance based on genotyping arrays has resulted in a vast improvement in our understanding of this type of variation and in our ability to map this variation to causative regulatory SNPs [13]. A relatively simple gene-based analysis was sufficient to identify a significant number of genes with allelic imbalance [13]. However, taking full advantage of this technology requires advanced signal processing approaches to accurately detect, delineate and quantify allelic expression. Furthermore, relying too heavily on known gene annotation may obscure the fact that most AI does not perfectly align with gene boundaries. Indeed, the approaches proposed here, which do not make use of gene annotations, reveal that allelic imbalance is widespread and exhibits complex patterns in relation to annotated genes. Although our approach was applied specifically to data obtained from high-density genotyping arrays, it should be readily applicable to studies based on next-generation RNA sequencing data.

Detection of AI from genotyping-array data is challenging because of the significant noise in the allelic ratio measured at individual SNPs and because of the complex patterns of AI. To our knowledge, our study represents the first in-depth statistical and computational analysis of a large-scale, genome-wide allelic imbalance data set. Because of the noise level in allelic expression ratios at individual SNPs, one must rely on the fact that transcripts with allelic imbalance will generally contain several SNPs that are expected to show imbalance. Our Z-Score approach identifies regions where the allele ratio is significantly different from the expected one-to-one ratio. An aspect of the data that the Z-Score approach does not exploit is that the total expression and allelic ratio are expected to be consistent across the transcript. Our two HMM approaches model this explicitly, which partly explains their better results. A further improvement in the accuracy of AI detection is obtained by our left-to-right HMM, which jointly considers the data from all individuals as a prior for the detection of AI in each one. This approach yields improved detection of AI regions that are shared among many individuals, while still detecting those present in only one or a few samples. This new type of machine learning problem, in which a collection of observation sequences is expected to derive from a common (but unknown) model while each individual sequence may deviate significantly from that model, may arise in a number of other settings where our left-to-right HMM approach could be useful, including comparative-genomics-based gene prediction [41] (where different species are expected to share some but not all of their exon structure).

Although a detailed biological analysis of allelic imbalance and its phenotypic consequences is beyond the scope of this paper, our predictions reveal that AI is widespread, with roughly 10% of genes showing evidence of AI in a given individual, and with the majority of genes expressed in LCLs showing AI in at least one of our 53 samples. Although roughly 60% of AI regions are clearly related to an annotated transcript, they often reflect the presence of alternative promoters, splicing, or transcription termination. An increasing proportion of the genetic burden of disease is being associated with differences in gene regulation [42]. At the same time, greater complexity of gene regulation and of the transcriptome is being uncovered [43]. Therefore, hypothesis-free methods for detecting allelic imbalance are a prerequisite to advancing our understanding of population variation in cis-regulatory control by heritable or epigenetic mechanisms.

Supporting Information

Figure S1 Analysis of the noise using technical replicates. (a) Replicability of expression value E. (b) Replicability of allelic ratio R. Found at: doi:10.1371/journal.pcbi.1000849.s001 (0.14 MB TIF)

Figure S2 Performance of ergodic HMM with different levels of discretization. False-discovery rate obtained by ergodic HMMs with 4, 6, 8, and 10 states (corresponding to 1, 2, 3 and 4 levels of positive and negative allelic imbalance). Found at: doi:10.1371/journal.pcbi.1000849.s002 (0.15 MB TIF)

Figure S3 Analysis of AI data in false-negative regions. Red: genome-wide distribution of AI measurements (total expression vs. allelic ratio). Green: AI measurements in genes identified as imbalanced by Verlaan et al. [8] but not predicted as such by our approach. These genes show no sign of imbalance in our data. Found at: doi:10.1371/journal.pcbi.1000849.s003 (0.62 MB TIF)

Acknowledgments We thank Javad Sadri for useful discussions, as well as three anonymous reviewers for their suggestions.

Author Contributions Conceived and designed the experiments: JRW TP MB. Performed the experiments: JRW. Analyzed the data: JRW. Contributed reagents/materials/analysis tools: BG DP KLG TP. Wrote the paper: JRW MB.

References

1. Pastinen T, Hudson T (2004) Cis-acting regulatory variation in the human genome. Science 306: 647–650.
2. Carrel L, Willard H (2005) X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature 434: 400–404.
3. Rockman MV, Kruglyak L (2006) Genetics of global gene expression. Nature Reviews Genetics 7: 862–872.
4. Pastinen T, Sladek R, Gurd S, Sammak A, Ge B, et al. (2004) A survey of genetic and epigenetic variation affecting human gene expression. Physiol Genomics 16: 184–193.
5. Pastinen T, Ge B, Gurd S, Gaudin T, Dore C, et al. (2005) Mapping common regulatory variants to human haplotypes. Hum Mol Genet 14: 3963–3971.
6. Serre D, Gurd S, Ge B, Sladek R, Sinnett D, et al. (2008) Global differential allelic expression in the human genome: A robust approach to identify genetic and epigenetic cis-acting mechanisms regulating gene expression. PLoS Genetics 4: e1000006.
7. Campino S, Forton J, Raj S, Mohr B, Auburn S, et al. (2008) Global validating discovered cis-acting regulatory genetic variants: Application of an allele specific expression approach to HapMap populations. PLoS One 3: e4105.
8. Verlaan DJ, Ge B, Grundberg E, Hoberman R, Lam KC, et al. (2009) Targeted screening of cis-regulatory variation in human haplotypes. Genome Research 19: 118–127.
9. Pollard KS, Serre D, Wang X, Tao H, Grundberg E, et al. (2008) A genome-wide approach to identifying novel-imprinted genes. Human Genetics 122: 625–634.
10. Gimelbrant A, Hutchinson J (2007) Widespread monoallelic expression on human autosomes. Science 318: 1136–1140.
11. Pant KPV, Tao H, Beilharz EJ, Ballinger DG, Cox DR, et al. (2006) Analysis of allelic differential expression in human white blood cells. Genome Research 16: 331–339.
12. Lo SH, Wang Z, Hu Y, Yang HH, Gere S, et al. (2003) Allelic variation in gene expression is common in the human genome. Genome Research 13: 1855–1862.
13. Ge B, Pokholok DK, Kwan T, Grundberg E, Morcos L, et al. (2009) Global patterns of cis variation in human cells revealed by high-density allelic expression analysis. Nature Genetics 41: 1216–1222.
14. International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–861.
15. Rueda O, Diaz-Uriarte R (2007) Flexible and accurate detection of genomic copy-number changes from aCGH. PLoS Comput Biol 3: e122.
16. Marioni J, Thorne N, Valsesia A, Fitzgerald T, Redon R, et al. (2007) Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization. Genome Biol 8: R228.
17. Shah SP (2008) Computational methods for identification of recurrent copy number alteration patterns by array CGH. Cytogenetic and Genome Research 123: 343–351.
18. Shah SP, Xuan X, Deleeuw RJ, Khojasteh M, Lam WL, et al. (2006) Integrating copy number polymorphisms into array CGH analysis using a robust HMM. Bioinformatics 22.
19. Li C, Beroukhim R, Weir B, Winckler W, Garraway L, et al. (2008) Major copy proportion analysis of tumor samples using SNP arrays. BMC Bioinformatics 9: 204.
20. Wu L, Zhou X, Li F, Yang X, Chang C, et al. (2009) Conditional random pattern algorithm for LOH inference and segmentation. Bioinformatics 25(1): 61–67.
21. Yau C, Holmes C (2008) CNV discovery using SNP genotyping arrays. Cytogenet Genome Res 123(1–4): 307–312.
22. Baross A, Delaney A, Li H, Nayar T, Flibotte S, et al. (2007) Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data. BMC Bioinformatics 8: 368.


23. Nannya Y, Sanada M, Nakazaki K, Hosoya N, Wang L, et al. (2005) A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res 65(14): 6071–6079.
24. Bengtsson H, Irizarry R, Carvalho B, Speed T (2008) Estimation and assessment of raw copy numbers at the single locus level. Bioinformatics 24(6): 759–767.
25. Bengtsson H, Wirapati P, Speed T (2009) A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 and 6. Bioinformatics 25(17): 2149–2156.
26. Wang K, Li M, Hadley D, Liu R, Glessner J, et al. (2007) PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Research 17: 1665–1674.
27. Colella S, Yau C, Taylor J, Mirza G, Butler H, et al. (2007) QuantiSNP: an objective Bayes hidden-Markov model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res 35(6): 2013–2025.
28. Venkatraman E, Olshen A (2007) A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23(6): 657–663.
29. Fearnhead P (2006) Exact and efficient Bayesian inference for multiple changepoint problems. Statistics and Computing 16: 203–213.
30. Browning S (2008) Missing data imputation and haplotype phase inference for genome-wide association studies. Human Genetics 124: 439–450.
31. Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics 41: 164–171.
32. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39: 1–38.
33. Mitchell T (1997) Machine Learning. McGraw Hill.
34. Viterbi A (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13: 260–269.
35. Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77: 257–286.
36. Eddy SR (1998) Profile hidden Markov models (review). Bioinformatics 14: 755–763.
37. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, et al. (2002) The human genome browser at UCSC. Genome Res 12: 996–1006.
38. Kent W, Baertsch R, Hinrichs A, Miller W, Haussler D (2003) Evolution’s cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20): 11484–11489.
39. Cheung VG, Conlin LK, Weber TM, Arcaro M, Jen KY, et al. (2003) Natural variation in human gene expression assessed in lymphoblastoid cells. Nat Genet 33: 422–425.
40. Khalil AM, Guttman M, Huarte M, Garber M, Raj A, et al. (2009) Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proceedings of the National Academy of Sciences of the United States of America 106: 11667–11672.
41. Siepel A, Diekhans M, Brejova B, Langton L, Stevens M, et al. (2007) Targeted discovery of novel human exons by comparative genomics. Genome Research 17(12): 1763–1773.
42. Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M (2009) Mapping complex disease traits with global gene expression. Nature Reviews Genetics 10: 184–194.
43. ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447: 799–816.
44. Pruitt KD, Tatusova T, Maglott DR (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35.


Altering a Histone H3K4 Methylation Pathway in Glomerular Podocytes Promotes a Chronic Disease Phenotype Gaelle M. Lefevre1¤a, Sanjeevkumar R. Patel2, Doyeob Kim1¤b, Lino Tessarollo3, Gregory R. Dressler1* 1 Department of Pathology, University of Michigan, Ann Arbor, Michigan, United States of America, 2 Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America, 3 Neural Development Section, National Cancer Institute, Frederick, Maryland, United States of America

Abstract Methylation of specific lysine residues in core histone proteins is essential for embryonic development and can impart active and inactive epigenetic marks on chromatin domains. The ubiquitous nuclear protein PTIP is encoded by the Paxip1 gene and is an essential component of a histone H3 lysine 4 (H3K4) methyltransferase complex conserved in metazoans. To determine whether PTIP and its associated complexes are necessary for maintaining stable gene expression patterns in a terminally differentiated, non-dividing cell, we conditionally deleted PTIP in glomerular podocytes in mice. Renal development and function were not impaired in young mice. However, older animals progressively exhibited proteinuria and podocyte ultrastructural defects similar to chronic glomerular disease. Loss of PTIP resulted in subtle changes in gene expression patterns prior to the onset of a renal disease phenotype. Chromatin immunoprecipitation showed a loss of PTIP binding and lower H3K4 methylation at the Ntrk3 (neurotrophic tyrosine kinase receptor, type 3) locus, whose expression was significantly reduced and whose function may be essential for podocyte foot process patterning. These data demonstrate that alterations or mutations in an epigenetic regulatory pathway can alter the phenotypes of differentiated cells and lead to a chronic disease state. Citation: Lefevre GM, Patel SR, Kim D, Tessarollo L, Dressler GR (2010) Altering a Histone H3K4 Methylation Pathway in Glomerular Podocytes Promotes a Chronic Disease Phenotype. PLoS Genet 6(10): e1001142.
doi:10.1371/journal.pgen.1001142 Editor: Veronica van Heyningen, Medical Research Council Human Genetics Unit, United Kingdom Received March 30, 2010; Accepted September 28, 2010; Published October 28, 2010 This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. Funding: This work was supported by NIH grant DK073722 and DK054740 to GRD. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: dressler@umich.edu ¤a Current address: Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America ¤b Current address: CDI Biosciences, Madison, Wisconsin, United States of America

Author Summary While all cells contain essentially the same genome, adult differentiated cells have specific patterns of gene expression for unique physiological functions. Gene expression depends on specific proteins that activate some genes and repress others so that a stable pattern of expression is maintained. During embryonic development, epigenetic modifications may partition the genome into actively expressed or repressed domains through the methylation of specific histone residues on chromatin. We studied a specific pathway of histone H3 lysine 4 methylation by deleting the co-factor PTIP in a differentiated cell type. We then asked whether this epigenetic pathway is still important for maintaining the correct pattern of gene expression. Using the podocytes of the glomerulus as a model system, we found that mice lacking PTIP only in podocytes show changes in gene expression patterns over time and exhibit a slowly progressing chronic disease phenotype. Chromatin immunoprecipitation showed a loss of PTIP binding and lower H3K4 methylation at the Ntrk3 locus, whose expression was significantly reduced. These data demonstrate the need for maintaining the correct epigenetic pattern in an aging, differentiated cell type and point to epigenetic modifications as potential disease-causing factors.

Introduction

The process of embryonic development determines the differentiated state of all cells by establishing unique gene expression patterns, or signatures, for individual cell types that define their phenotypes. Once a differentiated state is established, it is difficult to erase that epigenetic imprint and reprogram the cell towards a different lineage or phenotype. Although reprogramming can be forced by nuclear transplantation [1] or by the expression of Oct4 and accessory factors [2,3], the low efficiency of these processes speaks to the inherent stability of a differentiated cell. Gene expression patterns must be established and maintained by compartmentalizing the genome into active and inactive regions, which is thought to occur through covalent modifications of DNA and its associated nucleosomes. Such modifications include DNA methylation of CpG islands and methylation, acetylation, and ubiquitination of histone tails, all of which are thought to determine chromatin structure and accessibility [4,5]. This epigenetic code is thus imprinted upon the primary genetic code during embryonic development to help establish cell lineages and restrict fate.

The genetics and biochemistry of histone modifications have been well studied in a variety of model organisms and developmental contexts. Genes of the Polycomb and Trithorax families encode proteins that are required for methylation of different histone lysine residues and that often correlate with gene silencing or activation, respectively [6–9]. Many Trithorax group proteins, such as Drosophila TRX and human KMT2A (MLL), are histone H3 lysine 4 (H3K4) methyltransferases (KMTs) and are essential for maintaining gene expression patterns in diverse organisms. Recently, we discovered a novel co-factor, PTIP (Pax Transactivation-domain Interacting Protein), which is encoded by the Paxip1 gene. The PTIP protein co-purifies with the mammalian lysine methyltransferases KMT2B and KMT2C (formerly ALR and MLL3), is broadly expressed, and is essential for embryonic development [10–12]. In at least one case, PTIP is able to recruit the KMT2B complex to a developmental DNA binding protein in a locus-specific manner [13]. Loss of PTIP function in the mouse results in gross developmental effects at gastrulation, with reduced levels of global H3K4 di- (me2) and trimethylation (me3) [13,14]. In cultured mouse embryonic stem cells, PTIP is needed to maintain pluripotency, Oct4 expression, and normal levels of H3K4 trimethylation [15]. Similarly, in neuronal stem cells, differentiation is abrogated and levels of H3K4 methylation are reduced in tissue-specific PTIP knockouts [13]. In mouse embryo fibroblasts, loss of PTIP blocks differentiation by inhibiting PPARγ and C/EBPα activation and H3K4 methylation at their respective promoters [16]. Similarly, the Drosophila homologue of PTIP is also essential for development, epigenetic control of gene expression, and global histone H3K4 methylation [17].

During cell division, patterns of histone methylation must be inherited by daughter cells such that the cellular phenotype is maintained. For repressive histone methylation marks, such as histone H3 lysine 27, the EED (Embryonic Ectodermal Development) protein is thought to bind and recruit Polycomb Repressive Complex 2 to replicate and maintain gene silencing after mitotic cell division [18,19]. For highly expressed genes, the KMT2A (MLL1) protein associates with promoter regions on condensed mitotic chromatin and is required to rapidly reactivate such genes after cell division [20]. These data suggest a model whereby histone methylation patterns are replicated during mitosis, but do not address the necessity of maintaining epigenetic modifications in terminally differentiated, non-dividing cells. Furthermore, changes in the expression of epigenetic regulatory genes have been reported in a variety of cancers [21] and disease states [22], but whether these are the cause or the result of disease remains to be determined.

To address the necessity of H3K4me3 in a stable, non-dividing cell type, we utilized a Podocin-Cre transgenic driver to delete PTIP in the glomerular podocyte, a highly specialized and architecturally distinct cell that establishes the kidney filtration barrier. Podocytes are clinically relevant cells whose properties and expression profiles change in glomerular diseases and in older animals [23]. While the ubiquitous expression of PTIP, its role in H3K4 methylation, and its necessity in development and differentiation are all well established, whether PTIP deletion in terminally differentiated cells can induce changes in the pattern of H3K4me3 and gene expression has not been demonstrated. We show that loss of PTIP results in changes in the transcriptional profile of terminally differentiated podocytes, which ultimately lead to a chronic glomerular disease phenotype. Among the most affected genes is the neurotrophin receptor gene Ntrk3, whose function had not been previously studied in podocytes. Our results demonstrate a maintenance function for PTIP-mediated H3K4 methylation and identify a novel role for Ntrk3 in podocyte foot process patterning.

PLoS Genetics | www.plosgenetics.org

October 2010 | Volume 6 | Issue 10 | e1001142

H3K4 Methylation and Chronic Glomerular Disease

Results

Generation of a Podocyte-Specific Paxip1 Deletion

To specifically knock out PTIP in fully differentiated mouse podocytes, we utilized both floxed (fl) and conventional null (−) alleles of Paxip1 and a Cre driver strain specific for glomerular podocytes. Paxip1fl/−:CreNPHS2 mice were crossed to Paxip1fl/fl animals to generate Paxip1fl/fl or Paxip1fl/− offspring with or without CreNPHS2. The CreNPHS2 mice utilize the NPHS2 promoter to express Cre recombinase only in late-developing and mature podocytes [24,25]. The resulting progeny were born in the expected Mendelian ratios and did not show any gross kidney defects during the first 4 weeks of life (data not shown). For simplicity, we will refer to the mice as either PTIP− (Paxip1fl/−:CreNPHS2; Paxip1fl/fl:CreNPHS2) or PTIP+ (Paxip1−/fl or Paxip1fl/fl). PCR analysis indicated that recombination occurred at the Paxip1 locus in DNA isolated from kidneys but not in DNA from tails (Figure 1A). Previous work established that the Paxip1fl allele produces normal levels of protein, but Cre-mediated excision of exon 1 and the promoter region results in complete absence of PTIP protein, essentially creating a null allele [13,15]. The specificity of the Cre driver strain was confirmed by crossing CreNPHS2 mice to Rosa26-LacZ reporter mice (Figure 1B). In 1-month-old kidneys, lacZ expression was restricted to the glomerulus, indicating efficient Cre-mediated excision at this time. Immunostaining for PTIP and the podocyte marker WT1 also confirmed that PTIP protein levels were reduced only in podocytes and not in the mesangial or endothelial components of the glomerular tuft (Figure 1C).

Previous work showed that loss of PTIP function results in reduced levels of total H3K4me3 in embryos and cultured cells [13–17]. To test whether podocytes showed reduced H3K4me3, we stained kidney sections with antibodies specific for this modification (Figure 1D). Many podocytes were observed with reduced signal intensities. To quantitate this effect, images were analyzed for signal intensity by integrating a fixed area over the nuclei of both podocytes and other cell types (Figure 1E). Podocytes were co-stained with WT1 antibodies. The ratio of the podocyte signal (WT1+) to that of other cell types (WT1−) was calculated by counting at least 6 cells of each type per glomerulus. The ratios from at least 8 glomeruli were averaged for each genotype and decreased by more than 20% in PTIP− kidneys compared to PTIP+ controls (p<0.01). These data confirm that the specific deletion of PTIP in podocytes correlates with a reduction in H3K4me3 in this cell type.
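The H3K4me3 image quantification in the Results (integrated nuclear signal of WT1+ podocytes relative to WT1− glomerular cells, one ratio per glomerulus, ratios then compared between genotypes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: per-nucleus signals are taken as already-integrated intensities, the per-glomerulus ratio is assumed to be a ratio of means, and genotypes are compared with a pooled-variance two-sample Student's t-test, as named in the Figure 1E legend. The function names and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def glomerulus_ratio(podocyte_signals, other_signals):
    """Mean integrated H3K4me3 signal of podocyte (WT1+) nuclei divided by
    that of the other glomerular (WT1-) nuclei in the same glomerulus."""
    return mean(podocyte_signals) / mean(other_signals)


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# One hypothetical glomerulus: 6 podocyte nuclei and 6 other nuclei.
example_ratio = glomerulus_ratio([80, 75, 78, 82, 77, 79], [100, 98, 102, 101, 99, 100])

# Hypothetical per-glomerulus ratios, 8 glomeruli per genotype.
control = [1.00, 0.95, 1.05, 0.98, 1.02, 0.97, 1.03, 1.00]
mutant = [0.78, 0.75, 0.80, 0.76, 0.79, 0.74, 0.81, 0.77]

t, df = student_t(control, mutant)
decrease = 1 - mean(mutant) / mean(control)  # fractional drop in the mean ratio
print(f"ratio = {example_ratio:.3f}, t = {t:.2f}, df = {df}, decrease = {decrease:.1%}")
```

The two-sided p-value then follows from the t distribution with `df` degrees of freedom; SciPy's `scipy.stats.ttest_ind(a, b)` performs the same pooled-variance test by default and returns the statistic together with that p-value.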

Development of a Chronic Glomerular Disease Phenotype Podocytes play a critical role in the establishment and maintenance of the glomerular filtration barrier. Interdigitated podocyte foot processes cover the glomerular basement membrane and form specialized junctions, called slit diaphragms, which create a highly selective barrier that filters small and negatively charged proteins and solutes from the blood to the urinary space. Damage to or loss of podocytes impairs the filtration barrier and results in increased rates of excretion of high molecular weight proteins, such as albumin, into the urine. Thus, we checked mice for proteinuria beginning at 1 month of age (Figure 2A). At 1 month, low levels of albumin were detected in the urine but these were not significantly different between PTIP+ and 2

October 2010 | Volume 6 | Issue 10 | e1001142

H3K4 Methylation and Chronic Glomerular Disease

Figure 1. Generation of a Podocyte-Specific Paxip1 Deletion. A) PCR genotyping with primer pairs specific for the excised, null allele indicates Paxip1 excision only in the kidney DNA and only in mice carrying the CreNPHS2 transgene. B) Enzymatic staining for b-galactosidase activity (blue) in kidney sections from 1 month old mice with the indicated genotypes. C) Immunostaining for WT1 (green) and PTIP (red) in glomeruli at 3 months of age show reduced PTIP signals in the WT1 positive cells (arrows) of PTIP2 kidneys compared to PTIP+ control littermates. The overlays were counterstained with DAPI to mark all nuclei. Thus double positives (WT1 and PTIP) are light purple whereas single positives (WT1 only) are green. D) Immunostaining for H3K4me3 and WT1 in kidneys of 3 months old PTIP+ and PTIP2 mice. Note reduced intensity of podocyte cells (arrows) in PTIP2


mice, when compared to other cells on the sections. E) Image analysis of immunostaining for H3K4me3 from 3-month-old PTIP+ and PTIP− mice. The total signal strength was calculated by integrating over a fixed area, and the data are expressed as the ratio of podocyte signal to mesangial and endothelial cell signal. Mean ratios from 6 podocytes and 6 other cells were calculated from 8 independent samples for each genotype. Error bars are one standard deviation from the mean. The p value was calculated by Student’s t-test for two independent samples. doi:10.1371/journal.pgen.1001142.g001

patterning (Figure 4B, 4C). Transmission electron micrographs at 3 months also revealed that the slit diaphragms were not evenly spaced and that fusion of foot processes was frequent (Figure 4D–4F). By 12 months, the remaining podocytes in the PTIP− kidneys were broader and flatter and displayed significant fusion or effacement (Figure 4G, 4H), consistent with the high levels of albumin detected in the urine. These data demonstrate that the initial glomerular phenotype in PTIP− kidneys is due primarily to differences in podocyte foot process morphology, which arise before the loss of cell bodies.

PTIP− animals. However, by 3 months of age the PTIP− mice showed significantly higher levels of albumin in the urine, and these levels increased further at 6 and 12 months. The urine albumin-to-creatinine ratio (ACR) provides a quantitative assay that correlates with filtration barrier integrity. No significant differences were observed at 1 month (Figure 2B). However, by 3 and 12 months, ACRs were 10- and 30-fold higher, respectively, in urines of PTIP− animals compared to PTIP+ mice. Mice that carried the CreNPHS2 transgene in a Paxip1+/+ or a Paxip1fl/+ genetic background did not show any renal abnormalities at 12 months (data not shown), consistent with many published reports that have used this particular Cre driver strain [25–28]. Renal pathology was characterized by light microscopy at 1, 3, and 12 months of age. Standard Masson’s Trichrome and Periodic Acid-Schiff staining revealed significant sclerosis and matrix deposition in 12-month-old glomeruli from PTIP− animals (Figure 2C). However, 3-month-old kidneys did not show significant differences in most glomerular sections at the light microscopy level, although limited matrix expansion could be observed in a small number of glomeruli of PTIP− kidneys. In 12-month-old kidneys, significant interstitial fibrosis and protein-filled cysts were also observed (Figure 2D). These are likely to be secondary effects of the glomerular pathology. Glomerular pathology and increased albuminuria can be the direct result of podocyte death [29]. We therefore used a variety of markers to characterize the glomerular architecture and the numbers of podocytes at various ages, to ensure that the phenotype of the PTIP− mice was not simply the result of early podocyte cell death. Immunostaining with WT1, Nephrin, and Podocin antibodies enabled us to determine podocyte numbers, as the average per mid-cross section, and to indirectly assess the integrity of the slit diaphragm (Figure 3). 
The number of WT1-positive podocytes was not significantly different between PTIP+ and PTIP− glomeruli at 1 or 3 months of age. At 6 months, PTIP− glomeruli had slightly fewer podocytes, and by 12 months the number of podocytes was half that of the PTIP+ littermates. Immunostaining for podocyte markers such as WT1, Nephrin, and Podocin did not reveal dramatic differences at 1 or 3 months, despite the increase in proteinuria, although some discontinuous staining could be seen with Podocin antibodies in PTIP− glomeruli (Figure 3B). Consistent with these data, TUNEL staining for apoptosis did not reveal differences between PTIP+ and PTIP− kidneys at 1 or 3 months of age (data not shown). Thus, the breakdown of the filtration barrier was not due to simple podocyte depletion at these early times. However, by 12 months of age, the extensive network of Nephrin staining was partially depleted in PTIP− glomeruli (Figure 3B). At the light microscopy level, the effects of PTIP loss on glomerular architecture seemed minimal at 3 months of age, yet the levels of albumin in the urine suggested significant functional defects. We therefore used scanning and transmission electron microscopy to characterize the podocytes at the ultrastructural level (Figure 4). Scanning electron micrographs revealed disorganized foot processes at 3 months. While PTIP+ podocytes had regularly arrayed tertiary foot processes that were almost parallel (Figure 4A), the PTIP− podocyte foot processes were much more irregular and flattened. The parallel pattern of interdigitation was clearly different and resembled a jigsaw puzzle with random

Alteration of the Gene Expression Program Precedes the Disease Phenotype

Alterations in cellular phenotypes could be the result of changes in the transcriptional program of PTIP− podocytes. We therefore prepared RNA from glomerulus-enriched fractions at 1 month of age, prior to the onset of any significant phenotype, and assayed for gene expression changes with Affymetrix microarrays. We compared glomerular RNA preparations from 10 independent PTIP− animals and 8 PTIP+ littermates. The data were highly consistent and indicated both gain and loss of gene expression in the PTIP− kidneys (Table 1 and Table 2). The entire dataset can be accessed at the Gene Expression Omnibus (GSE17709). Expression changes were confirmed by quantitative RT-PCR for selected genes (Figure 5). Among the up-regulated genes was Protamine 1 (Prm1), which is not normally expressed in podocytes or other somatic cells but is found only in spermatids, where it is essential for chromatin condensation and fertility [30,31]. The changes in RNA expression were surprising and did not correspond to any common pathways. In fact, the podocyte-specific genes known to function in cell viability and slit diaphragm integrity were largely unchanged (Table S1 and Figure 5C). The data suggest that loss of PTIP in podocytes alters the transcriptional program to affect a limited number of genes whose functions in podocytes have not been previously characterized.

PTIP Deletion Affects Ntrk3 Expression and Histone Methylation

Among the most interesting genes down-regulated in PTIP− kidneys was the neurotrophic tyrosine kinase receptor type 3 (Ntrk3, formerly called TrkC), whose expression in podocytes had not been previously described. The Ntrk3 gene encodes two proteins that recognize neurotrophin 3 (NT-3) and function in axon guidance, innervation, and cardiac development [32–34]. Ntrk3 promotes axon outgrowth and guidance, presumably through actin-based extension and retraction of cellular processes [35]. Given that podocyte foot processes are also actin based and may require some type of guidance, we examined the role of Ntrk3 further. Quantitative RT-PCR confirmed that Ntrk3 expression was down approximately 10-fold in glomerular preparations from PTIP− compared to PTIP+ animals (Figure 5A). We also examined Ntrk3 levels in kidneys by coimmunostaining kidney sections with Ntrk3, WT1, and Nephrin antibodies (Figure 6). At 3 months of age, Ntrk3 could be seen in glomeruli of PTIP+ kidneys; however, the staining intensity in PTIP− kidneys was severely reduced in almost every glomerulus examined (Figure 6D, 6J). Some slight filamentous staining


Figure 2. Chronic Glomerular Disease in PTIP− Kidneys. A) Coomassie blue staining of SDS-PAGE gels of urine samples from PTIP+ and PTIP− mice at 1 month and 3 months. Mouse albumin (al) is shown as a control. B) Urine albumin-to-creatinine ratios (ACR) measured at 1, 3, and 12 months of age in PTIP+ and PTIP− animals. C) Histological sections from kidneys at 3 and 12 months. Representative glomerular sections were stained with Masson’s Trichrome (3 and 12 months) or Periodic Acid-Schiff (12 months). Significant matrix deposition was observed in 12-month-old PTIP− glomeruli. D) Low-power view of a kidney section at 12 months of age shows tubulointerstitial fibrosis, protein-filled cysts, and glomerular sclerosis in PTIP− animals. doi:10.1371/journal.pgen.1001142.g002

mature glomeruli, those located closest to the medullary zone. Podocyte foot processes from Ntrk3−/− mice exhibited disorganized secondary and tertiary processes that crisscrossed randomly over capillary vessels and were poorly interdigitated (Figure 9A′, 9B′). Few sections showed the characteristic spacing indicative of slit diaphragms at the glomerular basement membrane (Figure 9D′). These data suggest a critical role for Ntrk3 in the fine patterning events of secondary and tertiary foot process formation and interdigitation.

remained in the PTIP− glomeruli, but the overall intensity was markedly different. In PTIP+ glomeruli, Ntrk3 staining was remarkably similar to that of Nephrin (Figure 6G–6I). However, Nephrin staining intensity was unaffected in PTIP− glomeruli even though Ntrk3 levels were much lower (Figure 6J–6L). The expression of Ntrk3 in glomerular preparations and its decrease in PTIP− kidneys suggested a function in foot process growth, guidance, and/or pattern formation. To link PTIP more directly to the Ntrk3 locus, we designed chromatin immunoprecipitation (ChIP) experiments to examine the presence of PTIP and changes in histone methylation patterns around the transcription initiation site (+1) of Ntrk3 (Figure 7). Chromatin was prepared from whole glomerular preparations from PTIP+ and PTIP− kidneys, which also included mesangial and endothelial cells. Despite the presence of other cell types in the glomerular chromatin, we detected a 5- to 6-fold decrease in PTIP localization to sequences around the Ntrk3 transcription start site when comparing PTIP+ to PTIP− chromatin (Figure 7B). No significant amount of PTIP was detected further upstream (−1200), nor did we see a significant difference between PTIP+ and PTIP− chromatin in PTIP localization within the 5′ UTR of exon 1 (Figure 7B, P4 site). Clear differences in H3K4me2 were also measured, with an approximately 50–60% decrease in PTIP− chromatin with primer pairs P2–P4, but not with P1 at −1200 (Figure 7C). Similarly, H3K4me3 levels were decreased in PTIP− chromatin at P2–P4 but not at P1 (Figure 7D). We also examined Polycomb-mediated epigenetic silencing marks using an antibody against H3K27me3 (Figure 7E), which appeared unchanged at all sites examined. These data demonstrate recruitment of PTIP to the promoter region of Ntrk3 in normal glomeruli.
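The relative changes quoted above, such as a 5- to 6-fold decrease in PTIP occupancy and a 50–60% decrease in H3K4me2, are ratios of genotype means. Figure 7 summarizes each set of triplicates as a mean ± one standard deviation. The following minimal sketch shows that arithmetic. It assumes each ChIP value has already been normalized (for example, to input chromatin; the normalization is not stated here). The triplicate numbers are hypothetical, not the measured data.

```python
from statistics import mean, stdev


def summarize(replicates):
    """Return (mean, sample standard deviation) of replicate ChIP signals."""
    return mean(replicates), stdev(replicates)


def fold_decrease(control, mutant):
    """How many times lower the mutant mean is than the control mean."""
    return mean(control) / mean(mutant)


def percent_decrease(control, mutant):
    """Percent reduction of the mutant mean relative to the control mean."""
    return 100.0 * (1.0 - mean(mutant) / mean(control))


# Hypothetical normalized triplicates at one primer site (arbitrary units).
ptip_plus = [1.20, 1.05, 1.15]
ptip_minus = [0.20, 0.22, 0.18]

m, sd = summarize(ptip_plus)
print(f"PTIP+ mean={m:.3f} sd={sd:.3f}")
print(f"fold decrease: {fold_decrease(ptip_plus, ptip_minus):.2f}")
print(f"percent decrease: {percent_decrease(ptip_plus, ptip_minus):.1f}%")
```

The same ratio can be expressed either way: a 5-fold decrease corresponds to an 80% decrease, and a 50–60% decrease to roughly a 2- to 2.5-fold decrease.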

Discussion

In this report, we used a conditional deletion to ask whether the PTIP-dependent H3K4 methylation function is required in a terminally differentiated cell type to maintain its differentiated state and its cell-type-specific transcriptional program. Using the glomerular podocyte as a model, we show that deletion of PTIP results in subtle changes in gene expression patterns that ultimately lead to a slowly progressing disease state. These data support a model in which the gross stability of the differentiated state or podocyte survival, at least in the short term, does not depend on the PTIP/KMT complex, as many of the podocyte-specific genes examined were unchanged in the absence of PTIP. Rather, the effect of PTIP loss was more subtle: it revealed unexpected changes in a small number of genes and ultimately led to a chronic disease phenotype resembling glomerular sclerosis. Typical characteristics of chronic glomerular disease were present, including microalbuminuria, podocyte foot process fusion or effacement, remodeling of the filtration barrier, and increased extracellular matrix deposition. Methylation of histone H3 at lysine 4 correlates with gene expression and is thought to regulate cellular identity by establishing and maintaining a stable epigenetic state. The PTIP protein is part of an H3K4 methyltransferase complex that includes the mammalian Trithorax homologues KMT2B and/or KMT2C [10,11,13,16]. Previous studies in flies and mice demonstrated reduced H3K4 methylation in Paxip1 mutants and severe early lethal phenotypes. In the mouse, complete loss of PTIP protein results in developmental arrest just after gastrulation [14], a phenotype more severe than that of any individual mouse KMT2 family gene mutation [12,36,37], whereas a hypomorphic Paxip1 allele is lethal later in development [38]. In flies, maternal and zygotic ptip null embryos die during embryogenesis and fail to express many segmentation genes [17]. 
In mouse embryonic stem cells, PTIP protein is required for normal levels of H3K4 methylation and for maintaining pluripotency in cell culture [15], whereas in embryonic fibroblasts PTIP is required for adipocyte differentiation [16]. All of these findings suggest that a PTIP H3K4 methyltransferase complex is needed for differentiation of stem cells and progenitor cells in development. However, in terminally differentiated cells, the requirement for active H3K4 methylation may be different, and the lack of cell division may abrogate the need for de novo methylation. Our results suggest that PTIP must still function in some non-dividing cells, perhaps as part of a maintenance complex, as overall levels of H3K4 methylation were reduced and the activation and suppression of a small number of genes were affected.

Ntrk3 Mutants Have Podocyte Foot Process Defects

To determine whether loss of Ntrk3 alone would impact normal glomerular patterning, we examined homozygous Ntrk3 mutant mice. Ntrk3 mutants die shortly after birth due to cardiac and neuromuscular defects; however, their kidneys had not been studied previously. We therefore collected urine and kidney tissue for light and electron microscopy from 3–4-day-old Ntrk3 mutants and littermates. At three days postpartum, Ntrk3 mutants were small and sickly. Higher levels of albumin could be observed in the urine of Ntrk3−/− pups (Figure 8A) compared to control littermates, although this could be due to delayed or arrested kidney development. Glomerular development was examined in kidney sections of 4-day-old newborns (Figure 8B). At this time, nephrons are still developing: glomeruli at the periphery are just beginning to form, whereas cortical glomeruli closer to the medulla are already fully functional. The tight junction protein Magi2 specifically localizes to podocyte cell junctions and exhibited altered patterning in Ntrk3 mutant kidneys, with discontinuous staining and excessive looping of the developing tuft. In mature glomeruli, Nephrin staining was reduced and patchy in the Ntrk3 mutants. The number of podocytes did not seem affected in Ntrk3−/− mice at this time. Ultrastructural analysis of Ntrk3 mutant kidneys revealed podocyte patterning defects by both scanning and transmission EM (Figure 9). At 4 days postpartum, we examined the most


Figure 3. Podocyte Viability and Glomerular Morphology. A) After immunostaining with WT1 and Nephrin antibodies, podocyte nuclei were counted in mid-cross sections through glomeruli whose vascular and proximal tubular poles were visible. Glomerular surface area for mid-cross sections was measured by morphometry and is expressed in relative units. B) Immunostaining for WT1 (pink) and Nephrin (green) at 3 months of age shows little difference between PTIP+ and PTIP− glomeruli. However, Podocin staining (green, lower panels) appears weaker and discontinuous in PTIP− glomeruli. Nuclei were counterstained with DAPI. By 12 months, large regions cleared of Nephrin-positive staining were evident within the glomerular tufts of PTIP− animals. doi:10.1371/journal.pgen.1001142.g003


Figure 4. Ultrastructural Analysis of PTIP− Kidneys. Podocytes of PTIP− mice showed progressive foot process disorganization and effacement, as observed by scanning (A–C, G, H) and transmission (D–F, I, J) electron microscopy. Podocyte foot processes of 3-month-old PTIP+ mice were regularly interdigitated (A, D, G), whereas those of age-matched PTIP− podocytes (B, C, E, F, H) displayed varying degrees of disorganization (B, E) and effacement (C, F). Note that slit diaphragms could still be observed between foot processes during the early stages of disorganization (E, arrows). G–J) In addition to the foot process alterations, capillary loop deformation/enlargement (H, J) and mesangium expansion (J, asterisks) were observed in glomeruli of 12-month-old (G, H) and 3-month-old (I, J) mice analyzed by EM. Scale bars: (A–C) 1 µm; (D–F) 100 nm; (G–J) 2 µm. doi:10.1371/journal.pgen.1001142.g004

The mature podocyte is generally believed to be a non-dividing cell type, as classic BrdU labeling experiments do not mark this population over time [39]. However, more recent genetic lineage tracing experiments suggest that there is a population of parietal epithelial cells at the vascular pole of Bowman’s capsule that can replenish podocytes over time [40,41]. This replacement of podocytes appears slow under normal conditions but may be especially critical in cases of glomerular injury. In our animal model, we would expect any replacement podocytes to also delete the Paxip1 gene once expression of the Cre driver is activated. Given that we do not see significant loss of podocytes until at least 6 months of age, it may be that the alterations in the transcriptional profile are not lethal. Rather, loss of podocytes may result from the damaged filtration barrier, the expansion of the mesangium, and the general environment of the glomerulus in older mice. Alternatively, if podocyte replacement is accelerated in our model, the ability of parietal cells to replenish the podocyte population may be exhausted by 6 months. In either case, the effects of manipulating the H3K4 methylation pathway are more apparent in older mice, suggesting a critical role for such epigenetic pathways in aging cells and tissues. The changes in gene expression observed in response to PTIP deletion are surprising in that most of the well-characterized podocyte-specific genes appear unaffected. However, the changes

include both activation and suppression of previously uncharacterized genes in the podocytes. Activation of the Prm1 gene in PTIP− kidneys is unusual, as this gene has only been associated with sperm maturation and is thought to encode a unique chromatin-binding protein [31,42]. Activation of the Padi4 gene could impact gene expression through deimination of arginines in the histone H3 tail, which prevents their methylation [43]. The impact of increased Padi4 is likely to be complex, as arginine methylation can correlate with gene activation or repression, depending on the context and the specific residues. The most compelling gene affected in PTIP− podocytes was Ntrk3, whose expression in the glomerulus had not been previously characterized. The reduction of Ntrk3 expression in PTIP− kidneys and the phenotype of Ntrk3−/− newborn kidneys suggest that this receptor is critical for tertiary foot process pattern formation. The podocyte is a highly specialized cell with a complex network of processes that cover the glomerular basement membrane. The large primary processes are microtubule-containing structures, whereas the tertiary, interdigitated foot processes contain actin microfilaments [44]. Adjacent foot processes are connected through a specialized junctional complex, called the slit diaphragm, which is essential for maintaining a functional filtration pore. Some of the essential proteins in the slit diaphragm, such as Nephrin, Podocin, and Neph1, are well characterized and


Table 1. Genes Up-Regulated in PTIP− Podocytes.

| Probe | Symbol | Description | UniGene | p-value | Fold Change* |
|---|---|---|---|---|---|
| 1439379 | Prm1 | protamine 1 | Mm.42733 | | 6.38 |
| 1418398 | Tspan32 | tetraspanin 32 | Mm.28172 | | 2.92 |
| 1422760 | Padi4 | peptidyl arginine deiminase, type IV | Mm.250358 | 0.001 | 1.9 |
| 1433744 | Lrtm2 | leucine-rich repeats and transmembrane domains 2 | Mm.121498 | | 1.89 |
| 1433529 | E430002G05Rik | RIKEN cDNA E430002G05 gene | Mm.28649 | | 1.77 |
| 1436329 | Egr3 | early growth response 3 | Mm.103737, | | 1.74 |
| 1449071 | Myl7 | myosin, light polypeptide 7, regulatory | Mm.46514 | 0.001 | 1.68 |
| 1419527 | Comp | cartilage oligomeric matrix protein | Mm.45071 | | 1.58 |
| 1419487 | Mybph | myosin binding protein H | Mm.379067 | 0.001 | 1.3 |
| 1431991 | 2410004P03Rik | RIKEN cDNA 2410004P03 gene | Mm.159048 | | 1.26 |
| 1430062 | Hhipl1 | hedgehog interacting protein-like 1 | Mm.36423 | 0.004 | 1.19 |
| 1453228 | Stx11 | syntaxin 11 | Mm.248648 | 0.003 | 1.16 |
| 1416077 | Adm | adrenomedullin | Mm.1408 | | 1.14 |
| 1457780 | Stx11 | syntaxin 11 | Mm.248648 | 0.001 | 1.1 |
| 1434984 | 6330514A18Rik | RIKEN cDNA 6330514A18 gene | Mm.17613 | 0.004 | 1.08 |
| 1453152 | Mamdc2 | MAM domain containing 2 | Mm.50841 | 0.012 | 1.05 |
| 1435830 | 5430435G22Rik | RIKEN cDNA 5430435G22 gene | Mm.44508 | 0.002 | 1.01 |
| 1439761 | D830026I12Rik | RIKEN cDNA D830026I12 gene | Mm.136046 | 0.008 | |

*log2 scale. doi:10.1371/journal.pgen.1001142.t001

Table 2. Genes Down-Regulated in PTIP− Podocytes.

| Probe | Symbol | Description | UniGene | p-value | Fold Change* |
|---|---|---|---|---|---|
| 1425425 | Wif1 | Wnt inhibitory factor 1 | Mm.32831 | | −4.37 |
| 1441491 | A330068G13Rik | RIKEN cDNA A330068G13 gene | Mm.227543 | | −3.68 |
| 1433825 | Ntrk3 | neurotrophic tyrosine kinase, receptor, type 3 | Mm.33496 | | −3.09 |
| 1446622 | A330068G13Rik | RIKEN cDNA A330068G13 gene | Mm.227543 | | −2.14 |
| 1452779 | 3110006E14Rik | RIKEN cDNA 3110006E14 gene | Mm.23960 | | −1.57 |
| 1452416 | Il6ra | interleukin 6 receptor, alpha | Mm.2856 | | −1.55 |
| 1420903 | St6galnac3 | | Mm.440929 | | −1.53 |
| 1450309 | Astn2 | astrotactin 2 | Mm.445312 | | −1.53 |
| 1433939 | Aff3 | AF4/FMR2 family, member 3 | Mm.336679 | | −1.53 |
| 1437403 | Samd5 | sterile alpha motif domain containing 5 | Mm.101115 | 0.001 | −1.48 |
| 1429896 | 5830408B19Rik | RIKEN cDNA 5830408B19 gene | Mm.291322 | | −1.35 |
| 1455296 | Adcy5 | adenylate cyclase 5 | Mm.41137 | | −1.3 |
| 1431946 | Necab3 | N-terminal EF-hand calcium binding protein 3 | Mm.143748 | | −1.29 |
| 1434777 | Mycl1 | v-myc myelocytomatosis viral oncogene homolog 1 | Mm.1055 | | −1.26 |
| 1419139 | Gdf5 | growth differentiation factor 5 | Mm.4744 | 0.001 | −1.25 |
| 1441559 | LOC627626 | similar to CG11212-PA | Mm.390999 | 0.003 | −1.25 |
| 1441667 | Smyd1 | SET and MYND domain containing 1 | Mm.234274 | | −1.23 |
| 1423561 | Nell2 | NEL-like 2 (chicken) | Mm.3959 | 0.016 | −1.18 |
| 1450501 | Itga2 | integrin alpha 2 | Mm.5007 | | −1.17 |
| 1435832 | Lrrc4 | leucine rich repeat containing 4 | Mm.443660 | | −1.11 |
| 1455188 | Ephb1 | Eph receptor B1 | Mm.22897 | 0.046 | −1.11 |
| 1455888 | Lingo2 | leucine rich repeat and Ig domain containing 2 | Mm.132507 | 0.007 | −1.05 |
| 1426960 | Fa2h | fatty acid 2-hydroxylase | Mm.41083 | | −1.04 |
| 1453841 | 2310050P20Rik | RIKEN cDNA 2310050P20 gene | | 0.033 | −1.01 |
| 1421207 | Lif | leukemia inhibitory factor | Mm.4964 | | |

*log2 scale. doi:10.1371/journal.pgen.1001142.t002
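Tables 1 and 2 report fold changes on a log2 scale. Converting them to linear fold changes makes them easier to compare with the qRT-PCR results, such as the roughly 10-fold reduction in Ntrk3. The following is a minimal sketch. It assumes the table values are log2(PTIP−/PTIP+) ratios, so that negative values denote down-regulation. The function name is illustrative only.

```python
def linear_fold_change(log2_fc):
    """Convert a log2 fold change to a signed linear fold change.

    Positive results mean 'x-fold up'; negative results mean 'x-fold down'
    (assumes log2_fc = log2(mutant / control)).
    """
    magnitude = 2.0 ** abs(log2_fc)
    return magnitude if log2_fc >= 0 else -magnitude


# Values taken from Tables 1 and 2 (log2 scale).
for gene, log2_fc in [("Prm1", 6.38), ("Ntrk3", -3.09), ("Wif1", -4.37)]:
    print(f"{gene}: {linear_fold_change(log2_fc):+.1f}-fold")
```

Under this assumption, the microarray value for Ntrk3 (−3.09) corresponds to roughly an 8.5-fold decrease. That is of the same order as the approximately 10-fold decrease measured by qRT-PCR.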


other genes whose functions are not well understood are also impacted. Histone methylation by Trithorax or Polycomb complexes can imprint positive and negative epigenetic marks on chromatin during development. More recently, histone methyltransferases (HMTs) have been associated with cancer and other disease states. However, in many cases it is not clear whether changes in the expression of epigenetic modifiers are the cause or the result of disease progression. The results presented here suggest that mutations in an epigenetic pathway that alter H3K4 methylation patterns can lead to a chronic disease through subtle changes in gene expression patterns. This points to a direct function for HMTs in maintaining gene expression and the differentiated state in healthy organisms.

Methods

Animals

Mice carrying the Paxip1 null (Paxip1−) and floxed (Paxip1fl) alleles were previously described and were genotyped as indicated [14,53]. To obtain specific deletion of the Paxip1fl allele in glomerular podocytes, these mice were crossed with the previously characterized 2.5P-Cre mice [24,25], which express Cre recombinase under the control of the human NPHS2 promoter (CreNPHS2). In subsequent generations, mice carrying the Cre allele (Paxip1fl/fl:CreNPHS2 and Paxip1fl/−:CreNPHS2) were considered conditional null mutants (PTIP−), whereas littermates that did not express Cre recombinase were used as controls (PTIP+). All animal procedures were approved by the University Committee on Use and Care of Animals (UCUCA) of the University of Michigan and performed in compliance with ULAM recommendations.
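The rule used above to assign mice to the mutant and control groups can be written as a small classification function. This is a minimal sketch. The allele labels ('fl' for floxed, '-' for null, '+' for wild type) and the function name are hypothetical. Cre carriers with a wild-type Paxip1 allele are returned as `None`, because they belong to neither group.

```python
def assign_group(paxip1_alleles, has_cre):
    """Classify a mouse as 'PTIP-' (conditional mutant), 'PTIP+' (control),
    or None if its genotype was not used in either group.

    paxip1_alleles: pair of allele labels, each 'fl', '-' or '+'.
    has_cre: True if the CreNPHS2 transgene is present.
    """
    alleles = tuple(sorted(paxip1_alleles))  # order-independent comparison
    if has_cre and alleles in {("fl", "fl"), ("-", "fl")}:
        return "PTIP-"  # Paxip1fl/fl:CreNPHS2 or Paxip1fl/-:CreNPHS2
    if not has_cre:
        return "PTIP+"  # littermates without Cre serve as controls
    return None  # e.g. Cre carriers on a Paxip1+/+ or fl/+ background


for genotype in [(("fl", "fl"), True), (("fl", "-"), True),
                 (("fl", "-"), False), (("fl", "+"), True)]:
    print(genotype, assign_group(*genotype))
```

Cre carriers on a Paxip1+/+ or Paxip1fl/+ background were examined separately as a check on the Cre driver, so they are deliberately excluded from both groups here.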

Antibodies

Rabbit polyclonal antibodies used to detect Nephrin (1:1000) and Podocin (1:500) were kindly provided by L.B. Holzman (University of Pennsylvania, Philadelphia, PA). Chicken anti-PTIP was described previously [54]. Additional antibodies were commercially available: mouse clone 6F-H2 anti-WT1 (1:1000, DAKO, Carpinteria, CA), anti-H3K4me3 and anti-H3K27me3 (AbCam, Cambridge, MA), anti-Magi2 (Sigma-Aldrich, St. Louis, MO), anti-Ntrk3 (AF1404, R & D Systems, Minneapolis, MN), Alexa Fluor 488 F(ab′)2 fragment of goat anti-rabbit IgG, Alexa Fluor 568 F(ab′)2 fragment of goat anti-mouse IgG, and Alexa Fluor 488 donkey anti-goat IgG (1:500; Molecular Probes, Life Technologies, Carlsbad, CA).

Figure 5. Gene Expression in the Glomerulus. Real-time qRT-PCR for the indicated genes was performed on total RNA isolated from glomerular preparations. A) Confirmation of two genes that are down-regulated in PTIP− (black) kidneys compared to control PTIP+ (open) kidneys. B) Confirmation of two genes that are up-regulated in PTIP− kidneys compared to controls. C) Expression levels of podocyte marker genes in PTIP+ and PTIP− glomerular preparations. doi:10.1371/journal.pgen.1001142.g005

mutations are associated with severe nephrotic syndromes [45]. Yet how foot process outgrowth is regulated and maintained is not clear. Our data suggest that Ntrk3, and by inference its ligand NT-3, may be important for foot process growth and patterning. NT-3 is known to promote neuronal axon guidance by stimulating actin polymerization and lamellipodia formation [46,47]. In cultured neuronal cells, NT-3 promotes localization of β-actin mRNA to the growth cones to stimulate motility and chemotaxis [48,49]. Podocytes express many proteins known to function in neurite outgrowth, such as semaphorins, neuropilins, and ephrins. A recent report even describes the release and uptake of glutamate-containing synaptic-like vesicles by podocytes [50]. Furthermore, foot processes are dynamic and can retract quickly in response to polycations such as protamine sulfate [51,52]. This raises the possibility that sensing mechanisms are required for rapid actin dynamics; such mechanisms may be common to both podocytes and neurons. Still, reduction of Ntrk3 alone is unlikely to cause the phenotypic changes in PTIP− podocytes over time, as

Urine Collection and Analysis

Mice had access to standard breeder chow (Purina 5008) and water ad libitum. Urine was collected in the early afternoon on three consecutive days from individual mice at 1, 3, 6, and 12 months of age and stored frozen until use. After thawing, 2 µL of urine was run on an SDS-PAGE gel and stained with Coomassie Blue to test for the presence of proteins/albumin, using recombinant mouse albumin (Sigma-Aldrich, St. Louis, MO) as a control. Urine albumin and creatinine concentrations were determined quantitatively by ELISA using the Albuwell M and Creatinine Companion kits (Exocell Inc., Philadelphia, PA).
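The albumin and creatinine concentrations are combined into the albumin-to-creatinine ratio (ACR) reported in Figure 2B. The following minimal sketch shows the arithmetic under several assumptions. Albumin is taken in µg/mL and creatinine in mg/mL, so the ACR is in µg albumin per mg creatinine. Each mouse's three daily collections are averaged. All numbers are hypothetical, and the kits' own calculation procedure may differ.

```python
from statistics import mean


def acr(albumin_ug_per_ml, creatinine_mg_per_ml):
    """Albumin-to-creatinine ratio in ug albumin per mg creatinine."""
    if creatinine_mg_per_ml <= 0:
        raise ValueError("creatinine concentration must be positive")
    return albumin_ug_per_ml / creatinine_mg_per_ml


def mouse_acr(samples):
    """Average ACR over one mouse's daily samples [(albumin, creatinine), ...]."""
    return mean(acr(a, c) for a, c in samples)


def fold_vs_control(mutant_acrs, control_acrs):
    """Ratio of group mean ACRs (mutant / control)."""
    return mean(mutant_acrs) / mean(control_acrs)


# Hypothetical readings from three consecutive days for two mice.
control = mouse_acr([(15.0, 0.5), (12.0, 0.4), (18.0, 0.6)])
mutant = mouse_acr([(300.0, 0.5), (240.0, 0.4), (360.0, 0.6)])
print(f"control ACR={control:.1f}, mutant ACR={mutant:.1f}")
print(f"fold increase: {fold_vs_control([mutant], [control]):.1f}")
```

Dividing by creatinine corrects for differences in urine concentration between spot collections. That makes the ratio more comparable across samples than albumin concentration alone.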

Specimen Preparation for Microscopy Analyses

Mice at 1, 3, 6, and 12 months of age were sacrificed, and their kidneys were perfused, fixed, and processed for histology, indirect immunofluorescence, and electron microscopy analyses. Briefly, mice were anesthetized by intraperitoneal injection of 40 mg/kg sodium


Figure 6. Ntrk3 in the Glomerulus. Fresh frozen tissues were sectioned and fixed in methanol, followed by immunostaining with goat anti-Ntrk3, rabbit anti-WT1, or rabbit anti-Nephrin, as indicated. PTIP+ sections (A–C, G–I) showed strong Ntrk3 staining in all glomeruli, in a pattern similar to Nephrin. The PTIP− kidney sections (D–F, J–L) showed much lower levels of Ntrk3 protein in glomeruli. All micrographs were taken at manually set, equal exposures. Right panels (C, F, I, L) are overlays of Ntrk3 and WT1 or Ntrk3 and Nephrin and are counterstained with DAPI (blue) to visualize all cell nuclei. doi:10.1371/journal.pgen.1001142.g006


secondary fluorescent antibodies and DAPI in PBS, 0.1% Triton, 2% goat serum for 1 hour in the dark at room temperature. The sections were washed again and mounted in Mowiol. Stained and fluorescently labeled sections were analyzed under a Nikon ES800 microscope. Micrographs were taken with a digital spot camera, using equivalent exposure times among sections. For Ntrk3 staining, fresh frozen sections were dried, fixed in methanol at −20°C, and washed in PBS, 0.1% Tween 20 before incubation with anti-Ntrk3 antibodies at 1 µg/ml. ImageJ 1.42 was used to quantify immunofluorescent signals. H3K4me3-stained sections were digitally captured, and light intensity was measured by placing a fixed-size circular area over the nuclei of cells and summing all pixels over that area. At least 6 podocytes and 6 control cells, either mesangial or endothelial, were measured for each of 8 glomerular tufts (at least 48 podocytes and 48 other cells for each genotype). The average signal intensity was then expressed as a ratio of podocyte intensity to non-podocyte cell intensity for each glomerular micrograph. For detection of Cre activity, the Rosa26-lacZ reporter strain was used [56]. Mice carrying CreNPHS2 and Paxip1fl/fl were crossed to Rosa26-stop-lacZ:Paxip1fl/+ mice to generate Paxip1fl/fl:CreNPHS2:Rosa26-lacZ animals. Kidneys were excised at 1 month of age and stained for β-galactosidase activity as described [57].
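The H3K4me3 quantification above has three steps. Pixels are summed inside a fixed circular area over each nucleus. Mean podocyte signal is expressed relative to mean mesangial/endothelial signal for each micrograph. Genotypes are then compared with Student's t-test, as in Figure 1E.

The following minimal pure-Python sketch reproduces that arithmetic; it is not the ImageJ procedure itself. It assumes each image is a 2D list of grey values and that nuclear centres were chosen by hand. The function names, the radius and the toy image are hypothetical. The sketch returns only the t statistic and its degrees of freedom (n1 + n2 − 2); the p value would be read from the t distribution, for example with a statistics package.

```python
import math
from statistics import mean, variance


def circle_sum(image, cx, cy, r):
    """Sum grey values of pixels within radius r of (cx, cy); image[y][x]."""
    total = 0
    for y in range(max(0, cy - r), min(len(image), cy + r + 1)):
        for x in range(max(0, cx - r), min(len(image[0]), cx + r + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                total += image[y][x]
    return total


def podocyte_ratio(image, podocyte_centres, other_centres, r=3):
    """Mean integrated podocyte signal / mean mesangial-endothelial signal."""
    pod = mean(circle_sum(image, x, y, r) for x, y in podocyte_centres)
    other = mean(circle_sum(image, x, y, r) for x, y in other_centres)
    return pod / other


def student_t(a, b):
    """Student's two-sample t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Toy 10x10 image: one "podocyte" pixel (50) and one brighter "other" pixel (100).
img = [[0] * 10 for _ in range(10)]
img[2][2] = 50
img[7][7] = 100
print(podocyte_ratio(img, [(2, 2)], [(7, 7)], r=1))  # 0.5
```

In practice, one ratio per glomerular micrograph (8 per genotype here) would be passed to `student_t` as the two samples.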

Scanning and Transmission Electron Microscopy

Longitudinal slices of kidneys from PTIP+ and PTIP− mice, fixed with 2.5% glutaraldehyde in 0.1 M Sorensen’s buffer (pH 7.2) for 2 hours at room temperature, were processed for scanning electron microscopy following standard procedures. Briefly, after several washes with Sorensen’s buffer alone, the samples were dehydrated by successive washes in graded ethanol solutions, critical point dried, mounted on a stub, sputter coated with gold-palladium, and examined under an AMRAY 1910 field emission scanning electron microscope. Pieces of the kidney cortex (1 mm³), fixed with 2.5% glutaraldehyde in Sorensen’s buffer for 2 hours at room temperature, were processed for transmission electron microscopy following standard procedures. They were embedded in PolyBed 812 resin (Polysciences Inc.), cut into 1-micron sections, and stained with toluidine blue. Sample areas were selected based on the presence of glomeruli and cut into ultrathin sections for analysis under a Philips CM-100 transmission electron microscope. The selected SEM and TEM images are representative of at least 10 different glomeruli per kidney.

Figure 7. Chromatin Immunoprecipitation (ChIP) at the Ntrk3 Locus. A) Schematic of the sequences surrounding the first Ntrk3 exon; the transcription start site (+1) and the ATG start codon (+627) are indicated. The positions of the four primer pairs used for PCR analyses of the immunoprecipitated chromatin are shown. B) ChIP experiment using anti-PTIP antibodies and chromatin from whole glomeruli enriched from PTIP+ (open bars) and PTIP− (grey bars) kidneys. C) ChIP experiment as in B but with anti-H3K4me2 antibodies. D) ChIP experiment as in B but with anti-H3K4me3 antibodies. E) ChIP experiment as in B but with anti-H3K27me3 antibodies. For B–E, all values are expressed as the mean of 3 replicates; error bars are one standard deviation. Statistically significant differences are indicated (*P<0.05). doi:10.1371/journal.pgen.1001142.g007

pentobarbital and prepared for systemic perfusion. A saline solution was first perfused through the abdominal aorta into the entire body at a pressure of approximately 70 mmHg as previously described [55]. Once the blood had been cleared, the saline was replaced with 4% paraformaldehyde in PBS, which was perfused under the same flow conditions for approximately 10 minutes. Kidneys were removed, decapsulated, cut into pieces, and incubated for 2 additional hours in the appropriate fixative solution before being processed for histology, indirect immunofluorescence, and electron microscopy.

Isolation of Mouse Glomeruli Glomeruli were isolated from the kidneys of individual mice by sieving as described [58]. Briefly, 1-month-old mice were sacrificed by CO2 inhalation and their kidneys were removed. After decapsulation, the kidneys were finely minced on ice and passed sequentially through nylon meshes of 90 and 41 microns (Sefar Filtration Inc., Depew, NY). The glomeruli-enriched fraction (GEF) was retained on top of the 41-micron mesh, while kidney tubules were flushed through. RNA was isolated directly from the material retained on the mesh.

Histology and Indirect Immunofluorescence Kidneys were fixed in 4% paraformaldehyde, embedded in paraffin, sectioned at 5 microns, and stained with Periodic Acid-Schiff or Masson's trichrome. For immunofluorescence analyses of Nephrin, PTIP, WT1, and Magi2, sections were dewaxed, rehydrated, and microwaved for 10 minutes in a citric acid-based antigen unmasking solution (Vector Laboratories, Burlingame, CA). Sections were permeabilized with 0.3% Triton X-100 in PBS and blocked with 10% goat serum in PBS. Primary antibodies were incubated overnight at 4°C in PBS, 0.1% Triton, 2% goat serum. Sections were washed twice and incubated with the

RNA Extraction and Reverse Transcription Total RNA was extracted from the GEF of individual 1-month-old mice using the RNeasy Tissue Micro Kit (Qiagen, Valencia, CA) following the manufacturer's instructions. RNA concentration and purity were determined by nanodrop analysis on an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA). Using the Ovation RNA Amplification System V2 (NuGEN Technologies, San Carlos, CA), 500 ng of total RNA was reverse transcribed and linearly amplified into single-stranded cDNA, whose concentration and purity were determined by nanodrop analysis on an Agilent Bioanalyzer 2100 (Agilent Technologies).

October 2010 | Volume 6 | Issue 10 | e1001142

H3K4 Methylation and Chronic Glomerular Disease

Figure 8. Analysis of Ntrk3 Mutant Kidneys. A) Coomassie-stained SDS/PAGE gels of urine collected from 4-day-old Ntrk3−/− and wild-type littermates. B) Immunostaining of kidneys from 4-day-old wild-type and Ntrk3−/− mice, as indicated. From left to right, glomeruli are shown at increasingly older stages of development. Note the discontinuous Magi2 staining and reduced Nephrin staining in older glomeruli of Ntrk3−/− kidneys compared with control littermates. doi:10.1371/journal.pgen.1001142.g008

Microarray and Real-Time qPCR Analyses Microarray analyses were performed by the University of Michigan Comprehensive Cancer Center (UMCCC) Affymetrix and Microarray Core Facility. The FL-Ovation cDNA Biotin Module V2 kit (NuGEN Technologies, San Carlos, CA) was used to produce biotin-labeled cRNA, which was then fragmented and hybridized to a Mouse 430 2.0 Affymetrix GeneChip 3′ expression array (Affymetrix, Santa Clara, CA). Array hybridization, washes, staining, and scanning were carried out according to standard Affymetrix protocols. Expression data were normalized by the robust multiarray average (RMA) method and fitted to weighted linear models in R, using the affy and limma packages of Bioconductor, respectively [59,60]. Only probe sets with a variance across all samples greater than 0.1, a p-value less than or equal to 0.05 after adjustment for multiple testing using the false discovery rate [61], and a minimum 2-fold difference in expression were selected for analysis. The complete data set is available from the Gene Expression Omnibus database (accession number GSE17709). Microarray data were confirmed by real-time quantitative PCR analysis: 25–50 ng of single-stranded cDNA was amplified in triplicate in a 384-well plate using the 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, CA), and expression levels of selected genes were determined by SYBR Green or TaqMan assays (Applied Biosystems). The PCR primer pairs and TaqMan probes used in this study are presented in Table S2.
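The three probe-set selection criteria can be sketched in plain Python. This is an illustration, not the authors' R/limma code: it assumes log2-scale (RMA) expression values, the sample variance, a fold change supplied as a log2 value, and the Benjamini-Hochberg step-up adjustment [61] applied across all probe sets passed in (the text does not say whether the variance filter was applied before the adjustment). The function names are hypothetical.

```python
import math
from statistics import variance


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_probe_sets(expr, log2_fc, pvalues,
                      min_var=0.1, max_fdr=0.05, min_fold=2.0):
    """Keep probe sets with variance > min_var, FDR-adjusted p <= max_fdr
    and an absolute fold change of at least min_fold.

    expr:    {probe_id: [log2 expression in every sample]}
    log2_fc: {probe_id: log2 fold change between genotypes}
    pvalues: {probe_id: raw p-value from the per-probe model fit}
    """
    ids = list(expr)
    adj = dict(zip(ids, bh_adjust([pvalues[i] for i in ids])))
    return [
        i for i in ids
        if variance(expr[i]) > min_var
        and adj[i] <= max_fdr
        and abs(log2_fc[i]) >= math.log2(min_fold)
    ]
```

With toy inputs, a probe set that is variable, significant, and at least 2-fold changed (|log2 FC| ≥ 1) is kept, while a flat probe set or one with a smaller fold change is dropped.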

Chromatin Immunoprecipitation Glomeruli were isolated from 6 PTIP+ and 6 PTIP− kidneys by sieving as described above. Glomeruli were resuspended in 1 ml PBS and cross-linked with 1% formaldehyde for 10 minutes with rocking at room temperature. Chromatin preparation, immunoprecipitation, and PCR analysis were essentially as described previously [13]. Primer pairs for the Ntrk3 locus were as follows (5′→3′): P1, CAATGTATTTTGCTTCCTTGCC and AAGAAAGGGTTAGGGGAATCCG; P2, AACCCGTGCGTTTCGTAAGG and GGAGGAAGGAGGAGAAGGAAGATG; P3, GCATCTTCCTTCTCCTCCTTCCTC and AAGTCACCAAGTCCCACCTCCTAG; P4, TTTGCCTTCCCACCGTCTGTTG and TGCCTTTGAAACGCCGAAC.

Figure 9. Ultrastructural Analysis of Ntrk3 Mutant Kidneys. Kidneys from Ntrk3−/− (9) and control littermates at 4 days of age were examined by scanning (A, B) and transmission electron microscopy (C, D). A, B) Note the disorganized patterning and irregularly shaped primary and secondary foot processes. C, D) Note the fusion of foot processes and the lack of well-spaced slit diaphragms in Ntrk3 mutants in D. Scale bars are 1 µm in A and B, 2 µm in C, and 500 nm in D. doi:10.1371/journal.pgen.1001142.g009

PLoS Genetics | www.plosgenetics.org
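For quick bench checks, the Ntrk3 primer pairs can be screened for GC content and an approximate melting temperature. This is a hypothetical helper, not part of the published protocol. It uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough guide for oligos of this length (19 to 24 nt), and it assumes the hyphen that split the P1 reverse primer at the page break was a line-wrap artifact rather than part of the sequence.

```python
# Ntrk3 ChIP primer pairs (forward, reverse), 5'->3', as listed in the Methods.
# Assumption: the P1 reverse primer is the two page-split halves joined directly.
NTRK3_PRIMERS = {
    "P1": ("CAATGTATTTTGCTTCCTTGCC", "AAGAAAGGGTTAGGGGAATCCG"),
    "P2": ("AACCCGTGCGTTTCGTAAGG", "GGAGGAAGGAGGAGAAGGAAGATG"),
    "P3": ("GCATCTTCCTTCTCCTCCTTCCTC", "AAGTCACCAAGTCCCACCTCCTAG"),
    "P4": ("TTTGCCTTCCCACCGTCTGTTG", "TGCCTTTGAAACGCCGAAC"),
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule, 2*(A+T) + 4*(G+C).
    Designed for short oligos; only a crude screen for 20-mers and longer."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, (fwd, rev) in NTRK3_PRIMERS.items():
        print(f"{name}: fwd GC={gc_fraction(fwd):.2f} Tm~{wallace_tm(fwd)} | "
              f"rev GC={gc_fraction(rev):.2f} Tm~{wallace_tm(rev)}")
```

A nearest-neighbour Tm calculator would give more realistic values for primers of this length; the Wallace rule is used here only because it needs no external data.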

Acknowledgments We thank L. Holzman for antibodies and critical discussion, A. Soofi for maintaining the mouse colony, C. Johnson and J. Washburn of the University of Michigan Cancer Center Microarray Core for the microarray analysis, D. Sorenson and C. Edwards for help with the electron microscopy and image analysis, and E. Hughes and T. Saunders of the University of Michigan Transgenic Animal Core for help generating the Paxip1 floxed allele.

Supporting Information Table S1 Podocyte-specific genes that are unchanged after PTIP deletion. Found at: doi:10.1371/journal.pgen.1001142.s001 (0.03 MB DOC)

Table S2 Quantitative RT-PCR primer sets and probes. Found at: doi:10.1371/journal.pgen.1001142.s002 (0.02 MB DOC)

Author Contributions Conceived and designed the experiments: GML DK GRD. Performed the experiments: GML SRP DK LT GRD. Analyzed the data: GML SRP GRD. Contributed reagents/materials/analysis tools: LT. Wrote the paper: GRD.

References
1. Campbell KH, McWhir J, Ritchie WA, Wilmut I (1996) Sheep cloned by nuclear transfer from a cultured cell line. Nature 380: 64–66.
2. Okita K, Ichisaka T, Yamanaka S (2007) Generation of germline-competent induced pluripotent stem cells. Nature 448: 313–317.
3. Wernig M, Meissner A, Foreman R, Brambrink T, Ku M, et al. (2007) In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448: 318–324.
4. Berger SL (2007) The complex language of chromatin regulation during transcription. Nature 447: 407–412.
5. Bernstein BE, Meissner A, Lander ES (2007) The mammalian epigenome. Cell 128: 669–681.
6. Ringrose L, Paro R (2004) Epigenetic regulation of cellular memory by the Polycomb and Trithorax group proteins. Annu Rev Genet 38: 413–443.
7. Schuettengruber B, Chourrout D, Vervoort M, Leblanc B, Cavalli G (2007) Genome regulation by polycomb and trithorax proteins. Cell 128: 735–745.
8. Ringrose L, Paro R (2007) Polycomb/Trithorax response elements and epigenetic memory of cell identity. Development 134: 223–232.
9. Schwartz YB, Pirrotta V (2007) Polycomb silencing mechanisms and the management of genomic programmes. Nat Rev Genet 8: 9–22.
10. Issaeva I, Zonis Y, Rozovskaia T, Orlovsky K, Croce CM, et al. (2007) Knockdown of ALR (MLL2) reveals ALR target genes and leads to alterations in cell adhesion and growth. Mol Cell Biol 27: 1889–1903.
11. Cho YW, Hong T, Hong S, Guo H, Yu H, et al. (2007) PTIP associates with MLL3- and MLL4-containing histone H3 lysine 4 methyltransferase complex. J Biol Chem.
12. Lee J, Saha PK, Yang QH, Lee S, Park JY, et al. (2008) Targeted inactivation of MLL3 histone H3-Lys-4 methyltransferase activity in the mouse reveals vital roles for MLL3 in adipogenesis. Proc Natl Acad Sci U S A 105: 19229–19234.
13. Patel SR, Kim D, Levitan I, Dressler GR (2007) The BRCT-Domain Containing Protein PTIP Links PAX2 to a Histone H3, Lysine 4 Methyltransferase Complex. Dev Cell 13: 580–592.
14. Cho EA, Prindle MJ, Dressler GR (2003) BRCT domain-containing protein PTIP is essential for progression through mitosis. Mol Cell Biol 23: 1666–1673.
15. Kim D, Patel SR, Xiao H, Dressler GR (2009) The role of PTIP in maintaining embryonic stem cell pluripotency. Stem Cells 27: 1516–1523.
16. Cho YW, Hong S, Jin Q, Wang L, Lee JE, et al. (2009) Histone methylation regulator PTIP is required for PPARgamma and C/EBPalpha expression and adipogenesis. Cell Metab 10: 27–39.
17. Fang M, Ren H, Liu J, Cadigan KM, Patel SR, et al. (2009) Drosophila ptip is essential for anterior/posterior patterning in development and interacts with the PcG and trxG pathways. Development 136: 1929–1938.
18. Hansen KH, Bracken AP, Pasini D, Dietrich N, Gehani SS, et al. (2008) A model for transmission of the H3K27me3 epigenetic mark. Nat Cell Biol 10: 1291–1300.
19. Margueron R, Justin N, Ohno K, Sharpe ML, Son J, et al. (2009) Role of the polycomb protein EED in the propagation of repressive histone marks. Nature 461: 762–767.
20. Blobel GA, Kadauke S, Wang E, Lau AW, Zuber J, et al. (2009) A reconfigured pattern of MLL occupancy within mitotic chromatin promotes rapid transcriptional reactivation following mitotic exit. Mol Cell 36: 970–983.
21. Chi P, Allis CD, Wang GG. Covalent histone modifications - miswritten, misinterpreted and mis-erased in human cancers. Nat Rev Cancer 10: 457–469.
22. Gluckman PD, Hanson MA, Buklijas T, Low FM, Beedle AS (2009) Epigenetic mechanisms that underpin metabolic and cardiovascular diseases. Nat Rev Endocrinol 5: 401–408.
23. Wiggins RC (2007) The spectrum of podocytopathies: a unifying view of glomerular diseases. Kidney Int 71: 1205–1214.
24. Moeller MJ, Sanden SK, Soofi A, Wiggins RC, Holzman LB (2002) Two Gene Fragments that Direct Podocyte-Specific Expression in Transgenic Mice. J Am Soc Nephrol 13: 1561–1567.
25. Moeller MJ, Sanden SK, Soofi A, Wiggins RC, Holzman LB (2003) Podocyte-specific expression of cre recombinase in transgenic mice. Genesis 35: 39–42.
26. Ho J, Ng KH, Rosen S, Dostal A, Gregory RI, et al. (2008) Podocyte-specific loss of functional microRNAs leads to rapid glomerular and tubular injury. J Am Soc Nephrol 19: 2069–2075.
27. Suleiman H, Heudobler D, Raschta AS, Zhao Y, Zhao Q, et al. (2007) The podocyte-specific inactivation of Lmx1b, Ldb1 and E2a yields new insight into a transcriptional network in podocytes. Dev Biol 304: 701–712.
28. El-Aouni C, Herbach N, Blattner SM, Henger A, Rastaldi MP, et al. (2006) Podocyte-specific deletion of integrin-linked kinase results in severe glomerular basement membrane alterations and progressive glomerulosclerosis. J Am Soc Nephrol 17: 1334–1344.
29. Wharram BL, Goyal M, Wiggins JE, Sanden SK, Hussain S, et al. (2005) Podocyte depletion causes glomerulosclerosis: diphtheria toxin-induced podocyte depletion in rats expressing human diphtheria toxin receptor transgene. J Am Soc Nephrol 16: 2941–2952.
30. Steger K, Pauls K, Klonisch T, Franke FE, Bergmann M (2000) Expression of protamine-1 and -2 mRNA during human spermiogenesis. Mol Hum Reprod 6: 219–225.
31. Cho C, Willis WD, Goulding EH, Jung-Ha H, Choi YC, et al. (2001) Haploinsufficiency of protamine-1 or -2 causes infertility in mice. Nat Genet 28: 82–86.
32. Genc B, Ozdinler PH, Mendoza AE, Erzurumlu RS (2004) A chemoattractant role for NT-3 in proprioceptive axon guidance. PLoS Biol 2: e403. doi:10.1371/journal.pbio.0020403.
33. Donovan MJ, Hahn R, Tessarollo L, Hempstead BL (1996) Identification of an essential nonneuronal function of neurotrophin 3 in mammalian cardiac development. Nat Genet 14: 210–213.
34. Tessarollo L, Tsoulfas P, Donovan MJ, Palko ME, Blair-Flynn J, et al. (1997) Targeted deletion of all isoforms of the trkC gene suggests the use of alternate receptors by its ligand neurotrophin-3 in neuronal development and implicates trkC in normal cardiogenesis. Proc Natl Acad Sci U S A 94: 14776–14781.
35. Paves H, Saarma M (1997) Neurotrophins as in vitro growth cone guidance molecules for embryonic sensory neurons. Cell Tissue Res 290: 285–297.
36. Milne TA, Briggs SD, Brock HW, Martin ME, Gibbs D, et al. (2002) MLL targets SET domain methyltransferase activity to Hox gene promoters. Mol Cell 10: 1107–1117.
37. Glaser S, Schaft J, Lubitz S, Vintersten K, van der Hoeven F, et al. (2006) Multiple epigenetic maintenance factors implicated by the loss of Mll2 in mouse development. Development 133: 1423–1432.
38. Mu W, Wang W, Schimenti JC (2008) An allelic series uncovers novel roles of the BRCT domain-containing protein PTIP in mouse embryonic vascular development. Mol Cell Biol 28: 6439–6451.
39. Pabst R, Sterzel RB (1983) Cell renewal of glomerular cell types in normal rats. An autoradiographic analysis. Kidney Int 24: 626–631.
40. Appel D, Kershaw DB, Smeets B, Yuan G, Fuss A, et al. (2009) Recruitment of podocytes from glomerular parietal epithelial cells. J Am Soc Nephrol 20: 333–343.
41. Ronconi E, Sagrinati C, Angelotti ML, Lazzeri E, Mazzinghi B, et al. (2009) Regeneration of glomerular podocytes by human renal progenitors. J Am Soc Nephrol 20: 322–332.
42. Wykes SM, Krawetz SA (2003) The structural organization of sperm chromatin. J Biol Chem 278: 29471–29477.
43. Cuthbert GL, Daujat S, Snowden AW, Erdjument-Bromage H, Hagiwara T, et al. (2004) Histone deimination antagonizes arginine methylation. Cell 118: 545–553.
44. Faul C, Asanuma K, Yanagida-Asanuma E, Kim K, Mundel P (2007) Actin up: regulation of podocyte structure and function by components of the actin cytoskeleton. Trends Cell Biol 17: 428–437.
45. Patrakka J, Tryggvason K (2007) Nephrin–a unique structural and signaling protein of the kidney filter. Trends Mol Med 13: 396–403.


46. Castellani V, Bolz J (1999) Opposing roles for neurotrophin-3 in targeting and collateral formation of distinct sets of developing cortical neurons. Development 126: 3335–3345.
47. Tessarollo L, Coppola V, Fritzsch B (2004) NT-3 replacement with brain-derived neurotrophic factor redirects vestibular nerve fibers to the cochlea. J Neurosci 24: 2575–2584.
48. Zhang HL, Eom T, Oleynikov Y, Shenoy SM, Liebelt DA, et al. (2001) Neurotrophin-induced transport of a beta-actin mRNP complex increases beta-actin levels and stimulates growth cone motility. Neuron 31: 261–275.
49. Zhang HL, Singer RH, Bassell GJ (1999) Neurotrophin regulation of beta-actin mRNA and protein localization within growth cones. J Cell Biol 147: 59–70.
50. Rastaldi MP, Armelloni S, Berra S, Calvaresi N, Corbelli A, et al. (2006) Glomerular podocytes contain neuron-like functional synaptic vesicles. FASEB J 20: 976–978.
51. Kerjaschki D (1978) Polycation-induced dislocation of slit diaphragms and formation of cell junctions in rat kidney glomeruli: the effects of low temperature, divalent cations, colchicine, and cytochalasin B. Lab Invest 39: 430–440.
52. Kurihara H, Anderson JM, Kerjaschki D, Farquhar MG (1992) The altered glomerular filtration slits seen in puromycin aminonucleoside nephrosis and protamine sulfate-treated rats contain the tight junction protein ZO-1. Am J Pathol 141: 805–816.
53. Kim D, Wang M, Cai Q, Brooks H, Dressler GR (2007) Pax transactivation-domain interacting protein is required for urine concentration and osmotolerance in collecting duct epithelia. J Am Soc Nephrol 18: 1458–1465.


54. Lechner MS, Levitan I, Dressler GR (2000) PTIP, a novel BRCT domain-containing protein interacts with Pax2 and is associated with active chromatin. Nucleic Acids Res 28: 2741–2751.
55. Verma R, Wharram B, Kovari I, Kunkel R, Nihalani D, et al. (2003) Fyn binds to and phosphorylates the kidney slit diaphragm component Nephrin. J Biol Chem 278: 20716–20723.
56. Soriano P (1999) Generalized lacZ expression with the ROSA26 Cre reporter strain. Nat Genet 21: 70–71.
57. Kim D, Dressler GR (2005) Nephrogenic factors promote differentiation of mouse embryonic stem cells into renal epithelia. J Am Soc Nephrol 16: 3527–3534.
58. Salant DJ, Darby C, Couser WG (1980) Experimental membranous glomerulonephritis in rats. Quantitative studies of glomerular immune deposit formation in isolated glomeruli and whole animals. J Clin Invest 66: 71–81.
59. Irizarry RA, Wu Z, Jaffee HA (2006) Comparison of Affymetrix GeneChip expression measures. Bioinformatics 22: 789–794.
60. Smyth GK (2004) Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3: Article3.
61. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical approach to multiple testing. J Royal Stat Soc 57: 289–300.


The IG-DMR and the MEG3-DMR at Human Chromosome 14q32.2: Hierarchical Interaction and Distinct Functional Properties as Imprinting Control Centers Masayo Kagami1, Maureen J. O’Sullivan2, Andrew J. Green3,4, Yoshiyuki Watabe5, Osamu Arisaka5, Nobuhide Masawa6, Kentarou Matsuoka7, Maki Fukami1, Keiko Matsubara1, Fumiko Kato1, Anne C. Ferguson-Smith8, Tsutomu Ogata1* 1 Department of Endocrinology and Metabolism, National Research Institute for Child Health and Development, Tokyo, Japan, 2 Department of Pathology, School of Medicine, Our Lady’s Children’s Hospital, Trinity College, Dublin, Ireland, 3 National Center for Medical Genetics, University College Dublin, Our Lady’s Hospital, Dublin, Ireland, 4 School of Medicine and Medical Science, University College, Dublin, Ireland, 5 Department of Pediatrics, Dokkyo University School of Medicine, Tochigi, Japan, 6 Department of Pathology, Dokkyo University School of Medicine, Tochigi, Japan, 7 Department of Pathology, National Center for Child Health and Development, Tokyo, Japan, 8 Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom

Abstract Human chromosome 14q32.2 harbors the germline-derived primary DLK1-MEG3 intergenic differentially methylated region (IG-DMR) and the postfertilization-derived secondary MEG3-DMR, together with multiple imprinted genes. Although previous studies in cases with microdeletions and epimutations affecting both DMRs and paternal/maternal uniparental disomy 14-like phenotypes argue for a critical regulatory function of the two DMRs for the 14q32.2 imprinted region, the precise role of each DMR remains to be clarified. We studied an infant with upd(14)pat body and placental phenotypes and a heterozygous microdeletion involving the IG-DMR alone (patient 1), and a neonate with upd(14)pat body but no placental phenotype and a heterozygous microdeletion involving the MEG3-DMR alone (patient 2). The results generated from the analysis of these two patients imply that the IG-DMR and the MEG3-DMR function as imprinting control centers in the placenta and the body, respectively, with a hierarchical interaction for the methylation pattern in the body governed by the IG-DMR. To our knowledge, this is the first study demonstrating an essential long-range imprinting regulatory function for the secondary DMR. Citation: Kagami M, O'Sullivan MJ, Green AJ, Watabe Y, Arisaka O, et al. (2010) The IG-DMR and the MEG3-DMR at Human Chromosome 14q32.2: Hierarchical Interaction and Distinct Functional Properties as Imprinting Control Centers. PLoS Genet 6(6): e1000992. doi:10.1371/journal.pgen.1000992 Editor: Wolf Reik, The Babraham Institute, United Kingdom Received December 29, 2009; Accepted May 19, 2010; Published June 17, 2010 Copyright: © 2010 Kagami et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by grants from the Ministry of Health, Labor, and Welfare; from the Ministry of Education, Science, Sports and Culture; and from Takeda Science Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: tomogata@nch.go.jp

Introduction Human chromosome 14q32.2 carries a cluster of protein-coding paternally expressed genes (PEGs) such as DLK1 and RTL1 and non-coding maternally expressed genes (MEGs) such as MEG3 (alias GTL2), RTL1as (RTL1 antisense), MEG8, snoRNAs, and microRNAs [1,2]. Consistent with this, paternal uniparental disomy 14 (upd(14)pat) results in a unique phenotype characterized by facial abnormality, small bell-shaped thorax, abdominal wall defects, placentomegaly, and polyhydramnios [2,3], and maternal uniparental disomy 14 (upd(14)mat) leads to less characteristic but clinically discernible features including growth failure [2,4]. The 14q32.2 imprinted region also harbors two differentially methylated regions (DMRs), i.e., the germline-derived primary DLK1-MEG3 intergenic DMR (IG-DMR) and the postfertilization-derived secondary MEG3-DMR [1,2]. In the body, both DMRs are hypermethylated after paternal transmission and hypomethylated after maternal transmission, whereas in the placenta only the IG-DMR remains differentially methylated and the MEG3-DMR is relatively hypomethylated [1,2]. Furthermore, previous studies in cases with upd(14)pat/mat-like phenotypes have revealed that epimutations (hypermethylation) and microdeletions affecting both DMRs of maternal origin cause paternalization of the 14q32.2 imprinted region, that epimutations (hypomethylation) affecting both DMRs of paternal origin cause maternalization of the region, and that microdeletions involving the DMRs of paternal origin have no effect on the imprinting status [2,5–8]. These findings, together with the notion that parent-of-origin specific expression patterns of imprinted genes depend primarily on the methylation status of the DMRs [9], argue for a critical regulatory function of the two DMRs for the 14q32.2 imprinted region, with possibly different effects in the body and the placenta. However, the precise role of each DMR remains to be clarified. Here, we report that the IG-DMR and the MEG3-DMR show a hierarchical interaction for the methylation pattern in the body, and function as imprinting control centers in the placenta and the body, respectively. To our knowledge, this is the first study demonstrating not only different roles for the primary and secondary DMRs at a single imprinted region, but also an essential regulatory function for the secondary DMR.

June 2010 | Volume 6 | Issue 6 | e1000992

Imprinting Control Centers at Human 14q32.2

Author Summary Genomic imprinting is a process causing genes to be expressed in a parent-of-origin specific manner: some imprinted genes are expressed from maternally inherited chromosomes and others from paternally inherited chromosomes. Imprinted genes are often located in clusters regulated by regions that are differentially methylated according to their parental origin. The human chromosome 14q32.2 imprinted region harbors the germline-derived primary DLK1-MEG3 intergenic differentially methylated region (IG-DMR) and the postfertilization-derived secondary MEG3-DMR, together with multiple imprinted genes. Perturbed dosage of these imprinted genes, for example in patients with paternal and maternal uniparental disomy 14, causes distinct phenotypes. Here, through analysis of patients with microdeletions recapitulating some or all of the uniparental disomy 14 phenotypes, we show that the IG-DMR acts as an upstream regulator of the methylation pattern of the MEG3-DMR in the body but not in the placenta. Importantly, in the body, the MEG3-DMR functions as an imprinting control center. To our knowledge, this is the first study demonstrating an essential function for the secondary DMR in the regulation of multiple imprinted genes. Thus, the results provide a significant advance in clarifying the epigenetic features that can act to regulate imprinting.

Results Clinical reports We studied an infant with upd(14)pat body and placental phenotypes (patient 1) and a neonate with upd(14)pat body, but no placental, phenotype (patient 2) (Figure 1). Detailed clinical features of patients 1 and 2 are shown in Table 1. In brief, patient 1 was delivered by caesarean section at 33 weeks of gestation because of progressive polyhydramnios despite amnioreduction at 28 and 30 weeks of gestation, whereas patient 2 was born vaginally at 28 weeks of gestation after progressive labor without discernible polyhydramnios. Placentomegaly was observed in patient 1 but not in patient 2. Both patients had a characteristic face, a small bell-shaped thorax with coat-hanger appearance of the ribs, and omphalocele. Patient 1 underwent surgery for omphalocele immediately after birth and received mechanical ventilation for several months. At present, she is 5.5 months of age and still requires intensive care including oxygen administration and tube feeding. Patient 2 died at four days of age from massive intracranial hemorrhage while receiving intensive care including mechanical ventilation. The mother of patient 1 had several non-specific clinical features such as short stature and obesity. The father of patient 1 and the parents of patient 2 were clinically normal.

Figure 1. Clinical phenotypes of patients 1 and 2 at birth. Both patients have a bell-shaped thorax with coat-hanger appearance of the ribs and omphalocele. In patient 1, histological examination of the placenta shows proliferation of dilated and congested chorionic villi, as has previously been observed in a case with upd(14)pat [2]. For comparison, the histological finding of a gestational age-matched (33 weeks) control placenta is shown in a dashed square. The horizontal black bars indicate 100 µm. doi:10.1371/journal.pgen.1000992.g001

Sample preparation We isolated genomic DNA (gDNA) and transcripts (mRNAs, snoRNAs, and microRNAs) from fresh leukocytes of patient 1 and the parents of patients 1 and 2, from fresh skin fibroblasts of patient 2, and from formalin-fixed and paraffin-embedded placental samples of patient 1 and similarly treated pituitary and adrenal samples of patient 2 (although multiple body tissues were available in patient 2, useful gDNA and transcript samples could not be obtained from other tissues, probably because of extensive postmortem degradation). We also made metaphase spreads from leukocytes and skin fibroblasts. For comparison, we obtained control samples from fresh normal adult leukocytes, neonatal skin fibroblasts, and placenta at 38 weeks of gestation, and from fresh leukocytes of upd(14)pat/mat patients and formalin-fixed and paraffin-embedded placenta of a upd(14)pat patient [2,3].

Structural analysis of the imprinted region We first examined the structure of the 14q32.2 imprinted region (Figure 2). Upd(14) was excluded in patients 1 and 2 as well as in the mother of patient 1 by microsatellite analysis (Table S1), and FISH analysis for the two DMRs identified a familial heterozygous deletion encompassing the IG-DMR alone in patient 1 and her mother and a de novo heterozygous deletion encompassing the MEG3-DMR alone in patient 2 (Figure 2). The microdeletions were further localized by SNP genotyping for 70 loci (Table S1) and quantitative real-time PCR (q-PCR) analysis for four regions around the DMRs (Figure S1A). Serial direct sequencing of long PCR products harboring the deletion junctions then identified the fusion points of the microdeletions in patient 1 and her mother and in patient 2 (Figure 2). According to the NT_026437 sequence data at the NCBI Database (Genome Build 36.3) (http://preview.ncbi.nlm.nih.gov/guide/), the deletion


Table 1. Clinical features in patients 1 and 2.

Patient 1

Patient 2

Upd(14)pat (n = 20)c

Present age

5.5 months

Deceased at 4 days

0–9 years

Sex

Female

Male:Female = 9:11

Karyotype

46,XX

Gestational age (weeks)

28–37

Delivery

Caesarean

Vaginal

Vaginal:Caesarean = 6:7

Pregnancy and delivery

Polyhydramnios

Yes

20/20 (<28)d

Amnioreduction (weeks)

2× (28, 30)

6/6

Placentomegaly

Yes

10/10

Prenatal growth failure

1/13

Birth length (cm)

43 (WNR)a

34 (WNR)a

Birth weight (kg)

2.84 (>90th centile)a

1.32 (WNR)a

Postnatal growth failure

Yes

…

Present stature (cm)

56.3 (−3.0 SD)b

…

Present weight (kg)

5.02 (−3.0 SD)b

…

Frontal bossing

Yes

5/7

Hairy forehead

Yes

9/10

Blepharophimosis

Yes

14/15

Depressed nasal bridge

Yes

13/13

Anteverted nares

Yes

6/10

Small ears

Yes

11/12

Protruding philtrum

Yes

15/15

Growth pattern

5/6

Characteristic face

Puckered lips

3/10

Micrognathia

Yes

11/12

Thoracic abnormality Bell-shaped thorax

Yes

17/17

Mechanical ventilation

Yes

17/17

Abdominal wall defect Diastasis recti

…

15/17

Omphalocele

Yes

2/17e

Others Short webbed neck

Yes

14/14

Cardiac disease

Yes (PDA)

5/10

Inguinal hernia

2/6

Coxa valga

Yes

3/4

Joint contractures

Yes

8/10

Kyphoscoliosis

4/7

Extra features

Hydronephrosis (bilateral)

WNR: within the normal range; SD: standard deviation; PDA: patent ductus arteriosus. a Assessed by the gestational age- and sex-matched Japanese reference data from the Ministry of Health, Labor, and Welfare (http://www.e-stat.go.jp/SG1/estat/GL02020101.do). b Assessed by the age- and sex-matched Japanese reference data. c In the column summarizing the clinical features of 20 patients with upd(14)pat, the denominators indicate the number of cases examined for the presence or absence of each feature, and the numerators represent the number of cases assessed to be positive for that feature; thus, the difference between the denominator and the numerator gives the number of cases evaluated to be negative for that feature (adapted from reference [2]). d Polyhydramnios was identified by 28 weeks of gestation. e Omphalocele is present in two cases with upd(14)pat and in two cases with epimutations [2]. doi:10.1371/journal.pgen.1000992.t001
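The reading of the upd(14)pat column given in footnote c (denominator = cases examined, numerator = cases positive, difference = cases negative) can be made explicit with a small hypothetical parser. It assumes entries of the form `n/m`, optionally followed by a single footnote letter as in `2/17e`, and deliberately rejects entries carrying extra annotations such as the polyhydramnios cell.

```python
import re

# "positive/examined", optionally followed by one footnote letter (e.g. "2/17e").
ENTRY = re.compile(r"^(\d+)/(\d+)[a-z]?$")


def summarize(entry):
    """Split a 'positive/examined' table cell into counts and a percentage."""
    match = ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"unsupported table entry: {entry!r}")
    positive, examined = int(match.group(1)), int(match.group(2))
    return {
        "positive": positive,
        "negative": examined - positive,
        "examined": examined,
        "percent_positive": 100.0 * positive / examined,
    }


if __name__ == "__main__":
    for cell in ("15/17", "2/17e", "5/10"):
        print(cell, summarize(cell))
```

For example, the diastasis recti entry 15/17 corresponds to 15 positive and 2 negative cases among the 17 examined.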

PLoS Genetics | www.plosgenetics.org

June 2010 | Volume 6 | Issue 6 | e1000992

Imprinting Control Centers at Human 14q32.2

Figure 2. Physical map of the 14q32.2 imprinted region and the deleted segments in patient 1 and her mother and in patient 2 (shaded in gray). PEGs are shown in blue, MEGs in red, and the IG-DMR (CG4 and CG6) and the MEG3-DMR (CG7) in green. It remains to be clarified whether DIO3 is a PEG, although mouse Dio3 is known to be preferentially but not exclusively expressed from the paternally derived chromosome [35]. For MEG3, isoform 2, with nine exons (red bars) and eight introns (light red segments), is shown (Ensembl; http://www.ensembl.org/index.html). Electrochromatograms show the fusion point in patient 1 and her mother, and the fusion point in patient 2, which is accompanied by the insertion of a 66 bp segment (highlighted in blue) identical in sequence to a segment within MEG3 intron 5 (the blue bar). PCR amplification with primers flanking the 66 bp segment in MEG3 intron 5 produced a single 194 bp band in patient 2 as well as in a control subject (shown in the box), indicating that the 66 bp segment at the fusion point arose from a duplicated insertion rather than from a transfer from intron 5 to the fusion point (had the segment been transferred from its original position, a 128 bp band as well as a 194 bp band would be present in patient 2) (marker sizes: 100, 200, and 300 bp). In the FISH images, the red signals (arrows) were detected with the FISH-1 and FISH-2 probes, and the light green signals (arrowheads) with the RP11-566I2 probe for 14q12, used as an internal control. The faint signal detected by the FISH-2 probe in patient 2 is consistent with the preservation of a ~1.2 kb region covered by the centromeric portion of the FISH-2 probe. doi:10.1371/journal.pgen.1000992.g002
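The band-size argument in this legend, and the deletion sizes reported in the Results, reduce to simple interval arithmetic. The Python sketch below is illustrative only and is not the authors' analysis; the function names are hypothetical. It treats genomic coordinates as closed (inclusive) intervals and predicts which bands the primers flanking the 66 bp segment in MEG3 intron 5 should give under each candidate mechanism.

```python
def inclusive_span(start: int, end: int) -> int:
    """Length in bp of a closed genomic interval [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def expected_bands(wild_type_bp: int, segment_bp: int, mechanism: str) -> set[int]:
    """Predict PCR products at the donor locus (MEG3 intron 5).

    'duplication': the donor copy stays in place on both alleles,
    so only the wild-type band is expected.
    'transfer': one allele has lost the segment, so a shorter band
    appears alongside the wild-type band from the other allele.
    """
    if mechanism == "duplication":
        return {wild_type_bp}
    if mechanism == "transfer":
        return {wild_type_bp, wild_type_bp - segment_bp}
    raise ValueError(f"unknown mechanism: {mechanism!r}")


# Coordinates as reported in the Results
print(inclusive_span(82_270_449, 82_279_006))  # 8558 (patient 1 and her mother)
print(inclusive_span(82_290_978, 82_295_280))  # 4303 (patient 2)
print(inclusive_span(82_299_727, 82_299_792))  # 66 (inserted segment)
print(sorted(expected_bands(194, 66, "transfer")))  # [128, 194]
```

Only the "duplication" prediction (a single 194 bp band) matches the gel shown in the box, which is the reasoning used in the legend.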

Figure 3. Methylation analysis of the IG-DMR (CG4 and CG6) and the MEG3-DMR (CG7). Filled and open circles indicate methylated and unmethylated cytosines at the CpG dinucleotides, respectively. (A) Structure of CG4, CG6, and CG7. Pat: paternally derived chromosome; Mat: maternally derived chromosome. The PCR products for CG4 (311 bp) harbor 6 CpG dinucleotides and a G/A SNP (rs12437020), and are digested with BstUI into three fragments (33 bp, 18 bp, and 260 bp) when the cytosines at the first and second CpG dinucleotides and at the fourth and fifth CpG dinucleotides (indicated with orange rectangles) are methylated. The PCR products for CG6 (428 bp) carry 19 CpG dinucleotides and a C/T SNP (rs10133627), and are digested with TaqI into two fragments (189 bp and 239 bp) when the cytosine at the 9th CpG dinucleotide (indicated with an orange rectangle) is methylated. The PCR products for CG7 harbor 7 CpG dinucleotides, and are digested with BstUI into two fragments (56 bp and 112 bp) when the cytosines at the fourth and fifth CpG dinucleotides (indicated with orange rectangles) are methylated. These enzymes were used for combined bisulfite restriction analysis (COBRA). (B) Methylation analysis. The upper part shows bisulfite sequencing data; SNP typing data are also shown for CG4 and CG6. The circles highlighted in orange correspond to those shown in Figure 3A. The relatively long CG6 amplicon could not be amplified from the formalin-fixed and paraffin-embedded placental samples, probably because of genomic DNA degradation. Note that CG4 is differentially methylated in a control placenta and massively hypermethylated in an upd(14)pat placenta, whereas CG7 is rather hypomethylated in an upd(14)pat placenta as well as in a control placenta. The lower part shows COBRA data. U: unmethylated clone-specific bands (311 bp for CG4, 428 bp for CG6, and 168 bp for CG7); M: methylated clone-specific bands (260 bp for CG4, 239 bp and 189 bp for CG6, and 112 bp and 56 bp for CG7). The results reproduce the bisulfite sequencing data and show normal findings in the father of patient 1 and the parents of patient 2. doi:10.1371/journal.pgen.1000992.g003
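The principle behind COBRA can be sketched in a few lines: bisulfite treatment turns unmethylated cytosines into uracil (read as T after PCR), while methylated CpG cytosines stay C, so a recognition site such as BstUI (CGCG) or TaqI (TCGA) survives only on methylated templates. The Python sketch below uses a made-up 20 bp sequence rather than the real CG4, CG6, or CG7 amplicons, whose sequences are not given here, and its helper names are hypothetical. It also checks that the fragment sizes quoted in the legend add up to the corresponding uncut (U) band.

```python
import re


def bisulfite_convert(seq: str, methylated_cpg: set[int]) -> str:
    """Simulate bisulfite conversion followed by PCR on one strand.

    Each C becomes T unless it is a CpG cytosine whose 0-based position
    is listed in methylated_cpg; those protected cytosines stay C.
    """
    s = seq.upper()
    out = []
    for i, base in enumerate(s):
        protected = base == "C" and i in methylated_cpg and s[i + 1 : i + 2] == "G"
        out.append("T" if base == "C" and not protected else base)
    return "".join(out)


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting at every occurrence of site.

    cut_offset is the cut position within the site (BstUI CG^CG -> 2,
    TaqI T^CGA -> 1). The lookahead regex reports every start position.
    """
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy = "AAACGCGAAAAAACGCGAAA"  # made-up amplicon; CpG cytosines at 3, 5, 13, 15
print(digest(bisulfite_convert(toy, {3, 5, 13, 15}), "CGCG", 2))  # [5, 10, 5]
print(digest(bisulfite_convert(toy, set()), "CGCG", 2))           # [20]

# Fragment sizes quoted in the legend add back up to the uncut (U) band
cobra = {"CG4": (311, [33, 18, 260]), "CG6": (428, [189, 239]), "CG7": (168, [56, 112])}
for name, (uncut, fragments) in cobra.items():
    assert sum(fragments) == uncut, name
```

Because BstUI needs both cytosines of CGCG to be protected, a template methylated at only one of them behaves like an unmethylated one in this model.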

size was 8,558 bp (82,270,449–82,279,006 bp) for the microdeletion in patient 1 and her mother, and 4,303 bp (82,290,978–82,295,280 bp) for the microdeletion in patient 2. The microdeletion in patient 2 also involved the 5′ part of MEG3 and five of the seven putative CTCF binding sites A–G [10], and was accompanied by the insertion of a 66 bp sequence duplicated from MEG3 intron 5 (82,299,727–82,299,792 bp on NT_026437). Direct sequencing of the exonic or transcribed regions detected no mutation in DLK1, MEG3, or RTL1, although several cDNA polymorphisms (cSNPs) were identified (Table S1). Oligoarray comparative genomic hybridization identified no other discernible structural abnormality (Figure S1B).

Methylation analysis of the two DMRs and the seven putative CTCF binding sites
We next studied the methylation patterns of the previously reported IG-DMR (CG4 and CG6) and MEG3-DMR (CG7) (Figure 3A) [2], using bisulfite-treated gDNA samples. Bisulfite sequencing and combined bisulfite restriction analysis of body samples revealed a hypermethylated IG-DMR and MEG3-DMR in patient 1, a hypomethylated IG-DMR and a differentially methylated MEG3-DMR in the mother of patient 1, and a differentially methylated IG-DMR and a hypermethylated MEG3-DMR in patient 2; bisulfite sequencing of placental samples showed a hypermethylated IG-DMR and a rather hypomethylated MEG3-DMR in patient 1 (Figure 3B). We also examined the methylation patterns of the seven putative CTCF binding sites by bisulfite sequencing (Figure 4A). Only sites C and D were differentially methylated in the body, and both were rather hypomethylated in the placenta (Figure 4B), as observed for CG7. Furthermore, to identify informative SNPs for allele-specific bisulfite sequencing, we examined a 349 bp region encompassing site C and a 356 bp region encompassing site D, as well as a 300 bp region spanning the three previously reported SNPs near site D, in 120 control subjects, the cases with upd(14)pat/mat, and patients 1 and 2 and their parents. As a result, an informative polymorphism was identified for a novel G/A SNP near site D in only a single control subject, and the parent-of-origin-specific methylation pattern was confirmed (Figure 4C). No informative SNP was found in the examined region around site C, and no other informative SNP was identified in the two examined regions around site D, the three previously known SNPs being homozygous in all the subjects analyzed.

Expression analysis of the imprinted genes
Finally, we performed expression analyses of multiple imprinted genes in this region, using standard reverse transcriptase (RT)-PCR and/or q-PCR analysis (Figure 5A–5C). For leukocytes, weak expression was detected for MEG3 and SNORD114-29 in a control subject and the mother of patient 1 but not in patient 1. For skin fibroblasts, all MEGs but no PEGs were expressed in control subjects, whereas neither MEGs nor PEGs were expressed in patient 2. For placentas, all imprinted genes were expressed in control subjects, whereas only PEGs were expressed in patient 1. In the pituitary and adrenal of patient 2, only DLK1 expression was identified. Expression pattern analyses using informative cSNPs revealed monoallelic MEG3 expression in the leukocytes of the mother of patient 1 (Figure 5D), biparental RTL1 expression in the placenta of patient 1 (no informative cSNP was detected for DLK1), and biparental DLK1 expression in the pituitary and adrenal of patient 2 (RTL1 was not expressed in the pituitary and adrenal) (Figure 5E), as well as maternal MEG3 expression in the control leukocytes and paternal RTL1 expression in the control placentas (Figure S2). We also attempted q-PCR analysis, but precise assessment was impossible for MEG3 in the mother of patient 1 because of its faint expression in leukocytes, and for RTL1 in patient 1 and DLK1 in patient 2 because of the poor quality of the mRNA obtained from formalin-fixed and paraffin-embedded tissues.

Figure 4. Methylation analysis of the putative CTCF protein binding sites A–G. (A) Location and sequence of the putative CTCF binding sites. In the left part, sites C and D are painted in yellow and the remaining sites in purple. In the right part, the consensus CTCF binding motifs are shown in red letters; the cytosine residues at the CpG dinucleotides within the CTCF binding motifs are highlighted in blue, and those outside the motifs in green [10]. (B) Methylation analysis. The upper part shows bisulfite sequencing data using leukocyte genomic DNA samples. Since the PCR products for site B contain a C/A SNP (rs11627993), genotyping data are also indicated. The circles highlighted in blue correspond to those shown in Figure 4A. Sites C and D exhibit clear DMRs. The lower part shows the results for sites C and D using leukocyte and/or placental genomic DNA samples; the findings are similar to those of CG7. (C) Allele-specific methylation pattern of CTCF binding site D. A novel G/A SNP was identified in a single control subject, as shown on a reverse chromatogram delineating a C/T SNP pattern, while the three previously reported SNPs were homozygous. Methylated and unmethylated clones are associated with the “G” and the “A” alleles, respectively. doi:10.1371/journal.pgen.1000992.g004

Discussion
The data of the present study are summarized in Figure 6. The parental origin of the microdeletion-positive chromosomes was inferred from the methylation patterns of the preserved DMRs in patients 1 and 2 and the mother of patient 1, as well as from maternal transmission in patient 1. Loss of the hypomethylated IG-DMR of maternal origin in patient 1 was associated with epimutation (hypermethylation) of the MEG3-DMR in the body and caused paternalization of the imprinted region and typical upd(14)pat body and placental phenotypes, whereas loss of the hypomethylated MEG3-DMR of maternal origin in patient 2 permitted a normal methylation pattern of the IG-DMR in the body and resulted in maternal-to-paternal epigenotypic alteration and a typical upd(14)pat body phenotype, but no placental phenotype. In this regard, although a 66 bp segment was inserted in patient 2, this segment contains no known regulatory sequence [11] or evolutionarily conserved element [12] (also examined with the VISTA program, http://genome.lbl.gov/vista/index.shtml). Similarly, although no control samples were available for the pituitary and adrenal, a previous study in human subjects has shown paternal DLK1 expression in the adrenal as well as monoallelic DLK1 and MEG3 expression in various tissues [11]. Furthermore, the present and previous studies [2] indicate that this region is imprinted in the placenta as well as in the body. Thus, these results, in conjunction with the finding that the IG-DMR remains a DMR and the MEG3-DMR is a non-DMR in the placenta [2], imply the following: (1) the IG-DMR functions hierarchically as an upstream regulator of the methylation pattern of the MEG3-DMR on the maternally inherited chromosome in the body, but not in the placenta; (2) the hypomethylated MEG3-DMR functions as an essential imprinting regulator for both PEGs and MEGs in the body; and (3) in the placenta, the hypomethylated IG-DMR directly controls the imprinting pattern of both PEGs and MEGs. These notions also explain the epigenotypic alteration in previous cases with epimutations or microdeletions affecting both DMRs (Figure S3).

It remains to be clarified how the IG-DMR and the MEG3-DMR interact hierarchically in the body. However, the present data, together with previous findings in cases with epimutations [2,5–8], imply that the MEG3-DMR can remain hypomethylated only in the presence of a hypomethylated IG-DMR and becomes methylated when the IG-DMR is deleted or methylated, irrespective of parental origin. Furthermore, mouse studies have suggested that the methylation pattern of the postfertilization-derived Gtl2-DMR (the mouse homolog of the MEG3-DMR) depends on that of the germline-derived IG-DMR [13]. Thus, preferential binding of some factor(s) to the unmethylated IG-DMR may alter the conformation of the genomic region, thereby protecting the MEG3-DMR from methylation. It also remains to be elucidated how the IG-DMR and the MEG3-DMR regulate the expression of both PEGs and MEGs in the placenta and the body, respectively. For the MEG3-DMR, however, the CTCF binding sites C and D may play a pivotal role in imprinting regulation. The methylation analysis indicates that the two sites reside within the MEG3-DMR, and the multifunctional CTCF protein is known to bind preferentially to unmethylated target sequences, including sites C and D [10,14–16]. In this regard, all the MEGs in this imprinted region can be transcribed together in the same orientation and show a strikingly similar tissue expression pattern [1,12], whereas PEGs are transcribed in different directions and are co-expressed with MEGs only in limited cell types [1,17].

It is possible, therefore, that preferential CTCF binding to the grossly unmethylated sites C and D activates all the MEGs as a large transcription unit and represses all the PEGs, perhaps by influencing chromatin structure and histone modification, independently of the effects of the expressed MEGs. In support of this, CTCF protein acts as a transcriptional activator of Gtl2 (the mouse homolog of MEG3) in the mouse [18]. Such an imprinting control model has not been proposed previously. It differs from the CTCF protein-mediated insulator model proposed for the H19-DMR and from the noncoding RNA-mediated model implicated for several imprinted regions, including the KvDMR1 [19]. However, the KvDMR1 harbors two putative CTCF binding sites that may mediate noncoding RNA-independent imprinting regulation [20], and the imprinting control center for Prader-Willi syndrome [21] also carries three CTCF binding sites (examined with the Search for CTCF DNA Binding Sites program, http://www.essex.ac.uk/bs/molonc/spa.html). Thus, although each imprinted region is likely regulated by a different mechanism, CTCF protein may be involved in the imprinting control of multiple regions in various ways.
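Notions (1) to (3) above can be written as a small decision rule for the maternally inherited chromosome. The Python sketch below is an illustrative encoding under stated simplifications (it ignores the paternal allele, gene dosage, and the mouse data); the names `DMR` and `maternal_epigenotype` are hypothetical, and the rule is not code proposed by the authors. Applied to the two patients and to Epimutation-1 (Figure S3), it reproduces the reported body and placental outcomes.

```python
from enum import Enum


class DMR(Enum):
    """Hypothetical state labels for the IG-DMR on the maternal chromosome."""
    UNMETHYLATED = "unmethylated"
    METHYLATED = "methylated"
    DELETED = "deleted"


def maternal_epigenotype(tissue: str, ig_dmr: DMR, meg3_dmr_present: bool = True) -> str:
    """Predict whether the maternal allele behaves as maternal or paternalized.

    body:     the MEG3-DMR stays unmethylated only if the IG-DMR is
              unmethylated (notion 1), and the MEG3-DMR state then sets
              the imprinting pattern (notion 2).
    placenta: the IG-DMR sets the imprinting pattern directly (notion 3);
              the MEG3-DMR is a non-DMR there and is ignored.
    """
    if tissue == "placenta":
        return "maternal" if ig_dmr is DMR.UNMETHYLATED else "paternalized"
    if tissue == "body":
        if not meg3_dmr_present:
            return "paternalized"
        meg3_unmethylated = ig_dmr is DMR.UNMETHYLATED
        return "maternal" if meg3_unmethylated else "paternalized"
    raise ValueError(f"unknown tissue: {tissue!r}")


cases = {
    "patient 1 (IG-DMR deleted)": (DMR.DELETED, True),
    "patient 2 (MEG3-DMR deleted)": (DMR.UNMETHYLATED, False),
    "Epimutation-1 (IG-DMR hypermethylated)": (DMR.METHYLATED, True),
}
for name, (ig, meg3_present) in cases.items():
    print(name, maternal_epigenotype("body", ig, meg3_present),
          maternal_epigenotype("placenta", ig, meg3_present))
# patient 1: paternalized paternalized
# patient 2: paternalized maternal
# Epimutation-1: paternalized paternalized
```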

This imprinted region has also been studied in the mouse. Clinical and molecular findings in wild-type mice [1,22,23], mice with PatDi(12) (paternal disomy for chromosome 12, which harbors this imprinted region) [13,24,25], and mice with targeted deletions of the IG-DMR (ΔIG-DMR) [22,26] and of the Gtl2-DMR (ΔGtl2-DMR) [27] are summarized in Table 2. These data, together with the human data, provide several informative findings. First, in both the human and the mouse, the IG-DMR is differentially methylated in both the body and the placenta, whereas the MEG3/Gtl2-DMR is differentially methylated in the body and is a non-DMR in the placenta. Second, the IG-DMR and the MEG3/Gtl2-DMR show a hierarchical interaction on the maternally derived chromosome in both the human and the mouse body. Indeed, the MEG3/Gtl2-DMR is epimutated in patient 1 and in mice with a maternally inherited ΔIG-DMR, and the IG-DMR is normally methylated in patient 2 and in mice with a maternally inherited ΔGtl2-DMR. Third, the function of the IG-DMR is comparable between the human and mouse bodies but differs between the human and mouse placentas. Indeed, patient 1 has upd(14)pat body and placental phenotypes, whereas mice with a ΔIG-DMR of maternal origin have a PatDi(12)-compatible body phenotype and an apparently normal placental phenotype. It is likely that imprinting regulation in the mouse placenta involves some mechanism(s) other than the methylation pattern of the IG-DMR, such as chromatin conformation [22,28,29]. Unfortunately, however, the data from ΔGtl2-DMR mice appear to be substantially complicated by the neomycin cassette retained in the upstream region of Gtl2.
Indeed, it has been shown that insertion of a lacZ gene or a neomycin gene into a similar upstream region of Gtl2 causes severely dysregulated expression patterns and abnormal phenotypes after both paternal and maternal transmission [30,31], and that deletion of the inserted neomycin gene results in apparently normal expression patterns and phenotypes after both paternal and maternal transmission [31]. (In this regard, although a possible influence of the inserted 66 bp segment cannot formally be excluded in patient 2, the phenotype and expression data in patient 2 are compatible with simple paternalization of the imprinted region.) In addition, since the apparently normal phenotype of mice homozygous for the ΔGtl2-DMR is reminiscent of that of sheep homozygous for the callipyge mutation [32], a complicated mechanism(s) such as polar overdominance may be operating in the ΔGtl2-DMR mice [33]. Thus, it remains to be clarified whether the MEG3/Gtl2-DMR has a similar or a different function in the human and the mouse.

Two points should be made with reference to the present study. First, the proposed functions of the two DMRs are based on the results of single patients. This must be kept in mind, because a hidden patient-specific abnormality or event might explain the results. For example, the abnormal placental phenotype in patient 1 might be caused by some coincidental aberration, and the apparently normal placenta in patient 2 might be due to mosaicism, with a grossly preserved MEG3-DMR in the placenta and a grossly deleted MEG3-DMR in the body. Second, the clinical features in the mother of patient 1, such as short stature and obesity, are often observed in cases with upd(14)mat (Table S2). However, these clinical features are non-specific and appear to be unrelated to the microdeletion involving the IG-DMR, because loss of the paternally derived IG-DMR does not affect the imprinted status [2,26]. Indeed, MEG3 in the mother of patient 1 showed normal monoallelic expression in the presence of the differentially methylated MEG3-DMR. Nevertheless, since the upd(14)mat phenotype is primarily ascribed to loss of functional DLK1 (Figure S3B) [2,34], it is possible that the microdeletion involving the IG-DMR has affected a cis-acting regulatory element for DLK1 expression (for details, see Note in the legend for Table S2). Further studies in cases with similar microdeletions will permit clarification of these two points.

In summary, the results show a hierarchical interaction and distinct functional properties of the IG-DMR and the MEG3-DMR in imprinting control. Thus, this study provides a significant advance in clarifying the mechanisms involved in imprinting regulation at the 14q32.2 imprinted region and in the development of upd(14) phenotypes.

Figure 5. Expression analysis. (A) Reverse transcriptase (RT)-PCR analysis. L: leukocytes; SF: skin fibroblasts; P: placenta. The relatively weak GAPDH expression for the formalin-fixed and paraffin-embedded placenta of patient 1 indicates considerable mRNA degradation. Since a single exon was amplified for DLK1 and RTL1, PCR was performed with and without RT for the placenta of patient 1 to exclude false-positive results caused by genomic DNA contamination. (B) Quantitative real-time PCR (q-PCR) analysis of MEG3, MEG8, and miRNAs, using fresh skin fibroblasts (SF) of patient 2 and four control neonates. Of the examined MEGs, miR433 and miR127 are encoded by RTL1as. (C) RT-PCR analysis of the formalin-fixed and paraffin-embedded pituitary (Pit.) and adrenal (Ad.) of patient 2. The bands for DLK1 are detected in the presence of RT and absent without RT, thereby excluding genomic DNA contamination. (D) Monoallelic MEG3 expression in the leukocytes of the mother of patient 1. The three cSNPs are heterozygous in gDNA and hemizygous in cDNA. D: direct sequence. (E) Biparental RTL1 expression in the placenta of patient 1 and biparental DLK1 expression in the pituitary and adrenal of patient 2. D: direct sequence; S: subcloned sequence. In patient 1, genotyping of the RTL1 cSNP (rs6575805) using gDNA indicates maternal origin of the “C” allele and paternal origin of the “T” allele, and sequencing analysis using cDNA confirms expression of both maternally and paternally derived RTL1. Similarly, in patient 2, genotyping of the DLK1 cSNP (rs1802710) using gDNA indicates maternal origin of the “C” allele and paternal origin of the “T” allele, and sequencing analysis using cDNA confirms expression of both maternally and paternally inherited DLK1. doi:10.1371/journal.pgen.1000992.g005
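The subcloning strategy used for Figure 5E amounts to counting which parental cSNP allele each cDNA clone carries. A minimal Python sketch follows; `classify_expression` is a hypothetical helper, the clone counts in the usage example are invented for illustration (the study reports only that both alleles were observed), and the 10% cut-off for calling an allele "expressed" is an assumption rather than a parameter from the study.

```python
from collections import Counter


def classify_expression(maternal: str, paternal: str, clone_alleles: list[str],
                        min_fraction: float = 0.1) -> str:
    """Call the parental origin of expression from subcloned RT-PCR products.

    maternal, paternal: cSNP alleles assigned to each parent by gDNA
    genotyping (e.g. RTL1 rs6575805: maternal "C", paternal "T" in patient 1).
    clone_alleles: the cSNP base read in each sequenced cDNA clone.
    min_fraction: illustrative cut-off for counting an allele as expressed.
    """
    if maternal == paternal:
        raise ValueError("cSNP is not informative for parental origin")
    counts = Counter(clone_alleles)
    informative = counts[maternal] + counts[paternal]
    if informative == 0:
        raise ValueError("no clone carries either parental allele")
    mat = counts[maternal] / informative
    pat = counts[paternal] / informative
    if mat >= min_fraction and pat >= min_fraction:
        return "biparental"
    return "maternal" if mat > pat else "paternal"


# Hypothetical clone counts
print(classify_expression("C", "T", ["C"] * 6 + ["T"] * 5))  # biparental
print(classify_expression("C", "T", ["T"] * 10))             # paternal
```

The second call mirrors the paternal-only RTL1 pattern expected in a control placenta.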


Figure 6. Schematic representation of the observed and predicted methylation and expression patterns. Deleted regions in patients 1 and 2 and the mother of patient 1 are indicated by stippled rectangles. P: paternally derived chromosome; M: maternally derived chromosome. Representative imprinted genes are shown; these genes are known to be imprinted in the body and the placenta [2] (see also Figure S2). Placental samples were not obtained from patient 2 or the mother of patient 1 (highlighted with light green backgrounds). Thick arrows for RTL1 in patients 1 and 2 represent increased RTL1 expression, which is ascribed to loss of the functional microRNA-containing RTL1as that acts as a repressor of RTL1 [26,36–38]; this phenomenon has been indicated in placentas with upd(14)pat and in those with an epimutation or a microdeletion involving the two DMRs (Figure S3A and S3C) [2]. MEG3 and RTL1as, which are disrupted or predicted to have become silent on the maternally derived chromosome, are written in gray. Filled and open circles represent hypermethylated and hypomethylated DMRs, respectively; since the MEG3-DMR is rather hypomethylated and regarded as a non-DMR in the placenta [2] (see also Figure 3), it is painted in gray. doi:10.1371/journal.pgen.1000992.g006

Materials and Methods

Sample preparation
For leukocytes and skin fibroblasts, genomic DNA (gDNA) samples were extracted with the FlexiGene DNA Kit (Qiagen), and RNA samples were prepared with RNeasy Plus Mini (Qiagen) for DLK1, MEG3, RTL1, MEG8, and snoRNAs, and with the mirVana miRNA Isolation Kit (Ambion) for microRNAs. For paraffin-embedded tissues, including the placenta, brain, lung, heart, liver, spleen, kidney, bladder, and small intestine, gDNA and RNA samples were extracted with the RecoverAll Total Nucleic Acids Isolation Kit (Ambion) using 40 μm thick slices. For fresh control placental samples, gDNA and RNA were extracted using ISOGEN (Nippon Gene). After treating total RNA samples with

Ethics statement
This study was approved by the Institutional Review Board Committees at the National Center for Child Health and Development, University College Dublin, and Dokkyo University School of Medicine, and was performed after obtaining written informed consent.

Primers
All the primers used in this study are summarized in Table S3.

Table 2. Clinical and molecular findings in wild-type and PatDi(12) mice and mice with maternally inherited ΔIG-DMR and ΔGtl2-DMR.

Wild-type

ΔIG-DMR (~4.15 kb)a

PatDi(12)

ΔGtl2-DMR (~10 kb)b Neomycin cassette (+)

<Body> Phenotype

Normal

Abnormalc

PatDi(12) phenotypec

Normal at birth Lethal by 4 weeks

Methylation pattern IG-DMR Gtl2-DMR

Methylated

Methylatedd

Differential

Methylated

Epimutated

Monoallelic

Increased (~2x)

Biparental

Differential

Differential Methylatedd

Expression pattern Pegs

Grossly normal

Increased (2x or 4.5x)f Monoallelic

Absent

Decreased (~0.2–0.5x)g

Normal

Placentomegaly

Apparently normal

Not determined

IG-DMR

Differential

Methylated

Not determined

Gtl2-DMR

Non-DMR

Not determined

Monoallelic

Not determined

Increased (1.5–1.8x)g

Megs <Placenta> Phenotype Methylation pattern

Expression pattern Pegs Megs

Monoallelic

Not determined

Decreased (0.6–0.8x)

Remark

Decreased (0.5–0.85x)g

Decreased (~0.1–1.0)g

Paternal transmissioni

Paternal transmission

Biparental transmissionj

a The deletion size is smaller than that of patient 1 and her mother in this study, especially at the centromeric region.
b The microdeletion also involves Gtl2, and the deletion size is larger than that of patient 2 in this study.
c Body phenotype includes bell-shaped thorax with rib anomalies, distended abdomen, and short and broad neck.
d Hemizygosity for the methylated DMR of paternal origin.
e Hypermethylation of the maternally derived DMR.
f 2x Dlk1 and Dio3 expression levels and 4.5x Rtl1 expression level. The markedly elevated Rtl1 expression level is ascribed to a synergistic effect between activation of the usually silent Rtl1 of maternal origin and loss of the functional microRNA-containing Rtl1as that acts as a repressor of Rtl1 [26,36–38].
g The expression level varies among the examined tissues and genes.
h The ΔIG-DMR of paternal origin has permitted a normal Gtl2-DMR methylation pattern, intact imprinting status, and normal phenotype in the body (no data on the placenta).
i The ΔGtl2-DMR of paternal origin is accompanied by a normal methylation pattern of the IG-DMR, variably reduced Pegs expression, and increased Megs expression in the body, and has yielded severe growth retardation accompanied by perinatal lethality.
j The homozygous mutants have survived and developed into fertile adults, despite rather altered expression patterns of the imprinted genes.
doi:10.1371/journal.pgen.1000992.t002

DNase, cDNA samples for DLK1, MEG3, MEG8, and snoRNAs were prepared with oligo(dT) primers from 1 μg of RNA using SuperScript III Reverse Transcriptase (Invitrogen), and those for microRNAs were synthesized from 300 ng of RNA using the TaqMan MicroRNA Reverse Transcription Kit (Applied Biosystems). For RTL1, cDNA samples were synthesized with RTL1-specific primers that do not amplify RTL1as. Control gDNA and cDNA samples were extracted from adult leukocytes and neonatal skin fibroblasts purchased from Takara Bio Inc. (Japan), and from a fresh placenta at 38 weeks of gestation. Metaphase spreads were prepared from leukocytes and skin fibroblasts using colcemide (Invitrogen).

Structural analysis
Microsatellite analysis and SNP genotyping were performed as described previously [2]. For FISH analysis, metaphase spreads were hybridized with a 5,104 bp FISH-1 probe and a 5,182 bp FISH-2 probe produced by long PCR, together with an RP11-566I2 probe for 14q12 used as an internal control [2]. The FISH-1 and FISH-2 probes were labeled with digoxigenin and detected by rhodamine anti-digoxigenin, and the RP11-566I2 probe was labeled with biotin and detected by avidin conjugated to fluorescein isothiocyanate. For quantitative real-time PCR analysis, the copy number relative to RNaseP (catalog No: 4316831, Applied Biosystems) was determined by the TaqMan real-time PCR method using the probe-primer mix on an ABI PRISM 7000 (Applied Biosystems). To determine the breakpoints of the microdeletions, long PCR products harboring the fusion points were sequenced using serial forward primers on the CEQ 8000 autosequencer (Beckman Coulter). Direct sequencing was also performed on the CEQ 8000 autosequencer. Oligoarray comparative genomic hybridization was performed with the 1×244K Human Genome Array (catalog No: G4411B) (Agilent Technologies), according to the manufacturer's protocol.
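Relative copy number from TaqMan assays is commonly computed with the comparative Ct (2^−ΔΔCt) method, scaled to a calibrator of known copy number. The text does not state the exact calculation used, so the Python sketch below assumes that method, roughly 100% amplification efficiency for both assays, and a two-copy control DNA as calibrator; the function name and all Ct values are hypothetical. A result near 1 corresponds to the single-copy regions reported for patient 2 in Figure S1A.

```python
from statistics import mean


def copy_number(ct_target: list[float], ct_ref: list[float],
                cal_target: list[float], cal_ref: list[float],
                calibrator_copies: int = 2) -> float:
    """Relative copy number by the comparative Ct (2^-ddCt) method.

    ct_target / ct_ref: replicate Ct values for the test region and the
    RNaseP reference in the patient; cal_*: the same for a calibrator DNA
    assumed to carry calibrator_copies of the test region. Assumes ~100%
    amplification efficiency for both assays.
    """
    d_sample = mean(ct_target) - mean(ct_ref)
    d_calibrator = mean(cal_target) - mean(cal_ref)
    return calibrator_copies * 2 ** -(d_sample - d_calibrator)


# Hypothetical Ct values: relative to RNaseP, the test region amplifies
# one cycle later in the patient than in the two-copy calibrator.
print(copy_number([25.0, 25.0], [24.0, 24.0], [24.0, 24.0], [24.0, 24.0]))  # 1.0
```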

normal subject. Genotyping was carried out for an RTL1 cSNP using gDNA and cDNA samples from a fresh placenta and a gDNA sample from the mother, showing that both maternally and non-maternally (paternally) derived alleles are present in the gDNA, whereas only the non-maternally (paternally) inherited allele is detected in the cDNA. This cSNP was also examined in the placenta of patient 1 (Figure 5E). Furthermore, the results confirm that the primers used in this study amplified RTL1 but not RTL1as. Found at: doi:10.1371/journal.pgen.1000992.s002 (0.39 MB TIF)

Methylation analysis
Methylation analysis was performed on gDNA treated with bisulfite using the EZ DNA Methylation Kit (Zymo Research). After PCR amplification with primer sets that hybridize to both methylated and unmethylated clones because they contain no CpG dinucleotides, the PCR products were digested with the appropriate restriction enzymes for combined bisulfite restriction analysis. For bisulfite sequencing, the PCR products were subcloned with the TOPO TA Cloning Kit (Invitrogen) and subjected to direct sequencing on the CEQ 8000 autosequencer.
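Calling methylation from subcloned bisulfite reads reduces to comparing each clone with the unconverted reference at CpG positions: a retained C marks a methylated cytosine (a filled circle in Figures 3 and 4) and a T an unmethylated one (an open circle). The Python sketch below assumes gap-free clones already aligned to the reference; the function names and the thresholds used to label a region as hyper-, hypo-, or differentially methylated are illustrative assumptions, not criteria stated in the study.

```python
def call_cpg_methylation(reference: str, clone: str) -> list[bool]:
    """Methylation call (True = methylated) for each CpG of an aligned clone.

    reference: unconverted genomic sequence of the amplicon.
    clone: bisulfite-converted clone sequence, gap-free and equal in length.
    """
    if len(reference) != len(clone):
        raise ValueError("clone must be pre-aligned to the reference without gaps")
    ref, seq = reference.upper(), clone.upper()
    calls = []
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] == "G":
            if seq[i] == "C":
                calls.append(True)    # cytosine protected: filled circle
            elif seq[i] == "T":
                calls.append(False)   # cytosine converted: open circle
            else:
                raise ValueError(f"unexpected base {seq[i]!r} at CpG position {i}")
    return calls


def label_region(clones: list[list[bool]], high: float = 0.8, low: float = 0.2) -> str:
    """Label a region from per-clone calls; the thresholds are illustrative."""
    if not clones:
        raise ValueError("no clones")
    methylated = sum(1 for calls in clones if sum(calls) > len(calls) / 2)
    fraction = methylated / len(clones)
    if fraction >= high:
        return "hypermethylated"
    if fraction <= low:
        return "hypomethylated"
    return "differentially methylated"


reference = "ACGTCCGA"  # toy sequence: CpGs at positions 1 and 5
print(call_cpg_methylation(reference, "ACGTTCGA"))  # [True, True]
print(call_cpg_methylation(reference, "ATGTTTGA"))  # [False, False]
print(label_region([[True, True]] * 5 + [[False, False]] * 5))  # differentially methylated
```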

Expression analysis
Standard RT-PCR was performed for DLK1, RTL1, MEG3, MEG8, and snoRNAs using primers hybridizing to exonic or transcribed sequences, and 1 μl of each PCR reaction solution was loaded onto Gel-Dye Mix (Agilent). TaqMan real-time PCR was carried out using the probe-primer mixtures (assay No: Hs00292028 for MEG3 and Hs00419701 for MEG8; assay ID: 001028 for miR433, 000452 for miR127, 000568 for miR379, and 000477 for miR154) on the ABI PRISM 7000. Data were normalized against GAPDH (catalog No: 4326317E) for MEG3 and MEG8 and against RNU48 (assay ID: 0010006) for the miRNAs. The expression studies were performed three times for each sample. To examine the imprinting status of MEG3 in the leukocytes of the mother of patient 1, direct sequence data for informative cSNPs were compared between gDNA and cDNA. To analyze the imprinting status of RTL1 in the placental sample of patient 1 and that of DLK1 in the pituitary and adrenal samples of patient 2, RT-PCR products containing exonic cSNPs informative for parental origin were subcloned with the TOPO TA Cloning Kit, and multiple clones were subjected to direct sequencing on the CEQ 8000 autosequencer. Furthermore, the MEG3 expression pattern was examined using leukocyte gDNA and cDNA samples from multiple normal subjects and leukocyte gDNA samples from their mothers, and the RTL1 expression pattern was analyzed using gDNA and cDNA samples from multiple fresh normal placentas and leukocyte gDNA from the mothers.
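The Methods state that q-PCR data were normalized to GAPDH or RNU48 and run three times per sample, but do not give the formula. A common choice is the 2^−ΔΔCt method relative to the mean of the controls (for example, the four control neonates in Figure 5B); the Python sketch below assumes that method and equal amplification efficiencies, the function name is hypothetical, and all Ct values are invented for illustration.

```python
from statistics import mean


def relative_expression(ct_target: list[float], ct_ref: list[float],
                        controls: list[tuple[list[float], list[float]]]) -> float:
    """Expression relative to the control mean by the 2^-ddCt method.

    ct_target, ct_ref: replicate Ct values of the sample for the gene of
    interest (e.g. MEG3) and its reference (e.g. GAPDH, or RNU48 for miRNAs).
    controls: one (target, reference) pair of replicate Ct lists per control.
    Assumes both assays amplify with ~100% efficiency.
    """
    d_sample = mean(ct_target) - mean(ct_ref)
    d_control = mean(mean(t) - mean(r) for t, r in controls)
    return 2 ** -(d_sample - d_control)


# Hypothetical triplicates: patient vs four control neonates
patient = ([30.0, 30.0, 30.0], [20.0, 20.0, 20.0])
controls = [([25.0, 25.0, 25.0], [20.0, 20.0, 20.0])] * 4
print(relative_expression(*patient, controls))  # 0.03125
```

Averaging ΔCt across controls before exponentiating is one design choice among several; it corresponds to taking the geometric mean of the control expression ratios.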

Figure S3 Schematic representation of the observed and predicted methylation and expression patterns in previously reported cases with upd(14)pat/mat-like phenotypes and in normal and upd(14)pat/mat subjects. For explanations of the illustrations, see the legend for Figure 6. Previous studies have indicated that (1) Epimutation-1, Deletion-1, Deletion-2, and Deletion-3 lead to maternal-to-paternal epigenotypic alteration; (2) Epimutation-2 results in paternal-to-maternal epigenotypic alteration; and (3) Deletion-4 and Deletion-5 have no effect on the epigenotypic status [2,5–8,26]. (A) Cases with typical or mild upd(14)pat phenotype. Epimutation-1: Hypermethylation of the IG-DMR and the MEG3-DMR of maternal origin in the body, and of the IG-DMR of maternal origin in the placenta (the MEG3-DMR is rather hypomethylated in the placenta) (cases 6–8 in Kagami et al. [2]). Deletion-1: Microdeletion involving DLK1, the two DMRs, and MEG3 on the maternally inherited chromosome (case 2 in Kagami et al. [2]). Deletion-2: Microdeletion involving DLK1, the two DMRs, MEG3, RTL1, and RTL1as on the maternally inherited chromosome (cases 3 and 5 in Kagami et al. [2]). Deletion-3: Microdeletion involving the two DMRs, MEG3, RTL1, and RTL1as on the maternally inherited chromosome (case 4 in Kagami et al. [2]). These findings are explained by the following notions: (1) Epimutation (hypermethylation) of the normally hypomethylated IG-DMR of maternal origin directly results in paternalization of the imprinted region in the placenta and indirectly leads to paternalization of the imprinted region in the body via epimutation (hypermethylation) of the usually hypomethylated MEG3-DMR of maternal origin. Thus, the epimutation (hypermethylation) is predicted to have impaired the IG-DMR as the primary target, followed by epimutation (hypermethylation) of the MEG3-DMR after fertilization; (2) Loss of the hypomethylated MEG3-DMR of maternal origin leads to paternalization of the imprinted region in the body; and (3) Loss of the hypomethylated IG-DMR of maternal origin results in paternalization of the imprinted region in the placenta. Furthermore, epigenotype-phenotype correlations imply that the severity of the upd(14)pat phenotype is primarily determined by the RTL1 expression dosage rather than the DLK1 expression dosage [2]. (B) Cases with upd(14)mat-like phenotype. Epimutation-2: Hypomethylation of the IG-DMR and the MEG3-DMR of paternal origin (Temple et al. [5], Buiting et al. [6], Hosoki et al. [7], and Zechner et al. [8]). Deletion-4: Microdeletion involving DLK1, the two DMRs, and MEG3 on the paternally inherited chromosome (cases 9 and 10 in Kagami et al. [2]). Deletion-5: Microdeletion involving DLK1, the two DMRs, MEG3, RTL1, and RTL1as on the paternally inherited chromosome (case 11 in Kagami et al. [2] and patient 3 in Buiting et al. [6]). These findings are consistent with the following notions: (1) Epimutation (hypomethylation) of the normally hypermethylated IG-DMR of paternal origin directly results in maternalization of the imprinted region in the placenta and indirectly leads to maternalization of the imprinted region in the body through epimutation (hypomethylation) of the usually hypermethylated MEG3-DMR of paternal origin. Thus, epimutation (hypomethylation) is predicted to have affected the IG-DMR

Supporting Information Figure S1 Structural analysis. (A) Quantitative real-time PCR analysis (q-PCR) for four regions (q-PCR-1-4) in patient 2. The qPCR-1 and q-PCR-2 regions are present in two copies whereas qPCR-3 and q-PCR-4 regions are present in a single copy in patient 2. The four regions are present in two copies in the parents and a control subject, in a single copy in the two previously reported patients with microdeletions involving the examined regions (Deletion-1 and Deletion-2 are case 2 and case 3 in Kagami et al. [2], respectively), and in three copies in a hitherto unreported case with 46,XX,der(17)t(14;17)(q32.2;p13)pat who have three copies of the 14q32.2 imprinted region. Since the microsatellite locus D14S985 is present in two copies (Table S1) and the MEG3DMR is deleted (Figure 2) in patient 2, this has served to localize the breakpoints. (B) Oligoarray comparative genomic hybridization for a ,1 Mb imprinted region. All the signals remain within the normal range (-1 SD , +1 SD) (shaded in light blue) in patients 1 and 2. Found at: doi:10.1371/journal.pgen.1000992.s001 (1.17 MB TIF) Figure S2 Expression analysis. (A) Maternal MEG3 expression in the leukocytes of normal subjects. Genotyping has been performed for three cSNPs using genomic DNA (gDNA) and cDNA of leukocytes from control subjects and gDNA samples of their mothers, indicating that both maternally and non-maternally (paternally) derived alleles are delineated in the gDNA, whereas maternally inherited alleles alone are identified in cDNA. These three cSNPs have also been studied in the mother of patient 1 (Figure 5D). (B) Paternal RTL1 expression in the placenta of a PLoS Genetics | www.plosgenetics.org

June 2010 | Volume 6 | Issue 6 | e1000992

Imprinting Control Centers at Human 14q32.2

Found at: doi:10.1371/journal.pgen.1000992.s004 (0.19 MB DOC)

as the primary target, followed by the epimutation (hypomethylation) of the MEG3-DMR after fertilization; and (2) Loss of the hypermethylated DMRs of paternal origin has no effect on the imprinting status [2,26], so that upd(14)mat-like phenotype is primarily ascribed to the additive effects of loss of functional DLK1 and RTL1 from the paternally derived chromosome (the effects of loss of DIO3 appears to be minor, if any [2,35]). Although the MEGs expression dosage is predicted to be normal in Deletion-4 and Deletion-5 and doubled in Epimutation-2 as well as in upd(14)mat, it remains to be determined whether the difference in the MEGs expression dosage has major clinical effects or not. (C) Normal and upd(14)pat/mat subjects. Found at: doi:10.1371/journal.pgen.1000992.s003 (2.72 MB TIF)
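The notions in the Figure S3 legend reduce to one rule: the IG-DMR governs the imprinted region in the placenta, the MEG3-DMR governs it in the body, and a chromosome behaves as maternal in a tissue only if its governing DMR is present and hypomethylated; a hypermethylated or deleted DMR yields a paternal epigenotype. The Python sketch below is a hypothetical encoding of that rule (the names `DMR`, `epigenotype` and `classify` are illustrative). It ignores gene content, such as whether RTL1 or DLK1 is deleted, so it predicts epigenotype only, not expression dosage.

```python
"""Toy model of 14q32.2 imprinting control, following the Figure S3 legend.

Hypothetical sketch: IG-DMR governs the placenta, MEG3-DMR governs the body,
and a chromosome is 'maternal' in a tissue only if the governing DMR is
present and hypomethylated. Gene deletions (DLK1, RTL1, ...) are not modeled.
"""
from enum import Enum


class DMR(Enum):
    HYPOMETHYLATED = "hypomethylated"
    HYPERMETHYLATED = "hypermethylated"
    DELETED = "deleted"


# Tissue -> DMR that controls the imprinted region in that tissue.
GOVERNING_DMR = {"placenta": "IG-DMR", "body": "MEG3-DMR"}


def epigenotype(dmrs: dict, tissue: str) -> str:
    """'maternal' or 'paternal' epigenotype of one chromosome in one tissue."""
    state = dmrs[GOVERNING_DMR[tissue]]
    # Hypermethylation or loss of the governing DMR both give a paternal pattern.
    return "maternal" if state is DMR.HYPOMETHYLATED else "paternal"


def classify(maternal: dict, paternal: dict) -> dict:
    """(maternal-chromosome, paternal-chromosome) epigenotypes for each tissue."""
    return {
        tissue: (epigenotype(maternal, tissue), epigenotype(paternal, tissue))
        for tissue in GOVERNING_DMR
    }


NORMAL_MAT = {"IG-DMR": DMR.HYPOMETHYLATED, "MEG3-DMR": DMR.HYPOMETHYLATED}
NORMAL_PAT = {"IG-DMR": DMR.HYPERMETHYLATED, "MEG3-DMR": DMR.HYPERMETHYLATED}
BOTH_DELETED = {"IG-DMR": DMR.DELETED, "MEG3-DMR": DMR.DELETED}

if __name__ == "__main__":
    print("normal:", classify(NORMAL_MAT, NORMAL_PAT))
    print("Deletion-3 (maternal):", classify(BOTH_DELETED, NORMAL_PAT))
    print("Deletion-4 (paternal):", classify(NORMAL_MAT, BOTH_DELETED))
    print("Epimutation-2:", classify(NORMAL_MAT, NORMAL_MAT))
```

Under this rule a paternal deletion of both DMRs gives the same result as the normal pattern, matching the legend's statement that loss of the hypermethylated paternal DMRs has no effect on imprinting status. For Epimutation-1, the placental call reads only the IG-DMR, so the hypomethylated placental MEG3-DMR does not change the outcome.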

Table S2 Clinical features in the mother of patient 1. Found at: doi:10.1371/journal.pgen.1000992.s005 (0.09 MB DOC) Table S3 Primers utilized in the present study. Found at: doi:10.1371/journal.pgen.1000992.s006 (0.14 MB DOC)

Author Contributions Conceived and designed the experiments: MK ACFS TO. Performed the experiments: MK MF KM FK. Analyzed the data: MK TO. Contributed reagents/materials/analysis tools: MJO AJG YW OA NM KM TO. Wrote the paper: TO.

Table S1 The results of microsatellite and SNP analyses.

References

1. da Rocha ST, Edwards CA, Ito M, Ogata T, Ferguson-Smith AC (2008) Genomic imprinting at the mammalian Dlk1-Dio3 domain. Trends Genet 24: 306–316.
2. Kagami M, Sekita Y, Nishimura G, Irie M, Kato F, et al. (2008) Deletions and epimutations affecting the human 14q32.2 imprinted region in individuals with paternal and maternal upd(14)-like phenotypes. Nat Genet 40: 237–242.
3. Kagami M, Yamazawa K, Matsubara K, Matsuo N, Ogata T (2008) Placentomegaly in paternal uniparental disomy for human chromosome 14. Placenta 29: 760–761.
4. Kotzot D (2004) Maternal uniparental disomy 14 dissection of the phenotype with respect to rare autosomal recessively inherited traits, trisomy mosaicism, and genomic imprinting. Ann Genet 47: 251–260.
5. Temple IK, Shrubb V, Lever M, Bullman H, Mackay DJ (2007) Isolated imprinting mutation of the DLK1/GTL2 locus associated with a clinical presentation of maternal uniparental disomy of chromosome 14. J Med Genet 44: 637–640.
6. Buiting K, Kanber D, Martín-Subero JI, Lieb W, Terhal P, et al. (2008) Clinical features of maternal uniparental disomy 14 in patients with an epimutation and a deletion of the imprinted DLK1/GTL2 gene cluster. Hum Mutat 29: 1141–1146.
7. Hosoki K, Ogata T, Kagami M, Tanaka T, Saitoh S (2008) Epimutation (hypomethylation) affecting the chromosome 14q32.2 imprinted region in a girl with upd(14)mat-like phenotype. Eur J Hum Genet 16: 1019–1023.
8. Zechner U, Kohlschmidt N, Rittner G, Damatova N, Beyer V, et al. (2009) Epimutation at human chromosome 14q32.2 in a boy with a upd(14)mat-like clinical phenotype. Clin Genet 75: 251–258.
9. Li E, Beard C, Jaenisch R (1993) Role for DNA methylation in genomic imprinting. Nature 366: 362–365.
10. Rosa AL, Wu YQ, Kwabi-Addo B, Coveler KJ, Reid Sutton V, et al. (2005) Allele-specific methylation of a functional CTCF binding site upstream of MEG3 in the human imprinted domain of 14q32. Chromosome Res 13: 809–818.
11. Wylie AA, Murphy SK, Orton TC, Jirtle RL (2000) Novel imprinted DLK1/GTL2 domain on human chromosome 14 contains motifs that mimic those implicated in IGF2/H19 regulation. Genome Res 10: 1711–1718.
12. Tierling S, Dalbert S, Schoppenhorst S, Tsai CE, Oliger S, et al. (2007) High-resolution map and imprinting analysis of the Gtl2-Dnchc1 domain on mouse chromosome 12. Genomics 87: 225–235.
13. Takada S, Paulsen M, Tevendale M, Tsai CE, Kelsey G, et al. (2002) Epigenetic analysis of the Dlk1-Gtl2 imprinted domain on mouse chromosome 12: implications for imprinting control from comparison with Igf2-H19. Hum Mol Genet 11: 77–86.
14. Ohlsson R, Renkawitz R, Lobanenkov V (2001) CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends Genet 17: 520–527.
15. Hark AT, Schoenherr CJ, Katz DJ, Ingram RS, Levorse JM, et al. (2000) CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature 405: 486–489.
16. Kanduri C, Pant V, Loukinov D, Pugacheva E, Qi CF, et al. (2000) Functional association of CTCF with the insulator upstream of the H19 gene is parent of origin-specific and methylation-sensitive. Curr Biol 10: 853–856.
17. da Rocha ST, Tevendale M, Knowles E, Takada S, Watkins M, et al. (2007) Restricted co-expression of Dlk1 and the reciprocally imprinted non-coding RNA, Gtl2: implications for cis-acting control. Dev Biol 306: 810–823.
18. Wan LB, Pan H, Hannenhalli S, Cheng Y, Ma J, et al. (2008) Maternal depletion of CTCF reveals multiple functions during oocyte and preimplantation embryo development. Development 135: 2729–2738.
19. Ideraabdullah FY, Vigneau S, Bartolomei MS (2008) Genomic imprinting mechanisms in mammals. Mutat Res 647: 77–85.
20. Fitzpatrick GV, Pugacheva EM, Shin JY, Abdullaev Z, Yang Y, et al. (2007) Allele-specific binding of CTCF to the multipartite imprinting control region KvDMR1. Mol Cell Biol 27: 2636–2647.
21. Horsthemke B, Wagstaff J (2008) Mechanisms of imprinting of the Prader-Willi/Angelman region. Am J Med Genet A 146A: 2041–2052.
22. Lin SP, Coan P, da Rocha ST, Seitz H, Cavaillé J, et al. (2007) Differential regulation of imprinting in the murine embryo and placenta by the Dlk1-Dio3 imprinting control region. Development 134: 417–426.
23. Coan PM, Burton GJ, Ferguson-Smith AC (2005) Imprinted genes in the placenta–a review. Placenta 26 Suppl A: S10–20.
24. Georgiades P, Watkins M, Surani MA, Ferguson-Smith AC (2000) Parental origin-specific developmental defects in mice with uniparental disomy for chromosome 12. Development 127: 4719–4728.
25. Takada S, Tevendale M, Baker J, Georgiades P, Campbell E, et al. (2000) Delta-like and gtl2 are reciprocally expressed, differentially methylated linked imprinted genes on mouse chromosome 12. Curr Biol 10: 1135–1138.
26. Lin SP, Youngson N, Takada S, Seitz H, Reik W, et al. (2003) Asymmetric regulation of imprinting on the maternal and paternal chromosomes at the Dlk1-Gtl2 imprinted cluster on mouse chromosome 12. Nat Genet 35: 97–102.
27. Takahashi N, Okamoto A, Kobayashi R, Shirai M, Obata Y, et al. (2009) Deletion of Gtl2, imprinted non-coding RNA, with its differentially methylated region induces lethal parent-origin-dependent defects in mice. Hum Mol Genet 18: 1879–1888.
28. Lewis A, Mitsuya K, Umlauf D, Smith P, Dean W, et al. (2004) Imprinting on distal chromosome 7 in the placenta involves repressive histone methylation independent of DNA methylation. Nat Genet 36: 1291–1295.
29. Umlauf D, Goto Y, Cao R, Cerqueira F, Wagschal A, et al. (2004) Imprinting along the Kcnq1 domain on mouse chromosome 7 involves repressive histone methylation and recruitment of Polycomb group complexes. Nat Genet 36: 1296–1300.
30. Sekita Y, Wagatsuma H, Irie M, Kobayashi S, Kohda T, et al. (2006) Aberrant regulation of imprinted gene expression in Gtl2lacZ mice. Cytogenet Genome Res 113: 223–229.
31. Steshina EY, Carr MS, Glick EA, Yevtodiyenko A, Appelbe OK, et al. (2006) Loss of imprinting at the Dlk1-Gtl2 locus caused by insertional mutagenesis in the Gtl2 5′ region. BMC Genet 7: 44.
32. Charlier C, Segers K, Karim L, Shay T, Gyapay G, et al. (2001) The callipyge mutation enhances the expression of coregulated imprinted genes in cis without affecting their imprinting status. Nat Genet 27: 367–369.
33. Georges M, Charlier C, Cockett N (2003) The callipyge locus: evidence for the trans interaction of reciprocally imprinted genes. Trends Genet 19: 248–252.
34. Moon YS, Smas CM, Lee K, Villena JA, Kim KH, et al. (2002) Mice lacking paternally expressed Pref-1/Dlk1 display growth retardation and accelerated adiposity. Mol Cell Biol 22: 5585–5592.
35. Tsai CE, Lin SP, Ito M, Takagi N, Takada S, et al. (2002) Genomic imprinting contributes to thyroid hormone metabolism in the mouse embryo. Curr Biol 12: 1221–1226.
36. Sekita Y, Wagatsuma H, Nakamura K, Ono R, Kagami M, et al. (2008) Role of retrotransposon-derived imprinted gene, Rtl1, in the feto-maternal interface of mouse placenta. Nat Genet 40: 243–248.
37. Seitz H, Youngson N, Lin SP, Dalbert S, Paulsen M, et al. (2003) Imprinted microRNA genes transcribed antisense to a reciprocally imprinted retrotransposon-like gene. Nat Genet 34: 261–262.
38. Davis E, Caiment F, Tordoir X, Cavaillé J, Ferguson-Smith A, et al. (2005) RNAi-mediated allelic trans-interaction at the imprinted Rtl1/Peg11 locus. Curr Biol 15: 743–749.


siRNA–Mediated Methylation of Arabidopsis Telomeres

Jan Vrbsky1,2.¤a, Svetlana Akimcheva1., J. Matthew Watson1., Thomas L. Turner3, Lucia Daxinger1¤b, Boris Vyskot2, Werner Aufsatz1, Karel Riha1*

1 Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna, Austria, 2 Institute of Biophysics, Czech Academy of Sciences, Brno, Czech Republic, 3 Ecology, Evolution, and Marine Biology Department, University of California Santa Barbara, Santa Barbara, California, United States of America

Abstract

Chromosome termini form a specialized type of heterochromatin that is important for chromosome stability. The recent discovery of telomeric RNA transcripts in yeast and vertebrates raised the question of whether RNA–based mechanisms are involved in the formation of telomeric heterochromatin. In this study, we performed a detailed analysis of chromatin structure and RNA transcription at chromosome termini in Arabidopsis. Arabidopsis telomeres display features of intermediate heterochromatin that does not spread extensively into subtelomeric regions, which encode transcriptionally active genes. We also found telomeric repeat–containing transcripts arising from telomeres and centromeric loci, a portion of which are processed into small interfering RNAs. These telomeric siRNAs contribute to the maintenance of telomeric chromatin by promoting methylation of asymmetric cytosines in telomeric (CCCTAAA)n repeats. The formation of telomeric siRNAs and the methylation of telomeres rely on the RNA–dependent DNA methylation pathway. The loss of telomeric DNA methylation in rdr2 mutants is accompanied by only a modest effect on heterochromatic histone marks, indicating that maintenance of telomeric heterochromatin in Arabidopsis is reinforced by several independent mechanisms. In conclusion, this study provides evidence for an siRNA–directed mechanism of chromatin maintenance at telomeres in Arabidopsis.

Citation: Vrbsky J, Akimcheva S, Watson JM, Turner TL, Daxinger L, et al. (2010) siRNA–Mediated Methylation of Arabidopsis Telomeres. PLoS Genet 6(6): e1000986. doi:10.1371/journal.pgen.1000986 Editor: Tetsuji Kakutani, National Institute of Genetics, Japan Received February 4, 2010; Accepted May 12, 2010; Published June 10, 2010 Copyright: © 2010 Vrbsky et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This work was supported by GEN-AU Austria (GZ200.140/1-VI/12006, http://www.gen-au.at/), The Czech Ministry of Education (LC06004, http://www.msmt.cz/), and an NSF International Research Fellowship (OISE-0700946, http://www.nsf.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: karel.riha@gmi.oeaw.ac.at . These authors contributed equally to this work. ¤a Current address: National Center for Biomolecular Research, Masaryk University, Brno, Czech Republic ¤b Current address: Queensland Institute of Medical Research, Brisbane, Australia

June 2010 | Volume 6 | Issue 6 | e1000986

Modulation of Telomeric Chromatin by siRNA

Author Summary

Telomeres are protein–DNA structures that protect the ends of eukaryotic chromosomes. A failure in this protective structure can lead to chromosomal instability and contribute to cancer and aging. The protective nature of telomeres relies on complex interactions between repetitive telomeric DNA and associated proteins. One major question is how telomeric proteins, including telomere-associated nucleosomes, are modified in order to achieve this protection. In this study, we have discovered that Arabidopsis telomeric nucleosomes contain a unique mixture of both active and inactive chromatin marks. Additionally, the telomeric DNA itself is modified by methylation of cytosines within the telomeric repeat. Regulation of DNA methylation is achieved by telomeric repeat–containing small RNAs, which are derived from the processing of telomeric transcripts by the RNA–dependent DNA methylation pathway. From these data, we infer that the formation of a proper telomere structure is partly regulated by non-coding telomeric RNAs.

Introduction

Telomeres safeguard the stability of eukaryotic chromosomes by preventing natural chromosome ends from triggering DNA damage responses. Chromosome termini consist of telomeric and subtelomeric repeats that are bound by a specific set of telomere binding proteins as well as nucleosomes that exhibit features of pericentric heterochromatin [1]. These regions are usually devoid of functional genes, and transgenes integrated in the vicinity of telomeres are subjected to transcriptional silencing, a phenomenon known as telomere position effect [2]. Studies in mammals indicate that telomeric heterochromatin plays an important role in chromosome end protection and telomere length regulation. Inactivation of the SIRT6 histone deacetylase in human cells causes hyperacetylation of telomeric histone H3, telomere dysfunction and premature cell senescence [3]. Deficiency in histone methyltransferases or the retinoblastoma tumor suppressor leads to disruption of telomeric heterochromatin and aberrant telomere elongation in mouse cells [4–6]. Another important hallmark of heterochromatin in mammals is DNA methylation. Although vertebrate telomeric DNA does not appear to be methylated, owing to the lack of canonical CG sites, subtelomeric repeats are heavily methylated [7]. Interestingly, inactivation of DNA methyltransferases in mouse cells decreases 5-methylcytosine at subtelomeres and leads to increased telomeric recombination, without a concomitant change in histone modifications [7]. These data indicate a functional interaction between subtelomeric and telomeric chromatin.

Heterochromatin was thought to be transcriptionally inactive, but this view has been challenged by the discovery of numerous noncoding (nc) transcripts derived from heterochromatic loci. Some of these transcripts directly contribute to the assembly of heterochromatin at defined chromosomal domains, and their biogenesis is vital for processes such as X chromosome inactivation, genomic imprinting, transposon silencing and centromere function [8]. Thus, it is not surprising that although telomeres possess marks of repressive heterochromatin, they are not transcriptionally silent. Recent studies revealed the presence of telomeric repeat-containing RNAs (TERRA) that are transcribed from subtelomeric regions in yeast and vertebrates [9–11]. TERRA are removed from telomeres either through Rat1p-dependent degradation in budding yeast or through nonsense-mediated RNA decay (NMD) in human cells; deficiencies in these RNA processing pathways have dramatic effects on telomere maintenance [9,10]. Hypomethylation of subtelomeric regions in mammalian cells lacking DNA methyltransferases leads to the overproduction of TERRA [11,12]. This suggests that the epigenetic status of subtelomeres and telomeres influences TERRA expression.

The discovery of TERRA raised the question of whether ncRNAs contribute to the establishment of telomeric heterochromatin. This hypothesis gained support from a recent study in which downregulation of TERRA by exogenous short interfering RNAs (siRNAs) in human cell lines led to depletion of heterochromatic histone modifications from telomeres [13]. In many organisms, RNA-mediated chromatin silencing relies on small RNA molecules that guide effector complexes to target sites [8,14]. However, involvement of small RNAs in chromatin formation at canonical telomeres has not yet been shown. In this study, we investigate chromatin organization and transcription at chromosome ends in the model plant Arabidopsis thaliana. We detect the presence of transcripts containing telomeric repeats and show that some of these transcripts are processed into ~24 nt siRNAs. These transcripts are produced from telomeres as well as from intrachromosomal telomeric loci that are mainly located at centromeres. The 24 nt siRNAs are generated through the RNA-dependent DNA methylation (RdDM) pathway, a plant-specific mechanism that utilizes siRNAs to guide DNA methyltransferases to asymmetric cytosines (CNN) [15,16]. We demonstrate that RdDM is responsible for methylation of telomeric DNA, which contains cytosines exclusively in asymmetric sequence contexts, and hence for reinforcement of heterochromatic marks at telomeres.
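The claim that telomeric DNA contains cytosines only in asymmetric contexts can be checked mechanically. Plant cytosines are conventionally classified as CG, CHG or CHH (H = A, C or T), with CHH being the asymmetric context. The Python sketch below is a minimal illustration, not code from the study, and the function name `cytosine_contexts` is hypothetical. It classifies every cytosine on one strand: applied to the C-rich telomeric strand (CCCTAAA)n it returns only CHH, while the G-rich strand (TTTAGGG)n has no cytosines at all.

```python
def cytosine_contexts(seq: str) -> list[tuple[int, str]]:
    """Return (position, context) for every C on the given strand.

    Contexts follow the usual plant convention: CG, CHG or CHH, where
    H is A, C or T. Cytosines too close to the 3' end to classify are
    reported as 'undetermined'.
    """
    seq = seq.upper()
    contexts = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        following = seq[i + 1:i + 3]  # the two downstream bases, if present
        if following[:1] == "G":
            context = "CG"
        elif len(following) < 2:
            context = "undetermined"
        elif following[1] == "G":
            context = "CHG"
        else:
            context = "CHH"
        contexts.append((i, context))
    return contexts


if __name__ == "__main__":
    c_strand = "CCCTAAA" * 3  # C-rich telomeric strand
    g_strand = "TTTAGGG" * 3  # G-rich strand: no cytosines
    print(cytosine_contexts(c_strand))
    print(cytosine_contexts(g_strand))
    print(cytosine_contexts("ACGTCAGTCTTA"))  # one example of each context
```

Because no cytosine in the telomeric repeat is followed by G at either of the next two positions, none of them can fall into the symmetric CG or CHG classes.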

Results

Chromatin organization at Arabidopsis chromosome termini

Gene organization at chromosome ends in Arabidopsis appears to be unique. In contrast to the majority of organisms with known telomere/subtelomere sequences, 8 of the 10 Arabidopsis subtelomeres have no repetitive DNA, and predicted genes are annotated in the immediate vicinity of telomeres [17] (Figure 1A). We experimentally confirmed that sequences annotated as chromosome ends are indeed associated with telomeres for 7 of these 8 chromosome arms, the exception being the right arm of chromosome 3 [18]. The two remaining chromosome termini contain clusters of ribosomal RNA genes (NORs) [19]. We performed reverse transcription (RT) PCR analysis to verify that all the predicted terminal genes are expressed and that they do not represent pseudogenes (Figure 1B). The genes showed distinct tissue-specific expression patterns, and the size of the RT-PCR products corresponded to the predicted size of the spliced mRNAs. There was no obvious correlation between the level of expression and promoter distance from telomeres, and even the At2g48160 gene, with a promoter immediately adjacent to telomeric DNA, was robustly expressed. These data indicate that, in contrast to yeast and mammals, Arabidopsis telomeres do not silence genes located in their vicinity.

Figure 1. Expression of Arabidopsis chromosome-terminal genes. (A) A diagram of gene arrangement at the ends of five Arabidopsis chromosomes. Arrows illustrate the relative size and direction of transcripts of annotated terminal genes. The distance of the predicted ATG codon from the telomere is indicated. (B) Expression of the terminal genes in different tissues of wild-type plants assayed by RT-PCR. The size of the PCR products is indicated in parentheses. doi:10.1371/journal.pgen.1000986.g001

The high transcriptional activity near telomeres raised questions about the chromatin structure of chromosome termini in Arabidopsis. We investigated the distribution of histone modification marks typical of plant euchromatin (tri-methylation of histone H3 at Lys4, H3K4me3) and heterochromatin (di-methylation of H3 at Lys9, H3K9me2; and mono-methylation of H3 at Lys27, H3K27me1) at telomere-associated regions by chromatin immunoprecipitation (ChIP). The ~600 bp region immediately adjacent to the telomere on the right arm of chromosome 2 (2R) represents the promoter of the At2g48160 gene (Figure 2A) and carries typical euchromatic histone marks (Figure 2B). The H3K4me3 euchromatin mark was also dominant at the promoter of the At1g01010 gene, located ~3.5 kb from the telomere on the left arm of chromosome 1 (region 1L-3, Figure 2A and 2B), although we could detect a weak H3K27me1 signal, which is usually typical of heterochromatin. Heterochromatic histone marks (H3K9me2 and H3K27me1) became more pronounced at the 1L-2 and 1L-1 regions, located on the same arm ~1.5 kb and 1 kb from the telomere, respectively (Figure 2A and 2B). The 1L telomere contains a recent 104 bp insertion of mitochondrial DNA embedded within the centromere-proximal region of telomeric repeats [20] (Figure 2A). Using this insertion to design primers that span the centromere-proximal part of the 1L telomere (1L-0, Figure 2A), we were able to demonstrate that this region also displays heterochromatin marks (Figure 2B). Nevertheless, the 1L-0 region still possessed clearly detectable H3K4me3, which is atypical of classical heterochromatin, in which the H3K4me3 modification is strongly reduced in comparison to H3K27me1 and H3K9me2. A similar histone-modification pattern was also observed in telomere-adjacent regions of five other chromosome arms (Figure 2B).

To further examine chromatin at telomeres, we analyzed ChIP fractions by dot-blot hybridization with a telomeric probe (Figure 2C). The Arabidopsis genome is enriched for intrachromosomal degenerate telomeric repeats that are mainly localized at centromeres (Figure S1). To specifically assay for chromatin at telomeres, we used stringent hybridization conditions under which the centromere-derived signal is reduced to less than 2% of the total telomeric signal (Figure S1). We readily detected H3K27me1 and H3K9me2 modifications, and a weaker but still clearly detectable H3K4me3 signal. This hybridization pattern was reminiscent of the results obtained by ChIP analysis of telomere-adjacent regions by PCR (Figure 2B). Thus, our ChIP data show that Arabidopsis telomeres form chromatin that is enriched for the H3K9me2 and H3K27me1 heterochromatic marks but still retains the euchromatic H3K4me3 modification.

We found that the heterochromatin marks extend ~1.5 kb into the subtelomeric region of 1L. A survey of a high-resolution genome-wide map of H3K9me2 distribution indicates that H3K9me2 also spreads up to 1.5 kb from telomeres at chromosome arms 1R, 3L, 4R and 5L [21] (http://epigenomics.mcdb.ucla.edu/H3K9m2/). However, the prominent H3K4me3 signal detected side by side with the heterochromatic marks (Figure 2B and 2C) strongly indicates that Arabidopsis telomeres exhibit features of intermediate heterochromatin, which is characterized by retention of opposing histone H3 methylation marks [22].

Figure 2. Chromatin structure of Arabidopsis chromosome termini. (A) Schematic diagram of the 1L and 2R chromosome ends. Black boxes represent telomeric DNA. The red and blue bars span regions analyzed by bisulfite sequencing and ChIP PCR, respectively. The green bar indicates the region analyzed by ChIP qPCR. (B) ChIP PCR data showing the distribution of methylated histones at unique regions immediately adjacent to telomeres at the indicated chromosome arms. A euchromatic fragment of the MEE51 gene and a heterochromatic gypsy-like retrotransposon (At4g03770) were amplified from the ChIP fractions as a control. (C) Immunoprecipitated DNA analyzed by sequential dot-blot hybridization with a telomeric probe and the centromeric CEN180 probe. doi:10.1371/journal.pgen.1000986.g002
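Figure 2A marks one region as analyzed by ChIP qPCR, but the quantification procedure is not described in this section. As a hedged illustration only, the Python sketch below uses the common percent-of-input convention: the input Ct is adjusted for the fraction of chromatin it represents, and the IP signal is expressed relative to it. The input fraction, the Ct values and the function name `percent_input` are assumptions, not data or methods from the study.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input chromatin recovered in an immunoprecipitate.

    ct_input is measured on an input aliquot representing `input_fraction`
    of the chromatin used per IP (e.g. 0.01 for 1%). The input Ct is first
    adjusted to 100% by subtracting log2(1 / input_fraction); one doubling
    per cycle (100% PCR efficiency) is assumed.
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


if __name__ == "__main__":
    # Hypothetical Ct values for one region and two antibodies.
    for mark, ct_ip in {"H3K9me2": 25.0, "H3K4me3": 28.0}.items():
        print(mark, percent_input(ct_ip, ct_input=25.0, input_fraction=0.125))
```

In practice, percent-input values for a target region are interpreted against control loci, analogous to the euchromatic MEE51 fragment and the heterochromatic gypsy-like retrotransposon (At4g03770) used as controls in Figure 2B.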

Identification of telomeric DNA–containing transcripts and siRNAs

We next asked whether Arabidopsis telomeres are transcribed by assaying for the presence of TERRA by Northern hybridization with a CCCTAAA probe. We readily detected two types of TERRA: heterogeneous transcripts, ranging from high-molecular-weight species that migrated at the limit of gel resolution down to hundreds of nucleotides, and several distinct bands (Figure 3A). We also detected antisense telomeric transcripts (ARRET), which gave a hybridization pattern with the complementary TTTAGGG probe similar to that of TERRA (Figure 3A). These signals disappeared after pretreatment of the samples with RNaseA (Figure 3B and data not shown), demonstrating that they do not represent remnants of DNA in the RNA preparations. Expression of TERRA varied between RNA samples extracted from different tissues of Arabidopsis (Figure 3C). Interestingly, remarkable variation in expression was also detected between different Arabidopsis accessions, as the levels of TERRA in seedlings of the Zur and Ws ecotypes were almost two orders of magnitude higher than in Col and Ler (Figure 3C).

Arabidopsis TERRA and ARRET can originate at telomeres or arise from transcription of degenerate intrachromosomal telomeric sequences localized at centromeric regions (Figure S1). The bulk of centromeric DNA consists of 177–179 bp satellite repeats (CEN180), a subset of which is transcribed [23]. Sequential hybridization of a Northern blot with probes detecting TERRA and CEN180 resulted in an almost identical hybridization pattern, characterized by five distinct bands (Figure 3A). Hybridization of the blots with probes detecting sequences immediately adjacent to telomeres did not produce any detectable signal (data not shown). These results suggest that the TERRA and ARRET transcripts detected by Northern analysis arise mainly from centromeric regions that contain remnants of telomeric DNA, and not from the transcription of telomeres.

To examine whether telomeres are transcribed at levels undetectable by Northern hybridization, we analyzed expression of subtelomeric regions adjacent to telomeric DNA by strand-specific RT-PCR in flowers. We could distinguish expression of TERRA and ARRET by using either telomeric or subtelomeric arm-specific primers for reverse transcription (Figure 3D). We detected expression of both TERRA and ARRET at four of the eight analyzed chromosome arms. We failed to detect any transcription at chromosome arms 1R and 5R. Interestingly, only TERRA, but not ARRET, transcripts were detected at 1L. The RT-PCR data demonstrate that at least five Arabidopsis telomeres are indeed transcribed, albeit at a low level.

To gain further insight into telomere transcription, we cloned a ~500 nt promoter of the At2g48160 gene, which is located next to the telomere (Figure 1), in front of a β-glucuronidase (GUS) reporter gene in both sense and antisense orientations. We could detect GUS transcripts in transgenic plants carrying both constructs, although expression in the antisense direction was much weaker than in the sense orientation (Figure S2). This experiment further supports the idea that telomere-adjacent regions can drive transcription into a telomere.

The presence of centromeric and telomeric TERRA and ARRET suggested that telomeric transcripts could form partially double-stranded (ds) intermediates that could be processed by a Dicer into siRNAs. In support of this hypothesis, siRNAs corresponding to both strands of telomeric DNA were detected in wild-type plants (Figure 4A). We estimate the size of the telomeric C-rich strand siRNAs (C-siRNAs) to be 24–25 nt, and the size of the G-siRNAs to be 23–24 nt (Figure S3). The formation of 24 nt siRNAs in Arabidopsis is mediated by RNA-processing enzymes of the RdDM pathway [24]. This pathway is specific to plants and mediates methylation of cytosine residues in an asymmetric sequence context (CNN). The absence of telomeric 23–25 nt siRNAs in plants lacking RNA-dependent RNA polymerase 2 (RDR2), Dicer-like 3 (DCL3) or subunits of RNA Polymerase IV (NRPD1 or NRPD2), and their reduction in two other RdDM mutants (drd1 and nrpe1), further demonstrated that telomeric siRNAs belong to the category of 24 nt heterochromatic siRNAs (Figure 4A). These siRNAs are usually derived from heterochro

Figure 3. Identification of TERRA and ARRET transcripts in Arabidopsis. (A) Northern blot analysis of wild-type RNA that was hybridized with a strand-specific TTTAGGG probe, stripped after exposure, and sequentially rehybridized with the CCCTAAA probe and the centromeric CEN180 probe. The gel stained with ethidium bromide (EtBr) is shown as a loading control. (B) Sensitivity of ARRET transcripts to RNaseA. (C) Northern blot detection of TERRA in different tissues of wild-type Col plants and in seedlings of the Ws, Zur and Ler ecotypes. The left and right parts of the membrane were exposed for 1 day and 2 h, respectively. (D) Detection of telomere-derived TERRA and ARRET transcripts by RT-PCR. The diagram outlines the strategy used for strand-specific RT-PCR at a hypothetical chromosome end (the telomere is indicated as a black box). The size of the expected PCR product for each telomere is indicated. As chromosome ends 1R and 4R contain a stretch of sequence homology, one set of primers was used to assay for expression at both telomeres in one reaction. The resulting chromosome-end-specific products can be distinguished by their size. It is currently unknown whether the subtelomere sequence at the NOR-bearing chromosome end represents the 2L or 4L telomere. ARRET transcripts at this arm were not analyzed because they correspond to the nascent 45S rRNA. doi:10.1371/journal.pgen.1000986.g003
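Strand-specific RT-PCR (Figure 3D) distinguishes TERRA from ARRET because a reverse-transcription primer must be antiparallel and complementary to the RNA it copies. The Python sketch below checks that rule on toy transcripts. The subtelomeric sequence is invented for illustration, and modeling TERRA as subtelomere followed by G-rich (UUUAGGG)n repeats is an assumption consistent with its detection by the CCCTAAA probe. Under these assumptions a telomeric C-rich primer can prime only TERRA, whereas a primer matching the subtelomeric top strand can prime only ARRET.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq: str) -> str:
    """Upper-case and write U as T so RNA and DNA can be compared directly."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    return to_dna(seq).translate(DNA_COMPLEMENT)[::-1]


def can_prime(primer: str, rna: str) -> bool:
    """True if the primer is the reverse complement of some stretch of the RNA."""
    return reverse_complement(primer) in to_dna(rna)


# Hypothetical subtelomeric sequence (top strand, written 5'->3' toward the telomere).
SUBTELOMERE = "ATGACCGTAGGATCAACTGA"

# TERRA: transcribed from the subtelomere into the telomere, so it carries G-rich repeats.
TERRA = SUBTELOMERE.replace("T", "U") + "UUUAGGG" * 4
# ARRET: the antisense (C-rich) transcript, i.e. the reverse complement of TERRA.
ARRET = reverse_complement(TERRA).replace("T", "U")

TELOMERIC_RT_PRIMER = "CCCTAAA" * 3   # anneals to G-rich telomeric RNA
SUBTELOMERIC_RT_PRIMER = SUBTELOMERE   # arm-specific, top-strand sequence

if __name__ == "__main__":
    for name, primer in [("telomeric", TELOMERIC_RT_PRIMER),
                         ("subtelomeric", SUBTELOMERIC_RT_PRIMER)]:
        print(name, "primes TERRA:", can_prime(primer, TERRA),
              "primes ARRET:", can_prime(primer, ARRET))
```

The subsequent PCR step with arm-specific primers then ties each cDNA to a particular chromosome end, which is why the product sizes in Figure 3D differ between arms.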

June 2010 | Volume 6 | Issue 6 | e1000986

Modulation of Telomeric Chromatin by siRNA

and 5L (Figure 5, Table S2). Because these regions consist of unique sequences, the siRNAs can be traced unambiguously to these loci. Interestingly, AGO4-associated siRNAs were particularly enriched at the chromosome ends that also expressed TERRA and ARRET (1L, 3L, 4R, 5L; Figure 5). These data strongly argue that telomeric TERRA and/or ARRET are processed into siRNAs.
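The Argonaute survey counts a small RNA as telomeric when it carries at least 12 nt of perfect telomeric repeat. The authors annotated these hits manually. The Python sketch below is an illustration only and rests on three assumptions:

- "Perfect repeat in any permutation" means an uninterrupted run that matches tandem copies of the 7-nt unit, starting at any rotation of that unit.
- An siRNA is called a G- or C-siRNA according to which strand's repeat gives the longer run.
- The function names and the tie-breaking rule are hypothetical.

```python
G_UNIT = "TTTAGGG"  # G-rich telomeric strand (UUUAGGG in RNA)
C_UNIT = "CCCTAAA"  # C-rich telomeric strand (CCCUAAA in RNA)

def longest_repeat_run(seq: str, unit: str) -> int:
    """Length of the longest uninterrupted match to tandem copies of `unit`,
    allowing the match to start at any rotation (permutation) of the unit."""
    seq = seq.upper().replace("U", "T")  # treat RNA reads as DNA
    k, best = len(unit), 0
    for start in range(len(seq)):
        for phase in range(k):
            length = 0
            while (start + length < len(seq)
                   and seq[start + length] == unit[(phase + length) % k]):
                length += 1
            best = max(best, length)
    return best

def classify_telomeric(sirna: str, min_len: int = 12):
    """Return 'G' or 'C' for a telomeric siRNA, or None when neither strand's
    repeat covers at least `min_len` contiguous nucleotides (hypothetical rule)."""
    g = longest_repeat_run(sirna, G_UNIT)
    c = longest_repeat_run(sirna, C_UNIT)
    if max(g, c) < min_len:
        return None
    return "G" if g >= c else "C"

print(classify_telomeric("UUUAGGGUUUAGGGUUUAGGG"))  # G
print(classify_telomeric("CCCUAAACCCUAAACC"))       # C
print(classify_telomeric("ACGUACGUACGUACGUACGUA"))  # None
```

The brute-force scan is quadratic in read length. That is negligible for 20–25 nt reads, even across hundreds of thousands of sequences.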

The RdDM pathway mediates methylation of telomeric DNA

Plants can methylate cytosines in all sequence contexts, and DNA methylation at asymmetric positions relies largely on 24 nt siRNAs and the RdDM pathway. The presence of telomeric siRNAs prompted us to ask whether telomeric DNA, which contains cytosines exclusively in the CNN context, can be methylated. We took advantage of a unique insertion in the 1L telomere, which allowed us to design primers spanning 13 CCCTAAA repeats in the centromere-proximal part of the 1L telomere (region 1L-0’; Figure 2A). Bisulfite sequencing of the 1L-0’ region in wild-type plants revealed that over 40% of the cytosines in these telomeric repeats are methylated (Figure 6). In contrast, the 1L and 2R subtelomeric regions are devoid of DNA methylation (Figure S4). Telomeric methylation in 1L-0’ is non-randomly distributed and is preferentially enriched at the third cytosine of the CCCTAAA sequence (Figure 6A and 6B). A similar observation was recently made by whole-genome bisulfite sequencing. That study also revealed methylation of telomeric repeats, albeit at a lower overall frequency than reported here [30]. The level of 5-methylcytosine in all sequence contexts was dramatically reduced in rdr2 mutants, arguing that methylation of the 1L-0’ region depends primarily on the RdDM mechanism (Figure 6A and 6C). We next examined whether cytosine methylation, and its dependence on the RdDM pathway, is a general feature of telomeric DNA. We sequentially hybridized bisulfite-treated total genomic DNA with three oligonucleotide probes. The first detected fully converted telomeric DNA (probe AAAATTT). The second detected unconverted, and thus completely methylated, DNA (probe TTTAGGG). The last detected the complementary cytosine-free strand as a loading control (probe CCCTAAA) (Figure 6D). A strong hybridization signal with the AAAATTT probe suggested that the bulk of telomeric DNA is only weakly methylated.
Nevertheless, a portion of wild-type DNA was resistant to bisulfite conversion. Hybridization with the TTTAGGG oligo probe gave a signal ~4-fold higher than the background signal from a corresponding amount of non-methylated, bisulfite-converted telomeric DNA cloned in a plasmid (Figure 6D and 6E). These data further indicate the presence of some heavily methylated CCCTAAA sequences in wild-type plants. Importantly, this signal from unconverted CCCTAAA sequences was reduced to background level in rdr2 and nrpd2a mutants (Figure 6D and 6E). To further investigate whether methylation occurs at telomeres, we performed high-stringency hybridization of the bisulfite-converted samples with a long telomeric TTTAGGG probe (Figure 6C). Under these conditions, converted plasmid-cloned telomeric DNA produces a high background signal. This background is likely caused by sufficiently stable interactions between longer fragments of the converted (TTTTAAA)n telomeric DNA and the (TTTAGGG)n probe. Nevertheless, wild-type DNA samples still produced a signal significantly higher than this background (Figure 6F). Together with bisulfite sequencing of the 1L-0’ telomeric region, these data strongly argue that DNA methylation is a general characteristic of Arabidopsis telomeres and that its maintenance requires the RdDM pathway.
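The probe logic of the dot-blot assay can be checked in silico. The minimal Python sketch below assumes complete conversion: every unmethylated cytosine reads as T, and every 5-methylcytosine stays C. The helper names are hypothetical. The sketch shows three things:

- The unmethylated C-rich strand becomes (TTTTAAA)n, which pairs with the (TTTAAAA)4 oligo used as the AAAATTT probe.
- Only an unconverted C-rich strand still pairs with the TTTAGGG probe.
- The cytosine-free G-rich strand is unchanged, so it can serve as a loading control.

```python
def bisulfite_convert(seq: str, methylated: frozenset = frozenset()) -> str:
    """Simulate complete bisulfite conversion of one DNA strand: every C becomes T
    unless its 0-based index is in `methylated` (5-methylcytosine is protected)."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )

def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(seq))

c_strand = "CCCTAAA" * 4  # telomeric C-rich strand
g_strand = "TTTAGGG" * 4  # telomeric G-rich strand, contains no cytosine

unmethylated = bisulfite_convert(c_strand)
fully_methylated = bisulfite_convert(c_strand, frozenset(range(len(c_strand))))

print(unmethylated)                                      # TTTTAAATTTTAAATTTTAAATTTTAAA
print(reverse_complement(unmethylated))                  # TTTAAAATTTAAAATTTAAAATTTAAAA
print(reverse_complement(fully_methylated) == g_strand)  # True
print(bisulfite_convert(g_strand) == g_strand)           # True
```

A partially methylated repeat, such as one protected only at the third cytosine, gives intermediate sequences (for example, TTCTAAA). These pair perfectly with neither oligo probe, which is one reason bisulfite sequencing was needed to resolve the pattern at 1L-0’.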

Figure 4. Detection of telomeric siRNAs. (A) Northern analysis of small RNAs from wild-type plants and the indicated RdDM mutants. The membrane was hybridized with a CCCTAAA probe, stripped and rehybridized with the TTTAGGG probe. Electronically merged autoradiograms show faster migration of the C-siRNAs, which is due to a sequence bias towards pyrimidines. The loading control is a large RNA that hybridizes to the TTTAGGG probe. (B) Distribution of siRNAs containing at least 12 nt of telomeric sequence among different Argonaute complexes. (C) Distribution of AGO4-associated C- and G-siRNAs according to the extent of homology to telomeric sequence. The total number of siRNAs is indicated on the y-axis. doi:10.1371/journal.pgen.1000986.g004

matic loci and form the most abundant fraction of plant small RNAs [25,26]. They typically associate with Argonaute 4 (AGO4). AGO4 is part of the effector complex that, together with Polymerase V, mediates CNN methylation [27,28]. To determine whether telomeric siRNAs associate with AGO4, we surveyed published datasets containing ~600,000 small RNAs bound to Argonaute proteins (AGO1, AGO2, AGO4 and AGO5) [29]. We identified a total of 133 small RNAs containing a perfect telomeric repeat of at least 12 nucleotides (Table S1). As expected, most of these small RNAs were associated with AGO4 (Figure 4B). Surprisingly, the AGO4-associated telomeric siRNAs were almost exclusively G-siRNAs. Only a few C-siRNAs were found in the dataset, and none contained more than 14 nt of the CCCTAAA repeat sequence (Figure 4C). Since the levels of total G- and C-siRNAs are similar (Figure 4A), this bias may be caused by selective incorporation of G-siRNAs into the AGO4 complex. Because TERRA transcripts are produced both from telomeres and from centromere-located telomeric DNA, the siRNAs may be of either telomeric or centromeric origin. To determine whether telomere-derived transcripts are processed into siRNAs, we aligned Argonaute-associated siRNAs with telomere-adjacent sequences. We found abundant siRNAs corresponding to both strands of subtelomeric DNA at chromosome arms 1L, 1R, 3L, 4R


Figure 5. Distribution of Argonaute-associated siRNAs in telomere-adjacent regions. The diagrams represent subtelomeric regions at the indicated chromosome arms. The rulers indicate the distance from a continuous array of telomeric repeats. Open boxes mark the positions of telomeric repeats interspersed within subtelomeric sequences. Only chromosome arms with siRNAs detected within 5 kb of telomeres are shown. The positions and orientations of AGO-associated siRNAs are indicated by colored bars. Only siRNAs that aligned to unique locations are included (Table S2). The TERRA and ARRET transcripts detected by strand-specific RT-PCR are depicted by red arrows. doi:10.1371/journal.pgen.1000986.g005

Loss of DNA methylation is often accompanied by chromatin remodeling. However, the decrease in telomeric DNA methylation did not cause a significant loss of heterochromatic histone marks. Both H3K9me2 and H3K27me1 remained enriched at the bulk of telomeric DNA in rdr2 mutants (Figure 7A). In contrast, analysis of histone modifications at the 1L-0’ locus by ChIP and quantitative PCR (Figure 7B and 7C) showed a decrease in H3K9me2 and H3K27me1 in rdr2 mutants. These data indicate that the RdDM-dependent mechanism is not solely responsible for heterochromatin formation at telomeres. It nevertheless contributes to heterochromatin maintenance by mediating methylation of telomeric DNA, thereby reinforcing heterochromatic histone modifications. In mice, disruption of telomeric heterochromatin or demethylation of subtelomeric sequences leads to increased telomere elongation and recombination [7]. Our analysis of telomere length and intrachromatid recombination at chromosome ends did not reveal any differences between RdDM mutants and wild-type plants (Figure S5 and Figure S6). This observation further corroborates our finding that, despite reduced DNA methylation, the bulk of telomeric chromatin in rdr2 mutants still retains heterochromatic features.

Discussion

Heterochromatin is a universal characteristic of chromosome termini in a variety of organisms, including yeast, flies and mammals. Subtelomeric regions in these organisms are gene-poor and enriched in moderately to highly repetitive sequences. These sequences contribute to the formation of a heritably repressed chromatin structure at chromosome termini, which shares similarities with pericentromeric heterochromatin [1,31,32]. Nevertheless, some

Figure 6. Methylation of telomeric DNA in wild-type and RdDM-deficient plants. (A) The chart shows the frequency and distribution of cytosine methylation in the 1L-0’ region. In total, 30 clones from five independently treated wild-type samples and 19 clones from three independent rdr2 samples were analyzed. Asterisks indicate the third cytosine in the CCCTAAA sequence. (B) The proportion of methylated cytosines in the 1L-0’ region of wild-type plants according to position within the telomeric repeat, as determined by bisulfite sequencing. (C) Frequency of cytosine methylation in the whole 1L-0’ region according to sequence context in wild-type and rdr2 plants. (D) Cytosine methylation in bulk telomeric DNA assayed by dot-blot hybridization. Bisulfite-treated (BS) genomic DNA was spotted onto a membrane (~7, 33 and 200 ng from each sample) and sequentially hybridized with AAAATTT, TTTAGGG and CCCTAAA probes. Controls were untreated wild-type genomic DNA, and untreated and bisulfite-treated (BS) plasmids carrying ~750 nt of Arabidopsis telomeric DNA. (E,F) Quantification of signals obtained with oligo (E) and long (F) TTTAGGG probes. The signal intensity of non-converted DNA (obtained with the TTTAGGG probe) was normalized to the amount of telomeric DNA determined from hybridization with the CCCTAAA probe. The signal from BS-treated plasmid served to determine background hybridization of the probes to fully converted non-methylated telomeric DNA. Error bars represent standard deviation (N = 3). doi:10.1371/journal.pgen.1000986.g006
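The normalization in panels E and F can be written out explicitly. The sketch below is an assumption about the arithmetic, not the authors' script. For each replicate, it divides the TTTAGGG signal by the matching CCCTAAA loading signal. It then divides that ratio by the mean ratio of the bisulfite-converted plasmid and reports the mean and sample standard deviation across replicates. The signal values in the usage lines are invented for illustration and are not data from the figure.

```python
from statistics import mean, stdev

def loading_normalized(unconverted: float, loading: float) -> float:
    """TTTAGGG (unconverted C-strand) signal divided by the CCCTAAA (loading) signal."""
    if loading <= 0:
        raise ValueError("loading signal must be positive")
    return unconverted / loading

def fold_over_background(samples, plasmid_background):
    """Both arguments are lists of (tttaggg_signal, ccctaaa_signal) pairs.

    Returns (mean, standard deviation) of the per-replicate fold change over the
    mean loading-normalized signal of the BS-converted, non-methylated plasmid.
    """
    background = mean(loading_normalized(t, c) for t, c in plasmid_background)
    folds = [loading_normalized(t, c) / background for t, c in samples]
    return mean(folds), (stdev(folds) if len(folds) > 1 else 0.0)

# Hypothetical signal intensities (arbitrary units), three replicates
wild_type = [(8.0, 2.0), (9.0, 2.0), (7.0, 2.0)]
plasmid = [(1.0, 1.0)]
print(fold_over_background(wild_type, plasmid))  # (4.0, 0.5)
```

Normalizing each replicate before averaging keeps loading differences between dots from inflating the error bars.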


cations are most pronounced immediately next to telomeres and that their presence gradually recedes with increasing distance from telomeres. Data on the genome-wide distribution of H3K9me2 indicate that this also holds true for telomere-associated regions of several other chromosome arms [21]. These data suggest that repressive histone marks are primarily established at telomeres and spread only a limited distance into adjacent subtelomeric sequences. Such relatively small clusters of repressive chromatin (2–5 kb) lie next to otherwise large gene-rich regions. Their existence suggests that heterochromatinization of telomeres is functionally important in Arabidopsis. It further suggests the existence of mechanisms that specifically maintain repressive histone modifications at telomeres. In budding yeast, assembly of heterochromatin at chromosome ends partially depends on tethering Sir proteins to telomeres via the telomere-binding protein Rap1 [31]. The human histone deacetylase SIRT6 preferentially associates with telomeres, although how it is recruited to chromosome termini is not known [3]. A recent study in mice overexpressing TRF2 indicates that, as in yeast, heterochromatin formation at mammalian telomeres may also involve telomere-binding proteins [37]. The discovery of TERRA provides another attractive model, in which the chromatin remodeling machinery is targeted to chromosome termini through ncRNA [38,39]. This idea was recently corroborated by the finding that downregulation of TERRA by RNAi in human cells decreases histone H3K9 methylation [13]. It was proposed that TERRA facilitates heterochromatin formation by stabilizing interactions between heterochromatin factors and telomeric DNA. In this study, we demonstrate expression of telomeric transcripts in Arabidopsis. We also describe a mechanism by which telomeric repeat-containing RNAs affect telomeric chromatin through siRNA.
In mammals, only UUAGGG telomeric transcripts were detected [10,11]. In contrast, both telomeric strands appear to be transcribed from some telomeres in Arabidopsis. This indicates that canonical telomeric DNA may, under certain circumstances, act as a promoter and initiate transcription. Two lines of evidence further corroborate the link between transcription and telomeric DNA in Arabidopsis. First, short stretches of telomeric sequence have been found in numerous Arabidopsis promoters, and these interstitial telomere motifs have been shown to be required for transcription [40]. Second, several transcription factors that specifically bind telomeric DNA in electrophoretic mobility shift assays have been identified in Arabidopsis (reviewed in [41]). Thus, some of these transcription factors may localize to telomeres and promote their expression. In addition to transcripts originating at telomeres, we detected TERRA and ARRET that are apparently generated by transcription of centromere-associated telomeric loci. We cannot currently determine whether the TERRA/ARRET processed by DCL3 into telomeric siRNAs is telomere- or centromere-derived. The requirement of RDR2 for siRNA formation indicates that the predicted dsRNA intermediate is not a simple annealing product of complementary TERRA and ARRET. Instead, it depends on additional RNA-dependent RNA synthesis. Thus, even relatively low-level transcripts can yield significant amounts of siRNA. In fact, direct detection of precursor transcripts in the RdDM pathway has so far been reported only in a special transgene system [42]. In plants, heterochromatic siRNAs guide DNA methyltransferases to specific asymmetric CNN positions in a mechanism that relies on AGO4 [28]. Interestingly, AGO4 appears to retain telomeric G-siRNAs, and not the complementary C-siRNAs, although these data should still

Figure 7. Telomeric heterochromatin in RdDM mutants. (A) Representative ChIP data of wild-type and rdr2 plants. Chromatin was immunoprecipitated with antibodies against histone H3, H3K4me3, H3K9me2 and H3K27me1, blotted onto a membrane and hybridized with the TTTAGGG probe. The same membrane was stripped after exposure and rehybridized with the centromeric CEN180 probe. Data from three independent ChIP experiments were used for quantification. Signals were normalized to mock. (B) Analysis of ChIP fractions from wild-type and rdr2 plants by PCR with primers spanning the 1L-0 locus. (C) qPCR analysis of H3K4me3, H3K9me2 and H3K27me1 at the 1L-0 locus in wild-type and rdr2 plants. Each value represents an average of three qPCR measurements normalized to histone H3 occupancy for each ChIP fraction. The results of three independent pairwise ChIP experiments (1, 2 and 3) are presented. doi:10.1371/journal.pgen.1000986.g007

aspects of chromatin organization appear to be unique at telomeres. Telomeric chromatin in humans and plants displays unusually short nucleosomal spacing (~160 nt) compared with the ~180 nt periodicity of bulk chromatin [33–35]. In contrast to many other organisms, telomeres in Arabidopsis are directly adjacent to transcriptionally active genes. This situation is more similar to silenced transposons inserted in gene-rich regions than to pericentromeric heterochromatin. It is also reflected in the organization of telomeric chromatin, which exhibits features of intermediate heterochromatin, characterized by the presence of both active and repressive histone H3 marks. Such chromatin has been described for some Arabidopsis transposons and transgenic loci [22,36]. Chromatin analysis of the 1L subtelomere demonstrates that repressive histone H3 modifi


described [49]. Centromeric transcripts were detected by hybridization with a [³²P]-labeled CEN180 repeat unit amplified from Arabidopsis genomic DNA using primers CEN1 and CEN2 (Table S3). For RT-PCR analyses of gene expression, ~2 µg of total RNA was reverse transcribed using oligo(dT). For reverse transcription of TERRA and ARRET, the (CCCTAAA)3 oligo or subtelomere-specific primers (Table S3) were used, respectively. The resulting cDNAs were amplified by 25–35 cycles of PCR with specific primers (Table S3). Small RNAs were isolated from inflorescences using the mirVana miRNA isolation kit (Ambion), separated on 15% polyacrylamide gels and electroblotted onto a nylon membrane. Telomeric siRNAs were detected by hybridization with either (TTTAGGG)4 or (TAAACCC)4 oligo probes in ULTRAhyb-Oligo hybridization buffer (Ambion) at 42 °C. The artificial 25 and 23 nt siRNAs were synthesized by in vitro transcription using T7 RNA polymerase (MBI). The T7-TOP oligonucleotide (10 µM) was annealed to a template oligonucleotide (10 µM) as indicated in Figure S3. In vitro transcription was carried out for 60 min at 37 °C in 50 µL of 1× Transcription buffer (MBI). The reaction contained 30 U of T7 RNA polymerase (MBI), the annealed oligos (0.5 µM), NTPs (10 mM) and RiboLock RNase inhibitor (MBI). Then 25 µL of the reaction was separated on a 15% polyacrylamide gel, electroblotted onto a nylon membrane and analyzed by Southern hybridization.

be verified by Northern analysis of siRNAs co-immunoprecipitated with AGO4. It is unknown whether the bias towards G-siRNAs is biologically significant. It is interesting, however, that the AGO4 complex appears to specifically retain siRNAs complementary to the telomeric strand that is methylated. Our data show methylation of bulk telomeric DNA as well as heavy methylation of the centromere-proximal region of the 1L telomere. Together with data from whole-genome bisulfite sequencing [30], they argue that telomeric heterochromatin in Arabidopsis is defined not only by histone modifications but also by DNA methylation. Mammalian telomeres lack CG sites and are thus believed to be unmethylated. Nevertheless, at least two proteins linked to DNA methylation (SMCHD1, MBD3) have been found in purified fractions of human telomeric chromatin [43]. Additionally, the recent discovery of CNN and CNG methylation in human embryonic stem cells warrants re-examination of DNA methylation at human telomeres [44]. We demonstrate that maintenance of telomeric DNA methylation depends to a large extent on heterochromatic siRNA and the RdDM machinery. Intriguingly, loss of telomeric DNA methylation has only a slight effect on histone methylation at bulk telomeres. This indicates that assembly of Arabidopsis telomeric heterochromatin relies on several reinforcing mechanisms that recruit histone methyltransferases such as SUVH4 to telomeres [45]. Loss of DNA methylation has a more profound effect on histone methylation at the centromere-proximal part of the 1L telomere. This suggests that RdDM may help maintain heterochromatin at the boundary between telomeres and adjacent euchromatic genes. The involvement of siRNA in modulation of telomeric heterochromatin may not be restricted to plants.
Our data in Arabidopsis are reminiscent of the situation in fission yeast. There, heterochromatin in subtelomeric regions is established by two independent pathways: one relies on the telomere-binding protein Taz1, and the other involves RNA-induced transcriptional silencing (RITS) [46]. However, in Arabidopsis siRNA targets canonical telomeric repeats, whereas RITS in fission yeast is directed at centromere-like sequences located ~15 kb from telomeres. In humans, TERRA has been proposed to act as a scaffold, reinforcing interactions between telomere-binding proteins and heterochromatin factors such as ORC1 and HP1 [13]. Nevertheless, human TERRA could also promote heterochromatin formation through an siRNA-mediated pathway. This notion is supported by the observation that enrichment of Argonaute-1 at human telomeres correlates with increased H3K9 methylation and HP1 association [47]. It is further supported by the discovery of telomere-derived human siRNAs [48].

Analysis of DNA methylation

Genomic DNA was extracted from 4-week-old plants with the DNeasy Plant Maxi Kit (Qiagen). Bisulfite modification was performed using the EpiTect Bisulphite Kit (Qiagen) according to the manufacturer’s instructions. The completeness of conversion was tested by PCR amplification of a non-methylated genomic region [50]. Modified DNA was used as a template for PCR amplification with the primers indicated in Table S3. The PCR products were cloned into the pCR2.1 TOPO cloning vector (Invitrogen) and sequenced using BigDye terminator chemistry and an ABI310 sequencer (Applied Biosystems). The sequences of the clones were analyzed with the software CyMATE [50]. The efficiency of cytosine conversion in the 1L-0’ region was further controlled in one of two ways. Genomic DNA was either spiked with a bacterial plasmid containing a region that partially overlaps with 1L-0’, or other genomic loci devoid of 5-methylcytosine were sequenced. For methylation analysis of bulk telomeric DNA, bisulfite-modified genomic DNA was transferred onto a nylon membrane by vacuum blotting. As a control, a bisulfite-modified plasmid containing 750 bp of non-methylated plant telomeric DNA was blotted onto the membrane. Its amount roughly corresponded to the total amount of telomeric DNA in the genomic samples: 1 ng of the plasmid contained telomeric DNA equivalent to ~260 ng of genomic DNA. The membrane was hybridized with the [³²P] 5′ end-labeled (TTTAAAA)4 oligo (AAAATTT probe) in a standard hybridization buffer [49] at 40 °C. The membrane was washed twice for 10 min at 40 °C in 2× SSC, followed by a 40 min wash in 1× SSC at 40 °C. The membrane was exposed to a Kodak phosphor screen (Biorad) and scanned with a Molecular Imager FX (Biorad). The membrane was then stripped and sequentially rehybridized with the TTTAGGG and CCCTAAA oligo probes at 55 °C as described [49].
The final rehybridization was performed at 65 °C with a strand-specific (TTTAGGG)n probe obtained by labeling a 750 bp fragment of telomeric DNA with α-[³²P]-GTP. The signals were quantified using Quantity One software (Biorad).
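The bisulfite clones were scored with CyMATE, which reports methylation by sequence context. The bookkeeping behind such a report can be sketched in Python. This is an illustration under stated assumptions, not a re-implementation of CyMATE:

- Clones are gap-free reads of the same strand and length as the reference.
- A C at a reference cytosine counts as methylated, and a T counts as unmethylated.
- Contexts follow the common CG/CHG/CHH convention (H = A, C or T), where CHH is the asymmetric context this article writes as CNN.
- The function names are hypothetical.

```python
from collections import Counter

def c_context(ref: str, i: int):
    """Context of the cytosine at ref[i] on the same strand: 'CG', 'CHG', 'CHH',
    or None when the sequence ends too early to decide."""
    if i + 1 < len(ref) and ref[i + 1] == "G":
        return "CG"
    if i + 2 < len(ref):
        return "CHG" if ref[i + 2] == "G" else "CHH"
    return None

def methylation_by_context(ref: str, clones: list[str]) -> dict[str, float]:
    """Percentage of methylated cytosines per context across bisulfite clones."""
    methylated, scored = Counter(), Counter()
    for read in clones:
        if len(read) != len(ref):
            raise ValueError("clones must be gap-free and as long as the reference")
        for i, base in enumerate(ref):
            if base != "C":
                continue
            ctx = c_context(ref, i)
            if ctx is None or read[i] not in ("C", "T"):
                continue  # undecidable context or sequencing mismatch: skip
            scored[ctx] += 1
            if read[i] == "C":  # protected from conversion: 5-methylcytosine
                methylated[ctx] += 1
    return {ctx: 100.0 * methylated[ctx] / scored[ctx] for ctx in scored}

# Toy example: three telomeric repeats; one clone unmethylated, one clone
# methylated only at the third C of each CCCTAAA (all cytosines are CHH here)
ref = "CCCTAAA" * 3
clones = ["TTTTAAA" * 3, "TTCTAAA" * 3]
print(methylation_by_context(ref, clones))  # {'CHH': 16.666666666666668}
```

Per-position frequencies, as shown in Figure 6B, could be collected the same way by keying the counters on the index within the repeat (`i % 7`) instead of the context.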

Materials and Methods

Plant material and growth conditions

Arabidopsis mutants carrying the following alleles were used in this study: dcl3-1 (dcl3), rdr2-1 (rdr2), nrpd1a-4 (nrpd1), nrpd1b-1 (nrpe1), sgs2-1 (rdr6), drd1-1 (drd1) and nrpd2a-1 (nrpd2). Plants were grown in soil under long-day conditions (16 h light/8 h dark) at 22 °C.

RNA analyses

Total RNA was extracted using TriReagent solution (Sigma). For Northern blot analysis, 10 µg aliquots were separated on 1.2% formaldehyde agarose gels, blotted onto a nylon membrane and hybridized with [³²P] 5′ end-labeled (TTTAGGG)4 (TTTAGGG probe) or (TAAACCC)4 (CCCTAAA probe) oligonucleotides. Oligo hybridizations were carried out at 55 °C as previously


sequences in close proximity. To look for such loci in the Arabidopsis genome, we performed PCR with combinations of two primer sets. One set flanks the centromeric CEN180 satellite repeat (CEN1, CEN2). The other anneals to either the C-rich or the G-rich telomeric strand (TelC, TelG). The reaction with both centromeric primers produced a ladder whose periodicity corresponds to the size of the CEN180 repeat unit (180 bp; lane 1). Importantly, products were also amplified in reactions in which the CEN1 primer was combined with either telomeric primer (lanes 5 and 6), demonstrating that telomeric sequences are adjacent to the CEN180 repeat. Furthermore, strong amplification products were obtained in reactions containing a single telomeric primer (lanes 8 and 9), indicating the existence of sequences that contain telomeric repeats in inverted orientation. (C) Intrachromosomal telomeric sequences do not efficiently hybridize to a telomeric probe under high-stringency conditions. Genomic DNA was digested with TruI1 restriction endonuclease, blotted onto a membrane and hybridized to a TTTAGGG probe under high-stringency conditions (65 °C). The membrane was stripped after exposure and rehybridized to an oligo TTTAGGG probe under low-stringency conditions (55 °C). At low stringency, interstitial telomeric DNA (~0.5 kb) contributed ~21% of the total telomeric signal. At high stringency, these sequences were barely detectable, and more than 98% of the signal was derived from terminal restriction fragments of 2–4 kb.
[Uchida W, Matsunaga S, Sugiyama R, Shibata F, Kazama Y, et al. (2002) Distribution of interstitial telomere-like repeats and their adjacent sequences in a dioecious plant, Silene latifolia. Chromosoma 111: 313–320.
Vannier JB, Depeiges A, White C, Gallego ME (2009) ERCC1/XPF protects short telomeres from homologous recombination in Arabidopsis thaliana. PLoS Genet 5: e1000380.
Armstrong SJ, Caryl AP, Jones GH, Franklin FC (2002) Asy1, a protein required for meiotic chromosome synapsis, localizes to axis-associated chromatin in Arabidopsis and Brassica. J Cell Sci 115: 3645–3655.]
Found at: doi:10.1371/journal.pgen.1000986.s001 (9.55 MB TIF)

Chromatin immunoprecipitation

Chromatin isolation and immunoprecipitation were performed as described [51] using antibodies against histone H3 (Abcam; cat. no. ab1791), H3K9me2 (Abcam; cat. no. ab1220), H3K4me3 (Abcam; cat. no. ab8580) and H3K27me1 (provided by Thomas Jenuwein). The DNA was column-purified from immunoprecipitated chromatin and concentrated in 50 µl of elution buffer. For dot-blot analysis, 40 µl of the DNA was blotted onto a nylon membrane and analyzed by hybridization with a [³²P]-labeled 750 bp (TTTAGGG)n probe. For PCR analysis, 1 µl of the eluted DNA was amplified by 30 cycles of PCR with the primers specified in Table S3. Quantitative PCR analysis of the 1L-0 region was performed using the iQ5 Real-Time PCR Detection System (Biorad) and a 2× SensiMix Plus SYBR Kit (PeqLab).
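The Figure 7C legend states that qPCR values were normalized to histone H3 occupancy but does not give the formula. A common choice is the ΔCt method, and the sketch below assumes it, together with three further assumptions:

- The modification and H3 ChIP fractions have equal template input.
- Both amplify with the same per-cycle efficiency.
- Triplicate Ct values are averaged before exponentiation.

The function name and the Ct values are hypothetical.

```python
from statistics import mean

def enrichment_over_h3(ct_mark: list[float], ct_h3: list[float],
                       efficiency: float = 2.0) -> float:
    """Histone-mark signal relative to H3 occupancy (delta-Ct method).

    Assumes equal template input for both ChIP fractions and the same
    per-cycle amplification factor `efficiency` (2.0 = perfect doubling).
    """
    return efficiency ** (mean(ct_h3) - mean(ct_mark))

# Hypothetical triplicate Ct values at the 1L-0 locus for one ChIP experiment
wt = enrichment_over_h3([26.1, 26.0, 26.2], [24.1, 24.0, 24.2])
rdr2 = enrichment_over_h3([27.1, 27.0, 27.2], [24.1, 24.0, 24.2])
print(round(rdr2 / wt, 3))  # 0.5: one extra cycle means half the relative signal
```

If primer efficiency has been measured from a standard curve, pass it as `efficiency` rather than assuming perfect doubling.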

Analysis of Argonaute-associated siRNAs

The sequences of Argonaute-associated siRNAs were retrieved from NCBI GEO (accession number GSE10036). The individual AGO datasets were searched for siRNAs containing a string of at least 12 nucleotides of Arabidopsis telomeric repeat in any possible permutation. Telomeric siRNAs were copied into an Excel table and manually annotated. Subtelomeric siRNAs were identified by aligning all Argonaute-associated siRNAs to ~15 kb of sequence from the ends of each Arabidopsis chromosome using the publicly available program SOAP [52]. The subtelomeric sequences were taken from the whole-chromosome sequences available in TAIR and from cloned fragments of telomere-associated sequences deposited in GenBank (AB033278 and AM177017). Perfectly matching siRNA alignments were retained and plotted using R.
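The alignments themselves were done with SOAP. The sketch below is a stand-in that only mirrors the filtering rule stated for Figure 5: keep perfect matches that fall at a unique location. It searches both strands of each subtelomeric region by exact string matching and keeps an siRNA only if it has exactly one hit overall. This naive search is an assumption for illustration and is far slower than an indexed aligner. The function names and toy sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(pairs[b] for b in reversed(seq))

def find_all(text: str, pattern: str) -> list[int]:
    """All 0-based start positions of `pattern` in `text`, overlaps included."""
    hits, pos = [], text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits

def unique_perfect_hit(sirna: str, subtelomeres: dict[str, str]):
    """Return (region, position, strand) if the siRNA matches exactly once on
    either strand across all regions; otherwise return None."""
    fwd = sirna.upper().replace("U", "T")
    rev = reverse_complement(fwd)
    hits = []
    for name, seq in subtelomeres.items():
        seq = seq.upper()
        hits += [(name, p, "+") for p in find_all(seq, fwd)]
        if rev != fwd:  # a palindromic siRNA would otherwise be counted twice
            hits += [(name, p, "-") for p in find_all(seq, rev)]
    return hits[0] if len(hits) == 1 else None

# Toy subtelomeric regions (not real Arabidopsis sequence)
regions = {"chr1L": "GATTACAGGCATCCAGTTTAGGGTTTAGGG", "chr3L": "CCCTTAGGACCAAGTT"}
print(unique_perfect_hit("GGCAUCCAG", regions))  # ('chr1L', 7, '+')
print(unique_perfect_hit("CUGGAUGCC", regions))  # ('chr1L', 7, '-')
print(unique_perfect_hit("UUUAGGG", regions))    # None: two hits, not unique
```

The last call shows why telomeric repeats themselves drop out of such a filter: reads made only of repeat units match many places, so only reads anchored in unique subtelomeric sequence survive.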

Cytology

Mitotic chromosomes prepared from pistils of wild-type plants were subjected to fluorescence in situ hybridization (FISH) with a Cy3-conjugated (CCCTAAA)2 PNA probe (Metabion) as previously described [53]. Chromosomes were examined using a Zeiss Axioscope fluorescence microscope equipped with a CCD camera.

Figure S2 Analysis of bidirectional transcription of the GUS reporter gene from the 2Rp promoter. (A) Schematic representation of the constructs used in this experiment. A ~500 bp genomic fragment (indicated by an arrow) located between the telomere and At2g48160 was cloned in sense and antisense orientations in front of the GUS reporter gene containing a ~200 nt intron (the intron is indicated by a thin line; empty boxes represent exons). The resulting constructs (2Rp:GUS and R2p:GUS, respectively) were randomly inserted into the Arabidopsis genome by Agrobacterium-mediated transformation. T2 transgenic plants were analyzed for GUS expression by histochemical GUS assay and RT-PCR. Primers used for RT-PCR are indicated by arrows. (B) In total, T2 seedlings of 12 independent transgenic lines carrying the 2Rp:GUS construct and 11 lines carrying R2p:GUS were analyzed by histochemical GUS assay for GUS enzymatic activity. Two representative lines for each construct are shown. While 11 out of 12 2Rp:GUS lines produced blue staining, none of the R2p:GUS lines gave a positive GUS signal. This experiment shows robust activity of the 2R promoter and confirms the RT-PCR data on expression of the At2g48160 gene (Figure 1). (C) To further analyze whether the 2R promoter can drive expression in the antisense orientation, the presence of spliced GUS transcripts was examined by RT-PCR. The expected 133 bp product was readily detected in all analyzed R2p:GUS lines. Interestingly, a weak but specific product was also amplified in four out of six 2Rp:GUS lines. These data are consistent with the RT-PCR

Telomere analyses

The PETRA assay was carried out with genomic DNA extracted from a fifth-generation tert mutant plant [54] according to the published protocol [18]. Terminal restriction fragment analysis was performed as described [49,55]. Intrachromatid telomeric recombination was analyzed by the t-circle amplification assay [56]. DNA extracted from Arabidopsis ku70 [57] mutants was used as a positive control.

Supporting Information

Localization of telomeric DNA in Arabidopsis centromeres. A survey of the Arabidopsis genome identified several regions carrying short stretches of telomeric sequence near centromeres (Uchida et al., 2002; Vannier et al., 2009). Fluorescence in situ hybridization (FISH) on pachytene chromosomes further showed co-localization of CEN180 and telomeric signals at the centromere of chromosome 1 (Armstrong et al., 2002). (A) This centromere-localized telomeric DNA is also readily detectable by FISH on mitotic metaphase chromosomes. The picture shows a diploid metaphase figure with ten Arabidopsis chromosomes counterstained with DAPI (red). Green signals represent telomeric DNA; the chromosome pair carrying centromere-localized telomeric DNA is indicated by arrows. (B) The poor annotation of Arabidopsis centromeres precluded in silico identification of genomic loci carrying telomeric and CEN180

Figure S1


Figure S5 Telomere length analysis in RdDM-deficient mutants.

analysis of TERRA at 2R (Figure 3D) and demonstrate that the 2Rp promoter can initiate transcription into the telomere. Found at: doi:10.1371/journal.pgen.1000986.s002 (3.40 MB TIF)

Southern analysis of Tru9I-digested genomic DNA hybridized with a telomeric probe. Each lane represents a DNA sample extracted from a single plant. Telomere length in all analyzed mutants falls within the range typical of wild-type plants (2–5 kb). Found at: doi:10.1371/journal.pgen.1000986.s005 (1.63 MB TIF)

Figure S3 Determination of the size of the telomeric siRNAs. (A) Synthetic telomeric DNA and RNA oligonucleotides were used as size markers. DNA oligonucleotides are written in black. The telomeric G-RNAs (23 nt, 25 nt) and the 23 nt C-RNA (written in blue) were synthesized by in vitro transcription from dsDNA produced by annealing the DNA oligonucleotides as indicated. (B) This panel determines the difference in migration between complementary telomeric RNA oligonucleotides. In vitro transcribed telomeric RNAs and the indicated synthetic DNA oligonucleotides were separated by PAGE, electroblotted onto a nylon membrane and sequentially hybridized with the CCCTAAA and TTTAGGG probes (only the right part of the membrane is shown after TTTAGGG hybridization). The 23 and 25 nt G-RNAs migrate like the 25 and 27 nt TelG DNA oligonucleotides, respectively, whereas the 23 nt C-RNA migrates like the 22 nt TelG DNA oligonucleotide. This experiment demonstrates that C-RNA migrates faster than G-RNA of the same size. (C) DNA oligonucleotides were used as size markers. They were separated together with plant siRNAs by PAGE, blotted onto a membrane and hybridized with the radioactively labeled CCCTAAA probe. Plant floral G-siRNAs (marked by asterisks) migrate like 25–26 nt TelG DNA oligonucleotides. The signal was stripped after exposure, and the membrane was rehybridized with the TTTAGGG probe to detect the C-siRNAs (marked by asterisks). The signal from the TelG DNA oligonucleotides was not completely stripped, so we could use it as a marker: plant C-siRNAs migrate like 24–25 nt TelG DNA oligonucleotides. Taking into account the difference in migration between telomeric DNA and RNA (Figure S3B), we calculate that plant G-siRNAs are 23–24 nt and C-siRNAs are 24–25 nt long. Found at: doi:10.1371/journal.pgen.1000986.s003 (10.61 MB TIF)

Figure S6 Analysis of intrachromatid recombination in RdDM-deficient mutants. Intrachromatid telomere recombination leads to excision of telomeric extrachromosomal circular DNA molecules (t-circles). We used the highly sensitive t-circle amplification assay to analyze the level of t-circles in RdDM mutants [56]. Genomic DNA was digested with the AluI restriction enzyme, and digestion-resistant t-circles were used as templates for primer extension via rolling circle amplification by the highly processive Phi29 polymerase. The high molecular weight products of rolling circle replication (indicated by an arrow) were separated from the bulk of digested genomic DNA by alkaline electrophoresis and detected by Southern hybridization with a telomeric probe. Whereas a strong t-circle signal was obtained in ku70 mutants, which exhibit increased telomeric recombination [56], no t-circles (which would indicate an elevated level of recombination) were detected in the RdDM-deficient plants. Found at: doi:10.1371/journal.pgen.1000986.s006 (1.30 MB TIF)

Table S1 Argonaute-associated telomeric siRNAs. The region of homology to the Arabidopsis telomeric sequence is indicated in red. Found at: doi:10.1371/journal.pgen.1000986.s007 (0.03 MB XLS)

Table S2 Argonaute-associated siRNAs that uniquely align to subtelomeric regions. Found at: doi:10.1371/journal.pgen.1000986.s008 (0.05 MB XLS)

Table S3 Primers used in this study. Found at: doi:10.1371/journal.pgen.1000986.s009 (0.09 MB DOC)
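Figure S4 rests on bisulfite conversion. Unmethylated cytosines are converted to uracil and read as T after PCR, whereas 5-methylcytosines are protected and still read as C. In plants each cytosine is usually classified by sequence context as CG, CHG or CHH (H = A, C or T). The sketch below is not CyMATE [50] and does not reproduce its algorithm. It is a hypothetical minimal scorer for one gap-free, top-strand clone aligned to its reference, and the function names and example sequences are invented for illustration. A retained C is scored as methylated, although in real data it can also reflect incomplete conversion.

```python
def cytosine_context(ref, i):
    """Return 'CG', 'CHG', 'CHH', or None if the context runs off the end of ref."""
    if i + 1 < len(ref) and ref[i + 1] == "G":
        return "CG"
    if i + 2 >= len(ref):
        return None  # not enough downstream sequence to decide
    return "CHG" if ref[i + 2] == "G" else "CHH"


def call_methylation(ref, read):
    """Score every reference C in a gap-free bisulfite read aligned to ref.

    Returns a list of (position, context, is_methylated) tuples.
    """
    ref, read = ref.upper(), read.upper()
    if len(ref) != len(read):
        raise ValueError("ref and read must be aligned and of equal length")
    calls = []
    for i, (r, b) in enumerate(zip(ref, read)):
        if r != "C" or b not in ("C", "T"):
            continue  # not a cytosine, or an unexpected base (e.g. sequencing error)
        calls.append((i, cytosine_context(ref, i), b == "C"))
    return calls


def summarize(calls):
    """Fraction of methylated cytosines per context; unclassified sites are skipped."""
    totals = {}
    for _, ctx, methylated in calls:
        if ctx is None:
            continue
        m, n = totals.get(ctx, (0, 0))
        totals[ctx] = (m + methylated, n + 1)
    return {ctx: m / n for ctx, (m, n) in totals.items()}


# Hypothetical reference and converted clone.
calls = call_methylation("ACGACAGACCT", "ACGATAGATCT")
print(calls)
print(summarize(calls))
```

An "almost complete lack of DNA methylation", as reported for the subtelomeric regions, would appear here as fractions close to 0.0 in every context across all clones.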

Figure S4 Cytosine methylation in the subtelomeric regions 2R’, 1L-2’, and 1L-3’. Wild-type bisulfite-treated genomic DNA was used as a template for PCR with primers spanning the subtelomeric regions 2R’, 1L-2’, and 1L-3’ (Figure 2B). The diagrams showing the distribution of 5-methylcytosines in individual clones were generated using CyMATE software for analysis of sequencing data from bisulfite-converted samples [50]. The data show an almost complete lack of DNA methylation in these regions. Found at: doi:10.1371/journal.pgen.1000986.s004 (9.05 MB TIF)

Acknowledgments

We thank Thomas Jenuwein for providing the H3K27me1 antibody and Maria Siomos and Marjori Matzke for helpful comments on the manuscript.

Author Contributions Conceived and designed the experiments: JV SA JMW BV WA KR. Performed the experiments: JV SA JMW WA. Analyzed the data: JV SA JMW TLT WA KR. Contributed reagents/materials/analysis tools: LD. Wrote the paper: JMW KR.

References

1. Blasco MA (2007) The epigenetic regulation of mammalian telomeres. Nat Rev Genet 8: 299–309. 2. Ottaviani A, Gilson E, Magdinier F (2008) Telomeric position effect: from the yeast paradigm to human pathologies? Biochimie 90: 93–107. 3. Michishita E, McCord RA, Berber E, Kioi M, Padilla-Nash H, et al. (2008) SIRT6 is a histone H3 lysine 9 deacetylase that modulates telomeric chromatin. Nature 452: 492–496. 4. Garcia-Cao M, O’Sullivan R, Peters AH, Jenuwein T, Blasco MA (2004) Epigenetic regulation of telomere length in mammalian cells by the Suv39h1 and Suv39h2 histone methyltransferases. Nat Genet 36: 94–99. 5. Gonzalo S, Garcia-Cao M, Fraga MF, Schotta G, Peters AH, et al. (2005) Role of the RB1 family in stabilizing histone methylation at constitutive heterochromatin. Nat Cell Biol 7: 420–428. 6. Jones B, Su H, Bhat A, Lei H, Bajko J, et al. (2008) The histone H3K79 methyltransferase Dot1L is essential for mammalian development and heterochromatin structure. PLoS Genet 4: e1000190. doi:10.1371/journal.pgen.1000190. 7. Gonzalo S, Jaco I, Fraga MF, Chen T, Li E, et al. (2006) DNA methyltransferases control telomere length and telomere recombination in mammalian cells. Nat Cell Biol 8: 416–424. 8. Zaratiegui M, Irvine DV, Martienssen RA (2007) Noncoding RNAs and gene silencing. Cell 128: 763–776. 9. Luke B, Panza A, Redon S, Iglesias N, Li Z, et al. (2008) The Rat1p 5′ to 3′ exonuclease degrades telomeric repeat-containing RNA and promotes telomere elongation in Saccharomyces cerevisiae. Mol Cell 32: 465–477. 10. Azzalin CM, Reichenbach P, Khoriauli L, Giulotto E, Lingner J (2007) Telomeric repeat containing RNA and RNA surveillance factors at mammalian chromosome ends. Science 318: 798–801. 11. Schoeftner S, Blasco MA (2008) Developmentally regulated transcription of mammalian telomeres by DNA-dependent RNA polymerase II. Nat Cell Biol 10: 228–236. 12. Yehezkel S, Segev Y, Viegas-Pequignot E, Skorecki K, Selig S (2008) Hypomethylation of subtelomeric regions in ICF syndrome is associated with abnormally short telomeres and enhanced transcription from telomeric regions. Hum Mol Genet 17: 2776–2789. 13. Deng Z, Norseen J, Wiedmer A, Riethman H, Lieberman PM (2009) TERRA RNA binding to TRF2 facilitates heterochromatin formation and ORC recruitment at telomeres. Mol Cell 35: 403–413.

PLoS Genetics | www.plosgenetics.org

June 2010 | Volume 6 | Issue 6 | e1000986

Modulation of Telomeric Chromatin by siRNA

14. Moazed D (2009) Small RNAs in transcriptional gene silencing and genome defence. Nature 457: 413–420. 15. Pikaard CS, Haag JR, Ream T, Wierzbicki AT (2008) Roles of RNA polymerase IV in gene silencing. Trends Plant Sci 13: 390–397. 16. Matzke M, Kanno T, Daxinger L, Huettel B, Matzke AJ (2009) RNA-mediated chromatin-based silencing in plants. Curr Opin Cell Biol. 17. Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815. 18. Heacock M, Spangler E, Riha K, Puizina J, Shippen DE (2004) Molecular analysis of telomere fusions in Arabidopsis: multiple pathways for chromosome end-joining. EMBO J 23: 2304–2313. 19. Copenhaver GP, Pikaard CS (1996) RFLP and physical mapping with an rDNA-specific endonuclease reveals that nucleolus organizer regions of Arabidopsis thaliana adjoin the telomeres on chromosomes 2 and 4. Plant J 9: 259–272. 20. Kuo HF, Olsen KM, Richards EJ (2006) Natural variation in a subtelomeric region of Arabidopsis: implications for the genomic dynamics of a chromosome end. Genetics 173: 401–417. 21. Bernatavichute YV, Zhang X, Cokus S, Pellegrini M, Jacobsen SE (2008) Genome-wide association of histone H3 lysine nine methylation with CHG DNA methylation in Arabidopsis thaliana. PLoS ONE 3: e3156. doi:10.1371/journal.pone.0003156. 22. Habu Y, Mathieu O, Tariq M, Probst AV, Smathajitt C, et al. (2006) Epigenetic regulation of transcription in intermediate heterochromatin. EMBO Rep 7: 1279–1284. 23. May BP, Lippman ZB, Fang Y, Spector DL, Martienssen RA (2005) Differential regulation of strand-specific transcripts from Arabidopsis centromeric satellite repeats. PLoS Genet 1: e79. doi:10.1371/journal.pgen.0010079. 24. Huettel B, Kanno T, Daxinger L, Bucher E, van der Winden J, et al. (2007) RNA-directed DNA methylation mediated by DRD1 and Pol IVb: a versatile pathway for transcriptional gene silencing in plants. Biochim Biophys Acta 1769: 358–374. 25. Mosher RA, Schwach F, Studholme D, Baulcombe DC (2008) PolIVb influences RNA-directed DNA methylation independently of its role in siRNA biogenesis. Proc Natl Acad Sci U S A 105: 3145–3150. 26. Zhang X, Henderson IR, Lu C, Green PJ, Jacobsen SE (2007) Role of RNA polymerase IV in plant small RNA metabolism. Proc Natl Acad Sci U S A 104: 4536–4541. 27. He XJ, Hsu YF, Zhu S, Wierzbicki AT, Pontes O, et al. (2009) An effector of RNA-directed DNA methylation in Arabidopsis is an ARGONAUTE 4- and RNA-binding protein. Cell 137: 498–508. 28. Wierzbicki AT, Ream TS, Haag JR, Pikaard CS (2009) RNA polymerase V transcription guides ARGONAUTE4 to chromatin. Nat Genet 41: 630–634. 29. Mi S, Cai T, Hu Y, Chen Y, Hodges E, et al. (2008) Sorting of small RNAs into Arabidopsis argonaute complexes is directed by the 5′ terminal nucleotide. Cell 133: 116–127. 30. Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, et al. (2008) Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452: 215–219. 31. Perrod S, Gasser SM (2003) Long-range silencing and position effects at telomeres and centromeres: parallels and differences. Cell Mol Life Sci 60: 2303–2318. 32. Pryde FE, Gorham HC, Louis EJ (1997) Chromosome ends: all the same under their caps. Curr Opin Genet Dev 7: 822–828. 33. Tommerup H, Dousmanis A, de Lange T (1994) Unusual chromatin in human telomeres. Mol Cell Biol 14: 5777–5785. 34. Fajkus J, Kovarik A, Kralovics R, Bezdek M (1995) Organization of telomeric and subtelomeric chromatin in the higher plant Nicotiana tabacum. Mol Gen Genet 247: 633–638. 35. Sykorova E, Fajkus J, Ito M, Fukui K (2001) Transition between two forms of heterochromatin at plant subtelomeres. Chromosome Res 9: 309–323.


36. Lippman Z, May B, Yordan C, Singer T, Martienssen R (2003) Distinct mechanisms determine transposon inheritance and methylation via small interfering RNA and histone modification. PLoS Biol 1: e67. doi:10.1371/journal.pbio.0000067. 37. Benetti R, Schoeftner S, Munoz P, Blasco MA (2008) Role of TRF2 in the assembly of telomeric chromatin. Cell Cycle 7: 3461–3468. 38. Horard B, Gilson E (2008) Telomeric RNA enters the game. Nat Cell Biol 10: 113–115. 39. Luke B, Lingner J (2009) TERRA: telomeric repeat-containing RNA. EMBO J 28: 2503–2510. 40. Tremousaygue D, Manevski A, Bardet C, Lescure N, Lescure B (1999) Plant interstitial telomere motifs participate in the control of gene expression in root meristems. Plant J 20: 553–561. 41. Zellinger B, Riha K (2007) Composition of plant telomeres. Biochim Biophys Acta 1769: 399–409. 42. Daxinger L, Kanno T, Bucher E, van der Winden J, Naumann U, et al. (2009) A stepwise pathway for biogenesis of 24-nt secondary siRNAs and spreading of DNA methylation. EMBO J 28: 48–57. 43. Dejardin J, Kingston RE (2009) Purification of proteins associated with specific genomic Loci. Cell 136: 175–186. 44. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 45. Grafi G, Ben-Meir H, Avivi Y, Moshe M, Dahan Y, et al. (2007) Histone methylation controls telomerase-independent telomere lengthening in cells undergoing dedifferentiation. Dev Biol 306: 838–846. 46. Kanoh J, Sadaie M, Urano T, Ishikawa F (2005) Telomere binding protein Taz1 establishes Swi6 heterochromatin independently of RNAi at telomeres. Curr Biol 15: 1808–1819. 47. Ho CY, Murnane JP, Yeung AK, Ng HK, Lo AW (2008) Telomeres acquire distinct heterochromatin characteristics during siRNA-induced RNA interference in mouse cells. Curr Biol 18: 183–187. 48. Cao F, Li X, Hiew S, Brady H, Liu Y, et al. (2009) Dicer independent small RNAs associate with telomeric heterochromatin. RNA 15: 1274–1281. 49. Riha K, Fajkus J, Siroky J, Vyskot B (1998) Developmental control of telomere lengths and telomerase activity in plants. Plant Cell 10: 1691–1698. 50. Hetzl J, Foerster AM, Raidl G, Mittelsten Scheid O (2007) CyMATE: a new tool for methylation analysis of plant genomic DNA after bisulphite sequencing. Plant J 51: 526–536. 51. Lawrence RJ, Earley K, Pontes O, Silva M, Chen ZJ, et al. (2004) A concerted DNA methylation/histone methylation switch regulates rRNA gene dosage control and nucleolar dominance. Mol Cell 13: 599–609. 52. Li R, Li YR, Kristiansen K, Wang J (2008) SOAP: short oligonucleotide alignment program. Bioinformatics 24: 713. 53. Akimcheva S, Zellinger B, Riha K (2008) Genome stability in Arabidopsis cells exhibiting alternative lengthening of telomeres. Cytogenet Genome Res 122: 388–395. 54. Riha K, McKnight TD, Griffing LR, Shippen DE (2001) Living with genome instability: plant responses to telomere dysfunction. Science 291: 1797–1800. 55. Fitzgerald MS, Riha K, Gao F, Ren S, McKnight TD, et al. (1999) Disruption of the telomerase catalytic subunit gene from Arabidopsis inactivates telomerase and leads to a slow loss of telomeric DNA. Proc Natl Acad Sci U S A 96: 14813–14818. 56. Zellinger B, Akimcheva S, Puizina J, Schirato M, Riha K (2007) Ku suppresses formation of telomeric circles and alternative telomere lengthening in Arabidopsis. Mol Cell 27: 163–169. 57. Riha K, Watson JM, Parkey J, Shippen DE (2002) Telomere length deregulation and enhanced sensitivity to genotoxic stress in Arabidopsis mutants deficient in Ku70. EMBO J 21: 2819–2826.


Initial Genomics of the Human Nucleolus

Attila Németh1, Ana Conesa2, Javier Santoyo-Lopez2, Ignacio Medina2, David Montaner2, Bálint Péterfia1, Irina Solovei3, Thomas Cremer3, Joaquin Dopazo2, Gernot Längst1*

1 Department of Biochemistry III, University of Regensburg, Regensburg, Germany, 2 Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe, Valencia, Spain, 3 Department of Biology II, Ludwig-Maximilians University of Munich, Planegg-Martinsried, Germany

Abstract

We report for the first time the genomics of a nuclear compartment of the eukaryotic cell. 454 sequencing and microarray analysis revealed the pattern of nucleolus-associated chromatin domains (NADs) in the linear human genome and identified different gene families and certain satellite repeats as the major building blocks of NADs, which constitute about 4% of the genome. Bioinformatic evaluation showed that NAD-localized genes take part in specific biological processes, like the response to other organisms, odor perception, and tissue development. 3D FISH and immunofluorescence experiments illustrated the spatial distribution of NAD-specific chromatin within interphase nuclei and its alteration upon transcriptional changes. Altogether, our findings describe the nature of DNA sequences associated with the human nucleolus and provide insights into the function of the nucleolus in genome organization and establishment of nuclear architecture.

Citation: Németh A, Conesa A, Santoyo-Lopez J, Medina I, Montaner D, et al. (2010) Initial Genomics of the Human Nucleolus. PLoS Genet 6(3): e1000889. doi:10.1371/journal.pgen.1000889

Editor: Asifa Akhtar, Max-Planck-Institute of Immunobiology, Germany

Received November 18, 2009; Accepted March 1, 2010; Published March 26, 2010

Copyright: © 2010 Németh et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: GL and TC are supported by the Deutsche Forschungsgemeinschaft (DFG); GL by the Bayerisches Genomforschungsnetzwerk (BayGene); AN and GL by the University of Regensburg - DFG Anschubfinanzierung; and AC, DM, IM, JS-L, and JD by grants from project BIO BIO2008-04212 from the Spanish Ministry of Science and Innovation (MICINN) and grant (RD06/0020/1019) from Red Temática de Investigación Cooperativa en Cáncer (RTICC), Instituto de Salud Carlos III (ISCIII), MICINN. The National Institute of Bioinformatics (www.inab.org) is a platform of Genoma España. The CIBER de enfermedades raras is an initiative of the ISCIII. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: gernot.laengst@vkl.uni-regensburg.de

Introduction

The largest and densest nuclear compartment is the nucleolus with its shell of perinucleolar DNA. The nucleolus is a unique object for studying genome activity, since all three RNA polymerases are involved in its main function, the highly dynamic and tightly regulated process of ribosome biogenesis. The high proliferative activity of tumour cells coincides with high ribosome biogenesis activity, which exposes the nucleolus as a promising target in cancer therapy [1]. In addition, the cell-type- and function-dependent nucleolar localisation of tumour suppressors and their regulators, such as p53, MDM2 or p14ARF, indicates a role for the nucleolus in carcinogenesis [2–5]. A number of other biological processes (e.g. senescence, RNA modification, cell-cycle control and stress sensing) are also regulated in the nucleolus and connect it to several functional networks of the cell [2–7]. Furthermore, chromatin motion is constrained at nucleoli or at the nuclear periphery, and disruption of nucleoli increases the motility of chromatin domains, indicating a role for the nucleolus in higher-order chromatin arrangement [8]. The nucleolus can therefore be considered a well-suited model system for investigating the functional consequences of genome organisation.

It is less well known, however, that alterations in the nucleolus may be linked to multiple forms of human disease, including viral infections. The interaction between viruses and the nucleolus is a pan-virus phenomenon, exhibited by DNA viruses, retroviruses and RNA viruses [9,10]. Moreover, multiple genetic disorders have been mapped to genes that encode proteins located in nucleoli under specific conditions. These include the Werner [11], fragile X [12,13], Treacher Collins [14], Bloom [15], Rothmund–Thomson [16] and dyskeratosis congenita [17] syndromes, and Diamond-Blackfan anemia [18]. Nucleoli are easily detectable under the microscope; however, despite the simple methods available for nucleolus isolation, their molecular structure is largely unknown. The nucleolar proteome has recently been analysed by high-throughput mass spectrometry [19], but the nucleic acid composition of nucleoli has not yet been determined. The aim of our investigations was therefore to construct and characterise the first high-resolution, genome-wide map of NADs. Recent advances in sequencing and microarray technologies provided excellent platforms to subject nucleolus-associated DNA (naDNA) to critical scrutiny. The results presented here help to understand the mechanisms of nuclear information packaging by macromolecular assemblies and the functional compartmentalisation of the nucleus.

March 2010 | Volume 6 | Issue 3 | e1000889

Nucleolus Genomics

Author Summary

It is becoming increasingly clear that the nuclear organization and location of genes in metazoan organisms are not random. Functionally related genes are often found next to each other in the linear genome, and distant DNA elements, or DNA regions on different chromosomes, may reside together in specific nuclear compartments. The largest nuclear compartment is the nucleolus with its shell of perinucleolar DNA. The nature of the nucleolus-associated DNA, the targeting mechanism, and the cellular function of this subset of genomic DNA are not known. In the present study we report for the first time the high-resolution analysis of a nuclear compartment by sequencing, microarray analysis, and single-cell analysis. We have characterized the nucleolus-associated DNA at the sequence level and by 3D microscopy and have determined common elements and the molecular function of this compartment.

Results/Discussion

Because the nucleolar proteome was analysed in HeLa cells [19], our study started with the purification of nucleoli from this widely used model system (Figure 1A). Enrichment of the nucleolar transcription factor UBF and depletion of the nuclear lamina proteins lamin A/C from the nucleolar fraction were monitored by Western blot. Nucleolus-associated DNA was then isolated, and ribosomal DNA (rDNA) enrichment was measured by quantitative PCR (Figure S1). To analyse the genomic localisation of purified naDNA at low resolution, we performed 2D FISH experiments. Hybridisation of naDNA on human lymphocyte metaphase spreads shows that it localises predominantly on the p-arms of acrocentric chromosomes, the location of the repetitive rDNA, and on the centromeres of several chromosomes. The addition of repetitive Cot1 competitor DNA suppresses binding of the naDNA probe to various chromosomal regions, but not to the rDNA-containing nucleolar organiser regions (NORs). The result clearly demonstrates that rDNA, as well as pericentromeric and centromeric repetitive sequences, are overrepresented in naDNA compared to other chromosomal regions (Figure 1B).

Next, naDNA was analysed using NimbleGen whole-genome microarrays at 6,270-bp median probe spacing and compared to genomic DNA by two-colour hybridisation (aCGH). The aCGH data reinforced the results of the 2D FISH experiments: p-arm-adjacent regions of the acrocentric chromosomes and pericentromeric regions are enriched in naDNA. More interestingly, many other chromosomal regions are also present in the naDNA fraction (Figure 2A, Figures S2 and S3). For example, a large part of chromosome 19 associates with the nucleolus (Figure 3E). This finding explains the presence of chromosome 19 in central regions of the interphase nucleus [20], close to the nucleoli.

To elucidate NAD-specific sequence signatures in more detail, 454 sequencing was performed. In total, 47,378,399 bases were sequenced in 218,030 reads with an average length of 217 bases/read. We used the complementary microarray and sequencing data sets to visualise the genome-wide localisation of NADs. Genome-wide studies are performed almost exclusively with a single high-throughput strategy, which limits the quality of detection; combining techniques compensates for the inherent limitations of each method. Our results clearly show that certain NADs are detectable with only one of these approaches (Figure S2 and Table S1). It is important to mention that the p-arms of the five acrocentric chromosomes, which contain rDNA and satellite repeats, are not represented in the hg18 genome build and were therefore not included in our analysis. In addition to the previously described pericentromeric locations, a significant number of the NADs (nine) localised in subtelomeric regions. Altogether, 97 chromosomal regions associated with nucleoli were identified, encompassing about 4% (126,217,765 bp) of the genome. Our study detected the most frequent nucleolus-associated chromosome domains using stringent cut-off parameters for domain definition (Figure 2A, Figures S2 and S3, Table S1, and Materials and Methods).

Figure 1. Genome-wide analysis of nucleolus-associated DNA. (A) Experimental strategy. (B) 2D FISH analysis of nucleolus-associated DNA on human female lymphocyte metaphase spreads in the absence (-Cot1) or presence (+Cot1) of Cot1 competitor DNA. Arrows indicate chromosome 1 centromeres; arrowheads indicate p-arms of acrocentric chromosomes. doi:10.1371/journal.pgen.1000889.g001

Figure 2. Genomic and size distribution of NADs. (A) Distribution of NADs together with satellite repeats along human chromosomes. Note that the p-arms of the five acrocentric chromosomes (13, 14, 15, 21 and 22) were not analysed because they are not assembled in the hg18 genome build. NADs are labeled in red, satellite repeats in deep blue, centromeres in yellow and chromosomes in light blue. (B) Histogram of NAD sizes; median = 749 kb; a total of 97 NADs were identified. doi:10.1371/journal.pgen.1000889.g002

After genome-wide NAD identification, sequence and chromatin features were compared to the whole genome and to lamina-associated domains (LADs). LADs were recently determined by high-resolution mapping using DamID technology [21]. The size distribution (0.1–10 Mb) and median sequence length (749 kb) of NADs (Figure 2B) were similar to those of LADs (0.1–10 Mb, 553 kb), suggesting that the architectural units of chromosome organisation within the mammalian interphase nucleus are about 0.5–1 Mb in length. In total, 1,037 genes were identified within NAD sequences according to the RefSeq gene database, 729 of which were non-redundant (Table S2). Surprisingly, certain gene families were frequently associated with the nucleoli, even though the overall gene density in NADs is about 20% lower than in the whole genome. We observed a 4-fold enrichment of zinc-finger (ZNF) genes in NADs compared to the genome. Olfactory receptor (OR) and defensin genes were enriched in both NADs and LADs, but the enrichment was far greater in NADs (Figure 3A). Moreover, two of the six large clusters of immunoglobulin and T-cell receptor genes [22] overlap with NADs, and one other is juxtaposed to a NAD (Figure S3). The gene families mentioned above have two common features: their members are in large gene clusters, and they are expressed in a tissue-specific manner. This suggests that these large chromosomal regions may change their sub-nuclear position depending on their transcriptional activity. In addition, both immunoglobulin and OR genes exhibit monoallelic expression [23,24]; nucleoli may therefore be involved in this type of gene regulation, although this has to be tested for each individual gene in specific model systems.

Besides the response to other organisms and odour perception, additional biological processes and molecular functions are specifically associated with genes localised in the vicinity of the nucleolus, including tissue development and embryo implantation (Figures S4 and S5 and Table S3). Carcinoembryonic antigen cell adhesion molecule (CEACAM) genes and pregnancy-specific glycoprotein (PSG) gene clusters, whose protein products regulate implantation, were also found next to and within NADs, respectively. Additionally, a large number (119) of small nucleolar RNA (snoRNA) genes were identified within one NAD on chromosome 15. However, this association may be explained by the close proximity of this cluster to the rDNA repeats (a distance of 5 Mb). RNA genes located within NADs were characterised using the ‘RepeatMasker’ and ‘RNA Genes’ tracks of the UCSC Genome Browser. Both analyses show that 5S and tRNA genes, both of which are transcribed by RNA polymerase III, are specifically enriched in NADs but not in LADs. In contrast, other RNA genes are distributed with a similar frequency in NADs and the rest of the genome (Figure 3B). This finding confirms that RNA polymerase III-transcribed genes co-localise with nucleoli [25–27], the site of RNA polymerase I transcription. These observations suggest that spatial regulation may play a role in the coordinated, well-tuned transcription of the RNA components of the protein translation machinery.

Figure 3. Sequence and chromatin features of NADs. (A) RefSeq gene, (B) RNA gene and (C) repeat statistics of NADs, genome and LADs. ZNF, OR and DEF indicate the zinc finger, olfactory receptor and defensin gene families, respectively. RNA gene analyses of the ‘RepeatMasker’ and ‘RNA Genes’ tracks of the UCSC Genome Browser are shown on the left and right, respectively. (D) Chromatin features of NADs. Enrichment of the functionally characterised repressive histone marks H3K27Me3, H3K9Me3 and H4K20Me3 in NADs is shown on the left, whereas depletion of the active histone mark H3K4Me1 is shown on the right. Genome, NAD and LAD values are labeled uniformly in (A–D) in black, red and white, respectively. The complete analysis is summarised in Tables S5 and S6. (E) NADs and their typical genomic features on chromosome 19. The brown rectangle indicates the centromere. Abbreviations: UR (Universität Regensburg) NADs – nucleolus-associated chromatin domains identified in this study; PolI pseudo – pseudogenes of RNA polymerase I-transcribed rRNA genes; OR – olfactory receptor genes; ZNF – zinc finger genes; tRNA – transfer RNA genes (and pseudogenes) transcribed by RNA polymerase III; NKI (Nederlands Kanker Instituut) LADs – lamin-associated chromosome domains identified in the Tig3 cell line [21]. doi:10.1371/journal.pgen.1000889.g003

Analysis of the repetitive elements showed a more than 10-fold enrichment of satellite repeats in NADs and a depletion of SINEs, especially MIR repeats (Figure 3C). We next performed a detailed quantitative analysis of all major satellite repeat subclasses located within NADs (Figure S6). Our results demonstrate that the major building blocks of NADs are the alpha-, beta- and (GAATG)n/(CATTC)n-satellite repeats, whereas other types of satellite repeats (e.g. MSR1, D20S16, SATR2) were depleted. These data confirm and extend previous studies [28,29] that described the nucleolar association of satellite repeats but did not analyse them in detail. Because D4Z4 macrosatellite repeats are located on the short arms of acrocentric chromosomes [30] and ‘RepeatMasker’ does not contain information about low-copy-number repeats (e.g., segmental duplications or macrosatellites), we extended our investigations to such repetitive elements and showed that these genomic features are enriched in NADs (Figure S3 and Table S4). The presence of low-copy-number repeats in NADs underlines the difficulty of alignment-based localisation of naDNA sequences within the genome: segmental duplications and major satellites will be mapped to more than one region [31,32], so the nucleolar association of chromosome regions containing such sequences has to be confirmed by neighbouring sequences or by 3D FISH experiments. Enrichment of satellites and segmental duplications in NADs may also explain the assignment of several domains to chromosome Y, even though HeLa cells are derived from a female. The Y chromosome has been shown to co-localise with nucleoli in the interphase nucleus [29,33], indicating that such low-copy-number repeats may be involved in nucleolar targeting. The detailed map of nucleolus-associated chromosomal regions and genomic features enriched in NADs is shown in Figure 3E for chromosome 19. The complete set of data is shown in Figure S3 and Table S5.
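The fold enrichments quoted above (for example, more than 10-fold for satellite repeats and 4-fold for ZNF genes) compare how densely a feature occurs inside NADs with how densely it occurs genome-wide. A minimal base-pair-coverage version of that ratio is sketched below. It assumes half-open, non-overlapping intervals on a single chromosome, and the interval coordinates in the usage example are invented for illustration. The authors' exact normalisation (per base pair or per gene count) may differ, so this is one plausible definition, not their pipeline.

```python
def overlap_bp(a, b):
    """Total bp shared by two lists of non-overlapping, half-open (start, end) intervals."""
    a, b = sorted(a), sorted(b)
    total, i, j = 0, 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            total += end - start
        # Advance whichever interval finishes first; the other may still overlap later ones.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def span_bp(intervals):
    """Total length covered by non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def fold_enrichment(feature, domains, genome_size):
    """(feature bp per bp inside domains) / (feature bp per bp genome-wide)."""
    inside = overlap_bp(feature, domains) / span_bp(domains)
    overall = span_bp(feature) / genome_size
    return inside / overall


# Hypothetical example: a 1 Mb "genome", one 100 kb domain, 30 kb of a repeat feature.
domains = [(0, 100_000)]
feature = [(10_000, 30_000), (500_000, 510_000)]
print(round(fold_enrichment(feature, domains, 1_000_000), 2))  # 6.67
```

Two thirds of the feature falls inside a domain that covers only a tenth of the genome, hence roughly 6.7-fold enrichment. A value below 1 would indicate depletion, as reported for SINEs.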
To reveal specific chromatin patterns enriched within the nucleolus-associated chromatin domains, we used the genome-wide maps of histone modifications [34–36]. Multiple repressive histone marks were specifically enriched, whereas the active histone mark H3K4Me1 was significantly depleted in NADs. Mirroring the enrichment of repressive histone marks, we observed reduced global gene expression in NADs (Figure 3D and Table S6). These findings imply that NADs tend to form large inactive chromatin domains in the interphase nucleus. However, nucleolus-associated inactive chromatin differs markedly from lamina-associated inactive chromatin in its repetitive elements and gene-associated biological processes, suggesting that multiple domains of functionally distinct inactive chromatin exist within the nucleus. Furthermore, the presence of the highly expressed classes of 5S RNA and tRNA genes in nucleolus-associated chromatin indicates that the perinucleolar region is not exclusively transcriptionally silent.

We used 3D immuno-FISH to test whether the NADs revealed by the high-throughput methods co-localise with nucleoli. Nucleoli were stained with an α-B23/nucleophosmin antibody, and we chose 11 genomic loci that were analysed with appropriate BAC clones. Target, negative and positive control regions were selected from different chromosomes (Table S7, Figure S7, and Materials and Methods). The pericentromeric Xq11.1 region and the 5S rDNA cluster at 1q42.13 served as positive controls [26,37]. The combination of microarray and high-throughput sequencing analysis yielded a high-fidelity list of nucleolus-associated DNA, as all of our selected NADs were more frequently associated with the nucleoli of HeLa cells than the negative controls. To test whether the nucleolar association of these chromosomal regions is a cell-type-specific feature or a general property of human cells, IMR90 embryonic lung fibroblasts were analysed.
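Asking whether a target locus touches a nucleolus more often than a negative-control locus is a comparison of two proportions. The text does not name the statistical test used. One common choice, assumed here, is a one-sided Fisher's exact test on the 2×2 table of associated and non-associated alleles. The standard-library sketch below computes it directly from the hypergeometric distribution, and the allele counts in the usage line are hypothetical.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: target locus, control locus. Columns: nucleolus-associated, not associated.
    Returns P(X >= a) under the hypergeometric null of equal association rates.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)
    upper = min(row1, col1)
    return sum(comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1)) / denom


# Hypothetical counts: 60 of 100 target alleles vs 25 of 100 control alleles
# touch a nucleolus in the scored nuclei.
p = fisher_greater(60, 40, 25, 75)
print(f"target 60%, control 25%, one-sided p = {p:.2e}")
```

With SciPy available, `scipy.stats.fisher_exact([[60, 40], [25, 75]], alternative="greater")` should return the same p-value. The hand-written version only makes the hypergeometric tail sum explicit.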
In contrast to HeLa, IMR90 cells have a diploid karyotype and are not immortal. Except for the 5S rDNA cluster on chromosome 1, all selected regions showed similar levels of nucleolar association in IMR90 and HeLa cells (Figure 4A and Figure S8), suggesting that the nucleolar targeting of certain chromosomal regions is a common feature of human cells.

We next addressed the role of transcription in DNA targeting to the nucleolus by monitoring the nucleolar association of selected chromosomal domains upon transcriptional inhibition. We used α-amanitin to block transcription by RNA polymerases II and III, whereas synthesis of the 47S rRNA precursor was repressed by the addition of actinomycin D. We found that specific inhibition of any of the RNA polymerases results in spatial reorganization of the nucleolus-associated domains (Figure S9 and Table S7), which indicates that the nucleolus forms a functional unit together with the associated perinucleolar chromatin. However, the concomitant partial disruption of nucleolar structures [38] makes the interpretation of such experiments difficult.

In addition to localisation studies of single chromosomal regions, three typical features of the perinucleolar chromatin were visualised. To this end, five-colour immunofluorescence experiments were performed, which allowed direct comparison of the signal distributions of centromeres, H3K27Me3 and active RNA polymerase II in the same cell. RNA polymerase II transcription was depleted around nucleoli; furthermore, the frequent association of H3K27Me3 and centromere signals with nucleoli reinforced the results of the bioinformatic analysis of NADs. Both HeLa and IMR90 cells showed similar localisation of these nuclear marks, and the observed punctate patterns suggest that functionally distinct chromatin domains co-exist around nucleoli (Figure 4B and 4C and Figure S10).

We report here the mapping and characterization of nucleolus-associated chromatin domains in the human genome.
Bioinformatics and statistical analyses reveal that the main building blocks of NADs are certain types of satellite repeats, tRNA and 5S RNA genes, and members of the ZNF, OR, defensin and immunoglobulin gene families. Thus, our data suggest that certain types of satellite repeat sequences play an important role in establishing NADs. Notably, the rDNA repeats, which form the internal scaffold of the nucleolus, were analysed only by qPCR (Figure S1) and not in our high-throughput studies, for several reasons: i) they are not represented in the hg18 genome build, ii) repetitive sequences are not printed on microarrays, and iii) the number of 454 sequencing reads depends on the GC content, which is highly variable throughout the rDNA repeat (Figure S11). The findings of a recent publication indicate that centromeric nucleoprotein complexes may be targeted to the nucleolus via an alpha-satellite RNA-mediated mechanism [39], and underline the importance of transcription in this process. These data suggest that transcription has a general regulatory role in maintaining the nuclear architecture around the nucleolus. The transcribed RNA may be bound by nucleolar RNA-binding proteins, which sequester NADs to the nucleolar periphery. On the other hand, our results imply that there is no unique predictor sequence: in addition to certain satellite repeats, other elements, e.g., tRNA genes and 5S RNA genes, may be sufficient for the nucleolar targeting of individual chromatin domains. The aforementioned DNA elements, together with specific RNA molecules and scaffold proteins like UBF, may coordinate the (at least partial) self-assembly of the nucleolus with its shell. The principles of the assembly might be similar to those demonstrated recently for pseudo-NORs [40,41] and for the Cajal body [42], where single DNA, protein or RNA scaffolds were able to nucleate the formation of nuclear compartments. Further experiments are required to uncover the molecular steps of transcription-dependent nucleolar targeting of different groups of NADs and to identify the players in this process. The dynamics of nucleolus association during the cell cycle and cell differentiation will be addressed in future studies. The functional organisation of the nuclear architecture is studied intensively [43–46], and the identification of NADs in the present work provides a basis for a better understanding of the role of nucleoli in the spatial organisation of the human genome.

PLoS Genetics | www.plosgenetics.org | March 2010 | Volume 6 | Issue 3 | e1000889

Nucleolus Genomics

Figure 4. 3D immuno-FISH analysis of nucleolus-associated chromatin domains. (A) Histograms show the frequency of nucleolar localisation of NADs and control chromosomal regions detected by 3D FISH in HeLa cervix carcinoma and IMR90 diploid fibroblast cells. The percentage of nucleolus-associated alleles is shown on the left. Red diamonds indicate targets and green diamonds negative controls, whereas the yellow and blue diamonds indicate the chromosome X pericentromeric and the 5S cluster positive controls, respectively (see Table S7 for further BAC details). Single light optical sections of HeLa nuclei are shown on the right. BAC hybridization signals of the RP11-90G23 target, RP5-915N17 positive control and RP11-81M8 negative control BACs are shown in green, nucleolar staining in red and DAPI counterstain in blue (scale bars: 5 µm). (B) α-H3K27Me3, α-centromere, α-active Pol II and α-B23/nucleophosmin immunostaining of HeLa and IMR90 cells. α-H3K27Me3, α-centromere and α-active Pol II signals are shown in green, nucleolar staining in red and DAPI counterstain in blue (scale bars: 5 µm). doi:10.1371/journal.pgen.1000889.g004

Materials and Methods

Population average–based analyses

HeLa cervix carcinoma cells were cross-linked with 1% formaldehyde, and nucleoli were isolated as described [47]. The rDNA content of equal amounts of naDNA and genomic DNA was quantified in real-time PCR reactions. Oligonucleotide sequences: Hr132F: 5′-CCTGCTGTTCTCTCGCGC, Hr155P: 5′-FAM-AGCGTCCCGACTCCCGGTGC-TAMRA, Hr198R: 5′-GGTCAGAGACCCGGACCC; Hr9776F: 5′-GCCACTTTTGGTAAGCAGAACTG, Hr9802P: 5′-FAM-CTGCGGGATGAACCGAACGCC-TAMRA, Hr9840R: 5′-CATCGGGCGCCTTAACC. Numbers indicate the rDNA (GenBank Acc. No U13369) position relative to the transcriptional start site. Two rDNA regions were measured in technical triplicates from two biological replicate experiments. UBF and laminA/C protein levels were monitored with the sc-9131 and sc-20681 antibodies (Santa Cruz Biotechnology), respectively. naDNA was isolated and subjected to 454 sequencing (MWG-Biotech) and to microarray analysis on the HG18 CGH 385K WG Tiling v1.0 platform (Nimblegen). Genomic features of NADs were analysed using the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables), and chromatin features using the Ensembl Database (http://www.ensembl.org) and the GSE12889 NCBI GEO dataset. Genomic features were visualised using Galaxy (http://galaxy.psu.edu/) and the UCSC Genome Browser (http://genome.ucsc.edu/). All analyses were performed on the hg18 genome build. Biological processes and molecular functions associated with NAD-located genes were analysed using FatiGO [48].

Array CGH, 454 sequencing and subsequent data analysis were performed as follows. naDNA samples from two biological replicate experiments were subjected to microarray analysis on the HG18 CGH 385K WG Tiling v1.0 platform. Hybridisation and pre-processing of hybridisation signals were performed at Nimblegen. For each sample, regions of increased intensity were considered relevant if their mean value was greater than the 85th percentile of the sample distribution at a 0.1 Mb running-window size. Only the intersection of relevant regions across the microarray replicates was considered a NAD. High-throughput sequencing was performed using the Roche GS FLX system, with one of the aCGH-analysed naDNA samples as template. 454 sequence reads were quality filtered and automatically assembled into contigs with the Newbler Assembler software at MWG-Biotech. Contigs were matched against the human genome using BLAT. Repeat-masked sequences were used for both the 454 data and the genome data. For matching, 95% sequence identity and coverage were required, and a maximum gap size of 3 was permitted. Of the mapped reads, 88% had unique hits. The 454 data were widely spread over the genome; only a few regions, mainly around centromeres, had higher intensity.

For domain detection, the 454 data were first transformed into a binary (1/0) signal indicating the presence/absence of mapped reads at chromosome positions defined by 100-nt segments spaced at 1,000-nt intervals. A running-mean algorithm with a window size of 100 (corresponding to an actual chromosome window of 0.1 Mb) was run on these data to identify chromosomal regions with a higher abundance of 454 sequencing hits (red bars in plots). 454 'ChIP-Seq' domains were selected as those areas with a running mean value above the 98th percentile of the chromosome. This arbitrary threshold agrees well with visual evaluation of both the 454 and the aCGH data. Finally, 454 regions were edited and border positions were curated manually. The significance of the 454-based NAD determinations was assessed empirically by comparing the number of reads in each detected NAD against the distribution of read numbers in 1,000 randomly selected same-chromosome regions of the same size. The significance is then obtained as the quantile position of the NAD read number in the random distribution. 454 and aCGH domains were merged into one single list of NADs. For merging, overlapping regions from both technologies were fused into one domain. Domain borders were defined following the aCGH data, unless the absence of array probes at merged borders suggested using the 454 limits. Furthermore, adjacent regions separated by less than 0.1 Mb were joined into single domains.

Data deposition

Microarray data have been submitted to the ArrayExpress Database (http://www.ebi.ac.uk/microarray-as/ae/) under accession number E-MEXP-2403. 454 sequencing data have been submitted to the Sequence Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra/) under accession number SRA009887.3.
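The 454-based domain calling described under Population average–based analyses (binary signal on 100-nt segments every 1,000 nt, a 100-segment running mean spanning 0.1 Mb, a per-chromosome 98th-percentile threshold, joining of regions less than 0.1 Mb apart, and the empirical test against 1,000 random same-size regions) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: reads are given as mapped start positions on one chromosome, a segment counts as occupied when a read starts inside it, the percentile uses the nearest-rank definition, a window's genomic span is approximated as 100 segment steps, and the empirical test reports the fraction of random regions holding fewer reads than the domain. Manual border curation and the merging with aCGH domains are not modelled, and all function names are hypothetical.

```python
import bisect
import math
import random
from typing import List, Sequence, Tuple

# Parameters taken from the Methods text
SEGMENT_LEN = 100      # nt per probed segment
SEGMENT_STEP = 1000    # nt between segment starts
WINDOW = 100           # running-mean window in segments (100 x 1 kb = 0.1 Mb)
MERGE_GAP = 100_000    # join regions separated by less than 0.1 Mb
Interval = Tuple[int, int]


def binary_signal(read_starts: Sequence[int], chrom_len: int) -> List[int]:
    """1 if a mapped read starts inside a 100-nt segment, else 0 (assumed rule)."""
    signal = [0] * (chrom_len // SEGMENT_STEP)
    for pos in read_starts:
        idx, offset = divmod(pos, SEGMENT_STEP)
        if idx < len(signal) and offset < SEGMENT_LEN:
            signal[idx] = 1
    return signal


def running_mean(signal: Sequence[int], window: int = WINDOW) -> List[float]:
    """Element k is the mean of segments k .. k + window - 1."""
    if len(signal) < window:
        return []
    total = sum(signal[:window])
    means = [total / window]
    for k in range(window, len(signal)):
        total += signal[k] - signal[k - window]  # slide the window by one segment
        means.append(total / window)
    return means


def merge_close(intervals: Sequence[Interval], max_gap: int = MERGE_GAP) -> List[Interval]:
    """Fuse overlapping intervals and join those separated by less than max_gap."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged:
            gap = start - merged[-1][1]
            if gap <= 0 or gap < max_gap:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
                continue
        merged.append((start, end))
    return merged


def call_domains(read_starts: Sequence[int], chrom_len: int, pct: float = 98.0) -> List[Interval]:
    """Windows whose running mean exceeds the chromosome's pct-th percentile."""
    means = running_mean(binary_signal(read_starts, chrom_len))
    if not means:
        return []
    ordered = sorted(means)  # nearest-rank percentile (an assumed definition)
    threshold = ordered[max(0, math.ceil(pct / 100 * len(ordered)) - 1)]
    hits = [(k * SEGMENT_STEP, (k + WINDOW) * SEGMENT_STEP)  # approximate genomic span
            for k, m in enumerate(means) if m > threshold]
    return merge_close(hits)


def empirical_position(read_starts: Sequence[int], chrom_len: int, domain: Interval,
                       n_regions: int = 1000, seed: int = 0) -> float:
    """Fraction of random same-size regions holding fewer reads than the domain."""
    starts = sorted(read_starts)

    def count(lo: int, hi: int) -> int:  # reads with lo <= start < hi
        return bisect.bisect_left(starts, hi) - bisect.bisect_left(starts, lo)

    size = domain[1] - domain[0]
    observed = count(*domain)
    rng = random.Random(seed)  # seeded for reproducibility
    below = 0
    for _ in range(n_regions):
        lo = rng.randrange(0, chrom_len - size + 1)
        below += count(lo, lo + size) < observed
    return below / n_regions
```

With reads starting in 95 consecutive segments between 600 kb and 694 kb of a 2 Mb chromosome, `call_domains` returns the single domain (579000, 716000), because every window above the threshold overlaps the dense block; `empirical_position` then places that domain near the top of the random distribution. Sorting the reads once and counting with binary search keeps each of the 1,000 random draws cheap.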

Single-cell experiments

2D FISH experiments were performed on HeLa and human female lymphocyte metaphase spreads according to standard protocols. naDNA was labelled without amplification. NAD target and control BACs were selected as follows. RP11-434B14 (Xq11.1; 'X cen') and RP5-915N17 (1q42.13; '5S') were used as positive controls; perinucleolar localisations of the X chromosome and of the large 5S rDNA cluster on chromosome 1 were reported previously [26,37]. RP11-90G23 (8q21.2; 'REXO1L') and RP11-173M10 (13q21.1; '7SK', encompassing a 7SK RNA gene) were selected based on 454 sequencing data. In the latter case, we tested whether smaller 454 signals that did not define NADs could also be associated with nucleoli. RP11-44B13 (19q13.12; '27ZNF'), selected based on our microarray data, marks a chromosomal fragment in FISH experiments where 27 KRAB-ZNF genes are located. The KRAB-ZNF gene cluster at 19q13.12 represents a SUV39H1 and CBX1 binding region. Our 3D FISH results reveal spatial features of this locus, which was formerly characterised at the level of chromatin domain organisation [49]. RP11-89H10 (3p12.3; 'FRG2C') and RP11-413F20 (10q26.3; 'FRG2B') were selected from the combined aCGH/454 and the aCGH results, respectively. Both chromosomal regions contain D4Z4 major satellite repeats, which may have nucleolar targeting potential. RP11-89O2 (3p14.1; 'FRG2C ctrl') and RP11-123G19 (10q24.1; 'FRG2B ctrl') served as negative controls for the latter two targets. RP11-81M8 (19p13.3; 'REXO1') covers a large 2 Mb chromosome fragment. This region contains the REXO1 gene, which shares primary sequence similarity with the REXO1L target, and serves as its negative control. The negative control of the ZNF gene cluster (RP11-1137G4; 19p13.3–19p13.2; 'ZNF557') contains a single ZNF gene. 3D immuno-FISH experiments were performed as described [50].
In localisation experiments, α-B23/nucleophosmin (Sigma, B0556), α-H3K27Me3 (Upstate, 07-449), α-active Pol II (Covance, MMS-129R), α-centromere (Antibodies Inc., 15–134) and different fluorescent dye-conjugated secondary antibodies, as well as BAC clones RP11-90G23, RP11-173M10, RP11-44B13, RP11-89H10, RP11-413F20, RP11-81M8, RP5-915N17, RP11-1137G4, RP11-89O2, RP11-123G19 and RP11-434B14, were used on HeLa cervix carcinoma cells and IMR90 embryonic lung fibroblasts. HeLa cells were treated with 75 µg/ml or 300 µg/ml α-amanitin for 5 hours in order to inhibit RNA polymerase II or RNA polymerases II and III, respectively. RNA polymerase I-mediated synthesis of the rRNA precursor was impaired by treating the cells with 50 ng/ml actinomycin D for 1 hour. Cells were fixed and 3D immuno-FISH experiments were performed. Confocal microscopy and image analysis were performed after 3D FISH experiments as follows: series of optical sections through 3D-preserved nuclei were collected using a Leica TCS SP5 confocal system equipped with a Plan Apo 63×/1.4 NA oil immersion objective and a diode laser (excitation wavelength 405 nm) for DAPI, an argon laser (488 nm) for FITC and Alexa 488, a DPSS laser (561 nm) for Cy3, a HeNe laser (594 nm) for Texas Red and a HeNe laser (633 nm) for Cy5. For each optical section, signals in different channels were collected sequentially. Stacks of 8-bit grey-scale images were obtained with a z-step of 200 nm and pixel sizes of 30–100 nm, depending on the experiment. The axial chromatic shift was corrected, and the corresponding RGB stacks, montages and maximum intensity projections were created using published ImageJ plugins [51]. Positions of FISH signals were assessed by visual inspection of RGB stacks using the ImageJ program.

Supporting Information

Figure S1 Controls of nucleolus purification. Left panel: differential interference contrast (DIC) micrographs show formaldehyde cross-linked HeLa cells and isolated nucleoli. Right panel: UBF and laminA/C immunoblot controls of a nucleolus preparation. Lane 1 shows the input, lanes 2 and 3 the supernatants of the two-step purification [47], and lane 4 the nucleolar fraction; 0.5% of each fraction was loaded. Quantitative PCR measurement illustrates the enrichment of ribosomal DNA in nucleolus-associated DNA (naDNA) compared to genomic DNA (gDNA). Mean and standard deviation values of two biological replicate experiments are shown. Found at: doi:10.1371/journal.pgen.1000889.s001 (0.04 MB PDF)

Figure S2 Nucleolus association maps on all chromosomes detected with 454 sequencing and/or aCGH analysis. 454 and aCGH signals are marked by red and blue bars, and 454- and aCGH-detected NADs by red and blue rectangles, respectively. Found at: doi:10.1371/journal.pgen.1000889.s002 (1.73 MB JPG)

Figure S3 Linear map of NADs and their typical genomic features on the human genome. NADs and their selected typical sequence features are shown on the map. BAC clones used in 3D FISH experiments are indicated at the top, and LADs at the bottom over the Segmental Duplication track. Abbreviations: UR NADs, nucleolus-associated chromosome domains identified in this study; PolI pseudo, pseudogenes of RNA polymerase I-transcribed rRNA genes; D4Z4, D4Z4 major satellite repeats (see Table S4 for further information); OR, olfactory receptor genes; ZNF, zinc finger genes; DEF, defensin genes; 5S and tRNA, 5S rRNA and transfer RNA genes (and pseudogenes) transcribed by RNA polymerase III; NKI LADs, lamin-associated chromosome domains identified by Guelen et al. [21]. Immunoglobulin and T-cell receptor gene clusters are shown according to www.imgt.org [22]. Segmental duplications are shown with the colour code used in the UCSC Genome Browser (http://genome.ucsc.edu/). For a better view, use >400% zoom. Found at: doi:10.1371/journal.pgen.1000889.s003 (0.92 MB PDF)

Figure S4 Biological processes associated with NAD-located RefSeq genes. Statistical analysis of feature enrichment compared to the genome was performed using the FatiGO strategy [48] included in the Babelomics suite (www.babelomics.org). Enrichment of different features is indicated in red. Statistical values are listed in Table S3. For a better view, use >300% zoom. Found at: doi:10.1371/journal.pgen.1000889.s004 (0.19 MB PDF)

Figure S5 Molecular functions associated with NAD-located RefSeq genes. Statistical analysis of feature enrichment compared to the genome was performed using the FatiGO strategy [48] included in the Babelomics suite (www.babelomics.org). Enrichment of different features is indicated in red. Statistical values are listed in Table S3. Found at: doi:10.1371/journal.pgen.1000889.s005 (0.18 MB PDF)

Figure S6 Satellite repeats in NADs and naDNA. The upper panel shows the number of different satellite repeats located in NADs compared to the genomic values. Repeat counts of 454 sequence reads, shown in the lower panel, reveal other quantitative aspects of the satellite repeat composition of naDNA. Notably, satellite repeats located on the p-arms of the five acrocentric chromosomes (13, 14, 15, 21, and 22) are not included in the NAD analysis, but they appear in the naDNA analysis. Stars indicate repeats of which a substantial fraction (30%–50%) is located on chromosome Y and is thus missing from female HeLa cells. Found at: doi:10.1371/journal.pgen.1000889.s006 (0.13 MB PDF)

Figure S7 2D FISH analysis of BAC clones on human female lymphocyte and HeLa metaphase spreads. Lymphocytes are shown in the left and HeLa in the right panels. DAPI counterstaining is shown in red, BAC hybridization in green. White arrowheads point to BAC signals. Chromosomal localisation was verified using chromosome paints (not shown). ID codes, chromosomal locations and BACPAC ID numbers of the BACs are indicated. Genomic coordinates of all BACs are shown in Table S7, and locations in Figure S3. All BAC clones except RP11-89H10 delivered 2 signals in lymphocytes; for RP11-89H10, cross-reaction signals could be filtered because they were significantly less intense than the specific signals. In HeLa, BAC clones delivered 3 signals, except RP11-89H10, RP11-89O2, RP11-434B14 (2 signals) and RP11-173M10 (4 signals). Again, cross-reaction signals could be filtered in the case of RP11-89H10. Found at: doi:10.1371/journal.pgen.1000889.s007 (0.13 MB PDF)

Figure S8 Frequency of nucleolar localisation of NADs and control chromosomal regions detected by 3D FISH in HeLa cervix carcinoma and IMR90 diploid fibroblast cells. The percentage of cells containing at least one nucleolus-localised allele is shown. The results complement the data shown in Figure 4A and summarised in Table S7. Found at: doi:10.1371/journal.pgen.1000889.s008 (0.03 MB PDF)

Figure S9 3D immuno-FISH analysis of NADs after inhibition of transcription. HeLa cells were treated with α-amanitin in order to inhibit RNA polymerase II (Pol II) or RNA polymerases II and III (Pol II+III). RNA polymerase I (Pol I)-mediated synthesis of the rRNA precursor was impaired by treatment of the cells with actinomycin D. Histograms show the frequency of nucleolar localisation of three chromosomal regions detected by the indicated BAC clones in 3D FISH experiments. Red, green and blue diamonds indicate the target, the negative control, and the 5S cluster positive control, respectively (see Table S7 for further BAC details). We used α-amanitin to block transcription by RNA polymerases II and III as described [Huang S, Deerinck TJ, Ellisman MH, Spector DL (1998) The perinucleolar compartment and transcription. J Cell Biol 143: 35–47.; Wang C, Politz JC, Pederson T, Huang S (2003) RNA polymerase III transcripts and the PTB protein are essential for the integrity of the perinucleolar compartment. Mol Biol Cell 14: 2425–2435.], whereas the synthesis of the 47S rRNA precursor was repressed by the addition of actinomycin D as described in the related nucleolar proteome study [19] and in the Materials and Methods. The results show that the specific inhibition of any of the RNA polymerases results in spatial reorganisation of NADs, which indicates that the nucleolus forms a functional unit together with the associated perinucleolar chromatin. Notably, the structure of nucleoli is also partially disrupted after the indicated treatments [38], and thus the interpretation of such analyses is difficult. The results of these experiments are summarised in Table S7. Found at: doi:10.1371/journal.pgen.1000889.s009 (0.03 MB PDF)

Figure S10 Quantitative immunofluorescence analysis of selected NAD features. α-H3K27Me3 and α-active Pol II immunostainings of HeLa and IMR90 cells were quantified around nucleoli using the ImageJ software. After thresholding α-B23/nucleophosmin signals (indicated in blue), mean fluorescence intensity values were measured in the first 250 nm shell (red) and the second 250 nm shell (green) of 12 HeLa cells (22 nucleoli) and 16 IMR90 cells (56 nucleoli). The mean fluorescence intensity values were then divided to estimate enrichment or depletion. At the border of the nucleolus, active Pol II and H3K27me3 show clearly different distributions (p<0.001, Student's t-test). Enrichment and depletion of the two markers in individual shells are significant in all cases (at least at the level p<0.05). Error bars are 95% confidence intervals. Found at: doi:10.1371/journal.pgen.1000889.s010 (0.99 MB PDF)

Figure S11 Ribosomal DNA in 454 sequence reads. The assembly of rDNA-containing 454 sequence reads is shown in the upper part and the scheme of the rDNA repeat unit below (black arrows indicate the position and direction of individual reads). In total, 3,231 rDNA-containing DNA fragments were sequenced, of which 2,086 reads were assembled together with the rDNA repeat unit into a single sequence in a MacVector Assembly Project. The results clearly show that different regions were unequally represented in the deep sequencing data, which is probably due to the technical limitations of the method (i.e., emPCR-based amplification of fragments with different GC content is unequal). The negative correlation between the number of sequence reads and GC content can easily be visualized by comparing the assembly result with the GC content plot over the rDNA sequence (the plot was calculated with the EMBOSS Isochore program, http://www.ebi.ac.uk/Tools/emboss). In the scheme of the rDNA repeat at the bottom of the figure, 18S, 5.8S, 28S, and IGS mark the coding regions and the intergenic spacer of the human rDNA (GenBank AccNo: U13369), respectively; red and blue lollipops mark the transcriptional start and stop sites, respectively; ticks on the ruler indicate 1 kb distances. We would like to underline here again that the combination of two high-throughput methods, i.e., 454 and aCGH, helps to reduce technical problems such as the bias in next-generation sequencing [Harismendy O, Ng PC, Strausberg RL, Wang X, Stockwell TB, et al. (2009) Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol 10: R32.] and the lack of repetitive sequence information in the microarray-based method. Found at: doi:10.1371/journal.pgen.1000889.s011 (0.30 MB PDF)

Table S1 List of NAD genomic coordinates (hg18 genome build) and features of their detection. Chromosomal positions and sizes of NADs are shown in the table. The method of detection for each of the 97 NADs is also indicated: 41 NADs were detected with both microarray and high-throughput sequencing, 20 NADs only by sequencing, and 36 NADs only on microarrays. The number of 454 sequence hits per NAD is shown as well. The 454-based NAD determination was tested empirically by comparing the number of reads in each detected NAD against the distribution of read numbers in 1,000 randomly selected same-chromosome regions of the same size. The significance is then obtained as the quantile position of the NAD read number in the random distribution. NADs that were analysed in 3D FISH experiments are highlighted in yellow. Found at: doi:10.1371/journal.pgen.1000889.s012 (0.03 MB XLS)

Table S2 List of RefSeq genes located in NADs. Genes within NADs were identified with the UCSC Table Browser (RefSeq Genes track, hg18 genome build). Note that almost 30% of the genes are duplicated or even further amplified. Specific enrichment of different gene families in NADs is shown in Figure 3A. Found at: doi:10.1371/journal.pgen.1000889.s013 (0.39 MB XLS)

Table S3 Biological processes and molecular functions associated with NAD-located RefSeq genes. Statistical analysis of feature enrichment compared to the genome was performed using the FatiGO strategy [48] included in the Babelomics suite (www.babelomics.org). Results are summarised as graphs in Figures S4 and S5. Found at: doi:10.1371/journal.pgen.1000889.s014 (0.02 MB XLS)

Table S4 List of D4Z4 major satellite-containing chromosomal regions of the hg18 genome build. A BLAT search was performed using the HUMFSHD sequence (GenBank Accession: D38024) as query. Chromosomal regions with more than 10% (330 bp) homology were indicated on the NAD map (Figure S3). Found at: doi:10.1371/journal.pgen.1000889.s015 (0.02 MB XLS)

Table S5 Statistical analysis of sequence features of NADs. Sequence features in NADs, the genome and LADs were extracted from the UCSC Table Browser. Fisher's exact test was performed to assess the significance of feature enrichments, and the p-values are indicated. A one-sided Fisher's exact test was applied to test enrichment of genes of the selected gene families in NADs over the genome values. A two-sided Fisher's exact test was applied to test enrichment/depletion of RNA genes and repeat families in NADs over the genome values. The statistical analysis of the enrichment of satellite repeats and the depletion of SINE, and in particular MIR, repeats in NADs resulted in p = 0 (i.e., below numerical precision); thus they are not listed in the table. Although the differences between the observed NAD and genomic frequencies of other repeat types (LINE, Alu, LTR, DNA; p<<0.001) were also significant, the absolute differences were smaller in these cases than for satellites and MIRs, and thus it is less likely that these other repeat types possess specific nucleolar targeting and/or anchoring potential. The results of the gene, RNA gene and repeat content analyses are illustrated as graphs in Figure 3A–3C, respectively. The detailed analysis of satellite repeat classes is shown in Figure S6. Found at: doi:10.1371/journal.pgen.1000889.s016 (0.02 MB XLS)

Table S6 Statistical analysis of chromatin features of NADs. Chromatin regulatory features in NADs were extracted from the Ensembl Functional Genomics (eFG) database using the Ensembl Perl API (Ensembl 50). These data were obtained by ChIP-seq analysis of lymphocytes [34]. The numbers indicate sequence reads per Mb. Additionally, gene expression and H3K27Me3 occupancy data for HeLa cells were obtained from the Gene Expression Omnibus database (GSM323148, GSM323149, GSM325898; [35]). Here the numbers indicate the sequence length occupied by the H3K27Me3 histone mark per Mb and mean values of gene expression in arbitrary units. Enrichment of features was tested by comparing the distribution of feature counts in NADs against the genome mean value using t-test statistics, with p-values adjusted for multiple testing. Importantly, the analysis of HeLa H3K27Me3 and gene expression data reinforces the results obtained from lymphocytes. Genomic and NAD values of functionally characterised, significantly enriched or depleted chromatin marks are shown in Figure 3D. Found at: doi:10.1371/journal.pgen.1000889.s017 (0.05 MB XLS)

Table S7 Summary of 3D FISH experiments. BAC locations, allele and cell counts, and nucleolus association frequencies in HeLa and IMR90 cells are shown. The results of transcription inhibition experiments are summarised in the lower part of the table and illustrated in Figure S9. Found at: doi:10.1371/journal.pgen.1000889.s018 (0.02 MB XLS)

Acknowledgments

We thank M. Cremer, B. Joffe, D. Köhler, and H. Jahn-Henninger for helpful discussions and technical help. AN dedicates his work to the memory of Hédi.

Author Contributions

Conceived and designed the experiments: AN TC GL. Performed the experiments: AN BP IS. Analyzed the data: AN AC JSL IM DM. Contributed reagents/materials/analysis tools: AN TC JD GL. Wrote the paper: AN.

References

1. Drygin D, Siddiqui-Jain A, O'Brien S, Schwaebe M, Lin A, et al. (2009) Anticancer activity of CX-3543: a direct inhibitor of rRNA biogenesis. Cancer Res.
2. Boisvert FM, van Koningsbruggen S, Navascues J, Lamond AI (2007) The multifunctional nucleolus. Nat Rev Mol Cell Biol 8: 574–585.
3. Mayer C, Grummt I (2005) Cellular stress and nucleolar function. Cell Cycle 4: 1036–1038.
4. Olson MO, Hingorani K, Szebeni A (2002) Conventional and nonconventional roles of the nucleolus. Int Rev Cytol 219: 199–266.
5. Sirri V, Urcuqui-Inchima S, Roussel P, Hernandez-Verdun D (2008) Nucleolus: the fascinating nuclear body. Histochem Cell Biol 129: 13–31.
6. McKeown PC, Shaw PJ (2009) Chromatin: linking structure and function in the nucleolus. Chromosoma 118: 11–23.
7. Tschochner H, Hurt E (2003) Pre-ribosomes on the road from the nucleolus to the cytoplasm. Trends Cell Biol 13: 255–263.
8. Chubb JR, Boyle S, Perry P, Bickmore WA (2002) Chromatin motion is constrained by association with nuclear compartments in human cells. Curr Biol 12: 439–445.
9. Hiscox JA (2002) The nucleolus–a gateway to viral infection? Arch Virol 147: 1077–1089.
10. Hiscox JA (2007) RNA viruses: hijacking the dynamic nucleolus. Nat Rev Microbiol 5: 119–127.
11. Marciniak RA, Lombard DB, Johnson FB, Guarente L (1998) Nucleolar localization of the Werner syndrome protein in human cells. Proc Natl Acad Sci U S A 95: 6887–6892.
12. Tamanini F, Kirkpatrick LL, Schonkeren J, van Unen L, Bontekoe C, et al. (2000) The fragile X-related proteins FXR1P and FXR2P contain a functional nucleolar-targeting signal equivalent to the HIV-1 regulatory proteins. Hum Mol Genet 9: 1487–1493.
13. Willemsen R, Bontekoe C, Tamanini F, Galjaard H, Hoogeveen A, et al. (1996) Association of FMRP with ribosomal precursor particles in the nucleolus. Biochem Biophys Res Commun 225: 27–33.
14. Isaac C, Marsh KL, Paznekas WA, Dixon J, Dixon MJ, et al. (2000) Characterization of the nucleolar gene product, treacle, in Treacher Collins syndrome. Mol Biol Cell 11: 3061–3071.
15. Yankiwski V, Marciniak RA, Guarente L, Neff NF (2000) Nuclear structure in normal and Bloom syndrome cells. Proc Natl Acad Sci U S A 97: 5214–5219.
16. Woo LL, Futami K, Shimamoto A, Furuichi Y, Frank KM (2006) The Rothmund-Thomson gene product RECQL4 localizes to the nucleolus in response to oxidative stress. Exp Cell Res 312: 3443–3457.
17. Heiss NS, Girod A, Salowsky R, Wiemann S, Pepperkok R, et al. (1999) Dyskerin localizes to the nucleolus and its mislocalization is unlikely to play a role in the pathogenesis of dyskeratosis congenita. Hum Mol Genet 8: 2515–2524.
18. Lipton JM, Ellis SR (2009) Diamond Blackfan anemia 2008–2009: broadening the scope of ribosome biogenesis disorders. Curr Opin Pediatr.
19. Andersen JS, Lam YW, Leung AK, Ong SE, Lyon CE, et al. (2005) Nucleolar proteome dynamics. Nature 433: 77–83.
20. Croft JA, Bridger JM, Boyle S, Perry P, Teague P, et al. (1999) Differences in the localization and morphology of chromosomes in the human nucleus. J Cell Biol 145: 1119–1131.
21. Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, et al. (2008) Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453: 948–951.
22. Lefranc MP, Giudicelli V, Ginestoux C, Jabado-Michaloud J, Folch G, et al. (2009) IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res 37: D1006–1012.
23. Chess A, Simon I, Cedar H, Axel R (1994) Allelic inactivation regulates olfactory receptor gene expression. Cell 78: 823–834.
24. Pernis B, Chiappino G, Kelus AS, Gell PG (1965) Cellular localization of immunoglobulins with different allotypic specificities in rabbit lymphoid tissues. J Exp Med 122: 853–876.
25. Haeusler RA, Engelke DR (2006) Spatial organization of transcription by RNA polymerase III. Nucleic Acids Res 34: 4826–4836.
26. Matera AG, Frey MR, Margelot K, Wolin SL (1995) A perinucleolar compartment contains several RNA polymerase III transcripts as well as the polypyrimidine tract-binding protein, hnRNP I. J Cell Biol 129: 1181–1193.
27. Thompson M, Haeusler RA, Good PD, Engelke DR (2003) Nucleolar clustering of dispersed tRNA genes. Science 302: 1399–1401.
28. McStay B, Grummt I (2008) The epigenetics of rRNA genes: from molecular to chromosome biology. Annu Rev Cell Dev Biol 24: 131–157.
29. Stahl A, Hartung M, Vagner-Capodano AM, Fouet C (1976) Chromosomal constitution of nucleolus-associated chromatin in man. Hum Genet 35: 27–34.
30. Lyle R, Wright TJ, Clark LN, Hewitt JE (1995) The FSHD-associated repeat, D4Z4, is a member of a dispersed family of homeobox-containing repeats, subsets of which are clustered on the short arms of the acrocentric chromosomes. Genomics 28: 389–397.
31. Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE (2001) Segmental duplications: organization and impact within the current human genome project assembly. Genome Res 11: 1005–1017.
32. Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, et al. (2005) Segmental duplications and copy-number variation in the human genome. Am J Hum Genet 77: 78–88.
33. Bobrow M, Pearson PL, Collacott HE (1971) Para-nucleolar position of the human Y chromosome in interphase nuclei. Nature 232: 556–557.
34. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, et al. (2007) High-resolution profiling of histone methylations in the human genome. Cell 129: 823–837.
35. Cuddapah S, Jothi R, Schones DE, Roh TY, Cui K, et al. (2009) Global analysis of the insulator binding protein CTCF in chromatin barrier regions reveals demarcation of active and repressive domains. Genome Res 19: 24–32.
36. Wang Z, Zang C, Rosenfeld JA, Schones DE, Barski A, et al. (2008) Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet 40: 897–903.
37. Bourgeois CA, Laquerriere F, Hemon D, Hubert J, Bouteille M (1985) New data on the in-situ position of the inactive X chromosome in the interphase nucleus of human fibroblasts. Hum Genet 69: 122–129.
38. Haaf T, Ward DC (1996) Inhibition of RNA polymerase II transcription causes chromatin decondensation, loss of nucleolar structure, and dispersion of chromosomal domains. Exp Cell Res 224: 163–173.
39. Wong LH, Brettingham-Moore KH, Chan L, Quach JM, Anderson MA, et al. (2007) Centromere RNA is a key component for the assembly of nucleoproteins at the nucleolus and centromere. Genome Res 17: 1146–1160.
40. Prieto JL, McStay B (2007) Recruitment of factors linking transcription and processing of pre-rRNA to NOR chromatin is UBF-dependent and occurs independent of transcription in human cells. Genes Dev 21: 2041–2054.
41. Prieto JL, McStay B (2008) Pseudo-NORs: a novel model for studying nucleoli. Biochim Biophys Acta 1783: 2116–2123.
42. Kaiser TE, Intine RV, Dundr M (2008) De novo formation of a subnuclear body. Science 322: 1713–1717.
43. Lanctot C, Cheutin T, Cremer M, Cavalli G, Cremer T (2007) Dynamic genome architecture in the nuclear space: regulation of gene expression in three dimensions. Nat Rev Genet 8: 104–115.
44. Sexton T, Schober H, Fraser P, Gasser SM (2007) Gene regulation through nuclear organization. Nat Struct Mol Biol 14: 1049–1055.
45. Takizawa T, Meaburn KJ, Misteli T (2008) The meaning of gene positioning. Cell 135: 9–13.
46. Zhao R, Bodnar MS, Spector DL (2009) Nuclear neighborhoods and gene expression. Curr Opin Genet Dev 19: 172–179.

PLoS Genetics | www.plosgenetics.org

March 2010 | Volume 6 | Issue 3 | e1000889

Nucleolus Genomics

49. Vogel MJ, Guelen L, de Wit E, Peric-Hupkes D, Loden M, et al. (2006) Human heterochromatin proteins form large domains containing KRAB-ZNF genes. Genome Res 16: 1493–1504. 50. Cremer M, Grasser F, Lanctot C, Muller S, Neusser M, et al. (2008) Multicolor 3D Fluorescence In Situ Hybridization for Imaging Interphase Chromosomes. Methods Mol Biol 463: 205–239. 51. Walter J, Joffe B, Bolzer A, Albiez H, Benedetti PA, et al. (2006) Towards many colors in FISH on 3D-preserved interphase nuclei. Cytogenet Genome Res 114: 367–378.

47. Sullivan GJ, Bridger JM, Cuthbert AP, Newbold RF, Bickmore WA, et al. (2001) Human acrocentric chromosomes with transcriptionally silent nucleolar organizer regions associate with nucleoli. Embo J 20: 2867–2874. 48. Al-Shahrour F, Minguez P, Tarraga J, Medina I, Alloza E, et al. (2007) FatiGO +: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic Acids Res 35: W91–96.


Nuclear Pore Proteins Nup153 and Megator Define Transcriptionally Active Regions in the Drosophila Genome

Juan M. Vaquerizas1¶, Ritsuko Suyama2¶, Jop Kind2¶, Kota Miura3, Nicholas M. Luscombe1,2‡*, Asifa Akhtar2,4‡*

1 European Bioinformatics Institute, Cambridge, United Kingdom, 2 Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany, 3 Centre for Molecular and Cellular Imaging, European Molecular Biology Laboratory, Heidelberg, Germany, 4 Laboratory of Chromatin Regulation, Max Planck Institute of Immunobiology, Freiburg, Germany

Abstract
Transcriptional regulation is one of the most important processes for modulating gene expression. Though much of this control is attributed to transcription factors, histones, and associated enzymes, it is increasingly apparent that the spatial organization of chromosomes within the nucleus has a profound effect on transcriptional activity. Studies in yeast indicate that the nuclear pore complex might promote transcription by recruiting chromatin to the nuclear periphery. In higher eukaryotes, however, it is not known whether such regulation has global significance. Here we establish nucleoporins as a major class of global regulators of gene expression in Drosophila melanogaster. Using chromatin immunoprecipitation combined with microarray hybridisation, we show that Nup153 and Megator (Mtor) bind to 25% of the genome in continuous domains extending 10 kb to 500 kb. These Nucleoporin-Associated Regions (NARs) are dominated by markers for active transcription, including high RNA polymerase II occupancy and histone H4K16 acetylation. RNAi-mediated knockdown of Nup153 alters the expression of ~5,700 genes, with a pronounced down-regulatory effect within NARs. We find that nucleoporins play a central role in coordinating dosage compensation, an organism-wide process involving the doubling of expression of the male X chromosome. NARs are enriched on the male X chromosome and occupy 75% of this chromosome. Furthermore, Nup153 depletion abolishes the normal function of the male-specific dosage compensation complex. Finally, by extensive 3D imaging, we demonstrate that NARs contribute to gene expression control irrespective of their sub-nuclear localization. Therefore, we suggest that NAR binding is used for chromosomal organization that enables gene expression control.

Citation: Vaquerizas JM, Suyama R, Kind J, Miura K, Luscombe NM, et al. (2010) Nuclear Pore Proteins Nup153 and Megator Define Transcriptionally Active Regions in the Drosophila Genome. PLoS Genet 6(2): e1000846. doi:10.1371/journal.pgen.1000846

Editor: Wolf Reik, The Babraham Institute, United Kingdom

Received November 10, 2009; Accepted January 14, 2010; Published February 12, 2010

Copyright: © 2010 Vaquerizas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by DFG SPP1129 and the EU funded FP7 Epigenome project. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: akhtar@immunbio.mpg.de (AA); luscombe@ebi.ac.uk (NML)

¶ These authors contributed equally to this work.

‡ These authors are joint senior authors on this work.

PLoS Genetics | www.plosgenetics.org

February 2010 | Volume 6 | Issue 2 | e1000846

Nucleoporins Bind Active Chromatin

Author Summary
The eukaryotic genome is spatially distributed in a highly organized manner, with chromosomal regions localizing to well-defined sub-nuclear positions. This organization could have a profound effect on chromatin accessibility and transcriptional activity on a genome-wide level. Using high-resolution, genome-wide chromatin-binding profiles, we show that the nuclear pore components Nup153 and Megator bind to a quarter of the Drosophila genome in the form of chromosomal domains. These domains represent active regions of the genome. Interestingly, comparison of male and female cells revealed enrichment of these domains on the male X chromosome, an exceptionally active chromosome that is under dosage compensation control to equalize gene expression between males, which have one X chromosome, and females, which have two. Based on extensive 3D image analysis, we show that these chromosomal domains are contributed by both the peripheral and the intranuclear pools of these proteins. We suggest that chromosomal organization by nucleoporins could contribute to global gene expression control.

Introduction
The spatial organisation of DNA, at both the nucleotide and chromosomal levels, allows efficient storage of genetic information inside the nucleus. However, DNA-dependent processes such as transcription require the chromosomal structure to be modified in order to allow access to this information. The regulation of chromatin accessibility is an intensely studied subject [1,2]. Molecular and genomic investigations have examined how nucleotide sequences and ATP-dependent chromatin-remodelling enzymes specify the locations for nucleosomal binding, and how histone-modifying enzymes modulate the stability of histone–nucleic acid interactions. These enzymes are recruited to precise genomic loci with the aid of sequence-specific DNA-binding transcription factors. In turn, particular histone modifications influence transcription factor binding to target sites on the genome, thereby controlling transcriptional initiation. Despite the importance of these cis- and trans-acting factors for the local chromosomal environment and the transcription of nearby genes, it has become increasingly clear that they explain just one level at which chromatin is regulated [3,4]. The eukaryotic genome is spatially distributed in a highly organised manner, with entire chromosomal regions localising to well-defined sub-nuclear positions [5]. This organisation has a profound effect on chromatin accessibility and transcriptional activity on a genome-wide level [6–8]. For instance, chromosomal regions at the nuclear envelope tend to form closed heterochromatin, a structure that is generally indicative of transcriptional repression [9]. Genomic studies in Drosophila melanogaster and humans established that lamins, proteins lining the nuclear membrane [10], are major contributors to sub-nuclear localisation and gene regulation [11,12]. Comparisons of binding profiles with gene expression data and histone marker information showed that chromosomal regions containing dense lamin binding were transcriptionally repressed.

Although the nuclear periphery has been primarily associated with repression, recent evidence has also suggested a role for membrane components in transcriptional activation [9,13–16]. The nuclear pore complex is a large structure comprising about 30 protein subunits, and it is the primary channel through which macromolecules traverse the nuclear envelope [17]. Interestingly, investigations in Saccharomyces cerevisiae identified subunits of the nuclear pore complex that preferentially bound transcriptionally active genes [18]. Moreover, several target loci such as GAL2 and INO1 were found to relocate from the interior to the periphery upon activation [13], although there were exceptions to this behaviour [19–22]. Thus, it is becoming increasingly clear that nuclear periphery components can have both positive and negative influences on gene regulation. Since the composition of the nuclear envelope differs between organisms (yeast, for example, lacks lamins), it is important to also study the contribution of nuclear envelope components to gene regulation in higher organisms [9,17,23–25]. So far just one study has explored the global interactions of the nucleoporin subunit Nup93 with human chromosomes 5, 7 and 16 [26]; it reported only a low density of binding sites, and their influence on gene regulation was inconclusive.

Recently, we revealed a biochemical association between nucleoporins and the dosage compensation apparatus in higher eukaryotes including humans [27]. In Drosophila, the Male Specific Lethal (MSL) complex offsets the imbalance in the number of sex chromosomes in males and females by doubling the expression of genes on the male X chromosome [28,29]. By purifying enzymatically active MOF complexes, we identified interactions with the nucleoporins Nup153 and Megator (Mtor). Strikingly, depletion of either subunit resulted in the loss of dosage compensation in male cells. Therefore, our work suggested a vital role for nucleoporins in promoting transcriptional activation on a large scale.

Here, we present the first genome-wide study of nucleoporin binding in a higher eukaryote. Using chromatin immunoprecipitation followed by hybridisation to high-resolution tiling microarrays, we show that Nup153 and Mtor interact with 25% of the Drosophila genome in large domains spanning 10–500 kb in size. These regions, which we term nucleoporin associated regions (NARs), contain large numbers of highly expressed genes and are enriched for markers of active transcription, including RNA polymerase binding and histone H4 lysine 16 acetylation. Additionally, we reveal a remarkably high density of NARs on the male X chromosome, and these NARs correlate extremely well with the binding pattern of the dosage compensation complex. We also demonstrate that chromosomal regions bound by these nucleoporins are composed of peripheral as well as non-peripheral pools of these proteins; interestingly, however, the X-chromosomal target regions are preferentially localised closer to the nuclear periphery. In summary, we firmly establish nucleoporins as a major class of chromatin-binding proteins in higher eukaryotes, with a general role in transcriptional regulation and three-dimensional chromosomal organisation. Finally, we show for the first time the importance of nucleoporin binding not only as a mechanism for transcriptional control, but also in maintaining a complex organism-level biological system, namely dosage compensation.

Results
Nup153 and Mtor bind chromatin in a genome-wide fashion
We produced DNA-binding profiles for the nuclear pore components Mtor and Nup153 in Drosophila male SL-2 and female Kc cell lines using chromatin immunoprecipitation followed by hybridisation to Affymetrix tiling arrays [30,31] (Figure 1). Raw data were processed as in Kind et al. (2008) to minimise false-positive signals from aberrant array probes (Figure S1) [32]. The ChIP-chip profiles for the two proteins strongly correlate, indicating that they bind to similar locations throughout the genome (r = 0.77 and 0.88 for SL-2 and Kc cells, respectively; Figure 1D, Figure S4). We confirmed the reproducibility of the results by performing three biological replicates for each condition (r = 0.73), and we validated binding at 18 control genes by real-time PCR in triplicate (Figure S2). Both Mtor and Nup153 exhibit extensive binding across the whole genome, and together they bind to 42% of the Drosophila genome (calculated as the fraction of base pairs covered, using a twofold cut-off). Thus nucleoporins represent a new class of global chromatin-binding proteins in higher eukaryotes.
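The two summary statistics in this paragraph, the correlation between the Nup153 and Mtor profiles and the genome fraction bound at a twofold cut-off, can be illustrated with a small sketch. It is not the authors' pipeline (processing followed Kind et al. 2008). It assumes equally spaced probes, so a fraction of probes stands in for a fraction of base pairs, and it correlates log2 ratios, which is a common but assumed choice. The function names and the ratios are hypothetical toy values, not data from the study.

```python
"""Toy sketch: profile correlation and twofold-cutoff genome coverage.

Assumptions (not from the paper): probes are equally spaced, so the fraction
of probes called 'bound' approximates the fraction of base pairs covered;
all ratios below are invented toy values.
"""
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def bound_calls(ratios, cutoff=2.0):
    """Call a probe bound when its ChIP/input ratio reaches the cut-off."""
    return [r >= cutoff for r in ratios]


def union_coverage(calls_a, calls_b):
    """Fraction of probes bound by either protein (stand-in for bp coverage)."""
    hits = sum(a or b for a, b in zip(calls_a, calls_b))
    return hits / len(calls_a)


if __name__ == "__main__":
    # Hypothetical ChIP/input ratios for ten consecutive probes.
    nup153 = [0.8, 1.1, 2.4, 3.0, 2.2, 0.9, 1.0, 2.6, 1.2, 0.7]
    mtor = [0.9, 1.3, 2.1, 3.4, 2.8, 1.0, 0.8, 1.9, 2.3, 0.6]
    # Correlate log2 ratios (an assumed, commonly used scale for ChIP-chip).
    r = pearson_r([math.log2(v) for v in nup153], [math.log2(v) for v in mtor])
    cov = union_coverage(bound_calls(nup153), bound_calls(mtor))
    print(f"r = {r:.2f}; bound by either protein: {cov:.0%}")
```

With these toy ratios, five of the ten probes pass the twofold cut-off for at least one protein, so the union coverage is 50%. Taking the union rather than the intersection mirrors the phrase "together they bind", which is our reading of the text.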

Nucleoporin-binding occurs in large chromosomal domains
Visual inspection of the ChIP-chip profiles reveals that Nup153 and Mtor interact with the genome in a manner not observed for traditional transcription factors (Figure 1B and 1C) [33]. Instead of associating with discrete loci, nucleoporins bind extended chromosomal regions, alternating between domains of high-density binding and domains of low occupancy. In order to analyse the visual observations in a statistically rigorous fashion, we quantified binding within a 10 kb sliding window that was scanned along the genome (see Materials and Methods). Windows containing more than 70% binding (as a proportion of array probes with a positive binding signal) were classified as Nucleoporin Associated Regions (NARs), and neighbouring windows reaching this threshold were grouped together as continuous NARs. The detection method is robust: the 70% threshold ensures that no NARs are found when binding sites are randomly distributed across the genome, and we identify very similar sets of NARs for windows ranging from 5 kb to 500 kb in size. Moreover, application of the domain-finding approach described by Guelen et al. [11] returns over 80% agreement with our method (in terms of base pairs classified as NARs). NARs are widespread (Figure 1A–1C): in male SL-2 cells, a total of 1,384 NARs cover a quarter of the entire Drosophila genome (25 Mb and 29 Mb for Nup153 and Mtor, respectively), and in female Kc cells 1,865 NARs occupy a similar proportion of the genome (33 Mb and 35 Mb for Nup153 and Mtor, respectively; Figure S3). Most domains range in size from 10 kb to 100 kb, although some extend to over 500 kb (Figure 1F, Figure S4). Most nucleotide positions within NARs are occupied by both Nup153 and Mtor. Moreover, even where the overlap is not perfect, NARs tend to occur in similar genomic loci (Figure 1E; Figure 1B, chromosomal positions 560,000–600,000). Most importantly, NARs occur in gene-rich areas that encompass over 4,700 protein-coding genes whose activities might be affected by nucleoporin binding.

Figure 1. Nup153 and Megator bind the Drosophila genome on a large scale. (A) Karyotype representation of the Drosophila genome; the upper track depicts the occurrence of high-density nucleoporin binding in SL-2 cells and the lower track shows the location of annotated genes. High-density binding regions, termed Nucleoporin Associated Regions (NARs), occur across 25% of the genome, with particularly high occupancy on the male X chromosome. (B) Magnified view of Nup153 and Mtor binding on chromosome 3L. For each nucleoporin, the upper track displays the processed ChIP/input profile and the lower track colours the sections identified as NARs. Note that Nup153 and Mtor show very similar patterns of binding. (C) Magnified view of nucleoporin binding and NAR occurrence on chromosome X. There is much denser binding on this chromosome compared with autosomes. (D) Smoothed scatter plot displaying the ChIP/input binding ratios for Nup153 and Mtor (r = 0.77). (E) Barplot representing the overlap in NARs defined by Nup153 and Mtor binding profiles. (F) Histogram of Nup153 and Mtor NAR length distributions. doi:10.1371/journal.pgen.1000846.g001

[Figure 2 image: genome-track view of chromosome 2L, 3.0–4.0 Mb (x-axis: chromosomal location, Mb). Tracks: Nup153 NARs, Mtor NARs; active marks (H4K16Ac, MOF, Pol II); gene expression; Nup153 RNAi; repressive marks (lamin, H3K27me3). Gene legend: present, up-regulated, down-regulated.]

Figure 2. NARs define transcriptionally active regions of the genome. Genome-track view of a 1 Mb section of chromosome 2L. NARs are enriched for transcribed genes compared with non-NARs (gene expression track; green shading), and a large proportion of genes are down-regulated upon Nup153 depletion (Nup153 RNAi track; red shading). NARs also align with markers of a transcriptionally active chromatin structure (H4K16Ac, MOF and Pol II tracks; grey shading), but exclude markers for inactive chromatin (lamin, H3K27me3; grey shading). doi:10.1371/journal.pgen.1000846.g002

Nucleoporin-binding demarcates actively transcribed chromosomal regions
A direct relationship between nucleoporin binding and gene expression has not been established so far in higher eukaryotes. Therefore, we explored the impact of NARs on transcriptional regulation by examining the activity of genes encoded within these regions (Figure 2; Tables S1, S2). We measured gene expression levels using Affymetrix GeneChips (see Materials and Methods). Using the present/absent calls defined by the MAS5.0 algorithm [34], we detected the expression of 6,478 and 6,219 genes in SL-2 and Kc cells, respectively. These genes are preferentially located within NARs: 63% of genes inside NARs are expressed compared with just 40% outside, indicating significantly elevated transcriptional activity in the former (p-value < 2.2e-16). This observation is supported by data quantifying RNA polymerase II occupancy (Figure 2; Tables S1, S2); by mapping publicly available ChIP-chip data [35], we find that Pol II binding is highly enriched at the promoters of genes inside NARs compared with those outside (p-value < 2.2e-16). Recent publications demonstrated that histone modifications, MOF acetyltransferase binding and lamin binding are robust genome-wide indicators of transcriptional activity. In both SL-2 and Kc cells, acetylation of histone H4 lysine 16 (H4K16Ac) and MOF binding [32], both strong markers for active transcription, are extremely prominent within NARs (Figure 2; Tables S1, S2; p-value < 2.2e-16). In contrast, histone H3 lysine 27 tri-methylation [36] and lamin binding [12], markers of transcriptional repression, are enriched outside NARs (Figure 2, Figure S5; Tables S1, S2; p-value < 2.2e-16). Finally, we confirmed a causal link between nucleoporin binding and transcriptional regulation by measuring gene expression levels following RNAi-mediated knock-down of Nup153 (Figure 2, Figure S7; Tables S1, S2). The depletion results in large and widespread transcriptional changes in cells collected after seven days: 5,684 genes (40% of the Drosophila genes represented on the array) are differentially expressed in SL-2 cells (p-value < 0.05). Moreover, there is a large enrichment of down-regulated genes within NARs (29% of all genes; 40% of 'present' genes) compared with non-NARs (19% of all genes; p-value < 2.2e-16). We obtain similar enrichments for cells collected five days after RNAi treatment, and also upon Mtor depletion (data not shown). These observations strongly indicate that nucleoporin binding promotes a high level of transcriptional activity, which may be due to the formation of an open chromatin environment.
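The inside-versus-outside-NAR comparisons in this section depend on the NAR calls from the sliding-window rule described under "Nucleoporin-binding occurs in large chromosomal domains": 10 kb windows, more than 70% of probes with a positive signal, and neighbouring qualifying windows merged into one domain. The sketch below is a simplified reading of that rule and not the authors' implementation, whose details are in their Materials and Methods outside this excerpt. Several details are assumptions: the window step size (not stated here), skipping windows with no probes, half-open intervals, and counting a gene as "inside" a NAR when its midpoint falls in one. The function names `call_nars` and `in_nar` and the toy data are hypothetical.

```python
"""Simplified sliding-window domain caller in the spirit of the NAR definition.

Hypothetical assumptions: a fixed window step, coordinates in bp, half-open
intervals, and a gene counts as 'inside' a NAR if its midpoint lies in one.
"""
from bisect import bisect_left


def call_nars(positions, bound, window=10_000, step=1_000, threshold=0.7):
    """Return merged (start, end) intervals built from qualifying windows.

    positions: sorted probe coordinates (bp); bound: matching booleans.
    A window qualifies when MORE than `threshold` of its probes are bound.
    """
    if not positions:
        return []
    hits = []
    start = positions[0]
    while start <= positions[-1]:
        end = start + window
        lo = bisect_left(positions, start)  # first probe >= start
        hi = bisect_left(positions, end)    # first probe >= end
        n = hi - lo
        if n and sum(bound[lo:hi]) / n > threshold:
            hits.append((start, end))
        start += step
    # Merge overlapping or touching qualifying windows into continuous NARs.
    merged = []
    for s, e in hits:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def in_nar(gene_start, gene_end, nars):
    """True if the gene midpoint lies inside any NAR interval."""
    mid = (gene_start + gene_end) / 2
    return any(s <= mid < e for s, e in nars)


if __name__ == "__main__":
    # Toy chromosome: one probe every 500 bp; probes in [10 kb, 25 kb) bound.
    probes = list(range(0, 40_000, 500))
    calls = [10_000 <= p < 25_000 for p in probes]
    nars = call_nars(probes, calls)
    print(nars)
    print(in_nar(12_000, 14_000, nars), in_nar(30_000, 32_000, nars))
```

On this toy input the caller returns a single domain, (8000, 27000). It extends about 2 kb beyond each edge of the bound block because a window only needs more than 70% of its probes bound to qualify. This edge blurring is inherent to window-based domain calling, and it is one reason the exact boundaries depend on window size and step.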

nucleoporin, Nup50 does not influence MSL1 and MOFlocalisation and binding (Figure S9; data not shown). Moreover, the observations are not due to an effect on MSL protein concentrations or defects in the RNA export pathway [27]: we previously showed that MSL levels remain unaffected in Nup153 and Mtor-depleted cells; and impairment of the major export pathways through NFX1-depletion does not disrupt the localisation of the MSL complex to the X chromosomes.

Spatial localisation of NARs versus non–NARs in the nucleus Although nucleoporins are primarily located at the nuclear periphery, some display dynamic association with the nuclear pore complex [37], and it remains unclear whether nucleoporinchromatin interactions would affect transcription at the periphery or within the nucleoplasm. Therefore, we assessed the spatial localisation of different chromosomal regions within the nucleus using three-dimensional imaging of Fluorescence In Situ Hybridisation (3D-FISH) in male and female cells (Figure 4). We selected 26 chromosomal regions of average length 15–20 kb for analysis (Table S3), comprising 18 NAR (targets T1-18) and 8 non-NAR loci (targets N1-8). An independent lamin-bound locus (target L105) was used as a positive control representing a region previously shown to localise at the nuclear periphery [12]. First we checked the localisation of Nup153 and Mtor themselves (Figure S6). Immunostaining of SL-2 cells and salivary glands from male larvae confirm that both proteins predominantly reside in the nuclear periphery, although we also detected some staining within the nucleus. This is consistent with earlier reports that these proteins are dynamic components of the nuclear pore complex, with the capacity to shuttle between different sub-nuclear locations [25,37]. Next, we used DAPI and lamin protein-immunostaining to assess the nuclear localisation of our target loci. We display a selection of images in Figure 4A: the lamin protein in green defines the nuclear boundary, the DAPI in blue the distribution of genomic DNA, and the FISH signal in red specifies the position of the target locus. In order to account for cell-to-cell variation in localisation that results from the dynamic behaviour of chromatin, we measured the distance between the FISH signal and nuclear boundary for a large number of samples (44,n,91). 
Size differences between nuclei were normalised by representing distances as a percentage of the nuclear radius. In Figure 4B, we show the expected distribution of distances for a simulated locus situated at the periphery; for a FISH signal with 30% radius, we find that most measurements lie between 0% and 30% of the distance to the centre of the nucleus. In contrast simulations for a signal positioned halfway between the periphery and the centre results in a distinct, more symmetrically shaped distribution, with most measurements falling between 20% and 60% of the distance to the centre (Figure 4C; Figures S10, S11; Videos S1, S2, S3, S4). The lamin-bound L105 locus displays a distribution that is heavily skewed towards the periphery (Figure 4D); however the profile is broader than the simulation, signifying that the locus is present at the interior of the nucleus at least part of the time. On the other hand, target N2 resembles that of the non-peripheral simulation (Figure 4E), albeit with a broader distribution, which indicates that the locus predominantly resides in the interior. Since both loci are NAR-independent, they were assigned as in vivo controls representing peripheral and non-peripheral localisation. Many NAR-target distributions show almost perfect overlap with L105, demonstrating that they are preferentially situated at the periphery (Figure 4F–4G; see Materials and Methods); interestingly however a subset of NAR loci displays distributions

NARs are enriched on the male X chromosome One of the most important manifestations of gene expression control in higher eukaryotes is dosage compensation for different number of sex chromosomes between the two sexes. In Drosophila—in which females have two X chromosomes but males possess only a single X—the dosage compensation complex offsets the imbalance in gene content by doubling the expression of the male X chromosome. Thus, the chromosome represents an outstanding example of an exceptionally highly transcribed genomic region. In order to explore the association of Nup153 and Mtor with the dosage compensation complex further, we compared the patterns of nucleoporin-binding in male SL-2 and female Kc cells (Figure 1A, Figure 3A–3D, Figure S3). There is a dramatic difference between the two sexes: in females, NARs are evenly distributed throughout the entire genome with only a 1.2-fold difference in % NAR occupancy between chromosome X (7.4Mb and 33% for Nup153; 8.0Mb and 36% for Mtor) and autosomes (26.0Mb and 27% for Nup153; 27.1Mb and 28% for Mtor); but in males, NARs are overwhelmingly biased towards the X chromosome (14.9Mb and 67% for Nup153; 16.6Mb and 75% for Mtor) compared with the autosomes (9.7Mb and 10% for Nup153; 12.0Mb and 12% for Mtor) with a 6-fold difference in occupancy. Further, domains on the male X chromosome (median length = 62Kb, 94Kb for Nup153 and Mtor respectively) are much longer than those found on any other chromosomes (median length = 22Kb for Nup153 and Mtor in male autosomes, ,35Kb for female autosomes and X chromosome). Having established that the nucleoporins are enriched on the male X chromosome, we explored the association with the dosage compensation system further. Recently, we demonstrated that the members of the dosage compensation complex—MSL1, MSL3 and MOF—preferentially bind to the male X chromosome [32]. 
A comparison of this previously published dataset with our current analysis shows that NARs on the male X chromosome coincide very well with the binding sites of the dosage compensation complex (Figure 3E). We also tested the effects of Nup153-depletion on MSL1 and MOF-binding to 10 known target loci using chromatin-immunoprecipitation followed by qPCR. X-chromosomal binding is severely reduced for both proteins (Figure 3F), and the additional binding to autosomal targets is lost for MOF (Figure S8). The effects are clearly specific to Nup153, as depleting another PLoS Genetics | www.plosgenetics.org

February 2010 | Volume 6 | Issue 2 | e1000846

Nucleoporins Bind Active Chromatin

Figure 3. Male X chromosome is especially enriched for NARs. Percentages of NAR occupancy on male and female autosomes and X chromosome for (A) Nup153 and (C) Mtor. In males, NARs are particularly enriched on the X chromosome compared with autosomes, whereas NARs occur evenly throughout in females. NAR length distributions for (B) Nup153 and (D) Mtor. NARs are much longer on the male X chromosome. (E) Overlap between NARs and MSL1-, MSL3- and MOF-binding; numbers represent gene counts. (F) Effect of Nup153-depletion on MSL1- (red shading) and MOF-binding (grey shading) to four X-chromosomal target loci. DNA prepared from cells treated with EGFP (control) or Nup153 dsRNA was immunoprecipitated and analysed by qPCR using primers for the beginning (P1), middle (P2) and end (P3) of genes. Error bars represent the standard deviation in measurements from three replicate experiments. Recovered DNA is shown as a percentage of input DNA. doi:10.1371/journal.pgen.1000846.g003

PLoS Genetics | www.plosgenetics.org

February 2010 | Volume 6 | Issue 2 | e1000846

Nucleoporins Bind Active Chromatin

Figure 4. Nup153 and Mtor define NARs both at the periphery and the interior of the nucleus. (A) Representative images of single confocal sections of nuclei containing the FISH signal (red) over DAPI (blue) and immunostained lamin (green). Target genomic regions include a lamin-bound gene (L105), NAR (T4, T15, T7, T11, T9, T13) and non-NAR loci (N1, N2). Probability density plots show the distribution of distance measurements between the FISH signal and the closest point on the nuclear boundary. Simulated nuclei show the ideal distributions for FISH targets located at the (B) periphery and (C) interior. Distances range from 0 at the boundary and 1.0 at the centroid of the nucleus. The grey background represents the theoretical 30% limit for a peripherally localised FISH signal. Observed distributions of in vivo controls for (D) peripherally localised L105 and (E) non-peripherally localised N2; the broad spread compared with simulations indicate that the loci display dynamic behaviour in their positioning within the nucleus. (F, G) Predominantly peripheral loci (T4, T15) have distributions that are similar to L105 (shown in yellow), whereas (H, I) predominantly non-peripheral loci (N1, T11) have very different distributions. Aggregate distributions for all NAR targets on (J) the male X chromosome and (L) autosomes, and all non-NAR targets on (K) the male X and (M) autosomes. Targets on the X chromosome are peripherally localised compared with autosomal ones. doi:10.1371/journal.pgen.1000846.g004

that are indicative of non-peripheral localisation (Figure 4I). For non-NARs, targets such as N1 display good overlap with the negative control N2 (Figure 4H), but some are found at the periphery. It is clear, therefore, that many target regions tested here do not conform to the behaviour expected from NPC-binding. In fact, we find that NARs from chromosome X tend to reside at the periphery (6 out of 10 targets; Table S4), whereas only a small number of autosomal NARs do so (1 out of 8; Table S4). This is reflected in the aggregate distributions, in which X-chromosomal loci display the characteristic skewed profiles compared with autosomal regions (Figure 4J and 4L). Among non-NARs (Figure 4K and 4M), autosomal loci are invariably non-peripheral, whereas the X-chromosomal targets display a tendency for peripheral localisation; the positioning of the latter is probably influenced by neighbouring NARs, given the large amount of binding on the X chromosome. For comparison, peripheral localisation of the X chromosome is absent in female Kc cells (data not shown). Thus, in striking contrast to prior expectations, we reveal that interior as well as peripheral populations of nucleoporins bind chromatin and mediate transcriptional activity at NARs. Furthermore, nucleoporin interactions with the X chromosome promote peripheral localisation of the chromosome, most likely as a result of the overwhelming amount of binding in males, but this is generally not the case for autosomes. Finally, to confirm the influence of nucleoporins on localisation, we tested the effects of RNAi-mediated Nup153 knockdown for six loci: three peripheral X-chromosomal NARs (T4, T5, T7), a non-peripheral X-chromosomal NAR (T11), a non-peripheral autosomal NAR (T9) and the non-peripheral control (N2). For each, we compared the distribution of Nup153-depleted samples against a mock EGFP RNAi treatment (Figure 5, Figure S7). All three peripheral targets on the X chromosome moved to a more intra-nuclear position upon loss of Nup153 (Figure 5A–5C; p-value < 0.05), but in contrast there was no significant change for any of the non-peripheral loci (Figure 5D–5F; p-value > 0.05). These data suggest that the sub-nuclear positioning of peripheral NARs, specifically those on the male X, depends on the presence of Nup153, whereas the localisation of intra-nuclear loci is independent of Nup153, regardless of whether they are bound by nucleoporins.

Figure 5. Peripheral localisation is dependent on Nup153. Probability density distributions of distance measurements for mock-treated (red) and Nup153-depleted cells (purple). Histograms depict the proportion of nuclei in which the FISH signal is located within the 30% distance threshold (DAPI in blue). (A–C) NAR targets on the male X chromosome (T4, T7, T5) relocalise to the interior upon treatment, indicating that peripheral localisation is dependent on Nup153. (D–F) NAR and non-NAR targets in the interior remain unaffected upon Nup153 depletion. doi:10.1371/journal.pgen.1000846.g005

Discussion

The classical view of transcriptional regulation describes the interplay of transcription factors, histones and associated enzymes with DNA in order to recruit the transcriptional machinery to the appropriate genomic loci. However, it has become increasingly clear that these interactions explain only one level at which gene expression is controlled. At a genome-wide level, the spatial organisation of chromosomes within the nucleus is increasingly considered to have a profound effect on chromatin structure and transcriptional activity [5]. In particular, studies in yeast indicate that members of the nuclear pore complex might promote transcription by recruiting chromatin to the nuclear periphery [14,18]. However, the importance of such regulation in higher eukaryotes has remained unresolved [26]. In this study, we established conclusively that nucleoporins play a central role in mediating transcriptional regulation in a complex, multicellular organism. For the first time in any higher eukaryote, we generated a genome-wide profile of nucleoporin-binding; contrary to preliminary observations, binding is widespread, occurring across 40% of the genome. Thus, we reveal that nucleoporins, Nup153 and Mtor in particular, represent a major new class of global chromatin-binding proteins. Intriguingly, these proteins interact with the genome differently from traditional transcription factors. Rather than associating with individual loci, nucleoporins bind continuous sections of chromosomes at very high density. Termed NARs, these regions extend up to 500 kb in length and occupy 25% of the entire Drosophila genome. Moreover, NARs are functionally important, as they demarcate regions of open chromatin and transcriptional activity, which is lost on depletion of Nup153. It is significant that the male X chromosome, a prime example of hyper-transcription, is almost entirely occupied by NARs. Therefore, we suggest that Nup153 and Mtor may stimulate transcription by promoting the formation of an open chromatin environment. In dramatic contrast to expectations, nucleoporin-binding does not automatically lead to localisation at the nuclear periphery, though the male X chromosome is an exception in this regard. Since Nup153 and Mtor are known to be dynamic components of the nuclear pore complex, it appears likely that both peripheral and intra-nuclear pools of nucleoporins contribute to chromatin-binding. Given the dynamic nature of chromatin localisation, it is also possible that NARs are located at the periphery in a very transient manner, and further developments in imaging techniques will help clarify this. Where NAR formation and peripheral localisation do coincide, however, Nup153 is necessary for sustained positioning. Chromosomal domains have been implicated in the formation of three-dimensional sub-nuclear structures that coordinate the expression of otherwise distant loci [38], such as the human β-globin genes [39,40]. We speculate that NARs may indicate the genomic regions required for the assembly of such transcription factories on a very large scale. Within this context, the dynamic nature of Nup153 and Mtor is significant, as re-localisation of these proteins might provide a basis for global transcriptional control in response to cellular cues.

Additionally, given the primary function of the nuclear pore complex in transporting macromolecules into and out of the nucleus, Nup153 and Mtor may provide a means to couple transcriptional control with post-transcriptional events. We stress, however, that the mechanisms behind such processes are the subject of intense research activity and many controversies remain. Finally, the special link with dosage compensation confirms the importance of nucleoporin-binding not only as a molecular mechanism for transcriptional control, but also in maintaining a complex, organism-level biological system.

Materials and Methods

ChIP–chip and qPCR analysis
Chromatin immunoprecipitation combined with microarray hybridisation (ChIP–chip) and qPCR experiments were performed as described previously in Kind et al [32]. Primer sequences are provided in Text S1. Numerical data from Affymetrix Drosophila Tiling 2.0R Arrays (Dm35b_MR_v02) were processed as in Kind et al [32]. Briefly, array data were background corrected using GCRMA and quantile normalised [41]. Log2(ChIP/input) ratios were calculated using the average from three replicate experiments. Log2 ratios were then smoothed by averaging the signal within a 500 bp window centred on each probe (Figure S1).

Identification of Nucleoporin Associated Regions (NARs)
Chromosomal regions with high densities of Nup153- and Mtor-binding were identified by sliding a 10 kb window along each chromosome, centred on the start position of each probe. NARs were defined as continuous chromosomal regions in which more than 70% of probes showed a positive binding signal (i.e., log2 ratio > 0). We also implemented the two-stage domain-finding method described by Guelen et al [11]. Our method recovered at least 80% of all probes defined as domains by the Guelen approach.

RNAi on cultured cells
Nup153 and Nup50 were depleted as previously described in Mendjan et al [27]. Briefly, cells were incubated with dsRNA for five or seven days with a boost on day two. Cells were subsequently harvested for Western blot analysis, ChIP, gene expression profiling, or immunofluorescence experiments. Control experiments were performed using mock treatment (EGFP RNAi).

Gene expression profiling
Gene expression was measured using Affymetrix Drosophila2 GeneChips in triplicate for each condition. Data analysis was performed using publicly available packages in the Bioconductor software suite [43]. Raw .CEL files were processed using RMA [44], and probe sets were mapped to genes using the annotation from the Ensembl database (v41) [45]. In control (EGFP-treated) cells, expressed genes were identified as those receiving MAS5.0 'present' calls in all three replicates [34]. For comparisons of Nup153-depleted and mock-treated cells, differentially expressed genes were determined using the Limma package [46]; p-values were corrected for multiple testing using FDR [47], and a significance threshold of p-value < 0.05 was selected.

Overlap of NARs with markers for transcriptional activity
We compared the overlap between NARs and genomic features. For ease of comparison, all data were mapped onto the Drosophila genome provided by the Ensembl database (v41) [45]. Accompanying each entry is the statistical significance of the difference in the amount of each genomic feature found within NARs and non-NARs.
(i) Histone H4 lysine 16 acetylation (H4K16Ac; p < 2.2e-16; t-test): processed ChIP-chip profiles obtained from Kind et al [32].
(ii) MOF-binding (p < 2.2e-16; Fisher test): processed ChIP-chip profiles obtained from Kind et al [32].
(iii) RNA PolII occupancy (p < 2.2e-16; Fisher test): PolII-bound genes obtained from Muse et al [35]. For visualisation purposes in Figure 2, bound genes were represented as 1 kb windows centred on the transcription start site.
(iv) Gene density (p < 2.2e-16; Wilcoxon test): number of genes as annotated by the Ensembl database within a 20 kb sliding window with a 1 kb offset.
(v) Expressed genes (p < 2.2e-16; Fisher test): gene expression measured using Affymetrix Drosophila2 GeneChips as described above.
(vi) Down-regulated genes upon Nup153 depletion (p < 2.2e-16; Fisher test): differentially expressed genes in RNAi-treated cells compared with mock-treated cells as described above.
(vii) Lamin-binding (p < 2.2e-16; Fisher test): processed ChIP-chip data were obtained from Pickersgill et al [12]. Note that that study used low-resolution cDNA arrays; therefore, unlike in the human study, the authors were unable to detect high-density lamin-associated domains.
(viii) Histone H3 lysine 27 tri-methylation (H3K27me3; p < 2.2e-16; Fisher test): processed ChIP-chip profiles obtained from Schwartz et al [36].

Fluorescent In Situ Hybridisation on cultured cells

DNA FISH on SL-2 cells was performed as previously described by Lanzuolo et al [48]. Briefly, for DNA FISH, 1×10^6 cells were centrifuged, re-suspended in 0.4 ml of medium and placed for 30 min at room temperature on a poly-lysine-coated slide (10 mm diameter). After rinsing with PBS, the cells were fixed with 4% paraformaldehyde in PBT (PBS, 0.1% Tween 20) for 10 min at room temperature. Cells were then washed three times with PBT and incubated for 1 h at room temperature with RNase A (100 µg/ml in PBT). After rinsing with PBS, cells were incubated with 0.5% Triton in PBS for 10 min at room temperature. Cells were rinsed again with PBS and incubated with 20% glycerol in PBS for 30 min at room temperature. Cells were then frozen in liquid nitrogen, thawed at room temperature and soaked in 20% glycerol in PBS; this freeze–thaw cycle was repeated four times. After washing the cells again with PBS three times, they were incubated for 5 min in 0.1 N HCl, briefly rinsed in 2× SSC twice, and stored in 50% formamide, 2× SSC, 10% dextran sulphate, pH 7.0. Fluorescent probes were prepared with the FISH Tag DNA Kit (Invitrogen, Carlsbad, CA), dissolved in the hybridisation mixture (50% deionised formamide, 2× SSC, 10% dextran sulphate, salmon sperm DNA at 0.5 mg/ml), applied to cells and sealed under coverslips with rubber cement. Probe and cellular DNA were denatured simultaneously on a hot block at 78°C for 3 min. Hybridisation was carried out in a humid atmosphere at 37°C for 1 d. After hybridisation, slides were washed in 2× SSC three times for 5 min at 37°C and in 0.1× SSC three times for 5 min at 45°C, rinsed in PBS twice and counter-stained with DAPI. For immuno-FISH, the following steps were added after the 0.1× SSC washes at 45°C: slides were washed twice with 2× SSC for 5 min each at room temperature and blocked with TNT buffer (0.1 M Tris-HCl pH 7.5, 0.15 M NaCl, 5% BSA) for 1 h at room temperature. Anti-lamin antibody was incubated overnight at 4°C in TNT buffer, followed by three 5-min washes in wash buffer (0.1 M Tris-HCl pH 7.5, 0.15 M NaCl). The secondary antibody was applied in TNT buffer for 2–3 h at room temperature, followed by washes in wash buffer and DAPI staining as described above. Cells were mounted on glass slides with FluoromountG (Southern Biotech, Birmingham, AL).

Three-dimensional image stacks were acquired with a Leica SP5 confocal microscope (Leica Microsystems, Exton, PA) using a 63× oil immersion objective with a numerical aperture of 1.4 and a zoom of 3.2±0.2. For DNA FISH on target and non-target loci, genomic regions of approximately 15 kb, avoiding repeated sequences, were chosen and amplified by PCR from genomic DNA with 5–10 primer pairs, each covering around 0.5–3 kb. Primer sequences are available on request.

Image analysis of FISH localisation
To determine quantitatively the three-dimensional position of the FISH signal within the nucleus, we used the ImageJ software [49]. The nuclear envelope was initially defined by segmentation of the DAPI image using the automated Otsu thresholding algorithm. The boundary definition was then refined against the lamin staining, flagging significant deviations between the two signals where necessary. Figure S10 shows a schematic diagram of the procedure, together with a distribution of radii calculated for 62 nuclei, demonstrating that the DAPI and lamin signals provide very consistent definitions of the nuclear boundary. Segmented images were then stacked in order to recreate the three-dimensional nucleus. Next, we calculated the distances between the FISH signal and the nuclear boundary (Figures S10 and S11; Videos S1, S2, S3, S4). The segmented three-dimensional images of the nucleus were converted into a three-dimensional distance map using the Local Thickness plug-in (http://www.optinav.com/Local_Thickness.htm). We thresholded the FISH images to identify voxels within the nucleus that corresponded to the FISH signal, and we measured the distances between all such voxels and the closest point on the nuclear boundary. For each nucleus we calculated the mean distance; then, for each test locus, we used the set of mean distances for all nuclei to plot the distance distribution. Similar results were obtained when we used the centre of mass of the FISH signal as the reference point instead of the mean distances for individual voxels (data not shown). In total, we examined 1,712 nuclei (35–91 samples for each target locus; 1,172 nuclei in total for NARs and 540 for non-NARs). For a given target, we compiled all distance measurements from all relevant nuclei to produce a distribution of distances, as shown in Figure 4 and Table S4. The lamin-bound L105 and non-NAR N2 targets were selected as in vivo controls with representative distributions for peripheral and non-peripheral sub-nuclear localisation, respectively. We assessed the localisation of each target locus by comparing its distance measurements against the L105 and N2 controls separately. Statistical significance was calculated using the Wilcoxon test, with an FDR-corrected threshold of p < 0.05. A non-significant p-value (i.e., p > 0.05) compared with the L105 distribution is indicative of peripheral localisation, whereas a non-significant p-value (i.e., p > 0.05) compared with the N2 distribution is indicative of non-peripheral localisation.

Accession numbers
Microarray data are available in the ArrayExpress database [42] under accession numbers E-MEXP-2523 (gene expression data) and E-MEXP-2525 (ChIP-chip data).

Supporting Information

Figure S1 Processing of ChIP-chip data and NAR determination for Nup153. All ChIP-chip assays were performed in triplicate. Raw data were GCRMA-normalised. Triplicates were averaged, and binding ratios were calculated relative to average intensities from triplicates of 10% input DNA. Data were then smoothed by averaging intensities within a 500 bp sliding window centred on each probe. We then calculated the density of positively bound probes in 10 kb windows centred on each probe and used a cut-off of 70% to determine Nucleoporin Associated Regions (NARs). Profiles of the different analysis steps are illustrated for a 200 kb region of chromosome X in SL-2 cells: GCRMA-normalised intensities for individual probes across three biological replicates (light orange); mean intensity values of the three biological replicates for Nup153 binding (orange); GCRMA-normalised intensities and mean values for the input DNA control (light and dark grey); ratios of Nup153-binding and control mean intensity signals (light blue); smoothed ratios using a 500 bp sliding window centred on each probe (dark blue); density of positively bound probes in 10 kb windows centred on each probe (solid black line) and the 70% threshold for detection of NARs (dotted red line); Nup153 NARs (dark red boxes); FlyBase genes on the forward and reverse strands (light grey); coordinates represent the position on the corresponding chromosome. A similar procedure was used to determine NARs in male and female samples for Nup153 and Mtor. Found at: doi:10.1371/journal.pgen.1000846.s001 (0.6 MB PDF)
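The first processing steps summarised in the Figure S1 legend (averaging triplicate arrays, computing log2(ChIP/input) ratios and smoothing within a 500 bp window centred on each probe) can be illustrated with the minimal Python sketch below. It assumes GCRMA-normalised intensities are already available as plain lists sharing one probe order; the function names and toy values are hypothetical, and this is not the Bioconductor-based code used in the study.

```python
import math
from bisect import bisect_left, bisect_right
from statistics import mean
from typing import List, Sequence


def log2_ratios(chip_reps: Sequence[Sequence[float]],
                input_reps: Sequence[Sequence[float]]) -> List[float]:
    """Per-probe log2(mean ChIP intensity / mean input intensity).

    Each replicate is a list of probe intensities in the same probe order.
    Assumes intensities are already background-corrected and normalised
    (GCRMA in the paper; not reproduced here).
    """
    chip_mean = [mean(vals) for vals in zip(*chip_reps)]
    input_mean = [mean(vals) for vals in zip(*input_reps)]
    return [math.log2(c / i) for c, i in zip(chip_mean, input_mean)]


def smooth(positions: Sequence[int], values: Sequence[float],
           window: int = 500) -> List[float]:
    """Mean of all probe values within a window centred on each probe.

    positions must be sorted ascending; values[i] belongs to positions[i].
    """
    half = window / 2
    smoothed = []
    for pos in positions:
        lo = bisect_left(positions, pos - half)   # first probe inside the window
        hi = bisect_right(positions, pos + half)  # one past the last probe inside
        smoothed.append(mean(values[lo:hi]))
    return smoothed


# Toy example: three replicate ChIP arrays and three input arrays of three probes.
ratios = log2_ratios([[4.0, 2.0, 8.0]] * 3, [[2.0, 2.0, 2.0]] * 3)
print(ratios)                          # [1.0, 0.0, 2.0]
print(smooth([0, 100, 1000], ratios))  # [0.5, 0.5, 2.0]
```

Averaging intensities before taking the ratio follows the wording of the legend; averaging per-replicate log ratios instead would be an equally plausible, but different, choice.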

Figure S2 Validation of Nup153 and Mtor target and non-target genes by ChIP-qPCR. Chromatin prepared from SL-2 cells was used for immunoprecipitation using Nup153 (blue) and Mtor (grey) antibodies. Recovered DNA (% Input) was analysed by qPCR using primers in the beginning (P1), middle (P2) and end (P3) of genes as shown. Error bars represent standard deviation obtained from three independent experiments. Found at: doi:10.1371/journal.pgen.1000846.s002 (0.04 MB PDF)

Figure S3 Nup153 and Mtor NARs in Kc cells. (A) Karyotype representation of Nup153 and Mtor NARs across the genome in Kc cells. (B) Magnified view of a 1 Mb region of chromosomal arm 2L. Tracks represent the smoothed ChIP/input binding ratio for Nup153 and Mtor (dark grey), the density of positively bound probes calculated in 10 kb windows centred on each probe (solid grey line), and NARs (red boxes) for regions with a density of positively bound probes above 70%. (C) Magnified view of a 100 kb region in (B). Found at: doi:10.1371/journal.pgen.1000846.s003 (2.11 MB PDF)
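The NAR-calling rule described in the Methods and in the legends of Figures S1 and S3 (the density of probes with a positive smoothed log2 ratio in a 10 kb window centred on each probe, with regions called where that density exceeds 70%) might be sketched as follows. How consecutive qualifying probes are merged and how region boundaries are reported are our assumptions, as are the function names and the toy probe layout; the two-stage Guelen et al. method used for comparison is not reproduced.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Sequence, Tuple


def positive_density(positions: Sequence[int], ratios: Sequence[float],
                     window: int = 10_000) -> List[float]:
    """Fraction of probes with a positive smoothed log2 ratio in a window centred on each probe.

    positions must be sorted ascending; ratios[i] belongs to positions[i].
    """
    half = window / 2
    density = []
    for pos in positions:
        lo = bisect_left(positions, pos - half)
        hi = bisect_right(positions, pos + half)
        in_window = ratios[lo:hi]  # always contains at least the centre probe
        density.append(sum(r > 0 for r in in_window) / len(in_window))
    return density


def call_nars(positions: Sequence[int], ratios: Sequence[float],
              window: int = 10_000, cutoff: float = 0.7) -> List[Tuple[int, int]]:
    """Merge runs of consecutive probes whose window density exceeds cutoff.

    Each region is reported as (first probe position, last probe position);
    this boundary convention is an assumption, not taken from the paper.
    """
    density = positive_density(positions, ratios, window)
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last = 0
    for pos, d in zip(positions, density):
        if d > cutoff:
            if start is None:
                start = pos  # open a new region
            last = pos
        elif start is not None:
            regions.append((start, last))  # close the current region
            start = None
    if start is not None:
        regions.append((start, last))
    return regions


# Toy chromosome: probes every 1 kb; the first 15 probes are bound, the rest are not.
positions = list(range(0, 30_000, 1_000))
ratios = [0.5] * 15 + [-0.5] * 15
print(call_nars(positions, ratios))  # [(0, 12000)]
```

In the toy example the called region stops at 12 kb rather than at the last bound probe (14 kb), because windows centred near the edge already include enough unbound probes to fall below the 70% cut-off; the window-based rule therefore trims region edges slightly.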

Figure S4 Correlation between Nup153 and Mtor binding in Kc cells. (A) Smoothed scatter plot displaying the ChIP/input binding ratios for Nup153 and Mtor (Pearson r = 0.88). (B) Bar chart representing the overlap in NARs defined by Nup153 and Mtor binding profiles. (C) Histogram of Nup153 and Mtor NAR length distributions. Found at: doi:10.1371/journal.pgen.1000846.s004 (0.56 MB PDF)

Figure S5 H4K16Ac and H3K27me3 are mutually exclusive throughout the genome. (A) Detailed view of H3K27me3 and H4K16Ac modifications in a 1 Mb region of chromosome X in SL-2 cells. H3K27me3 data were obtained from Schwartz et al (2006) [36] and H4K16Ac data from Kind et al (2008) [32]. For each modification, we used the cut-offs from the original publications to define significant signals. (B) Smoothed scatter plot of H4K16Ac and H3K27me3 modification intensity values. Only data points with significant intensity values are shown. Plot areas with high data density are shown in dark red; plot areas with low data density are shown in dark blue. Found at: doi:10.1371/journal.pgen.1000846.s005 (1.39 MB PDF)

Figure S6 Immunostaining of Nup153 and Mtor in salivary glands. Immunostaining of Nup153 and Mtor in salivary glands isolated from 3rd instar male larvae. Salivary glands were co-immunostained with MSL1 antibody and with either pre-immune serum (Pre-Mtor, Pre-Nup153) or immune serum (Mtor and Nup153). Both Nup153 and Mtor show predominantly nuclear rim staining, but there is also some diffuse staining within the nucleus. The X chromosomal territory is visualised by MSL1 staining. Found at: doi:10.1371/journal.pgen.1000846.s006 (0.23 MB PDF)

Figure S7 RNAi-mediated depletion of Nup153 in SL-2 cells. (A) Whole extracts were obtained from cells treated with EGFP or Nup153 dsRNA for 0, 3, 5, or 7 days and separated by SDS-PAGE, followed by western blot analysis using Nup153 and Tubulin antibodies. Size markers (kDa) are indicated on the right side. (B) Cells treated with EGFP or Nup153 dsRNA were used for immunofluorescence confocal microscopy. Nup153, Nup50, and Lamin antibodies were used for triple immunostaining, and pseudocolours were added using the ImageJ software. A similar strategy was used for MOF, MSL1 and Lamin triple immunostaining. Arrows indicate residual MSL1 or MOF staining in Nup153-depleted cells. Found at: doi:10.1371/journal.pgen.1000846.s007 (0.75 MB PDF)

Figure S8 MOF-binding to autosomal promoters is affected in Nup153-depleted cells. Chromatin prepared from cells treated with EGFP (black) or Nup153 (grey) dsRNA was used for immunoprecipitation using MOF antibody. MOF-binding was scored on six autosomal target promoters. Recovered DNA was analysed by qPCR and is shown as percentage of input DNA (% Input). Error bars represent standard deviations obtained from three independent experiments. Found at: doi:10.1371/journal.pgen.1000846.s008 (0.33 MB PDF)

Figure S9 Control RNAi-mediated depletion of Nup50 in SL-2 cells. (A) Whole-cell extracts were made from cells treated with EGFP or Nup50 dsRNA for 0, 5, or 7 days and separated by SDS-PAGE, followed by western blot analysis using Nup50 and Tubulin antibodies. Size markers (kDa) are indicated on the right side. (B) Cells treated with EGFP or Nup153 dsRNA were used for immunofluorescence confocal microscopy using antibodies against Nup153, Nup50, Mtor, MOF, and MSL1 (shown in red). Nup153 and Mtor antibodies and Hoechst were used for triple immunostaining, and pseudocolours were added using the ImageJ software. A similar strategy was used for MOF, MSL1, and Lamin triple immunostaining. Found at: doi:10.1371/journal.pgen.1000846.s009 (0.43 MB PDF)

Figure S10 Segmentation of DAPI image. (Top left) Nuclei labelled with lamin and stained with DAPI were used to develop the segmentation method (green = lamin, blue = DAPI). (Top centre) The lamin signal was thresholded and then reduced to a single-pixel rim (green). The detected rim was overlaid on the original lamin image. (Top right) The segmentation strategy was verified by measuring the deviation between the DAPI-segmented image (black/white) and the lamin rim (green). Pink lines show how these deviations were measured. (Bottom) Probability density distributions of the mean radii of 62 individual nuclei calculated using the DAPI or lamin signals. The median radius was 2.45±0.33 µm for the DAPI-segmented edge and 2.41±0.35 µm for the lamin signal. Found at: doi:10.1371/journal.pgen.1000846.s010 (0.05 MB PDF)

Figure S11 Measurement of three-dimensional distance of FISH signals from the nuclear periphery. Three-dimensional brightest-point projection images of a simulated nucleus showing (A) peripheral localisation and (B) non-peripheral localisation. Outlines of the nuclear periphery in each z-slice (blue contours, DAPI channel) and the FISH signal (red, FISH channel) are shown. The nucleus is rotated about the x-axis in 30-degree increments from the top-left to the bottom-right panel. Three-dimensional brightest-point projection images of real nuclei with (C) NAR locus T4 and (D) control locus N2. Bar = 5 µm. See also Videos S1–S4. Found at: doi:10.1371/journal.pgen.1000846.s011 (0.55 MB PDF)

Table S1 Enrichment of active and repressive markers in NARs and non-NARs in SL-2 and Kc cells. Found at: doi:10.1371/journal.pgen.1000846.s012 (0.05 MB PDF)

Table S2 Enrichment of H4K16Ac and gene density in NARs versus non-NARs for SL-2 and Kc cells. Found at: doi:10.1371/journal.pgen.1000846.s013 (0.05 MB PDF)

Table S3 Target (T) and non-target (N) regions used for FISH analysis. Start and end show the chromosomal coordinates according to release 5 of the Drosophila melanogaster genome (R5.11). Genes in each probe set are also indicated. Individual genes within these regions that were further tested by qPCR in this study are indicated in red. Found at: doi:10.1371/journal.pgen.1000846.s014 (0.08 MB PDF)

Table S4 This table accompanies Figure 4. The chromosomal location of the target and non-target regions is indicated, together with the total number of pixels and nuclei counted and the statistical significance for each target or non-target region, shown both individually and as the average for each category. Found at: doi:10.1371/journal.pgen.1000846.s015 (0.06 MB PDF)

Text S1 Primer sequences for quantitative PCR; primer sequences for RNAi. Found at: doi:10.1371/journal.pgen.1000846.s016 (0.09 MB PDF)

Video S1 3D projection movie of a simulated nucleus with the FISH signal at the nuclear periphery. The nuclear envelope is shown as blue contours and the FISH signal in red. Montages of the movies are shown in Figure S11. Found at: doi:10.1371/journal.pgen.1000846.s017 (0.73 MB MOV)

Video S2 3D projection movie of a simulated nucleus with the FISH signal located between the periphery and the nuclear centre. Found at: doi:10.1371/journal.pgen.1000846.s018 (0.73 MB MOV)

Video S3 3D projection movie of a real nucleus with NAR locus T4 localised to the periphery. Found at: doi:10.1371/journal.pgen.1000846.s019 (0.62 MB MOV)

Video S4 3D projection movie of a real nucleus with control locus N2 localised in the interior. Found at: doi:10.1371/journal.pgen.1000846.s020 (0.83 MB MOV)

Acknowledgments
We thank members of the laboratory for helpful discussions and critical reading of the manuscript. We thank Bas van Steensel and Rainer Pepperkok for support and Maarten Fornerod and Martin Hetzer for communication prior to publication. We thank Frederick Bantignies and Patrick Meister for advice on FISH protocols and Sebastien Huet for advice on FISH analysis. We are grateful to Paul Bertone for contributing to earlier analysis of the RNAi gene expression data. We thank Yosef Gruenbaum for the gift of the Drosophila Lamin antibody. We thank Vladimir Benes, Tomi Bähr-Ivacevic, and Jos de Graaf for microarray hybridisation, and Stefan Terjung, the ALMF team and Leica Microsystems for support. We are grateful to Christian Haering, Lars Steinmetz, Francois Spitz, and Jan Ellenberg for critical reading of the manuscript.

Author Contributions
Conceived and designed the experiments: NML AA. Performed the experiments: JMV RS JK. Analyzed the data: JMV RS JK KM NML AA. Contributed reagents/materials/analysis tools: JMV RS JK KM NML AA. Wrote the paper: NML AA.

References
1. Kouzarides T (2007) Chromatin modifications and their function. Cell 128: 693–705.
2. Li B, Carey M, Workman JL (2007) The role of chromatin during transcription. Cell 128: 707–719.
3. Clapier CR, Cairns BR (2009) The biology of chromatin remodelling complexes. Annu Rev Biochem 78: 273–304.
4. Suganuma T, Workman JL (2008) Crosstalk among histone modifications. Cell 135: 604–607.
5. Lanctôt C, Cheutin T, Cremer M, Cavalli G, Cremer T (2007) Dynamic genome architecture in the nuclear space: regulation of gene expression in three dimensions. Nat Rev Genet 8: 104–115.
6. Branco MR, Pombo A (2006) Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol 4: e138. doi:10.1371/journal.pbio.0040138.
7. Branco MR, Pombo A (2007) Chromosome organization: new facts, new models. Trends Cell Biol 17: 127–134.
8. Cremer T, Cremer M, Dietzel S, Müller S, Solovei I, et al. (2006) Chromosome territories - a functional nuclear landscape. Curr Opin Cell Biol 18: 307–316.
9. Shaklai S, Amariglio N, Rechavi G, Simon AJ (2007) Gene silencing at the nuclear periphery. FEBS J 274: 1383–1392.
10. Dechat T, Pfleghaar K, Sengupta K, Shimi T, Shumaker DK, et al. (2008) Nuclear lamins: major factors in the structural organization and function of the nucleus and chromatin. Genes Dev 22: 832–853.
11. Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, et al. (2008) Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453: 948–951.
12. Pickersgill H, Kalverda B, de Wit E, Talhout W, Fornerod M, et al. (2006) Characterization of the Drosophila melanogaster genome at the nuclear lamina. Nat Genet 38: 1005–1014.
13. Brickner JH, Walter P (2004) Gene recruitment of the activated INO1 locus to the nuclear membrane. PLoS Biol 2: e342. doi:10.1371/journal.pbio.0020342.
14. Brown CR, Silver PA (2007) Transcriptional regulation at the nuclear pore complex. Curr Opin Genet Dev 17: 100–106.
15. Cabal GG, Genovesio A, Rodriguez-Navarro S, Zimmer C, Gadal O, et al. (2006) SAGA interacting factors confine sub-diffusion of transcribed genes to the nuclear envelope. Nature 441: 770–773.
16. Taddei A, Van Houwe G, Hediger F, Kalck V, Cubizolles F, et al. (2006) Nuclear pore association confers optimal expression levels for an inducible yeast gene. Nature 441: 774–778.
17. Tran EJ, Wente SR (2006) Dynamic nuclear pore complexes: life on the edge. Cell 125: 1041–1053.
18. Casolari JM, Brown CR, Komili S, West J, Hieronymus H, et al. (2004) Genome-wide localization of the nuclear transport machinery couples transcriptional status and nuclear organization. Cell 117: 427–439.
19. Chambeyron S, Bickmore WA (2004) Chromatin decondensation and nuclear reorganization of the HoxB locus upon induction of transcription. Genes Dev 18: 1119–1130.
20. Ferreira J, Paolella G, Ramos C, Lamond AI (1997) Spatial organization of large-scale chromatin domains in the nucleus: a magnified view of single chromosome territories. J Cell Biol 139: 1597–1610.
21. Gilbert DM (2001) Nuclear position leaves its mark on replication timing. J Cell Biol 152: F11–F15.
22. Zink D, Amaral MD, Englmann A, Lang S, Clarke LA, et al. (2004) Transcription-dependent spatial arrangements of CFTR and adjacent genes in human cell nuclei. J Cell Biol 166: 815–825.
23. Akhtar A, Gasser SM (2007) The nuclear envelope and transcriptional control. Nat Rev Genet 8: 507–517.
24. Goldman RD, Shumaker DK, Erdos MR, Eriksson M, Goldman AE, et al. (2004) Accumulation of mutant lamin A causes progressive changes in nuclear architecture in Hutchinson-Gilford progeria syndrome. Proc Natl Acad Sci U S A 101: 8963–8968.
25. Paddy MR, Belmont AS, Saumweber H, Agard DA, Sedat JW (1990) Interphase nuclear envelope lamins form a discontinuous network that interacts with only a fraction of the chromatin in the nuclear periphery. Cell 62: 89–106.
26. Brown CR, Kennedy CJ, Delmar VA, Forbes DJ, Silver PA (2008) Global histone acetylation induces functional genomic reorganization at mammalian nuclear pore complexes. Genes Dev 22: 627–639.
27. Mendjan S, Taipale M, Kind J, Holz H, Gebhardt P, et al. (2006) Nuclear pore components are involved in the transcriptional regulation of dosage compensation in Drosophila. Mol Cell 21: 811–823.
28. Lucchesi JC, Kelly WG, Panning B (2005) Chromatin remodeling in dosage compensation. Annu Rev Genet 39: 615–651.
29. Straub T, Becker PB (2007) Dosage compensation: the beginning and end of generalization. Nat Rev Genet 8: 47–57.
30. Iyer VR, Horak CE, Scafe CS, Botstein D, Snyder M, et al. (2001) Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409: 533–538.
31. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, et al. (2000) Genome-wide location and function of DNA binding proteins. Science 290: 2306–2309.
32. Kind J, Vaquerizas JM, Gebhardt P, Gentzel M, Luscombe NM, et al. (2008) Genome-wide analysis reveals MOF as a key regulator of dosage compensation and gene expression in Drosophila. Cell 133: 813–828.
33. Farnham PJ (2009) Insights from genomic profiling of transcription factors. Nat Rev Genet 10: 605–616.
34. Hubbell E, Liu WM, Mei R (2002) Robust estimators for expression analysis. Bioinformatics 18: 1585–1592.
35. Muse GW, Gilchrist DA, Nechaev S, Shah R, Parker JS, et al. (2007) RNA polymerase is poised for activation across the genome. Nat Genet 39: 1507–1511.
36. Schwartz YB, Kahn TG, Nix DA, Li XY, Bourgon R, et al. (2006) Genome-wide analysis of Polycomb targets in Drosophila melanogaster. Nat Genet 38: 700–705.
37. Rabut G, Doye V, Ellenberg J (2004) Mapping the dynamic organization of the nuclear pore complex inside single living cells. Nat Cell Biol 6: 1114–1121.
38. Sutherland H, Bickmore WA (2009) Transcription factories: gene expression in unions? Nat Rev Genet 10: 457–466.
39. Osborne CS, Chakalova L, Brown KE, Carter D, Horton A, et al. (2004) Active genes dynamically colocalize to shared sites of ongoing transcription. Nat Genet 36: 1065–1071.
40. Simonis M, Klous P, Splinter E, Moshkin Y, Willemsen R, et al. (2006) Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat Genet 38: 1348–1354.
41. Wu Z, Irizarry RA, Gentleman R, Martinez-Murillo F, Spencer F (2004) A model-based background adjustment for oligonucleotide expression arrays. J Am Stat Assoc 99: 909–917.
42. Parkinson H, Kapushesky M, Kolesnikov N, Rustici G, Shojatalab M, et al. (2009) ArrayExpress update - from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res 37: D868–D872.
43. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5: R80.
44. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, et al. (2003) Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 31: e15.
45. Flicek P, Aken BL, Ballester B, Beal K, Bragin E, et al. (2010) Ensembl's 10th year. Nucleic Acids Res 38: D557–D562.

PLoS Genetics | www.plosgenetics.org

February 2010 | Volume 6 | Issue 2 | e1000846

Nucleoporins Bind Active Chromatin

46. Smyth GK (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3: Article3. 47. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57: 289–300.


48. Lanzuolo C, Roure V, Dekker J, Bantignies F, Orlando V (2007) Polycomb response elements mediate the formation of chromosome higher-order structures in the bithorax complex. Nat Cell Biol 9: 1167–1174. 49. Dougherty R, Kunzelmann KH (2007) Computing local thickness of 3D structures with ImageJ. Microscopy and Microanalysis 13: 1678–1679.


Maternal Ethanol Consumption Alters the Epigenotype and the Phenotype of Offspring in a Mouse Model Nina Kaminen-Ahola1, Arttu Ahola1,2, Murat Maga3, Kylie-Ann Mallitt1, Paul Fahey1, Timothy C. Cox3, Emma Whitelaw1,4, Suyinn Chong1,4* 1 Division of Genetics and Population Health, Queensland Institute of Medical Research, Herston, Australia, 2 Department of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland, 3 Division of Craniofacial Medicine, Department of Pediatrics, University of Washington, Seattle, Washington, United States of America, 4 Griffith Medical Research College, Griffith University and the Queensland Institute of Medical Research, Herston, Australia

Abstract Recent studies have shown that exposure to some nutritional supplements and chemicals in utero can affect the epigenome of the developing mouse embryo, resulting in adult disease. Our hypothesis is that epigenetics is also involved in the gestational programming of adult phenotype by alcohol. We have developed a model of gestational ethanol exposure in the mouse based on maternal ad libitum ingestion of 10% (v/v) ethanol between gestational days 0.5–8.5 and observed changes in the expression of an epigenetically sensitive allele, Agouti viable yellow (Avy), in the offspring. We found that exposure to ethanol increases the probability of transcriptional silencing at this locus, resulting in more mice with an agouti-colored coat. As expected, transcriptional silencing correlated with hypermethylation at Avy. This demonstrates, for the first time, that ethanol can affect adult phenotype by altering the epigenotype of the early embryo. Interestingly, we also detected postnatal growth restriction and craniofacial dysmorphology reminiscent of fetal alcohol syndrome in congenic a/a siblings of the Avy mice. These findings suggest that moderate ethanol exposure in utero is capable of inducing changes in the expression of genes other than Avy, a conclusion supported by our genome-wide analysis of gene expression in these mice. In addition, offspring of female mice given free access to 10% (v/v) ethanol for four days per week for ten weeks prior to conception also showed increased transcriptional silencing of the Avy allele. Our work raises the possibility of a role for epigenetics in the etiology of fetal alcohol spectrum disorders, and it provides a mouse model that will be a useful resource in the continued efforts to understand the consequences of gestational alcohol exposure at the molecular level. Citation: Kaminen-Ahola N, Ahola A, Maga M, Mallitt K-A, Fahey P, et al. 
(2010) Maternal Ethanol Consumption Alters the Epigenotype and the Phenotype of Offspring in a Mouse Model. PLoS Genet 6(1): e1000811. doi:10.1371/journal.pgen.1000811. Editor: Jeannie T. Lee, Massachusetts General Hospital, Howard Hughes Medical Institute, United States of America. Received June 22, 2009; Accepted December 10, 2009; Published January 15, 2010. Copyright: © 2010 Kaminen-Ahola et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This work was supported by an Australian Research Council (ARC) Discovery Project grant (DP0878192) to EW and SC. NKA received funding from the Sigrid Juselius Foundation, Academy of Finland, Finnish Alcohol Research Foundation, the Finnish Cultural Foundation and Arvo and Lea Ylppö Foundation. TCC was supported by the Seattle Children’s Hospital and the Murdoch Charitable Trust. EW is a National Health and Medical Research Council (NHMRC) Australia Fellow. Part of this work was done in conjunction with the Collaborative Initiative on Fetal Alcohol Spectrum Disorders (CIFASD), which is funded by grants from the National Institute on Alcohol Abuse and Alcoholism (NIAAA), U24 AA014811. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: suyinn.chong@qimr.edu.au

Introduction While it is well-recognized that gestational exposure to environmental triggers can lead to compromised fetal development and adult disease in humans [1], the underlying molecular mechanisms remain unknown. There is increasing evidence in animal models that environmental factors can affect gene expression via epigenetic modifications such as DNA methylation [2–6]. One way of detecting such events is to use reporters whose expression is closely linked to their epigenetic state. Such epigenetically sensitive alleles are also known as metastable epialleles, and the best-known example in the mouse is Agouti viable yellow (MGI:1855930) or Avy [7]. Avy is a dominant mutation of the murine Agouti (A) locus, caused by the insertion of an intracisternal A-particle (IAP) retrotransposon upstream of the Agouti coding exons. The activity of Avy is variable among genetically identical mice, resulting in mice with a range of coat colors, from yellow to mottled to agouti (termed pseudoagouti) [8]. The expression of Avy is known to correlate with DNA methylation at a cryptic long terminal repeat (LTR) promoter located at the 3′ end of the inserted IAP. Specifically, hypomethylation is associated with constitutive ectopic Agouti expression and a yellow coat, while hypermethylation correlates with cryptic promoter silencing and a pseudoagouti coat [9]. We have previously shown that DNA methylation at Avy is reprogrammed in early development at the same time that the rest of the genome is undergoing epigenetic reprogramming [10]. Alcohol consumption is widespread in our society, but it is also recognized as the leading preventable cause of birth defects and mental retardation [11,12]. High levels of alcohol consumption during pregnancy can result in fetal alcohol syndrome (FAS), which is characterized by prenatal and postnatal growth restriction, craniofacial dysmorphology and structural abnormalities of the central nervous system. The clinical features of FAS are variable and include a range of other birth defects, as well as educational and behavioral problems [13]. This syndrome is the most extreme form of a range of disorders that are known as fetal alcohol spectrum disorders (FASDs) [14]. Approximately 5% of the children of mothers who have drunk heavily during pregnancy have FAS [15], and studies have shown that the dose, time and

January 2010 | Volume 6 | Issue 1 | e1000811

Epigenetics and Gestational Alcohol Exposure

reprogramming in the etiology of FASD and provides researchers with a relevant mouse model of the human disorder.

Author Summary In humans it has been known for some time that exposure to environmental insults during pregnancy can harm a developing fetus and have life-long effects on the individual’s health. A well known example is fetal alcohol syndrome, where the children of mothers that consume large amounts of alcohol during pregnancy exhibit growth retardation, changes to the shape and size of the skull, and central nervous system defects. At present the molecular events underlying fetal alcohol syndrome are unknown. We have developed a model of alcohol exposure in the mouse, in which the genetics and the environment can be strictly controlled. We find that chronic exposure of the fetus to a physiologically relevant amount of alcohol during the first half of pregnancy results in epigenetic changes at a sensitive reporter gene and produces fetal alcohol syndrome-like features in some mice. Our model is a useful tool to study the underlying causes of fetal alcohol syndrome, and our work raises the interesting possibility that the long-term physical effects of alcohol exposure during pregnancy are mediated by epigenetic changes established in the fetus and then faithfully remembered for a lifetime. In the future, such epigenetic changes could be used as markers for the preclinical diagnosis and treatment of fetal alcohol spectrum disorders.

Results In this study, Avy was used primarily as a sensitive reporter of epigenetic changes in response to maternal ethanol consumption. The C57BL/6J mouse is null (a) at the Agouti locus, so it has a black coat color. Avy is a gain-of-function, semi-dominant mutation, so the coat color of heterozygous (Avy/a) mice in the C57BL/6J background is a direct readout of Avy transcriptional activity and DNA methylation. Because the matings used in this study crossed an Avy/a male with an a/a female, only 50% of the offspring are expected to inherit the Avy allele and be useful for coat color phenotyping; the remaining (a/a) offspring are black. To study the effects of gestational ethanol exposure, female a/a C57BL/6J mice were supplied with 10% (v/v) ethanol in their drink bottles for eight days after fertilization by a congenic male carrying the Avy allele (n = 46 litters, 242 total offspring, 109 Avy/a offspring). To evaluate the effects of preconceptional ethanol exposure, female a/a mice were given 10% (v/v) ethanol for four days per week for ten weeks prior to fertilization (n = 22 litters, 131 total offspring, 69 Avy/a offspring). The Avy allele was passed through the male germ line to avoid the bias associated with maternal transmission, where epigenetic marks can be incompletely cleared between generations [9]. Control mice were given water instead of ethanol (n = 37 litters, 189 total offspring, 91 Avy/a offspring). Maternal ethanol exposure during gestation did not significantly alter Mendelian inheritance of the Avy allele (data not shown) or litter size (control 5.1±0.4, ethanol exposed 5.2±0.3, mean±SEM, Student’s t-test, p = 0.9). The establishment of epigenetic marks at Avy occurs during early embryogenesis and is a probabilistic event. The resulting variable expression of Avy among genetically identical mice produces individuals with a predictable range of coat colors. 
We found that, in the absence of any treatment, 21% of the offspring of Avy/a sires were yellow, 66% were mottled and 13% were pseudoagouti (Figure 1). Gestational ethanol exposure resulted in a higher proportion of pseudoagouti offspring (Pearson’s chi-square test, p<0.05): 28% of offspring were pseudoagouti, compared with 13% in the control group (Figure 1). Preconceptional ethanol exposure produced a similar trend (Pearson’s chi-square test, p<0.05). This shows that ethanol exposure can influence the establishment of Avy expression early in development, increasing the probability of transcriptional silencing at this particular locus. To confirm that coat color correlated with DNA methylation at the Avy allele in gestationally exposed mice, 11 CpG dinucleotides in the LTR cryptic promoter of the Avy IAP were subjected to bisulfite sequencing (Figure 2). As expected, ethanol-exposed yellow mice were hypomethylated compared with ethanol-exposed pseudoagouti mice. Interestingly, atypical hypermethylated clones were found in five of six yellow mice in the ethanol-exposed group, but they were clearly not sufficient to affect coat color. In the ethanol-exposed group, 11% of the CpG dinucleotides were methylated, compared with 2% in the control group. A Student’s t-test or a non-parametric equivalent was unsuitable for this measure because the data did not meet the distributional requirement of being spread along a continuum, so we analyzed allele-specific methylation instead. In the ethanol-exposed group, 23% of clones (n = 91) showed evidence of methylation, compared with 8% of clones in the control group (n = 71; Pearson’s chi-square test, p<0.01). In contrast, the total DNA methylation level in ethanol-exposed pseudoagouti mice (61%) was not significantly different from that observed in the controls (65%, Student’s t-test, p = 0.27).
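The expected 1:1 split between Avy/a and a/a offspring follows directly from the mating scheme (Avy/a sire × a/a dam). A minimal sketch, assuming simple Mendelian segregation at a single autosomal locus (no imprinting, selection or embryonic loss), enumerates the parental allele combinations. The function `cross` is a hypothetical helper for illustration, not part of the study's methods.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(sire: tuple[str, str], dam: tuple[str, str]) -> dict[str, Fraction]:
    """Offspring genotype probabilities for one autosomal locus (hypothetical helper).

    Each parent passes on one of its two alleles with probability 1/2,
    so every (sire allele, dam allele) pair is equally likely.
    """
    counts = Counter()
    for sire_allele, dam_allele in product(sire, dam):
        # Sort alleles so 'Avy/a' and 'a/Avy' count as the same genotype
        # ('A' sorts before 'a' in ASCII, so the Avy allele is listed first).
        genotype = "/".join(sorted((sire_allele, dam_allele)))
        counts[genotype] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


offspring = cross(("Avy", "a"), ("a", "a"))
print(offspring)  # {'Avy/a': Fraction(1, 2), 'a/a': Fraction(1, 2)}
```

Under this assumption about half of each group's offspring are expected to carry Avy; comparing observed with expected genotype counts is one way to check that ethanol exposure does not distort Mendelian inheritance.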

duration of ethanol exposure are critical [16,17]. There are a number of mouse models of FAS that have reproduced some of the phenotypic characteristics of the human disorder, particularly the craniofacial abnormalities [16,18,19]. It should be noted that these studies used acute exposures to high concentrations of ethanol between gestational days (GDs) 7 and 9, generally two intraperitoneal injections of 0.015 ml of ~25% (v/v) ethanol per gram of body weight over a 4-hour interval, resulting in ataxia and lethargy. These studies only examined the fetal outcomes (GDs 8–18) of ethanol exposure and did not assay offspring either after birth or as adults. There are some rodent studies of the effects of gestational exposure to moderate amounts of ethanol, but these have only identified neurological and behavioral deficits [20]. The molecular mechanisms underlying FAS are unknown. Some studies have focused on the toxic effects of acetaldehyde, the first metabolite of ethanol [18,21]. Acute ethanol exposure has also been found to result in increased cell death in the developing central nervous system and neurological anomalies in rodents and other animal models [22,23]. The idea that epigenetic changes are involved has been raised, but evidence in support of this hypothesis has, so far, been weak. Garro and colleagues [24] detected a small decrease in the level of global methylation of fetal DNA after acute ethanol administration from GDs 9–11. Bielawski et al. [25] reported decreased DNA methyltransferase 1 (Dnmt1) messenger RNA levels in rat sperm after nine weeks of paternal ethanol exposure. Haycock and Ramsey [26] studied imprinting at the H19/Igf2 locus in preimplantation mouse embryos after maternal ethanol exposure. Despite severe growth retardation of embryos, they did not find epigenetic changes at the H19 imprinting control region. 
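To put the acute protocol into more familiar units, the injected volume can be converted into an ethanol dose in grams per kilogram of body weight. This is a rough sketch, assuming the stated ~25% (v/v) concentration is exact, an ethanol density of about 0.789 g/ml, and two identical injections; the resulting numbers are estimates, not figures reported by the cited studies. The function name is hypothetical.

```python
# Approximate density of ethanol at room temperature (assumption).
ETHANOL_DENSITY_G_PER_ML = 0.789


def ethanol_dose_g_per_kg(volume_ml_per_g: float, fraction_v_v: float) -> float:
    """Ethanol dose (g per kg body weight) delivered by one injection.

    volume_ml_per_g: injected volume per gram of body weight (ml/g)
    fraction_v_v:    ethanol concentration as a volume fraction (0.25 for 25% v/v)
    """
    ml_ethanol_per_g = volume_ml_per_g * fraction_v_v        # ml ethanol per g body weight
    g_ethanol_per_g = ml_ethanol_per_g * ETHANOL_DENSITY_G_PER_ML  # convert volume to mass
    return g_ethanol_per_g * 1000                             # per gram -> per kilogram


per_injection = ethanol_dose_g_per_kg(0.015, 0.25)
total = 2 * per_injection  # two injections, 4 hours apart
print(f"{per_injection:.1f} g/kg per injection, {total:.1f} g/kg in total")
# 3.0 g/kg per injection, 5.9 g/kg in total
```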
Here we have developed a mouse model of chronic ethanol exposure (overt signs of intoxication are not observed) that produces measurable phenotypes in adults. We find that maternal ethanol consumption either before or after fertilization affects the expression of an epigenetically sensitive allele, Avy, in her offspring and that, at least in the latter case, it can also affect postnatal body weight and skull size and shape in a manner consistent with FASD. Our work raises the possibility of a role for epigenetic


Figure 1. Gestational and preconceptional ethanol exposure produced a higher proportion of pseudoagouti Avy mice. In each pedigree, the black square represents the a/a dam and the diamonds represent her Avy/a offspring. Avy/a sires and all a/a offspring have been excluded from the pedigrees. White diamonds represent yellow offspring, light gray diamonds represent mottled offspring and dark gray diamonds represent pseudoagouti offspring. The percentage of offspring in each coat color category is indicated. Both the gestational and the preconceptional ethanol exposure groups were statistically significantly different from the control group (Pearson’s chi-square test, p<0.05). doi:10.1371/journal.pgen.1000811.g001
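The comparisons in Figure 1 apply Pearson’s chi-square test to a contingency table of coat-color counts (treatment group × yellow/mottled/pseudoagouti). A dependency-free sketch of that test follows; the p-value uses the closed-form chi-square tail for integer degrees of freedom. The counts in the usage lines are invented for illustration and are not the study’s data. With SciPy available, `scipy.stats.chi2_contingency` returns the same statistic for tables larger than 2 × 2 (it applies Yates’ continuity correction to 2 × 2 tables by default).

```python
import math


def chi_square_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df + 1) // 2):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def pearson_chi_square(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson's chi-square test of independence for an r x c table of counts.

    Assumes every row and column total is positive. Returns (statistic, df, p-value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, chi_square_sf(statistic, df)


# Invented counts, NOT the study's data.
# Rows: control, ethanol; columns: yellow, mottled, pseudoagouti.
table = [[20, 60, 12], [15, 60, 30]]
statistic, df, p = pearson_chi_square(table)
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p:.4f}")
```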

addition to inter-individual variation, some of the genes were consistently differentially expressed in the ethanol-exposed group (Table S1). Twelve genes were significantly down-regulated (p<0.05) in the ethanol-exposed mice. Three of these have been associated with growth [29–32]: LIM domain and actin binding 1 (Lima1, also known as Eplin), Suppressor of cytokine signaling 2 (Socs2) and CDK5 and Abl enzyme substrate 1 (Cables1). Socs2, Cables1 and Very low density lipoprotein receptor (Vldlr) have been associated with development of the nervous system [33–37], and one, Hepcidin antimicrobial peptide (Hamp1), has been reported to be down-regulated in the livers of alcohol-fed rats [38]. We next focused on identifying the characteristic features of FAS in a/a pups exposed to moderate levels of alcohol in utero. All pups (from first litters) were weighed at three weeks of age. It was particularly important not to study Avy mice in these experiments because Agouti expression affects body weight. Because litter size is known to influence body weight at weaning, we initially restricted our analysis to litters of 4–5 pups. The gestational ethanol exposure group consisted of 22 offspring, and the control group of 26 offspring. The results (Figure 3) show that the mean weight of offspring of dams that consumed ethanol was significantly lower than that of controls (Student’s t-test, p<0.05). A second analysis included litter size as a random effect. Analysis of variance of weight at 3 weeks, after adjustment for litter size, confirmed that the mean weight of the ethanol-exposed group (n = 73) was statistically significantly lower than that of the controls (n = 44, p<0.001, data not shown). 
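The weight comparison relies on a two-sample Student’s t-test, and Figure 3 reports mean±SEM. A minimal pure-Python sketch of both calculations follows, assuming equal variances in the two groups (the classic pooled-variance Student form). The weights in the usage lines are invented, and the litter-size-adjusted analysis of variance is not reproduced here. With SciPy available, `scipy.stats.ttest_ind(a, b)` (default `equal_var=True`) computes the same statistic together with a p-value.

```python
import math
from statistics import mean, stdev, variance


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Invented 3-week body weights in grams, for illustration only.
control = [10.4, 11.1, 9.9, 10.8, 10.6]
ethanol = [9.6, 10.0, 9.2, 9.9, 9.5]
print("control mean, SEM:", mean_sem(control))
print("ethanol mean, SEM:", mean_sem(ethanol))
print("t, df:", student_t(ethanol, control))
```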
The heads of 28–30 day old a/a mice (seven mice from the gestational ethanol exposure group and 10 control mice) were subjected to micro-computed tomography, and three-dimensional computer reconstructions at 18 µm resolution were made of each skull. Visual inspection of the reconstructions revealed an obviously smaller skull size in the ethanol group compared with controls. In addition, differences in shape were apparent in a few, but not all, individuals in the ethanol group. Most notable were the marked leftward deviation of the midface in one male (Figure 4B) and a significantly reduced interfrontal bone in one female (Figure 4C). To provide more quantitative information on skull shape, the 3D coordinates of thirty-four landmarks were recorded for each skull and used in various mathematically based shape and form analyses.
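The landmark superimposition used for the shape analyses, Generalized Procrustes Analysis (GPA), removes position, size and orientation before shapes are compared. A compact NumPy sketch follows. It is a simplified illustration, assuming configurations stored as an array of shape (specimens, landmarks, 3), scaling every configuration to unit centroid size (square root of summed squared distances) and excluding reflections; the helper names are hypothetical and this is not the software used in the study. Each pass rotates every specimen onto the current consensus with a least-squares (orthogonal Procrustes) fit via SVD, then recomputes the consensus until it stops changing.

```python
import numpy as np


def center_and_scale(config: np.ndarray) -> np.ndarray:
    """Translate a (landmarks x dims) configuration to its centroid and scale to unit centroid size."""
    centered = config - config.mean(axis=0)
    return centered / np.sqrt((centered ** 2).sum())


def rotate_onto(config: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Least-squares rotation of config onto reference (orthogonal Procrustes, no reflections)."""
    u, _, vt = np.linalg.svd(config.T @ reference)
    rotation = u @ vt
    if np.linalg.det(rotation) < 0:
        # Flip the axis with the smallest singular value to obtain a proper rotation.
        u[:, -1] *= -1
        rotation = u @ vt
    return config @ rotation


def generalized_procrustes(configs: np.ndarray, tol: float = 1e-12, max_iter: int = 100):
    """Iteratively superimpose configurations of shape (specimens, landmarks, dims).

    Returns the aligned configurations and the consensus (mean) shape.
    """
    aligned = np.array([center_and_scale(c) for c in configs])
    consensus = aligned[0]  # start from an arbitrary specimen
    for _ in range(max_iter):
        aligned = np.array([rotate_onto(c, consensus) for c in aligned])
        new_consensus = center_and_scale(aligned.mean(axis=0))
        converged = ((new_consensus - consensus) ** 2).sum() < tol
        consensus = new_consensus
        if converged:
            break
    return aligned, consensus


rng = np.random.default_rng(0)
base = rng.normal(size=(34, 3))  # hypothetical 34-landmark configuration
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = 2.5 * base @ rot_z.T + np.array([1.0, 2.0, 3.0])  # same shape; new size, position, orientation
aligned, consensus = generalized_procrustes(np.stack([base, moved]))
print(np.abs(aligned[0] - aligned[1]).max())  # close to 0 because the two shapes are identical
```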

Equivalent results were obtained from a random effects model that allowed for the clustering of clones within mice (p = 0.23). Bisulfite sequencing was also carried out on control and ethanol-exposed mottled mice, and the results were extremely variable from one mouse to the next within both groups (Figure S1). Presumably, this is the result of the small size of the tissue sample (tail tip). The variegated expression in mottled mice means that any one sample could represent only one clonal patch, which could harbour an active or an inactive Avy allele and not represent the true methylation state of the whole animal. For this reason, mottled mice were not used in our analyses. The effects of Avy expression are pleiotropic. For example, yellow mice exhibit hyperphagia, hyperglycemia, non-insulin-dependent diabetes and adult-onset obesity [27]. We did not assay these other phenotypes following ethanol exposure in our mice, as their relevance to humans is questionable since no human ortholog of the Avy allele has been identified. So, while Avy was initially useful as a sensitive indicator of epigenetic changes, any further study of FAS-like phenotypes must necessarily focus elsewhere in the genome. For this reason, and because variable Agouti expression would confound many phenotypes, all subsequent analyses were performed on the congenic a/a siblings of the Avy mice. In the mouse, IAPs are present at approximately 1,000 copies per haploid genome [28]. To see whether gestational ethanol exposure changed the methylation level of IAPs globally, we performed bisulfite sequencing using PCR primers that anneal to all IAP LTRs and analyzed ten CpG sites in both the tails and forebrains of a/a mice. Tail DNA from eight ethanol-exposed mice (66 clones total) and eight control mice (65 clones total), and forebrain DNA from five ethanol-exposed mice (33 clones total) and five control mice (43 clones total), were compared using Student’s t-test. 
All samples were highly methylated, and no differences between the ethanol exposure group and controls were detected (Figure S2). This suggests that only a subset of IAPs, perhaps those that are usually hypomethylated, are sensitive to ethanol exposure. To detect changes in gene expression genome-wide, we performed expression arrays with liver tissue. The benefit of using liver is its homogeneity: it consists mainly of hepatocytes, so subtle changes should be detectable. We compared gene expression between age-matched male mice from the gestational ethanol group (three samples) and controls (four samples). In


Figure 2. Avy methylation in control offspring and offspring exposed to ethanol in utero. Only mice with fully yellow or fully pseudoagouti coats were assayed. Control yellow (CY) offspring are numbered 1–6, control pseudoagouti (CP) offspring are numbered 1–6, ethanol yellow (EY) offspring are numbered 1–6 and ethanol pseudoagouti (EP) offspring are numbered 1–6. Methylation was analyzed by sequencing individual clones of PCR-amplified, bisulfite-converted tail genomic DNA. Each circle represents an individual CpG. Open circles indicate an unmethylated CpG, and closed circles represent a methylated CpG. Each line represents an individual clone and the methylation pattern of one allele in one cell. Each block of lines comprises clones derived from one bisulfite conversion. The total percentage of methylated CpGs is shown above each individual and group. There are more hypermethylated clones in ethanol-exposed yellow mice (Pearson’s chi-square test, p<0.01). There is no effect of ethanol exposure on the methylation of Avy in pseudoagouti mice (Student’s t-test, p = 0.27). doi:10.1371/journal.pgen.1000811.g002
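Figure 2 summarizes each bisulfite clone as a row of open (unmethylated) and closed (methylated) CpGs. The sketch below shows how such calls can be made and summarized, under simplifying assumptions: reads are gap-free top-strand sequences already aligned base-for-base to the unconverted reference, and a clone counts as "showing evidence of methylation" if at least one CpG is methylated, which is our reading of the text rather than a stated criterion. It contrasts the two summaries used in the Results: the percentage of methylated CpGs and the percentage of clones carrying any methylation. The function names and sequences are hypothetical.

```python
def call_cpg_methylation(reference: str, read: str) -> list[bool]:
    """Call methylation at each CpG of a bisulfite-converted read (hypothetical helper).

    reference: original (unconverted) top-strand sequence
    read:      sequenced, bisulfite-converted top strand aligned to reference without gaps
    Bisulfite converts unmethylated C to U (read as T after PCR); methylated C is
    protected and still reads as C.
    """
    if len(reference) != len(read):
        raise ValueError("read must be aligned to the reference without gaps")
    calls = []
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G":
            base = read[i]
            if base == "C":
                calls.append(True)   # protected -> methylated
            elif base == "T":
                calls.append(False)  # converted -> unmethylated
            else:
                raise ValueError(f"unexpected base {base!r} at CpG position {i}")
    return calls


def summarize(clones: list[list[bool]]) -> tuple[float, float]:
    """Return (% of all CpGs methylated, % of clones with at least one methylated CpG)."""
    total_cpgs = sum(len(clone) for clone in clones)
    methylated_cpgs = sum(sum(clone) for clone in clones)
    clones_with_any = sum(1 for clone in clones if any(clone))
    return 100 * methylated_cpgs / total_cpgs, 100 * clones_with_any / len(clones)


reference = "TTCGAACGTTCG"  # hypothetical 3-CpG reference, not the Avy LTR
clones = [
    call_cpg_methylation(reference, "TTTGAATGTTTG"),  # fully unmethylated clone
    call_cpg_methylation(reference, "TTCGAATGTTTG"),  # first CpG methylated
]
print(summarize(clones))
```

The two measures can diverge sharply: a few sparsely methylated clones barely move the per-CpG percentage but can substantially raise the percentage of clones with any methylation.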

There are two classic approaches in geometric morphometrics: superimposition-based methods such as Generalized Procrustes Analysis (GPA) [39–42], and invariant analyses of shape such as Euclidean Distance Matrix Analysis (EDMA) [43,44]. GPA involves translation, rotation and scaling of landmark data through an iterative process during which the distances between the shapes are minimized by applying least-squares criteria. We used GPA to test for the mean shape difference between the groups and to quantify and visualize localized differences in the cranial shape. We also applied the multivariate ordination method Canonical Variates Analysis (CVA) to the output of the GPA. EDMA, in contrast, uses a coordinate-free (or invariant) approach in which all the landmarks are converted into a matrix of inter-landmark distances [44,45]. We used EDMA to find the landmark pairs that show the most difference between two groups. Analysis of skull centroid sizes confirmed the observations of statistically significantly reduced cranial size in the ethanol group, even when the smaller body weight is taken into account (ANOVA, p<0.05; Figure 4D). Although the severe leftward deviation of the one male skull is biologically highly relevant, we


seen in individuals at the milder end of the fetal alcohol spectrum disorders, and support the notion that this ethanol regime provides a useful and relevant model for the effects of ethanol intake in humans.
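The EDMA form comparison in the Results converts each skull into its full set of inter-landmark distances (34 landmarks give 34 × 33 / 2 = 561 pairs) and compares groups pair by pair as ratios. A simplified NumPy sketch follows. It averages raw distances per pair, whereas EDMA as published estimates the mean form matrix with a method-of-moments estimator, so this is only an approximation; the landmark data are random placeholders and the median normalization at the end is an illustrative assumption, not the study's procedure.

```python
from itertools import combinations

import numpy as np


def form_matrix(config: np.ndarray) -> np.ndarray:
    """All inter-landmark Euclidean distances for one (landmarks x 3) configuration, in fixed pair order."""
    pairs = combinations(range(len(config)), 2)
    return np.array([np.linalg.norm(config[i] - config[j]) for i, j in pairs])


def form_difference(group_a: np.ndarray, group_b: np.ndarray) -> np.ndarray:
    """Per-pair ratio of mean inter-landmark distances, group_a / group_b.

    Simplification: averages raw distances; EDMA proper uses a
    method-of-moments estimate of the mean form matrix.
    """
    mean_a = np.mean([form_matrix(c) for c in group_a], axis=0)
    mean_b = np.mean([form_matrix(c) for c in group_b], axis=0)
    return mean_a / mean_b


rng = np.random.default_rng(1)
controls = rng.normal(size=(10, 34, 3))  # hypothetical: 10 skulls, 34 landmarks in 3D
treated = 0.9 * controls                 # a uniformly 10% smaller group: a pure size change
ratios = form_difference(treated, controls)
print(len(ratios))  # 561 landmark pairs
# Dividing by the median ratio removes the overall size effect; values far
# from 1 would then point to localized changes in shape.
relative = ratios / np.median(ratios)
```

In this toy case every ratio equals 0.9, the pattern expected when groups differ only in size; in real data, pairs whose ratios depart from the typical value are the candidates for localized shape change.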

Discussion The Avy allele has been called an epigenetic biosensor for environmental effects on the fetus [46]. Previous studies with this allele have identified a number of nutritional factors or toxic agents that affect expression and epigenetic regulation of Avy in offspring exposed in utero. For example, the addition of methyl supplements or an isoflavone (genistein) to the diet causes hypermethylation of Avy and a shift to a pseudoagouti coat color [3,4,47], whereas bisphenol A, a chemical used in the manufacture of polycarbonate plastics, causes a shift towards hypomethylation and a yellow coat [48]. We have used the Avy allele to investigate the epigenetic effects of exposure to ethanol, and established two models of moderate exposure in the mouse. The first involves exposure during the first eight days after fertilization, a period that encompasses pre-implantation, implantation and the first two days of gastrulation. This model simulates the effects of ethanol exposure during the first trimester of pregnancy in humans. Based on previous studies [49], we estimated that the peak blood alcohol level in our model is approximately 0.12%, which is a realistic human exposure. A World Health Organization (WHO) report shows that the maximum legal blood alcohol level for driving in Organization for Economic Cooperation and Development (OECD) countries varies from 0.02% to 0.08% [50]. The second model involves ethanol consumption for ten weeks immediately prior to fertilization, a period that encompasses multiple cycles of oocyte maturation and ovulation. In mammals, oocyte maturation is characterized by the resumption of meiosis, extrusion of the first polar body and the accumulation of RNAs and proteins in the cytoplasm in preparation for fertilization. 
Studies of epigenetic reprogramming have shown that, following global DNA demethylation, the period of genome-wide remethylation coincides with implantation (for embryos) and oocyte growth (for female germ cells) [51]. Our results demonstrate that gestational ethanol exposure increases the likelihood of transcriptional silencing at Avy, resulting in an agouti-colored coat. It is worth emphasizing that, despite being genetically identical, not all Avy mice become pseudoagouti; rather, there is a subtle increase of ~15 percentage points in the proportion of pseudoagouti offspring. Previous studies have demonstrated a tight correlation between DNA methylation at Avy and coat color [9]. As expected, bisulfite sequencing showed that the observed coat colors correlated with DNA methylation status in all cases. We did observe atypical hypermethylated clones in five of six yellow mice from the ethanol-exposed group that, while not sufficient to change coat color, may reflect a tendency towards increased DNA methylation in this group. Preconceptional ethanol exposure produced a similar shift towards pseudoagouti in Avy offspring. It is likely that the two types of ethanol exposure act on Avy in different ways, because this allele is paternally derived and not present in unfertilized oocytes. Consequently, the effects of preconceptional ethanol exposure on Avy expression will be indirect, and further work will be required to understand this mechanism. It is of interest that the coat color changes observed in Avy mice exposed to a methyl-rich diet can be inherited across generations [52]. It is therefore possible that the altered coat color following alcohol exposure could also be transmitted to the next generation, but testing this was beyond the scope of this study.

Figure 3. Offspring in the gestational ethanol exposure group have a statistically significantly lower mean weight than the control group (Student’s t-test, p<0.05). All were a/a pups from first litters. Pups were weighed at 3 weeks of age and came from litter sizes of 4 and 5. The graph shows mean±SEM. doi:10.1371/journal.pgen.1000811.g003

chose to exclude this sample from subsequent shape analyses because of its significant impact on the results. This permitted us to assess the significance of other, more subtle changes. However, it was included in the univariate analyses of relative cranial dimensions. In the absence of this outlier, CVA still revealed greater variation in overall craniofacial shape within the ethanol group (Figure 5A). Canonical variate 1 (CV1) clearly separated the females in the ethanol group from other skulls, suggesting a more pronounced effect on female skull shape. Notably, all the females as well as the included males from the ethanol group had positive values for CV2, whereas the controls spanned both negative and positive values (see Figure 5A), indicative of a similar trend in shape alteration in response to this level of gestational exposure to ethanol. One female from the ethanol group appeared to be unaffected in terms of craniofacial shape and grouped with the control females in all analyses. We then used EDMA to assess the differences in form between the ethanol and control groups. Analysis of the 561 possible inter-landmark distances defined by the 34 landmarks (i.e. 34(34−1)/2) demonstrates that the majority show a consistent ratio below one, indicating that they differ only in proportion to overall skull size and do not reflect localized changes in shape. Nevertheless, numerous inter-landmark measures were significantly different from this mean form. The twenty most significant differences (α = 0.1) in either direction from the mean form are shown in Figure 5B. Strikingly, almost all of these forty most significant differences pertain to midfacial and palatal inter-landmark measures, highlighting the sensitivity of this region to ethanol. 
In particular, these data reveal that the ethanol group as a whole has a relatively wider inter-orbital distance (inter-landmark measure 7–11), yet a relatively shorter midface than controls (reflected in multiple inter-landmark measures). This is consistent with the CVA findings. A univariate analysis of inter-landmark distances (normalized to centroid size) also supported these differences between the ethanol and control groups, in particular confirming the greater relative cranial and inter-orbital width in both males and females compared with sex-matched controls (Figure 5C). Females from the ethanol group also showed greater variation in ‘nare’ height (data not shown), while males from the ethanol group showed reduced rostrum length (Figure 5C). Although less severe than the changes found with acute ethanol exposure in the mouse [16], many of these differences are reminiscent of the facial changes

January 2010 | Volume 6 | Issue 1 | e1000811

Epigenetics and Gestational Alcohol Exposure

Figure 4. Variable midfacial dysmorphism and microcephaly in a/a offspring of mothers that consumed ethanol during gestation. The top panel shows 3D reconstructions of skull microCT data from 28–30-day-old mice. (A) Control female. (B) Ethanol-exposed male showing marked leftward deviation of the midface. (C) Ethanol-exposed female showing an almost absent interfrontal bone, which is normally characteristic of untreated C57BL/6J mice. (D) Graph of centroid size (used as an estimate of overall cranial size) against body weight for control (n = 10) and ethanol-exposed (n = 7) mice at 28–30 days old. Centroid size was determined by summing the distances from each landmark to the centroid for each individual. Centroid size is highly correlated with body weight in both treatment groups, but mice in the ethanol group are on average smaller than controls (ANOVA, p<0.05). doi:10.1371/journal.pgen.1000811.g004
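Figure 4D uses centroid size as a measure of overall cranial size, and Figure 5C divides linear dimensions by it. The Python/NumPy sketch below illustrates both steps. It is illustrative only, and the function names and the toy configuration are hypothetical. The conventional geometric-morphometric definition of centroid size is the square root of the summed squared landmark-to-centroid distances. The caption above describes a plain sum of those distances, so the sketch shows both variants. Landmark numbers are 1-based to match the text, e.g. orbital width between landmarks 7 and 11.

```python
# Hypothetical sketch: centroid size and size-normalized linear dimensions.
import numpy as np


def centroid_size(landmarks):
    """Conventional centroid size: sqrt of the summed squared distances
    from each landmark to the configuration's centroid.

    landmarks: array of shape (k_landmarks, 3).
    """
    centroid = landmarks.mean(axis=0)
    return float(np.sqrt(((landmarks - centroid) ** 2).sum()))


def summed_centroid_distance(landmarks):
    """Variant matching the Figure 4 wording: plain sum of the
    landmark-to-centroid distances."""
    centroid = landmarks.mean(axis=0)
    return float(np.linalg.norm(landmarks - centroid, axis=1).sum())


def relative_distance(landmarks, i, j, size):
    """Distance between 1-based landmarks i and j divided by a size measure
    (e.g. i=7, j=11 for orbital width in a full landmark set)."""
    return float(np.linalg.norm(landmarks[i - 1] - landmarks[j - 1])) / size


# Toy configuration: four landmarks at the corners of a 2 x 2 square (z = 0).
square = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0],
                   [2.0, 2.0, 0.0], [0.0, 2.0, 0.0]])
print(centroid_size(square))             # ≈ 2.828 (2 * sqrt(2))
print(summed_centroid_distance(square))  # ≈ 5.657 (4 * sqrt(2))
print(relative_distance(square, 1, 2, centroid_size(square)))  # ≈ 0.707
```

Scaling every coordinate by the same factor scales both size measures by that factor and leaves the relative distance unchanged. This is why dividing by centroid size removes the overall size effect.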

Figure 5. Quantitative analysis of the effects of gestational exposure to ethanol on skull shape. (A) Output of the CVA on the Procrustes coordinates. Two components account for more than 90% of the total variation. CV1 clearly discriminates ethanol-treated female specimens from the rest of the samples. Both male and female ethanol specimens have CV2 values close to (and above) 0, whereas controls span the entire range. Changes in skull morphology were visualized using the loadings of the first two canonical variates. Negative values of CV1 indicate a relatively wider skull and relatively longer rostrum. Positive values of CV2 indicate relatively wider orbits (as measured by landmarks 7–11) and a shorter rostrum. (B) Output of the EDMA form difference matrix. Of the 561 possible interlandmark distances, the top and bottom 20 are shown. Values are reported as the ratio of ethanol to control for each interlandmark distance. None of the confidence intervals for the reported distances crosses 1.0, indicating that all of these dimensions are significantly smaller in the ethanol specimens. The higher values at the bottom of the graph indicate dimensions that are relatively expanded in the ethanol specimens. 90% confidence intervals were estimated by the non-parametric bootstrap method. For landmark descriptions and positions, see Text S1 and Figure S3. (C) Mean relative size of selected linear dimensions in ethanol and control groups. Tails indicate the observed range. Measurements are divided by centroid size to remove the size effect. Cranial width is measured as the distance between landmarks 15 and 17; rostrum length as the distance between landmark 9 and the midpoint of landmarks 1 and 2; and orbital width as the distance between landmarks 7 and 11. Abbreviations: FC, female control; MC, male control; FE, female ethanol; ME, male ethanol. Goodall’s F test was used to test for statistical significance of mean shape differences among groups.
doi:10.1371/journal.pgen.1000811.g005

Our model of moderate gestational ethanol exposure produces a postnatal growth restriction phenotype and craniofacial dysmorphism consistent with those seen in humans with FASD. It is possible that the postnatal growth restriction phenotype is an indirect effect; for example, the offspring may be smaller because of deficient maternal care between birth and weaning. However, it is unlikely that the dam was intoxicated or even experiencing ethanol withdrawal symptoms in the postpartum period. Ethanol exposure ceased at GD 8.5, and water was consumed for the rest of gestation (~11.5 days) and throughout nursing (21 days). Regardless of whether the phenotype is a direct physiological consequence of exposure in utero or the indirect result of altered maternal behavior, we would argue that it is ultimately a product of exposure to ethanol.

Interestingly, the effects on skull shape in these mice, like the coat color presentation in Avy mice, are variable despite the fact that the mice are isogenic. Marked variability in phenotype has also been recognized in humans: not all children of heavily drinking mothers have the typical FAS facial phenotype, and the others fall within the continuum of FASDs [14]. While this variability has been attributed to genetic differences and to differences in the level and timing of exposure, it may also be a consequence of stochastic establishment of epigenetic state. The mechanism by which ethanol alters the establishment of epigenetic state at Avy is not known. It has been shown that chronic ethanol consumption can alter DNA methylation by changing the levels of S-adenosylmethionine (SAM), which donates methyl groups to cytosine [53,54]. It is also known that chronic or acute ethanol consumption can cause post-translational histone modifications in rat tissues [55–59]. The effect of ethanol on the developing embryo has been less studied at the molecular level. Candidate gene and microarray analyses have detected changes in the expression of numerous genes, both up- and down-regulation [60–62]. Decreased global DNA methylation in midgestation embryos has also been reported following acute ethanol treatment [24]. Recent studies have also reported altered regulation of several microRNAs by ethanol, suggesting a possible role for these RNA species in fetal alcohol syndrome [63,64].

Our gestational exposure experiments demonstrate that the epigenome is vulnerable to ethanol during early embryogenesis, a time when the DNA synthetic rate is high and there is genome-wide epigenetic reprogramming. Our preconceptional ethanol exposure experiments show that changes in the maturing oocyte (another period in development when there are widespread changes to the epigenome) can also affect offspring phenotype. The identification of microcephaly and midfacial dysmorphism in our gestational exposure model suggests effects on genes other than Avy. Our preliminary genome-wide gene expression analyses of liver in ethanol-exposed mice revealed twelve consistently down-regulated and three up-regulated genes. Ongoing work will determine whether the expression of these same genes has been changed in other tissues and whether it correlates with alterations in methylation level. The variable and subtle nature of the observed phenotypes will make this work challenging, but our ultimate goal is to gain a better understanding of the molecular processes underlying FASDs.


Materials and Methods

Ethics Statement
All animals were handled in strict accordance with good animal practice as defined by the relevant national and/or local animal welfare bodies, and all animal work was approved by the Animal Ethics Committee of the Queensland Institute of Medical Research (P986, A0606-609M).

Study Design and Animals
The mice used in this study were inbred, genetically identical C57BL/6J mice, and all environmental factors (e.g., cage type, environmental enrichment) were standardized. We chose voluntary consumption for ethanol exposure instead of intraperitoneal injection or intragastric administration because it produces the least maternal stress. C57BL/6 mice are also known to have a strong drinking preference for 10% (v/v) ethanol over water, making them ideal for the study [65,66]. For gestational ethanol exposure, single mottled Avy/a males were caged with single 6–14-week-old a/a females. The majority of females were 6–8 weeks old and virgins in both the ethanol-exposed and control groups. The females were checked each morning for a vaginal plug, which indicated that mating had taken place. The day of plugging was designated GD 0.5. On that day, the male was removed from the cage and the water bottle was replaced with one containing 10% (v/v) ethanol. Pregnant females were allowed free access to the drink bottle and food at all times. The ethanol solution was changed and consumption (ml) was measured every 24 hours. The average daily consumption was 3.1±0.4 ml of 10% (v/v) ethanol (or 12 g ethanol/kg body weight/day). This was not statistically significantly different from the average daily water consumption of control mice (Student’s t-test, two-tailed, p = 0.8). It has been shown that in female mice, voluntary consumption of 10% (v/v) ethanol at 14 g ethanol/kg body weight/day produces an average peak blood alcohol level of ~120 mg/dl [49]. Only one out of 47 females tested refused to drink ethanol in the initial 24-hour period; this female was excluded from the analysis. On the final day of exposure, GD 8.5, the ethanol bottle was replaced with a bottle containing water. All dams were subjected to only one cycle of ethanol exposure.

Offspring were left with their mothers until weaning (at 3 weeks of age). At weaning, their coat color was recorded (Avy/a mice), or they were weighed or subjected to micro-computed tomography (a/a mice). For preconceptional ethanol exposure, 6-week-old a/a female mice were given 10% (v/v) ethanol 4 days per week (4 days of ethanol followed by 3 days of water) for ten weeks. After treatment, at 18–22 weeks of age, they were mated with mottled Avy/a males. Avy/a offspring were weaned and phenotyped for coat color at three weeks of age. All ethanol and control exposures were performed in parallel so that all experiments were conducted under exactly the same animal house conditions. The coat color of Avy/a offspring was visually classified by a single trained observer (NK-A) into one of five categories:
- yellow (>95% yellow)
- yellow/mottled (75–95% yellow)
- mottled (25–74% yellow or 25–74% agouti)
- pseudoagouti/mottled (75–95% agouti)
- pseudoagouti (>95% agouti)

In the final analysis these categories were combined into three classes: yellow, mottled (comprising yellow/mottled, mottled and pseudoagouti/mottled) and pseudoagouti.

Bisulfite Sequencing
For bisulfite sequencing of the Avy allele, 200–400 ng of tail genomic DNA was embedded in agarose and then treated with sodium bisulfite as described previously [10]. The bisulfite-treated DNA was resuspended in 30 µl of water. Of this, 5 µl was used in the primary PCR, followed by a semi-nested PCR with 2–5 µl of template [10]. The primers were:
- forward: 5′ gaaaagagagtaagaagtaagagagagag 3′
- reverse: 5′ aaaatttaacacataccttctaaaaccccc 3′
- semi-nested reverse: 5′ actccctcttctaaaactacaaaaactc 3′

One bisulfite conversion and PCR was performed for each pseudoagouti sample, while 3–5 independent conversions and 3 PCRs per conversion were performed for each yellow sample. Global IAP LTR sequences were amplified from bisulfite-converted tail and forebrain DNA using universal IAP primers: forward 5′ ttgatagttgtgttttaagtggtaaataaa 3′ and reverse 5′ aaaacaccacaaaccaaaatcttctac 3′ [67]. An agarose-only (no-template) control was always included, and the experiment was continued only if the agarose control was negative at the end of the semi-nested PCR. PCR fragments were gel-isolated and subcloned into the pGEM-T vector (Promega, Madison, Wisconsin, United States). Individually sequenced clones were analyzed with BiQ Analyzer [68]. To avoid bias, clones from the same PCR were accepted only if they differed in either CpG or non-CpG methylation. Any clones with a conversion rate lower than 90% were also excluded from the dataset.

Gene Expression Arrays
To detect possible changes in gene expression in mice exposed to ethanol during gestation compared with controls, we used MouseWG-6 v2.0 Expression BeadChips (Illumina). We extracted total liver RNA from 28-day-old males from the control and gestational ethanol groups using a Qiagen RNeasy Plus kit (Qiagen). We used a Bioanalyzer (Agilent RNA 6000 Nano, Agilent) to confirm RNA quality and accepted only samples with RNA Integrity Numbers (RINs) above 9. We amplified RNA using an Illumina TotalPrep RNA Amplification Kit and performed a Whole-Genome Gene Expression Direct Hybridization Assay (Illumina). The gene expression data from scanned microarray images generated by the Illumina BeadArray™ Reader were analysed with the GenomeStudio Gene Expression Module (Illumina) using probe information. Four control samples from two litters and three gestational ethanol exposure samples from three litters were analysed.

Analysis of Skull Morphology
Seventeen a/a mice (ten controls and seven ethanol-exposed mice) aged between 28 and 30 days were subjected to micro-computed tomography. Scans used a SkyScan 1076 microtomograph at the Small Animal Tomographic Analysis Facility located at the University of Washington. The sex and treatment breakdown of the microCT samples is female ethanol (n = 4), female control (n = 5), male ethanol (n = 3) and male control (n = 5). Specimens were scanned at 18 micron resolution (65 kV, 150 µA, 1.0 mm Al filter) and reconstructed as a series of 8-bit grayscale images. Three-dimensional models of the skulls were generated using the thresholding algorithm in Analyze 3D (Mayo Clinic, version 9.0). A grayscale value of 55 was determined to be the optimal threshold for removing soft tissue structures and scan noise while keeping the skull morphology intact, and it was used for all specimens. Using the point measurement tool of Analyze, 35 landmarks were collected from each specimen (Text S1 and Figure S3). All specimens were digitized by the same observer (MM) to reduce inter-observer error. Visualizations showed that landmark 31 could not be accurately determined in every specimen because of the occasional fusion of the presphenoid and basisphenoid bones. Because geometric morphometrics requires homologous landmarks collected from every specimen, this landmark was omitted from subsequent analyses. Landmark data were analyzed with several morphometric packages. Using the R statistical package [69], linear measurements of


certain common cranial dimensions were calculated from the landmark coordinates and normalized to their respective skull centroid sizes. Generalized Procrustes Analysis (GPA) was also conducted in R using the shapes package. Goodall’s F test was used to test for statistical significance of mean shape differences among groups. Canonical Variates Analysis (CVA) was conducted in the MorphoJ package [70]. The loadings of canonical variates 1 and 2 were used to visualize the cranial shape changes depicted by each axis. The WinEDMA package [71] was used to conduct Euclidean Distance Matrix Analysis. We used the FORM procedure of WinEDMA to identify the landmark pairs that differed significantly between the two mean forms (ethanol and control), as measured by the form difference matrix. Following Lele and Richtsmeier [45], 90% confidence intervals for the pairwise ratios were calculated by bootstrapping the form difference matrix 1000 times.
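The bootstrap step can be illustrated for a single landmark pair. This is a minimal percentile-bootstrap sketch under stated assumptions, not WinEDMA's implementation:

- It resamples specimens with replacement within each group.
- Instead of the full form difference matrix, it recomputes a ratio of simple mean distances, 1000 times.
- It takes the 5th and 95th percentiles as the 90% interval.

The function name `bootstrap_ratio_ci` and the synthetic distances are hypothetical.

```python
# Hypothetical sketch: percentile bootstrap CI for one form-difference ratio.
import numpy as np


def bootstrap_ratio_ci(treated_d, control_d, n_boot=1000, level=0.90, seed=0):
    """Return (point ratio, lower, upper) for mean(treated) / mean(control).

    treated_d, control_d: one interlandmark distance per specimen.
    Specimens are resampled with replacement within each group.
    """
    rng = np.random.default_rng(seed)
    treated_d = np.asarray(treated_d, dtype=float)
    control_d = np.asarray(control_d, dtype=float)
    ratios = np.empty(n_boot)
    for b in range(n_boot):
        t = rng.choice(treated_d, size=treated_d.size, replace=True)
        c = rng.choice(control_d, size=control_d.size, replace=True)
        ratios[b] = t.mean() / c.mean()
    alpha = 1.0 - level
    lower, upper = np.quantile(ratios, [alpha / 2, 1.0 - alpha / 2])
    return treated_d.mean() / control_d.mean(), float(lower), float(upper)


data_rng = np.random.default_rng(1)
control_d = data_rng.normal(10.0, 0.3, size=10)  # synthetic control distances
treated_d = data_rng.normal(9.0, 0.3, size=7)    # synthetic ethanol distances
point, lower, upper = bootstrap_ratio_ci(treated_d, control_d)
print(f"ratio {point:.3f}, 90% CI [{lower:.3f}, {upper:.3f}]")
# An interval that excludes 1.0 marks the distance as significantly different.
```

Fixing the seed makes the interval reproducible from run to run. Increasing `n_boot` mainly stabilizes the percentile estimates rather than narrowing the interval.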

Supporting Information

Figure S1 Avy methylation in control offspring and offspring exposed to ethanol in utero in mottled mice. Only mice with 50% yellow/50% pseudoagouti coats were assayed.
Found at: doi:10.1371/journal.pgen.1000811.s001 (1.01 MB TIF)

Figure S2 Global IAP methylation in control offspring and in offspring exposed to ethanol in utero. Methylation was analyzed by sequencing individual clones of PCR-amplified, bisulfite-converted forebrain and tail genomic DNA.
Found at: doi:10.1371/journal.pgen.1000811.s002 (0.98 MB TIF)

Figure S3 Landmark positions.
Found at: doi:10.1371/journal.pgen.1000811.s003 (9.62 MB TIF)

Table S1 Summary of significantly up- and down-regulated genes in liver following ethanol exposure in utero. The Diff Score is a transformation of the p-value that gives it a direction, based on the difference between the average signal in the control group and that in the ethanol-exposed group. For p-values of 0.05, 0.01 and 0.001, the Diff Scores are ±13, ±20 and ±30, respectively.
Found at: doi:10.1371/journal.pgen.1000811.s004 (0.03 MB XLS)

Text S1 Landmark descriptions.
Found at: doi:10.1371/journal.pgen.1000811.s005 (0.03 MB DOC)

Acknowledgments

We thank Edward P. Riley (Center for Behavioural Teratology, San Diego State University) for helpful discussions.

Author Contributions

Conceived and designed the experiments: NKA EW SC. Performed the experiments: NKA AA MM KAM TCC SC. Analyzed the data: NKA AA MM PF TCC EW SC. Wrote the paper: NKA AA MM PF TCC EW SC.
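The Diff Score described in the Table S1 legend can be sketched as a signed, log-scaled p-value. The sketch assumes the Illumina-style form DiffScore = 10 × sign(Δ) × (−log10 p). That form gives magnitudes of about 13, 20 and 30 for p = 0.05, 0.01 and 0.001, matching the values listed in the legend. A minimal Python version is below. The function name `diff_score` and the choice of Δ as ethanol mean minus control mean are assumptions.

```python
# Hypothetical sketch of a Diff Score: signed, log-scaled p-value.
import math


def diff_score(p_value, mean_ethanol, mean_control):
    """Return -10 * log10(p), signed by the direction of the mean difference.

    The sign convention (ethanol minus control) is an assumption.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    direction = math.copysign(1.0, mean_ethanol - mean_control)
    return direction * -10.0 * math.log10(p_value)


for p in (0.05, 0.01, 0.001):
    print(p, round(diff_score(p, 120.0, 100.0)), round(diff_score(p, 80.0, 100.0)))
# 0.05 13 -13
# 0.01 20 -20
# 0.001 30 -30
```

Because the magnitude depends only on the p-value, a gene with a small but highly significant change can score higher than one with a large but noisy change.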

References 1. Barker DJ (1998) In utero programming of chronic disease. Clin Sci (Lond) 95: 115–28. 2. Wolff G, Kodell R, Moore S, Cooney C (1998) Maternal epigenetics and methyl supplements affect agouti gene expression in Avy/a mice. FASEB J 12: 949–957. 3. Waterland R, Jirtle R (2003) Transposable elements: targets for early nutritional effects on epigenetic gene regulation. Mol Cell Biol 23: 5293–5300. 4. Dolinoy D, Weidman J, Waterland R, Jirtle R (2006) Maternal genistein alters coat color and protects Avy mouse offspring from obesity by modifying the fetal epigenome. Environ Health Perspect 114: 567–572. 5. Waterland RA, Dolinoy DC, Lin JR, Smith CA, Shi X, et al. (2006) Maternal methyl supplements increase offspring DNA methylation at Axin Fused. Genesis 44: 401–406. 6. Sinclair KD, Allegrucci C, Singh R, Gardner DS, Sebastian S, et al. (2007) DNA methylation, insulin resistance, and blood pressure in offspring determined by maternal periconceptional B vitamin and methionine status. Proc Natl Acad Sci USA 104: 19351–19356. 7. Rakyan V, Blewitt M, Druker R, Preis J, Whitelaw E (2002) Metastable epialleles in mammals. Trends in Genetics 18: 348–351. 8. Wolff GL (1978) Influence of maternal phenotype on metabolic differentiation of agouti locus mutants in the mouse. Genetics 88: 529–39. 9. Morgan H, Sutherland H, Martin D, Whitelaw E (1999) Epigenetic inheritance at the agouti locus in the mouse. Nat Genet 23: 314–318. 10. Blewitt M, Vickaryous N, Paldi A, Koseki H, Whitelaw E (2006) Dynamic reprogramming of DNA methylation at an epigenetically sensitive allele in mice. PLoS Genet 2: e49. doi:10.1371/journal.pgen.0020049. 11. Abel E, Hannigan J (1995) Maternal risk factors in fetal alcohol syndrome: provocative and permissive influences. Neurotoxicol Teratol 17: 445–462. 12. Sokol RJ, Delaney-Black V, Nordstrom B (2003) Fetal alcohol spectrum disorder. JAMA 290: 2996–2999. 13. Jones K, Smith D (1973) Recognition of the fetal alcohol syndrome in early infancy. 
Lancet 302: 999–1001. 14. Hoyme HE, May PA, Kalberg WO, Kodituwakku P, Gossage JP, et al. (2005) A practical clinical approach to diagnosis of fetal alcohol spectrum disorders: clarification of the 1996 institute of medicine criteria. Pediatrics 115: 39–47. 15. Abel E (1995) An update on incidence of FAS: FAS is not an equal opportunity birth defect. Neurotoxicol Teratol 17: 437–443. 16. Sulik K, Johnston M, Webb M (1981) Fetal alcohol syndrome: embryogenesis in a mouse model. Science 214: 936–938. 17. Sulik K, Johnston M, Daft P, Russell W, Dehart D (1986) Fetal alcohol syndrome and DiGeorge anomaly: critical ethanol exposure periods for craniofacial malformations as illustrated in an animal model. Am J Med Genet Suppl 2: 97–112. 18. Webster W, Walsh D, McEwen S, Lipson D (1983) Some teratogenic properties of ethanol and acetaldehyde in C57BL/6J mice: implications for the study of the fetal alcohol syndrome. Teratology 27: 231–243.


19. Kotch LE, Sulik KK (1992) Experimental fetal alcohol syndrome: proposed pathogenic basis for a variety of associated facial and brain anomalies. Am J Med Genet 44: 168–76. 20. Choi IY, Allan AM, Cunningham LA (2005) Moderate fetal alcohol exposure impairs the neurogenic response to an enriched environment in adult mice. Alcohol Clin Exp Res 29: 2053–62. 21. Menegola E, Broccia ML, Di Renzo F, Giavini E (2001) Acetaldehyde in vitro exposure and apoptosis: a possible mechanism of teratogenesis. Alcohol 23: 35–9. 22. Driscoll CD, Streissguth AP, Riley EP (1990) Prenatal alcohol exposure: Comparability of effects in human and animal models. Neurotoxicology and Teratology 12: 231–237. 23. Sulik KK (2005) Genesis of alcohol-induced craniofacial dysmorphism. Exp Biol Med 230: 366–75. 24. Garro A, McBeth D, Lima V, Lieber C (1991) Ethanol consumption inhibits fetal DNA methylation in mice: implications for the fetal alcohol syndrome. Alcohol Clin Exp Res 15: 395–398. 25. Bielawski D, Zaher F, Svinarich D, Abel E (2002) Paternal alcohol exposure affects sperm cytosine methyltransferase messenger RNA levels. Alcohol Clin Exp Res 26: 347–351. 26. Haycock P, Ramsay M (2009) Exposure of mouse embryos to ethanol during preimplantation development: effect on DNA methylation in the H19 imprinting control region. Biol Reprod 81: 618–27. 27. Manne J, Argeson AC, Siracusa LD (1995) Mechanisms for the pleiotropic effects of the agouti gene. Proc Natl Acad Sci U S A 92: 4721–4. 28. Kuff EL, Lueders KK (1988) The intracisternal A-particle gene family: structure and functional aspects. Adv Cancer Res 51: 183–276. 29. Song Y, Maul RS, Gerbin CS, Chang DD (2002) Inhibition of anchorage-independent growth of transformed NIH3T3 cells by epithelial protein lost in neoplasm (EPLIN) requires localization of EPLIN to actin cytoskeleton. Mol Biol Cell 13: 1408–16. 30. Greenhalgh CJ, Metcalf D, Thaus AL, Corbin JE, Uren R, et al.
(2002) Biological evidence that SOCS-2 can act either as an enhancer or suppressor of growth hormone signaling. J Biol Chem 277: 40181–4. 31. Leung KC, Doyle N, Ballesteros M, Sjogren K, Watts CK, et al. (2003) Estrogen inhibits GH signaling by suppressing GH-induced JAK2 phosphorylation, an effect mediated by SOCS-2. Proc Natl Acad Sci U S A 100: 1016–21. 32. Sovio U, Bennett AJ, Millwood IY, Molitor J, O’Reilly PF, et al. (2009) Genetic determinants of height growth assessed longitudinally from infancy to adulthood in the northern Finland birth cohort 1966. PLoS Genet 5: e1000409. doi:10.1371/journal.pgen.1000409. 33. Goldshmit Y, Greenhalgh CJ, Turnley AM (2004) Suppressor of cytokine signalling-2 and epidermal growth factor regulate neurite outgrowth of cortical neurons. Eur J Neurosci 20: 2260–6.


34. Scott HJ, Stebbing MJ, Walters CE, McLenachan S, Ransome MI, et al. (2006) Differential effects of SOCS2 on neuronal differentiation and morphology. Brain Res 1067: 138–45. 35. Hack I, Hellwig S, Junghans D, Brunne B, Bock HH, et al. (2007) Divergent roles of ApoER2 and Vldlr in the migration of cortical neurons. Development 134: 3883–91. 36. Larouche M, Beffert U, Herz J, Hawkes R (2008) The Reelin receptors Apoer2 and Vldlr coordinate the patterning of Purkinje cell topography in the developing mouse cerebellum. PLoS ONE 3: e1653. doi:10.1371/journal.pone.0001653. 37. Zukerberg LR, Patrick GN, Nikolic M, Humbert S, Wu CL, et al. (2000) Cables links Cdk5 and c-Abl and facilitates Cdk5 tyrosine phosphorylation, kinase upregulation, and neurite outgrowth. Neuron 26: 633–46. 38. Bridle K, Cheung TK, Murphy T, Walters M, Anderson G, et al. (2006) Hepcidin is down-regulated in alcoholic liver injury: implications for the pathogenesis of alcoholic liver disease. Alcohol Clin Exp Res 30: 106–12. 39. Bookstein F (1991) Morphometric tools for landmark data: Geometry and biology. New York: Cambridge University Press. 456 p. 40. Dryden I, Mardia K (1998) Statistical shape analysis. Chichester, UK: John Wiley & Sons. 376 p. 41. Rohlf F, Marcus L (1993) A revolution in morphometrics. Trends in Ecology and Evolution 8: 129–132. 42. Slice D (2007) Geometric Morphometrics. Annual Review of Anthropology 36: 261–281. 43. Lele S, Richtsmeier J (1995) Euclidean distance matrix analysis: Confidence intervals for form and growth differences. Am J Phys Anthropol 98: 73–86. 44. Lele S, Richtsmeier J (2001) An invariant approach to the statistical analysis of shapes. (Chapman and Hall/CRC, Boca Raton). 45. Lele S, Richtsmeier J (1991) Euclidean distance matrix analysis: a coordinate-free approach for comparing biological shapes using landmark data. Am J Phys Anthropol 86: 415–427. 46. Waterland R (2006) Assessing the effects of high methionine intake on DNA methylation.
J Nutr 136: 1706S–1710S. 47. Cooney C, Dave A, Wolff G (2002) Maternal methyl supplements in mice affect epigenetic variation and DNA methylation of offspring. J Nutr 132: 2393–2400. 48. Dolinoy D, Huang D, Jirtle R (2007) Maternal nutrient supplementation counteracts bisphenol A-induced DNA hypomethylation in early development. Proc Natl Acad Sci U S A 104: 13056–13061. 49. Allan AM, Chynoweth J, Tyler LA, Caldwell KK (2003) A mouse model of prenatal ethanol exposure using a voluntary drinking paradigm. Alcohol Clin Exp Res 27: 2009–16. 50. WHO (World Health Organization) (2004) World Health Report 2002: Reducing Risks, Promoting Healthy Life, World Health Organization, Geneva. 51. Reik W, Dean W, Walter J (2001) Epigenetic reprogramming in mammalian development. Science 293: 1089–93. 52. Cropley JE, Suter CM, Martin DI (2007) Methyl donors change the germline epigenetic state of the A(vy) allele. FASEB J 21: 3021. 53. Tsukamoto H, Lu S (2001) Current concepts in the pathogenesis of alcoholic liver injury. FASEB J 15: 1335–1349.


54. Lu SC, Huang ZZ, Yang H, Mato JM, Avila MA, et al. (2000) Changes in methionine adenosyltransferase and S-adenosylmethionine homeostasis in alcoholic rat liver. Am J Physiol Gastrointest Liver Physiol 279: G178–185. 55. Park P, Miller R, Shukla S (2003) Acetylation of histone H3 at lysine 9 by ethanol in rat hepatocytes. Biochem Biophys Res Commun 306: 501–504. 56. Kim J, Shukla S (2005) Histone H3 modifications in rat hepatic stellate cells by ethanol. Alcohol Alcohol 40: 367–372. 57. Kim J, Shukla S (2006) Acute in vivo effect of ethanol (binge drinking) on histone H3 modifications in rat tissues. Alcohol Alcohol 41: 126–132. 58. Pal-Bhadra M, Bhadra U, Jackson DE, Mamatha L, Park PH, et al. (2007) Distinct methylation patterns in histone H3 at lys-4 and lys-9 correlated with up- and down-regulation of genes by ethanol in hepatocytes. Life Sci 81: 979–987. 59. Lee Y, Shukla S (2007) Histone H3 phosphorylation at serine 10 and serine 28 is mediated by p38 MAPK in rat hepatocytes exposed to ethanol and acetaldehyde. Eur J Pharmacol 573: 29–38. 60. Rifas L, Towler D, Avioli L (1997) Gestational exposure to ethanol suppresses msx2 expression in developing mouse embryos. Proc Natl Acad Sci U S A 94: 7549–7554. 61. Da Lee R, Rhee GS, An SM, Kim SS, Kwack SJ, et al. (2004) Differential gene profiles in developing embryo and fetus after in utero exposure to ethanol. J Toxicol Environ Health A 67: 2073–2084. 62. Hard M, Abdolell M, Robinson B, Koren G (2005) Gene-expression analysis after alcohol exposure in the developing mouse. J Lab Clin Med 145: 47–54. 63. Sathyan P, Golden H, Miranda R (2007) Competing interactions between micro-RNAs determine neural progenitor survival and proliferation after ethanol exposure: evidence from an ex vivo model of the fetal cerebral cortical neuroepithelium. J Neurosci 27: 8546–8557. 64. Wang LL, Zhang Z, Li Q, Yang R, Pei X, et al.
(2009) Ethanol exposure induces differential microRNA and target gene expression and teratogenic effects which can be suppressed by folic acid supplementation. Hum Reprod 24: 562–579. 65. McClearn G, Rodgers D (1959) Differences in alcohol preference among inbred strains of mice. Q J Stud Alcohol 20: 691–695. 66. Belknap J, Crabbe J, Young E (1993) Voluntary consumption of ethanol in 15 inbred mouse strains. Psychopharmacology 112: 503–510. 67. Lane N, Dean W, Erhardt S, Hajkova P, Surani A, et al. (2003) Resistance of IAPs to methylation reprogramming may provide a mechanism for epigenetic inheritance in the mouse. Genesis 35: 88–93. 68. Bock C, Reither S, Mikeska T, Paulsen M, Walter J, et al. (2005) BiQ Analyzer: visualization and quality control for DNA methylation data from bisulfite sequencing. Bioinformatics 21: 4067–4068. 69. Venables WN, Smith DM, R Development Core Team (2008) An Introduction to R: Notes on R, A Programming Environment for Data Analysis and Graphics. Available: http://cran.r-project.org/doc/manuals/R-intro.pdf. Accessed 2009. 70. Klingenberg C (2008) MorphoJ (Faculty of Life Sciences, University of Manchester, UK). 71. Cole T (2003) WinEDMA: Software for Euclidean distance matrix analysis. Version 1.0.1. (University of Missouri-Kansas City School of Medicine, Kansas City).


Epigenetic Analysis of KSHV Latent and Lytic Genomes

Zsolt Toth1, Dennis T. Maglinte2, Sun Hwa Lee1, Hye-Ra Lee1, Lai-Yee Wong1, Kevin F. Brulois1, Stacy Lee1, Jonathan D. Buckley2,3, Peter W. Laird2, Victor E. Marquez4, Jae U. Jung1*

1 Department of Molecular Microbiology and Immunology, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America, 2 USC Epigenome Center, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America, 3 Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America, 4 Laboratory of Medicinal Chemistry, Center for Cancer Research, NCI-Frederick, Frederick, Maryland, United States of America

Abstract
Epigenetic modifications of the herpesviral genome play a key role in the transcriptional control of latent and lytic genes during a productive viral lifecycle. In this study, we describe for the first time a comprehensive genome-wide ChIP-on-Chip analysis of the chromatin associated with the Kaposi’s sarcoma-associated herpesvirus (KSHV) genome during latency and lytic reactivation. Depending on the gene expression class, different combinations of activating [acetylated H3 (AcH3) and H3K4me3] and repressive [H3K9me3 and H3K27me3] histone modifications are associated with the viral latent genome. These associations change upon reactivation in a manner that correlates with gene expression. Specifically, the two activating marks co-localize on the KSHV latent genome, as do the two repressive marks. However, the activating and repressive histone modifications are mutually exclusive on the bulk of the latent KSHV genome. The genomic region encoding the IE genes ORF50 and ORF48 possesses the features of a bivalent chromatin structure, characterized by the concomitant presence of the activating H3K4me3 and the repressive H3K27me3 marks during latency. This structure rapidly changes upon reactivation, with increasing AcH3 and H3K4me3 and decreasing H3K27me3. Furthermore, EZH2, the H3K27me3 histone methyltransferase of the Polycomb group proteins (PcG), colocalizes with the H3K27me3 mark across the entire KSHV genome during latency. In contrast, RTA-mediated reactivation induces EZH2 dissociation from the genomic regions encoding IE and E genes, concurrent with decreasing H3K27me3 levels and increasing IE/E lytic gene expression. Moreover, inhibition of EZH2 expression, either by the small-molecule inhibitor DZNep or by RNAi knockdown, apparently induced the KSHV lytic gene expression cascade, as did expression of H3K27me3-specific histone demethylases.
These data indicate that histone modifications associated with the KSHV latent genome are involved in the regulation of latency and ultimately in the control of the temporal and sequential expression of the lytic gene cascade. In addition, the PcG proteins play a critical role in the control of KSHV latency by maintaining a reversible heterochromatin on the KSHV lytic genes. Thus, the regulation of the spatial and temporal association of the PcG proteins with the KSHV genome may be crucial for propagating the KSHV lifecycle. Citation: Toth Z, Maglinte DT, Lee SH, Lee H-R, Wong L-Y, et al. (2010) Epigenetic Analysis of KSHV Latent and Lytic Genomes. PLoS Pathog 6(7): e1001013. doi:10.1371/journal.ppat.1001013 Editor: Paul Kellam, Sanger Institute, United Kingdom Received January 6, 2010; Accepted June 18, 2010; Published July 22, 2010 This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. Funding: This work was partly supported by CA082057, CA31363, CA115284, AI073099, DE019085, CA148616, the Global Research Program (KICOS/KMEST), Hastings Foundation, and Fletcher Jones Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: jaeujung@usc.edu

PLoS Pathogens | www.plospathogens.org

July 2010 | Volume 6 | Issue 7 | e1001013

Epigenetic Analysis of the KSHV Genome

Author Summary

KSHV is a ubiquitous herpesvirus that establishes a lifelong persistent infection in humans and is associated with Kaposi's sarcoma and several lymphoid malignancies. During latency, the KSHV genome persists as a multicopy circular DNA assembled into nucleosomal structures. While viral latency is characterized by restricted viral gene expression, reactivation induces the lytic replication program and the expression of viral genes in a defined sequential and temporal order. Posttranslational modifications of the viral chromatin have been implicated in regulating viral gene expression, but the underlying gene regulatory mechanisms remain elusive. Here, we demonstrate that the latent and lytic chromatin of KSHV is associated with a distinctive pattern of activating and repressive histone modifications whose distribution changes upon reactivation in an organized manner, in correlation with the temporally ordered expression of viral lytic genes. Furthermore, we demonstrate that the evolutionarily conserved Polycomb group proteins, which maintain the repression of genes involved in hematopoiesis, X-chromosome inactivation, cell proliferation and stem cell differentiation, also play a critical role in the regulation of KSHV latency by maintaining a repressive chromatin structure. Thus, the epigenetic program of KSHV is central both to restricting gene expression during latency and to the orderly expression of lytic genes.

Introduction

Chromatin is a highly dynamic structure of nucleosomes, which are composed of DNA wrapped around the core histones (H2A, H2B, H3 and H4). Over the past decade, several studies have demonstrated that histones are subject to various posttranslational modifications (acetylation, methylation, phosphorylation and ubiquitination) that can modulate chromatin structure and thereby influence gene expression [1]. Hyperacetylation of histones H3 and H4 occurs mainly on promoters and correlates with gene activation, while hypoacetylation is characteristic of repressed genes [2]. Histone methylation is associated with either activation or repression of genes, depending on which histone lysine residues are mono-, di- or trimethylated. Various histone methylations are then recognized by specific chromodomain-containing proteins that can function either as transcription factors or as part of large chromatin remodelling/modifying complexes, which eventually determine the activity of target genes [1].

Histone methylation status fluctuates in response to environmental and developmental conditions, and a number of enzymes that add or remove methylation modifications have been discovered [3,4]. In general, transcriptionally active genes are associated with H3K4me3 and H3K36me3, whereas trimethylation of H3K9, H3K27 and H4K20 occurs primarily on repressed genes. H3K9me3 and H4K20me3 are characteristic of pericentric heterochromatin, which is considered constitutive heterochromatin [5,6,7,8,9]. H3K27me3, on the other hand, is a marker of highly dynamic and reversible heterochromatin (facultative heterochromatin) and is characteristic of genes subject to tissue-specific or developmentally regulated expression [10,11,12]. Genome-wide analysis of embryonic stem (ES) cells revealed that H3K27me3 preferentially localizes on developmental genes, which are repressed in stem cells but are expressed during ES cell differentiation [13,14]. Interestingly, the promoters of a large number of these developmental genes are also enriched in activating H3K4me3, suggesting that these genes are silenced but poised for rapid activation in ES cells [15]. Promoters enriched in both activating (H3K4me3) and repressive (H3K27me3) histone marks, called bivalent promoters, have been associated with rapidly inducible genes in T cells as well [7].

H3K27me3 is deposited by the evolutionarily conserved 600-kDa Polycomb Repressive Complex 2 (PRC2), which consists of three Polycomb group (PcG) proteins (EZH2, SUZ12, EED) and the histone-binding proteins RbAp48/46 [16]. The SET domain-containing EZH2 is an H3K27me3 histone methyltransferase, which can be found along entire genomic regions enriched with H3K27me3 in mammalian cells [17]. H3K27me3 provides a binding platform for PRC1, a larger Polycomb complex consisting of more than 10 subunits. In Drosophila, PcG proteins are recruited to their target genes via Polycomb response elements (PREs), which can be found in promoters [18]. It is still unclear how PcG proteins are recruited to their target genes in mammalian cells, but non-coding RNAs and specific DNA sequences similar to PREs have been implicated in this process [16,19,20]. Polycomb-mediated gene silencing has been shown to be reversible by H3K27me3 demethylases such as JMJD3 and UTX, which can be recruited to repressed promoters by transcriptional activators, as has been shown, for instance, for H3K4me3 methyltransferase complexes [21,22,23,24].

Viruses replicating in the nucleus are also under the influence of chromatin during different stages of their life cycles. Therefore, viruses have evolved various mechanisms to utilize or neutralize the impact of cellular chromatin factors, ultimately to control viral replication and gene expression [25,26,27,28]. Herpesviruses have a large DNA genome that persists as multicopy circular episomes associated with histones in the nucleus. Herpesviral infection can lead to two different life cycles: latency and lytic replication. During latency, the viral episomes are assembled into nucleosomal structures that resemble bulk cellular chromatin. Regulation of the chromatin structure of the herpes simplex virus type 1 (HSV-1) genome has been implicated as the underlying cause of the switch between latency and lytic replication, as well as in the regulation of lytic gene expression [29,30].

A human tumour virus, Kaposi's sarcoma-associated herpesvirus (KSHV), also called human herpesvirus 8 (HHV8), has been consistently identified in Kaposi's sarcoma (KS) tumours, primary effusion lymphoma (PEL) and multicentric Castleman disease [31,32,33]. Several studies have indicated that DNA methylation and histone acetylation can play a role in regulating KSHV gene expression [34,35]. The KSHV immediate early (IE) gene product encoded by ORF50, RTA (replication and transcription activator), is the master regulatory factor and is sufficient to induce the complete cycle of viral replication [36,37]. The RTA promoter associates with histone deacetylases (HDACs) during latency, resulting in hypoacetylated histones [34]. Treatment of latently infected KSHV-positive cells with the HDAC inhibitors butyrate or TSA induces hyperacetylation of viral chromatin concomitantly with the recruitment of histone acetyltransferases, chromatin remodelling proteins (Ini1/Snf5) and the TRAP/Mediator coactivator complex to the RTA promoter, allowing RTA expression to induce the complete viral gene expression cascade [34,38].

To elucidate the characteristics of the KSHV epigenome, we performed a high-resolution genome-wide analysis mapping a set of activating and repressive histone H3 modifications across the entire KSHV genome during latency and reactivation. Based on their genome-wide profiles, we found that KSHV genes are associated with a distinctive pattern of activating and repressive histone modifications during latency, which changes upon reactivation. Importantly, the promoter regions of RTA and several early (E) genes are associated with both H3K4me3 and H3K27me3, suggesting that these promoters have a bivalent chromatin structure that maintains their repression during latency while keeping them poised for rapid activation upon stimulation. We also found that while the EZH2 histone methyltransferase colocalizes with H3K27me3 across the entire latent KSHV genome, it rapidly dissociates from the RTA promoter and other genomic regions rich in IE and E genes upon reactivation. This dissociation results in reduced H3K27me3 levels, concomitant with increasing levels of activating histone marks on the RTA promoter. Furthermore, treatment of latently KSHV-infected cells with a drug that inhibits the expression of PcG proteins, small interfering RNA-mediated knockdown of EZH2, or overexpression of H3K27me3 histone demethylases each efficiently triggers lytic reactivation of KSHV. These data collectively demonstrate that Polycomb group proteins are involved in the maintenance of KSHV latency by preserving a reversible heterochromatin on the promoter regions of lytic genes, such that these genes are silenced during latency but poised for rapid activation upon reactivation.

Results

Dissociation of histone H3 from the KSHV genome is concomitant with viral DNA replication

In this study we asked which histone modifications are associated with the KSHV genome during latency and how they change upon reactivation. For this, we used the well-characterized recombinant KSHV-positive primary effusion lymphoma cell line TRExBCBL1-RTA, which carries a doxycycline (Dox)-inducible myc/His-tagged RTA incorporated into the cellular genome [39]. We chose RTA-mediated reactivation of KSHV instead of chemical inducers such as TPA, TSA or sodium butyrate because these chemicals can globally affect both cellular and viral gene expression, whereas RTA ensures robust and specific viral reactivation [39]. Dox treatment (1 µg/ml) of TRExBCBL1-RTA cells for 6, 12 and 24 hours rapidly induced myc/His-RTA expression, resulting in gradual induction of the KSHV gene expression cascade (Figure S1A, B) [39], whereas viral DNA replication was apparent only at 24 hours post-induction (hpi), correlating with the induction of late gene expression (Figure S1B and C). To analyse changes in the KSHV nucleosome structure upon reactivation, ChIP experiments were performed with a histone H3-specific antibody. We measured the abundance of specific DNA sequences in the histone H3 immunoprecipitates by qPCR using a set of primer pairs specific for both the promoter and the coding regions of various viral genes, including RTA, LANA, K2, ORF8, ORF25, ORF56 and ORF64 (Figure S2A, B). This revealed comparable levels of H3 association throughout the viral genome during latency (0 hpi), which did not change significantly on most genomic regions at 12 hpi but dropped sharply at 24 hpi (Figure S2B). This agrees with previous findings showing that disassembly of viral chromatin is concomitant with viral DNA replication during lytic infection [40]. In contrast, H3 levels remained constant on cellular promoters during KSHV lytic reactivation, suggesting that H3 dissociation occurs specifically on the viral genome (Figure S2C).

Figure 1. Genome-wide mapping of histone modifications on the KSHV genome during latency and reactivation. Each ChIP-on-chip experiment is an average of two biological replicates. The histone H3 and histone modification ChIPs were performed with non-induced and doxycycline-induced (12 hpi) TRExBCBL1-Rta cells, followed by hybridization of the labelled ChIP and input DNAs onto a custom-designed KSHV-specific 15-bp tiling microarray. See Materials and Methods for details. Orange indicates the 0 hpi ChIP/input ratio, while the black line shows the 12 hpi ChIP/input ratio. Numbers in the upper left corners show the maximum values of Cy5/Cy3. Missing probes in specific genomic regions are shown below the genome scale (**). The alternating dark and light blue squares at the top display the viral ORFs, where a white triangle indicates ORFs that are expressed from the reverse DNA strand. "hpi" stands for hours post-induction. doi:10.1371/journal.ppat.1001013.g001

Distinct patterns of activating and repressive histone modifications on the KSHV genome during latency and reactivation

To investigate the epigenome of KSHV during latency and lytic reactivation, we tested whether repressive histone modifications associated with lytic genes are responsible for maintaining the repression of lytic genes during latency, and whether reactivation induces the deposition of activating histone modifications onto the viral genome for lytic gene expression. To uncover the global distribution of histone modifications on KSHV chromatin during latency and reactivation, we mapped the genome-wide distribution of activating histone marks [acetyl-H3 (AcH3) and H3K4me3] and repressive marks (H3K9me3 and H3K27me3) on the KSHV genome (Figure 1 and Figure S3, S4, S5, S7). The ChIP-on-chip experiments were carried out with chromatin prepared from both non-induced (0 hpi/latency) and Dox-induced (12 hpi) TRExBCBL1-RTA cells. Figure 1 shows the average of two independent ChIP-on-chip biological replicates (Figure S3 and S4). To localize histone modifications and proteins of interest on the KSHV genome at high resolution, we used a 15-bp tiling microarray containing 8942 overlapping 60-nucleotide oligos spanning the entire KSHV genome. Because changes in H3 occupancy may affect the enrichment of histone modifications on the viral genome, we first investigated the global distribution of H3 on the viral genome during latency and upon reactivation. The histone H3 ChIP-on-chips revealed that H3 enrichment was comparable throughout the KSHV genome at 0 hpi and did not change significantly on most parts of the genome at 12 hpi (Figure 1, S3, S4), in line with the histone H3 ChIP-qPCR analysis (Figure S2B). However, a small viral genomic region between 15 and 30 kb, which contains mostly KSHV-unique genes, displayed a detectable decrease in H3 occupancy at 12 hpi (Figure 1). In contrast to the uniform distribution of H3, the different histone modifications displayed distinct patterns and were enriched in specific KSHV genomic regions during latency and reactivation (Figure 1). Immunoblot analysis revealed that neither the expression of H3 nor the global levels of cellular histone modifications changed upon KSHV reactivation (Figure S2D); thus, any change in histone modifications on the KSHV genome is likely to reflect a reactivation-induced change in the KSHV epigenome. The genome-wide mapping showed that both activating and repressive histone modifications were associated with the KSHV genome during latency (Figure 1). The activating AcH3 and H3K4me3 marks co-localized on the KSHV genome, and so did the repressive H3K9me3 and H3K27me3 marks. However, the activating and repressive histone modifications were mutually exclusive on the bulk of the KSHV genome (e.g. 30–60 kb and 90–120 kb) (Figure 1).
As expected, the latency-associated locus (118–128 kb), where the KSHV genes constitutively expressed during latency are located, was enriched with the activating H3K4me3 and AcH3 modifications but depleted of the repressive H3K9me3 and H3K27me3 (Figure 1). Unexpectedly, several regions of the KSHV genome (e.g. 1–30 kb and 60–90 kb) that are not associated with latent gene expression were also highly enriched in both H3K4me3 and AcH3 (Figure 1). Strikingly, the PcG protein-mediated repressive modification H3K27me3 was widely distributed throughout the KSHV genome, whereas H3K9me3 was restricted mainly to two genomic regions (30–60 kb and 95–115 kb) encoding a number of late genes (Figure 1). Interestingly, the activating H3K4me3 and AcH3 modifications were absent from these two regions, where the repressive H3K9me3 and H3K27me3 modifications coexisted, suggesting that KSHV genes in these regions have a strongly repressive heterochromatin structure during latency (Figure 1). RTA-induced initiation of the KSHV lytic gene expression program resulted in the redistribution of histone modifications on the viral genome. We calculated the 12 hpi/input ratio to view the changes in histone modifications (Figure 1). Since H3 enrichment was comparable between 0 and 12 hpi on most parts of the viral genome, and viral DNA replication had yet to occur by 12 hpi, we also hybridized the 12 hpi ChIPs against the 0 hpi ChIPs, which allowed us to observe changes in histone modification levels on the KSHV genome upon lytic reactivation directly (Figure S7C). Others have recently applied similar ChIP-on-chip analyses to study epigenetic reprogramming of the host genome by the adenoviral protein E1a [41].
Our ChIP-on-chip analysis showed that enrichment of the activating histone modifications increased most where it was accompanied by a reduction in repressive histone marks: in the genomic region between 1 and 30 kb, which contains several early genes, and between 68 and 77 kb, which encodes the IE proteins ORF45, ORF48, ORF50 (RTA) and K8 (K-bZIP) (Figure 1 and S7). These changes in the viral chromatin are indicative of robust transcriptional activation and are consistent with the KSHV gene expression profile, in that expression of the lytic genes in these genomic regions is induced in the early phase of reactivation [39,42,43]. In contrast, only minor changes in the levels of the activating AcH3 and H3K4me3 marks were detected in the genomic regions encoding a large number of late genes (30–60 kb and 95–115 kb), whereas significant changes in the repressive H3K27me3 mark were observed at 12 hpi (Figure 1 and S7). In summary, these results demonstrate that, in the latent KSHV genome, the activating histone modifications are preferentially enriched in the constitutively active latency-associated locus and in the genomic regions containing early lytic genes. In contrast, the repressive H3K9me3 and H3K27me3 marks are primarily enriched in the genomic regions encoding many late genes during latency (0 hpi), and they remain associated with these regions during the early phase (12 hpi) of reactivation.
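The enrichment tracks above are per-probe ratios of ChIP signal to a reference (ChIP/input, or 12 hpi ChIP against 0 hpi ChIP for the direct comparison), averaged over two biological replicates. The Python sketch below illustrates that per-probe computation only. It assumes background-corrected, strictly positive intensities and assumes that replicates are combined by averaging log2 ratios, which the text does not specify; the function names are hypothetical, and the normalization steps of a real array pipeline are omitted.

```python
import math
from statistics import mean


def log2_ratios(chip, reference):
    """Per-probe log2(ChIP / reference) for one hybridization.

    chip, reference: positive, background-corrected intensities indexed by
    probe (e.g. ChIP vs. input, or 12 hpi ChIP vs. 0 hpi ChIP).
    """
    if len(chip) != len(reference):
        raise ValueError("chip and reference must cover the same probes")
    if any(x <= 0 for x in chip) or any(x <= 0 for x in reference):
        raise ValueError("intensities must be positive to take a log ratio")
    return [math.log2(c / r) for c, r in zip(chip, reference)]


def replicate_mean(replicates):
    """Average per-probe log2 ratios across biological replicates.

    replicates: list of (chip, reference) pairs, one pair per replicate.
    Averaging in log space is an assumption made for this sketch.
    """
    per_replicate = [log2_ratios(chip, ref) for chip, ref in replicates]
    return [mean(values) for values in zip(*per_replicate)]


# Two hypothetical replicates over three probes.
rep1 = ([8.0, 4.0, 2.0], [4.0, 4.0, 4.0])
rep2 = ([16.0, 2.0, 2.0], [4.0, 4.0, 4.0])
print(replicate_mean([rep1, rep2]))  # prints [1.5, -0.5, -1.0]
```

Working in log2 space makes enrichment and depletion symmetric around zero, which matches how a two-colour ratio track is usually displayed.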

Profiling the histone modification patterns within the regulatory regions of KSHV genes

While only a few genes are expressed during latency, all viral genes are expressed upon lytic reactivation in a temporal and sequential order. This suggests that distinctive histone modifications may be associated with different viral promoters and may ultimately determine the timing and rate of viral gene expression. We therefore attempted to delineate the characteristics of the chromatin structures associated with the regulatory regions of KSHV genes during latency and lytic reactivation (Figure 2, S6, S8). For this, we aligned the KSHV open reading frames (ORFs) relative to their translational start sites (TSS) and plotted the signal intensities of probes derived from the ChIP-on-chip analysis across a 2-kb region spanning 1 kb on either side of the TSS (see Materials and Methods for details). This strategy was based on several considerations. (i) Although transcriptional start sites have been identified for only a few KSHV genes, they generally lie within a few hundred base pairs of the translational start site, so the TSS can serve as a reference point. (ii) Owing to the compact structure of the KSHV genome, promoters are usually located close upstream of the TSS. (iii) The 1-kb region downstream of the TSS was included because several KSHV genes have introns close to the TSS at their 5′ ends, which may contain gene regulatory elements. On this basis, the 2-kb sequences around the TSS were considered the gene regulatory regions of KSHV genes. A similar strategy has been used to analyse the recruitment of the KSHV transactivators Rta and K-bZIP to eighty-three putative KSHV promoters in TRExBCBL1-RTA cells [44].
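The Figure 2 legend states that each 1-kb flank is divided into twenty 50-bp fragments, giving 40 bins across the 2-kb window. A minimal Python sketch of that TSS-aligned binning follows. It assumes each probe is summarized by an integer centre coordinate and a log2 ratio, assigns a probe to a bin by its centre, and flips orientation for ORFs on the reverse strand so that negative offsets are always upstream. The orientation handling, the centre-based assignment and the name `tss_profile` are assumptions of this sketch rather than the authors' procedure, and wrap-around on the circular episome is ignored.

```python
from statistics import mean


def tss_profile(probes, tss, strand, flank=1000, bin_size=50):
    """Bin probe values in a window of +/- flank bp around a start site.

    probes: iterable of (centre_position, log2_value) with integer positions.
    tss: integer coordinate of the translational start site.
    strand: "+" or "-"; on "-", upstream lies at higher coordinates.
    Returns 2 * flank // bin_size bin means, with None for empty bins
    (the gray "missing probe" cells in a heatmap).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if (2 * flank) % bin_size:
        raise ValueError("bin_size must divide the 2 * flank window")
    n_bins = 2 * flank // bin_size
    bins = [[] for _ in range(n_bins)]
    for pos, value in probes:
        # Signed offset from the start site; negative means upstream.
        offset = pos - tss if strand == "+" else tss - pos
        if -flank <= offset < flank:
            bins[(offset + flank) // bin_size].append(value)
    return [mean(b) if b else None for b in bins]


# Hypothetical probes around an ORF starting at coordinate 1000 on the + strand.
probes = [(1000, 2.0), (1010, 4.0), (1990, 1.0), (5000, 9.0)]
profile = tss_profile(probes, tss=1000, strand="+")
print(len(profile), profile[20], profile[39], profile[0])  # prints: 40 3.0 1.0 None
```

Bin 20 is the first bin at or downstream of the start site, so a row of such profiles lines up all ORFs on a common coordinate regardless of strand.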
Using the signal intensities of the probes derived from the ChIP-on-chip analysis, we performed average-linkage hierarchical clustering within each gene class (La or latent, IE, E, L), which gave a comprehensive overview of the repressive and activating histone modifications across all the 2-kb TSS regions (Figure 2 and S8).

Figure 2. Hierarchical clustering of histone modifications associated with the regulatory regions of viral genes. Based on their expression patterns, the viral genes were grouped as latent (La), IE, E and L genes, and hierarchical clustering was performed within each group. The rows display the histone modification patterns along the −1 kb to +1 kb genomic regions relative to the translational start site (TSS) of each viral gene, which we designated as the gene regulatory regions. Each 1-kb region is divided into twenty 50-bp fragments that show the average log2 ratio of probe signal intensities derived from the average of the biological replicates of ChIP-on-chip experiments. Blue and yellow represent lower-than-average and higher-than-average enrichment, respectively, whereas gray shows missing values due to lack of probes in those genomic regions. I–V represent clusters of genes that have similar histone modification patterns. (A) Distinctive histone modification patterns are associated with the KSHV genes of different expression classes during latency. (B) Changes in the enrichment of histone modifications during reactivation (12 hpi). doi:10.1371/journal.ppat.1001013.g002

IE gene class. By definition, IE genes are the first set of viral genes to be expressed; their induction does not require de novo expression of any viral protein, and they can be rapidly induced upon reactivation of KSHV from latency. We found that although the IE genes (K4.2, K8, ORF45, ORF48, ORF50/RTA) are silenced during latency, their 2-kb regulatory regions are enriched in the activating histone modifications H3K4me3 and AcH3, and this enrichment further increases upon lytic reactivation at ORFs 48, 50 and K8 (Figure 2, S8, S9). Since RTA is responsible for the switch between latency and lytic replication, its promoter must not only be tightly repressed during latency but also be rapidly derepressed upon reactivation. We found that the RTA promoter is enriched in both H3K4me3 and H3K27me3 during latency, suggesting that it possesses a bivalent chromatin structure that maintains the repression of the RTA promoter while keeping it poised for rapid activation (Figure 2, S8). An extensive ChIP analysis further confirmed the enrichment of AcH3, H3K4me3 and H3K27me3 and the depletion of H3K9me3 on the 2.5-kb promoter region of RTA during latency (Figure 3A). Upon lytic reactivation, however, the repressive H3K27me3 gradually decreased, while the activating histone modifications H3K4me3 and AcH3 increased on the RTA promoter (Figure 3A). These changes in the histone modification pattern are in concert with the induction of RTA expression. To confirm that the changes in the enrichment of AcH3, H3K4me3 and H3K27me3 were not due to differences in ChIP efficiency between 0 and 12 hpi, and that the low level of H3K9me3 was not due to a low efficiency of the H3K9me3 ChIP, three cellular promoters were included as controls. The levels of histone modifications on the transcriptionally repressed promoters of the MYT1 and HTF6 genes, as well as on the transcriptionally active promoter of the actin (ACT) gene, did not change significantly between 0 hpi and 12 hpi, indicating that the efficiencies of the ChIPs were comparable at different time points (Figure 3D).

Bivalent chromatin on the RTA promoter during latency. Although >95% of non-induced TRExBCBL1-RTA cells were in latency, a few percent of the cells spontaneously underwent lytic reactivation, as measured by surface expression of the early viral protein ORF K1 in flow cytometry analysis (data not shown). This raises the possibility that the colocalization of H3K4me3 and H3K27me3 at the same genomic region may be due to a mixture of latent and spontaneously reactivated KSHV genomes in the ChIP assays. To determine whether H3K4me3 and H3K27me3 were simultaneously present on the RTA promoter of the same KSHV genome, rather than being independent marks on different viral genomes, we applied a sequential ChIP assay (Figure 3B). We performed the first ChIP with an anti-H3K4me3 antibody, eluted the immunoprecipitated chromatin, and used it in a second ChIP with an anti-H3K27me3 antibody. The ChIP DNAs were measured by qPCR using primers specific for the promoter regions of the RTA, LANA and ORF25 genes. This showed that H3K4me3 and H3K27me3 were co-enriched only on the RTA promoter (Figure 3B, top panel). The reciprocal order of ChIPs gave the same results (Figure 3B, bottom panel). The LANA and ORF25 promoters, included as controls, showed the presence of either H3K4me3 or H3K27me3, respectively, but not both (Figure 3B). Together, these data show that the H3K4me3 and H3K27me3 marks on the RTA promoter coexist on the same KSHV genome, indicating that the RTA promoter appears to possess a bivalent chromatin structure.

Figure 3. Dynamic association of histone modifications with viral genes during latency and reactivation. (A) Time-course ChIP analysis of histone modifications on the RTA promoter during latency and reactivation. Cellular controls are shown in panel D. (B) Colocalization of H3K4me3 and H3K27me3 on the RTA promoter is confirmed by sequential ChIP assays. The first ChIP was performed with either an H3K4me3-specific or an H3K27me3-specific antibody, followed by elution of the immunoprecipitated DNAs and a second ChIP with either H3K27me3 or H3K4me3 antibody. LANA and ORF25 promoters were used as controls. (C) Time-course ChIP analysis of histone modifications on selected latent (LANA), early (K2, ORF56) and late (ORFs 8, 25, 64) genes. Cellular controls are shown in panel D. Pr: promoter; in: within gene body. (D) ChIP assays of histone modifications on cellular promoters. The promoters of the repressed cellular MYT1 and HTF6 genes and the active promoter of the actin (ACT) gene were tested using the same ChIP samples used in panels A and C. ND: not detectable. doi:10.1371/journal.ppat.1001013.g003

Early gene class. Expression of IE genes is followed by the induction of E genes, which encode a large variety of proteins that modulate immune responses and apoptosis or carry out viral DNA synthesis. Analysis of the TSS/promoter regions of E genes revealed three subgroups (I, II and III) of histone modification patterns (Figure 2, S8). Group I: some E genes also seem to have a bivalent chromatin structure on their 2-kb putative regulatory regions during latency. Upon reactivation, this changes to fully active euchromatin, with increased activating and decreased repressive histone modifications on their promoters (Figure 2, S8). Group II: the promoter regions of the majority of E genes are associated only with AcH3 and H3K4me3 and are depleted of the repressive marks during latency (Figure 2, S8). Interestingly, groups I and II include a number of E genes involved in the deregulation of the host immune response (K1/3/5/15, ORFs 10, 11) and/or KSHV pathogenesis (K2/4.1/6/9, ORF74) (Figure 2, S8). Group III: this cluster of E genes is enriched in H3K27me3 and H3K9me3 but depleted of activating histone modifications during latency, and this pattern does not change significantly upon reactivation. Notably, the histone modification pattern of group III genes resembles that of group IV late genes, even though their expression patterns differ considerably during the KSHV lifecycle. These findings were further confirmed in independent ChIP experiments analysed by qPCR using primers specific for the K2 and ORF56 promoter and coding regions (Figure 3C).

Late gene class. KSHV genes whose expression can be blocked by viral DNA replication inhibitors are classified as late genes. Because most late gene products are needed for the assembly and egress of viral particles, their expression is required only after viral DNA synthesis. To achieve this temporal order of viral gene expression, late genes should be completely silenced while IE and E genes are expressed. In agreement with this notion, we found that the promoter regions of most late genes were enriched in the repressive H3K27me3 and H3K9me3 modifications and depleted of the activating histone marks during latency and in the early phase of reactivation (12 hpi) (Figure 2, S8, group IV). In contrast, the TSS regions of some late genes (e.g. ORFs 39, 8, 47, 75) are associated with high levels of activating and low levels of repressive histone modifications during latency, which changed only slightly during reactivation (Figure 2, S8, group V). The difference in location between group IV and group V late genes on the KSHV genome may explain their distinct chromatin structures. The group IV late genes, which are highly enriched in repressive histone modifications, are clustered in the 30–60 kb and 95–115 kb genomic regions (see Figure 1B). In contrast, the few L genes in group V are scattered throughout genomic regions enriched with activating histone modifications. These results were further confirmed by a series of ChIP experiments on the promoter and coding regions of three late genes: ORF8 (group V), ORF25 (group IV, 30–60 kb) and ORF64 (group IV, 95–115 kb) (Figure 3C). These observations suggest that late genes can be regulated by distinct mechanisms depending on their location on the KSHV genome.

Latent gene class. In agreement with the constitutive expression of the latent genes (LANA and vIRF3) during latency, their 2-kb TSS regions are enriched only in H3K4me3 and AcH3 and lack repressive histone modifications (Figure 2). Interestingly, while H3K4me3 enrichment decreased somewhat, AcH3 levels remained constant on the LANA promoter upon reactivation (Figure 3C). This was also confirmed by additional independent ChIP experiments with qPCR analysis focusing on specific locations within the LANA promoter and coding regions (Figure 3C).

In summary, we described the chromatin modification patterns of the TSS regions of the four expression classes (latent, IE, E and L) of KSHV genes during latency and lytic reactivation. Based on these data, we propose that the diversity of chromatin structure on the KSHV genome may reflect the various mechanisms regulating viral gene expression, allowing temporal and well-organized gene expression during reactivation.
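Average-linkage clustering of the binned TSS profiles within an expression class can be sketched with a naive agglomerative loop, shown below in Python. This is not the authors' software: the distance is assumed here to be a root-mean-square difference over bins present in both profiles (missing bins, shown in gray in Figure 2, are skipped), the names are hypothetical, and stopping at a fixed number of flat clusters is a simplification of reading clusters off a full dendrogram. Average linkage scores two clusters by the mean of all pairwise distances between their members.

```python
import math
from itertools import combinations


def profile_distance(a, b):
    """RMS difference over bins present in both profiles (None = missing)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering with average (UPGMA-style) linkage.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance until n_clusters remain; returns clusters as sorted index lists.
    """
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and the number of profiles")
    dist = {
        (i, j): profile_distance(profiles[i], profiles[j])
        for i, j in combinations(range(len(profiles)), 2)
    }

    def linkage(c1, c2):
        # Mean of all member-to-member distances between the two clusters.
        pairs = [dist[min(i, j), max(i, j)] for i in c1 for j in c2]
        return sum(pairs) / len(pairs)

    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical two-bin profiles: two "active-like" rows and two "repressed-like" rows.
profiles = [[1.0, 1.0], [1.1, 0.9], [-1.0, -1.0], [-0.9, None]]
print(average_linkage(profiles, 2))  # prints [[0, 1], [2, 3]]
```

A library implementation (for example, SciPy's hierarchical clustering with average linkage) would normally replace this quadratic-memory loop for larger inputs; the sketch only makes the linkage rule explicit.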

Polycomb group protein EZH2 and its H3K27me3 histone mark are required for the maintenance of KSHV latency The dynamic association of EZH2 with KSHV genome suggests that PcG-mediated H3K27me3 histone modification is involved in the repression of lytic gene expression during latency. To address this issue, HA-tagged JMJD2A, JMJD3 and UTX histone demethylases were expressed in Vero-rKSHV.219 cells to test if the elimination of histone methylations can trigger KSHV reactivation (Figure 5A). JMJD2A is an H3K9me3-specific histone demethylase [50], JMJD3 and UTX are H3K27me3-specific histone demethylases [21,51], and UTXmut is an enzymatically inactive form of UTX that contains a single point mutation (H1146A) in the Fe2++ ion binding site [51]. JMJD3 and UTX have been shown to eradicate the H3K27me3 repressive mark, resulting in the upregulation of PcG-targeted gene expression [21,52]. JMJD2A and JMJD3 or UTX expression detectably suppressed the steady-state levels of H3K9me3 or H3K27me3, respectively, in transfected Vero cells (Figure S12B, C). Since Vero-rKSHV.219 cells express red fluorescent protein (RFP) from the KSHV lytic PAN promoter and green fluorescent protein (GFP) from the EF-1a promoter, RFP expression has been extensively used as a marker of KSHV lytic reactivation [53]. Immunofluorescence analysis revealed that JMJD3 and UTX efficiently triggered KSHV reactivation, while JMJD2A and UTXmut did not (Figure 5A, B). Furthermore, coexpression of JMJD2A and JMJD3 showed no significant synergistic effect on KSHV reactivation (Figure 5A, B). Finally, JMJD2A, JMJD3, UTX and UTXmut were expressed at comparable levels (Figure S12A). These results bespeak the importance of the H3K27me3 histone modification in the maintenance of KSHV latency. 
The small molecule 3-deazaneplanocin A (DZNep) has been shown to inhibit the expression of the Polycomb repressive complex 2 (PRC2) components EZH2, SUZ12, and EED, resulting in the suppression of H3K27me3 and the upregulation of PcG target genes in vivo [54]. To further test the role of H3K27me3 in KSHV latency, we treated the KSHV/EBV co-infected primary effusion lymphoma cell line JSC-1 with DZNep, followed by immunoblotting (Figure 5C and D). DZNep treatment dramatically decreased EZH2 and SUZ12 levels and, consequently, H3K27me3 levels, ultimately inducing expression of the Polycomb-targeted cellular gene MYT1 (Figure 5E). In contrast, H3K9me3, histone H3, and actin levels were not affected under the same conditions (Figure 5C). DZNep treatment efficiently induced the reactivation of KSHV, but not EBV: the KSHV proteins Rta and K8 were detected as early as 2 days after DZNep treatment, whereas the EBV IE protein Zta was not induced (Figure 5D). Real-time quantitative RT-PCR also showed the induction of other early (ORF56) and late (ORFs 8, 25, 64) KSHV genes, suggesting that PRC2 depletion activates the KSHV gene expression cascade from the repressed latent state (Figure 5E). Besides downregulating EZH2 and SUZ12, DZNep also induced apoptosis of JSC-1 cells, detected by monitoring the

Dynamic association of the Polycomb group proteins and transcriptional activators with the KSHV genome during latency and lytic reactivation

Because H3K27me3 is distributed across the entire KSHV genome, we performed ChIP-on-chip to determine whether EZH2, the PcG H3K27me3 histone methyltransferase, associates with the KSHV genome (Figure 4A). EZH2 almost completely colocalized with H3K27me3 throughout the entire KSHV genome during both latency and reactivation (Figures 4A, S7, S10). Additional ChIP assays showed that EZH2 and SUZ12 (another subunit of the PcG complex PRC2) were present on the H3K27me3-enriched RTA and ORF25 promoters, but not on the LANA promoter (Figure 4B). Besides its regulatory role in cellular gene expression, the transcription factor CBF1 has been shown to play an active role in KSHV gene expression [45,46]. Since the RTA promoter contains several CBF1 binding sites (Figure S11A) [47,48], we studied the recruitment of CBF1 to the RTA promoter during reactivation. ChIP showed that CBF1 was recruited to the RTA promoter, primarily at 1.4 kb upstream of the RTA translational start site, where three putative CBF1 binding sites lie close together (Figures 4C and S11A). Furthermore, this putative CBF1-binding region of the RTA promoter not only bound CBF1 efficiently in vitro, but its deletion also dramatically decreased the RTA-mediated autoactivation of the RTA promoter (Figure S11B, C). Additional ChIP experiments revealed that RNAPII is also recruited to the RTA and LANA promoter regions upon reactivation (Figure 4C). The recruitment of CBF1 and RNAPII to the RTA and LANA promoters was specific, since neither was recruited to the promoter of the late gene ORF25, whose expression was still blocked at 6 and 12 hpi (Figures 4C and S1B).
The increase of RNAPII over the 3-kb region upstream of RTA is not surprising, given that this genomic region also contains the promoters of other lytic genes (ORFs 45, 46, 47, 48) as well as an alternative upstream promoter for RTA [49]. These results illustrate the dynamic association of the PcG complex and transcriptional activators with the KSHV genome during latency and lytic reactivation.

PLoS Pathogens | www.plospathogens.org

July 2010 | Volume 6 | Issue 7 | e1001013

Epigenetic Analysis of the KSHV Genome
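Locating putative CBF1 binding sites in a promoter, as done for the RTA promoter in the results above, can be illustrated with a simple motif scan. The sketch below is not the authors' procedure: it assumes the commonly cited CBF1/RBP-Jκ core motif TGGGAA, reports exact matches on both strands, and is applied to a short hypothetical sequence rather than to KSHV DNA.

```python
import re

# Assumption: TGGGAA is used as the CBF1/RBP-Jκ core motif; the motif or
# scoring scheme actually used to call the putative sites is not stated here.
CORE_MOTIF = "TGGGAA"

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_sites(seq, motif=CORE_MOTIF):
    """Return sorted (0-based start, strand) tuples for exact motif matches.

    A zero-width lookahead is used so that overlapping matches are reported.
    Minus-strand hits are found by searching the reverse complement of the
    motif on the plus strand; the start is the plus-strand coordinate.
    """
    seq = seq.upper()
    hits = [(m.start(), "+") for m in re.finditer(f"(?={motif})", seq)]
    rc_motif = reverse_complement(motif)
    if rc_motif != motif:  # skip double counting for palindromic motifs
        hits += [(m.start(), "-") for m in re.finditer(f"(?={rc_motif})", seq)]
    return sorted(hits)


# Hypothetical 18-bp sequence with one site on each strand.
print(find_sites("AATGGGAACCTTCCCAGG"))
```

Exact matching is the simplest possible choice; a position weight matrix would be needed to score degenerate sites.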

Figure 4. Genome-wide binding of EZH2 to the KSHV genome correlates with the repression of lytic genes. (A) ChIP-on-chip was performed for EZH2, and its genome-wide binding was compared with the distribution of H3K27me3 on the KSHV genome during both latency and reactivation. Labels are the same as in Figure 1. The H3K27me3 graph was taken from Figure 1. (B) EZH2 and SUZ12 binding to the H3K27me3-rich lytic promoters, shown by independent ChIP assays. The EZH2-interacting PcG protein SUZ12 is enriched only where EZH2 is present. (C) Recruitment of transcription activators to the activated lytic promoters during KSHV reactivation. The anti-RNA polymerase II antibody H-224, which recognizes RNAPII regardless of its phosphorylation state (total RNAPII), was used for ChIP of total RNAPII, while the anti-RNA polymerase II antibody CTD4H8 specifically immunoprecipitates RNAPII phosphorylated at serine 5 of its C-terminal domain (RNAPII Ser5). doi:10.1371/journal.ppat.1001013.g004

cleavage of the apoptosis marker PARP (Figure S13 A, B). To exclude the possibility that apoptosis influenced KSHV reactivation, we treated JSC-1 cells with staurosporine (STS), which also induced apoptosis, as shown by PARP cleavage (Figure S13 C). In contrast to DZNep, STS treatment neither affected the steady-state levels of EZH2 and H3K27me3 nor reactivated KSHV from latency (Figure S13 C, D). This suggests that the DZNep-mediated reduction of H3K27me3 triggers KSHV reactivation. This conclusion was further supported by the finding that specific shRNA-mediated depletion of EZH2 induced the expression of a number of lytic genes (Figure 5F, G). In summary, depletion of PcG proteins in latently infected cells induces the lytic reactivation of KSHV, suggesting that PcG proteins play an important role in the maintenance of KSHV latency.


Figure 5. Polycomb group proteins are involved in the maintenance of KSHV latency. (A) Overexpression of the wild-type HA-tagged H3K27me3 histone demethylases UTX and JMJD3 triggers the lytic reactivation of KSHV in Vero-rKSHV.219 cells, as shown by RFP expression. In contrast, the H3K9me3 histone demethylase JMJD2A and the enzymatically inactive UTXm had little or no effect on KSHV lytic reactivation. (B) Quantification of RFP-positive cells. (C and D) JSC-1 cells were treated with 5 µM DZNep for 1, 2 and 3 days, and the cells were harvested for immunoblot analysis with the indicated antibodies against cellular proteins and histone modifications (C) or viral proteins (D). "Dpt" indicates days post-treatment. Whole-cell lysate of NaB-treated JSC-1 cells was used as a control for the Zta immunoblot. (E) JSC-1 cells were treated with DZNep as described in (C), and total RNA was isolated for RT-qPCR analysis of selected KSHV, EBV and cellular mRNAs. (F and G) BCBL-1 cells were infected with lentiviruses expressing the indicated shRNAs and then subjected to immunoblotting with the indicated antibodies (F) or to RT-qPCR analysis of the indicated viral transcripts (G). doi:10.1371/journal.ppat.1001013.g005

Discussion

The genome-wide transcriptional analysis of KSHV gene expression revealed that, despite differences in their promoter features, viral genes with similar functions display analogous expression patterns during the lytic replication cycle, implying a common regulatory mechanism for their expression [39,42,43,55]. Indeed, cellular genes with related functions often share chromatin structures associated with specific histone modifications, whereby the expression of large sets of genes can be coordinated by epigenetic mechanisms [56]. Our ChIP-on-chip analysis identifies several distinct chromatin domains with different histone modifications on the latent KSHV genome, suggesting that the expression of viral genes within each chromatin domain may be co-regulated (Figures 1, 2, S7, S8). Specifically, the latent genes clustered in the latency-associated genomic locus lie in an H3K4me3/AcH3-rich chromatin domain during both latency and reactivation, which correlates with the constitutively active transcription of latency-associated genes. The genomic region encoding the IE genes ORF50 and ORF48 forms a bivalent chromatin domain, defined by the concomitant presence of the activating H3K4me3 and the repressive H3K27me3 marks during latency; this domain changes rapidly upon reactivation as AcH3 and H3K4me3 increase and H3K27me3 decreases (Figures 2, 3A). Importantly, the chromatin bivalency of the RTA promoter ensures the repression of RTA during latency but also primes it for rapid activation upon reactivation (Figures 3A, S1). Bivalent chromatin is characteristic of many inducible cellular genes involved in development and immune responses, which are repressed yet primed for rapid activation [7,15].

KSHV genomic regions encoding a number of late genes are associated with the repressive H3K9me3 and H3K27me3 modifications during both latency and the early phase of lytic reactivation, in accordance with the observed silencing of late genes. Strikingly, high expression of late genes is observed at the time of viral DNA replication, concomitantly with the disassembly of viral chromatin (Figures S1, S2). These data suggest that viral DNA replication may help disrupt the repressive heterochromatin associated with viral late genes, which ultimately facilitates late gene expression [40]. However, a recent study has shown that a portion of replicating herpesviral genomes can be chromatinized even during lytic replication, indicating that viral chromatin may be continuously involved in the regulation of late gene expression [57]. In addition, the deletion of several MHV68 genes has been shown to block late gene expression without affecting viral DNA replication [58,59,60]. This suggests that viral replication may not directly drive the disassembly of the heterochromatin of late genes. Based on the histone modification patterns associated with the promoter regions of E genes, three distinct groups could be observed (Figures 2, S8). The chromatin structure of the promoters of Group I genes resembles that of cellular bivalent promoters, except that they are also enriched in H3K9me3 during latency. However, the depletion of H3K9me3 coincides with the decrease of H3K27me3 upon reactivation, suggesting that both repressive marks are replaced by activating marks upon reactivation. The promoters of Group II genes are enriched primarily with activating histone modifications during both latency and reactivation, similar to those of IE genes, whereas the promoters of most Group III genes are associated mainly with repressive histone marks during latency and in the early phase of reactivation (12 hpi), resembling the Group IV late genes. Thus, despite their different expression profiles during the lytic cycle, some lytic genes show similar histone modification patterns in their promoters, suggesting that other histone modifications may also be associated with these promoter regions to generate the distinct expression profiles. This will be investigated further in the future. Furthermore, our observation that the TSS regions/promoters of several E genes are enriched with the activating AcH3 and H3K4me3 marks but depleted of repressive modifications during latency suggests that, although transcription of these E genes may have been initiated, RNAPII is likely stalled on their promoters (Figures 2 and 3C). In fact, several of these E genes (K2, K5, K6, K7, K11) have been shown to be expressed immediately after de novo infection or rapidly upon lytic reactivation [55,61]. These E genes have immunomodulatory and/or antiapoptotic functions, so their rapid expression seems crucial for the virus to escape host immune recognition or attack during the early phase of the lytic cycle. This suggests that their promoters are primed with activating histone modifications and probably carry preassembled RNA polymerase II complexes during latency, as also seen for a large number of inducible cellular genes [7,62].

However, this raises the question of how their expression is suppressed during latency despite an active chromatin structure. Intriguingly, many of the E genes that are enriched primarily with the activating AcH3 and H3K4me3 marks are also Rta-inducible, suggesting that cooperation between Rta and the active chromatin structure may be necessary to activate these E genes. Furthermore, it is conceivable that the RNAPII stalled on their promoters also requires the recruitment of specific cellular transcription factors, such as P-TEFb, to convert RNAPII from a restricted state to an elongation-competent state [62]. Histone H3 ChIP-on-chip revealed that a small viral genomic region (between 15 and 30 kb), containing mostly KSHV-unique genes, showed a detectable decrease in H3 occupancy at 12 hpi (Figure 1). Thus, the decrease of the repressive H3K27me3 mark within this region upon reactivation may be partly a consequence of reduced H3 occupancy. However, the decrease in H3 occupancy in this region does not directly correlate with the changes in histone modifications: H3K27me3 decreases in both the 15–20 kb and 25–30 kb regions at 12 hpi, but H3K4me3 and AcH3 increase in the 15–20 kb region (ORFs 10, 11, 70, K3) while they decrease in the 25–30 kb region (ORFs K5, K6, K7, PAN). Thus, changes in the enrichment of histone modifications seem to be gene-specific and are not necessarily due to changes in nucleosome occupancy. In contrast to the genome-wide repressive role of H3K27me3, the effect of the heterochromatin mark H3K9me3 on viral gene expression seems limited: H3K9me3 enrichment is restricted to specific genomic regions (Figure 1), and the H3K9me3 histone demethylase JMJD2A does not efficiently induce KSHV lytic replication in Vero cells (Figure 5A). However, JMJD2A expression might induce KSHV reactivation in other cell types, or H3K9me3 demethylases other than JMJD2A may contribute to KSHV reactivation [50]. This agrees with findings on the latent genome of herpes simplex virus 1 (HSV-1): although the H3K9me2, H3K9me3, and H3K27me3 modifications are detected, all tested viral promoters are most enriched in H3K27me3 [63]. On the other hand, inhibition of the H3K9 histone demethylase LSD1 has been shown to block the reactivation of HSV-1 from latency [30], and H3K9me3 enrichment with relatively low levels of H3K27me3 was found on the latent genome of the gammaherpesvirus EBV [64,65]. These studies indicate that chromatin-based repression mechanisms associated with H3K9me3 and H3K27me3 may be a common feature of herpesvirus gene expression programs. The genome-wide co-distribution of EZH2 with the repressive H3K27me3 mark on KSHV lytic genes, together with the induction of lytic gene expression upon EZH2 knockdown, strongly implicates PcG proteins in the repression of lytic gene expression during latency (Figures 4, 5, S7, S10). Interestingly, our ChIP-on-chip analyses also showed that, correlating with the rise in EZH2 occupancy, H3K27me3 levels increased mostly in late gene-rich regions (30–50 kb and 105–115 kb) during lytic reactivation.
We hypothesize that, to maintain temporally ordered expression of lytic genes, late genes may be kept silent by PcG proteins during the early gene expression period, and that this repression is likely reversed only upon replication of the viral genome (Figures S1 and S2). In contrast, H3K27me3 levels dropped significantly in the 15–30 kb region of the viral genome at 12 hpi, which correlates with decreased EZH2 binding and the dissociation of H3. Thus, decreased H3K27me3 levels may be due to nucleosome dissociation, EZH2 dissociation and/or the recruitment of H3K27me3 histone demethylases. Nevertheless, these data suggest that regulating the spatial and temporal association of PcG proteins with the KSHV genome may be crucial for both the maintenance of KSHV latency and lytic reactivation. Indeed, SUZ12 as well as EZH2, the PcG H3K27me3 histone methyltransferase, associate extensively with the RTA promoter during latency, whereas EZH2 rapidly dissociates from the RTA promoter upon reactivation, in apparent correlation with the decrease in H3K27me3 levels and the increase in RTA expression (Figures 3A, 4B and S1). Several studies have shown that PcG proteins are recruited to their target promoters through specific DNA elements, transcription factors or non-coding RNAs, leading to nucleosome condensation, inhibition of RNAPII elongation, and thereby repression of their target genes [16,18,19,20,66,67,68,69]. Several cellular and viral proteins involved in the maintenance of KSHV latency, including K-RBP, KAP-1 and LANA, may therefore recruit PcG proteins to the KSHV genome [70,71,72]. Further studies are needed to clarify how EZH2 is recruited to the KSHV genome and how EZH2 binding is modulated during reactivation.
Recently, it has also been shown that BMI1, a subunit of the Polycomb repressive complex PRC1, is recruited to lytic HSV-1 promoters during latency [63,73]. Thus, the roles of PcG proteins in herpesvirus latency remain under active investigation.

Derepression of PcG-mediated gene silencing has been linked to the recruitment of the H3K27me3 histone demethylases UTX and JMJD3 and the H3K4me3 histone methyltransferases MLL3 and MLL4 to target promoters, suggesting tight cooperation between H3K27 demethylation and H3K4 trimethylation [23,74]. We also found that overexpression of either UTX or JMJD3 efficiently reactivated KSHV, suggesting that H3K27me3 histone demethylases relieve the PcG-mediated repression of lytic gene expression (Figure 5A). Furthermore, our ChIP analysis (Figure 4C) indicates that, along with the histone acetyltransferase CBP, the chromatin remodelling complex SWI/SNF, and the TRAP/Mediator complex [34,38], the cellular transcription factor CBF1 and RNAPII are also recruited to the RTA promoter to induce its expression. Taken together, PcG proteins are deposited on the RTA promoter to suppress RTA transcription during latency, whereas upon reactivation, the resetting of histone modifications, remodelling of the chromatin structure, and recruitment of a large set of transcription cofactors activate RTA expression. Collectively, our results indicate for the first time that histone modifications on the latent KSHV genome may be involved not only in the regulation of latency, but also in controlling the temporal and sequential order of the KSHV lytic gene expression cascade upon reactivation. PcG proteins appear to play a key role both in the maintenance of latency and in the inhibition of late gene expression during the early phase of lytic reactivation. Beyond these beneficial roles of PcG proteins in the viral life cycle, KSHV may use epigenetic modification of its genome to tightly regulate viral gene expression and avoid promiscuous expression of lytic genes, which could trigger host immune responses against KSHV-infected cells.

Materials and Methods

Cell cultures and chemical treatments

TRExBCBL1-RTA is a KSHV-positive cell line that expresses a doxycycline-inducible Rta gene [39]. It was maintained in RPMI 1640 medium (Cellgro) supplemented with 10% Tet System Approved FBS (Clontech), 100 U/ml penicillin, 100 µg/ml streptomycin and 20 µg/ml hygromycin B. The KSHV- and EBV-positive cell line JSC-1 was grown in RPMI 1640 medium (Cellgro) containing 10% FBS (Clontech), 100 U/ml penicillin and 100 µg/ml streptomycin. 293A, HeLa and Vero-rKSHV.219 cells were grown in DMEM (Invitrogen) containing 10% FBS (Clontech), 100 U/ml penicillin and 100 µg/ml streptomycin. Doxycycline (Dox) at 1 µg/ml was used to induce the expression of myc/His-tagged Rta in TRExBCBL1-RTA cells. JSC-1 cells were treated with 5, 1, 0.2 or 0.04 µM 3-deazaneplanocin A (DZNep) or with 200 nM staurosporine (STS).
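The DZNep concentrations listed above (5, 1, 0.2 and 0.04 µM) form a 5-fold dilution series. A minimal sketch of that arithmetic follows, using C1·V1 = C2·V2 to size the stock volume for each dose. The 1 mM stock and 2 ml culture volume in the usage line are hypothetical values, not taken from the methods.

```python
def dilution_series(top_conc, fold, n_points):
    """Concentrations of an n-point serial dilution starting at top_conc."""
    return [top_conc / fold ** i for i in range(n_points)]


def stock_volume(stock_conc, final_conc, final_volume):
    """Volume of stock needed, from C1 * V1 = C2 * V2 (V1 = C2 * V2 / C1).

    Concentrations must share one unit and the result has the unit of
    final_volume.
    """
    return final_conc * final_volume / stock_conc


doses_uM = dilution_series(5.0, 5.0, 4)  # 5, 1, 0.2, 0.04 µM
# Hypothetical: 1 mM (1000 µM) DZNep stock, 2 ml (2000 µl) of culture per dose.
volumes_ul = [stock_volume(1000.0, d, 2000.0) for d in doses_uM]
print(doses_uM, volumes_ul)
```

In practice the lowest doses would usually be made from an intermediate dilution, because 0.08 µl of stock cannot be pipetted accurately.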

Plasmids, transfection and luciferase assay

The expression plasmids HA-JMJD3 and HA-UTX were obtained from Kristian Helin (University of Copenhagen). The HA-JMJD2A expression vector was a gift from Yang Shi (Harvard University). HA-UTXm was generated by replacing the C terminus of UTX in HA-UTX by PCR cloning, changing histidine 1146 to alanine. The pLuc3kb, pLuc1.9kb and pLuc0.9kb reporter plasmids were generated by inserting the luciferase gene derived from pGL3-Basic and differently sized fragments of the KSHV Rta promoter region into pcDNA5/FRT. 293A and Vero.219 cells were transfected with PolyFect (Qiagen) according to the manufacturer's specifications. Luciferase assays were performed using the Promega system. The luciferase activity values are the average of at least three independent experiments.

Immunofluorescent analysis

Vero cells were transfected with plasmids expressing HA-tagged JMJD2A, JMJD3, UTX or UTXm. Three days post-transfection, cells were fixed with 4% paraformaldehyde and then permeabilized with 0.2% Triton X-100. Nonspecific antibody binding was blocked with 10% goat serum, followed by incubation with antibodies against the HA tag, H3K9me3 and H3K27me3. After extensive washing with PBS, FITC- and TRITC-conjugated secondary antibodies were applied, followed by Hoechst staining.

DNA affinity purification assay

Biotinylated DNA was generated by PCR amplification of sequences from the KSHV RTA promoter with biotin-conjugated primers. The KSHV coordinates (GenBank U75698.1) of fragments A and B are 69861–70500 and 70511–71130, respectively. The assay was performed essentially as described by Atanasiu et al. [75], except that streptavidin resin (Stratagene) was used to pull down the DNA/protein complexes and proteins were eluted from the resin with Laemmli sample buffer (Sigma).

Antibodies for ChIPs and western blotting

The following antibodies were used in ChIPs and/or western blotting: rabbit anti-histone H3 (Abcam ab1791), rabbit anti-H3K27me3 (Millipore 07-448), rabbit anti-H3K9me3 (Millipore 07-442), rabbit anti-H3K4me3 (Millipore 04-745), rabbit anti-acetyl-histone H3 (AcH3) (Millipore 06-599), rabbit anti-RNA polymerase II (H-224) (total RNAPII) (Santa Cruz sc-9001), mouse anti-RNA polymerase II (CTD4H8) (RNAPII Ser5) (Millipore 05-623), rabbit anti-CBF1 (Abcam ab25949), mouse anti-EZH2 (BD Biosciences 612666), and rabbit anti-SUZ12 (Abcam ab12073). The following antibodies were used for western blotting only: mouse anti-PARP (BD Biosciences 556494), mouse anti-actin (Abcam), anti-JMJD2A (Bethyl A300-861A), anti-JMJD3 (Abgent AP1022a); the KSHV-specific antibodies mouse anti-Rta (a gift from Koichi Yamanishi, Osaka University, Japan), rabbit anti-K8.1, mouse anti-K8 (Abcam ab36617), rabbit anti-K2 (ABI 13-214-050) and rat anti-LANA (ABI 13-210-100); and the Epstein-Barr virus-specific antibody mouse anti-Zebra (Zta) (Argene 11-007).

Chromatin Immunoprecipitation assay (ChIP)

TRExBCBL1-RTA cells (4×10^7) were treated with 1 µg/ml doxycycline for 6, 12 and 24 hr. After the cells were fixed with 1% (v/v) formaldehyde for 10 min at RT, cross-linking was stopped by adding glycine (final concentration 125 mM) for 5 min at RT. Cells were washed three times with cold PBS, resuspended in Cell Lysis Buffer (5 mM Tris-HCl, pH 8.0, 85 mM KCl, 0.5% NP-40, 1× protease inhibitor cocktail (Roche)) and incubated on ice for 10 min. After centrifugation (5 min, 5000 rpm, 4°C), the pellet was resuspended in 1.8 ml of RIPA buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA, pH 8.0, 140 mM NaCl, 0.1% SDS, 0.1% sodium deoxycholate, 1% Triton X-100, 1 mM PMSF, 1× protease inhibitor cocktail), sonicated, and centrifuged at 13000 rpm for 10 min at 4°C to remove cell debris. Aliquots of the supernatant (cellular and viral chromatin) were stored at −80°C. To prepare input DNA, 20 µl of chromatin was incubated in 100 µl of TE buffer containing 50 µg/µl RNase A for 30 min at 37°C. The samples were then adjusted to contain 0.5% SDS and 0.5 mg/ml Proteinase K (Invitrogen) and incubated for 1 hr at 37°C. Formaldehyde crosslinks were reversed by adding sodium chloride (final concentration 300 mM) and incubating the samples overnight at 65°C. DNA was extracted first with one volume of phenol/chloroform/isoamyl alcohol (25:24:1) saturated with 10 mM Tris, pH 8.0 and 1 mM EDTA, and then once with one volume of chloroform. DNA was precipitated with cold absolute ethanol, 10% (v/v) 3 M sodium acetate, pH 5.2 and 1 µl of 15 mg/ml Glycogen Blue (Ambion) at −80°C for at least 1 hr, followed by a wash with 70% ethanol. Finally, the input DNA was dried at RT and resuspended in 20 µl of water. For ChIPs, chromatin containing 10 µg of DNA was first diluted in 500 µl of RIPA buffer and precleared with Sepharose A beads. Immunoprecipitation was carried out with 1–2 µg of antibody overnight at 4°C. The next day, Protein A/G agarose was added for 4 hr to pull down the DNA/protein complexes. The immunoprecipitates were washed sequentially with RIPA buffer (once briefly and once for 10 min), with LiCl buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA, pH 8.0, 250 mM LiCl, 0.5% NP-40, 0.5% sodium deoxycholate) once for 10 min, and with TE buffer twice for 10 min. The complexes of DNA/protein and Protein A/G agarose were resuspended in 100 µl of TE buffer containing 50 µg/µl RNase A and incubated for 30 min at 37°C. Proteinase K treatment, crosslink reversal and DNA purification were performed exactly as described for the preparation of input DNA. Both input and ChIP DNAs were measured by qPCR. Based on the standard curves for each primer pair, the enrichment of proteins and histone modifications on specific genomic regions was calculated as the percentage of immunoprecipitated DNA relative to input DNA. Each data point in the ChIP figures is the average of at least three independent ChIPs using three independent chromatin preparations.

Sequential ChIP (seqChIP) was performed as follows. ChIPs were carried out as described above, except that after the first ChIP (ChIP I), the immunoprecipitated DNA/protein complexes were eluted from the Protein A/G agarose with 100 µl of elution buffer (50 mM Tris-HCl, pH 7.5, 10 mM EDTA, pH 8.0, 1% SDS) by heating for 10–15 min at 65°C. 10% of the eluate was saved for qPCR, while the rest was diluted to 1 ml with RIPA buffer, and the second immunoprecipitation (ChIP II) was performed exactly as described above. The second ChIP was also measured by qPCR, and the enrichment of histone modifications on specific genomic regions was calculated as the percentage of immunoprecipitated DNA relative to the total amount of DNA eluted after the first ChIP. Each ChIP value is the average of three independent ChIPs using three independent chromatin preparations.
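The two enrichment measures described above, percent of input for a single ChIP and, for seqChIP, the ChIP II signal as a percentage of all DNA eluted after ChIP I, reduce to a few lines of arithmetic. The sketch below assumes that ChIP and input amounts come from the same primer pair's standard curve, in any consistent unit. The 45 µl final volume of ChIP DNA in the usage line is a hypothetical value because it is not stated here, and the 10% aliquot fraction follows the seqChIP protocol above.

```python
from statistics import mean, stdev


def percent_input(chip_qty, chip_ul_assayed, chip_ul_total,
                  input_qty, input_ng_assayed, chromatin_ng_per_chip):
    """Percent of input DNA recovered by one ChIP for one primer pair.

    chip_qty: amount measured in the chip_ul_assayed microlitres loaded into qPCR.
    input_qty: amount measured in input_ng_assayed ng of input DNA.
    Both are scaled up to whole-sample totals before taking the ratio.
    """
    chip_total = chip_qty * (chip_ul_total / chip_ul_assayed)
    input_total = input_qty * (chromatin_ng_per_chip / input_ng_assayed)
    return 100.0 * chip_total / input_total


def seqchip_percent(chip2_total_qty, chip1_aliquot_qty, aliquot_fraction=0.10):
    """ChIP II DNA as a percentage of all DNA eluted after ChIP I.

    chip1_aliquot_qty is the amount in the saved aliquot (10% of the eluate),
    so the total ChIP I eluate is chip1_aliquot_qty / aliquot_fraction.
    """
    chip1_eluted_total = chip1_aliquot_qty / aliquot_fraction
    return 100.0 * chip2_total_qty / chip1_eluted_total


def summarize(replicates):
    """Mean and sample standard deviation of independent ChIP replicates."""
    return mean(replicates), stdev(replicates)


# 0.45 µl of ChIP DNA, 4.5 ng of input DNA and 10 µg of chromatin as in the
# methods; the 45 µl total ChIP DNA volume and the quantities are hypothetical.
print(percent_input(2.0, 0.45, 45.0, 1.0, 4.5, 10_000.0))
print(seqchip_percent(4.0, 5.0))
```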

Quantitative real-time PCR (qPCR)

qPCR was performed using iQ SYBR Green Supermix (Bio-Rad) and a CFX96 real-time PCR machine (Bio-Rad). The PCR program was as follows: after an initial preincubation step at 95°C for 3 min, there were 40 cycles, each consisting of 95°C for 10 sec, 64.5 or 59°C (depending on the primers; Table S1) for 20 sec, and 72°C for 20 sec. The last amplification cycle was followed by melt curve analysis to confirm the specificity of the qPCR amplification. For ChIP, 0.45 µl of ChIP DNA and 4.5 ng of input DNA were measured by qPCR. Quantification of the qPCR amplifications was based on standard curves for each primer pair. Sequences of the primers used in ChIP experiments can be found in Table S1.
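Two quantification schemes are used in these methods. ChIP DNA is quantified with per-primer standard curves, and RT-qPCR data are analyzed with the ddCt method, normalized to actin (described in the RNA isolation subsection). A minimal sketch of both follows. It assumes a linear fit of Ct against log10(quantity) and the usual 2^-ddCt formula, which presumes near-100% amplification efficiency for both target and reference. The Ct values in the usage lines are made up for illustration.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    mx = sum(xs) / len(xs)
    my = sum(cts) / len(cts)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; a slope near -3.32 means roughly 100% efficiency."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting quantity."""
    return 10 ** ((ct - intercept) / slope)


def fold_change_ddct(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by 2^-ddCt (reference gene, e.g. actin)."""
    d_sample = ct_target - ct_ref            # dCt of the treated sample
    d_control = ct_target_ctrl - ct_ref_ctrl  # dCt of the untreated control
    return 2.0 ** -(d_sample - d_control)


# Hypothetical 10-fold standard series with a perfect -3.32 slope.
slope, intercept = fit_standard_curve([1, 10, 100, 1000],
                                      [30.0, 26.68, 23.36, 20.04])
print(slope, intercept, amplification_efficiency(slope))
print(quantity_from_ct(23.36, slope, intercept))
print(fold_change_ddct(22.0, 18.0, 26.0, 18.0))
```

A standard curve does not assume any particular efficiency, whereas 2^-ddCt does. This is one reason to check the slope of each primer pair before using the simpler method.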

RNA isolation and RT-PCR Total cellular RNA was extracted with Tri reagent (Sigma) according to the manufacturer’s instructions. 1ug of total RNA 13

July 2010 | Volume 6 | Issue 7 | e1001013

Epigenetic Analysis of the KSHV Genome

imported into Genetrix software (Epicenter Software) for visualization. For the KSHV promoter array analysis promoter regions were defined as 1000 bp upstream and downstream of the translational start sites (TSS) of each viral gene. The coordinates of ORFs are listed in Table S2 [80]. Normalized log2 ratios were calculated as described above. Log2 ratios were averaged for each 50 bp nonoverlapping window upstream and downstream of the TSS within the promoter for each gene. The TSS was assigned a single score. The resulting matrix was applied to perform hierarchical clustering with Cluster 3.0 (http://bonsai.ims.u-tokyo.ac.jp/ ,mdehoon/software/cluster/software.htm). The results of clustering were then imported into Java TreeView for visualization (version 1.1.4r2) [81]. While the 0 hpi-ChIPs were compared to the 0 hpi-DNA inputs in ChIP-on-chips the 12 hpi-ChIP-on-chips were analysed in two different ways. (1) 12 hpi-ChIPs were compared to 0 hpi-ChIPs (Figure S3, S4, S5, S6, S7, S8, S10) or (2) the 12 hpi-ChIP/input ratio was calculated by multiplying the 0 hpi-ChIP/input ratios by the 12 hpi-ChIP/0 hpi-ChIP ratios (Figure 1 and 2).

was treated by DNase I (Sigma), reverse transcribed by iScript cDNA Synthesis kit (Bio-Rad) and the cDNA was measured by either conventional PCR or qPCR. The relative quantification of gene expression was calculated with the ddCt method, where actin mRNA was used for normalization. RT-PCR graphs were made based on the average of at least two independent experiments.

Lentiviral shRNA knockdown The shRNA constructs were prepared by using the pLKO.1 lentiviral vector. The target sequences are listed in Table S1. Supernatants from 293T cells transfected by the shRNA and packaging vectors were collected 60 hours post transfection followed by concentration of the virus (24000 rpm, 1.5 hr, 4uC) and used for spinning infections (1800 rpm, 45 min) of one million of BCBL-1 cells in the presence of 10 ug/ml polybrene. 5 days post infection cells were harvested for western blot and RT-qPCR analysis.

KSHV tiling microarray and ChIP-on-chip The 15 bp-tiling KSHV microarray contains both KSHV and human probes and was manufactured by Agilent Technologies. The probes were spotted on 8615K array format. The majority of the oligonucleotides are overlapping 60-mer probes covering the entire KSHV genome (U75698 and U75699 at GenBank). In addition, probes specific for the gene regulatory regions of 71 human genes were also included in the microarray. The human probes are derived from the Agilent Human CoC 26244k microarray design. For ChIP-on-chip 30 ug of chromatin and 5–6 ug of antibodies were used per ChIP. The ChIP and DNA purification was performed exactly as described above. ChIP DNA was amplified by Complete Whole Genome Amplification (WGA2) kit (Sigma-Aldrich) and purified with QIAquick PCR Purification kit (Qiagen) according to the instructions of the manufacturers. Labeling, hybridisation and scanning of the microarrays were done at the Functional Genomics Core, Microarray Service at City of Hope (California). Probe signals were extracted using the Agilent Feature Extraction software (version 10.5.1.1). Each ChIP-on-chip experiment was performed two times and the average of the biological replicates is shown in Figures 1, 2, 4A, S5, S6, S8. The biological replicates can be found in Figures S3, S4, S5.

Supporting Information

Figure S1 RTA-mediated reactivation of KSHV. (A) TRExBCBL1-RTA cells were treated with 1 µg/ml doxycycline for 6, 12 and 24 hours and subjected to immunoblot analysis with the indicated antibodies. myc/His-Rta was detected with an anti-myc antibody. Actin was used to monitor protein amounts in the cell lysates. (B) An aliquot of the cells used in (A) was used to purify total RNA for measuring the indicated viral mRNAs. (C) Using the same number of cells, the copy number of the KSHV genome in Dox-induced (6, 12, 24 hpi) TRExBCBL1-RTA cells relative to that in non-induced (0 hpi) cells was determined by qPCR using primers specific for the indicated genomic regions. The amount of viral DNA was normalized to the cellular DNA input. Found at: doi:10.1371/journal.ppat.1001013.s001 (0.35 MB TIF)

Figure S2 Dissociation of histone H3 from the replicating KSHV genome. (A) The schematic map shows the KSHV genome, and arrows indicate the KSHV genomic regions tested in ChIPs. The RTA and LANA promoters are shown in more detail. Regions 1 to 16 span the RTA promoter and the RTA intron (+0.8 kb relative to the translational start site of RTA). CBF1-binding sites are highlighted in blue. (B) Non-induced and Dox-induced TRExBCBL1-RTA cells were used to perform ChIPs for histone H3 on different viral gene promoters or (C) on cellular promoters. (D) Non-induced and Dox-induced TRExBCBL1-RTA cells were analyzed by immunoblotting for histone H3 and the indicated histone modifications. Found at: doi:10.1371/journal.ppat.1001013.s002 (0.59 MB TIF)

Normalization, visualization and cluster analysis of the ChIP-on-chip data Log2 (Cy5/Cy3) ratios of the mean probe signals were calculated and imported into R (version 2.9.0; http://www.r-project.org). Lowess normalization, as implemented by the function “normalizeLoess” in the Bioconductor package “aroma.light,” was used to correct for dye bias. The log2 ratios for samples at 0 hpi and 12 hpi were scaled to have the same median absolute deviation (MAD) using the function “normalizeBetweenArrays” from the “limma” Bioconductor package [76,77,78,79]. To visualize enrichment across the KSHV genome, the normalized log2 ratios of all overlapping probes were averaged so that a score was assigned to each base pair (bp). A moving average was then calculated across the genome by sliding a 150-bp window unidirectionally in 1-bp steps. When calculating the moving average, the window is shifted past regions with missing probes so that these regions carry no weight. Regions where no probes exist were assigned an enrichment value of “0.” Log2 ratios were converted to linear scale, and tracks in BED format were created for each sample. Tracks were then
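The per-base scoring and smoothing steps can be sketched in plain Python. This is an illustrative sketch, not the authors' R code: it omits the lowess dye-bias correction, replaces limma's MAD scaling with a simple rescaling of each array to the geometric mean of the per-array MADs (an assumption), assumes the 150-bp window extends forward from each position, and applies the "0" for probe-free regions on the linear scale (the text does not say which scale). All function names and the `no_probe_value` parameter are hypothetical.

```python
import math
import statistics


def mad(values):
    """Unscaled median absolute deviation."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def scale_to_common_mad(arrays):
    """Rescale each array of log2 ratios so that all arrays share one MAD.

    Stand-in for MAD scaling between arrays: the common target is the
    geometric mean of the per-array MADs (an assumption). MADs must be > 0.
    """
    mads = [mad(a) for a in arrays]
    target = math.exp(sum(math.log(m) for m in mads) / len(mads))
    return [[v * target / m for v in a] for a, m in zip(arrays, mads)]


def per_base_scores(probes, genome_length):
    """Average the log2 ratios of all probes overlapping each base.

    probes: iterable of (start, end, log2_ratio), 0-based, end-exclusive.
    Bases not covered by any probe get None.
    """
    sums = [0.0] * genome_length
    counts = [0] * genome_length
    for start, end, value in probes:
        for pos in range(max(start, 0), min(end, genome_length)):
            sums[pos] += value
            counts[pos] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def moving_average(scores, window=150):
    """Forward sliding-window mean, advanced 1 bp at a time.

    Bases without data (None) carry no weight; a window that contains
    no data at all yields None.
    """
    n = len(scores)
    csum = [0.0] * (n + 1)   # prefix sums of defined scores
    ccount = [0] * (n + 1)   # prefix counts of defined scores
    for i, s in enumerate(scores):
        csum[i + 1] = csum[i] + (s if s is not None else 0.0)
        ccount[i + 1] = ccount[i] + (1 if s is not None else 0)
    out = []
    for i in range(n):
        j = min(i + window, n)
        k = ccount[j] - ccount[i]
        out.append((csum[j] - csum[i]) / k if k else None)
    return out


def to_linear(smoothed, no_probe_value=0.0):
    """Convert log2 values to linear ratios; positions without data get
    no_probe_value (applied on the linear scale here, an assumption)."""
    return [2.0 ** v if v is not None else no_probe_value for v in smoothed]


# Toy example: two hypothetical overlapping probes on an 8-bp "genome".
scores = per_base_scores([(0, 4, 1.0), (2, 6, 3.0)], 8)
print(moving_average(scores, window=2))
# [1.0, 1.5, 2.0, 2.5, 3.0, 3.0, None, None]
```

Prefix sums keep the smoothing linear in genome length, which matters when a 150-bp window is advanced base by base across a ~140-kb genome.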

Figure S3 Genome-wide mapping of histone modifications on the KSHV genome during latency and reactivation. One of the biological replicates of the ChIP-on-chip experiments (Chromatin A) is shown. Details are described in Figures 1 and S7. (A) ChIP-on-chip for histone H3. (B) Histone modifications on the KSHV genome during latency. (C) Changes of histone modifications on the KSHV genome upon lytic reactivation. Found at: doi:10.1371/journal.ppat.1001013.s003 (0.88 MB TIF)

Figure S4 Genome-wide mapping of histone modifications on the KSHV genome during latency and reactivation. One of the biological replicates of the ChIP-on-chip experiments (Chromatin B) is shown. Details are described in Figures 1 and S7. (A) ChIP-on-chip for histone H3. (B) Histone modifications on the KSHV genome during latency. (C) Changes of histone modifications on the KSHV genome upon lytic reactivation. Found at: doi:10.1371/journal.ppat.1001013.s004 (0.88 MB TIF)

Found at: doi:10.1371/journal.ppat.1001013.s010 (1.34 MB TIF)

Figure S11 Binding of CBF1 to the RTA promoter is essential for RTA-mediated activation of its own promoter. (A) Schematic diagrams of deletion mutants of the RTA promoter fused to the luciferase reporter gene. The size of the tested promoter regions relative to the translational start site of RTA is indicated on the left. (B) The deletion mutants were transfected into 293A cells in the absence or presence of RTA and assayed for luciferase activity. Data represent the average of three independent experiments. (C) Nuclear extract of BCBL1 cells was subjected to DNA affinity purification with DNA fragments derived from the RTA promoter region. Fragment A includes the 3 CBF1 binding sites at −1.4 kb relative to the translational start site of RTA. Fragment B is derived from the promoter region adjacent to Fragment A and does not contain any CBF1 binding sites. Immunoblotting was performed with an anti-CBF1-specific antibody. Found at: doi:10.1371/journal.ppat.1001013.s011 (0.13 MB TIF)

Figure S5 Normalization of the ChIP-on-chip data to histone H3 at 0 hpi. The ChIP-on-chip experiments were performed as described in Figure 1. The 0 hpi histone modification ChIP-on-chip data were derived from two independent chromatin preparations (Chromatin A and B), each of which was divided by the corresponding 0 hpi H3 ChIP-on-chip dataset (upper and middle panels). The average of the two H3-normalized 0 hpi ChIP-on-chip datasets is shown in the lower panel. Found at: doi:10.1371/journal.ppat.1001013.s005 (0.67 MB TIF)

Figure S6 Comparison of the hierarchical clustering of histone modifications associated with the regulatory regions of viral genes with and without H3 normalization. The clustering was performed as described in Figure 2. (A) The 0 hpi ChIP-on-chip dataset used in the hierarchical clustering without normalization to histone H3; this clustering is taken from Figure 2A. (B) The 0 hpi ChIP-on-chip dataset used in the hierarchical clustering was normalized for changes in histone H3. Found at: doi:10.1371/journal.ppat.1001013.s006 (2.96 MB TIF)
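The H3 normalization described for Figure S5 amounts to a probe-wise division followed by averaging of the two chromatin preparations. The sketch below assumes that both tracks are linear Cy5/Cy3 ratios measured on the same probes in the same order; the caption does not state at which processing step the division was applied, and the function names and numbers are hypothetical.

```python
def normalize_to_h3(mod_ratios, h3_ratios):
    """Divide a histone-modification track by the matching H3 track, probe by probe.

    Both inputs are assumed to be linear Cy5/Cy3 ratios on the same probes,
    in the same order; H3 values must be non-zero.
    """
    if len(mod_ratios) != len(h3_ratios):
        raise ValueError("tracks must cover the same probes")
    return [m / h for m, h in zip(mod_ratios, h3_ratios)]


def average_tracks(*tracks):
    """Probe-wise mean of several equally long tracks (e.g. Chromatin A and B)."""
    return [sum(values) / len(values) for values in zip(*tracks)]


# Hypothetical ratios for two probes in two chromatin preparations.
chromatin_a = normalize_to_h3([2.0, 4.0], [1.0, 2.0])  # -> [2.0, 2.0]
chromatin_b = normalize_to_h3([3.0, 3.0], [1.0, 1.0])  # -> [3.0, 3.0]
print(average_tracks(chromatin_a, chromatin_b))        # [2.5, 2.5]
```

On the log2 scale the same operation is a subtraction of the H3 track, so dividing linear ratios and subtracting log2 ratios are equivalent choices.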

Figure S12 Overexpression of histone demethylases in Vero cells. (A) The overexpressed demethylases were detected by immunoblotting with HA-, JMJD2A- and JMJD3-specific antibodies. (B and C) HA-tagged JMJD2A, JMJD3, UTX and UTXm were transfected into Vero cells, which were subjected to immunofluorescence analysis 3 days post-transfection. Expression of enzymatically active JMJD2A reduced the steady-state level of H3K9me3, whereas JMJD3 and UTX decreased H3K27me3 in transfected cells. Found at: doi:10.1371/journal.ppat.1001013.s012 (1.27 MB TIF)

Figure S7 Genome-wide mapping of histone modifications on the KSHV genome during latency and reactivation. Each ChIP-on-chip track is the average of two biological replicates. (A) The histone H3 ChIP-on-chip is the same as in Figure 1. (B) Histone modification ChIP-on-chips were performed using non-induced TRExBCBL1-Rta cells. (C) ChIP-on-chips were performed with TRExBCBL1-Rta cells induced with doxycycline for 12 hours. Changes in the distribution of histone modifications during reactivation (12 hpi) are shown as the normalized Cy5/Cy3 ratio of 12 hpi-ChIP DNA over 0 hpi-ChIP DNA. The red line marks a Cy5 (ChIP at 12 hpi)/Cy3 (ChIP at 0 hpi) ratio of one, i.e. no change in the level of histone modifications between 0 hpi and 12 hpi. Numbers in the upper left corners show the maximum values of Cy5/Cy3. Missing probes in specific genomic regions are shown below the genome scale (**). The alternating dark and light blue squares at the top display the viral ORFs; white triangles indicate ORFs that are expressed from the reverse DNA strand. hpi stands for hours post-induction. Found at: doi:10.1371/journal.ppat.1001013.s007 (0.85 MB TIF)

Figure S13 Induction of apoptosis is not sufficient to induce KSHV reactivation. (A) JSC-1 cells were treated with 5 µM DZNep for 1, 2 and 3 days (days post-treatment, dpt) and then harvested for immunoblot analysis with an anti-PARP antibody. (B) JSC-1 cells were treated with different concentrations of DZNep for 3 days and then subjected to immunoblot analysis with the indicated antibodies. (C) JSC-1 cells were treated with 200 nM staurosporine (STS) for 1, 2 and 3 days, followed by immunoblot analysis for the indicated cellular proteins and histone modifications (H3K27me3, H3K9me3). (D) A portion of the control and 3 dpt JSC-1 cells used in (C) was used for RNA purification, and the levels of the indicated KSHV mRNAs were measured by RT-qPCR. Found at: doi:10.1371/journal.ppat.1001013.s013 (0.55 MB TIF)

Table S1 Primer sequences Found at: doi:10.1371/journal.ppat.1001013.s014 (0.06 MB DOC)

Figure S8 Hierarchical clustering of histone modifications associated with the regulatory regions of viral genes. Based on their expression patterns, the viral genes were grouped as latent, IE, E and L genes, and hierarchical clustering was performed within the groups. Details are described in Figure 2. Panel A is identical to Figure 2A, while panel B shows 12 hpi-ChIP/0 hpi-ChIP based on Figure S7. Found at: doi:10.1371/journal.ppat.1001013.s008 (2.64 MB TIF)

Table S2 Genomic coordinates of the open reading frames of KSHV. IE = immediate early, E = early, L = late. Start codon coordinates indicate the first nucleotide of the KSHV genes and the stop codon coordinates indicate the last nucleotide in the stop codon of the KSHV genes. Coordinates are based on U75698.1 at GenBank. Found at: doi:10.1371/journal.ppat.1001013.s015 (0.05 MB DOC)

Figure S9 IE genes are repressed during latency but rapidly induced upon reactivation. Total RNA was purified from non-induced (0 hpi) and induced (6, 12, 24 hpi) TRExBCBL1-RTA cells, followed by RT-PCR using primers specific for the indicated IE transcripts. RT+: cDNA synthesis was performed with reverse transcriptase; RT−: the cDNA synthesis reaction did not include reverse transcriptase. Found at: doi:10.1371/journal.ppat.1001013.s009 (0.25 MB TIF)

Figure S10 Colocalization of EZH2 and H3K27me3 on the regulatory region of KSHV genes. Hierarchical clustering of EZH2 and H3K27me3 associated with the regulatory regions of KSHV genes was performed as described in Figure 2.

Acknowledgements We especially thank Drs. Kristian Helin and Yang Shi for providing reagents, as well as Dr. Chengyu Liang for critical reading of the manuscript.

Author Contributions

Conceived and designed the experiments: ZT JUJ. Performed the experiments: ZT. Analyzed the data: ZT DTM. Contributed reagents/materials/analysis tools: ZT DTM SHL HRL LYW KFB JDB PWL VEM. Wrote the paper: ZT SL JUJ.


References 1. Kouzarides T (2007) Chromatin modifications and their function. Cell 128: 693–705. 2. MacDonald VE, Howe LJ (2009) Histone acetylation: where to go and how to get there. Epigenetics 4: 139–143. 3. Cloos PA, Christensen J, Agger K, Helin K (2008) Erasing the methyl mark: histone demethylases at the center of cellular differentiation and disease. Genes Dev 22: 1115–1140. 4. Dillon SC, Zhang X, Trievel RC, Cheng X (2005) The SET-domain protein superfamily: protein lysine methyltransferases. Genome Biol 6: 227. 5. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, et al. (2007) High-resolution profiling of histone methylations in the human genome. Cell 129: 823–837. 6. Heintzman ND, Stuart RK, Hon G, Fu Y, Ching CW, et al. (2007) Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet 39: 311–318. 7. Roh TY, Cuddapah S, Cui K, Zhao K (2006) The genomic landscape of histone modifications in human T cells. Proc Natl Acad Sci U S A 103: 15782–15787. 8. Wang Z, Zang C, Rosenfeld JA, Schones DE, Barski A, et al. (2008) Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet 40: 897–903. 9. Schotta G, Lachner M, Sarma K, Ebert A, Sengupta R, et al. (2004) A silencing pathway to induce H3-K9 and H4-K20 trimethylation at constitutive heterochromatin. Genes Dev 18: 1251–1262. 10. Trojer P, Reinberg D (2007) Facultative heterochromatin: is there a distinctive molecular signature? Mol Cell 28: 1–13. 11. Bracken AP, Dietrich N, Pasini D, Hansen KH, Helin K (2006) Genome-wide mapping of Polycomb target genes unravels their roles in cell fate transitions. Genes Dev 20: 1123–1136. 12. Kirmizis A, Bartley SM, Kuzmichev A, Margueron R, Reinberg D, et al. (2004) Silencing of human polycomb target genes is associated with methylation of histone H3 Lys 27. Genes Dev 18: 1592–1605. 13. Boyer LA, Plath K, Zeitlinger J, Brambrink T, Medeiros LA, et al.
(2006) Polycomb complexes repress developmental regulators in murine embryonic stem cells. Nature 441: 349–353. 14. Lee TI, Jenner RG, Boyer LA, Guenther MG, Levine SS, et al. (2006) Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 125: 301–313. 15. Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, et al. (2006) A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125: 315–326. 16. Simon JA, Kingston RE (2009) Mechanisms of polycomb gene silencing: knowns and unknowns. Nat Rev Mol Cell Biol 10: 697–708. 17. Kuzmichev A, Nishioka K, Erdjument-Bromage H, Tempst P, Reinberg D (2002) Histone methyltransferase activity associated with a human multiprotein complex containing the Enhancer of Zeste protein. Genes Dev 16: 2893–2905. 18. Muller J, Kassis JA (2006) Polycomb response elements and targeting of Polycomb group proteins in Drosophila. Curr Opin Genet Dev 16: 476–484. 19. Zhao J, Sun BK, Erwin JA, Song JJ, Lee JT (2008) Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322: 750–756. 20. Sing A, Pannell D, Karaiskakis A, Sturgeon K, Djabali M, et al. (2009) A vertebrate Polycomb response element governs segmentation of the posterior hindbrain. Cell 138: 885–897. 21. Agger K, Cloos PA, Christensen J, Pasini D, Rose S, et al. (2007) UTX and JMJD3 are histone H3K27 demethylases involved in HOX gene regulation and development. Nature 449: 731–734. 22. Cho YW, Hong T, Hong S, Guo H, Yu H, et al. (2007) PTIP associates with MLL3- and MLL4-containing histone H3 lysine 4 methyltransferase complex. J Biol Chem 282: 20395–20406. 23. Lee MG, Villa R, Trojer P, Norman J, Yan KP, et al. (2007) Demethylation of H3K27 regulates polycomb recruitment and H2A ubiquitination. Science 318: 447–450. 24. Smith ER, Lee MG, Winter B, Droz NM, Eissenberg JC, et al. 
(2008) Drosophila UTX is a histone H3 Lys27 demethylase that colocalizes with the elongating form of RNA polymerase II. Mol Cell Biol 28: 1041–1046. 25. Knipe DM, Cliffe A (2008) Chromatin control of herpes simplex virus lytic and latent infection. Nat Rev Microbiol 6: 211–221. 26. Kutluay SB, Triezenberg SJ (2009) Role of chromatin during herpesvirus infections. Biochim Biophys Acta 1790: 456–466. 27. Lieberman PM (2008) Chromatin organization and virus gene expression. J Cell Physiol 216: 295–302. 28. Silva L, Cliffe A, Chang L, Knipe DM (2008) Role for A-type lamins in herpesviral DNA targeting and heterochromatin modulation. PLoS Pathog 4: e1000071. 29. Garber DA, Schaffer PA, Knipe DM (1997) A LAT-associated function reduces productive-cycle gene expression during acute infection of murine sensory neurons with herpes simplex virus type 1. J Virol 71: 5885–5893. 30. Liang Y, Vogel JL, Narayanan A, Peng H, Kristie TM (2009) Inhibition of the histone demethylase LSD1 blocks alpha-herpesvirus lytic replication and reactivation from latency. Nat Med. 31. Chang Y, Cesarman E, Pessin MS, Lee F, Culpepper J, et al. (1994) Identification of herpesvirus-like DNA sequences in AIDS-associated Kaposi’s sarcoma. Science 266: 1865–1869.


32. Ganem D (2006) KSHV infection and the pathogenesis of Kaposi’s sarcoma. Annu Rev Pathol 1: 273–296. 33. Soulier J, Grollet L, Oksenhendler E, Cacoub P, Cazals-Hatem D, et al. (1995) Kaposi’s sarcoma-associated herpesvirus-like DNA sequences in multicentric Castleman’s disease. Blood 86: 1276–1280. 34. Lu F, Zhou J, Wiedmer A, Madden K, Yuan Y, et al. (2003) Chromatin remodeling of the Kaposi’s sarcoma-associated herpesvirus ORF50 promoter correlates with reactivation from latency. J Virol 77: 11425–11435. 35. Chen J, Ueda K, Sakakibara S, Okuno T, Parravicini C, et al. (2001) Activation of latent Kaposi’s sarcoma-associated herpesvirus by demethylation of the promoter of the lytic transactivator. Proc Natl Acad Sci U S A 98: 4119–4124. 36. Lukac DM, Renne R, Kirshner JR, Ganem D (1998) Reactivation of Kaposi’s sarcoma-associated herpesvirus infection from latency by expression of the ORF 50 transactivator, a homolog of the EBV R protein. Virology 252: 304–312. 37. Sun R, Lin SF, Gradoville L, Yuan Y, Zhu F, et al. (1998) A viral gene that activates lytic cycle expression of Kaposi’s sarcoma-associated herpesvirus. Proc Natl Acad Sci U S A 95: 10866–10871. 38. Gwack Y, Baek HJ, Nakamura H, Lee SH, Meisterernst M, et al. (2003) Principal role of TRAP/mediator and SWI/SNF complexes in Kaposi’s sarcoma-associated herpesvirus RTA-mediated lytic reactivation. Mol Cell Biol 23: 2055–2067. 39. Nakamura H, Lu M, Gwack Y, Souvlis J, Zeichner SL, et al. (2003) Global changes in Kaposi’s sarcoma-associated virus gene expression patterns following expression of a tetracycline-inducible Rta transactivator. J Virol 77: 4205–4220. 40. Oh J, Fraser NW (2008) Temporal association of the herpes simplex virus genome with histone proteins during a lytic infection. J Virol 82: 3530–3537. 41. Ferrari R, Pellegrini M, Horwitz GA, Xie W, Berk AJ, et al. (2008) Epigenetic reprogramming by adenovirus e1a. Science 321: 1086–1088. 42. 
Jenner RG, Alba MM, Boshoff C, Kellam P (2001) Kaposi’s sarcoma-associated herpesvirus latent and lytic gene expression as revealed by DNA arrays. J Virol 75: 891–902. 43. Paulose-Murphy M, Ha NK, Xiang C, Chen Y, Gillim L, et al. (2001) Transcription program of human herpesvirus 8 (Kaposi’s sarcoma-associated herpesvirus). J Virol 75: 4843–4853. 44. Ellison TJ, Izumiya Y, Izumiya C, Luciw PA, Kung HJ (2009) A comprehensive analysis of recruitment and transactivation potential of K-Rta and K-bZIP during reactivation of Kaposi’s sarcoma-associated herpesvirus. Virology 387: 76–88. 45. Liang Y, Chang J, Lynch SJ, Lukac DM, Ganem D (2002) The lytic switch protein of KSHV activates gene expression via functional interaction with RBPJkappa (CSL), the target of the Notch signaling pathway. Genes Dev 16: 1977–1989. 46. Liang Y, Ganem D (2003) Lytic but not latent infection by Kaposi’s sarcoma-associated herpesvirus requires host CSL protein, the mediator of Notch signaling. Proc Natl Acad Sci U S A 100: 8490–8495. 47. Lan K, Kuppers DA, Robertson ES (2005) Kaposi’s sarcoma-associated herpesvirus reactivation is regulated by interaction of latency-associated nuclear antigen with recombination signal sequence-binding protein Jkappa, the major downstream effector of the Notch signaling pathway. J Virol 79: 3468–3478. 48. Persson LM, Wilson AC (2010) Wide-scale use of Notch signaling factor CSL/RBPJkappa in RTA-mediated activation of Kaposi’s sarcoma-associated herpesvirus lytic genes. J Virol 84: 1334–1347. 49. Gray KS, Allen RD, 3rd, Farrell ML, Forrest JC, Speck SH (2009) Alternatively initiated gene 50/RTA transcripts expressed during murine and human gammaherpesvirus reactivation from latency. J Virol 83: 314–328. 50. Whetstine JR, Nottke A, Lan F, Huarte M, Smolikov S, et al. (2006) Reversal of histone lysine trimethylation by the JMJD2 family of histone demethylases. Cell 125: 467–481. 51. Hong S, Cho YW, Yu LR, Yu H, Veenstra TD, et al.
(2007) Identification of JmjC domain-containing UTX and JMJD3 as histone H3 lysine 27 demethylases. Proc Natl Acad Sci U S A 104: 18439–18444. 52. Sen GL, Webster DE, Barragan DI, Chang HY, Khavari PA (2008) Control of differentiation in a self-renewing mammalian tissue by the histone demethylase JMJD3. Genes Dev 22: 1865–1870. 53. Vieira J, O’Hearn PM (2004) Use of the red fluorescent protein as a marker of Kaposi’s sarcoma-associated herpesvirus lytic gene expression. Virology 325: 225–240. 54. Tan J, Yang X, Zhuang L, Jiang X, Chen W, et al. (2007) Pharmacologic disruption of Polycomb-repressive complex 2-mediated gene repression selectively induces apoptosis in cancer cells. Genes Dev 21: 1050–1063. 55. Krishnan HH, Naranatt PP, Smith MS, Zeng L, Bloomer C, et al. (2004) Concurrent expression of latent and a limited number of lytic genes with immune modulation and antiapoptotic function by Kaposi’s sarcoma-associated herpesvirus early during infection of primary endothelial and fibroblast cells and subsequent decline of lytic gene expression. J Virol 78: 3601–3620. 56. Sproul D, Gilbert N, Bickmore WA (2005) The role of chromatin structure in regulating the expression of clustered genes. Nat Rev Genet 6: 775–781. 57. Nitzsche A, Paulus C, Nevels M (2008) Temporal dynamics of cytomegalovirus chromatin assembly in productively infected human cells. J Virol 82: 11167–11180.


58. Arumugaswami V, Wu TT, Martinez-Guzman D, Jia Q, Deng H, et al. (2006) ORF18 is a transfactor that is essential for late gene transcription of a gammaherpesvirus. J Virol 80: 9730–9740. 59. Wong E, Wu TT, Reyes N, Deng H, Sun R (2007) Murine gammaherpesvirus 68 open reading frame 24 is required for late gene expression after DNA replication. J Virol 81: 6761–6764. 60. Wu TT, Park T, Kim H, Tran T, Tong L, et al. (2009) ORF30 and ORF34 are essential for expression of late genes in murine gammaherpesvirus 68. J Virol 83: 2265–2273. 61. Sun R, Lin SF, Staskus K, Gradoville L, Grogan E, et al. (1999) Kinetics of Kaposi’s sarcoma-associated herpesvirus gene expression. J Virol 73: 2232–2242. 62. Hargreaves DC, Horng T, Medzhitov R (2009) Control of inducible gene expression by signal-dependent transcriptional elongation. Cell 138: 129–145. 63. Cliffe AR, Garber DA, Knipe DM (2009) Transcription of the herpes simplex virus latency-associated transcript promotes the formation of facultative heterochromatin on lytic promoters. J Virol 83: 8182–8190. 64. Day L, Chau CM, Nebozhyn M, Rennekamp AJ, Showe M, et al. (2007) Chromatin profiling of Epstein-Barr virus latency control region. J Virol 81: 6389–6401. 65. Tempera I, Lieberman PM (2009) Chromatin organization of gammaherpesvirus latent genomes. Biochim Biophys Acta. 66. Villa R, Pasini D, Gutierrez A, Morey L, Occhionorelli M, et al. (2007) Role of the polycomb repressive complex 2 in acute promyelocytic leukemia. Cancer Cell 11: 513–525. 67. Dellino GI, Schwartz YB, Farkas G, McCabe D, Elgin SC, et al. (2004) Polycomb silencing blocks transcription initiation. Mol Cell 13: 887–893. 68. Francis NJ, Kingston RE, Woodcock CL (2004) Chromatin compaction by a polycomb group protein complex. Science 306: 1574–1577. 69. Stock JK, Giadrossi S, Casanova M, Brookes E, Vidal M, et al. (2007) Ring1-mediated ubiquitination of H2A restrains poised RNA polymerase II at bivalent genes in mouse ES cells. Nat Cell Biol 9: 1428–1435.
70. Chang PC, Fitzgerald LD, Van Geelen A, Izumiya Y, Ellison TJ, et al. (2009) Kruppel-associated box domain-associated protein-1 as a latency regulator for Kaposi’s sarcoma-associated herpesvirus and its modulation by the viral protein kinase. Cancer Res 69: 5681–5689. 71. Yang Z, Wood C (2007) The transcriptional repressor K-RBP modulates RTA-mediated transactivation and lytic replication of Kaposi’s sarcoma-associated herpesvirus. J Virol 81: 6294–6306. 72. Lu F, Day L, Gao SJ, Lieberman PM (2006) Acetylation of the latency-associated nuclear antigen regulates repression of Kaposi’s sarcoma-associated herpesvirus lytic transcription. J Virol 80: 5273–5282. 73. Kwiatkowski DL, Thompson HW, Bloom DC (2009) The polycomb group protein Bmi1 binds to the herpes simplex virus 1 latent genome and maintains repressive histone marks during latency. J Virol 83: 8173–8181. 74. Issaeva I, Zonis Y, Rozovskaia T, Orlovsky K, Croce CM, et al. (2007) Knockdown of ALR (MLL2) reveals ALR target genes and leads to alterations in cell adhesion and growth. Mol Cell Biol 27: 1889–1903. 75. Atanasiu C, Lezina L, Lieberman PM (2005) DNA affinity purification of Epstein-Barr virus OriP-binding proteins. Methods Mol Biol 292: 267–276. 76. Smyth GK, Yang YH, Speed T (2003) Statistical issues in cDNA microarray data analysis. Methods Mol Biol 224: 111–136. 77. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, et al. (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res 30: e15. 78. Smyth GK (2005) Limma: linear models for microarray data. In: Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S, eds. Bioinformatics and Computational Biology Solutions using R and Bioconductor. 79. Yang JH, Dudoit S, Luu P, Speed T (2001) Normalization for cDNA microarray data. Proceedings in SPIE 4266: 141–152. 80. Yuan Y, Renne R (2009) Organization and expression of the Kaposi’s sarcoma-associated herpesvirus genome. In: Damania B, Pipas JM, eds. DNA tumor viruses. 81. Saldanha AJ (2004) Java Treeview–extensible visualization of microarray data. Bioinformatics 20: 3246–3248.

The Epigenetic Landscape of Latent Kaposi Sarcoma-Associated Herpesvirus Genomes Thomas Günther, Adam Grundhoff* Heinrich-Pette-Institute for Experimental Virology and Immunology, Hamburg, Germany

Abstract Herpesvirus latency is generally thought to be governed by epigenetic modifications, but the dynamics of viral chromatin at early timepoints of latent infection are poorly understood. Here, we report a comprehensive spatial and temporal analysis of DNA methylation and histone modifications during latent infection with Kaposi Sarcoma-associated herpesvirus (KSHV), the etiologic agent of Kaposi Sarcoma and primary effusion lymphoma (PEL). By use of high-resolution tiling microarrays in conjunction with immunoprecipitation of methylated DNA (MeDIP) or modified histones (chromatin IP, ChIP), our study revealed highly distinct landscapes of epigenetic modifications associated with latent KSHV infection in several tumor-derived cell lines as well as de novo infected endothelial cells. We find that KSHV genomes are subject to profound methylation at CpG dinucleotides, leading to the establishment of characteristic global DNA methylation patterns. However, such patterns evolve slowly and thus are unlikely to control early latency. In contrast, we observed that latency-specific histone modification patterns were rapidly established upon de novo infection. Our analysis furthermore demonstrates that such patterns are not characterized by the absence of activating histone modifications, as H3K9/K14-ac and H3K4-me3 marks were prominently detected at several loci, including the promoter of the lytic cycle transactivator Rta. While these regions were also largely devoid of the constitutive heterochromatin marker H3K9-me3, we observed rapid and widespread deposition of H3K27-me3 across latent KSHV genomes, the repressive mark characteristic of bivalent chromatin, which is able to repress transcription in spite of the simultaneous presence of activating marks. Our findings suggest that the modification patterns identified here induce a poised state of repression during viral latency, which can be rapidly reversed once the lytic cycle is induced.
Citation: Günther T, Grundhoff A (2010) The Epigenetic Landscape of Latent Kaposi Sarcoma-Associated Herpesvirus Genomes. PLoS Pathog 6(6): e1000935. doi:10.1371/journal.ppat.1000935 Editor: Paul Kellam, Sanger Institute, United Kingdom Received January 6, 2010; Accepted May 3, 2010; Published June 3, 2010 Copyright: © 2010 Günther, Grundhoff. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: The Heinrich-Pette-Institute is a member of the Leibniz Gemeinschaft (WGL, http://www.leibniz-gemeinschaft.de/) and is supported by the Free and Hanseatic City of Hamburg and the Federal Ministry of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: adam.grundhoff@hpi.uni-hamburg.de

Introduction Herpesviruses are able to establish latent infections, enabling them to persist for the lifetime of their host [1]. During latency, no viral progeny is produced; instead, the largely quiescent genome persists as an extrachromosomal episome in the nucleus of the infected cell. Unfavorable conditions (e.g. cell stress) may trigger reactivation in such cells, leading to induction of the lytic cycle and completion of the viral life cycle. In a healthy host, latently infected cells form a reservoir of chronic viral infection which is tightly controlled by the immune system. However, latently infected cells may also give rise to disease if immunological control is lost. This is especially true for members of the gammaherpesvirus subfamily, which are frequently associated with tumors in their natural host, particularly in immunosuppressed individuals. Kaposi Sarcoma-associated herpesvirus (KSHV) is etiologically linked to Kaposi Sarcoma (KS), a tumor of endothelial origin, as well as to at least two lymphoproliferative disorders, primary effusion lymphoma (PEL) and multicentric Castleman’s disease (MCD) [2,3,4]. The majority of tumor cells in these malignancies exhibit a latent gene expression profile which has been extensively studied in cell lines established from PEL tumors [5,6,7]. These cells express a very limited set of viral genes, including the latency-associated nuclear antigen LANA (encoded by ORF73), which permits replication of latent episomes, a viral cyclin D homologue (v-Cyc/ORF72), a viral homologue of a FLICE-inhibitory protein (v-Flip) encoded by ORF71 (also termed K13), and Kaposin (ORF K12), a protein that can stabilize cytokine transcripts [7,8,9,10,11,12]. All of the above proteins are translated from alternatively spliced mRNAs transcribed from a single multicistronic locus; primary transcripts from the locus can furthermore give rise to 12 virally encoded microRNAs (miRNAs) [7,13,14,15,16,17,18]. It is thought that, together, these genes serve to ensure persistence of the latent infection and survival of the host cell. However, several of the latency genes have also been shown to exhibit tumorigenic properties in various experimental systems, supporting the idea that the viral latency program plays a causative role during onset and/or progression of KSHV-associated tumors. The viral genes which encode components of the lytic or productive cycle are transcriptionally silent during latency. This quiescent state of infection can be overturned by forced expression of Rta (the product of the ORF50 gene, also termed Lyta), a homologue of the Epstein-Barr virus (EBV) transactivator Rta [19,20,21]. Upon expression, Rta acts as a master-switch regulator which orchestrates the expression of downstream lytic genes,

June 2010 | Volume 6 | Issue 6 | e1000935

Epigenetic Landscape of Latent KSHV Genomes

leading to massive amplification of viral genomes, followed by assembly of virions and, ultimately, death of the host cells and release of viral progeny [20,21,22,23]. How Rta and other lytic genes are kept silent during latency is not understood, but epigenetic modifications very likely play an important role in this process. This notion is supported by the fact that treating latently infected PEL cells with inhibitors of DNA methyltransferases or of histone deacetylases induces lytic cycle replication, and that lytic cycle induction leads to profound chromatin rearrangements at several loci [24,25,26,27]. Furthermore, the ORF50 promoter was reported to be subject to DNA methylation in latently infected PEL cells, whereas the latent ORF73 promoter remained unmethylated. It has therefore been suggested that CpG methylation may actively repress Rta expression during latency [26]. The DNA methylation status of other regions of the KSHV genome, however, has not been analyzed so far. Likewise, current knowledge about global histone modification patterns during viral latency is very limited. Moreover, all studies of latent modification patterns have been performed in PEL cells and thus describe the epigenetic status during fully established latency. Packaged virion DNA, however, is devoid of both DNA methylation and histones [28] and is thus epigenetically naïve, so these epigenetic modification patterns must be re-established during each round of latent infection. The early phase of a de novo infection therefore represents an especially critical phase of the viral lifecycle.
We have performed a comprehensive study of DNA methylation and histone modification patterns across the complete KSHV genome, in both PEL cells and a de novo infected endothelial cell line. We observed highly distinct global patterns at the level of both DNA and histone modifications. These patterns were also highly similar in PEL cells and stably infected endothelial cells, suggesting a highly regulated modification program during latency establishment. However, whereas modified histones could be readily detected at early time points of a de novo infection, DNA methylation patterns evolved over significantly longer periods of time, suggesting that they do not govern early latency expression patterns. Our analysis rather suggests that DNA methylation patterns evolve as a secondary result of the histone modification patterns, which are established early in the infection. Surprisingly, in spite of their quiescent state, latent KSHV episomes were not devoid of activating histone marks: in fact, such marks occupied several lytic promoters soon after de novo infection and were not stripped from the genomes in long-term infected cells. However, concomitant with the appearance of these modifications, latent genomes were also subject to profound tri-methylation of lysine 27 of histone H3 (H3K27), a modification which can suppress transcription even in the presence of activating marks [29,30,31]. Thus, latent episomes bear the hallmarks of poised chromatin. This observation is in line with the hypothesis that viral latency represents a meta-stable state of transcriptional repression which can be quickly reversed once the lytic cycle is induced.

Author Summary
A characteristic feature of herpesviruses is their ability to establish a latent infection during which most of the viral genes are silenced. As a consequence, no viral progeny is produced and the host cell remains viable. While the viral genome may persist in the nucleus of such cells indefinitely, it retains the ability to re-enter the lytic cycle and produce new virions if conditions in the cell become unfavorable. The molecular requirements for the establishment of latency are poorly understood, but are thought to depend on epigenetic modifications of the viral episome. Here, we report a genome-wide screen to investigate DNA methylation and histone modification patterns associated with latent infection by Kaposi Sarcoma-associated herpesvirus (KSHV), a tumor virus linked to the development of several cancers. We find that latency is likely to be determined by modifications commonly associated with genes that are transcriptionally "poised". The promoters of such genes harbor both activating and repressive histone marks, so that they are silenced but can be rapidly activated upon removal of the repressive marks. Our findings may thus explain how KSHV achieves efficient quiescence during latency, yet retains the potential to quickly revert to a fully active state upon induction of the lytic cycle.

Results/Discussion
DNA methylation patterns of latent KSHV genomes
In mammals, DNA methylation occurs almost exclusively at cytosine residues in CpG dinucleotides and is generally associated with transcriptional repression (reviewed in [32]). Methylcytosine is prone to spontaneous deamination, which converts it to thymine. An evolutionary consequence of DNA methylation is therefore the relative scarcity of CpG dinucleotides in methylated genomes. In contrast to most members of the alpha- and betaherpesvirus subfamilies, the majority of gammaherpesviruses show evidence of such CpG suppression, suggesting that these viruses are subject to DNA methylation [26]. Furthermore, the genomes of EBV and of the rhadinovirus Herpesvirus saimiri (HVS) have been found to carry methylated CpG motifs at multiple loci in latently infected cells. This suggests that DNA methylation plays a role in the control of latent gammaherpesvirus gene expression patterns (reviewed in [33,34]). So far, analysis of DNA methylation within latent KSHV genomes has been limited to two promoters: that of the gene encoding Rta (ORF50), and the promoter upstream of ORF73/LANA, which drives expression of the latency gene cluster. No CpG methylation was detected in the region of the ORF73 promoter, but the ORF50 promoter was found to be heavily methylated in the PEL-derived cell line BCBL-1 [26]. Because DNA methylation also repressed promoter activity in an in vitro assay, it was suggested that CpG methylation actively suppresses expression of the lytic switch gene Rta during KSHV latency [26]. However, this hypothesis is complicated by the fact that the same study found that the majority of KSHV-positive tumor samples did not harbor these methylation patterns. The authors suggested that their observations may have been due to the presence of lytic cells, which represent a subpopulation among the mostly latently infected cells in some tumor types.
Given the absence of comprehensive DNA methylation data for KSHV, we first sought to determine the global methylation status of viral episomes in PEL-derived cell lines. For this purpose, we employed the MeDIP (methylated DNA immunoprecipitation) technique, which is based on the pulldown of methylated DNA using methylcytidine-specific antibodies [35]. The MeDIP samples were analyzed on a custom-designed, high-resolution microarray which covers both strands of the KSHV genome in nonoverlapping, hybridization temperature-optimized 60mers. To obtain a quantitative measure of the extent of DNA methylation, we additionally devised positive and negative controls according to the scheme depicted in Figure 1A. As a negative control, we employed a bacterially amplified (and hence CpG methylation-free) bacmid clone which carries the complete KSHV genome


Figure 1. Experimental design of MeDIP analysis. A: Schematic representation of the experimental setup for the analysis of CpG methylation patterns. The KSHV episome in infected cells is expected to be partially methylated, as indicated by black and white circles, which symbolize methylated and unmethylated CpG dinucleotides, respectively. Genomic DNA was isolated from such cells and subjected to immunoprecipitation using a methylcytidine-specific antibody (MeDIP procedure). The precipitated samples were then hybridized against the input on tiling microarrays. For each probe, an enrichment score ES was calculated, which represents the ratio of MeDIP over input fluorescence signals. The efficiency of the immunoprecipitation depends on the total number of methylated CpG motifs in a given fragment, and ES is thus a function of both the extent of methylation and the local CpG frequency. Therefore, to obtain reference values which signify maximum methylation for each probe, we generated a positive control by subjecting KSHV bacmids to CpG methylation in vitro. The bacmid was mixed with cellular DNA to simulate the host background and subjected to the same MeDIP procedure as samples from infected cells. Similarly, a negative control of unmethylated bacmid was prepared to control for cross-hybridization and nonspecific background. After normalization of the array data using a spike-in control (see Material & Methods for details), background-corrected methylation values MS and MP were calculated for each probe by subtracting the corresponding negative control value. B: Confirmation of successful in vitro methylation of KSHV bacmids used as a positive control. A bacmid carrying the complete KSHV genome (BAC36 [36]) was methylated using M.SssI, a methyltransferase specific for CpG dinucleotides. Methylated and unmethylated bacmids were subjected to restriction digestion using the methylation-sensitive enzyme HpaII and its isoschizomer MspI, which cuts regardless of methylation. Methylated bacmids were resistant to HpaII digestion, signifying complete methylation. doi:10.1371/journal.ppat.1000935.g001
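The per-probe arithmetic in Figure 1 has three steps: the enrichment score, the spike-in normalization, and subtraction of the negative control. A minimal sketch follows. It is not the authors' pipeline. It assumes linear fluorescence intensities and models the spike-in normalization as division by each array's mean spike-in enrichment. The function names are hypothetical, and the actual procedure is described in the Material & Methods.

```python
from statistics import mean


def enrichment_scores(medip, input_signal):
    """ES per probe: MeDIP fluorescence divided by input fluorescence."""
    return [m / i for m, i in zip(medip, input_signal)]


def normalize_to_spike(es, spike_es):
    """Assumed normalization: divide by the mean ES of the spike-in probes.

    Every sample received the same amount of methylated spike-in DNA,
    so this step places separate hybridizations on a common scale.
    """
    factor = mean(spike_es)
    return [x / factor for x in es]


def background_corrected(es, spike_es, neg_es, neg_spike_es):
    """Subtract the normalized negative control (unmethylated bacmid) per probe.

    Applied to an infected-cell sample this yields MS; applied to the
    in vitro methylated bacmid it yields MP.
    """
    sample = normalize_to_spike(es, spike_es)
    background = normalize_to_spike(neg_es, neg_spike_es)
    return [s - b for s, b in zip(sample, background)]


# Two hypothetical probes; all numbers are illustrative, not measured values.
es = enrichment_scores([400.0, 90.0], [100.0, 100.0])
neg = enrichment_scores([50.0, 50.0], [100.0, 100.0])
ms = background_corrected(es, [2.0, 2.0], neg, [1.0, 1.0])
print(ms)
```

Small negative values, as for the second probe, are expected wherever a sample's signal falls at or below the unmethylated background.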

[36]. For a given DNA fragment, the maximum amount of DNA that MeDIP can recover depends on the number of CpG motifs in that sequence. Consequently, the array hybridization patterns are a function of both the relative degree of methylation and the local CpG frequency. To control for such differences, we generated a positive control by in vitro methylation of KSHV bacmids using the methylase M.SssI, which is specific for CpG dinucleotides. Restriction analysis confirmed complete methylation of the bacmid DNA (Fig. 1B). Prior to immunoprecipitation, the untreated or in vitro methylated bacmid DNA was mixed with DNA from human cell lines. This controls for any signals that may arise from cross-hybridization of cellular DNA in the infected samples. The ratio of viral to cellular DNA was chosen to equal that typically seen in KSHV-infected PEL cell lines, corresponding to a viral copy number of approximately 30 genomes per cell. Furthermore, all samples were spiked with a constant amount of in vitro methylated heterologous DNA, allowing accurate normalization across individual array hybridizations. After normalization, the MeDIP values obtained from the samples or the positive control were corrected by subtracting the background values from the negative control (see Material & Methods for details). To minimize the risk of investigating cell line-specific (and hence potentially random) modifications, we analyzed the global DNA methylation pattern of KSHV genomes in three different PEL-derived cell lines: BCBL1, AP3 and HBL6. The HBL6 line was originally established from a PEL tumor co-infected with KSHV and EBV and carries both viruses in a latent state [37]. BCBL1 and AP3 cells are KSHV-positive but EBV-negative. In Figure 2, we present the results of our analysis of PEL cells, along with the data obtained from the positive control (upper four solid graphs in each panel; see also Figure S1 for a more detailed view with a differentially scaled x-axis). The distribution of local CpG frequencies across the KSHV genome is also shown (black line graph). As expected, the signal distribution of the positive control correlated with local CpG content (Pearson correlation coefficient = 0.513, see Table 1). The results from the PEL-derived cell lines revealed that KSHV genomes are indeed subject to profound DNA methylation during latency. All three lines showed strikingly similar global methylation profiles, with pairwise correlation coefficients ranging from 0.593 to 0.724 (Table 1). Furthermore, the profiles were clearly not a mere function of CpG content: several regions showed low levels of DNA methylation in all three PEL lines but not in the positive control. One such region, extending approximately from nucleotides (nts.) 127301 to 128901, harbors the major latency promoter upstream of ORF73. The absence of methylation in this area is to be expected (and has been noted before [26]), given the constitutive activity of the promoter in latently infected cells. However, our analysis revealed several additional loci which were not (or only poorly) methylated, despite their presumed transcriptional inactivity in latently infected cells. For example, the region between nts. 9701 and 12601 showed very little methylation in PEL lines compared to the positive control. This area is centered on the start position of the gene encoding the DNA polymerase (ORF9), which is expressed exclusively during the lytic cycle. While the methylation profiles of the three PEL lines were highly similar, the absolute degree of methylation differed. Across the complete KSHV genome, HBL6 reached approximately 88% of the MeDIP signal obtained for the positive control, followed by AP3 (54%) and BCBL1 (51%).
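The genome-wide percentages above (88%, 54% and 51%) express each sample's MeDIP signal relative to the fully methylated positive control. The text does not state how the values were aggregated. The sketch below assumes the percentage is the ratio of summed background-corrected window values. The function name is hypothetical and the inputs are illustrative.

```python
def percent_of_positive_control(sample, positive):
    """Genome-wide MeDIP signal of a sample as % of the fully methylated control.

    Assumption: both inputs are background-corrected values for the same
    windows, and the genome-wide figure is a ratio of sums.
    """
    if len(sample) != len(positive):
        raise ValueError("sample and positive control must cover the same windows")
    total_positive = sum(positive)
    if total_positive <= 0:
        raise ValueError("positive control must have a positive total signal")
    return 100.0 * sum(sample) / total_positive


# Illustrative windows only (not data from this study).
print(percent_of_positive_control([0.5, 1.0, 0.25], [1.0, 2.0, 1.0]))  # 43.75

# Ratio of two reported genome-wide values: SLKp (9.6%) relative to BCBL1 (51%).
print(round(9.6 / 51, 2))  # 0.19
```

The second line uses values reported in the text, the SLKp level (9.6% of the positive control) and the BCBL1 level (51%). It gives a ratio of roughly 0.19, consistent with the "approximately 1/5th" comparison drawn in the text.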


Figure 2. Global DNA methylation patterns of latent KSHV genomes. Global DNA methylation patterns of KSHV genomes in PEL cells (HBL6, AP3 and BCBL1), long-term in vitro infected endothelial SLK cells (SLKp) or SLK cultures 5 days after de novo infection with KSHV (SLK-5dpi) were determined by MeDIP array analysis as described in the text. The profile observed for the positive control, consisting of a completely methylated KSHV bacmid mixed with cellular DNA, is also shown (BacM). CpG methylation values are shown on the y-axis for overlapping 250 bp sequence windows, shifted along the KSHV genome in increments of 100 bp. Methylation values of individual windows represent the mean of background-corrected methylation values from all probes matching either strand of the window (see Material & Methods for details). The number of CpG dinucleotides present in each sequence window is shown at the top. The nucleotide positions and genome map shown at the bottom of each panel refer to the reference KSHV sequence (NC_009333). Open reading frames and repeat regions are indicated as block arrows and grey boxes, respectively. doi:10.1371/journal.ppat.1000935.g002
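The windowing behind Figure 2 can be sketched as follows. Windows are 250 bp long and advance in 100 bp steps. Each window is scored as the mean of all probes from either strand that fall into it, and its CpG dinucleotides are counted. The rule that a probe "matches" a window when its midpoint lies inside the window is an assumption made for illustration, as are the function names.

```python
from statistics import mean


def window_starts(genome_len, size=250, step=100):
    """1-based start coordinates of windows that fit entirely inside the genome."""
    return list(range(1, genome_len - size + 2, step))


def window_means(probes, genome_len, size=250, step=100):
    """Mean probe value per window.

    probes: (midpoint, value) pairs pooled from both strands. A probe is
    assigned to every window containing its midpoint (assumption).
    Windows without any probe get None.
    """
    result = []
    for start in window_starts(genome_len, size, step):
        end = start + size - 1
        values = [v for pos, v in probes if start <= pos <= end]
        result.append((start, mean(values) if values else None))
    return result


def cpg_counts(seq, size=250, step=100):
    """Number of CpG dinucleotides contained in each window of the sequence."""
    seq = seq.upper()
    return [seq[s - 1:s - 1 + size].count("CG")
            for s in window_starts(len(seq), size, step)]


probes = [(50, 1.0), (150, 3.0), (260, 5.0)]   # hypothetical probe values
print(window_means(probes, 600))
# [(1, 2.0), (101, 4.0), (201, 5.0), (301, None)]
print(cpg_counts("CG" * 300))  # [125, 125, 125, 125]
```

Because the windows overlap by 150 bp, each probe contributes to up to three adjacent windows. This smooths the profile without discarding positional resolution below the 100 bp step.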

To investigate whether the observed methylation profiles were specific to PEL cell lines or a general feature of latent genomes, we next analyzed genomes from cells which had established stable latency after KSHV infection in vitro. Infection of non-adherent cells (including B cells) with KSHV in vitro is very inefficient, but a wide variety of adherent cells can be readily infected by incubating the cultures with supernatants from lytically induced PEL lines [38]. However, although KSHV rapidly adopts a latent expression profile in these cultures, most infected cells tend to lose the viral episomes over the following cell divisions [39]. Only a small percentage of cells ultimately succeeds in establishing stable latent episomes, which are then propagated with the same efficiency as the genomes in PEL cells. We previously established the SLKp sub-line from in vitro infected SLK cells, a cell line of endothelial origin [39]. The SLKp line was generated by pooling seven KSHV-positive single cell clones which had been isolated from an infected bulk culture at approximately 65 days post infection. SLKp cells are stably infected, carry approximately the same episome copy number as BCBL1 cells (30–40 copies/cell), and have a strictly latent expression profile [39]. We analyzed SLKp cells which had been in continuous culture for 6 months, corresponding to a total time span of approximately 8 months after the original infection. The overall methylation levels in SLKp cells were substantially lower, reaching approximately 9.6% of positive control levels (note the differentially scaled y-axis in Figure 2). Nevertheless, the observed profile was highly similar to that seen in PEL cells (Figure 2, red graph), with the highest degree of similarity to BCBL1 cells (correlation coefficient 0.712, see Table 1). Taken together, these results suggest that the distinct MeDIP profiles revealed by our analysis are non-random and represent a characteristic of latent KSHV episomes.
To confirm that the relative MeDIP values from our microarray-based analysis accurately measure CpG methylation levels, we investigated a number of loci using independent methods. First, based on our analysis of BCBL1 cells, we chose 3 loci which had registered as strongly methylated, and another 3 for which our initial analysis had suggested the absence of DNA methylation. As shown in Figure 3A, bulk bisulfite sequencing established near-complete methylation at the former and absence of methylation at the latter loci. We selected one of the loci (labeled 1 in Figure 3A) which had shown differential methylation in PEL and SLKp cells for further analysis. Figure 3B shows an enlarged representation of the corresponding section of the KSHV genome, along with the original MeDIP array data. We amplified an approximately 100 bp segment at the center of the region by quantitative real-time PCR (black bar labeled "qPCR" in Figure 3B). This confirmed the overall lower degree of methylation in SLKp cells and suggested an intermediate degree of methylation in AP3 cells, in accordance with the array data for this position (Figure 3C). As discussed later, we also investigated de novo infected SLK cells at 5 days post infection (SLK-5dpi), which showed very little evidence of methylation. Next, we PCR-amplified bisulfite converted total DNA from all samples and digested the amplified region with the restriction enzyme TaqI (combined bisulfite restriction analysis, COBRA). The recognition sequence of TaqI contains a CpG motif, and only methylated cytosine residues are preserved during bisulfite conversion. Absence of DNA methylation at the restriction site therefore leads to TaqI resistance. As shown in Figure 3D, unmethylated bacmid DNA and DNA isolated from KSHV virions were completely unmethylated and hence resistant to TaqI restriction. Likewise, DNA from freshly infected SLK cells remained intact. In contrast, the amplification products from the in vitro methylated bacmid and from BCBL1 and HBL6 cells were completely cleaved, consistent with methylation of all 4 TaqI sites. The products from AP3 and SLKp cells were incompletely digested, indicating an intermediate level of methylation. The SLKp sample contained a significant amount of undigested product, suggesting that the material represents a mixture of methylated and unmethylated DNA, presumably due to clonal differences among the original single cell clones. We therefore determined the sequence of bisulfite converted DNA from two individual SLKp clones, and subjected the remaining samples to bulk sequencing. The results of this analysis, shown in Figure 3E, fully agree with our COBRA and MeDIP array results. Indeed, the two SLKp clones showed different methylation patterns at this locus, which explains the observed restriction patterns. We hence conclude that our array analysis accurately reflects CpG methylation within the KSHV genome.
As noted before, the global methylation patterns in PEL and SLKp cells were highly similar: if a locus was methylated in one of the samples, it tended to be methylated in the others as well. Only very few loci were methylated in a single sample alone. Interestingly, the most prominent differentially methylated locus was a region encompassing nts. 70500 to 71700, which includes the promoter governing expression of ORF50/Rta. Our analysis suggested profound methylation in the HBL6 line, but very little or no methylation in AP3, BCBL1 and SLKp cells (see Figure 2, second panel, and the enlarged depiction of the ORF50 locus in Figure 4A). This was surprising, as the ORF50 promoter has previously been reported to be abundantly methylated in BCBL1 cells [26]. To confirm our results, we performed bulk bisulfite sequencing of the region extending from the transcriptional start of ORF50 to a position approximately 1100 bp upstream (nts. 70597 to 71681). DNA was isolated from AP3, BCBL1, HBL6, SLK-5dpi and SLKp cells (Figure 4B).

Table 1. Pearson correlation coefficients of DNA methylation patterns.

| MeDIP | CpG frequency | BacM | BCBL1 | AP3 | HBL6 | SLKp |
|---|---|---|---|---|---|---|
| BacM | 0.513 | | | | | |
| BCBL1 | 0.324 | 0.427 | | | | |
| AP3 | 0.263 | 0.407 | 0.608 | | | |
| HBL6 | 0.369 | 0.591 | 0.593 | 0.724 | | |
| SLKp | 0.235 | 0.300 | 0.712 | 0.403 | 0.433 | |
| SLK-5dpi | 0.297 | 0.092 | 0.549 | 0.334 | 0.266 | 0.653 |

Note: correlation coefficients were calculated according to Pearson from the data shown in Figure 2. All data points are given in Dataset S1. doi:10.1371/journal.ppat.1000935.t001
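The COBRA readout described above rests on simple sequence logic. After bisulfite treatment and PCR, unmethylated cytosines read as thymine, so a TaqI site (TCGA) survives only where its CpG cytosine was methylated. The sketch below simulates this on one strand. It works under simplifying assumptions: complete conversion, methylation only in CpG context, and the top strand only. The amplicon sequence and function names are made up for illustration.

```python
import re


def bisulfite_convert(seq, methylated):
    """In silico bisulfite conversion followed by PCR (top strand only).

    seq: DNA sequence; methylated: set of 0-based indices of methylated
    cytosines. Methylation is honoured only in CpG context; every other
    C becomes T (assumes complete conversion).
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        is_cpg = seq[i + 1:i + 2] == "G"
        if base == "C" and not (is_cpg and i in methylated):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def taqi_sites(seq):
    """0-based start positions of TaqI recognition sites (TCGA)."""
    return [m.start() for m in re.finditer("(?=TCGA)", seq)]


locus = "ATCGATTTCGAA"  # hypothetical amplicon with two TaqI sites
print(taqi_sites(locus))                             # [1, 7]
print(taqi_sites(bisulfite_convert(locus, {2, 8})))  # [1, 7]  fully methylated
print(taqi_sites(bisulfite_convert(locus, {2})))     # [1]     partially methylated
print(taqi_sites(bisulfite_convert(locus, set())))   # []      unmethylated
```

A mixed population of molecules, as in the SLKp sample, would yield a mixture of these digestion outcomes, which appears as incomplete cleavage on a gel.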


Figure 3. Verification of MeDIP microarray results. Bisulfite sequencing (BS), COBRA analysis and real-time qPCR were used to confirm KSHV DNA methylation profiles at select loci. A: Three loci for which our MeDIP analysis had indicated profound methylation, and three loci which were predicted to be unmethylated, were analyzed by bisulfite sequencing of BCBL1-derived DNA. The global BCBL1 MeDIP methylation profile and the location of sequenced regions are shown for reference at the top. The results of the bisulfite sequencing are shown underneath, where closed and open circles indicate methylated and unmethylated CpG motifs, respectively. The nucleotide positions indicate the position of the first and the last CpG motifs within the KSHV reference genome (NC_009333). B–E: Confirmation of DNA methylation profiles at the genomic ORF23 locus in PEL cells (HBL6, AP3 and BCBL1), long-term in vitro infected endothelial SLK cells (SLKp), SLK cultures 5 days after de novo infection (SLK-5dpi), in vitro methylated or unmethylated KSHV bacmids (BacM and Bac, respectively), and virion DNA. The methylation profiles of the samples investigated by MeDIP are shown in B. Black lines indicate the regions for which COBRA analysis and bisulfite sequencing of genomic DNA, or real-time qPCR of MeDIP samples, were performed. C: Real-time qPCR was performed to quantify immunoprecipitated DNA from three independent MeDIP experiments. Values were calculated as percent of the input and were normalized to an internal control consisting of methylated plasmid DNA (pCR2.1) spiked into each sample prior to MeDIP. D+E: The region indicated in B was PCR-amplified from bisulfite converted DNA and subjected to a COBRA assay (D) or bisulfite sequencing (E). Cleavage of bisulfite converted DNA at the TaqI sites indicated by arrows requires methylation of the corresponding CpG motif.
The CpG profiles as shown in E were determined by bulk sequencing reactions except for the samples labeled SLKp #1 and #2, which represent two individual clones from the SLKp line. doi:10.1371/journal.ppat.1000935.g003
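Figure 3C reports qPCR signals as percent of input, normalized to a methylated plasmid spiked into each sample. A common way to compute such values is sketched below. It assumes 100% PCR efficiency (one doubling per cycle) and an input aliquot that is a known fraction of the starting material. It also assumes normalization by dividing by the spike-in's percent input. These are assumptions rather than the authors' stated formulas, and the function names are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of starting material recovered in the immunoprecipitate.

    input_fraction: share of the starting DNA kept as input (e.g. 0.1).
    The input Ct is first adjusted to represent 100% of the material,
    assuming perfect doubling per PCR cycle.
    """
    ct_input_full = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_full - ct_ip)


def spike_normalized(target, spike, input_fraction):
    """Target percent input divided by spike-in percent input.

    target, spike: (Ct of MeDIP sample, Ct of input) pairs.
    """
    return (percent_input(*target, input_fraction)
            / percent_input(*spike, input_fraction))


print(round(percent_input(26.0, 25.0, 0.1), 3))                     # 5.0
print(round(spike_normalized((26.0, 25.0), (25.0, 25.0), 0.1), 3))  # 0.5
```

Dividing by the spike-in recovery cancels differences in immunoprecipitation efficiency between tubes, which is presumably why a methylated plasmid was added to each sample before MeDIP.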

Additionally, we subjected the two overlapping amplification products from BCBL1, SLKp and HBL6 cultures to COBRA analysis (Figure 4C). The results clearly confirmed our MeDIP results and revealed near complete CpG methylation of the ORF50 promoter region in the HBL6 line, but no or only sporadic PLoS Pathogens | www.plospathogens.org

methylation in all other cells. We currently do not know the reason for the different BCBL1 methylation patterns detected in our study and that performed by Chen et al. [26]. It is possible that Chen and colleagues employed a different sub-clone of the BCBL1 line, or that the lines may have diverged while being cultured in the two 6

June 2010 | Volume 6 | Issue 6 | e1000935

Epigenetic Landscape of Latent KSHV Genomes

Figure 4. DNA Methylation at the ORF50 promoter. A: Methylation profiles of KSHV infected cells at the ORF50 promoter (see legend to Figure 3 for abbreviations). The region investigated by COBRA and bisulfite sequencing is indicated by the black bar above and hashed lines underneath the graph. B: Results of bisulfite sequencing of genomic DNA from BCBL1, HBL6 or SLKp cells. Closed and open circles indicate methylated and unmethylated CpG motifs, respectively. Numbers above each circle indicate the position of the motif relative to the ORF50 transcriptional start. The nucleotide positions shown underneath indicate the position of the first and the last CpG motifs within the KSHV reference genome (NC_009333). The position of TaqI restriction sites in bisulfite converted DNA is indicated by arrows (conservation of the sites requires methylation of the corresponding CpG motif). The black bars labeled ‘‘Fragment I’’ and ‘‘II’’ represent the two overlapping PCR fragments which were amplified and sequenced, and which were further analyzed by COBRA as shown in C. doi:10.1371/journal.ppat.1000935.g004

labs. The methylation patterns we detected in HBL6 cells are very similar (but not identical) to those described by Chen et al., and thus an agreement between both studies is that such patterns can principally evolve in PEL cells. However, regardless of the reasons for the different findings, our results clearly show that methylation of the ORF50 promoter is not a principal requirement for the maintenance of latency in PEL cells or in vitro infected endothelial cells lines. As the majority of cells in PEL and KS tumors are latently infected with KSHV, our findings may also provide an PLoS Pathogens | www.plospathogens.org

alternative explanation for the observation that the ORF50 promoter was found to be not or only poorly methylated in the majority of clones derived from such tissues [26]. SLKp cells exhibited global methylation patterns which were near-identical to those seen in the BCBL1 line, but were characterized by a significantly lower absolute level of DNA methylation (approximately 1/5th of that seen in BCBL1 cells). This observation suggested to us that DNA methylation of KSHV episomes may progress slowly over time; hence the lower overall 7

June 2010 | Volume 6 | Issue 6 | e1000935

Epigenetic Landscape of Latent KSHV Genomes

[40] after host cell entry, and the deposition of histone modifications is therefore expected to represent a much more dynamic process than DNA methylation. To investigate this hypothesis, we performed chromatin immunoprecipitation (ChIP) experiments from BCBL1, SLKp cells or de novo infected SLK cultures and analyzed the precipitated DNA on our tiled microarrays using standard ChIP-on-chip protocols (see Material & Methods for details). First, we investigated the distribution of two modifications which are commonly associated with active chromatin, using antibodies which are specific for Histone H3 acetylated at lysine 9 and/or 14 (H3K9/K14-ac), or H3 molecules which are tri-methylated at lysine K4 (H3K4-me3). As shown in Figure 6 (see also Figure S2 for a more detailed view), in BCBL1 as well as SLKp cells we observed global modification patterns which were highly similar when comparing any pairwise combination of either histone modification or cell line, with correlation coefficients ranging from 0.709 to 0.894 (Table 2). Furthermore, investigation of H3K4-me3 patterns in SLK-5dpi cultures showed that these patterns were indeed already fully established 5 days after de novo infection. Comparison with the previously observed CpG methylation patterns revealed a marked negative correlation between these histone modifications and DNA methylation, as most of the regions which had been found to be poorly methylated in BCBL1 or SLKp cells compared to the positive control showed abundant deposition of active histone marks. In accordance with the overall higher degree of DNA methylation, this negative correlation was most obvious in BCBL1 cells (Pearson correlation coefficient = 20.530, see Table 2), but could also be clearly observed in SLKp cells (correlation coefficient = 20.263). 
Interestingly, while the highest density of CpG motifs within the KSHV genome is found at the terminal repeats (TR, see rightmost region of the KSHV map), this is also the region which showed the highest levels of H3K9/K14-ac and H3K4-me3 enrichment. The latter is in agreement with the observation that, in spite of the high

extend of methylation would be a result of the comparatively short period of time (approx. 8 months, see above) that has elapsed since the SLKp cells were originally infected. We therefore analyzed SLK cultures which had been freshly infected with KSHV. We choose a time point of 5 days post-infection for our analysis; at this time point, the cultures have adopted latent expression patterns and sporadic lytic cells are found only at very low frequency (,0.01%) [38,39]. Quantitative RT-PCR (Figure 5A) and immunofluorescence analysis (Figure 5B) confirmed efficient infection and absence of lytic gene expression (note that the relatively high basal levels of lytic ORF50 and ORF59 transcripts in latent BCBL1 cultures (Figure 5A, top panel) can be significantly upregulated by lytic cycle induction (bottom panel); they stem from the small number (approximately 0.3%, see Figure S4) of spontaneously reactivating cells present in uninduced BCBL1 cultures). Indeed, our array-based MeDIP analysis of global DNA methylation patterns of SLK-5dpi cultures revealed very little DNA methylation at this early time point of infection (see graphs labeled SLK-5dpi in Figures 2, 3B and 4A), reaching, on average, less than 1% of the levels observed for the positive control. DNA methylation was also virtually absent from the ORF50 promoter, in spite of the fact that the infected cultures had established a strictly latent infection. These findings support our hypothesis that, although DNA methylation may reinforce latent gene expression patterns at late timepoints of infection, ORF50 promoter methylation is principally not required to abolish or prevent Rta expression during KSHV latency.

Histone modification patterns of latent KSHV episomes Given the absence of DNA methylation in our SLK-5dpi samples, we deemed it unlikely that this modification governs early KSHV latency expression patterns, and hypothesized that such patterns might rather be governed by histone modifications. Herpesvirus genomes are known to become rapidly chromatinized

Figure 5. Latent KSHV expression patterns of SLKP and de novo infected SLK cells. A: Expression of select latent (ORF71, ORF73) and lytic (ORF50, ORF59) transcripts was analyzed by quantitative RT-PCR in BCBL1 cells, long-term infected SLKP cells and de novo infected SLK cultures at day 5 post infection (SLK-5dpi). Levels were normalized to represent expression relative to ORF73, which is expressed during the latent as well as the lytic cycle. Compared to BCBL1, SLK-5dpi cells show little expression of lytic antigens, and expression was undetectable in SLKp cells. The detection of lytic transcripts in latent BCBL1 cultures is due to the low percentage (less than 1%) of cells which undergo spontaneous lytic reactivation. The percentage of lytic cells and thus transcript levels can be increased by treatment such as sodium butyrate (lower panel). Spontaneously reactivating cells are completely absent from SLKp cells, which is in accordance with the lack of detectable lytic gene expression. B: Immunofluorescence staining of SLK5dpi cultures for LANA, the product of ORF73. DAPI staining is shown in the lower panel. More than 90% of the cells tested positive for LANA, while expression of the lytic DNA polymerase processivity factor encoded by ORF59 could be detected in less than 0.01% of cells (compare also left column in Figure 9F). doi:10.1371/journal.ppat.1000935.g005

PLoS Pathogens | www.plospathogens.org

June 2010 | Volume 6 | Issue 6 | e1000935

Epigenetic Landscape of Latent KSHV Genomes

Figure 6. Global patterns of H3K9/K14 Acetylation and H3K4 tri-methylation on latent KSHV genomes. Global patterns of H3K9/K14 Acetylation (H3K9/K14-ac) of KSHV genomes in BCBL1 and SLKp cells, as well as H3K4 tri-methylation (H3K4-me3) patterns in BCBL1, SLKp and SLK cultures at 5 days post infection (SLK-5dpi) were analyzed by ChIP-on-chip assays as described in the text. Values shown on the y-axis represent relative enrichment of normalized signals from the immunoprecipitated material over input, calculated for overlapping sequence windows of 250 bp by averaging the values from all matching probes, as described in the legend to Figure 1 and the Material & Methods section. See legend to Figure 1 for explanation of map elements displayed at the bottom of each panel. doi:10.1371/journal.ppat.1000935.g006

DNA methylation over the ensuing cell divisions, ultimately leading to the establishment of the global methylation patterns as shown in Figure 2. While it may account for the evolution of the observed DNA methylation patterns, the distribution of H3K9/K14-ac and H3K4-me3 modifications provides no immediate explanation for

number of potential methylation sites, MeDIP signals were absent from this region (Figure 2). In fact, in bulk bisulfite sequencing reactions, we were unable to identify any DNA methylation within the terminal repeats in SLKp or BCBL1 cells (data not shown). Hence, our data indicate that local deposition of active histone marks early during KSHV infection prevents the acquisition of


Table 2. Pearson correlation coefficients of DNA methylation and histone modification patterns.

Columns, in order (each row lists coefficients only for the columns to its left, i.e. the lower triangle of the matrix):
MeDIP BCBL1, MeDIP SLKp, MeDIP SLK-5dpi | H3K9/K14-ac BCBL1, H3K9/K14-ac SLKp | H3K4-me3 BCBL1, H3K4-me3 SLKp, H3K4-me3 SLK-5dpi | H3K27-me3 BCBL1, H3K27-me3 SLKp, H3K27-me3 SLK-5dpi | H3K9-me3 BCBL1, H3K9-me3 SLKp

H3K9/K14-ac BCBL1: -0.539, -0.321, -0.292
H3K9/K14-ac SLKp: -0.384, -0.246, -0.189 | 0.848
H3K4-me3 BCBL1: -0.530, -0.341, -0.262 | 0.894, 0.801
H3K4-me3 SLKp: -0.263, -0.180, -0.095 | 0.709, 0.824 | 0.772
H3K4-me3 SLK-5dpi: -0.291, -0.119, -0.079 | 0.660, 0.822 | 0.676, 0.794
H3K27-me3 BCBL1: 0.107, 0.016, -0.013 | -0.459, -0.565 | -0.450, -0.428, -0.528
H3K27-me3 SLKp: 0.458, 0.210, 0.141 | -0.430, -0.508 | -0.376, -0.406, -0.482 | 0.619
H3K27-me3 SLK-5dpi: 0.414, 0.233, 0.183 | -0.501, -0.613 | -0.449, -0.415, -0.477 | 0.593, 0.812
H3K9-me3 BCBL1: 0.663, 0.413, 0.303 | -0.379, -0.278 | -0.267, -0.131, -0.203 | -0.093, 0.412, 0.331
H3K9-me3 SLKp: 0.406, 0.397, 0.230 | -0.263, -0.206 | -0.151, -0.069, -0.095 | -0.055, 0.218, 0.215 | 0.615
H3K9-me3 SLK-5dpi: 0.081, 0.107, 0.197 | -0.061, -0.030 | 0.024, 0.123, 0.269 | -0.009, 0.078, 0.302 | 0.130, 0.185

Note: correlation coefficients were calculated according to Pearson from the data shown in Figures 2, 6 and 7. All data points are given in Dataset S1. doi:10.1371/journal.ppat.1000935.t002
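The coefficients in Table 2 are Pearson correlations between genome-wide window profiles. A minimal sketch of that calculation for two profiles, assuming both are lists of window averages in the same genomic order and that windows lacking a value in either profile (None) are skipped; the skipping rule, function name and toy values are assumptions, not taken from the source:

```python
import math
from typing import Optional, Sequence


def pearson(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Pearson correlation coefficient r between two aligned window profiles."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two windows with values in both profiles")
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)


# Toy profiles: an activating mark and a largely anti-correlated repressive mark.
activating = [0.2, 1.5, 2.1, None, 0.3]
repressive = [1.8, 0.4, 0.1, 0.9, 1.6]
print(round(pearson(activating, repressive), 3))
```

A negative r, as between H3K27-me3 and the activating marks in Table 2, indicates that windows enriched for one modification tend to be depleted of the other.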

the establishment of latent expression profiles, as these marks were present on many loci that are transcriptionally inactive during latency. Notably, this also includes the ORF50 promoter, a finding that is in accordance with the absence of DNA methylation at this location in BCBL1 as well as SLKp cells. We therefore reasoned that latency may be determined by the presence of repressive marks rather than the absence of activating ones, and analyzed two modifications commonly associated with silent chromatin: tri-methylation of lysine 9 of histone H3 (H3K9-me3), a hallmark of constitutive heterochromatin, and tri-methylation of lysine 27 (H3K27-me3), a modification typically seen in facultative heterochromatin. As shown in Figure 7 (lower graphs in each panel; see also Figure S3 for a more detailed view), in both BCBL1 and SLKp cells the H3K9-me3 modification was mainly restricted to two distinct regions of the viral genome, spanning approximately nt 33,000 to 46,000 (ORF19-ORF25) and nt 100,400 to 114,400 (ORF64-ORF67). Both of these regions had shown relatively poor occupancy with acetylated histones in our previous assay (see Figure 6), in agreement with the fact that these modifications are in general mutually exclusive. While the H3K9-me3 modification was most prominently detected in BCBL1 cells, SLKp cells displayed a markedly less distinct pattern, and the modification was barely detectable in SLK-5dpi cultures. The ORF50 promoter was devoid of tri-methylated H3K9 in all samples. Hence, H3K9-me3 is unlikely to be a major regulator of latent gene expression, at least not in the early phase of infection when latency is first established. Our findings are in agreement with a previous study that investigated a number of select loci in ChIP experiments and found little to no H3K9 methylation at any of them [41]. Next, we analyzed the distribution of the H3K27-me3 modification across the viral genome.
H3K27 tri-methylation is carried out by EZH2, the enzymatic subunit of the polycomb PRC2 complex, leading to the recruitment of polycomb PRC1 complexes and thus gene silencing [29,30,31,42,43,44]. Tri-methylation of H3K27 has been shown to play important roles in developmental and differentiation processes, cell cycle regulation, mammalian X chromosome inactivation, stem cell identity and cancer [31]. One characteristic of H3K27 methylation is that, in contrast to H3K9-me3, it can occupy promoters concurrently with activating modifications, specifically H3K4-me3 or H3K9-ac. Such regions are termed "bivalent" domains and have been found to be specifically enriched in embryonic stem cells, where they often occupy promoters that encode key factors involved in developmental regulation [42]. The presence of H3K27 methylation keeps these promoters silent in undifferentiated cells, but the chromatin remains in a "poised" state due to the simultaneous presence of activating marks. Decreasing levels of H3K27-me3 at the onset of differentiation allow such promoters to rapidly revert to an active state, thus further committing the cell to terminal differentiation [42]. As shown at the top of each panel in Figure 7, our analysis revealed that latent KSHV episomes in the BCBL1 and SLKp lines as well as in SLK-5dpi cells were subject to abundant H3K27 tri-methylation. The modification was detected across virtually the complete genome, although most regions that had been found enriched in H3K9/K14-ac and H3K4-me3 modifications tended to be tri-methylated at H3K27 to a lesser extent (compare with Figure 6, see also correlation coefficients in Table 1). A number of loci, however, displayed the hallmarks of bivalent chromatin, i.e. the simultaneous presence of H3K27-me3 and activating marks. Interestingly, the ORF50 promoter featured prominently among these regions, whereas the major latency promoter upstream of ORF73 showed very little or no H3K27 tri-methylation in all three samples. Importantly, in contrast to H3K9-me3, the H3K27-me3 patterns were already present 5 days after de novo infection of SLK cultures. This also includes the ORF50 promoter, and our data thus suggest that a poised state of repression is imposed upon the ORF50 promoter early during the establishment of latency.
Interestingly, two studies have recently found this modification to be present on herpes simplex virus genomes during latent infection in dorsal root or trigeminal ganglia [45,46]. Although only a small number of select promoters were investigated, this may indicate a general role for this modification during herpesvirus latency.


Figure 7. Global patterns of H3K27 and H3K9 tri-methylation on latent KSHV genomes. Global patterns of histone H3 tri-methylated at lysine 27 (H3K27-me3) or lysine 9 (H3K9-me3) on KSHV genomes in BCBL1 and SLKp cells as well as SLK cultures at 5 days post infection (SLK-5dpi) were analyzed by ChIP-on-chip assays as described in the text. Values shown on the y-axis represent relative enrichment of normalized signals from immunoprecipitated material over input, calculated for overlapping sequence windows of 250 bp by averaging the values from all matching probes, as described in the legend to Figure 1 and in the Materials and Methods section. See the legend to Figure 1 for an explanation of the map elements displayed at the bottom of each panel. doi:10.1371/journal.ppat.1000935.g007


As all of the KSHV-infected lines investigated here contain multiple copies of the viral episome, it is conceivable that the simultaneous detection of activating marks and the H3K27-me3 modification could be due to the existence of distinct, separate episome populations. We think this is very unlikely, as we have observed the same patterns in BCBL1 cells and in in vitro infected SLK cells. Thus, if distinct populations exist, they would have to be re-established in exactly the same stoichiometry upon de novo infection. However, to investigate the presence of bivalent histone modifications directly, we performed a sequential ChIP from BCBL1 cells, followed by qPCR amplification of two regions within the ORF50 promoter. As controls, we investigated the latent promoter upstream of ORF73 (which exclusively carries activating marks) and a region within the coding region of ORF21 (which is subject to the H3K27-me3 modification, but is devoid of H3K9/K14-ac and H3K4-me3 marks). The location of the amplified regions and their histone modification profiles are depicted in Figure 8A. As shown in Figure 8B, when the first round of immunoprecipitation was carried out with an antibody specific for H3K9/K14-ac, the sequential ChIP using an H3K27-me3-specific antibody recovered material only from the ORF50 promoter, but from neither of the two control regions. To confirm these results in the reverse direction, we also performed the first round of immunoprecipitation using the H3K27-me3-specific antibody and used an H3K4-me3 antibody for the second immunoprecipitation to probe for the presence of activating marks (Figure 8C). Again, while the ORF21 region was recovered in the first round of ChIP, only the ORF50-specific sequences registered in both immunoprecipitation experiments, thus demonstrating bivalent modification of this promoter. If H3K27-me3 contributes to the silencing of the ORF50 promoter, these marks should also diminish upon lytic cycle induction.
We therefore monitored the levels of H3K27-me3 during reactivation from latency. Indeed, sodium butyrate treatment of BCBL1 cells resulted in a progressive loss of H3K27-me3 at the ORF50 promoter, with a reduction to approximately 50%, 20% and 5% of the original levels after 24, 48 and 72 h of treatment, respectively (Figure 8D). While this observation suggested efficient removal of H3K27-me3, the magnitude of the effect at 48 h and 72 h post induction was surprising, given that the treatment reactivates only about 20% of all cells in the cultures (as judged by staining for the late gene product ORF59). A possible explanation is that the rapidly increasing numbers of replication products (which are epigenetically naive) exaggerate the effect at late time points, as they lead to a relative increase in the percentage of unmodified episomes within the cultures. However, the 24 h time point precedes the replication phase, and accumulation of newly synthesized/packaged genomes thus cannot be responsible for the H3K27-me3 decrease observed early after induction. To further substantiate this assumption, we also monitored H3K9/K14-ac levels at the ORF50 promoter. We reasoned that, if the above is correct, we should first see an increase in histone acetylation levels prior to the onset of DNA replication, followed by a decline as more and more replicated genomes accumulate. As shown in Figure 8D, this is precisely what we observed. Thus, upon lytic cycle induction, a reduction of H3K27-me3 and an increase of H3K9/K14-ac levels occur simultaneously at the ORF50 promoter and precede the DNA replication phase. To investigate whether a reduction of H3K27-me3 also increases the number of lytically reactivated cells in the absence of chemical inducers, we next generated BCBL1 and SLK cells that were stably transduced with a retrovirus expressing the H3K27-me3-specific demethylase JMJD3 [47].
After antibiotic selection of the cultures for 12 days, the SLK cells were additionally infected with KSHV and analyzed 5 days later. In both lines, although the ectopically expressed JMJD3 protein was barely detectable on western blots (data not shown), we nevertheless observed a reduction of total cellular H3K27-me3 levels of at least 50% (Figure 9A). The reduction was less pronounced at the ORF50 promoter, which still exhibited about 70% and 80% of the H3K27-me3 levels seen in the vector controls of BCBL1 and SLK-5dpi cells, respectively (Figure 9B). However, as shown in Figure 9C, despite the moderate degree of this reduction, both cultures showed a marked increase in the overall levels of ORF50 transcription, which reached approximately twice the levels of the control cultures. When we stained the JMJD3-transduced BCBL1 cultures for the late gene product ORF59 (Figure 9D), we furthermore observed a twofold increase in the percentage of spontaneously reactivated cells (~0.6%, compared to 0.3% in the control cultures). In addition, the JMJD3-transduced cells were also more responsive to lytic cycle induction by sodium butyrate treatment, resulting in the reactivation of 30% of the cells (compared to approximately 20% in the control cultures; Figure 9E). In comparison to PEL lines, SLK cells as well as most other de novo infected adherent cell lines exhibit an extremely low percentage of spontaneously reactivated cells [38,39]. Such cells also do not respond to chemical inducers that reactivate PEL cells (e.g. phorbol esters), although the lytic cycle can be induced by ectopic Rta/ORF50 overexpression [38]. Despite the low frequency of such events, in addition to the elevated ORF50 transcript levels we also observed an approximately threefold increase in the number of spontaneously reactivated cells in JMJD3-transduced SLK-5dpi cultures (Figure 9F).
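The earlier argument that newly replicated, epigenetically naive genomes exaggerate the apparent loss of H3K27-me3 at 48 h and 72 h after sodium butyrate treatment can be made concrete with a deliberately simplified dilution model. This is an illustrative sketch only, not an analysis from the study. It assumes that the ChIP signal is proportional to the fraction of episomes carrying the mark, that marks on parental episomes are left untouched, that non-reactivated cells keep a constant episome number, and that each reactivated cell multiplies its episome content by a hypothetical amplification factor with all new copies unmodified. The 20% reactivated fraction is taken from the text; the function names are hypothetical.

```python
def remaining_mark_fraction(reactivated: float, amplification: float) -> float:
    """Marked fraction of all episomes, relative to the pre-induction state.

    reactivated   -- fraction of cells that entered the lytic cycle (0..1)
    amplification -- episomes per reactivated cell divided by the number
                     before induction (1.0 means no replication yet)
    Parental episomes stay marked; new copies are assumed unmodified.
    """
    marked = 1.0  # all parental episomes, normalized to one unit per cell
    total = (1.0 - reactivated) + reactivated * amplification
    return marked / total


def amplification_needed(reactivated: float, target_fraction: float) -> float:
    """Amplification at which dilution alone would reach target_fraction."""
    return (1.0 / target_fraction - (1.0 - reactivated)) / reactivated


# Before replication (amplification 1.0) dilution causes no loss at all,
# so a decrease seen at that stage must reflect active removal of the mark.
print(remaining_mark_fraction(0.2, 1.0))
for target in (0.5, 0.2, 0.05):
    print(target, round(amplification_needed(0.2, target), 1))
```

Under these assumptions, dilution alone would require roughly 6-, 21- and 96-fold amplification in the reactivated 20% of cells to mimic readings of 50%, 20% and 5%; the model says nothing about whether such amplification actually occurs in BCBL1 cultures.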
Taken together, the above data suggest an important role for the H3K27-me3 histone modification during latent infection with KSHV. While our analysis has revealed highly distinct patterns of DNA and histone modifications, the question remains which factors determine such patterns. The molecular requirements for the recruitment of PRC2/EZH2 methyltransferase complexes in mammals are poorly understood, and so far no simple sequence motifs recognized by these complexes have been described. The results from our early infected SLK cultures would seem to indicate that PRC2/EZH2 complexes are recruited to KSHV genomes in a more global fashion. However, it is also possible that the modification is initially established at a small number of loci and rapidly spreads to neighboring regions during the earliest stages of a latent infection. Whatever the mode of deposition, loci with lower levels of H3K27-me3 are ultimately confined to those regions which carry activating H3K9/K14-ac and H3K4-me3 marks which, clearly, are present not only on latency promoters. Two independent studies have noted the presence of such marks in latent cultures before, using ChIP in conjunction with either PCR for a number of select loci, or a global promoter microarray [24,41]. Overall, the data from our high-resolution tiling arrays are in very good agreement with the findings reported in both studies. However, while Ellison and colleagues hypothesized that the detection of these marks may have been due to a low percentage of cells which undergo spontaneous reactivation, our study clearly shows that they are a hallmark of latent episomes: First and foremost, the patterns were not only detected in BCBL1 cells, but also in SLKp cultures, which are strictly latent and do not harbor any spontaneously reactivated cells. 
Second, for a low percentage of reactivated cells to leave a prominent footprint in the histone modification profile of the total population, one has to assume that they contain a disproportionately high number of episomes that carry the lytic marks. Given the high copy number


Figure 8. Bivalent histone modification patterns at the ORF50 promoter are reversed upon induction of the lytic cycle. A: Profiles of H3K9/K14-ac (blue), H3K4-me3 (red) and H3K27-me3 (black) histone modifications at the ORF21 (left), ORF50 (center) and ORF73 (right) loci of BCBL1 cells. Black bars indicate the location of regions amplified by quantitative PCR in the sequential ChIP and lytic reactivation experiments shown in B–D. B, C: Sequential ChIP experiments carried out with antibodies directed against H3K9/K14-ac and H3K27-me3 during the first and second rounds of immunoprecipitation, respectively (B), or with antibodies against H3K27-me3 during the first ChIP, followed by H3K4-me3-specific antibodies for the second immunoprecipitation (C). For both rounds of immunoprecipitation, numbers on the y-axis indicate the percentage of recovered material relative to the total starting material (i.e., the amount of DNA that was used as the input for the first ChIP). D: Reversal of H3K27-me3 marks at the ORF50 promoter upon lytic reactivation. BCBL1 cells were treated with 0.3 mM sodium butyrate to induce the lytic cycle. ChIP experiments were performed at the indicated time points to monitor changes in H3K27-me3 and H3K9/K14-ac modification patterns, using quantitative PCR with primers specific for the p50 −800 region as shown in A. doi:10.1371/journal.ppat.1000935.g008

methylation patterns observed in our study show a marked negative correlation with the activating histone marks thus strongly argues for their presence on latent episomes. So what signals trigger the initial recruitment of activating marks at the earliest time points of infection? While this is, ultimately, an issue that will have to be resolved in future studies,

of de novo replicated genomes in reactivated cells, this seems a reasonable assumption (provided that the lytically replicated genomes inherit the parental modification patterns, and that histones are removed only immediately before packaging). However, this does not apply to DNA methylation (which is absent from replicated virion DNA). The fact that the global CpG


at these sites. The onset of widespread H3K27 methylation may then lead to the silencing of the ORF50 promoter, establishing a poised state of repression that can be easily reverted upon reactivation. However, while this model may explain the initial deposition of activating marks, it does not provide a satisfactory explanation for their maintenance during the later stages of viral latency. So far, there is very little evidence for the propagation of activating histone marks through cell divisions; rather, it is thought that their preservation requires continuous transcriptional initiation. In contrast to DNA methylation or polycomb-associated H3K27-me3 marks, H3K9/K14-ac and H3K4-me3 modifications are therefore not considered heritable (and therefore, in a strict sense, do not represent epigenetic modifications). Thus, even if Rta is indeed responsible for the initial establishment of H3K27-me3 and H3K9/K14-ac marks, it cannot be responsible for their long-term maintenance, because its expression is rapidly extinguished once latent expression patterns are established. One possible explanation is that these loci represent preferred binding sites not only for Rta, but also for constitutively expressed host transcription factors. In this scenario, host factors could sustain the poised state of repression at H3K27-me3-enriched promoters, but additional stimuli would be required to return them to an active state. There are, however, also a few loci that are rich in H3K9/K14-ac and H3K4-me3 marks but display very little H3K27 tri-methylation or DNA methylation. These include not only the constitutively active latency promoter upstream of ORF73 and the locus encoding the KSHV miRNA cluster, but also ORFs K5/K6/K7/nut-1, the region upstream of three of the four vIRFs (vIRF-1/K9, vIRF-3 and vIRF-4) and the complete K15 gene region at the right end of the viral genome.
vIRF-3 (also termed latency-associated nuclear antigen 2, LANA2) is known to be expressed in latent PEL cells [50], and K15 (which encodes the latency-associated membrane protein, LAMP) was originally identified as a latently expressed gene, although its expression is significantly upregulated during the lytic cycle [51]. Since the region immediately upstream of the K1 gene at the left end of the KSHV genome as well as the terminal repeats (shown only to the right of the map in Figures 2, 5 and 6; as the episome is circular, however, they also flank the left terminus of the genome) are also highly enriched in activating marks but display very little H3K27-me3, our data also support a previous report of K1 expression in latently infected cells [52]. However, as K1 transcription is strongly upregulated in lytic cells and LANA has been found to repress K1 gene expression [53], whether K1 is indeed expressed at significant levels during latency is currently unclear. Interestingly, the region upstream of the K2 gene also displayed a high ratio of activating vs. repressive marks in BCBL1 cells (less pronounced in SLKp or SLK-5dpi cells). K2 encodes v-IL6, a viral homologue of IL6 that supports B cell growth and blocks interferon responses [54,55]. While v-IL6 has been reported to be expressed in latently infected PEL cells [54,56], its expression, like that of K1, is strongly upregulated by lytic cycle induction, and it thus remains controversial whether it represents a latent gene. Although latent transcripts of unknown promoter origin have also been identified in the broader region encompassing ORFs K4 to K7 [57], most of the remaining genes have so far not emerged as latently expressed in experimental systems [58]. If continuous transcriptional initiation is required to maintain H3K9/K14-ac and H3K4-me3 marks, then additional factors may exist that stall RNA polymerase II at these loci.
Alternatively, it is possible that the above genes are transcribed during latency only at low levels, or that their mRNAs are rapidly turned over such that they do not accumulate. Transactivation by Rta as well as transcript stabilization by other lytic gene products (e.g. the

Figure 9. Consequences of JMJD3 expression in BCBL1 and de novo infected SLK cells. A: Reduction of global H3K27-me3 levels in BCBL1 (left) or SLK cells (right) after 2 weeks of transduction with a JMJD3-expressing retrovirus (right lanes in each panel) or with an empty control virus (left lanes). Western blots were simultaneously stained with antibodies specific for H3K27-me3 and actin. B–F: Analysis of JMJD3-transduced BCBL1 cells, as well as JMJD3-transduced SLK cells after 5 days of infection with KSHV. B: H3K27-me3 status of the ORF50 promoter, as judged by ChIP analysis followed by real-time qPCR with primers amplifying the p50 −800 region shown in Figure 8A. Values are shown as relative levels in JMJD3-transduced compared to control cells, which were set to 100%. C: ORF50 transcription as judged by quantitative PCR. Values are given as fold transcript levels in JMJD3-transduced cells compared to control cultures (set to 1). D and F: Percentage of spontaneously reactivating cells (as judged by immunofluorescence staining for the product of ORF59) in JMJD3-expressing BCBL1 (D) or SLK (F) cells, or the corresponding control cultures. E: Percentage of ORF59-positive cells after induction of BCBL1 cells with sodium butyrate for 72 h. doi:10.1371/journal.ppat.1000935.g009

when comparing our data with those of two recent studies that performed genome-wide screens for Rta binding sites [24,48], we noticed that a surprising number of sites mapped within or very close to the loci that were found to be enriched for activating histone marks in our investigation (see Figure S5). Interestingly, Rta is known to be expressed for a brief period of time within the first few hours of a de novo infection, before latency ensues [49]. Therefore, one attractive hypothesis is that binding of Rta may trigger the initial modification of H3K9/K14 and H3K4


product of ORF57) may then allow efficient expression of these genes once the viral genome is committed to productive replication. Taken together, our data thus provide a rationale for the observation that some of the above genes have been found to be expressed at low levels in latent cultures. Importantly, they may also help to explain how the host environment may modulate the latent gene expression program. For example, IFN-α treatment of PEL cells has been shown to result in the transactivation of the K2 promoter via IFN-stimulated response element (ISRE) sequences [55]. It has been suggested that this enables the virus to sense innate immune responses and modify its gene expression in order to block them, a model that is strongly supported by the observation that the K2 promoter appears to be already primed for expression in latently infected PEL cells. Finally, there is also the question of the role of the profound DNA methylation that occurs at later stages of viral latency. It appears likely that these patterns are established as a consequence of the continuous presence of EZH2/PRC2 repressor complexes on viral DNA, as such complexes have been shown to directly recruit DNA methyltransferases (DNMTs) [59]. The absence of DNA methylation at loci that are devoid of H3K27-me3 would support this conclusion. Another contributing factor could be the delayed appearance of constitutive heterochromatin marks: while H3K9-me3 is restricted to a few regions, it may nevertheless support the recruitment of DNMTs to the viral genome (this may especially be the case at the ORF64 locus, which displays the highest levels of H3K9-me3 but only moderate levels of H3K27-me3). What are the functional consequences of DNA methylation? Currently, this question is difficult to answer.
Based on the fact that SLK cells establish latent expression patterns in the absence of DNA methylation, and that BCBL1 and AP3 cells maintain latency in spite of the lack of DNA methylation at the ORF50 promoter, one may be tempted to think that this epigenetic mark is of no fundamental importance during KSHV latency. However, this is a conclusion which cannot be drawn. Compared with SLKp cells, de novo infected SLK-5dpi cells indeed display elevated levels of lytic gene expression and a higher number of spontaneously reactivated cells (Figs. 5 and 9). However, the generally low transcript levels together with the scarcity of lytic cells even in SLK-5dpi cultures complicate any general conclusion. Likewise, as the cells do not respond to chemical treatment with reagents that induce the lytic cycle in PEL cells, comparative studies of reactivation are difficult to perform. Studies employing Rta overexpression (which can reactivate such cells [38]) may be feasible, but they would be of limited value as the ectopic expression would artificially override one of the most critical steps of lytic reactivation. With regard to PEL cells, more lines will have to be studied to conclude whether absence or presence of methylation marks at the ORF50 promoter has an impact on the percentage of spontaneously reactivated cells and/or the response of such lines to chemical agents which induce the lytic cycle. At present, although such differences certainly exist between many PEL lines, this could in large part or entirely be a consequence of host cell differences. Lastly, even if DNA methylation should turn out to have no significant additive effect over the presence of repressive histone marks in vitro, this may be fundamentally different in vivo. Although the physiological triggers which reactivate in vivo latency reservoirs (e.g. 
memory B cells) are poorly understood, they are very likely to be much more specific than the broad pleiotropic effects induced by chemical agents such as phorbol esters or sodium butyrate. It is thus conceivable that DNA methylation may represent an additional, functionally important block that augments repressive histone marks and reinforces latent expression patterns during long-term latency in vivo,

but which may be of lesser consequence in in vitro models of viral infection. Considering all of the above, many questions remain to be answered before the molecular mechanisms which govern establishment and maintenance of KSHV latency are fully understood. However, especially given the unexpected spatial and temporal patterns of histone modifications and DNA methylation revealed by our study, the data presented here provide important clues as to the host and viral factors which might be at work, and should greatly help to design further studies aimed at elucidating the role of epigenetic modifications during this crucial phase of the viral lifecycle.

Materials and Methods

Cell culture and de novo KSHV infection

The establishment of SLKp cells has been described before [39]. Briefly, endothelial SLK cells [60] were infected with KSHV in vitro and passaged for several weeks. Seven KSHV-positive single-cell clones were selected from the long-term infected cultures and pooled to form the SLKp line. SLKp cells and the parental SLK line were cultured in DMEM supplemented with 10% fetal calf serum and penicillin-streptomycin (5 mg/ml). The KSHV-positive PEL cell lines BCBL1 [6], HBL6 [61] and AP3 [62] were cultured in RPMI 1640 medium (Invitrogen) supplemented with 10% fetal calf serum and penicillin-streptomycin at a final concentration of 5 mg/ml. Concentrated supernatants of infectious KSHV virions were harvested from lytically induced BCBL1 cells as described [39]. De novo infection of SLK cells was performed by incubating 2×10⁵ cells at 70% confluency for 2 h with 500 µl virus supernatant at a concentration of 1×10⁸ KSHV genome equivalents per ml (as determined by quantitative PCR) in the presence of 8 µg/ml polybrene in EGM-2 medium (Lonza). Generally, more than 95% of cells were infected, as judged by immunofluorescence analysis for LANA 48 h after infection. For lytic reactivation of BCBL1 cells, sodium butyrate was added to the culture medium at a final concentration of 0.3 mM.
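The infection conditions imply a viral dose per cell that can be worked out directly. A small sketch, reading the values as 2×10⁵ cells, 500 µl of supernatant and 1×10⁸ genome equivalents per ml; the helper name is hypothetical, and genome equivalents measured by qPCR are not the same as infectious units:

```python
def genome_equivalents_per_cell(volume_ul: float, ge_per_ml: float,
                                cells: float) -> float:
    """KSHV genome equivalents added per cell in a single inoculation."""
    volume_ml = volume_ul / 1000.0  # convert microliters to milliliters
    return volume_ml * ge_per_ml / cells


print(genome_equivalents_per_cell(500, 1e8, 2e5))  # 250.0
```

That is, 5×10⁷ genome equivalents are distributed over 2×10⁵ cells, or about 250 per cell.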

Immunofluorescence and western blot analysis

Cells were fixed with 4% paraformaldehyde in PBS for 15 min, permeabilized with 2% Triton X-100 in PBS for 10 min, blocked with 3% BSA in PBS and incubated with primary antibodies specific for LANA or ORF59 (Advanced Biotechnologies: #13-211-100) in blocking solution for 2 h. Cells were washed three times with PBS, incubated with secondary antibodies (Alexa Fluor-555 goat anti-mouse and Alexa Fluor-488 goat anti-rabbit) for another 2 h and analyzed by fluorescence microscopy. Western blot analysis of total cell lysates was carried out by standard SDS-PAGE and immunoblot protocols, using antibodies directed against histone H3 tri-methylated at lysine 27 (Upstate: #07-449) or, as a loading control, actin (Santa Cruz: #SC-8432).

Retroviral expression of the H3K27-specific demethylase JMJD3

A retroviral JMJD3 expression construct was kindly provided by Paul Khavari [47]. The retroviral backbone MSCV (Clontech) was used as a negative control. Supernatants containing infectious viral particles were harvested 48 h post transfection of PhoenixGP cells (Nolan Laboratory, http://www.stanford.edu/group/nolan/). BCBL1 and SLK cells were transduced with recombinant retroviruses by spin inoculation at 300×g for 1 h, using undiluted supernatants in the presence of 8 µg/ml polybrene. After inoculation, cultures were maintained in medium containing 2 µg/ml puromycin for 12 days to select for transduced cells.

For preparation of input controls, one quarter of the amount of chromatin used in the immunoprecipitation reactions was employed. Input samples were treated identically to the immunoprecipitated samples, starting with the de-crosslinking step. Both samples were subsequently subjected to whole genome amplification and labeling using a linker-mediated PCR protocol (Agilent Mammalian ChIP-on-chip protocol V10.0, May 2008), followed by microarray hybridization.

Analysis of CpG methylation by bisulfite sequencing and COBRA Bisulfite sequencing was performed using the EpiTect Bisulfite Kit (Qiagen), following the manufacturer’s instructions. The method relies on a chemical reaction that converts all unmethylated cytosine residues to uracil, which is read as thymine after PCR amplification; methylated cytosines are protected from conversion and can therefore be identified by sequencing the amplified locus of interest. The sequences of all bisulfite sequencing primers employed in this study are given in Table S1. PCR products were sequenced directly (bulk sequencing) using either the forward or reverse primer from the original amplification. CpG methylation patterns were extracted from the bulk sequencing data using the BiQ Analyzer v2.0 software [63]. The combined bisulfite restriction analysis (COBRA) assay has been described before [64]. Briefly, PCR products from bisulfite-treated samples were digested with the restriction enzyme TaqI (Fermentas) and resolved on a 3% agarose gel. TaqI recognizes the nucleotide sequence TCGA, which contains a CpG dinucleotide. After bisulfite conversion, the site is only preserved if the original CpG motif was methylated (note that bisulfite conversion also creates additional TaqI sites at methylated CpG motifs that are flanked by C and A residues, i.e. CCGA, as the C in position −1 is converted to a T by the bisulfite reaction).
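The TaqI logic of the COBRA assay can be checked in silico. The following is a minimal sketch, not part of the published workflow: it models the top strand only, assumes that every cytosine outside the supplied set of methylated positions is unmethylated and fully converted, and ignores incomplete conversion. The function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Top-strand sequence as read after bisulfite treatment and PCR.

    Every C whose 0-based index is not in `methylated` is converted
    (C -> U, read as T after PCR); methylated cytosines are protected.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def taqi_sites(seq: str) -> list[int]:
    """0-based start positions of the TaqI recognition sequence TCGA."""
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "TCGA"]


# Hypothetical locus with CpG cytosines at positions 2 (in TCGA) and 7 (in CCGA)
locus = "ATCGAACCGAT"
print(taqi_sites(bisulfite_convert(locus, {2, 7})))  # [1, 6]: site kept, new site created
print(taqi_sites(bisulfite_convert(locus, {2})))     # [1]
print(taqi_sites(bisulfite_convert(locus, set())))   # []: unmethylated, no TaqI cut
```

The second site in the fully methylated case illustrates the note above: a methylated CCGA motif becomes TCGA once its unmethylated C at position −1 is converted.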

Sequential ChIP assay Colocalization of bivalent histone marks was measured by use of a sequential ChIP assay. Prior to the first IP, antibodies were incubated with protein-A agarose beads (Upstate) for 2 hrs at 4°C. Antibody-bead complexes were washed twice with 0.2 M triethanolamine buffer (Sigma). Beads and antibodies were coupled covalently by incubation with 20 mM dimethyl pimelimidate dihydrochloride (DMP, Sigma) in 0.2 M triethanolamine buffer on a rotating wheel for 30 min at RT, and the reaction was stopped by washing with 50 mM Tris-HCl (pH 7.5). Uncoupled antibodies were removed by pre-elution with 0.1 M acetic acid (pH 3.0) for 5 min at RT. Beads were incubated with diluted chromatin samples for 16 hrs at 4°C. Washing and elution were performed in the same manner as described for the standard ChIP assay. 1/16 of the precipitated chromatin was de-crosslinked and used to determine the efficiency of the first IP by real-time qPCR. The remainder was employed as the input for the second IP, which again was performed according to the standard ChIP protocol. Results were calculated as percent of the original input, i.e. the total amount of DNA that was subjected to the first round of immunoprecipitation.
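The final normalization of the sequential ChIP reduces to a simple ratio against the DNA that entered the first IP. A minimal sketch follows; the function name and the quantities are hypothetical, and the optional correction for the 1/16 of the first eluate withheld for quality control is our assumption, since the text does not state whether such a correction was applied.

```python
def percent_of_original_input(recovered: float, original_input: float,
                              fraction_carried_forward: float = 1.0) -> float:
    """Percent of the DNA that entered the first IP which is recovered after the second IP.

    `recovered` and `original_input` must be in the same units (e.g. qPCR
    quantities read off the same standard curve). `fraction_carried_forward`
    optionally corrects for material withheld between the two IPs; applying
    this correction is an assumption, not something the protocol states.
    """
    return 100.0 * recovered / (original_input * fraction_carried_forward)


# Hypothetical qPCR quantities for one amplicon
original_input = 1000.0   # chromatin entering the first IP
after_second_ip = 1.5     # recovered after both IPs

print(percent_of_original_input(after_second_ip, original_input))           # 0.15
print(percent_of_original_input(after_second_ip, original_input, 15 / 16))  # 0.16
```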

Isolation, reverse transcription and PCR quantitation of RNA RNA was isolated using the RNA-Bee reagent (Tel-Test, Inc.). Contaminating DNA was removed by incubation with amplification-grade DNase I (Invitrogen), and cDNA was prepared from random-primed RNA using Superscript III (Invitrogen) according to the manufacturer’s instructions. Real-time quantitative PCR (qPCR) of cDNA or genomic DNA samples was performed using the SensiMix SYBR Kit (Quantace) on a Rotor-Gene 6000 real-time cycler (Corbett Life Science). For quantitation, standard curves were created using dilutions of genomic BCBL1 DNA spanning a range of at least 10,000-fold. The sequences of all primer pairs used in this study are given in Table S1.
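Standard-curve quantitation as described above amounts to a linear fit of Ct against log10 of the input amount, followed by interpolation of unknowns. The sketch below is a generic, stdlib-only illustration with hypothetical, noise-free Ct values; it is not the analysis software used in the study.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling each cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def quantify(ct, slope, intercept):
    """Interpolate the quantity of an unknown sample from its Ct value."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series spanning 10,000-fold
quantities = [1e4, 1e3, 1e2, 1e1, 1e0]
cts = [20.0 - 3.32 * math.log10(q) for q in quantities]

slope, intercept = fit_standard_curve(quantities, cts)
print(round(slope, 2), round(intercept, 2))       # -3.32 20.0
print(round(amplification_efficiency(slope), 2))  # 1.0
print(round(quantify(13.36, slope, intercept)))   # 100
```

A slope near −3.32 corresponds to roughly 100% amplification efficiency; real dilution series will scatter around the fitted line.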

Methylated DNA Immunoprecipitation assay (MeDIP) MeDIP analysis was essentially performed as described before [35,66,67,68], with some modifications. A detailed protocol is given in Protocol S2. Briefly, highly pure genomic DNA served as the input in the MeDIP procedure. Negative and positive controls were prepared by mixing genomic DNA from KSHV-negative SLK cells with unmethylated or in vitro methylated KSHV bacmid DNA [36], respectively. The ratio of viral to cellular DNA was chosen to mimic the episome content typically seen in KSHV-infected PEL cell lines and the SLKp line (approx. 30–40 copies per cell). All DNA samples were sonicated to an average fragment size of 100–500 bp using a Bioruptor (Diagenode). To allow quantification and normalization of the data, a constant amount (0.2 ng) of in vitro methylated pCR2.1 plasmid was added per 5 µg of sheared DNA. 1 µg of each sample was set aside as an input control, and the remainder was subjected to immunoprecipitation using 2.5 µg of a 5-methylcytidine-specific antibody (MAb-5MECYT-100, Diagenode). The precipitated immunocomplexes were harvested using Dynabeads M-280 Sheep anti-Mouse IgG (Invitrogen). After washing, DNA was eluted and purified by phenol-chloroform extraction and ethanol precipitation. Input control samples were treated identically to the IP samples, starting with the ethanol precipitation step. The samples were subsequently analyzed by qPCR and/or microarray hybridization.
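The text gives the target copy number for the control mixtures (approx. 30–40 episomes per cell) but not the DNA masses that were combined. As a hedged illustration of how such a copy number translates into DNA amounts, the sketch below uses assumed genome sizes (about 165 kb per viral construct and about 6.4 Gb per diploid human genome); the actual bacmid size, which includes BAC vector sequences, should be substituted. The helper name is hypothetical.

```python
def viral_dna_mass_ng(copies_per_cell: float, viral_genome_bp: float,
                      host_dna_ug: float, host_diploid_bp: float = 6.4e9) -> float:
    """Mass of viral DNA (ng) to mix into `host_dna_ug` µg of cellular DNA so that
    the mixture contains `copies_per_cell` viral genomes per diploid host genome.

    Double-stranded DNA mass is proportional to length, so the viral:host mass
    ratio equals copies_per_cell * viral_genome_bp / host_diploid_bp.
    """
    mass_ratio = copies_per_cell * viral_genome_bp / host_diploid_bp
    return mass_ratio * host_dna_ug * 1000.0  # convert µg to ng


# Assumed: ~165 kb viral construct, 35 copies per cell, 5 µg of cellular DNA
print(round(viral_dna_mass_ng(35, 165_000, 5.0), 2))  # 4.51
```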

Chromatin Immunoprecipitation assay (ChIP) ChIP analysis was performed as described by Si et al. [65] and as recommended by the array manufacturer (Agilent Mammalian ChIP-on-chip protocol V10.0, May 2008), with some modifications. A detailed protocol of the procedure is given in Protocol S1. Briefly, chromatin from 5×10⁶ to 2×10⁷ cells was cross-linked with 1% formaldehyde. After quenching of the reaction by the addition of glycine, cells were lysed and nuclei were isolated by centrifugation. Chromatin was extracted from the isolated nuclei and fragmented by sonication using a Bioruptor (Diagenode) to an average length of 100–500 bp. A portion of the total chromatin sample was set aside for the later preparation of input controls. Material from 1×10⁶ cells was pre-cleared with salmon-sperm DNA/protein-A agarose beads (Upstate) to reduce non-specific background and subjected to immunoprecipitation using 2 to 10 µg of antibodies specific for the histone modifications H3K9/K14-Ac (Upstate: #06-599), H3K4-me3 (Upstate: #04-745), H3K9-me3 (Upstate: #17-625) or H3K27-me3 (Upstate: #07-449). After incubation for 16 hrs, chromatin-immunocomplexes were precipitated by the addition of protein-A agarose, washed, eluted and de-crosslinked overnight at 65°C. DNA was purified by phenol-chloroform extraction and ethanol precipitation.

PLoS Pathogens | www.plospathogens.org

In vitro methylation of DNA For control and normalization purposes, we prepared in vitro methylated DNA from a bacmid containing the complete KSHV genome [36] or from the pCR2.1 vector (Invitrogen). DNA was methylated by incubating 15 µg of DNA with 40 units of the CpG methyltransferase M.SssI (NEB) for 2 hrs in 1× NEBuffer 2 containing 160 µM S-adenosylmethionine (SAM). Fresh SAM was added and reactions were incubated for another 2 hrs. DNA was purified and the reaction was repeated once to ensure complete methylation. Complete methylation was confirmed by restriction analysis using the methylation-sensitive enzyme HpaII alongside its methylation-insensitive isoschizomer MspI (Fermentas), and/or by bisulfite sequencing of specific loci.
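The HpaII/MspI check works because both enzymes recognize CCGG, but only HpaII is blocked when the internal CpG cytosine is methylated, which is the modification M.SssI introduces. The sketch below predicts the expected cut pattern for a sequence; it is an illustration with hypothetical function names and a hypothetical sequence, modelling the top strand only.

```python
def ccgg_sites(seq: str) -> list[int]:
    """0-based start positions of CCGG, the shared HpaII/MspI recognition site."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def predicted_cuts(seq: str, methylated_sites: set[int]) -> tuple[list[int], list[int]]:
    """Predicted (HpaII, MspI) cut positions on the top strand.

    Both enzymes cut C^CGG, i.e. after the first C. `methylated_sites` holds the
    start positions of CCGG sites whose internal CpG cytosine is methylated (as
    produced by M.SssI). HpaII is blocked by that methylation; MspI is not.
    """
    sites = ccgg_sites(seq)
    hpaii = [i + 1 for i in sites if i not in methylated_sites]
    mspi = [i + 1 for i in sites]
    return hpaii, mspi


seq = "AACCGGTTCCGGAA"
fully_methylated = set(ccgg_sites(seq))
print(predicted_cuts(seq, fully_methylated))  # ([], [3, 9])
print(predicted_cuts(seq, set()))             # ([3, 9], [3, 9])
```

Completely methylated DNA should therefore resist HpaII while still being digested by MspI, which serves as a control for DNA quality and digestibility.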

Because the KSHV genomes of the BCBL1 and AP3 lines have not been fully sequenced, they may deviate from the reference genomes at a few locations. To control for such sequence differences, we flagged all spots whose fluorescence in the input channel did not exceed a background threshold, set to the mean fluorescence plus twice the standard deviation of all negative control features (i.e. empty array features as well as spots containing irrelevant sequences, corresponding to all Agilent probes in the datasets labeled “NC2_” or “(-)3xSLv1”). Note that if sequence divergence only reduces hybridization efficiency (e.g. a single nucleotide polymorphism, which will not abolish hybridization), our results are not falsified: hybridization efficiency is reduced in both the input and the immunoprecipitated sample, so their ratio is unaffected. In addition to the above quality controls, in each dataset we flagged all probes that exhibited more than 30% variance between duplicate spots. The 30% threshold corresponds to the mean variance plus twice the standard deviation exhibited by all KSHV-specific probes in all MeDIP experiments, and thus removes all probes that show a significantly increased variance between individual spot repeats. MeDIP data were furthermore corrected by subtracting from each probe-specific signal the value observed in the negative control, i.e. the MeDIP sample representing the unmethylated KSHV bacmid in the background of cellular DNA. After normalization, an enrichment score was calculated for each probe as the ratio of fluorescence signal intensities in the immunoprecipitated sample relative to the input control. Because the average length of the immunoprecipitated MeDIP and ChIP fragments (100 to 500 bp) is greater than that of the tiled probes (45 to 60 nucleotides), the resolution of our analysis was limited by the fragment length rather than by the array design.
To account for this, the data presented in Figures 2 to 6, 7 and 8 were calculated by tiling overlapping sequence windows of 250 nucleotides across the KSHV genome, advancing each window by a step size of 100 nucleotides. The type M reference sequence (NC_003409) was used for the HBL6 line, whereas the type P genome (NC_009333) was used for all other cells. The KSHV-specific probes were subsequently aligned by BLAST against the window sequences, and each window was assigned an enrichment score represented by the average score of all probes that showed more than 90% identity with either strand of its sequence. All scores (which were also used to calculate the Pearson correlation coefficients presented in Tables 1 and 2) are given in Dataset S1. All raw data, including the original GPR files as well as the sequences and match locations of individual probes, are available from the Gene Expression Omnibus (GEO) database at http://www.ncbi.nlm.nih.gov/geo under accession number GSE19907.
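The window-based scoring described above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: instead of running BLAST against each window, it assumes each probe already has a match position and percent identity on the reference genome (the hypothetical `ProbeHit` record), and it counts a probe for a window only if its match lies entirely inside that window. A trailing partial window at the genome end is not generated.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class ProbeHit:
    start: int         # 0-based start of the probe's match on the reference genome
    end: int           # end of the match (exclusive)
    identity: float    # percent identity of the alignment (either strand)
    score: float       # normalized probe enrichment score (IP / input)


def window_scores(genome_len: int, hits: list[ProbeHit], size: int = 250,
                  step: int = 100, min_identity: float = 90.0
                  ) -> list[tuple[int, int, Optional[float]]]:
    """Average the scores of qualifying probes within overlapping windows.

    A probe counts for a window if its identity exceeds `min_identity` and its
    match lies entirely inside the window. Windows without qualifying probes get None.
    """
    results = []
    for start in range(0, genome_len - size + 1, step):
        end = start + size
        scores = [h.score for h in hits
                  if h.identity > min_identity and h.start >= start and h.end <= end]
        results.append((start, end, mean(scores) if scores else None))
    return results


hits = [ProbeHit(10, 70, 100.0, 2.0),
        ProbeHit(120, 180, 95.0, 4.0),
        ProbeHit(300, 360, 85.0, 9.0)]   # below the identity cut-off, ignored
print(window_scores(450, hits))
# [(0, 250, 3.0), (100, 350, 4.0), (200, 450, None)]
```

Because neighbouring windows overlap by 150 nt, each probe contributes to up to three windows, which smooths the profile to roughly the resolution set by the 100–500 bp fragment length.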

Microarray design Custom high-resolution KSHV microarrays were designed by shifting a 60-nt sequence window across both strands of the prototypic KSHV sequence (type P, accession number NC_009333) as well as the terminal repeat unit (KSU86666). Probes with a length between 45 and 60 nucleotides were selected from these windows such that their melting temperature was close to the optimal Tm of 80°C. To also ensure complete coverage of type M KSHV strains, the resulting probe sets were aligned to the type M reference sequence (NC_003409), and additional probes were designed in the same manner for all regions of 80 or more nucleotides that were not already covered by the original probe set. The length of all probes was subsequently adjusted to 60 nucleotides using sequences from a common linker (ATAACCGACGCCTAA), and each probe was synthesized in duplicate on Agilent 8×15K custom microarrays. For normalization purposes, the array also contains probe sets that were generated in the same manner to cover the adenovirus type 5 genome (AY339865) as well as the pCR2.1 plasmid (Invitrogen).
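Two mechanical steps of the probe design above, enumerating 60-nt windows on both strands and padding shorter probes to 60 nt with the common linker, can be sketched as below. This is an illustration under stated assumptions: the Tm-based trimming to 45–60 nt is omitted (the Tm model used is not specified), and the orientation of the linker padding (bases taken from the 3′ end of the linker and added to the 5′ end of the probe) is our assumption. All names except the linker sequence are hypothetical.

```python
LINKER = "ATAACCGACGCCTAA"  # common linker sequence given in the text
PROBE_LENGTH = 60


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_windows(genome: str, size: int = 60, step: int = 1):
    """Yield (strand, start, sequence) for every `size`-nt window on both strands.

    Start positions on the '-' strand refer to the reverse-complemented sequence.
    """
    for strand, s in (("+", genome.upper()), ("-", reverse_complement(genome))):
        for start in range(0, len(s) - size + 1, step):
            yield strand, start, s[start:start + size]


def pad_with_linker(probe: str, length: int = PROBE_LENGTH, linker: str = LINKER) -> str:
    """Extend a 45-60 nt probe to `length` nt using linker sequence.

    Assumption: the required bases are taken from the 3' end of the linker
    and added to the 5' end of the probe; the actual orientation is not stated.
    """
    missing = length - len(probe)
    if not 0 <= missing <= len(linker):
        raise ValueError(f"probe of {len(probe)} nt cannot be padded to {length} nt")
    return linker[len(linker) - missing:] + probe


probe = "ACGT" * 12  # hypothetical 48-nt probe
padded = pad_with_linker(probe)
print(len(padded), padded[:12])                       # 60 ACCGACGCCTAA
print(sum(1 for _ in candidate_windows("A" * 61)))    # 4
```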

Microarray sample labeling and hybridization 500 ng of MeDIP input or ChIP input controls, 500 ng of immunoprecipitated ChIP material and all of the MeDIP material were labeled with Cy3 and Cy5 using the Agilent Genomic DNA Labeling Kit PLUS according to Agilent’s recommendations. For normalization purposes, 0.1 ng of adenovirus type 5 DNA was added to each sample prior to labeling. After labeling, samples were purified using Microcon YM-30 filter columns (Millipore), blocked using Agilent blocking solution and human Cot-1 DNA (Invitrogen), and hybridized using the Agilent Oligo aCGH/ChIP-on-chip Hybridization Kit at 65°C for 24 hrs in a rotating oven. Arrays were washed once in Oligo aCGH/ChIP-on-chip Wash Buffer 1 (Agilent 5188-5221) at RT for 5 min and once in Oligo aCGH/ChIP-on-chip Wash Buffer 2 (Agilent 5188-5222) at 37°C for 1 min, and scanned using a GenePix Personal 4100A scanner (Axon Instruments).
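A spike-in of known, constant amount, such as the adenovirus type 5 DNA added before labeling (or the methylated pCR2.1 plasmid added before MeDIP), allows each array to be rescaled by a single factor so that the spike-in spots average a ratio of 1. The sketch below assumes this factor is the arithmetic mean of per-spot IP/input ratios; whether the normalization software used the mean of ratios, a ratio of means, or log ratios is not stated, so treat this as one plausible variant. Names and intensities are hypothetical.

```python
from statistics import mean


def spike_in_scale_factor(ip_signals, input_signals):
    """Mean IP/input ratio over the spike-in spots (e.g. Ad5 or pCR2.1 probes)."""
    return mean(ip / inp for ip, inp in zip(ip_signals, input_signals))


def normalized_ratio(ip, inp, scale_factor):
    """IP/input ratio rescaled so that the spike-in spots average 1."""
    return (ip / inp) / scale_factor


# Hypothetical background-corrected intensities for three spike-in spots
factor = spike_in_scale_factor([200.0, 220.0, 180.0], [100.0, 100.0, 100.0])
print(factor)                                  # 2.0
print(normalized_ratio(600.0, 100.0, factor))  # 3.0
```

Dividing every ratio by the same factor is equivalent to rescaling one channel, so it corrects global labeling and scanning differences without changing the relative profile along the genome.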

Microarray data analysis and normalization Primary array analysis and data normalization were carried out using GenePix Pro 6.0 software (Axon Instruments). All MeDIP datasets were normalized using the methylated pCR2.1 plasmid that had been added to the samples prior to immunoprecipitation, thus controlling for differences in MeDIP efficiency as well as in labeling and array hybridization. Both channels were adjusted such that the average ratio of input vs. MeDIP signals across all pCR2.1-specific spots was 1. Similarly, ChIP datasets were normalized using the adenovirus type 5 DNA that was added as a spike-in prior to labeling, hence correcting for errors during labeling, hybridization or scanning of the samples. To eliminate false-positive spots, we hybridized DNA from KSHV-negative SLK cells and identified all probes that exhibited high levels of background hybridization (i.e., fluorescence levels exceeding the mean value plus one standard deviation of all KSHV-specific spots on the negative control array). These probes (which mapped almost exclusively to repeat regions) were permanently flagged in all datasets and not used for further analysis. While our arrays carry probes specific for both the M and P types of the KSHV genome, the KSHV genomes of the BCBL1 and AP3 lines have not been fully sequenced; the additional quality controls used to account for possible deviations from the reference genomes are described above.

Supporting Information

Table S1 KSHV-specific PCR primers used in this study. Found at: doi:10.1371/journal.ppat.1000935.s001 (0.08 MB DOC)

Protocol S1 ChIP Protocol. Found at: doi:10.1371/journal.ppat.1000935.s002 (0.03 MB DOC)

Protocol S2 MeDIP Protocol. Found at: doi:10.1371/journal.ppat.1000935.s003 (0.04 MB DOC)

Dataset S1 Dataset containing all data points used for Figures 2 to 6, 7 and 8, and for the calculation of correlation coefficients given in Tables 1 and 2. Found at: doi:10.1371/journal.ppat.1000935.s004 (0.60 MB XLS)

Figure S1 Global DNA methylation patterns of latent KSHV genomes (higher magnification of data presented in Figure 2). Found at: doi:10.1371/journal.ppat.1000935.s005 (0.50 MB PDF)

Figure S2 Global patterns of H3K9/K14 acetylation and H3K4 tri-methylation on latent KSHV genomes (higher magnification of data presented in Figure 6). Found at: doi:10.1371/journal.ppat.1000935.s006 (0.41 MB PDF)

Figure S3 Global patterns of H3K27 and H3K9 tri-methylation on latent KSHV genomes (higher magnification of data presented in Figure 7). Found at: doi:10.1371/journal.ppat.1000935.s007 (0.44 MB PDF)

Figure S4 Spontaneous reactivation in BCBL1 cells. BCBL1 cells were analyzed by immunofluorescence for the expression of the ORF59 gene product. A phase contrast image (PC) is shown to the left, and an enlarged overlay of the section framed by the white rectangle is shown at the bottom. Found at: doi:10.1371/journal.ppat.1000935.s008 (3.17 MB TIF)

Figure S5 Rta binding sites and global patterns of H3K4-me3 on latent KSHV genomes. H3K4 tri-methylation (H3K4-me3) patterns in BCBL1 and SLKp cells as well as in SLK cultures at 5 days post infection (SLK-5dpi) are shown as described in the legend to Figure 6. The locations of regions that were found to harbor Rta binding sites in genome-wide screens performed by Chen et al. [48] or Ellison et al. [24] are indicated by dotted lines. The labeling above the lines indicates whether these regions were identified by Chen et al. (labeled “1”), Ellison et al. (“2”), or in both studies (“1,2”). Found at: doi:10.1371/journal.ppat.1000935.s009 (1.02 MB TIF)

Acknowledgments We thank Sarah Kinkley, Hans Will, Uwe Tessmer, Thomas Christalla, Nicole Walz and Christine Henning for technical assistance and helpful discussions. We thank Nicole Fischer for critical reading of the manuscript.

Author Contributions Conceived and designed the experiments: TG AG. Performed the experiments: TG. Analyzed the data: TG AG. Wrote the paper: AG.

References
1. Pellett PE, Roizman B (2001) The family herpesviridae: A brief introduction. In: Knipe DM, Howley PM, Griffin DE, Lamb RA, Martin MA, eds. Fields virology. Philadelphia: Lippincott Williams & Wilkins. pp 2381–2398.
2. Cesarman E, Chang Y, Moore PS, Said JW, Knowles DM (1995) Kaposi’s sarcoma-associated herpesvirus-like DNA sequences in AIDS-related body-cavity-based lymphomas. N Engl J Med 332: 1186–1191.
3. Chang Y, Cesarman E, Pessin MS, Lee F, Culpepper J, et al. (1994) Identification of herpesvirus-like DNA sequences in AIDS-associated Kaposi’s sarcoma. Science 266: 1865–1869.
4. Soulier J, Grollet L, Oksenhendler E, Cacoub P, Cazals-Hatem D, et al. (1995) Kaposi’s sarcoma-associated herpesvirus-like DNA sequences in multicentric Castleman’s disease. Blood 86: 1276–1280.
5. Dupin N, Fisher C, Kellam P, Ariad S, Tulliez M, et al. (1999) Distribution of human herpesvirus-8 latently infected cells in Kaposi’s sarcoma, multicentric Castleman’s disease, and primary effusion lymphoma. Proc Natl Acad Sci U S A 96: 4546–4551.
6. Renne R, Zhong W, Herndier B, McGrath M, Abbey N, et al. (1996) Lytic growth of Kaposi’s sarcoma-associated herpesvirus (human herpesvirus 8) in culture. Nat Med 2: 342–346.
7. Dittmer D, Lagunoff M, Renne R, Staskus K, Haase A, et al. (1998) A cluster of latently expressed genes in Kaposi’s sarcoma-associated herpesvirus. J Virol 72: 8309–8315.
8. Hu J, Garber AC, Renne R (2002) The latency-associated nuclear antigen of Kaposi’s sarcoma-associated herpesvirus supports latent DNA replication in dividing cells. J Virol 76: 11677–11687.
9. Grundhoff A, Ganem D (2003) The latency-associated nuclear antigen of Kaposi’s sarcoma-associated herpesvirus permits replication of terminal repeat-containing plasmids. J Virol 77: 2779–2783.
10. Ballestas ME, Chatis PA, Kaye KM (1999) Efficient persistence of extrachromosomal KSHV DNA mediated by latency-associated nuclear antigen. Science 284: 641–644.
11. Russo JJ, Bohenzky RA, Chien MC, Chen J, Yan M, et al. (1996) Nucleotide sequence of the Kaposi sarcoma-associated herpesvirus (HHV8). Proc Natl Acad Sci U S A 93: 14862–14867.
12. McCormick C, Ganem D (2005) The kaposin B protein of KSHV activates the p38/MK2 pathway and stabilizes cytokine mRNAs. Science 307: 739–741.
13. Grundhoff A, Sullivan CS, Ganem D (2006) A combined computational and microarray-based approach identifies novel microRNAs encoded by human gamma-herpesviruses. RNA 12: 733–750.
14. Cai X, Cullen BR (2006) Transcriptional origin of Kaposi’s sarcoma-associated herpesvirus microRNAs. J Virol 80: 2234–2242.
15. Cai X, Lu S, Zhang Z, Gonzalez CM, Damania B, et al. (2005) Kaposi’s sarcoma-associated herpesvirus expresses an array of viral microRNAs in latently infected cells. Proc Natl Acad Sci U S A 102: 5570–5575.
16. Samols MA, Hu J, Skalsky RL, Renne R (2005) Cloning and identification of a microRNA cluster within the latency-associated region of Kaposi’s sarcoma-associated herpesvirus. J Virol 79: 9301–9305.
17. Pfeffer S, Sewer A, Lagos-Quintana M, Sheridan R, Sander C, et al. (2005) Identification of microRNAs of the herpesvirus family. Nat Methods 2: 269–276.
18. Pearce M, Matsumura S, Wilson AC (2005) Transcripts encoding K12, v-FLIP, v-cyclin, and the microRNA cluster of Kaposi’s sarcoma-associated herpesvirus originate from a common promoter. J Virol 79: 14457–14464.
19. Ragoczy T, Heston L, Miller G (1998) The Epstein-Barr virus Rta protein activates lytic cycle genes and can disrupt latency in B lymphocytes. J Virol 72: 7978–7984.
20. Sun R, Lin SF, Gradoville L, Yuan Y, Zhu F, et al. (1998) A viral gene that activates lytic cycle expression of Kaposi’s sarcoma-associated herpesvirus. Proc Natl Acad Sci U S A 95: 10866–10871.
21. Lukac DM, Renne R, Kirshner JR, Ganem D (1998) Reactivation of Kaposi’s sarcoma-associated herpesvirus infection from latency by expression of the ORF 50 transactivator, a homolog of the EBV R protein. Virology 252: 304–312.
22. Xu Y, AuCoin DP, Huete AR, Cei SA, Hanson LJ, et al. (2005) A Kaposi’s sarcoma-associated herpesvirus/human herpesvirus 8 ORF50 deletion mutant is defective for reactivation of latent virus and DNA replication. J Virol 79: 3479–3487.
23. Gradoville L, Gerlach J, Grogan E, Shedd D, Nikiforow S, et al. (2000) Kaposi’s sarcoma-associated herpesvirus open reading frame 50/Rta protein activates the entire viral lytic cycle in the HH-B2 primary effusion lymphoma cell line. J Virol 74: 6207–6212.
24. Ellison TJ, Izumiya Y, Izumiya C, Luciw PA, Kung HJ (2009) A comprehensive analysis of recruitment and transactivation potential of K-Rta and K-bZIP during reactivation of Kaposi’s sarcoma-associated herpesvirus. Virology 387: 76–88.
25. Lu F, Zhou J, Wiedmer A, Madden K, Yuan Y, et al. (2003) Chromatin remodeling of the Kaposi’s sarcoma-associated herpesvirus ORF50 promoter correlates with reactivation from latency. J Virol 77: 11425–11435.
26. Chen J, Ueda K, Sakakibara S, Okuno T, Parravicini C, et al. (2001) Activation of latent Kaposi’s sarcoma-associated herpesvirus by demethylation of the promoter of the lytic transactivator. Proc Natl Acad Sci U S A 98: 4119–4124.
27. Yu Y, Black JB, Goldsmith CS, Browning PJ, Bhalla K, et al. (1999) Induction of human herpesvirus-8 DNA replication and transcription by butyrate and TPA in BCBL-1 cells. J Gen Virol 80(Pt 1): 83–90.
28. Bechtel JT, Winant RC, Ganem D (2005) Host and viral proteins in the virion of Kaposi’s sarcoma-associated herpesvirus. J Virol 79: 4952–4964.
29. Bernstein BE, Meissner A, Lander ES (2007) The mammalian epigenome. Cell 128: 669–681.
30. Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, et al. (2006) A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125: 315–326.
31. Schuettengruber B, Chourrout D, Vervoort M, Leblanc B, Cavalli G (2007) Genome regulation by polycomb and trithorax proteins. Cell 128: 735–745.
32. Doerfler W (2005) On the biological significance of DNA methylation. Biochemistry (Mosc) 70: 505–524.
33. Miller G, El-Guindy A, Countryman J, Ye J, Gradoville L (2007) Lytic cycle switches of oncogenic human gammaherpesviruses. Adv Cancer Res 97: 81–109.
34. Minarovits J (2006) Epigenotypes of latent herpesvirus genomes. Curr Top Microbiol Immunol 310: 61–80.
35. Weber M, Davies JJ, Wittig D, Oakeley EJ, Haase M, et al. (2005) Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat Genet 37: 853–862.
36. Zhou FC, Zhang YJ, Deng JH, Wang XP, Pan HY, et al. (2002) Efficient infection by a recombinant Kaposi’s sarcoma-associated herpesvirus cloned in a bacterial artificial chromosome: application for genetic analysis. J Virol 76: 6185–6196.
37. Drexler HG, Meyer C, Gaidano G, Carbone A (1999) Constitutive cytokine production by primary effusion (body cavity-based) lymphoma-derived cell lines. Leukemia 13: 634–640.
38. Bechtel JT, Liang Y, Hvidding J, Ganem D (2003) Host range of Kaposi’s sarcoma-associated herpesvirus in cultured cells. J Virol 77: 6474–6481.
39. Grundhoff A, Ganem D (2004) Inefficient establishment of KSHV latency suggests an additional role for continued lytic replication in Kaposi sarcoma pathogenesis. J Clin Invest 113: 124–136.
40. Paulus C, Nitzsche A, Nevels M Chromatinisation of herpesvirus genomes. Rev Med Virol 20: 34–50.
41. Stedman W, Deng Z, Lu F, Lieberman PM (2004) ORC, MCM, and histone hyperacetylation at the Kaposi’s sarcoma-associated herpesvirus latent replication origin. J Virol 78: 12566–12575.
42. Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, et al. (2007) Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448: 553–560.
43. Simon JA, Kingston RE (2009) Mechanisms of polycomb gene silencing: knowns and unknowns. Nat Rev Mol Cell Biol 10: 697–708.
44. Suganuma T, Workman JL (2008) Crosstalk among Histone Modifications. Cell 135: 604–607.
45. Cliffe AR, Garber DA, Knipe DM (2009) Transcription of the herpes simplex virus latency-associated transcript promotes the formation of facultative heterochromatin on lytic promoters. J Virol 83: 8182–8190.
46. Kwiatkowski DL, Thompson HW, Bloom DC (2009) The polycomb group protein Bmi1 binds to the herpes simplex virus 1 latent genome and maintains repressive histone marks during latency. J Virol 83: 8173–8181.
47. Sen GL, Webster DE, Barragan DI, Chang HY, Khavari PA (2008) Control of differentiation in a self-renewing mammalian tissue by the histone demethylase JMJD3. Genes Dev 22: 1865–1870.
48. Chen J, Ye F, Xie J, Kuhne K, Gao SJ (2009) Genome-wide identification of binding sites for Kaposi’s sarcoma-associated herpesvirus lytic switch protein, RTA. Virology 386: 290–302.
49. Krishnan HH, Naranatt PP, Smith MS, Zeng L, Bloomer C, et al. (2004) Concurrent expression of latent and a limited number of lytic genes with immune modulation and antiapoptotic function by Kaposi’s sarcoma-associated herpesvirus early during infection of primary endothelial and fibroblast cells and subsequent decline of lytic gene expression. J Virol 78: 3601–3620.
50. Rivas C, Thlick AE, Parravicini C, Moore PS, Chang Y (2001) Kaposi’s sarcoma-associated herpesvirus LANA2 is a B-cell-specific latent viral protein that inhibits p53. J Virol 75: 429–438.
51. Glenn M, Rainbow L, Aurade F, Davison A, Schulz TF (1999) Identification of a spliced gene from Kaposi’s sarcoma-associated herpesvirus encoding a protein with similarities to latent membrane proteins 1 and 2A of Epstein-Barr virus. J Virol 73: 6953–6963.
52. Wang L, Dittmer DP, Tomlinson CC, Fakhari FD, Damania B (2006) Immortalization of primary endothelial cells by the K1 protein of Kaposi’s sarcoma-associated herpesvirus. Cancer Res 66: 3658–3666.
53. Verma SC, Lan K, Choudhuri T, Robertson ES (2006) Kaposi’s sarcoma-associated herpesvirus-encoded latency-associated nuclear antigen modulates K1 expression through its cis-acting elements within the terminal repeats. J Virol 80: 3445–3458.
54. Nicholas J, Ruvolo VR, Burns WH, Sandford G, Wan X, et al. (1997) Kaposi’s sarcoma-associated human herpesvirus-8 encodes homologues of macrophage inflammatory protein-1 and interleukin-6. Nat Med 3: 287–292.
55. Chatterjee M, Osborne J, Bestetti G, Chang Y, Moore PS (2002) Viral IL-6-induced cell proliferation and immune evasion of interferon activity. Science 298: 1432–1435.
56. Parravicini C, Chandran B, Corbellino M, Berti E, Paulli M, et al. (2000) Differential viral protein expression in Kaposi’s sarcoma-associated herpesvirus-infected diseases: Kaposi’s sarcoma, primary effusion lymphoma, and multicentric Castleman’s disease. Am J Pathol 156: 743–749.
57. Taylor JL, Bennett HN, Snyder BA, Moore PS, Chang Y (2005) Transcriptional analysis of latent and inducible Kaposi’s sarcoma-associated herpesvirus transcripts in the K4 to K7 region. J Virol 79: 15099–15106.
58. Jenner RG, Alba MM, Boshoff C, Kellam P (2001) Kaposi’s sarcoma-associated herpesvirus latent and lytic gene expression as revealed by DNA arrays. J Virol 75: 891–902.
59. Vire E, Brenner C, Deplus R, Blanchon L, Fraga M, et al. (2006) The Polycomb group protein EZH2 directly controls DNA methylation. Nature 439: 871–874.
60. Herndier BG, Werner A, Arnstein P, Abbey NW, Demartis F, et al. (1994) Characterization of a human Kaposi’s sarcoma cell line that induces angiogenic tumors in animals. AIDS 8: 575–581.
61. Carbone A, Cilia AM, Gloghini A, Capello D, Todesco M, et al. (1998) Establishment and characterization of EBV-positive and EBV-negative primary effusion lymphoma cell lines harbouring human herpesvirus type-8. Br J Haematol 102: 1081–1089.
62. Gaidano G, Cechova K, Chang Y, Moore PS, Knowles DM, et al. (1996) Establishment of AIDS-related lymphoma cell lines from lymphomatous effusions. Leukemia 10: 1237–1240.
63. Bock C, Reither S, Mikeska T, Paulsen M, Walter J, et al. (2005) BiQ Analyzer: visualization and quality control for DNA methylation data from bisulfite sequencing. Bioinformatics 21: 4067–4068.
64. Xiong Z, Laird PW (1997) COBRA: a sensitive and quantitative DNA methylation assay. Nucleic Acids Res 25: 2532–2534.
65. Si H, Verma SC, Robertson ES (2006) Proteomic analysis of the Kaposi’s sarcoma-associated herpesvirus terminal repeat element binding proteins. J Virol 80: 9017–9030.
66. Weber M, Hellmann I, Stadler MB, Ramos L, Paabo S, et al. (2007) Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet 39: 457–466.
67. Zilberman D, Gehring M, Tran RK, Ballinger T, Henikoff S (2007) Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nat Genet 39: 61–69.
68. Reynaud C, Bruno C, Boullanger P, Grange J, Barbesti S, et al. (1992) Monitoring of urinary excretion of modified nucleosides in cancer patients using a set of six monoclonal antibodies. Cancer Lett 61: 255–262.

A New Nuclear Function of the Entamoeba histolytica Glycolytic Enzyme Enolase: The Metabolic Regulation of Cytosine-5 Methyltransferase 2 (Dnmt2) Activity

Ayala Tovy¹, Rama Siman Tov¹, Ricarda Gaentzsch², Mark Helm²,³, Serge Ankri¹*

1 Department of Molecular Microbiology, The Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel
2 Department of Chemistry, The Pharmacy and Molecular Biotechnology Institute, Ruprecht-Karls University of Heidelberg, Heidelberg, Germany
3 The Pharmacy and Biochemistry Institute, Johannes Gutenberg University, Mainz, Germany

Abstract Cytosine-5 methyltransferases of the Dnmt2 family function as both DNA and tRNA methyltransferases. Insight into the role and biological significance of Dnmt2 is greatly hampered by a lack of knowledge about its protein interactions. In this report, we address the subject of protein interaction by identifying enolase as a Dnmt2-binding protein in a yeast two-hybrid screen. Enolase, which is known to catalyze the conversion of 2-phosphoglycerate (2-PG) to phosphoenolpyruvate (PEP), was shown to have both a cytoplasmic and a nuclear localization in the parasite Entamoeba histolytica. We discovered that enolase acts as a Dnmt2 inhibitor. This unexpected inhibitory activity was antagonized by 2-PG, which suggests that glucose metabolism controls the non-glycolytic function of enolase. Interestingly, glucose starvation drives enolase to accumulate within the nucleus, which in turn leads to the formation of additional complexes between enolase and the E. histolytica DNMT2 homolog (Ehmeth), and to a significant reduction of tRNAAsp methylation in the parasite. The crucial role of enolase as a Dnmt2 inhibitor was also demonstrated in E. histolytica expressing enolase fused to a nuclear localization signal (NLS). These results establish enolase as the first Dnmt2-interacting protein, and highlight an unexpected role of a glycolytic enzyme in the modulation of Dnmt2 activity.

Citation: Tovy A, Siman Tov R, Gaentzsch R, Helm M, Ankri S (2010) A New Nuclear Function of the Entamoeba histolytica Glycolytic Enzyme Enolase: The Metabolic Regulation of Cytosine-5 Methyltransferase 2 (Dnmt2) Activity. PLoS Pathog 6(2): e1000775. doi:10.1371/journal.ppat.1000775

Editor: William A. Petri, Jr., University of Virginia Health System, United States of America

Received September 2, 2009; Accepted January 18, 2010; Published February 19, 2010

Copyright: © 2010 Ankri et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This study was supported by grants from the Israel Science Foundation and the Rappaport Family Institute for Research in the Medical Sciences, and the Deutsche Forschungsgemeinschaft (DFG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: sankri@tx.technion.ac.il

PLoS Pathogens | www.plospathogens.org

February 2010 | Volume 6 | Issue 2 | e1000775

Enolase Interacts with Dnmt2

Author Summary

Epigenetics refers to heritable changes in gene function that occur without alterations in the DNA sequence. The best-characterized epigenetic modification is DNA methylation. In mammals, DNA methylation is associated with gene silencing and transposon control. We have previously established the presence of methylcytosine in the genome of Entamoeba histolytica, an important unicellular human pathogen. Ehmeth, an enzyme that belongs to the DNA methyltransferase 2 (Dnmt2) family, catalyzes DNA methylation in the parasite. Recent evidence that human Dnmt2 is a tRNAAsp methyltransferase has fueled the debate about the real function of the Dnmt2 family. Our results show that Ehmeth also catalyzes tRNAAsp methylation, which indicates a dual function for this protein. In this study, we have also found that enolase, a glycolytic enzyme, interacts with Ehmeth and modulates its activity under conditions of glucose starvation. These data add to the emerging evidence that glycolytic enzymes have multifunctional roles, and emphasize the importance of energy metabolism in the control of the epigenetic enzymatic machinery.

Introduction

The synthesis of 5-methylcytosine in both DNA and RNA is catalyzed by cytosine-5 methyltransferases (m5C-MTases), with S-adenosylmethionine as a cofactor. The mammalian DNA methylation machinery consists of three active DNA m5C-MTases: Dnmt1, Dnmt3a and Dnmt3b. Dnmt1 has a high preference for hemimethylated DNA as a substrate [1], whereas Dnmt3a and Dnmt3b are de novo DNA MTases that act on unmethylated DNA (for review, see Jeltsch [2]). A fourth DNA m5C-MTase, Dnmt2, belongs to a large family of proteins that is conserved in species from Schizosaccharomyces pombe to humans. Dnmt2 stands apart from the three active DNA MTases because it is considerably shorter than Dnmt1, Dnmt3a or Dnmt3b. Furthermore, this enzyme resembles prokaryotic DNA MTases in that it lacks a large N-terminal regulatory domain [3]. Native tRNAAsp extracted from Dnmt2-deficient mice, Arabidopsis thaliana or Drosophila melanogaster was methylated in vitro by the human Dnmt2 (hDnmt2) protein. Accordingly, it was proposed that hDnmt2 is a tRNAAsp MTase rather than a DNA MTase [4]. This idea was further supported by the finding that hDnmt2 can also methylate transcribed tRNAs in vitro [5,6].

On the other hand, Dnmt2 does not seem to be essential in higher eukaryotes. Loss-of-function mutations of the Dnmt2 gene do not change genomic methylation patterns in the mouse [7]. In addition, depletion of D. melanogaster Dnmt2 (dDnmt2) by RNA interference has no detectable consequences on embryonic development [8]. However, a recent report has shown that loss of Dnmt2 in somatic cells eliminates H4K20 trimethylation at retrotransposons and impairs maintenance of retrotransposon silencing [9]. Dnmt2 has been established as a genuine DNA methyltransferase in lower eukaryotes: it catalyzes DNA methylation in Dictyostelium discoideum [10,11] and Entamoeba histolytica [12]. However, the weak DNA methyltransferase activity and the low expression level of Dnmt2 enzymes may explain the low methylation level found in these organisms [13]. Dnmt2 catalyzes cytosine methylation with a weak preference for Cp(A/T) [8,12,14] or CC(A/T)GG [15] rather than for the CpG motif. These results suggest that a dual specificity for DNA and RNA substrates emerged during the evolution of the Dnmt2 family [13]. Despite this dual specificity, the function of Dnmt2 as an RNA methyltransferase in lower eukaryotes has not yet been established.

Identifying interaction partners of members of the Dnmt2 DNA/tRNA methyltransferase family is crucial for improving our understanding of their function. Until now, no interacting partner has been reported for this family of proteins. In contrast, numerous proteins have been shown to interact with Dnmt1 and Dnmt3, thereby linking methylation to histone modifications and transcription regulation. For example, both Dnmts were found to be associated with histone deacetylase [16,17]. Dnmt1 was also found to interact with several chromatin-associated proteins, such as the retinoblastoma protein, DNA methyltransferase 1-associated protein 1 and methyl-CpG-binding protein 2 [1]. Dnmt3 binds various transcription regulators, such as the transcriptional regulator RP58, the fusion protein of promyelocytic leukemia (PML) and retinoic acid receptor α (RARα) (PML-RAR), and heterochromatin protein 1 [18].

E. histolytica is an interesting model in which to study DNA methylation because Ehmeth, an enzyme that belongs to the Dnmt2 family, is the only DNA methyltransferase present in this parasite [12]. Methylated cytosine is present in E. histolytica ribosomal DNA [12] and in the scaffold/matrix attachment region [19]. There is also evidence that mutations in the reverse transcriptase of the LINE retrotransposon (RT LINE) can result from accelerated deamination of methylated cytosines [20]. Together, these findings support a role for Dnmt2 in the control of repetitive elements. This role has been confirmed in the lower eukaryote Dictyostelium discoideum [10,11] and in Drosophila [9]. Here, we establish that Ehmeth can catalyze the methylation of tRNAAsp. Moreover, we report for the first time that enolase, in addition to its involvement in the glycolytic pathway [21,22], is an inhibitor of Dnmt2.

Results

Identification and validation of enolase as an interacting partner of Ehmeth

We carried out a yeast two-hybrid screen. The bait vector, pAS1-Ehmeth, expressed Ehmeth fused to the GAL4 DNA-binding domain (GAL4BD). An E. histolytica cDNA library fused to the GAL4 activation domain (GAL4AD) served as prey. For this purpose, 10^6 clones were analyzed. Only two were selected, based on their ability to grow on selective medium (lacking histidine, leucine, tryptophan and adenine) and on the results of β-galactosidase complementation assays (data not shown). For each of the two positive clones, the recombinant plasmid harboring the cDNA sequence fused to GAL4AD was isolated by transformation of E. coli cells and then sequenced. These plasmids encode alcohol dehydrogenase (accession number XP_653507.1) and enolase (accession number XP_649161.1), respectively. Alcohol dehydrogenase was excluded from our analysis because its sequence contains a frameshift mutation.

To validate the interaction between enolase and Ehmeth, we carried out GST pull-down experiments. Ehmeth was transcribed and translated in vitro in the presence of [35S]methionine (TNT system). It was then incubated with glutathione beads coated with either GST-Ehenolase or GST. The pull-down experiment shows that Ehmeth binds specifically to GST-Ehenolase and not to GST (Fig. 1). The sequence conservation within the Dnmt2 protein family and within the enolase family suggests that the interaction between Ehmeth and enolase is conserved outside the Entamoeba genus. To test this hypothesis, Drosophila and human Dnmt2 proteins were transcribed and translated in vitro and then incubated with GST-Ehenolase. Interestingly, both Dnmt2 proteins were able to bind enolase (Fig. 1).

Figure 1. In vitro interaction between Ehmeth, dDnmt2, hDnmt2 and enolase. 35S-labeled Ehmeth (TNT-Ehmeth), dDnmt2 (TNT-dDnmt2) and hDnmt2 (TNT-hDnmt2) were each incubated with glutathione beads coated with GST or GST-enolase. The interacting proteins were analyzed by SDS-PAGE as described in the Materials and Methods. Left panel: Coomassie staining of the GST and GST-enolase fusion proteins used in the pull-down procedure. Right panel: the pull-down products TNT-Ehmeth, TNT-dDnmt2 and TNT-hDnmt2 were detected by exposing the membrane to X-ray film. doi:10.1371/journal.ppat.1000775.g001

Figure 2. Enolase is present in the cytoplasmic and nuclear fractions of E. histolytica. A. Cytoplasmic (C) and nuclear (N) protein fractions of E. histolytica HM-1:MSS and pJST4-Ehmeth trophozoites were separated by 12% SDS-PAGE. They were analyzed by western blot with an anti-HA antibody, an anti-enolase antibody, an anti-EhMLBP antibody or an anti-myosin II antibody. B. Cellular localization of Ehenolase in E. histolytica trophozoites. Ehenolase was detected by immunofluorescence microscopy using a primary anti-enolase antibody and a Cy3-conjugated secondary antibody, and is shown in red. Nuclei (blue) were stained with DAPI. Computer-assisted overlay of the enolase and DAPI signals shows that Ehenolase is ubiquitously present in trophozoites, including in the nucleus. C. Cytoplasmic and nuclear protein fractions were prepared from three sources: E. histolytica HM-1:MSS trophozoites, trophozoites expressing an NLS-fused scrambled peptide (NLS-Con) [30], and trophozoites expressing an NLS-fused enolase (NLS-Eno). The fractions were separated by 12% SDS-PAGE and analyzed by western blot with an anti-enolase antibody, an anti-actin antibody or an anti-EhMLBP antibody. doi:10.1371/journal.ppat.1000775.g002

Localization of enolase in E. histolytica trophozoites

We previously reported that enolase is secreted by activated trophozoites [23]. To gain further insight into the cellular localization of this protein, we prepared cytoplasmic and nuclear proteins from HM-1:MSS trophozoites and analyzed them by western blotting with an antibody against enolase (Fig. 2A, 2C). The enolase antibody was raised against human enolase. Its specificity was confirmed against GST-Ehenolase, with GST alone as the negative control (data not shown). The efficiency of the protein fractionation was examined by western blot analysis using two control antibodies: one against EhMLBP, a nuclear protein [24], and one against myosin II, a cytoplasmic protein [25]. As expected, EhMLBP was detected in the nuclear fraction and myosin II in the cytoplasmic fraction of the parasite (Fig. 2A). Enolase was detected as a 47 kDa protein in the cytoplasmic fraction of the parasite (Fig. 2A, 2C). Moreover, a non-negligible amount of enolase was detected in the nuclear fraction. To further validate these results, we examined the localization of enolase in the parasite by immunofluorescence microscopy (Fig. 2B). This analysis showed that enolase is ubiquitously present in the parasite, including its nucleus.
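The fractionation argument above rests on two quantitative checks. First, the marker proteins must stay in their expected fractions. Second, the amount of enolase in a lane is compared between samples. Below is a minimal Python sketch of both calculations. It assumes band intensities have already been measured by densitometry (elsewhere the article reports Tina densitometry values for other blots). The function names, the 10% cross-contamination threshold and all intensities are hypothetical. Normalizing a target band to a loading band is a common convention, not a procedure stated in this article.

```python
"""Sketch: nuclear/cytoplasmic fractionation checks from western blot densitometry.

All band intensities are hypothetical arbitrary units. EhMLBP (nuclear) and
myosin II (cytoplasmic) act as fractionation markers, as in Fig. 2A; the 10%
cross-contamination threshold is an assumption of this sketch.
"""


def share(part: float, other: float) -> float:
    """Return part / (part + other): the fraction of a protein's signal in one lane."""
    total = part + other
    if total <= 0:
        raise ValueError("no signal in either lane")
    return part / total


def normalized_fold_change(target: float, loading: float,
                           ref_target: float, ref_loading: float) -> float:
    """Return (target / loading) in a sample divided by the same ratio in a reference."""
    if loading <= 0 or ref_loading <= 0 or ref_target <= 0:
        raise ValueError("loading and reference signals must be positive")
    return (target / loading) / (ref_target / ref_loading)


if __name__ == "__main__":
    max_cross = 0.10  # hypothetical acceptance threshold
    ehmlbp_in_cytoplasm = share(part=5.0, other=95.0)  # nuclear marker leaking out
    myosin_in_nucleus = share(part=3.0, other=97.0)    # cytoplasmic marker leaking in
    print("fractions clean:", ehmlbp_in_cytoplasm <= max_cross and myosin_in_nucleus <= max_cross)
    # Nuclear enolase vs a nuclear loading band, treated lysate relative to control.
    print(round(normalized_fold_change(30.0, 50.0, 10.0, 50.0), 2))  # 3.0
```

The same ratio-of-ratios form could be applied to the fold differences in nuclear enolase reported in the Results, provided a suitable loading band is available for each lane.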

Ehmeth interacts with enolase in E. histolytica in vivo

To test the binding of Ehmeth to enolase in the parasite, we conducted co-immunoprecipitation experiments. These used endogenous enolase and a calmodulin-, histidine- and hemagglutinin (CHH)-tagged Ehmeth in the nuclear lysate of pJST4-Ehmeth-transfected trophozoites. We chose a tagged Ehmeth rather than the endogenous Ehmeth because the antibody we previously raised against Ehmeth [12] was unable to immunoprecipitate the protein (data not shown). A hemagglutinin (HA) antibody was used to detect the HA epitope in the CHH tag. The expression of CHH-tagged Ehmeth in the nuclear fraction of pJST4-Ehmeth-transfected trophozoites was confirmed by western blot analysis with an HA antibody (Fig. 2A). Enolase co-immunoprecipitated with CHH-tagged Ehmeth (Fig. 3, left panel, Control). Ehmeth also co-immunoprecipitated with enolase (data not shown). We then excluded the possibility that enolase interacts with the CHH tag rather than with Ehmeth. For this control, the HA antibody was used for immunoprecipitation from a nuclear lysate of trophozoites expressing a CHH-tagged KLP5 protein [26]. Enolase did not co-immunoprecipitate with CHH-tagged KLP5, which indicates that enolase does not interact with the CHH tag (Fig. 3, right panel).

Mapping of the Ehmeth binding site for enolase

To delineate the enolase-interacting domains of Ehmeth, a series of deletion mutant proteins (Fig. 4, upper panel) was pulled down with either GST-Ehenolase or GST. The N-terminal (amino acids 1 to 103) and C-terminal (amino acids 88 to 322) fragments of Ehmeth bound enolase in the same manner as full-length Ehmeth (Fig. 4, lower panel). These results suggest that the region between amino acids 88 and 103, which is shared by the N-terminal and C-terminal Ehmeth mutant proteins, is involved in the binding of Ehmeth to enolase. This region includes the catalytic site (domain IV) of Dnmt2 proteins [27]. To test this hypothesis, we generated a mutant Ehmeth protein lacking amino acids 88 to 103 (EhmethΔ88–103) and examined its binding to GST-Ehenolase. The binding of EhmethΔ88–103 to enolase was impaired (Fig. 4, lower panel). Equivalent input amounts of the different Ehmeth deletion mutant proteins were used in the GST pull-down assay (data not shown). This result indicates that domain IV contributes to the binding of Ehmeth to enolase. The catalytic domain of Dnmt2 proteins exists as an exposed loop that is not part of the main structure [3]. According to this model, the deletion of amino acids 88 to 103 is not expected to cause a significant conformational change in the structure of Ehmeth.

Enolase inhibits the binding of Ehmeth and hDnmt2 to EhMRS2 DNA

We previously demonstrated that Ehmeth binds to EhMRS2, a DNA element that contains the eukaryotic consensus scaffold/matrix attachment region (S/MAR) bipartite recognition sequences [19]. We hypothesized that enolase regulates Ehmeth activity because it binds to the catalytic site of Ehmeth. To test this hypothesis, GST-Ehmeth was incubated with 32P-labeled EhMRS2 DNA in the presence of various amounts of GST-Ehenolase. The denaturant-resistant DNA-Ehmeth complex [3] was then analyzed by SDS-PAGE under denaturing conditions. In agreement with a previous report [19], GST-Ehmeth formed a complex with EhMRS2 DNA, visible as a retarded band in the SDS gel (Fig. 5A). No complex was observed when the labeled EhMRS2 DNA probe was incubated with either GST or GST-Ehenolase (Fig. 5A). The presence of Ehmeth in the retarded band was confirmed by mass spectrometry (Fig. S1). Remarkably, formation of the Ehmeth-EhMRS2 complex was inhibited in the presence of Ehenolase (Fig. 5A). To determine whether this also holds for hDnmt2, we tested its ability to bind EhMRS2 DNA. hDnmt2 bound EhMRS2 DNA (Fig. 5A), and formation of the hDnmt2-EhMRS2 DNA complex was also strongly inhibited by Ehenolase. These results suggest that enolase uses the same inhibitory mechanism to block the binding of both Ehmeth and hDnmt2 to EhMRS2 DNA.
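The deletion mapping of Ehmeth described above can be written as interval arithmetic. The candidate binding region is the overlap of all fragments that still bind enolase. An internal deletion that removes this overlap should then abolish binding. The Python sketch below assumes 1-based, inclusive residue ranges. It treats full-length Ehmeth as residues 1 to 322, inferred from the C-terminal fragment 88–322. The helper names are illustrative and not part of the study.

```python
"""Sketch: deletion mapping by intersecting the fragments that still bind.

Ranges are 1-based and inclusive. Full-length Ehmeth is taken as residues
1-322 (inferred from the C-terminal fragment 88-322); names are illustrative.
"""
from typing import List, Optional, Tuple

Range = Tuple[int, int]


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Return the overlap of two inclusive ranges, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def shared_region(fragments: List[Range]) -> Optional[Range]:
    """Return the region common to every fragment that retains binding."""
    if not fragments:
        raise ValueError("need at least one fragment")
    region = fragments[0]
    for frag in fragments[1:]:
        overlap = intersect(region, frag)
        if overlap is None:
            return None  # no residue is shared by all binders
        region = overlap
    return region


def deletion_covers(candidate: Range, deleted: Range) -> bool:
    """Return True if an internal deletion removes the whole candidate region."""
    return deleted[0] <= candidate[0] and candidate[1] <= deleted[1]


if __name__ == "__main__":
    binders = [(1, 322), (1, 103), (88, 322)]  # full length, N-terminal, C-terminal
    region = shared_region(binders)
    print(region)  # (88, 103)
    if region is not None:
        print(deletion_covers(region, (88, 103)))  # True: the delta 88-103 mutant lacks it
```

The intersection only narrows down where binding determinants may lie. The loss of binding by the deletion mutant, not the arithmetic, is what supports the role of domain IV.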

Enolase inhibits the tRNAAsp methyltransferase activity of Ehmeth and hDnmt2

It has been reported that hDnmt2 catalyzes the methylation of tRNAAsp [4,5,6]. We therefore examined this catalytic activity in E. histolytica, because it had not yet been investigated in unicellular organisms. The tRNAAsp MT activity of Ehmeth was 9 U (Fig. 5B, left panel), roughly 100-fold lower than that of hDnmt2 (Fig. 5B, right panel). GST had no detectable tRNAAsp MT activity. It has been reported that hDnmt2 methylates tRNAAsp using a DNA methyltransferase-like catalytic mechanism [6]. This observation predicts that enolase will also inhibit the tRNAAsp MT activity of Ehmeth and hDnmt2. We confirmed this prediction: the tRNAAsp MT activity of Ehmeth and hDnmt2 was strongly inhibited by enolase (approximately 60% and 90% inhibition, respectively) (Fig. 5B).
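Activities in this section are expressed in the unit defined in the Figure 5 legend: 1 U = 1 pmol of 3H-AdoMet incorporated per hour per nmol of enzyme. The Python sketch below converts a raw incorporation value into U. It then computes percent inhibition as the fractional loss of activity relative to the uninhibited reaction. This is the standard definition, and it is an assumption about how the reported 60% and 90% values were derived. All input numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
"""Sketch: tRNA MTase activity in the units defined in the Figure 5 legend.

1 U = 1 pmol of 3H-AdoMet-derived methyl groups incorporated per hour per nmol
of enzyme. All input values below are hypothetical placeholders.
"""


def activity_units(pmol_incorporated: float, hours: float, nmol_enzyme: float) -> float:
    """Return specific activity in U (pmol methyl groups / h / nmol enzyme)."""
    if hours <= 0 or nmol_enzyme <= 0:
        raise ValueError("hours and nmol_enzyme must be positive")
    return pmol_incorporated / hours / nmol_enzyme


def percent_inhibition(activity_without: float, activity_with: float) -> float:
    """Return the activity lost with inhibitor, as % of the uninhibited activity."""
    if activity_without <= 0:
        raise ValueError("uninhibited activity must be positive")
    return 100.0 * (1.0 - activity_with / activity_without)


if __name__ == "__main__":
    # Hypothetical reaction: 18 pmol incorporated in 2 h by 1 nmol enzyme.
    u = activity_units(18.0, 2.0, 1.0)
    print(f"{u:.1f} U")  # 9.0 U
    # Hypothetical activity remaining with enolase: 3.6 U.
    print(f"{percent_inhibition(u, 3.6):.0f}% inhibition")  # 60% inhibition
```

Expressing activity per nmol of enzyme is what allows Ehmeth and hDnmt2 to be compared directly, as in the roughly 100-fold difference noted above.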

Effect of 2-phosphoglycerate (2-PG) on the inhibitory activity of enolase

Enolase has been reported to undergo a conformational change upon binding 2-PG [28,29]. This observation prompted us to examine the effect of 2-PG on the inhibitory activity of enolase. For this purpose, we measured the ability of enolase to inhibit the methylation of tRNAAsp by hDnmt2 in the presence of increasing concentrations of 2-PG. hDnmt2 was preferred to Ehmeth for this experiment because its tRNA MT activity is substantially higher (see Fig. 5B). The inhibitory activity of enolase was reduced by 2-PG in a dose-dependent manner (Fig. 6A). This result could be explained by reduced binding of enolase to hDnmt2 in the presence of 2-PG. To test this hypothesis, we examined the binding of enolase to hDnmt2 in the presence of 2-PG (7 mM). Following the addition of 2-PG, the binding of enolase to hDnmt2 was strongly reduced (Fig. 6B). These results indicate that the inhibitory activity of enolase is regulated by its substrate, and suggest a link between the glycolytic pathway and Dnmt2 activity.

Figure 3. In vivo interaction of Ehmeth with enolase. Immunoprecipitation was performed with an anti-HA antibody from nuclear lysates of E. histolytica trophozoites expressing Ehmeth as a CHH-tagged protein (pJST4-Ehmeth). The trophozoites were grown either in regular medium (control) or in glucose starvation medium (glucose starvation). Immunoprecipitated proteins were detected by western blot with an anti-enolase antibody. To verify that equal amounts of Ehmeth were used in the assay, immunoprecipitated proteins were also analyzed with an anti-His antibody, which detects the CHH-tagged Ehmeth. As a negative control, immunoprecipitation with an anti-HA antibody was performed from a nuclear lysate of E. histolytica trophozoites expressing CHH-KLP5 (right panel). The physical interaction between enolase and Ehmeth is observed only after immunoprecipitation from Ehmeth-tagged trophozoites. This complex is enhanced following glucose starvation (3-fold according to Tina densitometry analysis). doi:10.1371/journal.ppat.1000775.g003

Figure 4. Mapping of the enolase binding region of Ehmeth. Upper panel: scheme of the different Ehmeth mutants. Lower panel: pull-down experiment of different Ehmeth fragments with recombinant enolase. Full-length Ehmeth, Ehmeth (amino acids 1 to 103) and Ehmeth (amino acids 88 to 322) were efficiently pulled down by enolase, whereas Ehmeth lacking domain IV interacted poorly with enolase. doi:10.1371/journal.ppat.1000775.g004

Figure 5. Enolase inhibits Ehmeth and hDnmt2 functions. A. The binding of [γ-32P]ATP-labeled EhMRS2 DNA (EhMRS2*, 0.33 µg) to Ehmeth or hDnmt2 was detected as a DNA-protein complex. No complex was observed when GST or GST-enolase was incubated with EhMRS2 DNA. Enolase inhibits the binding of Ehmeth and hDnmt2 to EhMRS2 DNA in a dose-dependent manner. B. Effect of enolase on the tRNA methyltransferase activity of Ehmeth (left panel) and hDnmt2 (right panel). The results represent the mean and standard deviation of three independent experiments (P < 0.05). One unit (U) corresponds to 1 pmol of 3H-AdoMet incorporated per hour per nmol of enzyme. doi:10.1371/journal.ppat.1000775.g005

Effect of glucose starvation on the localization of enolase, its binding to Ehmeth, and the DNA/tRNAAsp methylation status

Our previous results indicated that 2-PG modulates the inhibitory activity of enolase. To assess the physiological relevance of this observation, we used glucose starvation as a means to reduce the level of 2-PG in the parasite. To monitor the effect of 12-hour glucose starvation, we quantified intracellular pyruvate, the end product of glycolysis, rather than measuring 2-PG directly, because pyruvate is easier to determine. The level of pyruvate in trophozoites starved of glucose for 12 hours was reduced by 50% compared with non-starved control trophozoites (8×10^-14 mol/ml vs 8×10^-7 mol/ml). Longer glucose starvation (24 hours) resulted in substantial death of the parasite (more than 50% of the original population; data not shown).

The localization of enolase during glucose starvation was followed by western blot analysis of cytoplasmic and nuclear lysates. We consistently observed at least three times more enolase in the nuclear lysate of 12-hour glucose-starved trophozoites than in non-starved control trophozoites (Fig. 7A, right panel). No accumulation of enolase in the nucleus was observed in trophozoites exposed to heat shock or oxidative stress (data not shown). Adding glucose back to the starved parasites restored the original distribution of enolase. This shows that the mechanism that accumulates enolase in the nucleus is reversible. Moreover, immunoprecipitation analysis after 12 hours of glucose starvation showed that more enolase-Ehmeth complex was formed in starved trophozoites than in non-starved control trophozoites (Fig. 3, left panel). Because enolase inhibits Ehmeth, we hypothesized that the formation of the enolase-Ehmeth complex affects the level of DNA and tRNAAsp methylation following glucose starvation of the parasite. To test this hypothesis, we determined the levels of tRNA and DNA methylation in control and glucose-starved trophozoites. Glucose-starved trophozoites showed a significant decrease in tRNA methylation (38%) compared with non-starved trophozoites (Fig. 7B). RT-PCR analysis showed no significant difference in the amount of tRNAAsp between glucose-starved and non-starved control trophozoites (Fig. 7C). In contrast, dot blot analysis with an m5C antibody [12] detected no difference in methylation between genomic DNA of control and glucose-starved parasites (Fig. 7D). This result indicates that DNA methylation is not affected by glucose starvation, probably because of the short duration of starvation (12 hours). Therefore, to further examine the effect of nuclear enolase accumulation on DNA methylation, we constitutively expressed in the parasite an enolase fused to a nuclear localization signal (NLS).
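The suggestion that 12 hours of starvation is too short to change DNA methylation can be made concrete with a simple passive-dilution model. The model uses two figures from the article: the eight-hour generation time given in the Discussion and the one-month culture of NLS-Eno transfectants. The Python sketch below is an idealized model of our own, not one used in the study. It assumes constant division at the normal rate, which starved trophozoites may not sustain. It also assumes a hypothetical per-division maintenance efficiency of 0.9, meaning the fraction of newly synthesized strands that are methylated again.

```python
"""Sketch: how much passive dilution of DNA methylation can occur in a given time.

Assumptions (hypothetical, for illustration only):
- cells divide every `generation_time_h` hours at a constant rate;
- at each replication, a fraction `maintenance` of the new, unmethylated strands
  is methylated again (1.0 = full maintenance, 0.0 = none).
"""


def divisions(elapsed_h: float, generation_time_h: float) -> float:
    """Return the number of cell divisions that fit into `elapsed_h` hours."""
    if generation_time_h <= 0:
        raise ValueError("generation_time_h must be positive")
    return elapsed_h / generation_time_h


def remaining_methylation(n_divisions: float, maintenance: float) -> float:
    """Return the fraction of the starting methylation level left after n divisions.

    Each replication halves methylation density unless new strands are
    re-methylated, so the per-division retention is (1 + maintenance) / 2.
    """
    if not 0.0 <= maintenance <= 1.0:
        raise ValueError("maintenance must be between 0 and 1")
    return ((1.0 + maintenance) / 2.0) ** n_divisions


if __name__ == "__main__":
    starvation = divisions(12, 8)      # 12-hour glucose starvation -> 1.5 divisions
    one_month = divisions(30 * 24, 8)  # ~1 month of NLS-Eno culture -> 90 divisions
    for label, n in (("12 h starvation", starvation), ("1 month culture", one_month)):
        print(label, n, round(remaining_methylation(n, maintenance=0.9), 3))
```

Under these assumptions, 12 hours allow only 1.5 divisions and leave most of the methylation in place. The roughly 90 divisions of a month-long culture erode it almost completely. The contrast between the two time scales matters here, not the specific values.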

were cultured continuously in the presence of 24 mg mL21 G418 for one month. The localization of enolase in NLS-Eno and NLSCon transfectants was followed by western blot analysis of cytolasmic and nuclear lysates (Fig. 2C). We observed that 7 times more enolase was present in the nucleus of NLS-Eno transfectants than in NLS-con transfectants or non-transfected trophozoites (HM1:MSS) (Fig. 2C, right panel). The level of DNA and tRNAAsp methylation in NLS-Con and NLS-Eno was determined (Fig. 7B and D). A significant decrease in both DNA and tRNAAsp methylation was observed in NLS-Eno transfectants when compared to that determined in NLS-Con transfectants. These results indicate that the continuous accumulation of enolase in the nucleus inhibit both Ehmeth DNA and tRNAAsp MT activity.

Effect of enolase accumulation in the nucleus on DNA and tRNAasp methylation

Of members of the Dnmt family of proteins, the roles of Dnmt1 and Dnmt3 are relatively well understood. In contrast, our knowledge about Dnmt2 is scanty. Furthermore there is no information about the molecules which interact with this protein. Therefore, the identification of such molecules would be a key step

Discussion

The transfected trophozoites with NLS Enolase and trophozoites expressing a random 12 amino acids peptide followed by a NLS [30] which were used as control (NLS-Con transfectants) PLoS Pathogens | www.plospathogens.org

February 2010 | Volume 6 | Issue 2 | e1000775

Enolase Interacts with Dnmt2

Figure 6. The influence of 2-PG on enolase inhibitory effect over Dnmt2 tRNA MT activity. A. Measure of the hDnmt2 tRNA methyltransferase activity in presence of enolase and increasing concentrations of 2 phosphoglycerate (2-PG). The activity of hDnmt2 measured in the presence of 7 mM 2-PG was regarded as 100%. As already reported enolase strongly inhibits hDnmt2 in absence of 2-PG. The activity of hDnmt2 in presence of enolase is restored by 2-PG in a dose dependent manner. The results represent the mean and standard deviation of three independent experiments (Pvalue,0.05). B. In vitro interaction between hDnmt2 and enolase in the presence of 7 mM 2-PG. 35S labeled enolase (TNT-Eno) was incubated respectively with glutathione beads coated with GST or GST- hDnmt2 in presence or absence of 2-PG (7 mM). The pull down products was detected by exposure of the membrane to an x ray film. According to Tina densitometry analysis around 4 times less Enolase was pull down by hDnmt2 when 2-PG was present in the reaction. doi:10.1371/journal.ppat.1000775.g006

PLoS Pathogens | www.plospathogens.org

February 2010 | Volume 6 | Issue 2 | e1000775

Enolase Interacts with Dnmt2

Figure 7. The effect of enolase accumulation in the nucleus over Dnmt2 tRNA and DNA MT activity. A. Western blot analysis of cytoplasmatic and nuclear protein fractions prepared from E.histolytica pJST4-Ehmeth trophozoites grown without glucose for (0, 6, 9 and 12 hours or for 12 hours of starvation followed by 12 hours of growth in presence of 1% glucose). Proteins were separated on 12% SDS-PAGE and analyzed by western blot with an anti HA antibody, an anti enolase antibody, or an anti Myosin II antibody. This figure is representative of at least three independent experiments. B. Effect of glucose starvation and continuous forced expression of enolase in the nucleus on the level of tRNA methylation in the parasite. RNA samples from trophozoites grown in regular (control), glucose starvation media (glucose starvation), NLS-Con trophozoites and NLS-Eno trophozoites were used as substrates for in vitro tRNA methylation assay performed with hDnmt2 (see materials and methods). The amount of methyl group incorporate in control RNA was taken as 100%. The significant higher amount of methyl group incorporated in RNA prepared from glucose starved trophozoites (38% increases) and NLS-Eno trophozoites (250% increases) indicates the tRNA present in this sample were less methylated. The results represent the mean and standard deviation of three independent experiments (Pvalue,0.05). C. RT PCR analysis of the tRNAasp amount in trophozoites grown in regular (control) and glucose starvation media (glucose starvation). The amount of rDNA was used for the normalization of the data. D. Effect of glucose starvation and continuous forced expression of enolase in the nucleus on the level of m5C methylation in the parasite. Genomic DNA was prepared from trophozoites grown in regular (control), glucose starvation media (glucose starvation), NLS-Con trophozoites and NLS-Eno trophozoites and dot blotted on nitrocellulose membrane in the indicated amounts. 
Genomic DNA from calf thymus (CT) or PCR product (PCRP) were used as positive and negative controls respectively. DNA methylation was detected with an antibody directed against 5-methylcytosine (a5mc) and the total amount of DNA was estimated by hybridization with a radioactive probe against rDNA. doi:10.1371/journal.ppat.1000775.g007

significance of enolase presence in the nucleus. The results of our investigations on the nuclear role of enolase suggest that it is a Dnmt2 inhibitor. The results from several recent studies have fuelled the debate on whether Dnmt2 is a DNA methyltransferase, a tRNA methyltransferase, or both. The results of our investigation support the notion that E.histolytica Dnmt2 (Ehmeth) is a DNA methyltransferase and a tRNA methyltransferase. Indeed, this is the first report of Dnmt2 being a tRNA methyltransferase in lower eukaryotes. Enolase has been reported to bind the bacteriophagespecific DNA adenine methyltransferase M.EcoT1. Interestingly, enolase binding to M.EcoT1 did not influence M.EcoT1 catalytic activity [36]. The domain IV of Ehmeth includes the catalytic sites, and is widely conserved among DNA-(cytosine-C 5)-methyltransferase. The binding of enolase to the domain IV of Ehmeth is probably the main mechanism of its inhibitory action. Dnmt2 methylates tRNA using a DNA methyltransferase-like catalytic mechanism [6]. Therefore, it is not surprising that the binding of enolase to Ehmeth interferes with both EhMRS2 DNA recognition and tRNAAsp MT activity. In S. cerevisiae, enolase interacts with cytosolic tRNALys in order to enable its translocation into the mitochondria, thereby displaying a function as a tRNA chaperone [37]. Our data showed that enolase does not interact with either DNA or tRNAAsp, thereby excluding competition as a mechanism to explain its Dnmt2 inhibitory activity. Only a few proteins have been reported to interact with the C-terminal domain, which contains the catalytic site for Dnmts. The P23 protein is a protein

towards elucidating our understanding of Dnmt2 functions. Enolase, a glycolytic enzyme that catalyses the conversion of 2PG to phosphoenolpyruvate, (PEP) is to the best of our knowledge the first Dnmt2-interacting protein to be described. For many years, glycolytic enzymes have been considered to be housekeeping cytoplasmatic proteins. Based on the results of studies on the function(s) of the glyceraldehyde-3-phosphate dehydrogenase, this concept has changed, and it is now well accepted that some of these enzymes that includes enolase, are multifunctional proteins which are involved in gene transcription, DNA replication, DNA repair, and nuclear RNA export (for review see [31]). The inability to select in complex growth media mutants of Bacillus subtilis [32], Escherichia coli [32] and E.histolytica enolase (data not shown) supports this multifunctional role. The catalytic activity of enolase in E. histolytica has been characterized [22], and it was found to be co-secreted with serpin and aldehyde alcohol dehydrogenase by activated trophozoites [23]. Indeed, antibodies against enolase have been detected in patients with amebiasis, and this suggests that enolase plays a role in the virulence of the parasite [33]. Such a role has been already reported in bacteria where enolase binds plasminogen [34]. The results of this investigation show that enolase is present in the cytoplasm and nucleus of E.histolytica. This ubiquitous localization is not unique to E. histolytica. In mammals, there are three isoforms of enolase (for review [35]), and each is characterized by its tissue distribution and expression. In HeLa cells, A. thaliana, and Plasmodium yoelii, enolase was found also in the nucleus. These observations raise the question about the PLoS Pathogens | www.plospathogens.org

February 2010 | Volume 6 | Issue 2 | e1000775

Enolase Interacts with Dnmt2

medium that has been prepared without glucose (glucose concentration 31 mg/l). Recovery from glucose starvation was done by direct addition of 1% glucose to the culture of starved parasites. Escherichia coli strain BL21 (DE3): F2 ompT gal dcm lon hsdSB(rB2 mB2) l(DE3 [lacI lacUV5-T7 gene 1 ind1 sam7 nin5]) Saccharomyces cerevisiae strain Y190: MATa, gal4 gal180 his3 trp1–901 ade2–101 ura3–52 leu2–3, 2112 + ura3::GALRlacZ, LYS2: GAL(UAS)RHIS3 cyhr

that is associated with steroid receptor complexes binds to the Cterminal of Dnmt1 [38]. However, its effect on Dnmt1 activity is still unclear. In contrast, p53 has been shown to stimulate Dnmt1 activity in vitro by binding to the C-terminal of Dnmt1 [39]. This last example together with our findings reinforce the notion that catalytic activity of Dnmt protein can be modulated by proteins that interact with their C-terminal. The accumulation of enolase in the nucleus and the formation of an additional Ehmeth-enolase complex following glucose starvation support a central role for glucose metabolism in the regulation of Ehmeth activity. Glucose starvation was preferred to drugs in order to inhibit glycolysis because (i) one of the unwanted action of such drugs is the inhibition of proteasome activity [40], and (ii) the physiological relevancy of glucose starvation during Entamoeba differentiation [41]. Metabolites can act as sensors of the cell energy status. Therefore, they are convenient regulators of enzymes under conditions of physiological stress such as glucose starvation. For example, glucose starvation affects the activation or silencing of rRNA expression [42]. Glucose starvation led to significant TrnaAsp demethylation, but not to DNA demethylation. In contrast, forced expression of enolase in the nucleus led to both DNA and tRNAAsp demethylation. In mammals, active DNA demethylation is controversial [43]. Recently, a convincing mechanism of active DNA demethylation in which DNA glycosylase act as DNA demethylases through a base-excision-repair pathway has been proposed [44]. There is no evidence that active DNA demethylation occurs in E.histolytica. Passive demethylation occurs when DNA methylation is progressively reduced with cell division [45]. The generation time of the parasite is eight hours, and this would make it unlikely that DNA demethylation will occur following 12 hours of glucose starvation. 
However, this passive mechanism of DNA demethylation has probably operated in the enolase-NLS strain over its numerous cell divisions. In contrast, tRNA turnover is much faster, which allows rapid passive demethylation [46]. The physiological meaning of Dnmt2-mediated methylation of tRNAAsp is still unknown. tRNA methylation has been implicated in the control of tRNA stability [47,48]. In S. cerevisiae, Trm9-mediated tRNA methylation is linked to enhanced translation of genes related to the stress response, DNA damage and other cellular functions [49,50]. Mitochondrial tRNA methylation mediated by Trm5 was shown to regulate mitochondrial protein synthesis [51]. These different functions of tRNA methylation represent an interesting starting point for further research on the role of tRNAAsp methylation in E. histolytica. To conclude, the results of this investigation provide in vivo and in vitro evidence that establishes enolase as the first Dnmt2-interacting protein. Moreover, our results provide strong evidence linking glucose metabolism and Dnmt2 activity. We have also shown that Dnmt2 is a tRNA methyltransferase in lower eukaryotes. The significance of the enolase-Dnmt2 interaction in higher eukaryotes needs further investigation.

DNA constructs used for: Yeast two-hybrid screen. An expression library of random-primed cDNA from E. histolytica was prepared by Vertis Biotechnologie AG (Germany) and cloned in the pACT2 vector downstream of the GAL4 activation domain. Ehmeth was amplified by PCR from E. histolytica genomic DNA using the primers Ehmeth Bam and Ehmeth 3′ (Table 1) and then cloned in the pGEM-T Easy vector (Promega). The resultant vector was digested with BamHI and SalI, and the Ehmeth insert was subcloned in frame with the GAL4 DNA-binding domain into the pAS1 plasmid, which had been linearized with BamHI and SalI (pGAL4-BD-Ehmeth). In vitro translation. Ehmeth was amplified from E. histolytica genomic DNA by PCR using the primers Ehmeth start and Ehmeth 3′ (Table 1) and then cloned in the pGEM-T Easy vector (pGEMT-Ehmeth). To serve as the DNA template in the in vitro transcription/translation assay (TNT), Ehmeth was amplified from pGEMT-Ehmeth by PCR using the primers Ehmeth Kozak and Ehmeth 3′ (Table 1). Truncated Ehmeth 1–103 (amino acids 1 to 103) was amplified from pGEMT-Ehmeth by PCR using the primers Ehmeth Kozak and Ehmeth 310 3′ to serve as DNA template in the TNT system (Table 1). Truncated Ehmeth 88–322 (amino acids 88 to 322) was amplified from pGEMT-Ehmeth by PCR using the primers Ehmeth 265 Kozak and Ehmeth 3′ (Table 1) to serve as DNA template in the TNT system. The 45 nucleotides that encode Ehmeth amino acids 88–103 were deleted as follows: Ehmeth 1–88 was first amplified from pGEMT-Ehmeth by PCR using the primers Ehmeth start and Ehmeth 265 Bgl, and the PCR product was cloned in the pGEM-T Easy vector (pGEMT-Ehmeth1–88). Ehmeth 103–322 was amplified from pGEMT-Ehmeth by PCR using the primers Ehmeth 310 Bgl and Ehmeth 3′, and the PCR product was cloned in the pGEM-T Easy vector (pGEMT-Ehmeth103–322).
The two plasmids, pGEMT-Ehmeth(103–322) and pGEMT-Ehmeth(1–88), were digested with BglII and EcoRI, and the Ehmeth DNA fragments were ligated using T4 DNA ligase (Biolabs). The ligation product was used as DNA template and amplified with the primers Ehmeth start and Ehmeth 3′. The resultant PCR product was cloned in the pGEM-T Easy vector (pGEMT-EhmethΔ(88–103)). EhmethΔ(88–103) was amplified by PCR using the primers Ehmeth Kozak and Ehmeth 3′ to serve as DNA template for the TNT system. The primers hDnmt2 5′ and hDnmt2 3′ were used to amplify hDnmt2 from a cDNA clone HGNC:2977 (Open Biosystems), and the product was cloned in the pGEM-T Easy vector (pGEMT-hDnmt2). For use as a template in the TNT system, hDnmt2 was amplified from pGEMT-hDnmt2 by PCR with the primers TNT-hDnmt2 and hDnmt2 3′. Expression of recombinant proteins in E. coli. For the expression of the recombinant GST fusion proteins, Ehenolase was amplified from genomic DNA by PCR using the primers GST-Enolase and Enolase Bgl II 3′ (Table 1). The PCR product

Materials and Methods

Microorganisms used in this study

Trophozoites of the E. histolytica strain HM-1:IMSS were grown under axenic conditions in Diamond's TYI-S-33 medium (glucose concentration 750 mg/l) at 37 °C. Trophozoites in the log phase of growth were used in all experiments. For the glucose starvation assays, trophozoites in the exponential phase of growth were washed three times and transferred to Diamond's TYI-S-33

PLoS Pathogens | www.plospathogens.org

February 2010 | Volume 6 | Issue 2 | e1000775

Enolase Interacts with Dnmt2

Table 1. Primers used in this study.

Primer name | Sequence | Direction | Restriction site (underlined in the original table)
Ehmeth kozak | GGATCCTAATACGACTCACTATAGGGAGCCACCATGGAACAGAAACAAGT | Sense | BamHI
Ehmeth 3′ | TTATTCTTTTAAGTCATCGAATAAA | Antisense | –
GST Enolase | GGCGGATCCATGTCAATTCAAAAGGTTC | Sense | BamHI
Enolase 3′ | TATAGATCTTTAAGCAGTTGAATTTCTC | Antisense | BglII
Ehmeth(305) | TTAAATATTATAAATTTCTTTAAAAAC | Antisense | –
Eno dro sma | ATCCCGGGAATGACCATCAAAGCGATCAAGG | Sense | SmaI
Eno dro | TTAAGCAGTTGAATTTCTCCAGTT | Antisense | –
Ehmeth 265 kozak | GGATCCTAATACGACTCACTATAGGGAGCCACCATGTCTAAACATAAAGA | Sense | BamHI
Ehmeth 265 3′ | ATAGATCTTATTGAATTATTATATGGTTGA | Antisense | BglII
Ehmeth 310 3′ | TTAAATATTATAAATTTCTTTAAAAAC | Antisense | –
DNMT2 Kozak | GGATCCTAATACGACTCACTATAGGGAGCCACCATGGTATTTCGGGTCTT | Sense | BamHI
EhMRS2 5 | GATTTTATTATATTTATTAATGTTTGA | Sense | –
EhMRS2 3 | GATCCCATACAAAAATAATTACA | Antisense | –
Dnmt2 3′ | ATAGCAAATATGTTGTATTTTGTTTTA | Antisense | –
Dnmt2 5′ | ATGGTATTTCGGGTCTTAGAACTATT | Sense | –
Ehmeth Bam | TATGGATCCAACAGAAACAAGTAAATG | Sense | BamHI
Enolase Bgl 3′ | TATAGATCTTTAAGAGTTGAATTTCTC | Antisense | BglII
Ehmeth start | ATGCAACAGAAACAAGTAAATGTTAT | Sense | –
Ehmeth kpn | TATGGTACCATGCAACAGAAACAAGTA | Sense | KpnI
Ehmeth Bgl | TAGATCTCTTTTAAGTCATCGAATA | Antisense | BglII
hDnmt2 Bam | GGCGGATCCATGGAGCCCCTGCGGGTGCT | Sense | –
hDnmt2 3′ | TTATTCATATAAGATTTTGATTAGT | Antisense | –
hDnmt2 kozak | GGATCCTAATACGACTCACTATAGGGAGCCACCATGGAGCCACTGCGGGT | Sense | –
dtRNA | TGGCGCCCAACGTGGGGCTC | Antisense | –
– | CGCGCGAAGCTTAATACGACTCACTATA | Sense | –
TNT Eno | GGATCCTAATACGACTCACTATAGGGAGCCACCATGTCAATTCAAAAGGT | Sense | –
Enolase kpn 5′ | ATGGTACCATGTCAATTCAAAAGGTTC | Sense | KpnI
Enolase NLS | GGATCCTTATCCAACCTTTCTTTTCTTTTTTGGTCCAGATCTAGCAGTTGAATTTCTCCAGTTCTTTCC | Antisense | BamHI

doi:10.1371/journal.ppat.1000775.t001
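As a quick consistency check on Table 1, the sketch below scans primer sequences for the restriction sites named in the table, for a T7 promoter, and for the Kozak-like GCCACCATG context carried by the TNT template primers. It is a hypothetical helper, not part of the study; the motif dictionary and the minimal T7 promoter string (TAATACGACTCACTATAG) are assumptions. Because the BamHI, BglII, KpnI and SmaI sites are palindromic, searching a single strand is sufficient regardless of primer direction.

```python
# Hypothetical helper (not from the study): annotate Table 1 primers with
# restriction sites, a T7 promoter, and a Kozak-like start context.
# BamHI, BglII, KpnI and SmaI sites are palindromic, so one strand suffices.
RESTRICTION_SITES = {
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
    "KpnI": "GGTACC",
    "SmaI": "CCCGGG",
}
T7_PROMOTER = "TAATACGACTCACTATAG"  # assumed minimal T7 promoter
KOZAK_START = "GCCACCATG"           # assumed Kozak-like context plus ATG


def annotate(seq: str) -> dict:
    """Return the restriction sites and features found in a primer sequence."""
    seq = seq.upper()
    return {
        "sites": [name for name, motif in RESTRICTION_SITES.items() if motif in seq],
        "t7": T7_PROMOTER in seq,
        "kozak": KOZAK_START in seq,
    }


# A few primers copied from Table 1
primers = {
    "Ehmeth kozak": "GGATCCTAATACGACTCACTATAGGGAGCCACCATGGAACAGAAACAAGT",
    "Enolase 3'": "TATAGATCTTTAAGCAGTTGAATTTCTC",
    "Eno dro sma": "ATCCCGGGAATGACCATCAAAGCGATCAAGG",
    "Ehmeth kpn": "TATGGTACCATGCAACAGAAACAAGTA",
}

for name, seq in primers.items():
    print(f"{name}: {annotate(seq)}")
```

Plain substring matching is deliberate here: the motifs are short and fixed, so no regular expressions or alignment are needed.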

was then cloned in a pGEM-T Easy vector (pGEM-Ehenolase), digested with BamHI and NotI, and subcloned into the pGEX-4T1 vector (Amersham Pharmacia Biotech) that had been linearized with BamHI and NotI. Ehmeth-GST was prepared as previously described [12]. The primers hDnmt2-Bam and hDnmt2 3′ (Table 1) were used to amplify hDnmt2 from pGEM-hDnmt2. The PCR product was cloned in a pGEM-T Easy vector, digested with BamHI and NotI, and subcloned into the pGEX-4T1 vector that had been linearized with BamHI and NotI.

Two hybrid analysis

S. cerevisiae Y190 was transformed with pGAL4-BD-Ehmeth (500 µg) using the LiAc transformation method [54]. The pGAL4-BD-Ehmeth strain was transformed with the E. histolytica cDNA library (500 µg), and transformants were selected for their ability to grow on selective medium lacking leucine and tryptophan for four days at 30 °C. After this first round of selection, the resistant clones were plated on a more stringent selective medium lacking leucine, tryptophan, histidine and adenine, and grown for five days at 30 °C. Fifteen resistant clones were selected for further analysis. The pACT2 vectors containing E. histolytica cDNA inserts were isolated from these clones and transformed into the pGAL4-BD-Ehmeth strain. After the third round of selection, only two clones were able to grow on the selective medium lacking leucine, tryptophan, histidine and adenine.

Expression of CHH tagged Ehmeth in E. histolytica.

Ehmeth was amplified by PCR with the primers Ehmeth kpn and Ehmeth Bgl and cloned in the pJST4 expression vector (kindly provided by Prof. Lohia, Department of Biochemistry, Bose Institute, India), which had been linearized with KpnI and BglII. This vector allows the expression in E. histolytica of a protein tagged with a calmodulin-binding domain, HA and His (CHH), driven by an actin promoter. The transfection of E. histolytica trophozoites was performed as previously described [52].

Expression of NLS-enolase and NLS-control in E. histolytica.

Enolase was PCR amplified using the primers Enolase kpn and Enolase NLS 3′ and cloned into the constitutive expression vector pEhNEO/CAT [53], which had been linearized by digestion with KpnI and BamHI. The pScramblePept3 plasmid, previously used to express a scrambled peptide fused to an NLS sequence in E. histolytica [30], was used as a control. The transfection of E. histolytica trophozoites was performed as described in [52].


resuspended in 2 ml of PBS. The trophozoites were lysed by freezing and thawing to produce a total protein lysate. The pyruvic acid level in trophozoite lysates was determined according to a previously described method [56]. Briefly, 1 ml of 2,4-dinitrophenylhydrazine (DNPH) (0.0125% in 2 N HCl) was added to 1 ml of trophozoite lysate. After 15 minutes of incubation at 37 °C in a water bath, the sample was removed from the water bath and 5 ml of 0.6 N NaOH was added. The absorbance of the sample was then measured in a spectrophotometer at 420 nm. A standard curve was generated using sodium pyruvate [56].
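A sodium pyruvate standard curve of this kind is usually applied by fitting a straight line to A420 versus known pyruvate concentration and reading unknown lysates off that line. The following is a minimal sketch assuming a linear response over the standard range; the standard concentrations and absorbances are invented placeholders, not data from this study.

```python
# Minimal sketch: least-squares standard curve for the DNPH pyruvate assay.
# The standard values below are hypothetical placeholders, not measurements.


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pyruvate_from_a420(a420, slope, intercept):
    """Invert the standard curve to estimate pyruvate concentration."""
    return (a420 - intercept) / slope


std_mM = [0.0, 0.1, 0.2, 0.4]        # hypothetical sodium pyruvate standards (mM)
std_a420 = [0.05, 0.15, 0.25, 0.45]  # hypothetical A420 readings

slope, intercept = fit_line(std_mM, std_a420)
print(f"slope = {slope:.3f} A420/mM, intercept = {intercept:.3f}")
print(f"lysate with A420 = 0.30 -> {pyruvate_from_a420(0.30, slope, intercept):.3f} mM")
```

Readings that fall outside the range of the standards should be diluted and re-measured rather than extrapolated.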

In vitro transcription/translation

Coupled transcription and translation was carried out using a T7 TNT in vitro transcription/translation kit (Promega) according to the manufacturer's instructions.

Expression and purification of the recombinant proteins in E. coli BL21

For the expression of the different GST-recombinant proteins, E. coli BL21 cells transformed with the corresponding vectors were grown overnight in Luria Broth (LB) medium containing 100 µg/ml ampicillin. 2×YT medium supplemented with 100 µg/ml ampicillin was inoculated (1:100) with the pre-cultures and grown for about two hours at 37 °C until the OD600 reached 0.8. Expression of the fusion protein was induced by adding isopropyl-β-D-thiogalactopyranoside (IPTG) to the growing culture at a final concentration of 0.5 mM. After a four-hour incubation at 30 °C, the bacteria were harvested in lysis buffer (100 mM KCl, 1 mM DTT, 1 mM PMSF, 100 µg/ml lysozyme and 100 µg/ml leupeptin in PBS) and sonicated for five minutes in 30-second pulses separated by 30-second pauses. Lysis was completed by adding BugBuster protein extraction reagent (1:100) (Novagen). The recombinant GST proteins were purified under native conditions on glutathione-agarose resin (Sigma). Aliquots of GST fusion proteins bound to the glutathione-agarose beads were stored at −70 °C for the pull-down assay. The remaining recombinant proteins were eluted with glutathione elution buffer (50 mM Tris-HCl pH 8.0, 10 mM glutathione (Sigma)), and their concentration was measured by Bradford's method [55].
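The additions in this protocol follow the dilution relation C1V1 = C2V2. The sketch below computes the stock volumes for the ampicillin supplement and the IPTG induction, and the pre-culture volume for the 1:100 inoculation. The 500 ml culture volume and the stock concentrations (100 mg/ml ampicillin, 1 M IPTG) are assumptions for illustration only, and 1:100 is read as 1 volume of pre-culture per 100 volumes of final culture.

```python
# Minimal sketch of C1 * V1 = C2 * V2 bookkeeping for the induction steps.
# Culture volume and stock concentrations are illustrative assumptions.


def stock_volume(final_conc, final_volume, stock_conc):
    """Volume of stock (same unit as final_volume) needed to reach final_conc.

    final_conc and stock_conc must share a unit (e.g. both mM or both ug/ml).
    """
    return final_conc * final_volume / stock_conc


culture_ml = 500.0                                   # assumed 2xYT culture volume
inoculum_ml = culture_ml / 100                       # 1:100 inoculation
amp_ml = stock_volume(100.0, culture_ml, 100_000.0)  # 100 ug/ml from assumed 100 mg/ml stock
iptg_ml = stock_volume(0.5, culture_ml, 1000.0)      # 0.5 mM from assumed 1 M (1000 mM) stock

print(f"pre-culture: {inoculum_ml} ml, ampicillin stock: {amp_ml} ml, IPTG stock: {iptg_ml} ml")
```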

Production of anti-enolase antibody

Male BALB/c mice were injected intraperitoneally with 100 µg of GST-Enolase recombinant protein emulsified in complete Freund's adjuvant. Two and four weeks later, the mice were injected with 100 µg of the recombinant protein in incomplete Freund's adjuvant. One week after the four-week injection, about 0.8 ml of serum was obtained by retro-orbital puncture. Serum obtained from mice that were not injected with recombinant protein was used as the control.

Microscopic localization of enolase in trophozoites

Trophozoites in the logarithmic growth phase were harvested, transferred to 8 mm round wells on glass slides, and incubated for 30 min at 37 °C to allow them to attach to the glass surface. An indirect immunofluorescence assay was then performed. The amebae were fixed with cold methanol for 20 min at −20 °C and incubated with enolase antibody (1:400) for one hour at room temperature. After washing, the samples were incubated with Cy3-conjugated goat anti-mouse antibody (Jackson ImmunoResearch; 1:1000) for one hour. Samples were then stained with 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI, Sigma) to visualize the nuclei. Fluorescent images were captured by a CCD camera attached to an Axioskop 2 (Zeiss) epifluorescence microscope with a 100×/1.30 Plan-Neofluar oil immersion objective and a differential interference contrast filter. The images were analyzed with Image-Pro Plus software (Media Cybernetics, USA).

GST pull-down assay

Glutathione-Sepharose beads coated with GST-Enolase or GST alone (20–50 µg) were incubated with in vitro translated [35S]-methionine-labeled proteins (15 µl of the TNT reaction) in a final volume of 500 µl of pull-down buffer (20 mM HEPES pH 7.9, 100 mM NaCl, 1 mM DTT, 6 mM MgCl2, 20% glycerol, 1% Nonidet P-40 and 0.5 mM EDTA) for one hour at room temperature. The beads were then centrifuged at 3000 rpm for five minutes, washed three times with pull-down buffer, and incubated at 100 °C in the presence of 25 µl Laemmli sample buffer for five minutes. Interacting proteins were resolved by 12% SDS-polyacrylamide gel electrophoresis, or on a 15% gel when TNT-Ehmeth (1–88) protein was used. The resulting bands were visualized after Coomassie blue staining, drying and autoradiography.

Immunoprecipitation assays

Aliquots of the nuclear protein fraction (50 µg) were diluted in 300 µl of HNTG buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 0.1% Triton, 10% glycerol) and incubated with protein G beads (Sigma) (10 µl) for 30 minutes at 4 °C. Non-specifically interacting proteins were removed by centrifugation (3000 rpm at 4 °C for 5 minutes). The supernatant was incubated with either anti-HA antibody (1:200) or enolase antibody for two hours at 4 °C. Protein G-Sepharose beads (20 µl) were then added, and the samples were incubated for 16 hours at 4 °C. Immunoprecipitated proteins were collected by centrifugation, washed three times with HNTG buffer and resolved by 12% SDS-polyacrylamide gel electrophoresis. The proteins were then transferred to nitrocellulose membranes and detected by western blotting with the relevant antibody, mouse anti-enolase or rabbit anti-HA.

Trophozoite fractionation

Nuclear and cytoplasmic fractions of E. histolytica trophozoites were prepared as previously described [24]. Proteins were resolved by 12% SDS-polyacrylamide gel electrophoresis and transferred to nitrocellulose membranes. Blots were blocked (3% skim milk powder) and reacted with either enolase antibody (1:500; Santa Cruz Biotechnology) or HA antibody (1:500; Santa Cruz Biotechnology). After incubation with the primary antibody, the blots were incubated with the corresponding secondary antibody (1:5000; Jackson ImmunoResearch) and developed by enhanced chemiluminescence.

Preparation of the EhMRS probe

EhMRS2 was amplified from E. histolytica genomic DNA by PCR using the primers EhMRS2 5 and EhMRS2 3. EhMRS2 DNA (10 pmol) was end-labeled with T4 polynucleotide kinase (New England Biolabs) and [γ-32P]ATP according to the manufacturer's recommendations. Unincorporated [γ-32P]ATP was removed with the ProbeQuant kit (Amersham).
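The probe amount is specified in picomoles; converting it to mass requires the fragment length, which is not stated here. A minimal sketch, assuming a hypothetical 1,000 bp EhMRS2 fragment and the common approximation of 617.96 g/mol per base pair plus 36.04 g/mol for a double-stranded DNA molecule. It also reports the picomoles of 5′ ends available to T4 polynucleotide kinase, since each duplex carries two.

```python
# Minimal sketch: convert picomoles of a double-stranded DNA probe to nanograms.
# The 1,000 bp length is a hypothetical placeholder, not the EhMRS2 length.


def dsdna_mw(bp: int) -> float:
    """Approximate molecular weight (g/mol) of a double-stranded DNA."""
    return bp * 617.96 + 36.04


def pmol_to_ng(pmol: float, bp: int) -> float:
    """pmol multiplied by g/mol gives picograms; divide by 1000 for nanograms."""
    return pmol * dsdna_mw(bp) / 1000.0


probe_pmol = 10.0  # amount end-labeled in the Methods
probe_bp = 1000    # hypothetical fragment length

print(f"{probe_pmol} pmol of a {probe_bp} bp probe is about {pmol_to_ng(probe_pmol, probe_bp):.0f} ng")
print(f"5' ends available for labeling: {2 * probe_pmol} pmol")
```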

Determination of pyruvic acid in the lysate of control and glucose-starved trophozoites

Trophozoites (10⁶) grown in regular or glucose-deficient medium were washed three times with PBS and then


In vitro tRNA methylation assay

Aliquots (40 pmol) of Drosophila tRNAAsp were incubated with 0.4 nmol Ehmeth or 0.04 nmol GST-hDnmt2 for three hours at 37 °C in 40 µl of methylation buffer (100 mM Tris-HCl pH 7.5, 5% glycerol, 5 mM MgCl2, 1 mM DTT and 100 mM NaCl) containing 4.2 µM [methyl-3H]AdoMet (NEN). To examine the effect of enolase on Ehmeth activity, GST-Enolase (2 nmol) or GST as a negative control (2 nmol) was incubated with Ehmeth for one hour at 37 °C. To examine the effect of enolase on hDnmt2 activity, GST-Enolase (0.4 nmol) or GST (0.4 nmol) was incubated with hDnmt2 for one hour at 37 °C. Samples (8 µl) were taken from the reaction mix (40 µl) at different times and loaded on Whatman filters. The filters were washed three times with 10% trichloroacetic acid (TCA) and finally with 100% ethanol. After washing, the filters were air-dried and transferred into tubes, and 3 ml of scintillation liquid (CytoScint) was added. The incorporated radioactivity was measured in a scintillation counter (Tri-Carb 2100TR beta counter). tRNA methyltransferase activity was expressed in units, where one unit (U) is the incorporation of 1 pmol AdoMet per hour per nmol of protein. The in vitro tRNA methylation assay in the presence of 2-phosphoglycerate (2-PG) (Fluka) was performed in the same manner with minor modifications. Increasing 2-PG concentrations (1–7 mM) were incubated with GST-Enolase (0.4 nmol) and with hDnmt2 (0.04 nmol). The activity of hDnmt2 in the presence of 7 mM 2-PG was used as the control.

Examination of the effect of enolase on the binding of Ehmeth and hDnmt2 to EhMRS2 DNA

Glutathione-Sepharose beads coated with GST-Ehmeth, GST-hDnmt2 or GST alone (35 µg) were incubated for 30 minutes at room temperature in 100 µl of blocking buffer (3% BSA and 1 mg/ml salmon sperm DNA in standard binding buffer: 20 mM Tris-HCl pH 8, 50 mM NaCl, 1 mM EDTA in double-distilled water). After blocking, the beads were washed three times with standard binding buffer and incubated with either 40 µg or 60 µg of GST-Enolase for one hour at room temperature (100 µl final reaction volume). The probe (0.3 µg) was then added, and binding was carried out overnight at 4 °C. The beads were then washed three times in standard binding buffer and boiled with 25 µl Laemmli sample buffer for 5 minutes, and the proteins were separated by 10% SDS-polyacrylamide gel electrophoresis. The signal of the proteins bound to the labeled DNA probe was detected directly from the polyacrylamide gel on X-ray film (Fuji).

tRNA preparation

The methylation assay of tRNAAsp with the DNMT2 variants was performed as previously described [6]. Briefly, the DNA template encoding Drosophila tRNAAsp was amplified by PCR with the T7 primer and the tRNAAsp primer. For in vitro transcription, 100 µl of the PCR reaction were incubated with 200 µl of 2× transcription buffer (80 mM Tris-HCl pH 8.1, 2 mM spermidine, 10 mM DTT, 0.02% Triton X-100, 60 mM MgCl2, 4 mg/ml BSA), 5 mM of each NTP (final concentration) and 10 µl of T7 RNA polymerase (200 units/µl; Fermentas) in a final volume of 400 µl for three hours at 37 °C. Transcripts were purified by 12% denaturing PAGE, and bands of the correct size were excised, eluted in 0.5 M ammonium acetate and precipitated with two volumes of 100% ethanol. After centrifugation, RNA pellets were washed once with 80% ethanol and dissolved in double-distilled water. The tRNA concentration was measured with a NanoDrop spectrophotometer.
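A spectrophotometer reading gives RNA mass concentration, whereas the methylation assay uses 40 pmol aliquots of tRNA. A sketch of that conversion, assuming the conventional factor of 40 ng/µl per A260 unit for single-stranded RNA, an average of 320.5 g/mol per nucleotide plus 159 g/mol for a transcript, and a hypothetical transcript length of 75 nt (the actual length of the in vitro transcript is not given here).

```python
# Minimal sketch: from A260 to picomoles of an in vitro transcribed tRNA.
# Assumptions: 40 ng/ul per A260 unit (ssRNA) and a hypothetical 75 nt transcript.


def rna_ng_per_ul(a260: float, dilution: float = 1.0) -> float:
    """RNA concentration in ng/ul from absorbance at 260 nm."""
    return a260 * 40.0 * dilution


def ssrna_mw(nt: int) -> float:
    """Approximate molecular weight (g/mol) of a single-stranded RNA transcript."""
    return nt * 320.5 + 159.0


def pmol_per_ul(ng_per_ul: float, nt: int) -> float:
    """ng/ul times 1000 gives pg/ul; pg divided by g/mol gives pmol."""
    return ng_per_ul * 1000.0 / ssrna_mw(nt)


conc = rna_ng_per_ul(0.5)          # hypothetical reading
per_ul = pmol_per_ul(conc, 75)
print(f"{conc:.1f} ng/ul = {per_ul:.3f} pmol/ul; 40 pmol in {40 / per_ul:.1f} ul")
```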

In vitro methylation assay of total RNA

Total RNA was prepared with TRI Reagent (Sigma) from control or glucose-starved trophozoites and treated with DNase I to remove contaminating DNA. Aliquots of the treated RNA (20 µg) were used as substrates for hDnmt2 in the in vitro tRNA methylation assay described above. The amount of methyl groups incorporated by hDnmt2 into the tRNA of each sample is proportional to the amount of unmethylated tRNA in that sample.
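The activity unit used in these assays (1 U = 1 pmol AdoMet incorporated per hour per nmol of protein) can be computed from filter counts once counts are converted to picomoles. A minimal sketch, assuming a hypothetical tritium counting efficiency and [methyl-3H]AdoMet specific activity (neither is stated in the Methods) and scaling each 8 µl filter sample to the 40 µl reaction. Because incorporation is proportional to the amount of unmethylated tRNA, the ratio of incorporation between two total RNA samples gives a relative measure of their unmethylated tRNA.

```python
# Minimal sketch: filter counts -> pmol AdoMet -> activity units.
# Counting efficiency and specific activity used below are hypothetical.

DPM_PER_PMOL_PER_CI_PER_MMOL = 2220.0  # 1 Ci = 2.22e12 dpm; 1 mmol = 1e9 pmol


def pmol_incorporated(cpm, efficiency, sa_ci_per_mmol, background_cpm=0.0):
    """Convert net counts on one filter to pmol of incorporated methyl groups."""
    dpm = (cpm - background_cpm) / efficiency
    return dpm / (sa_ci_per_mmol * DPM_PER_PMOL_PER_CI_PER_MMOL)


def activity_units(pmol_in_sample, sample_ul, reaction_ul, hours, nmol_protein):
    """Units = pmol AdoMet incorporated per hour per nmol protein (whole reaction)."""
    total_pmol = pmol_in_sample * reaction_ul / sample_ul
    return total_pmol / (hours * nmol_protein)


def relative_unmethylated(sample_pmol, reference_pmol):
    """Ratio > 1: the sample carries more unmethylated substrate than the reference."""
    return sample_pmol / reference_pmol


pmol = pmol_incorporated(cpm=1000.0, efficiency=0.4, sa_ci_per_mmol=1.0, background_cpm=112.0)
units = activity_units(pmol, sample_ul=8.0, reaction_ul=40.0, hours=3.0, nmol_protein=0.04)
print(f"{pmol:.3f} pmol on filter -> {units:.1f} U")
```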

Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry analysis

Protein bands of interest were excised from the SDS-polyacrylamide gel, digested with trypsin using a previously published protocol [57], and analyzed by MALDI-TOF mass spectrometry at the Institute of Biology, Technion, Israel. The peptide mass profiles produced by MALDI-TOF mass spectrometry were processed using PepMiner (this software is described at http://www.haifa.il.ibm.com/projects/verification/bioinformatics/). Peptide masses were compared with the theoretical masses derived from the sequences in the SWISS-PROT/TrEMBL (http://www.expasy.ch/sprot/), NCBI (http://www.ncbi.nlm.nih.gov/) and E. histolytica genome project (http://pathema.jcvi.org/cgi-bin/Entamoeba/GenomePage.cgi?org=eha2) databases.

Accession numbers of genes and proteins mentioned in the text

E. histolytica enolase: XP_649161.1, Ehmlbp: XP_649236, Ehmeth: XP_655267.2, Myosin II: XM_651936.1, hDNMT2: NP_004403.1
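Peptide mass fingerprinting, as used for the MALDI-TOF identifications, compares measured peptide masses with masses predicted from an in silico tryptic digest of candidate sequences. The sketch below is a simplified stand-in, not PepMiner: it cleaves after K or R unless the next residue is P, ignores missed cleavages and modifications, computes monoisotopic [M+H]+ masses, and reports matches within an assumed 100 ppm tolerance.

```python
# Simplified peptide mass fingerprinting (illustrative only; not PepMiner).
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic H2O
PROTON = 1.007276   # added for singly protonated [M+H]+ ions


def tryptic_digest(seq):
    """Cleave after K/R unless followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in peptide) + WATER + PROTON


def match_masses(observed, protein_seq, tol_ppm=100.0):
    """Return (observed mass, peptide) pairs that agree within tol_ppm."""
    theoretical = [(p, mh_plus(p)) for p in tryptic_digest(protein_seq)]
    hits = []
    for mass in observed:
        for peptide, theo in theoretical:
            if abs(mass - theo) / theo * 1e6 <= tol_ppm:
                hits.append((mass, peptide))
    return hits


print(tryptic_digest("AAKPAARGG"))  # ['AAKPAAR', 'GG']
print(match_masses([133.061], "AAKPAARGG"))
```

A real search would also score the number of matched peptides per candidate protein and allow for missed cleavages and common modifications such as methionine oxidation.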

Supporting Information

Figure S1. Mass spectrometry analysis of the retarded band observed following incubation of Ehmeth with EhMRS2 DNA.
Found at: doi:10.1371/journal.ppat.1000775.s001 (0.49 MB TIF)

Author Contributions

Conceived and designed the experiments: AT SA. Performed the experiments: AT RST SA. Analyzed the data: AT MH SA. Contributed reagents/materials/analysis tools: RG MH. Wrote the paper: AT SA.

References

1. Spada F, Rothbauer U, Zolghadr K, Schermelleh L, Leonhardt H (2006) Regulation of DNA methyltransferase 1. Adv Enzyme Regul 46: 224–234.
2. Jeltsch A (2006) Molecular enzymology of mammalian DNA methyltransferases. Curr Top Microbiol Immunol 301: 203–225.
3. Dong A, Yoder JA, Zhang X, Zhou L, Bestor TH, et al. (2001) Structure of human DNMT2, an enigmatic DNA methyltransferase homolog that displays denaturant-resistant binding to DNA. Nucleic Acids Res 29: 439–448.
4. Goll MG, Kirpekar F, Maggert KA, Yoder JA, Hsieh CL, et al. (2006) Methylation of tRNAAsp by the DNA methyltransferase homolog Dnmt2. Science 311: 395–398.
5. Hengesbach M, Meusburger M, Lyko F, Helm M. Use of DNAzymes for site-specific analysis of ribonucleotide modifications. RNA 14(1): 180–187.
6. Jurkowski T, Meusburger M, Phalke S, Helm M, Nellen W, et al. (2008) Human DNMT2 methylates tRNA(Asp) molecules using a DNA methyltransferase-like catalytic mechanism. RNA 14(8): 1663–1670.
7. Okano M, Xie S, Li E (1998) Dnmt2 is not required for de novo and maintenance methylation of viral DNA in embryonic stem cells. Nucleic Acids Res 26: 2536–2540.
8. Kunert N, Marhold J, Stanke J, Stach D, Lyko F (2003) A Dnmt2-like protein mediates DNA methylation in Drosophila. Development 130: 5083–5090.
9. Phalke S, Nickel O, Walluscheck D, Hortig F, Onorati MC, et al. (2009) Retrotransposon silencing and telomere integrity in somatic cells of Drosophila depends on the cytosine-5 methyltransferase DNMT2. Nat Genet 41: 696–702.
10. Katoh M, Curk T, Xu Q, Zupan B, Kuspa A, et al. (2006) Developmentally regulated DNA methylation in Dictyostelium discoideum. Eukaryot Cell 5: 18–25.
11. Kuhlmann M, Borisova BE, Kaller M, Larsson P, Stach D, et al. (2005) Silencing of retrotransposons in Dictyostelium by DNA methylation and RNAi. Nucleic Acids Res 33: 6405–6417.
12. Fisher O, Siman-Tov R, Ankri S (2004) Characterization of cytosine methylated regions and 5-cytosine DNA methyltransferase (Ehmeth) in the protozoan parasite Entamoeba histolytica. Nucleic Acids Res 32: 287–297.
13. Jeltsch A, Nellen W, Lyko F (2006) Two substrates are better than one: dual specificities for Dnmt2 methyltransferases. Trends Biochem Sci 31: 306–308.
14. Mund C, Musch T, Strodicke M, Assmann B, Li E, et al. (2004) Comparative analysis of DNA methylation patterns in transgenic Drosophila overexpressing mouse DNA methyltransferases. Biochem J 378: 763–768.
15. Pinarbasi E, Elliott J, Hornby DP (1996) Activation of a yeast pseudo DNA methyltransferase by deletion of a single amino acid. J Mol Biol 257: 804–813.
16. Rountree MR, Bachman KE, Baylin SB (2000) DNMT1 binds HDAC2 and a new co-repressor, DMAP1, to form a complex at replication foci. Nat Genet 25: 269–277.
17. Fuks F, Burgers WA, Godin N, Kasai M, Kouzarides T (2001) Dnmt3a binds deacetylases and is recruited by a sequence-specific repressor to silence transcription. EMBO J 20: 2536–2544.
18. Turek-Plewa J, Jagodzinski PP. The role of mammalian DNA methyltransferases in the regulation of gene expression. Cell Mol Biol Lett 10(4): 631–647.
19. Banerjee S, Fisher O, Lohia A, Ankri S (2005) Entamoeba histolytica DNA methyltransferase (Ehmeth) is a nuclear matrix protein that binds EhMRS2, a DNA that includes a scaffold/matrix attachment region (S/MAR). Mol Biochem Parasitol 139: 91–97.
20. Harony H, Bernes S, Siman-Tov R, Ankri S (2006) DNA methylation and targeting of LINE retrotransposons in Entamoeba histolytica and Entamoeba invadens. Mol Biochem Parasitol 147: 55–63.
21. Holt A, Wold F (1961) The isolation and characterization of rabbit muscle enolase. J Biol Chem 236: 3227–3231.
22. Saavedra E, Encalada R, Pineda E, Jasso-Chavez R, Moreno-Sanchez R (2005) Glycolysis in Entamoeba histolytica. Biochemical characterization of recombinant glycolytic enzymes and flux control analysis. FEBS J 272: 1767–1783.
23. Riahi Y, Siman-Tov R, Ankri S (2004) Molecular cloning, expression and characterization of a serine proteinase inhibitor gene from Entamoeba histolytica. Mol Biochem Parasitol 133: 153–162.
24. Lavi T, Isakov E, Harony H, Fisher O, Siman-Tov R, et al. (2006) Sensing DNA methylation in the protozoan parasite Entamoeba histolytica. Mol Microbiol 62: 1373–1386.
25. Arhets P, Gounon P, Sansonetti P, Guillén N (1995) Myosin II is involved in capping and uroid formation in the human pathogen Entamoeba histolytica. Infection and Immunity 63: 4358–4367.
26. Dastidar PG, Majumder S, Lohia A (2007) Eh Klp5 is a divergent member of the kinesin 5 family that regulates genome content and microtubular assembly in Entamoeba histolytica. Cell Microbiol 9: 316–328.
27. Hermann A, Schmitt S, Jeltsch A (2003) The human Dnmt2 has residual DNA-(cytosine-C5) methyltransferase activity. J Biol Chem 278: 31717–31721.
28. Boël G, Pichereau V, Mijakovic I, Mazé A, Poncet S, et al. (2004) Is 2-phosphoglycerate-dependent automodification of bacterial enolases implicated in their export? Journal of Molecular Biology 2: 485–496.
29. Gerlt JA, Babbitt PC, Rayment I (2005) Divergent evolution in the enolase superfamily: the interplay of mechanism and specificity. Arch Biochem Biophys 433: 59–70.
30. Lavi T, Siman-Tov R, Ankri S (2008) EhMLBP is an essential constituent of the Entamoeba histolytica epigenetic machinery and a potential drug target. Molecular Microbiol 69: 55–66.
31. Kim J, Dang C (2005) Multifaceted roles of glycolytic enzymes. Trends Biochem Sci 30: 142–150.
32. Commichau FM, Rothe FM, Herzberg C, Wagner E, Hellwig D, et al. (2009) Novel activities of glycolytic enzymes in Bacillus subtilis: interactions with essential proteins involved in mRNA processing. Mol Cell Proteomics 8: 1350–1360.
33. Carrero JC, Petrossian P, Acosta E, Sanchez-Zerpa M, Ortiz-Ortiz L, et al. (2000) Cloning and characterization of Entamoeba histolytica antigens recognized by human secretory IgA antibodies. Parasitol Res 86: 330–334.
34. Pancholi V, Fischetti VA (1998) alpha-enolase, a novel strong plasmin(ogen) binding protein on the surface of pathogenic streptococci. J Biol Chem 273: 14503–14515.
35. Pancholi V (2001) Multifunctional alpha-enolase: its role in diseases. Cell Mol Life Sci 58: 902–920.
36. Gassner C, Schneider-Scherzer E, Lottspeich F, Schweiger M, Schweiger M, et al. (1998) Escherichia coli bacteriophage T1 DNA methyltransferase appears to interact with Escherichia coli enolase. Biol Chem 379: 621–623.
37. Entelis N, Brandina I, Kamenski P, Krasheninnikov I, Martin R, et al. (2006) A glycolytic enzyme, enolase, is recruited as a cofactor of tRNA targeting toward mitochondria in Saccharomyces cerevisiae. Genes and Development 5: 1609–1620.
38. Zhang X, Verdine GL (1996) Mammalian DNA cytosine-5 methyltransferase interacts with p23 protein. FEBS Lett 392: 179–183.
39. Esteve P, Chin H, Pradhan S (2005) Human maintenance DNA (cytosine-5)-methyltransferase and p53 modulate expression of p53-repressed promoters. Proc Natl Acad Sci U S A 102: 1000–1005.
40. Kang HT, Hwang ES (2006) 2-Deoxyglucose: an anticancer and antiviral therapeutic, but not any more a low glucose mimetic.
41. Thepsuparungsikul V, Seng L, Bailey GB (1971) Differentiation of Entamoeba: encystation of E. invadens in monoxenic and axenic cultures. J Parasitol 57: 1288–1292.
42. Grummt I, Ladurner AG (2008) A metabolic throttle regulates the epigenetic state of rDNA. Cell 133: 577–580.
43. Ooi SK, Bestor TH (2008) The colorful history of active DNA demethylation. Cell 133: 1145–1148.
44. Gehring M, Reik W, Henikoff S (2009) DNA demethylation by DNA repair. Trends Genet 25: 82–90.
45. Morgan HD, Santos F, Green K, Dean W, Reik W (2005) Epigenetic reprogramming in mammals. Hum Mol Genet 14 Spec No 1: R47–58.
46. Schlegel R, Iversen P, Rechsteiner M (1978) The turnover of tRNAs microinjected into animal cells. Nucleic Acids Res: 3715–3729.
47. Johansson MJ, Byström AS (2002) Dual function of the tRNA(m5U54)methyltransferase in tRNA maturation. RNA 8: 324–335.
48. Studte P, Zink S, Jablonowski D, Bar C, von der Haar T, et al. (2008) tRNA and protein methylase complexes mediate zymocin toxicity in yeast. Mol Microbiol 69: 1266–1277.
49. Begley U, Dyavaiah M, Patil A, Rooney JP, DiRenzo D, et al. (2007) Trm9-catalyzed tRNA modifications link translation to the DNA damage response. Mol Cell 28: 860–870.
50. Jablonowski D, Zink S, Mehlgarten C, Daum G, Schaffrath R (2006) tRNAGlu wobble uridine methylation by Trm9 identifies Elongator's key role for zymocin-induced cell death in yeast. Mol Microbiol 59: 677–688.
51. Lee C, Kramer G, Graham DE, Appling DR (2007) Yeast mitochondrial initiator tRNA is methylated at guanosine 37 by the Trm5-encoded tRNA (guanine-N1-)-methyltransferase. J Biol Chem 282: 27744–27753.
52. Fisher O, Siman-Tov R, Ankri S (2006) Pleiotropic phenotype in Entamoeba histolytica overexpressing DNA methyltransferase (Ehmeth). Mol Biochem Parasitol 147: 48–54.
53. Hamann L, Nickel R, Tannich E (1995) Transfection and continuous expression of heterologous genes in the protozoan parasite Entamoeba histolytica. Proc Natl Acad Sci U S A 92: 8975–8979.
54. Gietz R, Schiestl R (2007) High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat Protoc 2: 31–34.
55. Bradford MM (1976) A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 72: 248–254.
56. Anthon GE, Barrett DM (2003) Modified method for the determination of pyruvic acid with dinitrophenylhydrazine in the assessment of onion pungency. Journal of the Science of Food and Agriculture 83: 1210–1213.
57. Shevchenko A, Wilm M, Vorm O, Mann M (1996) Mass spectrometric sequencing of proteins silver-stained polyacrylamide gels. Anal Chem 68: 850–858.

Epigenetic Variation in Mangrove Plants Occurring in Contrasting Natural Environment

Catarina Fonseca Lira-Medeiros1, Christian Parisod2¤, Ricardo Avancini Fernandes1, Camila Souza Mata1, Monica Aires Cardoso1, Paulo Cavalcanti Gomes Ferreira1,3*

1 Diretoria de Pesquisa Científica, Instituto de Pesquisas Jardim Botânico do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brasil, 2 Laboratoire de Biologie Cellulaire, Institut J.-P. Bourgin - INRA Centre de Versailles, Versailles, France, 3 Instituto de Bioquímica Médica, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brasil

Abstract

Background: Epigenetic modifications, such as cytosine methylation, can be inherited in plant species and may occur in response to biotic or abiotic stress, affecting gene expression without changing the genome sequence. Laguncularia racemosa, a mangrove species, occurs in naturally contrasting habitats where it is subjected daily to variations in salinity and nutrients that lead to morphological differences. This work aims to unravel how CpG-methylation variation is distributed among individuals from two nearby habitats with different environmental pressures, a riverside (RS) and an area near a salt marsh (SM), and how this variation correlates with the observed morphological variation.

Principal Findings: Significant differences were observed between plants from the RS and SM locations in morphological traits such as tree height, tree diameter, leaf width and leaf area, with SM plants being smaller and having smaller leaves. Methyl-Sensitive Amplified Polymorphism (MSAP) was used to assess genetic and epigenetic (CpG-methylation) variation in L. racemosa genomes from these populations. SM plants were hypomethylated (14.6% of loci had methylated samples) in comparison to RS plants (32.1% of loci had methylated samples). Within-population diversity was significantly greater for epigenetic than for genetic data in both locations, but SM had less epigenetic diversity than RS. Frequency-based (GST) and multivariate (bST) methods for estimating population structure showed significantly greater differentiation between locations for epigenetic than for genetic data. Co-Inertia analysis, exploring the genetic and epigenetic data jointly, showed that individuals with similar genetic profiles presented divergent epigenetic profiles that were characteristic of the population in a particular environment, suggesting that CpG-methylation changes may be associated with environmental heterogeneity.

Conclusions: In spite of significant morphological dissimilarities, individuals of L. racemosa from the salt marsh and the riverside presented little genetic but abundant DNA methylation differentiation, suggesting that epigenetic variation in natural plant populations has an important role in helping individuals to cope with different environments.

Citation: Lira-Medeiros CF, Parisod C, Fernandes RA, Mata CS, Cardoso MA, et al. (2010) Epigenetic Variation in Mangrove Plants Occurring in Contrasting Natural Environment. PLoS ONE 5(4): e10326. doi:10.1371/journal.pone.0010326

Editor: Peter Meyer, University of Leeds, United Kingdom

Received January 13, 2010; Accepted March 26, 2010; Published April 26, 2010

Copyright: © 2010 Lira-Medeiros et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: We thank CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) for financial support for the molecular genetic studies and also for scholarships for CFL-M and PCGF. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: paulof@bioqmed.ufrj.br

¤ Current address: Laboratory of Evolutionary Botany, University of Neuchâtel, Neuchâtel, Switzerland

Introduction

Epigenetic changes can modify phenotypes without changing the nucleotide sequence of the promoter or coding regions of a gene [1]. Covalent modifications of DNA or histones are responsible for transmitting epigenetic information from cell to daughter cell and, in plants, from generation to generation [2,3]. These phenomena are important for maintaining genome stability against the proliferation of transposable elements, but also for regulating gene expression [4–6]. CpG and non-CpG sequences are heavily methylated in the pericentromeric and repetitive regions of plant genomes, but DNA methylation is also present in gene-rich regions [7]. Accordingly, evidence that such DNA methylation is crucial for several plant developmental processes has accumulated in a short period [8–12]. The mechanisms controlling DNA methylation and their effects on phenotypes have been investigated in model plants [13], but their impact on the evolution of natural populations is underexplored [14,15]. Unlike genetic variation, epigenetic changes generating novel and heritable phenotypes may represent reversible genomic alterations that allow the colonization of variable environments and new landscapes on an evolutionary timescale [7,16,17].

Mangroves are ecosystems occurring along tropical and subtropical coastlines and are subjected to daily variations in water salinity due to sea level oscillations [18,19]. Mangrove plant species thus have to tolerate a wide range of environmental conditions and often present divergent structural and morphological characteristics in different ecogeographic zones [20,21]. In regions with abundant input of fresh water, nutrients and sediments, trees generally show good development and reach heights of over 40 meters [21]. In contrast, in habitats with limiting factors, such as periodic drought and hypersaline soils (i.e. salt marshes), plants have abnormal development and reach heights of only 1.5 to 3 meters, with a shrub-like morphology [21]. Laguncularia racemosa (L.) Gaertn. f. (Family: Combretaceae), also known as white mangrove, is widely distributed among the mangroves of the Americas and Africa [20]. In Sepetiba Bay (Rio de Janeiro, Brazil; Figure 1A), individuals of L. racemosa are found either in a river basin or near a salt marsh, suggesting that plants at each site might be under divergent environmental pressures. As postulated by Schaeffer-Novelli et al. [21], these individuals are morphologically distinct, having a tree-like structure at the riverside location (RS; Figure 1B) and a shrub-like morphology near the salt marsh (SM; Figure 1C). In a preliminary survey, samples from both the RS and SM areas were characterized with Amplified Fragment Length Polymorphism (AFLP) markers and presented no significant genetic differentiation (C.F. Lira-Medeiros, unpublished data). Given this remarkable genetic similarity, the morphological differences between individuals from the RS and SM areas are surprising and could be associated with epigenetic variation.

Few molecular tools are available to investigate epigenetic variation in non-model species, but the Methyl-Sensitive Amplified Polymorphism (MSAP) technique provides data on cytosine methylation without a priori knowledge of genome sequences [22]. Using two isoschizomeric restriction enzymes (MspI and HpaII), which recognize the same restriction site (5′-CCGG-3′) but have different cytosine methylation sensitivities, MSAP allows the identification of CpG-methylation polymorphism [6,23–25]. While several studies have evaluated methylation changes among related lineages of crop and/or polyploid species [9,26–31], the potential of MSAP to analyze the genetic and epigenetic structure of natural populations is still underexplored. Using MSAP, this study aims to describe how CpG methylation is distributed among individuals from natural populations occurring in two contrasting habitats. It also explores correlations between the environmental and morphological variation found in L. racemosa. This is one of the first reports to assess epigenetic variation in natural plant populations, and it indicates that DNA methylation may be evolutionarily unlinked from genetic alterations in shaping phenotypic variation.

PLoS ONE | www.plosone.org
April 2010 | Volume 5 | Issue 4 | e10326

Natural Epigenetic Variation

Results

Morphological variation

Morphological traits such as tree height, tree diameter and leaf size were measured on 50 Laguncularia racemosa individuals from the RS and SM populations in Sepetiba Bay's mangroves (Figure 1), half from each site. RS trees (mean height 7.5 m) were significantly taller than SM plants (mean height 1.9 m) according to a two-sample t-test (t = 16.24, P<0.0001; Table 1). Tree diameters at breast height (DBH) were also significantly different, with mean values of 35.3 cm and 4.7 cm for RS and SM plants, respectively (t = 14.26, P<0.0001; Table 1). Variation in leaf morphology was assessed by measuring leaf length, leaf width and leaf area in 25 leaves of each tree analyzed above (N = 1250; Table 1). Leaf length did not differ significantly between RS and SM samples, with means of 7.3 cm and 7.2 cm, respectively (t = 1.83, P = 0.068). In contrast, leaf width was significantly different between the two locations, with means of 4.5 cm in RS and 4.0 cm in SM (t = 7.23, P<0.0001). Leaf area was also significantly different between RS and SM plants, with means of 25.3 cm2 and 23 cm2, respectively (t = 4.92, P<0.0001).

Figure 1. Map of the mangrove forest and pictures of Laguncularia racemosa illustrating the morphological differences between natural populations. (A) Map of Rio de Janeiro State, with the city of Rio de Janeiro in red, and an aerial view of the conserved mangrove forest of Sepetiba Bay. The salt marsh formation is visible in gray (no vegetation) above the study area delimited by a green line. (B) Individual of L. racemosa from the riverside (RS) location, almost 10 meters tall. (C) Typical L. racemosa individual from the area near a salt marsh (SM) with abnormal development, reaching only 1.5 meters in height. doi:10.1371/journal.pone.0010326.g001
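The t values above come from Welch's two-sample t-test, which R's t.test applies by default and which does not assume equal variances in the two groups. A minimal sketch of the statistic and its Welch–Satterthwaite degrees of freedom, assuming plain Python lists of measurements; the function name and the toy heights are illustrative only, and the P value (which needs the t distribution) is left to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return Welch's two-sample t statistic and Welch-Satterthwaite df.

    Welch's test does not assume equal variances in the two groups,
    which suits samples as different as RS trees and SM shrubs.
    """
    nx, ny = len(x), len(y)
    se2_x = variance(x) / nx  # squared standard error of mean(x)
    se2_y = variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (nx - 1) + se2_y ** 2 / (ny - 1))
    return t, df


# Hypothetical tree heights in meters, NOT the study's measurements.
rs_heights = [7.0, 8.1, 6.9, 7.8, 7.4]
sm_heights = [1.8, 2.1, 1.7, 2.0, 1.9]
t, df = welch_t(rs_heights, sm_heights)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With 25 plants per site, df stays close to but below 48 when the two variances differ strongly, which is the point of using Welch's correction here.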

Genetic structure

The genetic structure of L. racemosa was analyzed using EcoRI/MspI data from 183 MSAP loci. Hemimethylated loci (i.e. loci with fragments present in EcoRI/HpaII digestions but absent in EcoRI/MspI digestions) were excluded from the genetic and epigenetic analyses. This type of fragment is not inherited over generations and represented only 12.4% of all loci obtained in the present study (Table 2). Within-population Shannon diversity indices for genetic data were 0.013 and 0.008 for RS and SM, respectively, and were not significantly different (Wilcoxon rank sum test; W = 16031.5, P = 0.06; Table 3). Principal Component Analysis (PCA) based on the covariance matrix summarized 42.4% of the total inertia in the first two principal components of the EcoRI/MspI data (Figure 2A) and showed very small genetic differences between the RS and SM populations, with individuals from different locations being genetically very similar (i.e. the central samples in Figure 2A). Since deviation from Hardy-Weinberg equilibrium cannot be calculated directly from dominant markers, two complementary statistical methods that do not require this estimate were used, and they gave similar values for the EcoRI/MspI data. Based on the partition of Shannon diversity within and among populations, we calculated a GST of 0.152 (Table 3). The multivariate Between-group Eigen Analysis (BPCA, i.e. PCA among groups based on PCA among individuals) resulted in a significant bST of 0.131 (P<0.001; Table 3).
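The Shannon indices and GST above come from band frequencies in 0/1 presence/absence tables. A minimal sketch, assuming one common formulation for dominant markers: the per-band index is the binary entropy (in bits) of band presence versus absence, averaged over bands; Hpop is the mean of the two within-population indices; and Htot is the index of the pooled samples. These choices, the helper names and the toy tables are assumptions for illustration, not a reproduction of the authors' scripts.

```python
import math


def band_shannon(freq):
    """Shannon information (bits) of one dominant band present at frequency `freq`.

    Computed over the two phenotypes (band present / absent); the undefined
    term 0 * log2(0) is taken as 0, so fixed bands contribute nothing.
    """
    h = 0.0
    for p in (freq, 1.0 - freq):
        if p > 0:
            h -= p * math.log2(p)
    return h


def mean_shannon(profiles):
    """Average per-band Shannon index of a 0/1 table (rows = samples, columns = bands)."""
    n_samples = len(profiles)
    n_bands = len(profiles[0])
    total = 0.0
    for j in range(n_bands):
        freq = sum(row[j] for row in profiles) / n_samples
        total += band_shannon(freq)
    return total / n_bands


def shannon_gst(pop_a, pop_b):
    """GST = (Htot - Hpop) / Htot for two populations.

    Assumptions: Hpop is the mean of the two within-population indices and
    Htot is the index of all samples pooled; Htot must be non-zero.
    """
    h_pop = (mean_shannon(pop_a) + mean_shannon(pop_b)) / 2
    h_tot = mean_shannon(pop_a + pop_b)
    return (h_tot - h_pop) / h_tot


# Toy 0/1 profiles (4 samples x 3 bands per population), NOT the study's data.
rs = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 0, 1]]
sm = [[0, 0, 1], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
print(round(mean_shannon(rs), 3), round(mean_shannon(sm), 3), round(shannon_gst(rs, sm), 3))
```

GST is 0 when pooling adds no diversity beyond what each population already holds and approaches 1 when each population is fixed for different bands.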

Methylation patterns

Six primer-pair combinations provided 209 reliable MSAP loci for the study of 34 L. racemosa individuals, half from each site (RS and SM). The calculated error rate was approximately 3%, based on negative controls and replicated samples. The number of loci per primer combination varied from 10 to 70 (Table 2). Comparing the EcoRI/MspI and EcoRI/HpaII profiles of each sample allowed us to identify non-methylated, methylated and hemimethylated loci based on the methylation sensitivities of the two isoschizomers [32]. The presence of MSAP fragments in both the EcoRI/MspI and EcoRI/HpaII profiles indicated non-methylated loci. CpG-methylated loci were characterized by the presence of an EcoRI/MspI fragment and the absence of the EcoRI/HpaII fragment at the same locus [32,33]. The opposite pattern, with fragments present in EcoRI/HpaII digestions but absent in EcoRI/MspI digestions, was counted as hemimethylated loci, representing methylation of external cytosines (i.e. non-CpG methylation [32]). Considering all MSAP markers, we observed 67 loci with CpG methylation, 116 non-methylated loci and 26 hemimethylated loci (Table 2). All 67 methylated loci were methylated in some samples of the RS population (32.1% of all loci), but only 30 of these loci were methylated in some samples of the SM population (14.6% of all loci). The remaining 37 loci were not methylated in any SM sample, indicating hypomethylation in the SM population. The methylation status of methylated loci also varied among samples from the same population: only 13 of the 67 methylated loci were methylated in all RS samples, whereas 18 of the 30 loci methylated in SM were methylated in all SM samples, indicating less CpG-methylation variation in SM plants.
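The scoring rules above map each pair of band scores to a methylation status, and the per-population summaries count loci methylated in at least one sample or in every sample. A minimal sketch of that bookkeeping, assuming each population's scores are stored as two 0/1 tables (rows = samples, columns = loci); the helper names are hypothetical, and treating the (0, 0) pattern as uninformative is an assumption the text does not address.

```python
# Band scores are (EcoRI/MspI, EcoRI/HpaII), 1 = fragment present.
# Rules follow the Results text; treating (0, 0) as uninformative is an
# assumption, since the text does not assign that pattern a status.
STATUS = {
    (1, 1): "non-methylated",
    (1, 0): "CpG-methylated",
    (0, 1): "hemimethylated",
    (0, 0): "uninformative",
}


def classify(msp, hpa):
    """Methylation status of one locus in one sample."""
    return STATUS[(msp, hpa)]


def count_methylated(msp_table, hpa_table):
    """Count loci CpG-methylated in at least one sample and in every sample.

    Both tables are lists of rows (samples) with one 0/1 column per locus,
    for a single population.
    """
    in_some = in_all = 0
    for j in range(len(msp_table[0])):
        calls = [classify(m[j], h[j]) == "CpG-methylated"
                 for m, h in zip(msp_table, hpa_table)]
        in_some += any(calls)
        in_all += all(calls)
    return in_some, in_all


# Locus totals from Table 2, as percentages of the 209 scored loci.
for label, k in [("methylated", 67), ("non-methylated", 116), ("hemimethylated", 26)]:
    print(label, f"{k / 209:.1%}")  # 32.1%, 55.5%, 12.4%
```

Applied to the RS tables, `count_methylated` would correspond to the "67 in some samples, 13 in all samples" summary; applied to the SM tables, to "30 and 18".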

Epigenetic structure

The same 183 loci were investigated for the epigenetic structure of L. racemosa using the EcoRI/HpaII dataset. The within-population epigenetic Shannon indices for RS and SM were 0.084 and 0.024, respectively, and were significantly different (Wilcoxon rank sum test; W = 11459.5, P<0.001; Table 3). Comparisons between the genetic and epigenetic Shannon indices of the same population showed significantly higher epigenetic values in both locations (Kruskal-Wallis chi-squared = 112.8326, P<0.0001 for RS; Kruskal-Wallis chi-squared = 143.1622, P<0.0001 for SM). The PCA based on the covariance matrix of the EcoRI/HpaII data summarized 38% of the total inertia in the first two principal components and showed pronounced differentiation between RS and SM samples. Individuals from different locations were clearly separated, and SM plants were more similar to each other in their epigenetic profiles than RS individuals were (Figure 2B). The epigenetic structure was also evaluated using the two statistical methods, which gave similar differentiation indices: GST = 0.183 and bST = 0.222 (P<0.001; Table 3). Both the frequency-based and the multivariate method showed greater epigenetic than genetic differentiation between the L. racemosa natural populations, as can also be seen in the PCAs (Figure 2).
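The bST values come from a between-group analysis of the PCA, whose ratio of between-group to total inertia is analogous to an F-statistic. When a PCA on the covariance matrix keeps every axis it is only a rotation, so the same ratio can be computed directly from the 0/1 table. A minimal sketch, assuming uniform row weights; the helper names and toy data are hypothetical, ADE-4 is not called, and the label-shuffling test is a generic randomization test standing in for the Romesburg procedure described in the Methods.

```python
import random


def between_ratio(profiles, groups):
    """Between-group inertia / total inertia of a 0/1 table (rows = individuals).

    For a PCA on the covariance matrix that keeps every axis, total inertia is
    the sum of per-band variances (weights 1/n), and between-group inertia is
    the weighted squared distance from each group centroid to the grand centroid.
    """
    n, n_bands = len(profiles), len(profiles[0])
    grand = [sum(row[j] for row in profiles) / n for j in range(n_bands)]
    total = sum((row[j] - grand[j]) ** 2 for row in profiles for j in range(n_bands)) / n
    between = 0.0
    for g in sorted(set(groups)):
        members = [row for row, label in zip(profiles, groups) if label == g]
        centroid = [sum(r[j] for r in members) / len(members) for j in range(n_bands)]
        between += len(members) / n * sum((centroid[j] - grand[j]) ** 2 for j in range(n_bands))
    return between / total


def permutation_test(profiles, groups, n_perm=9999, seed=1):
    """Shuffle group labels n_perm times; P = (hits + 1) / (n_perm + 1)."""
    rng = random.Random(seed)
    observed = between_ratio(profiles, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # small tolerance so float rounding does not hide ties with the observed value
        if between_ratio(profiles, labels) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy profiles: 3 RS and 3 SM individuals x 4 bands (NOT the study's data).
toy = [[1, 0, 1, 1], [1, 0, 1, 0], [1, 1, 1, 1], [0, 1, 0, 1], [0, 1, 0, 0], [0, 1, 1, 1]]
pops = ["RS", "RS", "RS", "SM", "SM", "SM"]
ratio, p = permutation_test(toy, pops, n_perm=999)
print(f"ratio = {ratio:.3f}, P = {p:.3f}")
```

The ratio is undefined (division by zero) if every individual has the same profile; real tables with polymorphic bands avoid this.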

Genetic versus epigenetic structure

To evaluate the contribution of both genetic and epigenetic profiles to the L. racemosa population structure, Co-Inertia analysis was performed. This multivariate method maximizes the shared structure among multiple datasets drawn from the same samples. The first two axes of the Co-Inertia analysis explained 76.5% of the co-variation between the EcoRI/MspI and EcoRI/HpaII datasets, and this association was significantly different from that expected under random association (P<0.0001). In the Co-Inertia subspace (Figure 3), three individuals from the RS population (labeled 1) and three from the SM population (labeled 2) showed similar genetic profiles (circles) but rather divergent epigenetic profiles (arrowheads). The epigenetic profiles of these samples point in opposite directions: the RS samples clearly have epigenetic profiles more similar to other RS plants than to plants from the SM population, and the SM samples are likewise more similar to their co-habitants than to RS plants. Co-Inertia analysis also showed less variation in both the genetic and epigenetic components in SM than in RS samples, as shown by the tighter aggregation of SM circles (genetic profiles) and arrowheads (epigenetic profiles) in the Co-Inertia graph (Figure 3).

Table 1. Morphological data obtained from Laguncularia racemosa plants in Sepetiba Bay's mangrove forest.

Location | H (m) | DBH (cm) | L (cm) | W (cm) | A (cm2)
RS | 7.5 | 35.3 | 7.3 | 4.5 | 25.3
SM | 1.9*** | 4.7*** | 7.2 NS | 4.0*** | 23***

*** P<0.0001; NS, non-significant. Mean tree height (H), diameter at breast height (DBH), leaf length (L), leaf width (W) and leaf area (A) were calculated for L. racemosa plants in the RS and SM areas. The Welch Two Sample t-test was performed to obtain P values. doi:10.1371/journal.pone.0010326.t001

Table 2. Number of loci and methylation pattern analyzed per EcoRI and MspI/HpaII primer combination.

EcoRI selective primer | MspI/HpaII selective primer | Methylated loci | Non-methylated loci | Hemimethylated loci | Loci per primer combination
Total | | 67 (32.1%) | 116 (55.5%) | 26 (12.4%) | 209

Number of methylated, non-methylated and hemimethylated loci found with each of the six EcoRI and MspI/HpaII primer combinations used for the L. racemosa population genetic analyses. A total of 209 loci were found, of which 183 were used in the genetic and epigenetic analyses after excluding hemimethylated loci (details in Methods). doi:10.1371/journal.pone.0010326.t002
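Co-inertia analysis finds paired axes that maximize the covariance between two tables describing the same individuals; each squared singular value of their cross-covariance matrix is the co-inertia carried by one axis, so the share on the first two axes is plausibly what the 76.5% above summarizes. A minimal pure-Python sketch, assuming column-centered, unscaled tables with uniform row weights (matching a PCA on the covariance matrix) and using power iteration with deflation instead of a full SVD. The helper names are hypothetical; ADE-4's implementation, the axis scores plotted in Figure 3 and the row-permutation test are not reproduced.

```python
def center(table):
    """Column-center a table given as a list of rows."""
    n = len(table)
    means = [sum(col) / n for col in zip(*table)]
    return [[value - m for value, m in zip(row, means)] for row in table]


def cross_cov(x, y):
    """Z = Y^T D X with uniform row weights D = I/n (shape: q x p)."""
    n = len(x)
    return [[sum(y[i][a] * x[i][b] for i in range(n)) / n for b in range(len(x[0]))]
            for a in range(len(y[0]))]


def top_eigen(m, iters=500):
    """Leading eigenvalue/eigenvector of a symmetric PSD matrix by power iteration."""
    size = len(m)
    v = [1.0 + 0.01 * i for i in range(size)]  # arbitrary non-degenerate start
    lam = 0.0
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(size)) for row in m]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:
            return 0.0, v
        v = [c / norm for c in w]
        lam = norm  # ||M v|| for unit v converges to the top eigenvalue
    return lam, v


def coinertia_shares(x_table, y_table, n_axes=2):
    """Fraction of total co-inertia carried by each of the first co-inertia axes."""
    z = cross_cov(center(x_table), center(y_table))
    q, p = len(z), len(z[0])
    # M = Z Z^T; its eigenvalues are the squared singular values of Z.
    m = [[sum(z[a][k] * z[b][k] for k in range(p)) for b in range(q)] for a in range(q)]
    total = sum(m[i][i] for i in range(q))  # trace(M) = total co-inertia
    shares = []
    for _ in range(n_axes):
        lam, v = top_eigen(m)
        shares.append(lam / total)
        # Deflate: remove the axis just found before searching for the next one.
        m = [[m[a][b] - lam * v[a] * v[b] for b in range(q)] for a in range(q)]
    return shares


# Toy 0/1 band tables for the same 5 individuals (NOT the study's data).
genetic = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 0, 0], [0, 1, 1]]
epigenetic = [[1, 0], [1, 1], [0, 0], [1, 0], [0, 1]]
print([round(s, 3) for s in coinertia_shares(genetic, epigenetic)])
```

Power iteration keeps the sketch dependency-free; with large tables (183 bands) a linear-algebra library's SVD would be the practical choice.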

near salt marshes have been classified as dwarf forests [34], because they have abnormal growth (Figure 1). The leaf measures also showed significant reduction of leaf width and leaf area in SM plants that can be associated with environmental restrictions of their habitat. Another mangrove species Rhizophora mangle was also reported to have smaller leaves as a result of salinity and nutrient deficiency in salt marsh areas [35]. Preliminary analysis using AFLP markers revealed very little genetic differentiation (GST = 0.032) between L. racemosa individuals from SM and RS sites (C.F. Lira-Medeiros, unpublished data), suggesting that the extensive morphological divergence between the two sites was most affected by epigenetic events. Using MSAP methodology, we assessed both genetic (EcoRI/ MspI) and epigenetic (EcoRI/HpaII) components of the genome of L. racemosa plants from RS and SM areas. The use of these isoschizomers concentrates our investigation on CpG-dinucleotide regions, which are generally gene-rich areas of the genome [36]. Moreover, after the exclusion of hemimethylated loci, we could bypass MspI sensitivity to non-symmetrical methylated sites and extrapolate the EcoRI/MspI data analysis as genetic profile. Thus MSAP markers were highly informative, revealing genetic and epigenetic components of L. racemosa genome in natural populations. Divergences in CpG-methylation levels were detected by comparing MspI and HpaII profiles at the same loci. In plants from the RS location, 32.1% of all MSAP fragments were methylated in some samples, while only 14.6% of loci from SM plants were methylated in some samples. Previous works have shown that 30â&#x20AC;&#x201C;41% of loci were methylated in Brassica oleraceae accessions [32], 32% of loci were methylated in Gossypium hirsutum accessions [29], and 35â&#x20AC;&#x201C;43% of loci were methylated in Arabidopsis thaliana ecotypes [23]. This indicates a hypomethylation in the genome of SM plants. 
The hypomethylation has already been shown to affect plant development in other studies [37,38]. Genetic profiles observed in this work showed some similarities between RS and SM samples, but epigenetic profiles were more differentiated as shown by PCAs (Figure 2) and differentiation indices (Table 3). Two statistical methods resulted in higher epigenetic than genetic differentiation, suggesting that the two components of the genome probably are under divergent evolutionary pressures resulting in divergent genetic and epigenetic structure of RS and SM samples. At the epigenetic level, the Shannon index was significantly greater within the RS than within the SM population, showing that epigenetic diversity is somewhat constrained in the

Discussion Mangrove zonation is affected by climatic and hydrologic factors that shape the structural development of mangrove stands [21,34]. In Sepetiba Bay, mangrove zonation results in two physiographic forests where L. racemosa individuals display very distinct morphological characteristics. Riverside plants (RS) undergo daily tidal fluxes and consequently high nutrient input whereas salt marsh plants (SM) suffer from high soil salinity and low nutrient input; consequently plants from these areas develop differently. Our analyses of morphological traits in L. racemosa plants from RS and SM area correlate morphology with mangrove zonation. Tree height and trunk diameter reflected tree-like structure of RS plants and shrub-like structure of SM plants. Mangrove forests Table 3. Within-population diversity and among-population differentiation of genetic and epigenetic components of Laguncularia racemosa genome.

Data

Location

EcoRI/MspI

EcoRI/HpaII

Shannon diversity 0.013

0.008

0.084b

0.024

GST

bST

0.152

0.131***

0.183

0.222***

*** P,0.0001. Within-population Shannon diversity index was calculated for each dataset and each L. racemosa population. Differentiation indices between populations were calculated using three statistical methods. EcoRI/MspI data represent genetic profiles while EcoRI/HpaII data represent epigenetic profiles. Values with different superscript letters are significantly different according to Wilcoxon rank sum test with continuity correction and Kruskal-Wallis rank sum test. doi:10.1371/journal.pone.0010326.t003

PLoS ONE | www.plosone.org

April 2010 | Volume 5 | Issue 4 | e10326

Natural Epigenetic Variation

Figure 2. PCA analyses of Laguncularia racemosa natural populations using both genetic and epigenetic data separately. Multivariate analysis of the genetic (EcoRI/MspI) and epigenetic (EcoRI/HpaII) components of L. racemosa plants found at riverside (RS represented as number 1) and the salt marsh (SM represented as number 2) locations of Sepetibaâ&#x20AC;&#x2122;s Bay mangrove forest. (A) PCA on covariance matrix for genetic profiles obtained using EcoRI/MspI data. (B) PCA on covariance matrix for epigenetic profiles obtained using EcoRI/HpaII data. F1 and F2 values show the contribution of the two principal components summarizing the total variance of each dataset. bST was calculated using Between-group EigenAnalysis (BPCA) for both genetic and epigenetic profiles and tested with 9999 permutations. doi:10.1371/journal.pone.0010326.g002

Figure 3. Co-Inertia analysis of Laguncularia racemosa natural populations using genetic and epigenetic data. The Co-Inertia analysis maximized the covariance of PCAs shown in Figure 2. The significance test of this association was done with 9999 permutations. RS samples are numbered as 1 while SM samples are numbered as 2. Circles (#) correspond to the projection of genetic profiles (EcoRI/MspI) and arrowheads (c) the projection of epigenetic profiles (EcoRI/HpaII). Black-filled arrows indicate three RS samples that had similar genetic profiles with three SM samples, but divergent epigenetic profiles. F1 and F2 values show the contribution of the two principal components summarizing the total variance of each dataset. doi:10.1371/journal.pone.0010326.g003

PLoS ONE | www.plosone.org

April 2010 | Volume 5 | Issue 4 | e10326

Natural Epigenetic Variation

mangrove using methodology described elsewhere [43]. We measured tree height and diameter at breast height (DBH) of 25 plants from each location, RS and SM. The results were tested for significant differences between sites using the Welch Two Sample t-test in R software [44]. We also collected 25 leaves from the distal end of upper branches of those 50 measured trees in order to measure leaf length (L), leaf width (W) and leaf area (A). The leaf area (cm2) was calculated using a formula developed for L. racemosa plants by K.W. Krauss (unpublished data). These three leaf traits were tested for significant divergence between RS and SM plants using Welch Two Sample t-test in R software [44].

SM habitat. It is the first time that natural epigenetic variation is shown in a wild plant population and it is unexpected to learn that genetic and epigenetic diversity is not always linked together. Since the RS population is located in more favorable habitat of mangrove forest with daily tides and high nutrient inputs, and SM plants are subjected to limited nutrient input and high saline soil, we believe that it is expected that the population under stress would have changes in their epigenome in order to cope with habitat conditions. The outcome is an increased number of both fixed methylated loci and non-methylated loci resulting in the erosion of the epigenetic diversity in the SM population and an abnormal development of the plants. Co-Inertia analysis (Figure 3) also showed greater epigenetic than genetic variation between RS and SM plants. Interestingly, similarities between individuals from the same population are stronger at the epigenetic than at the genetic level, indicating that environmental conditions might have shaped epigenetic differentiation between the two locations. Apparently, the genetic component of the genome is not strongly affected in the same way by the habitat. Similar results were found in Solanum natural populations where epigenetic variation was greater than genetic variation and associated with abnormal floral phenotypes [39]. MET1 is a plant DNA methyltransferase that plays an important role in CpG-dinucleotide methylation maintenance on single-copy and repeat DNA [40]. Hypomethylated genomes caused by MET1 deficiency has been associated with development abnormalities such as plant stature, reduced apical dominance and decreased fertility [41]. Active methylation and demethylation of DNA is a key factor for sensing environmental changes and reacting with change in gene expression, especially in plants [42]. 
Laguncularia racemosa populations are interesting natural systems to study the correlations between DNA methylation level, environmental conditions and morphological traits. It is attractive to postulate an association between hypomethylated plants near the salt marsh and their unique developmental pattern by demethylation in these stressed plants. DNA methylation patterns are maintained by MET1 gene, which may be responsible for the methylation variation found in L. racemosa populations. A reduced level of this enzyme could lead to hypomethylation of SM plants. Further investigations should be carried on to confirm this hypothesis. We suggest that the epigenetic component of a genome, currently underexplored, plays an important role in long-term adaptation of the species in different environmental conditions. Epigenetic markers should not be disregarded in future population studies.

Genetic/Epigenetic analyses DNA extraction was carried out based on the protocol described by Cardoso et al. [45] with some modifications, including a scale-down process using 50 mg of dry material. PVP 40 000 was used to improve DNA yield and quality. DNA was resuspended in 100ml of sterile water and quantified on 1% agarose gels. MSAP methodology used both EcoRI/MspI and EcoRI/HpaII digests as described by Xiong et al. [22]. Samples were subjected to EcoRI digestion using 1mg of genomic DNA and 10 U of enzyme (PromegaR) with 16 buffer H in a final volume of 200ml. Digested DNA was precipitated with 0.1 vol of 3 M sodium acetate and 2.5 vol of 100% ethanol and afterwards washed in 70% ethanol. Half of EcoRI-digested DNA was used for digestion by each isoschizomer enzyme, with 5 U of MspI in 16 Multicore buffer (PromegaR) or 5 U of HpaII in 16 Buffer B (PromegaR) in a final volume of 50ml. Incubations were all performed at 37uC for 6 h and enzymes were afterwards denatured at 65uC for 20 min. Adapter ligation was performed with 20ml of digested DNA, 16 T4 DNA ligase buffer (PromegaR), 1 U T4 DNA ligase enzyme (PromegaR), 5 pmol of each EcoRI adapter [46] and 50 pmol of each MspI/HpaII adapter [22] in a 30ml reaction for 3 h at 20uC. Pre-amplification was conducted in a 20ml reaction using 2ml of ligated and digested DNA, 16PCR buffer, 0.4 mM dNTPs, 30 ng of EcoRI (59-GACTGCGTACCAATTC-39) and MspI/HpaII basic primers (59-ATCATGAGTCCTGCTCGG-39) and 2 U Taq polymerase (Ludwig Biotecnologia Ltda). The reactions were carried out for 25 cycles of 94uC 1 min, 56uC 1 min and 72uC 2 min with a 10-min final extension. Pre-amplification products were diluted 20 fold and 5ml was used for the selective amplifications. These amplifications use the basic primer sequence with 2 to 3 selective nucleotides at the 39 end in order to obtain greater polymorphism with the same DNA digestions and to have at the same time DNA fragments that could be visualized in the gel. 
These 20ml reactions contained 30 ng of each selective primer EcoRI and MspI/HpaII (EcoRI+AG, EcoRI+AC and EcoRI+AAC combined with MspI/HpaII+TCAA and also EcoRI+AG, EcoRI+AAC and EcoRI+AT combined with MspI/HpaII+AAT), 0.2 mM dNTPs, 16 PCR buffer and 2 U Taq polymerase (Ludwig Biotecnologia Ltd). The touchdown program performed was: 94uC for 30 s, 65uC for 30 s and 72uC for 1 min decreasing the annealing temperature by 0.7uC per cycle during 12 cycles and then 24 cycles of 94uC for 30 s, 56uC for 1 min and 72uC for 2 min with a final period of 5 min at 72uC. The final amplification products were separated by electrophoresis for 2.5 h at 60 W on a 4% denaturing polyacrylamide gel with 7.5 M urea. Gels were stained with a 0.1% silver nitrate solution containing 0.5% formaldehyde for 30 min after gel fixation on 10% acetic acid solution for 20 min. Staining development used a 6% sodium carbonate solution with 0.5% formaldehyde and 2mg of sodium thiosulphate for 3 min and the reaction was stopped with 10%

Materials and Methods Plant material Laguncularia racemosa plants in Sepetiba Bayâ&#x20AC;&#x2122;s mangrove forest (Rio de Janeiro, Brazil; Figure 1A) located within a 16-ha area limited by a salt marsh on one side and the Piraco River on the other side were investigated. Individuals near the salt marsh (SM) or at the riverside (RS) are 200 meters apart, separated by a transitional area called Rhizophora Sea where only the mangrove species Rhizophora mangle occurs. Young and undamaged L. racemosa leaves from 17 randomly chosen adult trees from each location were sampled and immediately stored in silica gel for DNA analyses.

Morphological analyses Morphological traits were measured on 50 randomly-chosen L. racemosa individuals in the two locations of Sepetiba Bayâ&#x20AC;&#x2122;s PLoS ONE | www.plosone.org

April 2010 | Volume 5 | Issue 4 | e10326

Natural Epigenetic Variation

acetic acid solution. Silver-stained gels were photo-documented for manual scoring.

Data analysis
The 34 samples were scored for presence (1) or absence (0) of EcoRI/MspI and EcoRI/HpaII fragments. Only unambiguous, intensely labeled bands were scored. The error rate (3%) was calculated from negative controls and repeated samples [47]. Loci with different band patterns in controls or replicates were excluded from analysis as possible methodological artifacts and recorded as errors. Three types of DNA methylation status were identified from the presence (1) or absence (0) of bands in the EcoRI/MspI and EcoRI/HpaII digests, respectively: fully methylated loci (1/0), nonmethylated loci (1/1), and hemimethylated loci (0/1), as stated by Salmon et al. [32]. Hemimethylated loci carry methylation on one DNA strand but not on its complementary strand, i.e., the external cytosine of the MspI/HpaII restriction site (5′-CCGG-3′) is methylated on one strand only. Since this type of fragment is not inherited over generations, hemimethylated loci were excluded from the population structure analyses of L. racemosa. These analyses were performed separately on the EcoRI/MspI and EcoRI/HpaII profiles in order to obtain the genetic structure (represented by the non-symmetrical methylation sensitivity of MspI) versus the epigenetic structure (represented by the CpG-methylation sensitivity of HpaII). Because these are dominant markers, heterozygosity cannot be calculated directly, so deviation from Hardy-Weinberg equilibrium has to be either (i) assumed to be null, (ii) bypassed, or (iii) assessed by other means [48,49]. Here, the first two possibilities were explored and compared to provide robust estimates of the genetic and epigenetic structure of L. racemosa populations, using the EcoRI/MspI and EcoRI/HpaII MSAP data, respectively.
First, within-population diversity ($H_{pop}$) was assessed with the Shannon diversity index, calculated from the frequency of each band among the 17 samples of each population for EcoRI/MspI and EcoRI/HpaII. As recommended by Bussell [50], log2(0) was replaced by 0 for fixed absent bands. Significant differences in the Shannon index among populations for each of the genetic and epigenetic profiles were assessed using the Wilcoxon rank sum test with continuity correction, and significant differences between genetic and epigenetic Shannon indices within each population were assessed with the Kruskal-Wallis rank sum test, both in R [44]. The significance of these tests was adjusted with the Bonferroni correction [51]. Genetic and epigenetic structures were computed for the EcoRI/MspI and EcoRI/HpaII profiles, respectively, as $G_{ST} = (H_{tot} - H_{pop})/H_{tot}$ [50]. Second, individual profiles were also investigated by multivariate analyses, because this band-based approach does not assume Hardy-Weinberg equilibrium. Principal Component Analysis (PCA) on the inter-profile covariance matrix followed by Between-group Eigen Analysis (BPCA [49]) was computed on the EcoRI/MspI and EcoRI/HpaII data using ADE-4 [52]. BPCA (i.e., PCA among groups based on the PCA among individuals) partitions the variance into within- and between-group components and, being a Euclidean approach, can be considered analogous to F-statistics (called here βST). Statistical significance was assessed by the Romesburg randomization test (9999 permutations). In addition, multivariate analyses allow the joint analysis of CpG-genetic and epigenetic structure through statistical procedures that maximize and test the common variance of different datasets. Here, symmetrical Co-Inertia Analysis was used to investigate the association between EcoRI/MspI and EcoRI/HpaII profiles by projecting the PCA scores of individuals into a new subspace that maximizes their covariance [53]. Unlike the related Canonical Correspondence Analysis, Co-Inertia Analysis does not rely on linear regressions and thus can be safely used for any number of variables to be related [54]. The significance of this association was tested in ADE-4 by a procedure in which the rows of the EcoRI/MspI and EcoRI/HpaII tables were randomly permuted 9999 times.
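The Shannon diversity and $G_{ST}$ calculations above can be sketched in Python as follows. This is a minimal illustration, not the authors' code (they used R [44] and ADE-4 [52]). It assumes each individual is a 0/1 vector of band scores over the same ordered loci, that the per-band Shannon index takes the form −[p log2 p + (1 − p) log2(1 − p)] averaged over loci (with log2(0) replaced by 0, as in the text), and that $H_{tot}$ is the index of the pooled sample. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# One individual's MSAP profile: 0/1 band scores over a fixed, ordered set of loci.
Profile = Sequence[int]


def band_frequencies(profiles: List[Profile]) -> List[float]:
    """Frequency of band presence at each locus across the given profiles."""
    n = len(profiles)
    return [sum(p[j] for p in profiles) / n for j in range(len(profiles[0]))]


def _p_log2_p(p: float) -> float:
    # log2(0) is replaced by 0, so fixed-absent (or fixed-present) bands add nothing.
    return 0.0 if p == 0.0 else p * math.log2(p)


def shannon(profiles: List[Profile]) -> float:
    """Assumed per-band Shannon index, averaged over all loci."""
    freqs = band_frequencies(profiles)
    per_locus = [-(_p_log2_p(p) + _p_log2_p(1.0 - p)) for p in freqs]
    return sum(per_locus) / len(per_locus)


def g_st(populations: List[List[Profile]]) -> float:
    """G_ST = (H_tot - H_pop) / H_tot.

    H_pop is the mean within-population index; H_tot is the index of the
    pooled sample (an assumption about how H_tot is obtained).
    """
    h_pop = sum(shannon(pop) for pop in populations) / len(populations)
    pooled = [prof for pop in populations for prof in pop]
    h_tot = shannon(pooled)
    if h_tot == 0.0:
        return 0.0  # no variation at all, so no structure to partition
    return (h_tot - h_pop) / h_tot


# Toy usage: two hypothetical populations of two individuals each, scored at two loci.
pop_a = [[1, 0], [1, 0]]
pop_b = [[0, 0], [0, 1]]
print(round(g_st([pop_a, pop_b]), 3))  # 0.724
```

Averaging the per-band index over loci keeps H comparable between datasets that score different numbers of bands; $G_{ST}$ itself is unchanged if the indices are summed instead, because the same loci enter both $H_{pop}$ and $H_{tot}$.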

Acknowledgments
We thank Martha Sorenson, Rob Martienssen, Eric J. Richards, and Xianfa Xie for critical reading of the manuscript. We thank Ricardo Matheus for his contribution to the field work.

Author Contributions
Conceived and designed the experiments: CFLM MAC PCGF. Performed the experiments: CFLM RAF CSM. Analyzed the data: CFLM CP MAC PCGF. Wrote the paper: CFLM CP MAC PCGF.

References
1. Jablonka E, Lamb M (1998) Epigenetic inheritance in evolution. J Evol Biol 11: 159–183.
2. Fazzari M, Greally J (2004) Epigenomics: beyond CpG islands. Nat Rev Genet 5: 446–455.
3. Scott R, Spielman M (2006) Epigenetics: imprinting in plants and mammals, the same but different? Curr Biol 14: R200–R203.
4. Bender J (2000) Plant epigenetics. Curr Biol 12: R412–R414.
5. Rapp R, Wendel J (2005) Epigenetics and plant evolution. New Phytol 168: 81–91.
6. Parisod C, Salmon A, Tenaillon M, Zerjal T, Grandbastien MA, et al. (2009) Rapid structural and epigenetic reorganization near transposable elements in hybrid and allopolyploid genomes. New Phytol 183: 1003–1015.
7. Gehring M, Henikoff S (2007) DNA methylation dynamics in plant genomes. Biochim Biophys Acta 1769: 276–286.
8. Lee HS, Chen ZJ (2001) Protein-coding genes are epigenetically regulated in Arabidopsis polyploids. Proc Natl Acad Sci U S A 98: 6753–6758.
9. Hao Y, Wen X, Deng X (2004) Genetic and epigenetic evaluations of citrus calluses recovered from slow-growth culture. J Plant Physiol 161: 479–484.
10. Chan SL, Henderson I, Zhang X, Shah G, Chien JC, et al. (2006) RNAi, DRD1, and histone methylation actively target developmentally important non-CG DNA methylation in Arabidopsis. PLoS Genet 2: 791–797.
11. Huettel B, Kanno T, Daxinger L, Aufsatz W, Matzke A, et al. (2006) Endogenous targets of RNA-directed DNA methylation and Pol IV in Arabidopsis. EMBO J 25: 2826–2836.
12. Zhang X, Yazaki J, Sundaresan A, Cokus S, Chan S, et al. (2006) Genome-wide high-resolution mapping and functional analysis of DNA methylation in Arabidopsis. Cell 126: 1189–1201.
13. Zhang X (2008) The epigenetic landscape of plants. Science 320: 489–492.


14. Kalisz S, Purugganan M (2004) Epialleles via DNA methylation: consequences for plant evolution. Trends Ecol Evol 19: 309–314.
15. Bossdorf O, Richards C, Pigliucci M (2008) Epigenetics for ecologists. Ecol Lett 11: 106–115.
16. Pavet V, Quintero C, Cecchini N, Rosa A, Alvarez M (2006) Arabidopsis displays centromeric DNA hypomethylation and cytological alterations of heterochromatin upon attack by Pseudomonas syringae. Mol Plant Microbe Interact 19: 577–587.
17. Lukens L, Zhan S (2007) The plant genome's methylation status and response to stress: implications for plant improvement. Curr Opin Plant Biol 10: 317–322.
18. Saenger P (2003) Mangrove ecology, silviculture and conservation. The Netherlands: Kluwer Academic Publishers.
19. Barth O, São-Thiago L, Barros M (2006) Paleoenvironment interpretation of a 1760 years B.P. old sediment in a mangrove area of the bay of Guanabara, using pollen analysis. An Acad Bras Cienc 78: 227–229.
20. Tomlinson P (1986) The Botany of Mangroves. Cambridge: Cambridge University Press.
21. Schaeffer-Novelli Y, Cintrón-Molero G, Adaime R, Camargo T (1990) Variability of mangrove ecosystems along the Brazilian coast. Estuaries 13: 204–218.
22. Xiong L, Xu C, Saghai-Maroof M, Zhang Q (1999) Patterns of cytosine methylation in an elite rice hybrid and its parental lines, detected by a methylation-sensitive amplification polymorphism technique. Mol Genet Genomics 261: 439–446.
23. Cervera M, Ruiz-García L, Martínez-Zapater J (2002) Analysis of DNA methylation in Arabidopsis thaliana based on methylation-sensitive AFLP markers. Mol Genet Genomics 268: 543–552.


24. Schones D, Zhao K (2008) Genome-wide approaches to studying chromatin modifications. Nat Rev Genet 9: 179–191.
25. Schellenbaum P, Mohler V, Wenzel G, Walter B (2008) Variation in DNA methylation patterns of grapevine somaclones (Vitis vinifera L.). BMC Plant Biol 8: 78–87.
26. Jaligot E, Beulé T, Baurens F, Billotte N, Rival A (2004) Search for methylation-sensitive amplification polymorphisms associated with the mantled variant phenotype in oil palm (Elaeis guineensis Jacq.). Genome 47: 224–228.
27. Salmon A, Ainouche M, Wendel J (2005) Genetic and epigenetic consequences of recent hybridization and polyploidy in Spartina (Poaceae). Mol Ecol 14: 1163–1175.
28. Takata M, Kishima Y, Sano Y (2005) DNA methylation polymorphisms in rice and wild rice strains: detection of epigenetic markers. Breed Sci 55: 57–63.
29. Keyte A, Percifield R, Liu B, Wendel J (2006) Infraspecific DNA methylation polymorphism in cotton (Gossypium hirsutum L.). J Hered 97: 444–450.
30. Zhao X, Chai Y, Liu B (2007) Epigenetic inheritance and variation of DNA methylation level and pattern in maize intra-specific hybrids. Plant Sci 172: 930–938.
31. Parisod C, Alix K, Just J, Petit M, Sarilar V, et al. (2010) Impact of transposable elements on the organization and function of allopolyploid genomes. New Phytol, in press.
32. Salmon A, Clotault J, Jenczewski E, Chable V, Manzanares-Dauleux M (2008) Brassica oleracea displays a high level of DNA methylation polymorphism. Plant Sci 174: 61–70.
33. Peraza-Echeverria S, Herrera-Valencia V, Kay A (2001) Detection of DNA methylation changes in micropropagated banana plants using methylation-sensitive amplification polymorphism (MSAP). Plant Sci 161: 359–367.
34. Lugo A, Snedaker S (1974) The ecology of mangroves. Annu Rev Ecol Syst 5: 39–64.
35. Araújo R, Jaramillo J, Snedaker S (1997) LAI and leaf size differences in two red mangrove forest types in south Florida. Bull Mar Sci 60: 643–647.
36. Tran R, Henikoff J, Zilberman D, Ditt R, Jacobsen S, et al. (2005) DNA methylation profiling identifies CG methylation clusters in Arabidopsis genes. Curr Biol 15: 154–159.
37. Kakutani T (2002) Epi-alleles in plants: inheritance of epigenetic information over generations. Plant Cell Physiol 43: 1106–1111.
38. Chinnusamy V, Zhu JK (2009) Epigenetic regulation of stress responses in plants. Curr Opin Plant Biol 12: 133–139.
39. Marfil C, Camadro E, Masuelli R (2009) Phenotypic instability and epigenetic variability in a diploid potato of hybrid origin, Solanum ruiz-lealii. BMC Plant Biol 9: 21–37.


40. Finnegan E, Kovac K (2000) Plant DNA methyltransferases. Plant Mol Biol 43: 189–201.
41. Richards E (1997) DNA methylation and plant development. Trends Genet 13: 319–323.
42. Grant-Downton R, Dickinson H (2006) Epigenetics and its implications for plant biology 2. The epigenetic epiphany: epigenetics, evolution and beyond. Ann Bot 97: 11–27.
43. Brooks R, Bell S (2005) A multivariate study of mangrove morphology (Rhizophora mangle) using both above and below-water plant architecture. Estuar Coast Shelf Sci 65: 440–448.
44. R Foundation for Statistical Computing, Vienna, Austria. R version 2.6.0. URL: http://www.R-project.org.
45. Cardoso M, Provan J, Powell W, Ferreira P, de Oliveira D (1998) High genetic differentiation among remnant populations of the endangered Caesalpinia echinata Lam. (Leguminosae-Caesalpinioideae). Mol Ecol 7: 601–608.
46. Vos P, Hogers R, Bleeker M, Reijans M, Van de Lee T, et al. (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res 23: 4407–4414.
47. Bonin A, Bellemain E, Bronken Eidesen P, Pompanon F, Brochmann C, et al. (2004) How to track and assess genotyping errors in population genetics studies. Mol Ecol 13: 3261–3273.
48. Bonin A, Ehrich D, Manel S (2007) Statistical analysis of amplified fragment length polymorphism data: a toolbox for molecular ecologists and evolutionists. Mol Ecol 16: 3737–3758.
49. Parisod C, Christin PA (2008) Genome-wide association to fine-scale ecological heterogeneity within a continuous population of Biscutella laevigata (Brassicaceae). New Phytol 178: 436–447.
50. Bussell J (1999) The distribution of random amplified polymorphic DNA (RAPD) diversity amongst populations of Isotoma petraea (Lobeliaceae). Mol Ecol 8: 775–789.
51. Rice W (1989) Analyzing tables of statistical tests. Evolution 43: 223–225.
52. Thioulouse J, Chessel D, Dolédec S, Olivier J (1996) ADE-4: a multivariate analysis and graphical display software. Stat Comput 7: 75–83.
53. Ainouche M, Fortune P, Salmon A, Parisod C, Grandbastien MA, et al. (2009) Hybridization, polyploidy and invasion: lessons from Spartina (Poaceae). Biol Invasions 11: 1159–1173.
54. Dolédec S, Chessel D (1994) Co-inertia analysis: an alternative method for studying species-environment relationships. Freshw Biol 31: 277–294.
55. Holsinger K, Lewis P, Dey D (2002) A Bayesian approach to inferring population structure from dominant markers. Mol Ecol 11: 1157–1164.
56. Landry C, Rathcke B (2007) Do inbreeding depression and relative male fitness explain the maintenance of androdioecy in white mangrove, Laguncularia racemosa (Combretaceae)? New Phytol 176: 891–901.


Hepatocellular Carcinoma Displays Distinct DNA Methylation Signatures with Potential as Clinical Predictors
Hector Hernandez-Vargas1†, Marie-Pierre Lambert1†, Florence Le Calvez-Kelm2, Géraldine Gouysse3, Sandrine McKay-Chopin2, Sean V. Tavtigian2, Jean-Yves Scoazec3, Zdenko Herceg1*
1 Epigenetics Group, International Agency for Research on Cancer (IARC), Lyon, France, 2 Genetic Cancer Susceptibility Group, International Agency for Research on Cancer (IARC), Lyon, France, 3 Service d'Anatomie Pathologique, Edouard Herriot Hospital Lyon, Lyon, France

Abstract
Background: Hepatocellular carcinoma (HCC) is characterized by late detection and fast progression, and epigenetic disruption is believed to underlie its molecular and clinicopathological heterogeneity. A better understanding of the global deregulation of methylation states, and of how they correlate with disease progression, will aid in the design of strategies for earlier detection and better therapeutic decisions.
Methods and Findings: We characterized the changes in promoter methylation in a series of 30 HCC tumors and their respective surrounding tissue and identified methylation signatures associated with major risk factors and clinical correlates. A broad panel of cancer-related gene promoters was analyzed using Illumina bead array technology, and CpG sites were then selected according to their ability to classify clinicopathological parameters. An independent series of HCC tumors and matched surrounding tissue was used to validate the signatures. We were able to develop and validate a signature of methylation in HCC. This signature distinguished HCC from surrounding tissue and from other tumor types and was independent of risk factors. However, aberrant methylation of an independent subset of promoters was associated with tumor progression and etiological risk factors (HBV or HCV infection and alcohol consumption). Interestingly, distinct methylation of an independent panel of gene promoters was strongly correlated with survival after cancer therapy.
Conclusion: Our study shows that HCC tumors exhibit specific DNA methylation signatures associated with major risk factors and tumor progression stage, with potential clinical applications in diagnosis and prognosis.

Citation: Hernandez-Vargas H, Lambert M-P, Le Calvez-Kelm F, Gouysse G, McKay-Chopin S, et al. (2010) Hepatocellular Carcinoma Displays Distinct DNA Methylation Signatures with Potential as Clinical Predictors. PLoS ONE 5(3): e9749. doi:10.1371/journal.pone.0009749
Editor: Robert Feil, CNRS, France
Received: November 6, 2009; Accepted: March 1, 2010; Published: March 17, 2010
Copyright: © 2010 Hernandez-Vargas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by la Ligue Nationale (Française) Contre le Cancer, l'Association pour la Recherche Contre le Cancer (l'ARC), l'Agence Nationale de Recherche Contre le Sida et Hépatites Virales (ANRS, France), the National Institutes of Health/National Cancer Institute, USA, and the Swiss Bridge Award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: herceg@iarc.fr
† These authors contributed equally to this work.

Introduction
Hepatocellular carcinoma (HCC) is an endemic burden worldwide, partly because of delayed diagnosis and multiple risk factors that contribute to a persistently high incidence [1,2]. Well-known risk factors include chronic hepatitis B virus (HBV) and hepatitis C virus (HCV) infection, as well as toxic, metabolic, and immune-related conditions [3]. In all these conditions, malignancy develops through a multistep process that includes several morphologically recognizable stages and is usually associated with cirrhosis, a precancerous condition combining increased proliferation and prolonged environmental stress. The sequential progression to carcinoma has been related to changes at the genetic and epigenetic levels [4]. A number of previous studies investigated genetic changes in HCC, including mutations and deletions in candidate cancer-associated genes [4]. Somatic mutations in several tumor suppressor genes (such as TP53, p16, and RB), oncogenes (including c-MYC and β-catenin), and other cancer-associated genes (including E-cadherin and cyclin D1) have been observed in HCC. These changes have been detected mainly in late stages of HCC development [4]. In addition, the frequent identification of loss of heterozygosity (LOH) on chromosome 8p in HCC cases suggested that inactivation of the Deleted in Liver Cancer 1 gene (DLC-1) may play a pivotal role in HCC development [5]. However, while genetic events are likely to contribute to the development of HCC, none of these genetic alterations has been consistently identified in HCC, suggesting that epigenetic changes may play an important role. Aberrant DNA methylation is a major epigenetic mechanism of gene silencing and is observed in many human cancers [6,7]. In mammalian DNA, methylation occurs predominantly at CpG sites, which are often enriched in gene promoters. In several types of tumors,

March 2010 | Volume 5 | Issue 3 | e9749

Methylome Signature in HCC

including HCC, global hypomethylation and specific promoter hypermethylation have been linked with genomic instability and inactivation of tumor suppressor genes (TSGs), respectively [8,9]. Indeed, accumulating evidence indicates that HBV-infected hepatocytes often exhibit an altered epigenetic status [10,11]. A deregulated methylation profile could therefore be an early marker of disease and a useful tool for cancer screening. Several studies support a role for promoter hypermethylation in HCC-related gene silencing, and such hypermethylation has been shown to correlate positively with tumor progression [12]. Relevant TSGs consistently found hypermethylated in HCC include RASSF1A and p16INK4a [12,13,14,15,16,17,18]. However, although a growing number of genes undergoing aberrant CpG island hypermethylation in HCC have been described, most studies have analyzed a limited number of gene promoters or a restricted number of HCC samples [12,13,14,15,16,17,18]. In addition to improving our understanding of liver carcinogenesis, large-scale DNA promoter methylation profiles may reveal useful associations with clinical parameters such as recurrence and survival. We studied a series of human HCC samples for DNA promoter methylation using Illumina bead array analysis of 1505 CpG sites in 807 cancer-related gene promoters. We obtained and validated signatures of a distinct HCC methylation profile and evaluated their potential application as clinical predictors.

Table 1. Clinicopathological features of HCC patients (values are numbers of cases unless stated otherwise).
No. of patients: 30*
Sex: male, 24; female
Age, mean ± SD: 59 ± 12.3
Etiology: HBV; HCV; alcohol use; unknown risk factor
Tumor differentiation: well; moderately; poorly
Tumor size: <5 cm; >5 cm
TNM stage: TII; TIII
No. of nodules: unilocular; multilocular
Cirrhosis: yes
*Only patients with paired samples (tumor and surrounding tissue) are described here.
doi:10.1371/journal.pone.0009749.t001

Methods
Patients and Biopsy Specimens
All patients included in the study were referred for treatment to Edouard Herriot Hospital in Lyon, France, between 1997 and 2009. Tissue samples were used only from patients who had signed an informed-consent form; all tumor tissue samples were obtained through the Tumorothèque des Hospices Civils de Lyon. The study was approved by the institutional review boards of the International Agency for Research on Cancer and the local ethics committee of Edouard Herriot Hospital. Thirty-eight patients with HCC were selected for analysis; in all cases, cryopreserved samples from the primary tumor were available for study, and in 30 patients, paired cryopreserved samples of adjacent non-malignant tissue were also available (for clinicopathological features, see Table 1). Samples from two patients with liver adenoma were used for comparison. An additional series of 8 matched HCC and surrounding tissue samples was used for validation. In addition, three human HCC cell lines (PLC/PRF/5, Hep3B, HepG2) and one breast carcinoma cell line (MCF7) were included in the array. For all patients, samples were taken from a surgical specimen obtained through hepatectomy or liver transplantation under the supervision of a pathologist; they were snap frozen less than 30 minutes after removal of the surgical specimen and stored in liquid nitrogen until use. Before molecular analysis, the representativeness and quality of each sample were verified by a pathologist (Figure S1). Information about risk factors for HCC was retrieved from clinical charts, including serological evidence of HBV or HCV infection, alcohol consumption, evidence of dysmetabolic syndrome or autoimmune disease, and other etiologies. Information about the clinical course (treatments, duration of follow-up, duration of survival, and status at the date of last information) was also retrieved from clinical charts. The histological diagnosis and classification of primary liver tumors and the histological evaluation of the adjacent liver tissue were performed by an experienced pathologist (JYS).

Bead array analysis of DNA promoter methylation
Tissues were frozen in liquid nitrogen, ground into powder, and collected into Eppendorf tubes. Genomic DNA from HCC tumors and surrounding tissue was prepared by overnight proteinase K treatment, phenol-chloroform extraction, and ethanol precipitation. Sodium bisulfite modification was performed on 500 ng DNA using the EZ DNA Methylation-Gold Kit (Zymo Research). DNA methylation profiling of 1505 CpG sites, corresponding to 807 cancer-related genes, was performed with the Illumina GoldenGate methylation assay (Illumina) as described previously [19]. Briefly, four probes are included for each CpG site: two allele-specific oligos (ASOs) and two locus-specific oligos (LSOs). Each ASO–LSO oligo pair corresponds to either the methylated or the unmethylated state of the CpG site, so each methylation data point is represented by two-color fluorescent signals from the M (methylated) and U (unmethylated) alleles. Technical replicates of several bisulfite-converted samples were run. BeadStudio v3.2 software (Illumina) was used for initial filtering and clustering analysis (see below).
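As a rough sketch of how each two-color data point becomes a methylation level, the Python below computes β = M/(M + U) from the methylated (Cy5) and unmethylated (Cy3) signals, matching the AVG-Beta definition given under Statistical Analysis, and applies the two probe filters described there (removal of chromosome X probes, and of probes whose detection P value exceeds 0.01 in more than 10% of samples). The `Probe` record and its field names are hypothetical, and BeadStudio's own computation may include averaging or offsets not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Probe:
    """Hypothetical per-CpG record; field names are illustrative only."""
    cpg_id: str
    chromosome: str
    methylated: List[float]    # M (Cy5) signal, one value per sample
    unmethylated: List[float]  # U (Cy3) signal, one value per sample
    detection_p: List[float]   # detection P value, one value per sample


def beta(m: float, u: float) -> float:
    """Share of signal from the methylated allele: 0 = unmethylated, 1 = fully methylated."""
    total = m + u
    if total <= 0:
        raise ValueError("no signal for this probe/sample")
    return m / total


def passes_filters(probe: Probe, p_cutoff: float = 0.01, max_fail_frac: float = 0.10) -> bool:
    # Discard chromosome X probes to avoid a sex effect.
    if probe.chromosome == "X":
        return False
    # Discard probes whose detection P value exceeds the cutoff in more than 10% of samples.
    n_fail = sum(1 for p in probe.detection_p if p > p_cutoff)
    return n_fail <= max_fail_frac * len(probe.detection_p)


def beta_matrix(probes: List[Probe]) -> Dict[str, List[float]]:
    """Map each retained CpG site to its per-sample beta values."""
    return {
        pr.cpg_id: [beta(m, u) for m, u in zip(pr.methylated, pr.unmethylated)]
        for pr in probes
        if passes_filters(pr)
    }
```

For example, a probe with M = 300 and U = 100 in one sample yields β = 0.75 for that sample.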

Pyrosequencing
Genomic DNA from HCC tumors and surrounding tissue was extracted and bisulfite-modified as described above. The eluted DNA was at a final concentration of 25 ng/µl. To quantify the percentage of methylated cytosine in individual CpG sites, bisulfite-converted DNA was sequenced using a pyrosequencing system (PSQ™ 96MA, Biotage, Sweden) [20]. This method treats each individual CpG site as a C/T polymorphism and generates quantitative data for the relative proportion of the methylated versus the unmethylated allele. Pyrosequencing assays were established for the quantitative measurement of DNA methylation levels in the promoter regions of 8 genes (RASSF1, GSTP1, APC, GNMT, GABRA5, MEST, MGMT, and H19) and in LINE-1, using previously described primers [21] (Table S1 and Figure S2). Hot-start PCR was performed with the HotStarTaq Master Mix kit (Qiagen), and pyrosequencing was carried out according to the manufacturer's protocol (Biotage). The target CpGs were evaluated by converting the resulting pyrograms into numerical values for peak heights and calculating the average of all CpG sites analyzed at a given gene promoter (Figure S2).

Quantitative RT-PCR
Total RNA was isolated using TRIzol Reagent (Invitrogen) according to the manufacturer's instructions. Reverse transcription was performed using MMLV-RT (Invitrogen) and random hexamers, according to the manufacturer's protocol. Primers and probes were designed using the Universal Probe Library Assay Design Center (Roche). Quantitative real-time PCR (qRT-PCR) was performed in triplicate for each condition, using FastStart TaqMan Probe Master (Roche) and an Mx3000P real-time PCR system (Stratagene).

Statistical Analysis
Filtering and unsupervised clustering. BeadStudio version 3.2 (Illumina) was used to obtain the signal values (AVG-Beta), defined as the ratio of the fluorescent signal from the methylated allele (Cy5) to the sum of the fluorescent signals of the methylated (Cy5) and unmethylated (Cy3) alleles, ranging from 0 for completely unmethylated sites to 1 for completely methylated sites. To avoid a sex effect, all probes on chromosome X (n = 84) were discarded. In addition, all probes with a detection P value above 0.01 in more than 10% of the samples were excluded from the analysis. BRB-ArrayTools software (version 3.8 beta2) was used for further analysis of the AVG-Beta values. CpG sites showing minimal variation across the set of arrays were excluded. Gene ontology and molecular interactions were analyzed with GenMAPP version 2.1 (http://GenMAPP.org/) and the KEGG Pathways Database (http://www.genome.jp/kegg/). Unsupervised hierarchical clustering, class comparison, class prediction, KEGG pathway enrichment, and survival prediction were performed with BRB-ArrayTools.
Class Comparison. CpG sites were considered differentially methylated when their P value was less than 0.001. In addition, we identified CpG sites that were differentially methylated between tumor and adjacent tissue using a multivariate permutation test [22] providing 90% confidence that the false discovery rate was less than 10%. The false discovery rate is the proportion of CpG sites claimed to be differentially methylated that are false positives. The test statistics used were random variance t-statistics for each CpG site [23]. Although t-statistics were used, the multivariate permutation test is non-parametric and does not require the assumption of Gaussian distributions. A global test of whether the methylation profiles differed between the classes was also performed by permuting the assignment of samples to classes. For each permutation, the P values were recomputed, and the number of CpG sites significant at the 0.001 level was noted. The proportion of permutations that gave at least as many significant CpG sites as the actual data was the significance level of the global test (P < 0.05 for the global test). We also performed an alternative analysis based on the frequency of methylation in tumors relative to surrounding tissue. To this end, we defined thresholds for frequently unmethylated and frequently methylated CpG sites based on the 25th and 75th percentiles in the surrounding tissues, respectively. That is, a given CpG site was considered frequently hypermethylated in tumors if more than 75% of the tumor samples lay above the 75th percentile in surrounding tissues. Similarly, if more than 75% of the tumor samples lay below the 25th percentile of methylation in surrounding tissues, the CpG site was considered frequently hypomethylated in tumors (Figure S3).
Class Prediction. We used several models to predict the class of future samples from their CpG methylation profile: the Compound Covariate Predictor [24], Diagonal Linear Discriminant Analysis [25], Nearest Neighbor Classification [25], and Support Vector Machines with a linear kernel [26]. The models incorporated CpG sites that were differentially methylated at the 0.001 significance level as assessed by the random variance t-test [23]. We estimated the prediction error of each model using leave-one-out cross-validation (LOOCV) [27]. For each LOOCV training set, the entire model-building process was repeated, including the CpG site selection step. We also evaluated whether the cross-validated error rate of a model was significantly lower than expected from random prediction: the class labels were randomly permuted and the entire LOOCV process was repeated. The significance level is the proportion of random permutations that gave a cross-validated error rate no greater than that obtained with the real data; 1000 random permutations were used. In addition, the Prediction Analysis for Microarrays (PAM) tool was used as another method of class prediction. This method uses the shrunken centroid algorithm [28], in which the centroids of each group are shrunk toward each other by shrinking the class mean of each CpG site toward an overall mean. The amount of shrinkage is determined by a tuning parameter called delta. As shrinkage increases, some CpG sites will have the same shrunken class mean for the different classes, and hence they will have no effect in distinguishing the classes. For larger values of delta, fewer CpG sites will have different shrunken means among the classes, so the classifier will be based on fewer CpG sites; the number of CpG sites included in the classifier is thus determined by the value of delta. The algorithm provides a k-fold cross-validated estimate of prediction error for all values of delta, where k is the minimum class size. The tool indicates the delta corresponding to the smallest cross-validated prediction error and gives the list of CpG sites included in the classifier for that value of delta.
Gene Ontology Analysis. The Gene Ontology (GO) classes differentially methylated between tumor and surrounding samples were identified using a functional class scoring analysis as previously described [29]. For each gene in a GO class, the P value for comparing tumor and surrounding samples was computed. The set of P values for a class was summarized by two statistics: (i) the LS summary, the average of the log P values for the genes in that class, and (ii) the KS summary, the Kolmogorov-Smirnov statistic computed on the P values for the genes in that class. Functional class scoring is a more powerful method of identifying differentially methylated gene classes than the more common over-representation analysis or annotation of gene lists based on individually analyzed genes. The functional class scoring analysis for GO classes was performed using BRB-ArrayTools.
Survival Analysis. CpG sites whose methylation was significantly related to overall survival after treatment were selected with the BRB-ArrayTools survival analysis tool. A statistical significance level was computed for each gene based on univariate proportional hazards models. These P values were then used in a multivariate permutation test in which the survival times and censoring indicators were randomly permuted among arrays [27,30]. The multivariate permutation test was used to provide 90% confidence that the false discovery rate was less than 10%. For other comparisons, means and differences of means with 95% confidence intervals were obtained using GraphPad Prism (GraphPad Software Inc.). The Mann-Whitney test and the Wilcoxon matched-pairs test were used for unpaired and paired comparisons of average methylation between classes, respectively. P values < 0.05 were considered statistically significant.

Figure 1. Unsupervised analysis of CpG methylation bead arrays in HCC. A. Clustering analysis of 76 HCC samples included in the bead array assay (HCC tumor and surrounding tissue), using all 1505 CpG sites. For the upper part of the cluster, names were assigned manually according to the enrichment of specific clusters. Yellow indicates hypomethylated and red hypermethylated CpG sites. B. Representative logarithmic plot of two replicates included in the array, showing good consistency of methylation (the r² value is shown on the plot). C. Average promoter methylation of all 1505 CpG sites in HCCs and surrounding tissues. D. Clustering analysis after grouping the samples by etiological factors. E. Average methylation of all 1505 CpG sites for the etiological groups shown in (D). Significant differences (P < 0.05) between tumor and surrounding tissue are marked with an asterisk (*). doi:10.1371/journal.pone.0009749.g001

methylation was statistically significant for HBV and HCV samples (P < 0.0001 for both, paired analysis). Although promoter methylation was also increased in alcohol-related and unknown-risk HCC samples, the difference did not reach statistical significance. Therefore, a distinct promoter methylation profile, with global non-promoter hypomethylation and increased promoter methylation, is common to all HCC tumors.

Signature and prediction of HCC by DNA promoter methylation profiling

To distinguish the genes differentially methylated between tumors and surrounding tissue, a class comparison tool (BRB-ArrayTools v3.8) was used, as described in Methods. After filtering for a P value <0.001 and correcting for a false discovery rate (FDR) <0.1, 124 CpG sites were found to be differentially methylated. Because several CpG sites corresponded to the same gene promoter, a total of 94 genes were considered differentially methylated. Approximately one third of the significant promoters were represented by more than one significant CpG site, which supports the quality of these data. Relative to surrounding tissues, tumors showed increased methylation at 34 (27%) of these CpG sites (corresponding to 27 gene promoters, including RASSF1, APC, and CDKN2A) and reduced methylation at 90 (73%) (corresponding to 66 gene promoters, including GABRA5, NOTCH4, and PGR) (Figure 2A and Table S2). To analyze the frequency of methylated or unmethylated CpG sites in tumors relative to surrounding tissue, we used the upper and lower quartiles of the surrounding tissue to set thresholds (see Methods). This analysis yielded a similar result, with 7 and 35 CpG sites hyper- and hypomethylated in tumors, respectively (Figure S3). Validation of a subset of 8 gene promoters by pyrosequencing was consistent with the bead array results (Figure S4A), and the correlation between pyrosequencing and bead array measurements was statistically significant (P<0.0001, Figure S4B). In addition, hypermethylation of the RASSF1A and APC promoters was associated with significantly lower expression in HCC tumors, as assessed by qRT-PCR (Figure S5). Ontological analysis of the differentially methylated genes showed enrichment for terms related to development, including the Wnt/β-catenin, TGF-β, Hedgehog, and Notch signaling pathways (data not shown).
Methylation of some of these genes has been previously described in HCC (e.g., APC, RASSF1A, and p16/CDKN2A), supporting the sensitivity of this assay [14,31,32]. However, many gene promoters not previously linked to HCC showed differential methylation, including those involved in apoptosis (IRAK3, MYOD1), immune response (HLA-DQA2, GSTM2, IFNG), growth factor signaling (EGF, FGF6, IGF1R, NGFR), cell cycle regulation (CCND2), and metastasis (CDH17, MMP1, MMP3, MMP9) (Table S2). Interestingly, the promoters in the HCC signature included a number of imprinted genes that were consistently hypomethylated in HCC relative to surrounding tissue (GABRA5, GABRG3, HBII-52, MEST, MKRN3, TRPM5, and ZIM3). For most of them, at least 2 CpG sites were differentially methylated, suggesting that this observation is biologically significant.
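The two selection rules in the signature analysis, a nominal P value below 0.001 combined with an FDR below 0.1, and the quartile-based frequency analysis, can be sketched as follows. This assumes per-site P values have already been computed (the random variance t-test of BRB-ArrayTools is not reproduced), and it uses the Benjamini-Hochberg procedure as one common FDR estimate; whether BRB-ArrayTools computes its FDR in exactly this way is an assumption. The percentile interpolation method (`method="inclusive"`) is likewise our choice, not a documented detail of the study, and all function names are hypothetical.

```python
from statistics import quantiles


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_cpg_sites(pvalues, p_cut=0.001, fdr_cut=0.1):
    """Indices of CpG sites passing both the nominal P and the FDR cut-offs."""
    q = benjamini_hochberg(pvalues)
    return [i for i, (p, qv) in enumerate(zip(pvalues, q)) if p < p_cut and qv < fdr_cut]


def methylation_frequency(surrounding, tumors):
    """For one CpG site, return the fraction of tumors above the 75th percentile
    and below the 25th percentile of the surrounding-tissue beta values."""
    q1, _, q3 = quantiles(surrounding, n=4, method="inclusive")
    hyper = sum(b > q3 for b in tumors) / len(tumors)
    hypo = sum(b < q1 for b in tumors) / len(tumors)
    return hyper, hypo


print(select_cpg_sites([0.0001, 0.0005, 0.2, 0.0009]))  # -> [0, 1, 3]
print(methylation_frequency([0.1, 0.2, 0.3, 0.4, 0.5], [0.45, 0.6, 0.35, 0.1]))  # -> (0.5, 0.25)
```

Under this sketch, a site would count as "frequently methylated" (Figure S3B) when the first returned fraction exceeds 0.75.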

Results

DNA promoter methylation in HCC samples

To investigate whether HCC harbors specific methylation profiles, DNA methylation at 1505 CpG sites was analyzed using Illumina bead arrays. A total of 38 HCC samples were suitable for analysis, including 30 pairs of HCC tumors and surrounding tissues. In addition, 4 liver adenoma tumor/surrounding samples and 4 cancer cell lines were included for comparison. After excluding probes with a P value higher than 0.01 in more than 10% of the samples, as well as those on chromosome X (to avoid a gender effect), 1219 probes were used in the analysis. An initial unsupervised hierarchical clustering analysis distinguished HCC samples from other tumor types (breast and esophageal cancer), blood, and cell lines (data not shown). Unsupervised clustering within HCC samples also distinguished 2 clusters, enriched in tumor and in surrounding tissue samples, respectively (Figure 1A). Together with the proper clustering of the replicates in the unsupervised analysis, the scatter plot analysis confirmed the quality and reproducibility of the methylation profiling (Figure 1B). Overall, tumor samples displayed a small but significant increase in average promoter CpG methylation (median methylation of 0.16 and 0.23 for surrounding and tumor tissue, respectively, P<0.05) (Figure 1C). This contrasts with global DNA methylation as assessed with the LINE-1 element [21], which shows significant hypomethylation in tumors compared to surrounding tissue (P<0.005, Figure S2C). An unsupervised analysis of samples grouped by risk factor (HBV, HCV, alcohol consumption, or unknown risk) showed that surrounding tissues clustered together, while tumor tissues formed a separate group in which HCV-associated HCCs were the most divergent subset (Figure 1D).
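The preprocessing described above (beta values on a 0 to 1 scale, probe filtering, and replicate reproducibility) can be sketched as below. The beta formula follows the commonly documented form for Illumina GoldenGate methylation arrays, beta = max(M,0) / (max(M,0) + max(U,0) + 100); the offset of 100 is an assumption here, not a value stated in this paper. We also assume the per-sample probe P values are detection P values. The function names are hypothetical, and r² is computed as the squared Pearson correlation between two replicate profiles.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Methylation level in [0, 1) from methylated (M) and unmethylated (U) intensities.

    Negative intensities are clipped to zero; the offset (assumed value) keeps
    the ratio stable when both signals are low.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


def keep_probe(detection_pvalues, chromosome, p_cut=0.01, max_fraction=0.10):
    """Drop chrX probes and probes whose P value exceeds p_cut in more than
    max_fraction of the samples, as described in the text."""
    if chromosome == "X":
        return False
    failing = sum(p > p_cut for p in detection_pvalues)
    return failing / len(detection_pvalues) <= max_fraction


def r_squared(xs, ys):
    """Squared Pearson correlation between two replicate methylation profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


print(beta_value(900, 0), beta_value(300, 600))   # -> 0.9 0.3
print(keep_probe([0.001] * 9 + [0.05], "7"))      # 1 of 10 samples fails -> True
print(keep_probe([0.001] * 8 + [0.05] * 2, "7"))  # 2 of 10 samples fail -> False
```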
When average promoter methylation was analyzed for these groups, it was consistently higher in tumor samples than in surrounding tissue, with the exception of adenoma samples (Figure 1E). This increase in average promoter

PLoS ONE | www.plosone.org

March 2010 | Volume 5 | Issue 3 | e9749

Methylome Signature in HCC


Figure 2. Signature and predictor of HCC by methylation profiling. A. Differential methylation analysis was performed with the class comparison tool of the BRB-ArrayTools software, as described in Materials and Methods. The heat map represents the CpG sites distinguishing HCC from surrounding tissue (n = 87) with a P value <0.001. The full list of CpG sites is presented in Table S2. Yellow indicates hypomethylated, and red hypermethylated, CpG sites. B. Representation of the misclassification error as a function of the number of genes, as assessed with the PAM prediction analysis. The upper panel shows the correlation for the grouped samples; the lower panel shows the correlation separately for tumor and surrounding samples. The sensitivity and specificity of the predictor are included in the figure. C. Heat map of the 20 CpG sites included in the HCC predictor, obtained for an independent series of HCC samples and HCC surrounding tissues after unsupervised hierarchical clustering analysis. doi:10.1371/journal.pone.0009749.g002

The HCC samples analyzed in this study were obtained from patients exposed to different risk factors, including HBV infection, HCV infection, and ethanol consumption. To identify risk factor-specific methylation profiles, we performed a class comparison analysis including these groups and a group of HCC samples with unknown risk factors (negative for HBV and HCV infection, with no history of alcohol consumption). The class comparison identified a small set of genes that was significantly hypermethylated in each group relative to the other 3 groups (Figure 3B). By comparing these groups, it was possible to select CpG sites specifically modulated in alcohol-related (DIO3 and STAT5A), HBV-related (NAT2, CSPG2, DCC, NTRK3, TNFSF10, TNFRSF10C, and RASGRF1), and HCV-related HCCs (RIK and CHGA). Samples from patients with unknown risk factors displayed a mixed profile, with hypermethylation of several of these promoters, probably reflecting their heterogeneous origin (Figure 3 and Table S3). The heterogeneity of HCC origin is also reflected in the extent to which the normal liver architecture is preserved: our series of HCC surrounding tissues can be classified into samples with cirrhotic (n = 16) or non-cirrhotic (n = 14) histology. A comparison between these two classes under stringent conditions (P value <0.001) showed that cirrhotic tissues are significantly hypermethylated at 2 gene promoters, UGT1A7 and PLG.

The ability to discriminate tumor from surrounding tissue may have clinical impact, especially when small sets of genes can produce robust predictions. The significant differences between surrounding and HCC tissues in the class comparison suggested that a multivariate predictor could be built from this gene set. We therefore used a subset of CpG sites to predict the class of an independent series of HCC tumors and matching surrounding tissues. The models incorporated genes that were differentially methylated between tumor and surrounding tissue at the 0.001 significance level, as assessed by the random variance t-test. The prediction error of each model was assessed using leave-one-out cross-validation (LOOCV) [27]. Interestingly, the 124 CpG sites included in the HCC signature discriminated tumor from surrounding tissue in all the samples of the second series (data not shown). We next designed a predictor with a minimal number of CpG sites using the Prediction Analysis of Microarrays (PAM) tool [28]. As shown in Figure 2B, a minimum of 20 CpG sites is required to minimize the number of misclassification errors. This 20-CpG-site predictor (corresponding to 16 gene promoters), whose sites are all part of the 124-CpG-site HCC signature, correctly classified 14 out of 16 of the new samples (sensitivity = 0.75, specificity = 0.97 for tumor prediction). An unsupervised clustering of the new series of HCC samples using this 20-CpG-site signature highlights its ability to discriminate both types of samples (Figure 2C). Interestingly, the CpG sites with the strongest ability to discriminate tumor from surrounding tissue were found in the promoters of genes hypermethylated in HCC samples (e.g., APC, RASSF1A, CDKN2A, and FZD7).
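PAM implements the nearest shrunken centroid method of Tibshirani et al. [28]. The sketch below is a simplified, from-scratch re-implementation of that published method, not the PAM software and not the authors' pipeline: class centroids are soft-thresholded toward the overall centroid by a shrinkage value `delta`, CpG sites whose class centroids all collapse onto the overall mean drop out of the predictor, and leave-one-out cross-validation estimates the misclassification error for a given `delta`. Scanning `delta` and recording the surviving sites and the error is one way to trace a curve like Figure 2B. The data, labels and function names are hypothetical.

```python
from math import copysign, log, sqrt
from statistics import median


def fit_shrunken_centroids(xs, ys, delta):
    """Nearest-shrunken-centroid fit.

    xs: list of samples, each a list of beta values (one per CpG site).
    ys: class label per sample. delta: shrinkage threshold.
    """
    n, p = len(xs), len(xs[0])
    labels = sorted(set(ys))
    groups = {k: [x for x, y in zip(xs, ys) if y == k] for k in labels}
    overall = [sum(x[i] for x in xs) / n for i in range(p)]
    means = {k: [sum(x[i] for x in g) / len(g) for i in range(p)] for k, g in groups.items()}
    # Pooled within-class standard deviation of each CpG site.
    s = [
        sqrt(sum((x[i] - means[k][i]) ** 2 for k, g in groups.items() for x in g) / (n - len(labels)))
        for i in range(p)
    ]
    s0 = median(s)  # guards against sites with tiny variance
    centroids, kept = {}, set()
    for k, g in groups.items():
        m_k = sqrt(1 / len(g) - 1 / n)
        centroid = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (means[k][i] - overall[i]) / scale
            d_shrunk = copysign(max(abs(d) - delta, 0.0), d)  # soft thresholding
            if d_shrunk != 0.0:
                kept.add(i)
            centroid.append(overall[i] + scale * d_shrunk)
        centroids[k] = centroid
    priors = {k: len(g) / n for k, g in groups.items()}
    return {"centroids": centroids, "s": s, "s0": s0, "priors": priors, "kept": sorted(kept)}


def predict(model, x):
    """Return the class with the smallest discriminant score for sample x."""
    def score(k):
        distance = sum(
            (xi - ci) ** 2 / (si + model["s0"]) ** 2
            for xi, ci, si in zip(x, model["centroids"][k], model["s"])
        )
        return distance - 2 * log(model["priors"][k])
    return min(model["centroids"], key=score)


def loocv_error(xs, ys, delta):
    """Leave-one-out misclassification rate for a given shrinkage delta."""
    errors = 0
    for i in range(len(xs)):
        model = fit_shrunken_centroids(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], delta)
        errors += predict(model, xs[i]) != ys[i]
    return errors / len(xs)


def sensitivity_specificity(truth, predicted, positive="tumor"):
    """Sensitivity and specificity for the positive (tumor) class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical beta values: CpG 0 separates the classes, CpGs 1 and 2 are noise.
xs = [[0.80, 0.30, 0.50], [0.85, 0.35, 0.45], [0.75, 0.32, 0.55], [0.82, 0.28, 0.48],
      [0.20, 0.31, 0.52], [0.25, 0.29, 0.47], [0.15, 0.33, 0.50], [0.22, 0.34, 0.53]]
ys = ["tumor"] * 4 + ["surrounding"] * 4
for delta in (0.0, 1.0):
    print(delta, fit_shrunken_centroids(xs, ys, delta)["kept"], loocv_error(xs, ys, delta))
```

In this toy example a shrinkage of 1.0 retains only CpG 0, mirroring how PAM trades predictor size against cross-validated error.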

HCC methylation profile and prediction of survival

Survival signatures were developed with BRB-ArrayTools using a fitted Cox proportional hazards model, with the time of biopsy as the starting point. At the time of analysis there were 13 deaths among 38 patients with available data, with a mean follow-up time of 194 weeks for all patients. With these data it was possible to classify the patients into two groups with significantly different survival curves (Figure 4A, P<0.001). The 10 CpG sites with the highest ability to differentiate between these two groups are shown in Figure 4B. Interestingly, this survival signature was significantly enriched in the promoters of genes involved in IGF-1 signaling and immune response (Figure 4C). In addition, the differences in DNA promoter methylation were reflected in different expression profiles for some of the genes ranking highest in the survival prediction analysis (Figure 4D). This suggests that methylation-dependent control of immune and growth factor response genes may represent a mechanism directly affecting the survival of HCC patients.
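Survival curves such as those in Figure 4A are typically drawn as Kaplan-Meier estimates; we assume that here, since the text does not name the estimator. The following minimal sketch of the product-limit estimator takes follow-up in weeks and a hypothetical event flag (1 = death, 0 = censored). It does not reproduce the BRB-ArrayTools Cox-based risk groups or their permutation P value, and the function name and data are illustrative only.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up in weeks; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival) pairs at each time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time; censored ones still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up (weeks) and outcomes for a 5-patient risk group.
print(kaplan_meier([10, 20, 20, 30, 40], [1, 0, 1, 1, 0]))  # S(t) ~ 0.8, 0.6, 0.3 at weeks 10, 20, 30
```

Comparing two such curves (high- versus low-risk methylation groups) would additionally require a test statistic, for example a log-rank test, which is outside the scope of this sketch.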

Methylation profile is associated with HCC risk factor and tumor progression

To find CpG sites potentially associated with tumor progression, we performed a class comparison analysis to classify the methylation profile according to tumor stage (as assigned by the TNM classification) and grade of differentiation (histologically classified as 1 = well differentiated, 2 = intermediate, and 3 = poorly differentiated). Tumor stage will be referred to as T, because all samples except one (T3N1M0) were negative for lymph node invasion (N0) and metastasis (M0). Overall, tumors of the first 2 stages (T1 and T2) displayed a similar methylome profile, while 24 CpG sites were differentially methylated in advanced tumors (T3) (Figure 3A). All of these CpG sites were significantly hypermethylated in advanced tumors, and most of them showed a trend toward progressive hypermethylation from T1 through T3 (Figure 3A). The 24 CpG sites hypermethylated in advanced HCC tumors are located in genes involved in immune response and adhesion (IL18BP, IPF1, HLA-DOB, CSPG2, GJB2, and PMP22) and in the cell cycle (CCND2 and NTRK3). Similarly, the grade of differentiation was associated with changes in methylation only in the least differentiated tumors (grade 3) (data not shown). Three CpG sites were significantly hypomethylated in grade 3 tumors (HOXB2, DDR2, and TIMP3), while 19 CpG sites were hypermethylated (including CDK2, EF3, FANCF, LIF, RASGRF1, DNMT1, and ERCC1).

Discussion

This report describes the CpG methylation profile of HCC across a wide panel of cancer-related promoters. A differential analysis identified a signature of genes specifically methylated in HCC with respect to surrounding tissue. Although a number of known promoters were found to be differentially methylated in HCC, we also identified new candidate promoters that are potentially involved in the development and progression of liver cancer. By correlating the methylation data with clinical outcomes it was possible to


Figure 3. Methylation profile according to risk factor and tumor progression. Class comparison analyses were performed as described in Figure 2. A. The heat map represents 27 CpG sites distinguishing HCC samples according to their TNM classification, with a P value <0.05. B. The heat map represents 17 CpG sites distinguishing HCC samples according to their etiological exposure, with a P value <0.01. HBV or HCV, hepatitis B or C virus infection; EtOH, ethanol consumption; Unknown, unknown risk factor. doi:10.1371/journal.pone.0009749.g003


Figure 4. Survival risk predictor in HCC. A. Survival analysis using BRB-ArrayTools. A survival signature was developed using a fitted Cox proportional hazards model and leave-one-out cross-validation, with the time of biopsy as the starting point. Survival curves show a significant difference between two groups of HCC patients. B. A 58-CpG-site predictor (selected from the analysis shown in A) was correlated with survival after treatment. Only the first 10 CpG sites (those with the lowest P values) are shown. C. Pathway analysis of the 58 CpG sites included in the survival predictor, showing the 5 significantly enriched pathways. D. Quantitative RT-PCR was performed for some of the genes with the highest ability to predict survival in HCC (MYLK, FLT1, CDKN1C, and TAp73) in a subset of samples with high (H) and low (L) risk. doi:10.1371/journal.pone.0009749.g004

establish a DNA methylation predictor of patient survival and of clinical parameters such as stage and grade. The strength and low complexity of these signatures, based on a reduced number of gene promoters, make them a potential novel strategy for early detection and clinical prediction in HCC. Although early detection of HCC has improved, diagnosis is still often established only at advanced stages. There is therefore an urgent need to predict recurrence and response to therapy, especially because patients prone to recurrence may receive alternative treatment. The strength of the presented signatures is underscored by their validation in an independent series of HCC samples. Importantly, although preliminary studies have addressed clinical prediction based on gene expression profiling [33], the greater stability of DNA relative to RNA makes methylation profiling better suited to clinical settings. In addition, signatures with a reduced number of CpG sites could enable clinical prediction in, for example, paraffin-embedded samples or plasma DNA. A multivariate predictor based on a small set of CpG sites may have important applications in the early detection of neoplastic transformation in populations at high risk for HCC, such as patients with hereditary haemochromatosis [18]. Similarly, survival prediction may help improve and individualize therapeutic decisions. However, these multivariate signatures should be prospectively validated in larger cohorts before clinical applications are considered. The role of DNA methylation in HCC has been described previously. Epigenetic changes in the RASSF1A, p16, and p15 tumor suppressor genes in serum DNA have been shown to be potential biomarkers for early detection in populations at high risk for HCC [18].
The tumor suppressor APC also appears to be a common marker for HCC detection and is consistently hypermethylated in HCC [12], whereas SYK and CRABP1 hypermethylation has been considered a useful prognostic marker in HCC [34]. A previous screen of 105 promoters showed that epigenetic activation of Ras and downstream Ras effectors was common in HCC and was associated with poor prognosis [8]. In another study, increased methylation of the p16 and GSTP1 genes was shown in HCC compared to matching non-malignant cirrhotic liver [12,35,36]. Our bead array analysis thus supports and extends previous findings on DNA methylation and provides a novel, more comprehensive signature of HCC methylation. A previous study analyzed a limited panel of cancer-associated genes in HCC tumors and found that environmental factors may influence the degree and pattern of methylation in tumors [37]. Our study identified significant associations between methylation patterns and specific etiologic agents (i.e., HBV, HCV, and ethanol), tumor progression (stage and grade of differentiation), and tumor background (cirrhotic vs. non-cirrhotic surrounding tissue) for specific subsets of genes. Interestingly, the promoters differentially methylated in virus-related HCC samples correspond to genes involved in immune response and induction of apoptosis. Specifically, polymorphisms of the N-acetyltransferase encoded by the NAT2 gene have been linked to susceptibility to HBV-related HCC [38,39]. Moreover, promoter methylation of DNMT1 was associated with poor differentiation. Remarkably, hypermethylation of the gene encoding DNA methyltransferase 1 (DNMT1) can

be associated with lower expression and consequent global hypomethylation, as observed with the LINE-1 pyrosequencing analysis. Another interesting observation is that the tumor background (cirrhotic vs. non-cirrhotic) determined a specific pattern of methylation for several promoters. UGT1A7 encodes a UDP-glucuronosyltransferase involved in multiple metabolic pathways, including the metabolism of hormones and the metabolism of xenobiotics by cytochrome P450. In addition, UGT1A7 polymorphisms have been correlated with cirrhosis and with increased risk of HCC in HBV- and HCV-infected patients [40,41,42]. Similarly, plasminogen, encoded by PLG, is a circulating zymogen that is converted to the active enzyme plasmin, whose main function is to dissolve fibrin clots. Notably, PLG transcript expression has been reported to be reduced in HCC [43]. Therefore, aberrant promoter methylation of these two genes may be related to impaired detoxification of carcinogens and to the process of hepatic fibrogenesis that results in cirrhosis [44]. Further analysis of these genes may shed new light on the process of liver carcinogenesis in specific risk groups. However, the global similarity among HCC groups supports the notion that aberrant methylation is a ubiquitous phenomenon in liver carcinogenesis [8]. In summary, this study describes the methylation profile of hepatocellular carcinoma and specific signatures that can be used as markers for detection and for predicting survival after therapy. Our results, based on bead arrays and quantitative analysis with pyrosequencing, give a reliable view of HCC promoter methylation across a wide panel of genes and can serve as a reference for the potential development of clinical applications.

Supporting Information

Figure S1 Representative histology of HCC tumors and surrounding tissues used for methylation profiling. H&E-stained HCC samples with surrounding non-tumor liver parenchyma. Examples of HCC samples with adjacent non-cirrhotic and cirrhotic tissues are shown in A and B, respectively. NC indicates non-cirrhotic surrounding liver tissue, C indicates cirrhotic surrounding liver tissue, and H indicates HCC tissue. Found at: doi:10.1371/journal.pone.0009749.s001 (7.59 MB TIF)

Figure S2 Pyrosequencing design for imprinted genes. A. Diagram showing the chromosomal localization and GC percentage of the GABRA5 promoter, as an example of the design used for validation. The regions studied by bead arrays and pyrosequencing are represented under the chromosomal localization. B. Representative pyrograms of GABRA5 obtained from the analysis of bisulfite-modified DNA from HCC tumor and surrounding tissue. Primers used for pyrosequencing are listed in Table S1. C. Global methylation was studied using primers against LINE-1 elements [21]. Significant hypomethylation in tumors relative to surrounding tissue is indicated by an asterisk (*) (P<0.05). Found at: doi:10.1371/journal.pone.0009749.s002 (1.54 MB TIF)

Figure S3 Analysis of frequency of methylation. AVG-Beta values in the surrounding tissues were used to define the 25th and 75th percentiles for each CpG site (see Methods). These percentiles were used as a reference to define the frequency of methylation in tumors. A. Box plots representing the 3 CpG sites with the highest frequency of methylation in tumors (upper panel) and the highest frequency of unmethylation in tumors (lower panel) calculated in this way. S = surrounding, T = tumor. (*) P value < 0.001. B. Table showing the CpG sites methylated in more than 75% of the tumors relative to surrounding tissues. C. Table showing the CpG sites unmethylated in more than 75% of the tumors relative to surrounding tissues. Found at: doi:10.1371/journal.pone.0009749.s003 (2.56 MB TIF)

Figure S4 Validation of bead arrays by pyrosequencing. A. Pyrosequencing assays were designed to validate 8 gene promoters differentially methylated between tumor and surrounding HCC samples (upper dot plot). The level of methylation is shown on a percentage scale. Primers were designed as described in Materials and Methods. A dot plot of the corresponding levels of methylation (on a 0 to 1 scale) for the same genes in the bead array assay is shown in the lower panel. B. Correlation analysis of the data presented in (A). Found at: doi:10.1371/journal.pone.0009749.s004 (1.45 MB TIF)

Figure S5 Validation of bead arrays by qRT-PCR. Quantitative RT-PCR was performed for APC and RASSF1A in a subset of samples. The bars show lower expression in the tumors relative to surrounding tissue in 3 out of 4 samples analyzed. In addition, an inverse correlation with methylation is shown in each plot. Each line represents the AVG-Beta value obtained with bead arrays for 2 independent probes in the same promoter. Higher initial methylation is observed for the last sample, in which expression in the tumor is higher than in the matched surrounding tissue. Found at: doi:10.1371/journal.pone.0009749.s005 (2.17 MB TIF)

Table S1 Primers used for pyrosequencing. Found at: doi:10.1371/journal.pone.0009749.s006 (0.06 MB DOC)

Table S2 CpG sites differentially methylated in HCC tumor vs. surrounding tissue. Found at: doi:10.1371/journal.pone.0009749.s007 (0.25 MB DOC)

Table S3 CpG sites differentially methylated in HCC according to risk factor exposure. Found at: doi:10.1371/journal.pone.0009749.s008 (0.07 MB DOC)

Acknowledgments

We want to acknowledge G. Durand from the Genetic Cancer Susceptibility Group at IARC for technical assistance with the Illumina bead array assay. Further thanks are due to John Daniel for editing the manuscript.

Author Contributions

Conceived and designed the experiments: HHV MPL ZH. Performed the experiments: HHV MPL FLCK SMC. Analyzed the data: HHV MPL. Contributed reagents/materials/analysis tools: JYS. Wrote the paper: HHV ZH. Processed the samples and the clinicopathological data: GG. Gave conceptual assistance: SVT JYS. Contributed to planning the experiments: SVT JYS.

References

1. Parkin DM (2001) Global cancer statistics in the year 2000. Lancet Oncol 2: 533–543.
2. Feitelson MA (2006) Parallel epigenetic and genetic changes in the pathogenesis of hepatitis virus-associated hepatocellular carcinoma. Cancer Lett 239: 10–20.
3. Gomaa AI, Khan SA, Toledano MB, Waked I, Taylor-Robinson SD (2008) Hepatocellular carcinoma: epidemiology, risk factors and pathogenesis. World J Gastroenterol 14: 4300–4308.
4. Herath NI, Leggett BA, MacDonald GA (2006) Review of genetic and epigenetic alterations in hepatocarcinogenesis. J Gastroenterol Hepatol 21: 15–21.
5. Thorgeirsson SS, Grisham JW (2002) Molecular pathogenesis of human hepatocellular carcinoma. Nat Genet 31: 339–346.
6. Jones PA, Baylin SB (2002) The fundamental role of epigenetic events in cancer. Nat Rev Genet 3: 415–428.
7. Issa JP (2004) CpG island methylator phenotype in cancer. Nat Rev Cancer 4: 988–993.
8. Calvisi DF, Ladu S, Gorden A, Farina M, Lee JS, et al. (2007) Mechanistic and prognostic significance of aberrant methylation in the molecular pathogenesis of human hepatocellular carcinoma. J Clin Invest 117: 2713–2722.
9. Suzuki K, Suzuki I, Leodolter A, Alonso S, Horiuchi S, et al. (2006) Global DNA demethylation in gastrointestinal cancer is age dependent and precedes genomic damage. Cancer Cell 9: 199–207.
10. Kondo Y, Kanai Y, Sakamoto M, Mizokami M, Ueda R, et al. (2000) Genetic instability and aberrant DNA methylation in chronic hepatitis and cirrhosis–A comprehensive study of loss of heterozygosity and microsatellite instability at 39 loci and DNA hypermethylation on 8 CpG islands in microdissected specimens from patients with hepatocellular carcinoma. Hepatology 32: 970–979.
11. Flanagan JM (2007) Host epigenetic modifications by oncogenic viruses. Br J Cancer 96: 183–188.
12. Lee S, Lee HJ, Kim JH, Lee HS, Jang JJ, et al. (2003) Aberrant CpG island hypermethylation along multistep hepatocarcinogenesis. Am J Pathol 163: 1371–1378.
13. Tischoff I, Tannapfel A (2008) DNA methylation in hepatocellular carcinoma. World J Gastroenterol 14: 1741–1748.
14. Yang B, Guo M, Herman JG, Clark DP (2003) Aberrant promoter methylation profiles of tumor suppressor genes in hepatocellular carcinoma. Am J Pathol 163: 1101–1107.
15. Yu J, Ni M, Xu J, Zhang H, Gao B, et al. (2002) Methylation profiling of twenty promoter-CpG islands of genes which may contribute to hepatocellular carcinogenesis. BMC Cancer 2: 29.
16. Yu J, Zhang HY, Ma ZZ, Lu W, Wang YF, et al. (2003) Methylation profiling of twenty four genes and the concordant methylation behaviours of nineteen genes that may contribute to hepatocellular carcinogenesis. Cell Res 13: 319–333.
17. Gao W, Kondo Y, Shen L, Shimizu Y, Sano T, et al. (2008) Variable DNA methylation patterns associated with progression of disease in hepatocellular carcinomas. Carcinogenesis 29: 1901–1910.
18. Zhang YJ, Wu HC, Shen J, Ahsan H, Tsai WY, et al. (2007) Predicting hepatocellular carcinoma by detection of aberrant promoter methylation in serum DNA. Clin Cancer Res 13: 2378–2384.
19. Bibikova M, Lin Z, Zhou L, Chudin E, Garcia EW, et al. (2006) High-throughput DNA methylation profiling using universal bead arrays. Genome Res 16: 383–393.
20. Vaissiere T, Hung R, Zaridze D, Moukeria A, Cuenin C, et al. (2008) Quantitative analysis of DNA methylation profiles in lung cancer identifies aberrant DNA methylation of specific genes and its association with gender and cancer risk factors. Cancer Res In press.
21. Daskalos A, Nikolaidis G, Xinarianos G, Savvari P, Cassidy A, et al. (2009) Hypomethylation of retrotransposable elements correlates with genomic instability in non-small cell lung cancer. Int J Cancer 124: 81–87.
22. Korn EL, Li MC, McShane LM, Simon R (2007) An investigation of two multivariate permutation methods for controlling the false discovery proportion. Stat Med 26: 4428–4440.
23. Wright GW, Simon RM (2003) A random variance model for detection of differential gene expression in small microarray experiments. Bioinformatics 19: 2448–2455.
24. Radmacher MD, McShane LM, Simon R (2002) A paradigm for class prediction using gene expression profiles. J Comput Biol 9: 505–511.
25. Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97: 77–87.
26. Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, et al. (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci U S A 98: 15149–15154.
27. Simon R, Radmacher MD, Dobbin K, McShane LM (2003) Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst 95: 14–18.
28. Tibshirani R, Hastie T, Narasimhan B, Chu G (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A 99: 6567–6572.
29. Pavlidis P, Qin J, Arango V, Mann JJ, Sibille E (2004) Using the gene ontology for microarray data mining: a comparison of methods and application to age effects in human prefrontal cortex. Neurochem Res 29: 1213–1222.
30. Simon R (2003) Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data. Br J Cancer 89: 1599–1604.
31. Zhong S, Yeo W, Tang MW, Wong N, Lai PB, et al. (2003) Intensive hypermethylation of the CpG island of Ras association domain family 1A in hepatitis B virus-associated hepatocellular carcinomas. Clin Cancer Res 9: 3376–3382.
32. Zhu J (2006) DNA methylation and hepatocellular carcinoma. J Hepatobiliary Pancreat Surg 13: 265–273.
33. Hoshida Y, Villanueva A, Kobayashi M, Peix J, Chiang DY, et al. (2008) Gene expression in fixed tissues and outcome in hepatocellular carcinoma. N Engl J Med 359: 1995–2004.
34. Lee HS, Kim BH, Cho NY, Yoo EJ, Choi M, et al. (2009) Prognostic implications of and relationship between CpG island hypermethylation and repetitive DNA hypomethylation in hepatocellular carcinoma. Clin Cancer Res 15: 812–820.
35. Zhang YJ, Ahsan H, Chen Y, Lunn RM, Wang LY, et al. (2002) High frequency of promoter hypermethylation of RASSF1A and p16 and its relationship to aflatoxin B1-DNA adduct levels in human hepatocellular carcinoma. Mol Carcinog 35: 85–92.
36. Jung JK, Arora P, Pagano JS, Jang KL (2007) Expression of DNA methyltransferase 1 is activated by hepatitis B virus X protein via a regulatory circuit involving the p16INK4a-cyclin D1-CDK 4/6-pRb-E2F1 pathway. Cancer Res 67: 5771–5778.
37. Shen L, Ahuja N, Shen Y, Habib NA, Toyota M, et al. (2002) DNA methylation and environmental exposures in human hepatocellular carcinoma. J Natl Cancer Inst 94: 755–761.
38. Yu MW, Pai CI, Yang SY, Hsiao TJ, Chang HC, et al. (2000) Role of N-acetyltransferase polymorphisms in hepatitis B related hepatocellular carcinoma: impact of smoking on risk. Gut 47: 703–709.
39. Agundez JA, Olivera M, Ladero JM, Rodriguez-Lescure A, Ledesma MC, et al. (1996) Increased risk for hepatocellular carcinoma in NAT2-slow acetylators and CYP2D6-rapid metabolizers. Pharmacogenetics 6: 501–512.
40. Kong SY, Ki CS, Yoo BC, Kim JW (2008) UGT1A7 haplotype is associated with an increased risk of hepatocellular carcinoma in hepatitis B carriers. Cancer Sci 99: 340–344.
41. Wang Y, Kato N, Hoshida Y, Otsuka M, Taniguchi H, et al. (2004) UDP-glucuronosyltransferase 1A7 genetic polymorphisms are associated with hepatocellular carcinoma in Japanese patients with hepatitis C virus infection. Clin Cancer Res 10: 2441–2446.
42. Tang KS, Lee CM, Teng HC, Huang MJ, Huang CS (2008) UDP-glucuronosyltransferase 1A7 polymorphisms are associated with liver cirrhosis. Biochem Biophys Res Commun 366: 643–648.
43. Kinoshita M, Miyata M (2002) Underexpression of mRNA in human hepatocellular carcinoma focusing on eight loci. Hepatology 36: 433–438.
44. Friedman SL (2008) Mechanisms of hepatic fibrogenesis. Gastroenterology 134: 1655–1669.


Whole Methylome Analysis by Ultra-Deep Sequencing Using Two-Base Encoding

Christina A. Bormann Chung1*, Victoria L. Boyd1, Kevin J. McKernan2, Yutao Fu2, Cinna Monighetti1, Heather E. Peckham2, Melissa Barker1

1 Life Technologies, Foster City, California, United States of America, 2 Life Technologies, Beverly, Massachusetts, United States of America

Abstract
Methylation, the addition of methyl groups to cytosine (C), plays an important role in the regulation of gene expression in both normal and dysfunctional cells. During bisulfite conversion and subsequent PCR amplification, unmethylated Cs are converted into thymine (T), while methylated Cs are not converted. Sequencing of this bisulfite-treated DNA permits the detection of methylation at specific sites. With the introduction of next-generation sequencing (NGS) technologies, methylation motifs in multiple regions can be analyzed simultaneously, providing the opportunity for a hypothesis-free study of the entire methylome. Here we present a whole-methylome sequencing study that compares two different bisulfite conversion methods (in solution versus in gel), utilizing the high throughput of the SOLiD™ System. Advantages and disadvantages of the two bisulfite conversion methods for constructing sequencing libraries are discussed. Furthermore, the application of SOLiD™ bisulfite sequencing to larger and more complex genomes is assessed using preliminary, in silico-created, bisulfite-converted reads.

Citation: Bormann Chung CA, Boyd VL, McKernan KJ, Fu Y, Monighetti C, et al. (2010) Whole Methylome Analysis by Ultra-Deep Sequencing Using Two-Base Encoding. PLoS ONE 5(2): e9320. doi:10.1371/journal.pone.0009320

Editor: Raya Khanin, Memorial Sloan Kettering Cancer Center, United States of America

Received October 9, 2009; Accepted January 26, 2010; Published February 22, 2010

Copyright: © 2010 Bormann Chung et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This study was supported by Life Technologies. Data available upon request. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: All authors are employees of Life Technologies, in which some authors may have equity and/or stock options. Life Technologies manufactures and sells the SOLiD system and reagents used to perform the global sequencing analysis provided in this manuscript. Life Technologies provided the instrumentation, reagents, software and salaries to the scientists who conducted the studies. The authors' affiliation to Life Technologies does not interfere with adherence to PLoS ONE policies on sharing data, and all the materials used are commercially available. The decision to publish was a voluntary activity on the part of the authors.

* E-mail: Christina.Chung@lifetech.com

Introduction
The addition of methyl groups to cytosine (C) by DNA methyltransferases plays an important role in the regulation of human chromatin structure and gene expression. Methylation of C is involved in biological processes such as X chromosome inactivation, imprinting, embryogenesis, gametogenesis, and silencing of repetitive DNA elements in healthy and diseased cells. Thus, the study of DNA methylation can provide important insights into the regulation of cell differentiation, development, and diseases such as cancer [1]. Most DNA methylation studies to date depend on pre-selection or enrichment of local genome regions through enzymatic approaches, specific antibodies (MeDIP), or DNA-binding proteins (e.g., MethylMiner), followed by Sanger sequencing [2,3]. However, the biological importance and complex nature of DNA methylation have led to increased interest in studying this phenomenon globally, i.e., obtaining a methylation profile of the whole genome, or "methylome".

Bisulfite sequencing is widely used for DNA methylation profiling because of its accuracy and its ability to provide information about the methylation status of C independent of genomic location or sequence context [4]. Prior to sequencing, the DNA is treated with bisulfite, which converts unmethylated Cs to uracil (U), while 5-methylcytosine (5mC) remains unchanged. During subsequent PCR amplification, C-to-thymine (T) changes are introduced into the sequence at non-methylated C sites, while Cs remain at 5mC sites (Fig. 1B), allowing an exact interrogation of all possible methylation sites in the genomic sequence. Next-generation sequencing (NGS) technologies are an ideal tool for DNA methylation profiling because their massively parallel sequencing capability generates a huge amount of data in a relatively short time, at a minimal cost per base compared with Sanger sequencing. Whereas several studies have used NGS to perform DNA methylation profiling following an enrichment technique [2,5–9], to our knowledge only one other study has been published that analyzed a whole methylome by bisulfite sequencing [10].

Here we present the first whole-methylome bisulfite sequencing study using the SOLiD™ (Sequencing by Oligonucleotide Ligation and Detection) platform. This technology differs from other NGS technologies in that it interrogates two bases at a time by ligation chemistry and detects one of four colors associated with those two specific bases (for more details on this technology and color-space sequencing, see [11–14]). We also compare two bisulfite conversion methods: conversion in solution versus in a polyacrylamide gel. The SOLiD sequencing reported here was performed with bacterial libraries prepared similarly to those in our previous publication [15], but with some informative differences.
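The two ideas above, bisulfite read-out and two-base encoding, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not SOLiD™ software: the 2-bit base codes (A=0, C=1, G=2, T=3) and the XOR rule are one common way of writing the SOLiD color table, and the example fragment and its methylated position are hypothetical.

```python
# Minimal sketch (not SOLiD(TM) software): in silico bisulfite conversion of
# one strand, followed by two-base (color-space) encoding.

# Assumed 2-bit base codes; with these, XOR of two codes gives the SOLiD color.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Return the strand as read after bisulfite treatment and PCR.

    Unmethylated C (U after bisulfite) is read as T; Cs at the 0-based
    positions in `methylated` (5mC) stay C. Other bases are unchanged.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def to_colors(seq: str) -> list[int]:
    """Encode each overlapping dinucleotide as one of four colors (0-3).

    AA/CC/GG/TT -> 0, AC/CA/GT/TG -> 1, AG/GA/CT/TC -> 2, AT/TA/CG/GC -> 3.
    """
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


if __name__ == "__main__":
    genomic = "ACCAGGTACGCTTGA"  # hypothetical fragment
    print(bisulfite_convert(genomic, methylated={2}))  # ATCAGGTATGTTTGA
    fully_converted = bisulfite_convert(genomic, methylated=set())
    print(fully_converted)  # ATTAGGTATGTTTGA
    # Even with no C left (a three-letter sequence), all four colors occur.
    print(sorted(set(to_colors(fully_converted))))  # [0, 1, 2, 3]
```

The last line mirrors the point made for Figure 3: removing one base from the alphabet does not remove any of the four colors.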

February 2010 | Volume 5 | Issue 2 | e9320

Global Bisulfite Sequencing

Figure 1. Library construction to protect the adaptor sequence from bisulfite conversion. Genomic DNA (5 µg) was sheared by sonication and end-repaired to yield 5′-phosphorylated (5′P) blunt ends. Two double-stranded oligonucleotide adaptors, only one of which had a preselected oligonucleotide protected against bisulfite conversion by 5mC (5mC/black), were ligated to the DNA fragments. After nick translation with 5mC-dNTP, one adaptor consists of two fully 5mC-protected oligonucleotides, whereas the other adaptor still contains one oligonucleotide with unprotected, regular Cs (Fig. 1A). Following size selection to 175–225 bp on an agarose gel, equal amounts of DNA (240 ng) were used for bisulfite conversion in solution and in gel, respectively. During bisulfite conversion the DNA is denatured; because only three of the four adaptor strands are 5mC-protected, the fourth adaptor strand is bisulfite converted, changing its sequence by altering C to U. During PCR amplification (scheme B) with regular four-base primers (A, G, C, and T) complementary to the library adaptors, only one of the fragment strands is amplified. 5′P = 5′-phosphorylated blunt ends; 5mC = 5-methylcytosine. doi:10.1371/journal.pone.0009320.g001
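The protection principle in Figure 1 can be illustrated with a deliberately simplified Python model. It is a sketch under stated assumptions, not the published protocol: the adaptor and insert sequences are hypothetical, lowercase "m" is a made-up symbol for 5mC, and primer annealing is reduced to an exact match against the original adaptor sequence (real primers anneal to the complementary strand).

```python
# Simplified model of 5mC adaptor protection (hypothetical sequences).
# Convention assumed for this sketch only: lowercase "m" marks a 5mC.


def bisulfite_treat(strand: str) -> str:
    """Read-out after bisulfite + PCR: unmethylated C -> T, 5mC ("m") -> C."""
    return strand.replace("C", "T").replace("m", "C")


def primer_site_intact(treated: str, adaptor: str) -> bool:
    """Simplified annealing test: a primer designed against the original
    (unconverted) adaptor only works if that sequence survives unchanged."""
    return treated.startswith(adaptor)


ADAPTOR = "ACGTCCTGAC"                 # hypothetical adaptor sequence
INSERT = "GATTACA"                     # hypothetical genomic insert
protected = ADAPTOR.replace("C", "m")  # adaptor strand synthesized with 5mC
unprotected = ADAPTOR                  # adaptor strand with regular Cs

for label, adaptor_strand in [("protected", protected), ("unprotected", unprotected)]:
    treated = bisulfite_treat(adaptor_strand + INSERT)
    print(label, treated, primer_site_intact(treated, ADAPTOR))
# protected ACGTCCTGACGATTATA True
# unprotected ATGTTTTGATGATTATA False
```

Note that the genomic insert is converted in both cases; only the adaptor portion decides whether the strand remains amplifiable with standard primers.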


reference are invalid. During the matching pipeline in the SOLiDTM System Analysis Pipeline Tool, reads are automatically mapped to both the forward and reverse-complement sequence of the reference. Therefore, reads that were erroneously mapped to the reverse-complement strand of the + or 2 bisulfite converted reference were removed from the mapping file, and the new file was put through the Analysis Pipeline Tool again to obtain the correct mapping statistics. Through comparison of the first (including matches to the reverse-complement strand) and the second mapping statistics, the amount of mismapped reads was determined. Only 0.010â&#x20AC;&#x201C;0.015% of total matches were mismapped to the reverse-complement strand (Table 1). Investigating the mismapped reads for library bis-sol, 2 bisulfite reference further, showed that 0.00003% of reads were expected mismatches due to bisulfite conversion, 0.002% were mismapped due to CCWGG motifs in the reads, 0.003% were mismapped due to incomplete bisulfite conversion, and 0.01% were mismapped due to instrument errors or contamination (resulting in 0.015% of total mismapped reads). The mapping overall showed to be very accurate (.99.98%) and comparable to non-bisulfite SOLiD sequencing. The total number of matches to the bisulfite converted reference (as shown in table 2) can be obtained by adding up all reads mapped to both the + and 2 bisuflite converted reference. In both libraries (bis-sol and bis-gel) ,60% of total reads could be matched uniquely ( = each read maps to one single position on the reference sequence) to the +/2 bisulfite converted reference sequence. This number is comparable to unique matching statistics of sequence reads from regular non-bisulfite converted DH10B libraries (SOLiDTM System E. coli DH10B Fragment Data Set, http://solidsoftwaretools.com/gf/project/dh10bfrag/). As expected, the bisulfite-converted libraries (bis-sol and bis-gel) did not map well against the normal reference (Table 2). 
Less than 0.1% of sequencing reads were mapped uniquely to this reference. This difference in unique matching (towards +/2 bisulfite converted or normal reference) is also illustrated in Figures 2A and B, which show a coverage plot of sequencing reads from both libraries against the three reference sequences. Sequencing reads of both libraries covered the + and 2 bisulfite converted references more than 270 times. The coverage plots (Figures 2A and B), reveal one large non-covered region. This region contains a 113 kb region that is exactly duplicated in tandem [18]. Thus, sequencing reads will map to both duplications and do not appear

Results and Discussion Library Construction and Bisulfite Conversion in Solution versus in Gel Two bisulfite-converted libraries of Escherichia coli (E. coli) DH10B were constructed, and the bisulfite conversion was performed in solution (bis-sol) or in a polyacrylamide gel (bis-gel) as previously described [15] (see Figure 1 for outline). Equal amounts of DNA (240 ng) were used for both bisulfite conversion methods to compare the efficiency of each method. Both bis-sol and bis-gel amplified equally well with 12 PCR cycles yielding 2.19 ng/ul and 2.39 ng/ul of DNA, respectively, thus showing that both conversion methods result in libraries of equal quantity. Apparently, DNA loss is not as high as previously anticipated during in solution bisulfite conversion. However, there are less experimental steps during in-gel bisulfite conversion, shortening the hands-on time in the lab. On the other hand, the in-gel method is limited by the DNA input, because the thin polyacrylamide gel has a low DNA capacity. Therefore, the in solution method is preferred for larger DNA input (.2 ug starting material), while the in gel method is ideal for low amounts of starting material. As shown previously [15], 50 ng and even 5 ng can be bisulfite converted successfully in a gel with 15 cycles and 22 cycles of PCR amplification. This would correspond to about 500 ng or 50 ng of starting material, respectively, if losses are assumed to be the same as in the experiment described here. Loss of library molecules for NGS applications is apparent for other bisulfite conversion protocols [10,16] as evidenced by the need for higher DNA input (5 ug) and higher required number of cycles for amplification (18 cycles). A lower number of PCR cycles during library construction is desired to prevent the introduction of PCR biases and consequently reduced complexity of libraries.

SOLiD Sequencing The two libraries, bis-sol and bis-gel, were amplified on magnetic beads by emulsion PCR (ePCR) according to standard SOLiD protocols, with the exception that additional dATP and dTTP was added to the aqueous ePCR phase to compensate for the low complexity of the bisulfite converted libraries. A 5% increase in the concentration of dATP and dTTP was sufficient to improve ePCR yields and is within the expected variable caused by hydrolysis of the dNTPs to di- and monophosphate. The slight change in the dNTP ratio is below that used to cause mutagenesis [17]. Additionally, bisulfite-SOLiD sequencing data provided excellent reference matching, similar to the non-bisulfite treated DH10B data. Each library was then sequenced to 50 base pairs on two quarters of a slide with SOLiD 3.0 chemistry. After bisulfite conversion, the sense and antisense (+/2) DNA strands are no longer complementary: non-methylated Cs in both the + and the 2 strand will appear in the SOLiD reads as T, whereas only methylated Cs appear as Cs (Fig. 1B). Thus, two bisulfite converted reference sequences were created in silico from DH10B by replacing all Cs with Ts for the + and the 2 strand, respectively (+/2 bisulfite converted reference). As a control, regular non-bisulfite-converted DH10B genomic sequence was used (normal reference). The SOLiD reads from the two libraries were mapped towards all three reference sequences allowing a maximum of five mismatches, using the SOLiDTM System Analysis Pipeline Tool. Due to the design of the library construction, only one DNAfragment strand had fully 5mC-protected Adapter sequences and was therefore amplified during large-scale library PCR (Fig. 1). Consequently, reverse-complement reads are non-existent and matches to the reverse-complement of the bisulfite converted PLoS ONE | www.plosone.org

Table 1. Percent of mismatches to the reverse complement strand of the bisulfite converted reference.

+ bisulfite converted Reference Count bis-sol

bis-gel

Total matches

21,197,732

Mismatches to the reverse complement

3,049

Total matches

28,901,701

Mismatches to the reverse complement

3,100

2 bisulfite converted Reference %

Count

21,227,698 0.014

3,129

0.015

29,019,613 0.011

3,013

0.010

According to the library construction design, bisulfite converted reads should not map to the reverse complement of the +/2 bisulfited reference. Sequencing reads of the two libraries (bis-sol and bis-gel) were first mapped to both strands of each reference. Invalid matches to the reverse complement strand were then filtered out and counted. doi:10.1371/journal.pone.0009320.t001

February 2010 | Volume 5 | Issue 2 | e9320

Global Bisulfite Sequencing

Cs will show up in this analysis as a valid SNP T R C (+ or 2 bisulfite converted reference R sequencing read). Thus, all positions of T R C SNPs were matched to the location of CCWGG sites to determine eligibility of a non-bisulfite converted C to be methylated. All other T R C SNPs are therefore incomplete bisulfite converted Cs. Only between 0.001â&#x20AC;&#x201C;0.003% of Cs fell into the later category (Table 3), thus showing that bisulfite conversion during library construction in-gel and in-solution were both greater than 99.99%. Of the expected 12,174 CCWGG motifs present in the DH10B genome sequence, ,11,300 or ,92.8% motifs were covered with both bis-sol and bis-gel libraries (Table 3). Over 99.5% of those covered CCWGG sites were methylated (CCmWGG) and the remaining 0.5% were partially methylated (CYWGG) (Figure 4). The complete or nearly complete methylation of all CCWGG sites was expected, since E. coli has a repair system that preserves methylated dcm sites [19]. Each base of the genome was sequenced multiple times (,300-fold coverage), which permitted easy identification of partially methylated sites. A site was called partially methylated, if (1) T ( = non-methylated) was called at least 25% of the time or if (2) T was called at least 14% and the same genomic position was partially methylated after criteria 1 in the corresponding library (bis-sol or bis-gel). All 52 or 54 (+ or 2 bisulfite converted reference, respectively) CYWGG sites that were found in the bis-gel library were also present in the bis-sol library. In total 44 CYWGG sites were present in both libraries (bis-sol and bis-gel) and on both strands (+/2 bisulfite converted reference). The biological significance of partially methylated sites in DH10B is unknown and requires further investigation. 
However, the detection of those sites by SOLiD sequencing demonstrates how deep-sequencing of the whole methylome and determination of the methylation status at specific sites can be achieved by next-generation sequencing technologies. In order to investigate, how a lower mismatch number might affect the detection sensitivity for methylation and incomplete bisulfite conversion due to higher reference bias, a genotyping approach was used to locate all reads that match a CCWGG motif. Read counts for each motif were then adjusted by using different number of mismatches (2, 3, 4, and 5 mismatches, respectively) (Table S1). Beyond the slight gains by increasing the number of mismatches from 2 to 3 and from 3 to 4, the high coverage of SOLiD data practically eliminates any reference bias due to low mismatch numbers. All CCWGG motifs in DH10B are separated far enough from each other, so that at least one local 50 bp read covering each Cm can in theory be correctly matched to the bisulfite converted genome, using at least 4 mismatches. There are 57 motifs that can not be queried in silico by any 50 bp read at 3 mismatches, and 61 motifs at 2 mismatches, which results in a 99.5% theoretical coverage of all CCWGG motifs. However, the human genome is more difficult to query for methylated CpG motifs, due to clustered CpG-islands. These non-separable islands need to be mapped to unconverted reference sequences in order to ensure correct matching. In conclusion, two different bisulfite conversion methods were compared using a 5mC-protected adaptor protocol to construct libraries of E. coli strain DH10B. Regardless of the bisulfite conversion method, excellent mapping statistics to in silico bisulfiteconverted references were obtained and methylated CCWGG sites were identified. Thus, both library construction methods are equally well-suited for whole methylome sequencing. 
Furthermore, this study shows that SOLiD-bisulfite sequencing is sensitive enough to identify partially methylated sites. The ability to detect the relatively rare methylation event at a single site has to date been hampered by the limitations of the tools traditionally

Table 2. Matching statistics of libraries bis-sol and bis-gel against bisulfite converted and normal reference.

bis-sol

+/2 bisulfite converted Reference

Normal Reference

Count

Beads found

74,366,093

Uniquely placed beads (#5 mismatches)

42,419,252

Bases not uniquely covered 332,578 bis-gel

Beads found

89,599,880

Uniquely placed beads (#5 mismatches)

57,915,201

Bases not uniquely covered 330,772

74,366,093 57.04 35,389

0.05

7.10

87.22

4,087,762 89,599,880

64.64 68,581

0.08

7.06

73.32

3,436,445

doi:10.1371/journal.pone.0009320.t002

as uniquely mapped reads. A coverage plot of regular non-bisulfite converted DH10B sequencing reads (Figure 2C, sequencing data from SOLiDTM System E. coli DH10B Fragment Data Set, http:// solidsoftwaretools.com/gf/project/dh10bfrag/) shows the same region uncovered. The conversion of the genome from a 4-base alphabet to a 3base alphabet lengthens the number of bases required for unique alignment. This reduced complexity in base space increases the number of non-unique sequence matches to the in silico bisulfite converted genome. SOLiD sequencing is based on two-base interrogation during color-space sequencing, which retains all four sequencing colors, even when sequencing a genome lacking several of the 16 possible two-base combinations, such as in bisulfite sequencing (Figure 3). The assembly of the genome as dinucleotide units (color space) increases the ability to uniquely map in color space relative to base space [13]. The purpose of bisulfite sequencing is to identify methylated CpGs which manifest in short read sequences as mismatches. With perfectly aligned reads that contain methylated motifs, both base space and color space will provide unique mapping. However, in silico calculations using the bisulfite-converted human genome that permits mismatches in reads containing methylated CpGs show a 5% increase in unique mapping for 25 bp reads (2 mismatches) and 50 bp reads (5 mismatches) in chromosome 20. This color space advantage increases to 40% over base space when mapping the in silico created bisulfite converted 25 bp reads (with 2 mismatches) for the entire human genome. This increase in unique mapping is expected to extend to the mapping of longer reads (50 bp, 5 mismatches) in color space, as shown through a sampling approach with chromosome 17. 
Aligning this chromosome against the whole genome reveals an advantage of mapping in color space by 42% for 25 bp reads (2 mismatches) and 8% for 50 bp reads (5 mismatches) in terms of percentages of uniquely aligned portions.

Identification of Methylated CCmWGG Sites (W = either A or T) Two DNA methylases are present in E. coli, dam and dcm. While dam methylates adenine (A), dcm methylates the second C residue in the motif CCWGG to form 5mC [19â&#x20AC;&#x201C;20]. Therefore, nonbisulfite converted Cs, which correspond to methylated Cs, should only be present in this motif. In order to find those methylated Cs, the uniquely mapped reads (to the + and 2 bisulfite converted references) were first analyzed using the SNP pipeline of the SOLiDTM System Analysis Pipeline Tool. Non-bisulfite converted PLoS ONE | www.plosone.org

February 2010 | Volume 5 | Issue 2 | e9320

Global Bisulfite Sequencing

Figure 2. Comparison of the sequencing coverage for bis-sol, bis gel and a non-bisulfite converted DH10B sequencing run. Plots A (bis-sol) and B (bis gel) show the coverage of sequencing reads against the +/2 bisulfite converted references (dark/light blue) and the normal reference (red). As a comparison, plot C shows the coverage of a regular non-bisulfited DH10B sequencing run matched towards the normal reference (yellow) (Data from SOLiD software communtiy web-site). The missing coverage in all three plots corresponds to a 133 kb perfect repeat in tandem. Since reads will match to both repeats, they will not show up under the unique matches. doi:10.1371/journal.pone.0009320.g002

v2.0 User Guide (Applied Biosystems, Foster City, CA, USA). Bacterial DH10B gDNA (Lofstrand Labs Limited, Gaithersburg, MD, USA) (5 mg) was sheared into fragments in a 13665 mm borosilicate tube (Covaris, Woburn, MA, USA) in 500 mL 10 mM Tris, pH 8.0, plus 5% w/v 2-micron borosilicate glass dry spheres (Duke Scientific Corporation, Fremont, CA, USA), using a Covaris S2 (Covaris, Woburn, MA, USA) (shearing conditions: cycle no. 10, bath temperature 5uC, Mode: power tracking, duty cycle 20%, intensity 10, cycles/burst 1000, time 60 sec). The sheared DNA was subsequently purified with the MinElute Reaction Cleanup Kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions using buffer ERC and eluting the DNA off the columns using two times 15 uL buffer EB. The DNA was then quantitated using a NanoDrop ND 1000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA,

used to study methylation. This new level of detection will be an advantage in many applications of bisulfite sequencing in which copy number variation of a specific methylation motif is of biological importance. Additionally, preliminary in silico computations indicate an advantage of bisulfite color versus base space sequencing in more complex genomes, such as human. However, only future bisulfite sequencing experiments of a human genome on NGS platforms, will give a definite answer to this question.

Materials and Methods Library Construction (Figure 1A) The protocol is based upon the ‘‘SOLiDTM System Fragment Library Preparation: Higher Input (2–20 ug) or Higher-Complexity DNA’’ protocol of the Applied Biosystems SOLiDTM System PLoS ONE | www.plosone.org

February 2010 | Volume 5 | Issue 2 | e9320

Global Bisulfite Sequencing

(for details and sequences see [15]) whereas double-stranded adaptor P2 was identical to the adaptor used in the standard protocol. The adaptors were ligated to the end-repaired DNA fragments in a 30:1 molar ratio for 10 min at room temperature using the Quick Ligation Kit (New England BioLabs, Beverly, MA, USA) (1x quick ligase reaction buffer, 1 uL quick ligase/ 40 mL reaction volume). The ligation reaction was purified using Agencourt AMPure beads (Agencourt Bioscience Corporation, Beverly, MA, USA) in order to exclude any remaining 68 bp adaptor-dimers. For this, 1.8 volumes of AMPure beads were added to the ligation reaction and incubated with rotation for 5 minutes at room temperature. Then the beads were placed on a magnetic stand and the supernatant was discarded. The beads were washed three times with 70% ethanol, air dried, and DNA was eluted off the beads by adding 35 mL 10 mM Tris, pH 8.0. During the adaptor ligation only the 59-39 strand of the adaptor ligated to the 59P ends of the DNA fragments. After the DNA purification the 39-59 adaptor strands were filled in by nicktranslation using a dNTP solution, containing 5m-dCTP instead of dCTP as previously described [15]. The nick translation reaction was carried out for 30 minutes at 16uC using DNA Polymerase I (New England BioLabs, Beverly, MA, USA) (1x NEB buffer 2, 2 mM 5mC-dNTP, 0.25 U/ml DNA Polymerase I). The enzymatic reaction was purified with the MinElute Reaction Cleanup Kit using 20 mL buffer EB to elute the DNA off the column and quantitated as described above. The nick-translated DNA was then size selected to 175–225 bp on a 3% agarose gel (BioRad Laboratories, Hercules, CA, USA). In order to purify the DNA from the gel, the MinElute Gel Extraction Kit (Qiagen, Valencia, CA, USA) was used by adding six volumes of buffer QG to the gel pieces and vortexing the mixture until the gel was dissolved (about 5 min). 
This solution was then applied to the MinElute columns and washed according to manufacturer’s instructions. The DNA was eluted off the columns by applying 25 mL buffer EB and quantitated as described above. A 240 ng aliquot of the size-selected DNA was bisulfite converted in solution as previously described [15]. An equal

Figure 3. Sequencing of normal versus bisulfite converted DNA using two-base encoding. During bisulfite conversion unmethylated cytosine (C) gets converted to thimine (T), thus reducing the sequence into predominantly three bases (adenine (A), guanine (G), and T). During color space sequencing, two bases are interrogated at the same time, resulting in one color being recorded. Thus sequencing a reduced complex sequence, such as bisulfite converted DNA, still results into all four colors being used during sequencing. FAM, CY3, TXR, CY5: Fluorescent labels used for color space sequencing. doi:10.1371/journal.pone.0009320.g003

USA). In order to repair damaged DNA ends and obtain 59phosphorylated blunt-ends (59P), the fragments were end-repaired using the End-It DNA End-Repair Kit (Epicentre Biotechnologies, Madison, WI, USA) according to the manufacturer’s instructions and incubated at room temperature for 30 minutes. The enzymatic reaction was purified using the MinElute Reaction Cleanup Kit and the DNA recovered with two 20 ul buffer EB elutions, then quantified as described above. Adaptors used in this protocol deviated from the standard SOLiD fragment library protocol in that the top strand P1-A of the double-stranded P1 adaptor was synthesized using 5mC in place of C in order to protect the adaptor from modification during bisulfite conversion

Table 3. Methylation status of CCWGG sites and bisulfite conversion efficiency in bis-sol and bis-gel.

bis-sol

bis-gel

+ bisulfite converted Reference

2 bisulfite converted Reference

Count

CCWGG sites covered

11,295

CCWGG

0.00

11,292 0

0.00

CCmWGG

11,241

99.52

11,236

99.50

CYWGG

0.49

0.50

Total C

1,190,995

1,188,905

unconverted C

CCWGG sites covered

11,304

CCWGG

0.00

CCmWGG

11,252

99.54

11,248

99.52

CYWGG

0.46

0.48

0.002

Total C

1,190,995

unconverted C

0.003

0.002

11,302 0.00

1,188,905 0.001

Fully CCmWGG and partially methylated CYWGG sites were discovered by comparing the sequencing reads against the +/2 bisulfite converted reference. Methylated Cs will show up as a SNP conversion T R C (bisulfite converted reference R sequence read) within the CCWGG motif. A T R C SNP outside this motif indicates an incomplete bisulfite conversion (unconverted C). Comparing this number with the total number of Cs present in the DH10B genome, indicates the bisulfite conversion efficiency of the library protocol. doi:10.1371/journal.pone.0009320.t003

PLoS ONE | www.plosone.org

February 2010 | Volume 5 | Issue 2 | e9320

Global Bisulfite Sequencing

Figure 4. Visualization of methylated CCmWGG and hemimethyalted CYWGG sites with the SOLiDTM System Alignment Browser. One fully methylated CCmWGG sites (seen at the left) and one partially methylated CYWGG site (right) are shown. Methylated Cs show up as a T R C (bisulfite converted reference R sequencing read) SNP conversion (light green), while non-methylated Cs show up as a T in the bisulfite converted context and match to the bisulfite converted reference. light green = valid adjacent mismatch = SNP, blue = invalid adjacent mismatch, grey = isolated mismatch (for details on mapping in color space and definition of valid versus invalid mismatches, please refer to 11). doi:10.1371/journal.pone.0009320.g004

(Invitrogen, Carlsbad, CA, USA) were added to the aqueous phase to compensate for the AT-rich template due to bisulfite conversion. The composition of the aqueous phase was as follows: 1x PCR buffer, 14 mM dNTP, 0.7 mM dATP, 0.7 mM dTTP, 25 mM MgCl2, 40 nM ePCR primer P1, 3 mM ePCR primer P2, 0.54 U/mL AmpliTaq Gold DNA polymerase. The aqueous phase was then introduced to a whirling oil phase in an ULTRATURRAXH Turbo Drive (IKA, Staufen, Germany) to create a water-in-oil emulsion. This emulsion was transferred to a 96-well plate and thermocycled using the recommended PCR conditions. After PCR amplification, emulsions were broken using butanol, the beads were washed, enriched, and terminal transferased before quantification and deposition onto a slide for sequencing according to manufacturerâ&#x20AC;&#x2122;s instructions.

aliquot was run into a 6% cross-linked Retardation Gel (Invitrogen, Carlsbad, CA, USA). The DNA band was cut from the gel and bisulfite converted within the gel as described previously [15]. The bisulfite-converted DNA from both bisulfite conversion methods was PCR amplified in three 100 ml reactions using 1x Platinum PCR Supermix (Invitrogen, Carlsbad, CA, USA), 1 mM primer 1 & 1 mM primer 2 (Standard SOLiD library PCR primers, Applied Biosystems, Foster City, CA, USA), 0.025 U/mL AmpliTaq DNA Polymerase, LD (Applied Biosystems, Foster City, CA, USA). The cycling conditions were as follows: 95uC for 5 min; 12 cycles of 95uC for 15 sec, 62uC for 15 sec, & 70uC for 1 min; 70uC for 5 min. After PCR amplification the in gel bisulfite converted library (bis-gel) was applied to a 0.45 mm filter NanoSep column (Pall Life Sciences, East Hills, NY, USA) and centrifuged for 5 min at 10,0006g to remove gel pieces from the library solution. The bis-gel and the bis-sol libraries were then purified using AMPure beads as described above and DNA was eluted from the beads in 20 mL 10 mM Tris, pH 8.0. The two libraries were finally quantitated using the 2100 Bioanalyzer with a DNA 1000 Chip (Agilent Technologies, Santa Clara, CA, USA) and using the Qubit fluorometer with the Quant-it dsDNA HS Kit (Invitrogen, Carlsbad, CA, USA).

Templated Bead Preparation Emulsion PCR (ePCR) was performed according to the standard Applied Biosystems SOLiD™ 3 System: Templated Bead Preparation Guide, with the exception that extra dATP and dTTP

Sequencing of Templated Beads Templated beads were deposited onto two slide quadrants per sample, and sequencing was carried out to 50 bases using SOLiD v3.0 chemistry according to the manufacturer's instructions.

Data Analysis Two bisulfite-converted references (+ and − strands) were created by replacing in silico all Cs with Ts in the sense (+) and antisense (−) strands of the DH10B genome (GenBank accession CP000948), respectively. Sequencing reads were aligned to the + and − bisulfite-converted references and to the DH10B genome (normal reference) using the SOLiD™ System Analysis Pipeline Tool (http://solidsoftwaretools.com/gf/project/corona/), allowing a maximum of five mismatches per read. Matches to the reverse complement of the + and − bisulfite-converted references were discarded. Only one strand was amplified during library construction, so matches to the reverse complement are mismatches. SNPs in SOLiD™ datasets were identified using the SOLiD™ System Analysis Pipeline Tool. Non-converted Cs were identified and matched to the corresponding positions of possible methylation sites (motif CCWGG) in the DH10B genome.
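The two in silico steps above can be sketched in a few lines of Python: building fully converted references and locating candidate methylation sites. This is a minimal illustration under stated assumptions, not the SOLiD™ System Analysis Pipeline Tool. The function names and the toy sequence are hypothetical. The − reference is kept in + strand coordinates, because C→T on the antisense strand reads as G→A on the sense strand. The sketch also assumes that Dcm targets the internal cytosine of CCWGG. Unconverted Cs that fall outside these motifs are one plausible signature of incomplete bisulfite conversion.

```python
import re

# Assumption: Dcm methylates the internal (second) C of CCWGG, W = A or T.
# A lookahead is used so that overlapping motif hits are all reported.
DCM_MOTIF = re.compile(r"(?=CC[AT]GG)")


def bisulfite_references(genome: str) -> tuple[str, str]:
    """Return fully converted (+, -) references, both in + strand coordinates.

    + reference: every C on the sense strand becomes T.
    - reference: every C on the antisense strand becomes T, which appears
    as G -> A when written in sense-strand coordinates.
    """
    genome = genome.upper()
    return genome.replace("C", "T"), genome.replace("G", "A")


def dcm_sites(genome: str) -> tuple[list[int], list[int]]:
    """0-based positions of the putative Dcm target C on each strand."""
    plus, minus = [], []
    for match in DCM_MOTIF.finditer(genome.upper()):
        start = match.start()
        plus.append(start + 1)   # second C of CCWGG on the + strand
        minus.append(start + 3)  # this + strand G pairs with the - strand's second C
    return plus, minus


def split_unconverted(unconverted: list[int],
                      sites: list[int]) -> tuple[list[int], list[int]]:
    """Separate unconverted C calls at candidate sites from all other calls."""
    site_set = set(sites)
    at_sites = sorted(p for p in unconverted if p in site_set)
    elsewhere = sorted(p for p in unconverted if p not in site_set)
    return at_sites, elsewhere


if __name__ == "__main__":
    toy = "AACCAGGTTCCTGGAA"  # hypothetical sequence with two CCWGG motifs
    plus_ref, minus_ref = bisulfite_references(toy)
    print(plus_ref, minus_ref)  # AATTAGGTTTTTGGAA AACCAAATTCCTAAAA
    plus_sites, minus_sites = dcm_sites(toy)
    print(plus_sites, minus_sites)  # [3, 10] [5, 12]
    print(split_unconverted([2, 3], plus_sites))  # ([3], [2])
```

In this toy case, an unconverted C at position 3 lies in a CCWGG motif and is consistent with methylation. An unconverted C at position 2 does not lie in a motif and would instead point to incomplete conversion.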

Supporting Information

Table S1. Comparison of different number of mismatches on detection sensitivity for methylation and incomplete bisulfite conversion. Found at: doi:10.1371/journal.pone.0009320.s001 (0.05 MB DOC)

Acknowledgments We’d like to thank our colleagues Catalin Barbacioru, Craig Cummings, Nisha Mulakken, Yongming Sun, and Gerald Zon for their valuable tips and comments on this work.

Author Contributions Conceived and designed the experiments: VLB KJM MAB. Performed the experiments: CABC VLB CKM. Analyzed the data: CABC YLF HP. Wrote the paper: CABC.

References
1. Gopalakrishnan S, Van Emburgh BO, Robertson KD (2008) DNA methylation in development and human disease. Mutat Res 647: 30–38.
2. Meissner A, Mikkelsen TS, Gu H, Wernig M, Hanna J, et al. (2008) Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454: 766–771.
3. Weber M, Davies JJ, Wittig D, Oakeley EJ, Haase M, et al. (2005) Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat Genet 37: 853–862.
4. Beck S, Rakyan VK (2008) The methylome: approaches for global DNA profiling. Trends Genet 24: 231–237.
5. Smith ZD, Gu H, Bock C, Gnirke A, Meissner A (2009) High-throughput bisulfite sequencing in mammalian genomes. Methods 48: 226–232.
6. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, et al. (2007) High-resolution profiling of histone methylations in the human genome. Cell 129: 823–837.
7. Hodges E, Smith AD, Kendall J, Xuan Z, Ravi K, et al. (2009) High definition profiling of mammalian DNA methylation by array capture and single molecule bisulfite sequencing. Genome Res. doi:10.1101/gr.095190.109.
8. Korshunova Y, Maloney RK, Lakey N, Citek RW, Bacher B, et al. (2008) Massively parallel bisulphite pyrosequencing reveals the molecular complexity of breast cancer-associated cytosine-methylation patterns obtained from tissue and serum DNA. Genome Res 18: 19–29.
9. Pomraning KR, Smith KM, Freitag M (2009) Genome-wide high throughput analysis of DNA methylation in eukaryotes. Methods 47: 142–150.
10. Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, et al. (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133: 523–536.
11. Homer N, Merriman B, Nelson SF (2009) Local alignment of two-base encoded DNA sequence. BMC Bioinformatics 10: 175–185.
12. Mardis ER (2008) Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 9: 387–402.
13. McKernan KJ, Peckham HE, Costa GL, McLaughlin SF, Fu Y (2009) Sequence and structural variation in a human genome uncovered by short-read massively parallel ligation sequencing using two-base encoding. Genome Res 19: 1527–1541.
14. Shendure J, Ji H (2008) Next-generation DNA sequencing. Nat Biotechnol 26: 1135–1145.
15. Ranade SS, Bormann Chung C, Zon G, Boyd VL (2009) Preparation of genome-wide DNA fragment libraries using bisulfite in polyacrylamide gel electrophoresis slices with formamide denaturation and quality control for massively parallel sequencing by oligonucleotide ligation and detection. Anal Biochem 390: 126–135.
16. Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, et al. (2008) Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452: 215–219.
17. Chen J, Hebert PDN (1998) Directed termination PCR: a one-step approach to mutation detection. Nucleic Acids Res 26: 1546–1547.
18. Durfee T, Nelson R, Baldwin S, Plunkett G, Burland V, et al. (2008) The complete genome sequence of Escherichia coli DH10B: insights into the biology of a laboratory workhorse. J Bacteriol 190: 2597–2606.
19. Ringquist S, Smith CL (1992) The Escherichia coli chromosome contains specific, unmethylated dam and dcm sites. Proc Natl Acad Sci USA 89: 4539–4543.
20. Marinus MG (1987) DNA methylation in Escherichia coli. Annu Rev Genet 21: 113–131.

Role for DNA Methylation in the Regulation of miR-200c and miR-141 Expression in Normal and Cancer Cells Lukas Vrba1,4, Taylor J. Jensen1,2, James C. Garbe3, Ronald L. Heimark1, Anne E. Cress1, Sally Dickinson1, Martha R. Stampfer1,3, Bernard W. Futscher1,2* 1 Arizona Cancer Center, The University of Arizona, Tucson, Arizona, United States of America, 2 Department of Pharmacology & Toxicology, College of Pharmacy, The University of Arizona, Tucson, Arizona, United States of America, 3 Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America, 4 Biology Centre ASCR, v.v.i., Institute of Plant Molecular Biology, Ceske Budejovice, Czech Republic

Abstract Background: The microRNA-200 family participates in the maintenance of an epithelial phenotype, and loss of its expression can result in epithelial to mesenchymal transition (EMT). Furthermore, the loss of expression of miR-200 family members is linked to an aggressive cancer phenotype. Regulation of miR-200 family expression in normal and cancer cells is not fully understood.

Methodology/Principal Findings: Epigenetic mechanisms participate in the control of miR-200c and miR-141 expression in both normal and cancer cells. A CpG island near the predicted mir-200c/mir-141 transcription start site shows a striking correlation between miR-200c and miR-141 expression and DNA methylation in both normal and cancer cells, as determined by MassARRAY technology. The CpG island is unmethylated in human miR-200c/miR-141-expressing epithelial cells and in miR-200c/miR-141-positive tumor cells. The CpG island is heavily methylated in human miR-200c/miR-141-negative fibroblasts and miR-200c/miR-141-negative tumor cells. Mouse cells show a similar inverse correlation between DNA methylation and miR-200c expression. Enrichment of the permissive histone modifications H3 acetylation and H3K4 trimethylation is seen in normal miR-200c/miR-141-positive epithelial cells, as determined by chromatin immunoprecipitation coupled to real-time PCR. In contrast, repressive H3K9 dimethylation marks are present in normal miR-200c/miR-141-negative fibroblasts and miR-200c/miR-141-negative cancer cells, and the permissive histone modifications are absent. The epigenetic modifier drug 5-aza-2′-deoxycytidine reactivates miR-200c/miR-141 expression, showing that epigenetic mechanisms play a functional role in their transcriptional control.

Conclusions/Significance: We report that DNA methylation plays a role in the normal cell type-specific expression of miR-200c and miR-141. This role appears evolutionarily conserved, since similar results were obtained in mouse. Aberrant DNA methylation of the miR-200c/141 CpG island is closely linked to their inappropriate silencing in cancer cells. Since the miR-200c cluster plays a significant role in EMT, our results suggest an important role for DNA methylation in the control of phenotypic conversions in normal cells.

Citation: Vrba L, Jensen TJ, Garbe JC, Heimark RL, Cress AE, et al. (2010) Role for DNA Methylation in the Regulation of miR-200c and miR-141 Expression in Normal and Cancer Cells. PLoS ONE 5(1): e8697. doi:10.1371/journal.pone.0008697

Editor: Catherine M. Suter, Victor Chang Cardiac Research Institute, Australia

Received October 13, 2009; Accepted December 21, 2009; Published January 13, 2010

Copyright: © 2010 Vrba et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: Grant R01CA65662 to B.W.F. supported this work. Center Grants P30ES06694 and P30CA023074, and the BIO5 interdisciplinary biotechnology center at the UA, supported the Genomics Shared Service. J.C.G. and M.R.S. were supported by NIH U54 CA112970, DOD BCRP BC060444, and the Office of Energy Research, Office of Health and Biological Research, U.S. Department of Energy under Contract No. DE-AC03-76SF00098. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: bfutscher@azcc.arizona.edu

Introduction miRNAs are single-stranded, 20–24 nt long RNAs that regulate gene expression at the posttranscriptional level. miRNAs frequently target the 3′ UTRs of mRNAs. Because miRNA target motifs do not require complete complementarity, hundreds of mRNA targets may exist for each miRNA. Current estimates are that nearly 900 unique miRNAs are encoded in the human genome, and these miRNAs control, in part, the expression of more than one third of human genes [1]. A number of miRNAs dysregulated in human cancer have been shown to have oncogenic or tumor suppressive activity [2,3,4]. These include miRNA species that show cell type-specific patterns of expression, some of which are important in the maintenance of cell identity [5,6]. These types of miRNA are prime targets for epigenetic control, and early studies of miRNA control support this possibility [7,8]. miR-200c and miR-141 are members of the miR-200 family and are important regulators of the epithelial to mesenchymal transition (EMT) [5,9,10,11]. miR-200c and miR-141 participate in the phenotypic conversion of normal cells. In addition, normal patterns of miR-200c expression are dysregulated in multiple types of cancer cells, and this dysregulation is linked to tumor progression [2,6,12,13,14,15]. The mechanism responsible for the control of miR-200c expression in both normal and cancer cells is not fully understood.

In this study, we show that the epigenetic state is closely linked to normal cell type-specific expression of miR-200c and miR-141. This epigenetic state is dysregulated in carcinoma cells, where loss of miR-200c/141 expression is linked to aberrant DNA methylation and histone modifications. Finally, we found that miR-200c regulation by DNA methylation is evolutionarily conserved between humans and mice. Since miR-200c plays a significant role in EMT, our results suggest that DNA methylation plays an important role in the control of phenotypic conversions of normal and cancer cells.

PLoS ONE | www.plosone.org

January 2010 | Volume 5 | Issue 1 | e8697

DNA Methylation of Mir-200c

Results and Discussion The miR-200 family comprises five miRNAs encoded within two clusters, each of which encodes a polycistronic gene. One cluster resides on human chromosome 1 and encodes miR-200b, miR-200a, and miR-429. The other is located on human chromosome 12 and encodes miR-200c and miR-141. Our small RNA library sequencing data show that the miR-200 family is highly expressed in cultured normal human mammary epithelial cells (HMEC) derived from three different individuals. In contrast, the isogenic human mammary fibroblast cells (FB) lack miR-200 family expression (Figure 1A). The sequencing data also show that the most highly expressed members of the miR-200 family in HMEC are miR-200c and miR-141 (Figure 1A). We corroborated the expression of miR-200c and miR-141 in the same set of normal mammary samples by real-time PCR. We then extended these results to pairs of epithelial cells and fibroblasts from prostate and skin (Figure 1B; Figure S1). In all cases, miR-200c and miR-141 were highly expressed in epithelial cells but were not expressed in fibroblasts.

Figure 1. miR-200c is expressed in an epithelial-selective fashion. A. miR-200 family expression according to massively parallel sequencing of small RNA libraries from a set of three isogenic pairs of human mammary epithelial cells (HMEC) and fibroblasts (FB). The panel shows the expression of the mir-200b-200a-429 cluster located on human chromosome 1 and of the mir-200c-141 cluster located on human chromosome 12. With an average of 63,829 counts out of 3,926,984 per library in HMEC, miR-200c accounts for 1.625% of all small RNAs in these cells. B. Real-time PCR assessment of miR-200c expression in normal cell types. The left panel shows the expression of miR-200c in the same samples as panel A. The right panel shows the expression of miR-200c in human prostate epithelial cells (PREC), prostate stromal fibroblasts (PSF), human skin keratinocytes (Kcytes) and skin fibroblasts (HFF). The data are normalized relative to let-7a, which is expressed at equivalent levels across samples according to the small RNA sequencing data. Real-time PCR analysis of miR-141 expression in these samples is provided in Figure S1. doi:10.1371/journal.pone.0008697.g001

The mir-200c hairpin coding sequence and approximately 300 bp of upstream genomic sequence are CpG rich. According to our calculations using the program CpGcluster [16], this region is a highly statistically significant CpG cluster (Figure 2A). This CpG cluster has the following characteristics: length 334 bp, GC% 68.56, O/E ratio 0.58, 21 CpGs, p-value 8.44 × 10⁻¹¹. These values are close to the CpG island definition based on size, GC content and CpG dinucleotide frequency [17], and the region also fits the definition based on location with respect to the transcriptional unit [18]. In addition, this region is considered a CpG island under a recently published probabilistic definition [19]. We therefore analyzed this region as a possible target of epigenetic control.

The epigenetic state of the miR-200c/141 CpG island shows clear and extensive cell type-specific differences between normal miR-200c/141-positive and miR-200c/141-negative cells. We used MassARRAY technology to analyze the DNA methylation state of the mir-200c cluster CpG island (Figure 2B). The CpG sites are unmethylated in three separate strains of miR-200c/miR-141-positive HMEC. In contrast, all the CpG sites are highly methylated in the isogenic miR-200c/miR-141-negative fibroblast strains. The inverse correlation between miRNA expression and DNA methylation extends to other miR-200c/miR-141-positive/negative pairs of normal cells. These pairs are prostate epithelial cells and skin keratinocytes and their mesenchymal counterparts, prostate and skin fibroblasts. Thus, the miR-200c cluster CpG island is unmethylated in normal miR-200c/miR-141-positive epithelial cells and densely methylated in the paired normal miR-200c/miR-141-negative fibroblasts (Figure 1B, Figure 2B, Figure S2).

Figure 2. The mir-200c CpG island shows differential cytosine methylation between miR-200c-positive and miR-200c-negative normal human tissues. A. A diagram of the genomic region of hsa-mir-200c. The top bar shows the specific fragments analyzed by MassARRAY. The red fragments labeled with CpG sites indicate the fragments from which DNA methylation data were obtained; these data are presented in panel B. Below this is the region analyzed by MassARRAY in relation to the genomic location (in blue). Next comes the region of the real-time PCR amplicon for chromatin immunoprecipitation analysis (ChIP), followed by the CpG island identified by the program CpGcluster. The track below shows the regions encoding the mir-200c and mir-141 hairpins and the putative transcription start site (TSS) region inferred from the human EST track of the UCSC genome browser; each circle on this track represents the position of a CpG dinucleotide. The ruler at the bottom shows the location on human chromosome 12 according to human genome assembly hg18. B. Summary of 5-methylcytosine levels obtained by MassARRAY analysis of the hsa-mir-200c CpG island in the samples characterized in Figure 1B. The y-axis shows the percent of cytosine methylation within the individual CpG units marked on the x-axis. The CpG units within the MassARRAY amplicon are numbered in the reverse direction, with CpG 2 located within the miR-200c coding sequence. doi:10.1371/journal.pone.0008697.g002

miR-200c and miR-141 expression is lost in different types of cancer cells [5,11,20,21]. We sought to determine whether this loss of expression was linked to epigenetic changes in the miR-200c/miR-141 CpG island. We analyzed 11 breast cancer cell lines, and in each case miR-200c and miR-141 expression was closely linked to the DNA methylation state of the CpG island (Figure 3A; Figure S3). Seven of the breast cancer cell lines tested express miR-200c and miR-141, and each has an unmethylated mir-200c CpG island. The other four do not express miR-200c and miR-141 and exhibit a densely methylated mir-200c CpG island. A similar picture emerges for prostate cancer cells. In two prostate cancer cell lines (PC3 and PC3 B1), loss of miR-200c and miR-141 expression is linked with aberrant DNA methylation of the mir-200c/141 CpG island (Figure 3B; Figure S3). Two other prostate cancer cell lines (LNCaP and DU145) retain both miR-200c/miR-141 expression and an unmethylated mir-200c/141 CpG island. Together these results indicate two things. First, cancer cells derived from normal miR-200c/miR-141-positive epithelial cells can replicate the cell type-specific DNA methylation pattern of the miR-200c/141 CpG island seen in normal miR-200c/miR-141-negative cells. Second, aberrant DNA methylation of the miR-200c/141 CpG island is associated with its transcriptional silencing in carcinoma cells.

Figure 3. DNA methylation of mir-200c CpG island in breast and prostate cancer cell lines. A. miR-200c expression and mir-200c CpG island methylation in eleven breast cancer cell lines. B. miR-200c expression and mir-200c CpG island methylation in four prostate cancer cell lines. The top panel of each figure shows the expression of miR-200c in cancer samples as detected by real-time PCR, normalized to let-7a. The bottom panel shows the methylation level of the mir-200c CpG island region in the same cancer samples. The level of methylation of individual CpG units within the MassARRAY amplicon is displayed as a heatmap, with the lowest methylation in yellow and the highest in blue. The y-axis marks the individual CpG units. doi:10.1371/journal.pone.0008697.g003

To demonstrate the functional significance of the epigenetic state of the miR-200c/miR-141 CpG island in cancer cells, we exposed cancer cells to the epigenetic modifier and DNA methyltransferase inhibitor 5-aza-2′-deoxycytidine (5-AdC). The miR-200c/miR-141-negative breast cancer cell lines MDA-MB-231 and BT549 and the prostate cancer cell line PC3 were treated with 3 µM 5-AdC for 96 h. miR-200c/141 expression was then assessed by real-time PCR. 5-AdC reactivated miR-200c expression in all three cancer cell lines (Figure 4). The level of miR-200c increased 4.3-fold in MDA-MB-231 (p-value = 0.0004), 6.4-fold in BT549 (p-value = 0.0107) and 4.2-fold in PC3 cells (p-value = 0.0072). A similar reactivation of miR-141 expression (p-value < 0.01) was observed in these cancer cell lines after 5-AdC treatment (Figure S4). These data suggest that epigenetic mechanisms participate in the inappropriate repression of miR-200c/miR-141 expression in cancer cells.

Figure 4. 5-aza-2′-deoxycytidine treatment reactivates miR-200c expression in breast and prostate cancer cell lines. Cells were treated with 3 µM 5-aza-2′-deoxycytidine for 96 h. The level of miR-200c expression was measured by real-time PCR. The average of 4 independent samples is displayed; the error bars show the standard error of measurement. The values were normalized to untreated controls (100%). Figure S4 shows the 5-aza-2′-deoxycytidine-mediated reactivation of miR-141 in the same samples. doi:10.1371/journal.pone.0008697.g004

The histone modification state of the mir-200c cluster CpG island also shows cell type-specific differences that are closely linked to the expression state of miR-200c/141 in normal and cancer cells. We examined the histone modification state of the miR-200c/141 CpG island in normal and cancer cells by chromatin immunoprecipitation coupled to quantitative real-time PCR (Figure 5). In the three different strains of miR-200c/miR-141-positive HMEC, the mir-200c/141 CpG island exists in a transcriptionally competent state. It is enriched for two transcriptionally permissive modifications: histone H3 acetylation (H3Ac) and H3 lysine 4 trimethylation (H3TriMeK4). The transcriptionally repressive mark histone H3 lysine 9 dimethylation (H3DiMeK9) is underrepresented (Figure 5). In contrast, in the isogenic miR-200c/miR-141-negative mammary fibroblasts, the permissive histone modifications are absent and the repressive H3 lysine 9 dimethylation mark is present (Figure 5). Similarly, the breast cancer cell lines that had lost miR-200c/141 expression also lost histone H3 acetylation and K4 trimethylation. Instead, they acquired a repressive histone state enriched for the H3 lysine 9 dimethylation mark (Figure 5). No enrichment of histone H3 lysine 27 trimethylation was detected in the miR-200c/141 CpG island in the samples analyzed (Figure S5). We analyzed miR-200c/141 expression, DNA methylation, and histone modification states across a variety of normal and cancer cell types. Taken together, these analyses demonstrate a close link between the expression of mir-200c/141 and the epigenetic state of their associated CpG island.

Figure 5. The histone modification state of the mir-200c CpG island. We analyzed two permissive histone marks, acetylation of histone H3 (H3Ac) and trimethylation of lysine 4 of histone H3 (H3TriMeK4), and one repressive mark, dimethylation of lysine 9 of histone H3 (H3DiMeK9). The analysis used chromatin immunoprecipitation coupled to real-time PCR of the region described in Figure 2A. HMEC samples are shown in green, isogenic FB samples in red, and two miR-200-negative breast cancer cell lines in blue. The y-axis shows fold enrichment of each histone mark over input DNA within the mir-200c CpG island. doi:10.1371/journal.pone.0008697.g005

Finally, we sought to determine whether the epigenetic regulation of miR-200c expression in normal cells is evolutionarily conserved. DNA methylation-linked control of miR-200c expression across mammalian species would provide further experimental support for epigenetic control of cell type-specific miR-200c expression. The whole genomic cluster containing mir-200c and mir-141 is well conserved between the human and mouse genomes. Like the human mir-200c/141 genomic region, the mouse mir-200c/141 genomic region contains a CpG island (Figure 6A). This island has length 325 bp, GC% 66.77, O/E ratio 0.58, 19 CpGs, and p-value 1.06 × 10⁻¹⁰. To evaluate a potential role for DNA methylation in the control of miR-200c/141 in mice, we analyzed CpG methylation and miRNA expression in two groups of cells. The epithelial group comprised epidermis of SKH-1 mice and the keratinocyte cell lines 308 and 6R90. The fibroblast group comprised the cell lines NIH 3T3, NIH 3T6, and NR6. A MassARRAY amplicon was designed to analyze the DNA methylation state of the mouse mir-200c/141 region homologous to that analyzed in human (Figure 6A). Strikingly similar results for miR-200c were found in human and mouse cells. Mouse keratinocytes expressed significant levels of miR-200c, while the mouse fibroblasts did not express detectable levels (Figure 6B). DNA methylation analysis by MassARRAY revealed minimal DNA methylation of the mir-200c CpG island in the miR-200c-positive keratinocytes. In contrast, the miR-200c-negative mouse fibroblasts showed extensive DNA methylation of all CpG sites in the region (Figure 6B). The human and mouse genomes are separated by 75 million years of evolution [22]. The significant conservation between them in DNA sequence, in patterns of cell type-specific DNA methylation, and in the associated miR-200c expression patterns provides evidence that epigenetic mechanisms play a functional role in the control of miR-200c expression.

Figure 6. Epigenetic control of miR-200c expression is evolutionarily conserved. A. Diagram of the mouse mmu-mir-200c genomic interval. The top bar shows the area analyzed by MassARRAY. The track below shows the regions encoding the hairpins of mir-200c and mir-141 and the putative transcription start site (TSS) inferred from the mouse EST track of the UCSC genome browser; each circle on this track represents the location of a CpG dinucleotide. As for human hsa-mir-200c, the mouse mmu-mir-200c region contains a CpG island identified by the program CpGcluster. The ruler at the bottom shows the location on mouse chromosome 6 according to genome assembly mm9. The genes are encoded on the (−) strand. B. Mouse cells show a cell type-specific pattern of miR-200c expression similar to that of human cells, and this expression is linked to the DNA methylation state of the CpG island. The left panel shows the expression of miR-200c, detected by real-time PCR, in mouse epithelial cells (308, 6R90, SKH-1 epidermis) and mouse fibroblast cell lines (NIH 3T3, NR6, NIH 3T6). The right panel shows the methylation level of the mir-200c CpG island region in the same mouse samples. The level of methylation of individual CpG units within the MassARRAY amplicon is displayed as a heatmap, with the lowest methylation in yellow and the highest in blue. The x-axis marks the individual CpG units. CpG units within the MassARRAY amplicon are numbered in the reverse direction, with CpG 1 located within the miR-200c coding sequence. doi:10.1371/journal.pone.0008697.g006

In summary, our findings provide multiple lines of evidence that epigenetic mechanisms are involved in the regulation of miR-200c/141 expression in both normal and cancer cells. First, there is a consistent inverse correlation between expression and DNA methylation states in normal human and mouse cell types, as well as in human breast and prostate cancer cell lines. Second, different histone codes exist between miR-200c/141-expressing and nonexpressing cells, and these codes accurately mirror the expression and DNA methylation states. Third, the epigenetic modifier 5-aza-2′-deoxycytidine relieves the repression of miR-200c/miR-141 in cancer cell lines. Fourth, the link between DNA methylation and expression states occurs across mammalian species, since it is seen in human and mouse. Taken together, these findings indicate that miR-200c/141 is an evolutionarily conserved, epigenetically labile miRNA cluster.

Dysregulation of miR-200c and miR-141 occurs in multiple cancer types [5,11,20,21,23,24,25,26]. This dysregulation involves a compromise of the epigenetic state of the CpG island associated with miR-200c and miR-141. The results suggest that carcinoma cells may co-opt de novo DNA methylation pathways involved in the epigenetic control of normal cell type-specific genes, such as the pathways that govern the epigenetic state of miR-200c/miR-141. Cancer cells show a similar apparent co-option of cell type-specific DNA methylation pathways at protein-coding genes, such as maspin and 14-3-3 sigma [27,28]. Together these results suggest that pathways responsible for the establishment or maintenance of normal cell type-specific DNA methylation states may be disrupted during carcinogenesis. miR-200c and miR-141 play an important role in EMT, and therefore in cell identity. Disruption during carcinogenesis of the mechanisms that govern cell type-specific DNA methylation patterns would therefore likely affect expression of miR-200c and miR-141 and provide phenotypic plasticity to cancer cells. Support for this possibility comes from the phenotypes of the cancer cells analyzed in this study. All four of the breast cancer cell lines that lost miR-200c and miR-141 expression have an aberrantly methylated mir-200c/141 CpG island, and each of these cell lines displays a mesenchymal phenotype [11,29]. In contrast, the breast cancer cell lines that express miR-200c and miR-141 and have an unmethylated CpG island display an epithelial phenotype [11,29]. A similar picture emerges in the prostate cancer cell lines. The PC3 cells, which have lost miR-200c and miR-141 expression, display an aberrantly methylated CpG island and a mesenchymal phenotype. LNCaP and DU145, by contrast, retain miR-200c and miR-141 expression and an epithelial phenotype [11,30]. These results suggest that DNA methylation may control the phenotypic changes observed in cancer cells.

Real-time PCR detection of miRNA Real-time PCR detection of microRNAs was performed essentially as described [42]. Reverse transcription was performed using TaqMan Reverse Transcription Reagents (Applied Biosystems, Foster City, CA, USA). Real-time PCR was conducted on an ABI Prism 7500 Sequence Detection System (Applied Biosystems, Foster City, CA, USA) using PerfeCta SYBR Green SuperMix, Low ROX (Quanta Biosciences, Gaithersburg, MD, USA). The program was a 95°C denaturation for 3 minutes followed by 40 cycles of 95°C for 15 seconds and 60°C for 45 seconds. Differences in expression relative to let-7a were determined using the comparative Ct method described in the ABI user manual. Primer sequences are listed in Table S1.
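The comparative Ct calculation relative to let-7a can be sketched as follows. This is a minimal illustration of the 2^−ΔΔCt arithmetic, not the authors' analysis. It assumes roughly 100% amplification efficiency for both the miRNA and the let-7a reference, and the Ct values in the example are hypothetical. Figure 4 normalizes to untreated controls (100%), so a 4.3-fold increase corresponds to 430% of control. The `mean_and_sem` helper assumes that the Figure 4 error bars are the standard error of the mean across the four replicates.

```python
from math import sqrt
from statistics import mean, stdev


def delta_delta_ct_fold(ct_target: float, ct_ref: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change of a target miRNA in a sample versus a control (2^-ddCt).

    Assumes ~100% efficiency, i.e. product doubles every cycle for both assays.
    """
    delta_sample = ct_target - ct_ref            # normalize to let-7a in the sample
    delta_control = ct_target_ctrl - ct_ref_ctrl  # same normalization in the control
    return 2.0 ** -(delta_sample - delta_control)


def percent_of_control(fold: float) -> float:
    """Express a fold change with the untreated control set to 100%."""
    return 100.0 * fold


def mean_and_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean of replicate measurements."""
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    # Hypothetical Ct values: treated miR-200c 28, let-7a 20;
    # untreated miR-200c 30, let-7a 20.
    fold = delta_delta_ct_fold(28.0, 20.0, 30.0, 20.0)
    print(fold, percent_of_control(fold))  # 4.0 400.0
```

In this example, a miR-200c Ct that is two cycles earlier after treatment, with let-7a unchanged, gives a 4-fold increase, shown as 400% of the untreated control.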

Materials and Methods Cell lines and cell culture Finite lifespan pre-stasis HMEC from specimens 184 (batch D), 48R (batch T), and 240L (batch B), were derived from reduction mammoplasty tissue of women aged 21, 16, and 19 respectively. Cells were initiated as organoids in primary culture in serumcontaining M85 medium supplemented with oxytocin (Bachem) at 0.1 nM, and maintained in M87A medium supplemented with oxytocin and cholera toxin at 0.5 ng/ml [31]. Fibroblasts from specimens 184, 48, and 240 L were obtained from the same reduction mammoplasty tissue and were grown in DMEM/F12 with 10% FBS and 10 mg/ml insulin [31] and further propagated in DMEM/F12 with 10% FBS. Prostate epithelial cells were obtained from Clonetics (San Diego, CA), and fetal skin keratinocytes from Cell Applications (San Diego, CA.) and were grown according to the suppliers instructions. Human foreskin fibroblasts (HFFs) were maintained and cultured by the Arizona Cancer Center Cell Culture Shared Service. Human prostate stromal fibroblasts (PSF) were were cultured as previously described [32]. Breast cancer cell lines BT549, HS578T, MCF7, MDA-MB-157, MDA-MB-231, MDA-MB-453, MDA-MB-468, UACC893, UACC1179, UACC2087, and UACC3199 were cultured as previously described [33,34]. Prostate cancer cell lines PC3, PC3 B1, LNCaP, and DU145 [35,36,37] were maintained in RPMI 1640 medium containing 10% fetal bovine serum supplemented with 100 units/ml penicillin and 50 mg/ml streptomycin. Mouse keratinocyte cell lines 308 and 6R90 were cultured as described [38], mouse fibroblast cell lines NIH 3T3, NIH 3T6 and NR6 [39,40,41] were maintained in DMEM medium containing 10% fetal bovine serum. SKH-1 mouse epidermis samples were removed from liquid nitrogen snap frozen dorsal skin by scraping on dry ice. miRNA library preparation, sequencing and analysis Total RNA was extracted using Trizol. The small RNA fraction was purified on a 15% polyacrylamide-urea gel. 
A preadenylated adaptor was ligated to the 3′ end of the small RNA, and the ligation product was purified on a 15% PAA-urea gel. An Illumina-specific 5′ adaptor was then ligated and the product was purified on a 10% PAA-urea gel. Small RNA with ligated adaptors was reverse transcribed into DNA using an RT primer with an Illumina-specific extension. The cDNA was then PCR amplified using Illumina-specific primers, and the PCR product was purified on a 3% agarose gel. Small RNA libraries were submitted to NCGR (Santa Fe, NM) for Illumina sequencing. Reads from the Illumina GAII were mapped to the hg18 human genome assembly using the program Novoalign (www.novocraft.com), and the Novoalign output was further analyzed in R (http://www.r-project.org). Counts of individual miRNAs were normalized to the average library size (3,926,984 counts).
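Normalizing miRNA counts to the average library size amounts to rescaling each library as if it contained the same total number of reads. The sketch below assumes that library size is the sum of the miRNA counts supplied for that library; the text does not say whether all mapped reads or only miRNA reads were used, and the function and type names are hypothetical.

```python
from typing import Dict

# library name -> miRNA name -> raw read count
Counts = Dict[str, Dict[str, int]]


def normalize_to_library_size(counts: Counts,
                              target_size: float = 3_926_984) -> Dict[str, Dict[str, float]]:
    """Rescale each library's miRNA counts as if it had `target_size` reads.

    Library size is taken as the sum of the counts supplied for that library
    (an assumption of this sketch); the default target is the average library
    size quoted in the text.
    """
    normalized: Dict[str, Dict[str, float]] = {}
    for library, mirna_counts in counts.items():
        library_size = sum(mirna_counts.values())
        if library_size == 0:
            raise ValueError(f"library {library!r} has no counts")
        scale = target_size / library_size
        normalized[library] = {mirna: n * scale for mirna, n in mirna_counts.items()}
    return normalized


# Hypothetical counts: lib_b was sequenced 5x deeper but has the same composition.
toy = {"lib_a": {"miR-141": 10, "miR-200c": 30},
       "lib_b": {"miR-141": 50, "miR-200c": 150}}
print(normalize_to_library_size(toy, target_size=100))
```

After rescaling, the two toy libraries give identical values, which is the point of the normalization: differences in sequencing depth no longer masquerade as differences in expression.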

CpG island prediction

CpG islands were predicted using the program CpGcluster [16]. Rather than applying fixed parameters within a sliding window, this program uses a statistical approach based on the distances between neighboring CpGs to search for regions with significant enrichment of CpG dinucleotides. We set the distance threshold to the 50th percentile (median distance) and the p-value cutoff to 10^-8.
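The distance-based idea can be illustrated with a simplified sketch: CpGs are chained into clusters whenever the gap to the next CpG is at most the median inter-CpG distance, and each cluster is kept only if it holds improbably many CpGs for its length. This is not CpGcluster's code, and its p-value is an assumption of the sketch: a plain binomial upper tail with the background CpG frequency estimated from the input sequence, whereas CpGcluster defines its own statistic [16]. All function names are hypothetical.

```python
import math
from statistics import median


def cpg_positions(seq):
    """0-based start positions of every CpG dinucleotide in seq."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def find_cpg_clusters(seq, pvalue_cutoff=1e-8):
    """Chain CpGs whose neighbour distance is <= the median distance and keep
    clusters whose CpG count is improbably high for their length."""
    pos = cpg_positions(seq)
    if len(pos) < 2:
        return []
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    threshold = median(gaps)                  # 50th percentile of distances
    p_cpg = len(pos) / (len(seq) - 1)         # background CpG frequency

    clusters, current = [], [pos[0]]
    for prev, nxt in zip(pos, pos[1:]):
        if nxt - prev <= threshold:
            current.append(nxt)
        else:
            clusters.append(current)
            current = [nxt]
    clusters.append(current)

    islands = []
    for cluster in clusters:
        if len(cluster) < 2:
            continue
        start, end = cluster[0], cluster[-1] + 2   # half-open interval
        n_sites = (end - start) - 1                # dinucleotide positions
        pval = binom_upper_tail(len(cluster), n_sites, p_cpg)
        if pval <= pvalue_cutoff:
            islands.append({"start": start, "end": end,
                            "n_cpg": len(cluster), "pvalue": pval})
    return islands


# Toy sequence: sparse CpGs (every 50 bp) flanking a dense 60-bp CpG block.
flank = ("AAAATTTTCG" + "A" * 40) * 4
print(find_cpg_clusters(flank + "CG" * 30 + flank))
```

On the toy sequence only the dense block (positions 200 to 260) survives the 10^-8 cutoff; the isolated flanking CpGs never form clusters.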

DNA methylation analysis by MassARRAY

DNA methylation analysis by MassARRAY was performed as described [43]. Primer sequences are listed in Table S1.

5-aza-2′-deoxycytidine treatment

Cells were treated with 3 µM 5-aza-2′-deoxycytidine (Sigma, St Louis, MO, USA) for 96 h, as previously described [44].
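The Figure S4 legend reports miR-141 levels after 5-aza-2′-deoxycytidine treatment as a percentage of untreated controls (100%), with the mean of 4 measurements and error bars giving the standard error. A minimal sketch of that arithmetic follows. It assumes that the inputs are already-normalized relative expression values (for example from the comparative Ct method) and that the SEM is the sample standard deviation divided by the square root of n; the function names and numbers are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def percent_of_control(treated: Sequence[float], untreated_mean: float) -> List[float]:
    """Express each treated measurement as a percentage of the untreated mean."""
    if untreated_mean <= 0:
        raise ValueError("untreated mean must be positive")
    return [100.0 * x / untreated_mean for x in treated]


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two measurements to estimate an SEM")
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical: four measurements of relative miR-141 expression after
# treatment, with untreated cells averaging 1.0 on the same scale.
pct = percent_of_control([2.0, 2.2, 1.8, 2.0], 1.0)
print(mean_and_sem(pct))  # mean of about 200% with an SEM of about 8.2%
```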

Chromatin immunoprecipitation

Chromatin immunoprecipitation (ChIP) analysis was performed as described previously [33,45,46] with antibodies against acetylated histone H3 (#06-599, Millipore), trimethylated histone H3 K4 (#05-745, Upstate), dimethylated histone H3 K9 (CS200587, Millipore), and trimethylated histone H3 K27 (#07-449, Millipore). Equal amounts (1 ng) of ChIP and input DNA were used for real-time PCR analysis. Primers were designed for use with the Human Universal Probe Library Set (Roche Diagnostics, Indianapolis, IN, USA). Real-time PCR was conducted on an ABI Prism 7500 Sequence Detection System (Applied Biosystems, Foster City, CA, USA) using PerfeCta qPCR SuperMix, Low ROX (Quanta Biosciences, Gaithersburg, MD, USA), with initial denaturation at 95°C for 3 minutes followed by 40 cycles of 95°C for 15 seconds and 60°C for 45 seconds. Primer sequences are listed in Table S1.
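Because equal amounts (1 ng) of ChIP and input DNA go into each reaction, enrichment of a histone mark at an amplicon can be read directly from the Ct difference. The sketch below assumes about 100% amplification efficiency (a 2-fold change per cycle) and averages replicate Ct values first. It is an illustration rather than the authors' analysis code, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_enrichment_over_input(chip_cts: Sequence[float],
                               input_cts: Sequence[float]) -> float:
    """Enrichment of a ChIP sample over input DNA at one amplicon.

    Assumes equal template amounts went into both reactions and ~100%
    amplification efficiency, so each cycle of difference is a 2-fold change.
    Replicate Ct values are averaged before comparison.
    """
    if not chip_cts or not input_cts:
        raise ValueError("need at least one Ct value for ChIP and for input")
    return 2.0 ** (mean(input_cts) - mean(chip_cts))


# Hypothetical triplicates: the ChIP amplifies 3 cycles earlier than input.
print(fold_enrichment_over_input([24.0, 24.0, 24.0], [27.0, 27.0, 27.0]))
```

A value near 1 indicates no enrichment relative to input, which is how a "lack of enrichment" of a mark, as described for H3 K27 trimethylation in Figure S5, would appear on this scale.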

Nucleic acid isolation

RNA was isolated using either Trizol (Invitrogen) or the RNeasy Mini kit (Qiagen) and quantified by absorbance at 260 nm. Genomic DNA was isolated using the DNeasy Blood and Tissue Kit (Qiagen) and quantified spectrophotometrically.

PLoS ONE | www.plosone.org

January 2010 | Volume 5 | Issue 1 | e8697

DNA Methylation of Mir-200c

Supporting Information

Figure S1 Real-time PCR assessment of miR-141 expression in normal cell types. The left panel shows the expression of miR-141 in three isogenic pairs of mammary epithelial cells (HMEC) and mammary fibroblasts (FB). The right panel shows the expression of miR-141 in human prostate epithelial cells (PREC), prostate stromal fibroblasts (PSF), human skin keratinocytes (Kcytes) and skin fibroblasts (HFF). The data are normalized relative to let-7a, which according to the small RNA sequencing data is expressed at consistent levels across samples. Found at: doi:10.1371/journal.pone.0008697.s001 (0.13 MB TIF)

Figure S2 DNA methylation of the mir-200c CpG island inversely correlates with miR-200c expression in normal human samples. This figure summarizes data shown in Figures 1B and 2B. The upper panel shows the expression of miR-200c detected by real-time PCR. The bottom panel shows the methylation level of the mir-200c CpG island region in the same human samples. The level of methylation of individual CpG units within the MassARRAY amplicon is displayed as a heatmap, with the lowest methylation in yellow and the highest in blue. The y-axis marks the individual CpG units. Found at: doi:10.1371/journal.pone.0008697.s002 (0.19 MB TIF)

Figure S3 Real-time PCR assessment of miR-141 expression in breast and prostate cancer cell lines. The left panel shows the expression of miR-141 in eleven human breast cancer cell lines. The right panel shows the expression of miR-141 in four human prostate cancer cell lines. Found at: doi:10.1371/journal.pone.0008697.s003 (0.14 MB TIF)

Figure S4 miR-141 expression in cancer cell lines is reactivated by 5-aza-2′-deoxycytidine treatment. Cells were treated with 3 µM 5-aza-2′-deoxycytidine (5-AdC) for 96 h. The level of expression of miR-141 was measured by real-time PCR. The average of 4 measurements is displayed; error bars show the standard error of the mean. The values were normalized to untreated controls (100%). Found at: doi:10.1371/journal.pone.0008697.s004 (0.06 MB TIF)

Figure S5 Histone H3 K27 trimethylation state of the mir-200c CpG island. Histone H3 lysine 27 trimethylation levels of the region of the mir-200c CpG island described in Figure 2A were analyzed by chromatin immunoprecipitation coupled to real-time PCR. Epithelial cells (HMEC) are shown in green and their isogenic fibroblasts (FB) in red. The y-axis shows a lack of enrichment of the histone H3 K27 trimethylation mark within the mir-200c CpG island relative to input DNA in all samples analyzed. Found at: doi:10.1371/journal.pone.0008697.s005 (0.08 MB TIF)

Table S1 List of primer sequences used in the study. Found at: doi:10.1371/journal.pone.0008697.s006 (0.03 MB PDF)

Acknowledgments

We thank Brenna Rheinheimer (UA) and Batul Merchant (LBNL) for outstanding technical support.

Author Contributions

Conceived and designed the experiments: LV BWF. Performed the experiments: LV TJJ. Analyzed the data: LV TJJ. Contributed reagents/materials/analysis tools: JG RRH AC SD MRS. Wrote the paper: LV MRS BWF.

References 1. Kim VN, Han J, Siomi MC (2009) Biogenesis of small RNAs in animals. Nat Rev Mol Cell Biol 10: 126–139. 2. Ma L, Teruya-Feldstein J, Weinberg RA (2007) Tumour invasion and metastasis initiated by microRNA-10b in breast cancer. Nature 449: 682–688. 3. Negrini M, Nicoloso MS, Calin GA (2009) MicroRNAs and cancer–new paradigms in molecular oncology. Curr Opin Cell Biol 21: 470–479. 4. Bracken CP, Gregory PA, Khew-Goodall Y, Goodall GJ (2009) The role of microRNAs in metastasis and epithelial-mesenchymal transition. Cell Mol Life Sci 66: 1682–1699. 5. Gregory PA, Bert AG, Paterson EL, Barry SC, Tsykin A, et al. (2008) The miR200 family and miR-205 regulate epithelial to mesenchymal transition by targeting ZEB1 and SIP1. Nat Cell Biol 10: 593–601. 6. Peter ME (2009) Let-7 and miR-200 microRNAs: guardians against pluripotency and cancer progression. Cell Cycle 8: 843–852. 7. Visone R, Croce CM (2009) MiRNAs and cancer. Am J Pathol 174: 1131–1138. 8. Valeri N, Vannini I, Fanini F, Calore F, Adair B, et al. (2009) Epigenetics, miRNAs, and human cancer: a new chapter in human gene regulation. Mamm Genome. 9. Hurteau GJ, Carlson JA, Spivack SD, Brock GJ (2007) Overexpression of the microRNA hsa-miR-200c leads to reduced expression of transcription factor 8 and increased expression of E-cadherin. Cancer Res 67: 7972–7976. 10. Burk U, Schubert J, Wellner U, Schmalhofer O, Vincan E, et al. (2008) A reciprocal repression between ZEB1 and members of the miR-200 family promotes EMT and invasion in cancer cells. EMBO Rep 9: 582–589. 11. Park SM, Gaur AB, Lengyel E, Peter ME (2008) The miR-200 family determines the epithelial phenotype of cancer cells by targeting the E-cadherin repressors ZEB1 and ZEB2. Genes Dev 22: 894–907. 12. Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, et al. (2005) MicroRNA expression profiles classify human cancers. Nature 435: 834–838. 13. Volinia S, Calin GA, Liu CG, Ambs S, Cimmino A, et al. 
(2006) A microRNA expression signature of human solid tumors defines cancer gene targets. Proc Natl Acad Sci U S A 103: 2257–2261. 14. Friedman JM, Liang G, Liu CC, Wolff EM, Tsai YC, et al. (2009) The putative tumor suppressor microRNA-101 modulates the cancer epigenome by repressing the polycomb group protein EZH2. Cancer Res 69: 2623–2629. 15. Varambally S, Cao Q, Mani RS, Shankar S, Wang X, et al. (2008) Genomic loss of microRNA-101 leads to overexpression of histone methyltransferase EZH2 in cancer. Science 322: 1695–1699. 16. Hackenberg M, Previti C, Luque-Escamilla PL, Carpena P, Martinez-Aroza J, et al. (2006) CpGcluster: a distance-based algorithm for CpG-island detection. BMC Bioinformatics 7: 446. 17. Gardiner-Garden M, Frommer M (1987) CpG islands in vertebrate genomes. J Mol Biol 196: 261–282. 18. Ioshikhes IP, Zhang MQ (2000) Large-scale human promoter mapping using CpG islands. Nat Genet 26: 61–63. 19. Irizarry RA, Wu H, Feinberg AP (2009) A species-generalized probabilistic model-based definition of CpG islands. Mamm Genome. 20. Du Y, Xu Y, Ding L, Yao H, Yu H, et al. (2009) Down-regulation of miR-141 in gastric cancer and its involvement in cell growth. J Gastroenterol 44: 556–561.


21. Shimono Y, Zabala M, Cho RW, Lobo N, Dalerba P, et al. (2009) Downregulation of miRNA-200c links breast cancer stem cells with normal stem cells. Cell 138: 592–603. 22. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520–562. 23. Iorio MV, Visone R, Di Leva G, Donati V, Petrocca F, et al. (2007) MicroRNA signatures in human ovarian cancer. Cancer Res 67: 8699–8707. 24. Ladeiro Y, Couchy G, Balabaud C, Bioulac-Sage P, Pelletier L, et al. (2008) MicroRNA profiling in hepatocellular tumors is associated with clinical features and oncogene/tumor suppressor gene mutations. Hepatology 47: 1955–1963. 25. Nam EJ, Yoon H, Kim SW, Kim H, Kim YT, et al. (2008) MicroRNA expression profiles in serous ovarian carcinoma. Clin Cancer Res 14: 2690–2695. 26. Kong D, Li Y, Wang Z, Banerjee S, Ahmad A, et al. (2009) miR-200 Regulates PDGF-D-Mediated Epithelial-Mesenchymal Transition, Adhesion, and Invasion of Prostate Cancer Cells. Stem Cells 27: 1712–1721. 27. Futscher BW, Oshiro MM, Wozniak RJ, Holtan N, Hanigan CL, et al. (2002) Role for DNA methylation in the control of cell type specific maspin expression. Nat Genet 31: 175–179. 28. Oshiro MM, Futscher BW, Lisberg A, Wozniak RJ, Klimecki WT, et al. (2005) Epigenetic regulation of the cell type-specific gene 14-3-3sigma. Neoplasia 7: 799–808. 29. Blick T, Widodo E, Hugo H, Waltham M, Lenburg ME, et al. (2008) Epithelial mesenchymal transition traits in human breast cancer cell lines. Clin Exp Metastasis 25: 629–642. 30. Hugo H, Ackland ML, Blick T, Lawrence MG, Clements JA, et al. (2007) Epithelial–mesenchymal and mesenchymal–epithelial transitions in carcinoma progression. J Cell Physiol 213: 374–383. 31. Garbe JC, Bhattacharya S, Merchant B, Bassett E, Swisshelm K, et al. 
(2009) Molecular distinctions between stasis and telomere attrition senescence barriers shown by long-term culture of normal human mammary epithelial cells. Cancer Res 69: 7557–7568. 32. Tran NL, Nagle RB, Cress AE, Heimark RL (1999) N-Cadherin expression in human prostate carcinoma cell lines. An epithelial-mesenchymal transformation mediating adhesion with stromal cells. Am J Pathol 155: 787–798. 33. Oshiro MM, Watts GS, Wozniak RJ, Junk DJ, Munoz-Rodriguez JL, et al. (2003) Mutant p53 and aberrant cytosine methylation cooperate to silence gene expression. Oncogene 22: 3624–3634. 34. Domann FE, Rice JC, Hendrix MJ, Futscher BW (2000) Epigenetic silencing of maspin gene expression in human breast cancers. Int J Cancer 85: 805–810. 35. Kaighn ME, Narayan KS, Ohnuki Y, Lechner JF, Jones LW (1979) Establishment and characterization of a human prostatic carcinoma cell line (PC-3). Invest Urol 17: 16–23. 36. Stone KR, Mickey DD, Wunderli H, Mickey GH, Paulson DF (1978) Isolation of a human prostate carcinoma cell line (DU 145). Int J Cancer 21: 274–281. 37. Horoszewicz JS, Leong SS, Kawinski E, Karr JP, Rosenthal H, et al. (1983) LNCaP model of human prostatic carcinoma. Cancer Res 43: 1809–1818. 38. Gupta A, Rosenberger SF, Bowden GT (1999) Increased ROS levels contribute to elevated transcription factor and MAP kinase activities in malignantly progressed mouse keratinocyte cell lines. Carcinogenesis 20: 2063–2073.


39. Pruss RM, Herschman HR (1977) Variants of 3T3 cells lacking mitogenic response to epidermal growth factor. Proc Natl Acad Sci U S A 74: 3918–3921. 40. Jainchill JL, Aaronson SA, Todaro GJ (1969) Murine sarcoma and leukemia viruses: assay using clonal lines of contact-inhibited mouse cells. J Virol 4: 549–553. 41. Todaro GJ, Green H (1963) Quantitative studies of the growth of mouse embryo cells in culture and their development into established lines. J Cell Biol 17: 299–313. 42. Sharbati-Tehrani S, Kutz-Lohroff B, Bergbauer R, Scholven J, Einspanier R (2008) miR-Q: a novel quantitative RT-PCR approach for the expression profiling of small RNA molecules such as miRNAs in a complex sample. BMC Mol Biol 9: 34. 43. Novak P, Jensen TJ, Garbe JC, Stampfer MR, Futscher BW (2009) Stepwise DNA methylation changes are linked to escape from defined proliferation barriers and mammary epithelial cell immortalization. Cancer Res 69: 5251–5258. 44. Wozniak RJ, Klimecki WT, Lau SS, Feinstein Y, Futscher BW (2007) 5-Aza-2′-deoxycytidine-mediated reductions in G9A histone methyltransferase and histone H3 K9 di-methylation levels are linked to tumor suppressor gene reactivation. Oncogene 26: 77–90. 45. Vrba L, Junk DJ, Novak P, Futscher BW (2008) p53 induces distinct epigenetic states at its direct target promoters. BMC Genomics 9: 486. 46. Jensen TJ, Novak P, Eblin KE, Gandolfi AJ, Futscher BW (2008) Epigenetic remodeling during arsenical-induced malignant transformation. Carcinogenesis 29: 1500–1508.


Differential DNA Methylation Correlates with Differential Expression of Angiogenic Factors in Human Heart Failure

Mehregan Movassagh1, Mun-Kit Choy1, Martin Goddard2, Martin R. Bennett1, Thomas A. Down3, Roger S.-Y. Foo1*

1 Division of Cardiovascular Medicine, Addenbrooke’s Centre for Clinical Investigation, University of Cambridge, Cambridge, United Kingdom, 2 Department of Histopathology, Papworth Hospital, Cambridge, United Kingdom, 3 Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, United Kingdom

Abstract

Epigenetic mechanisms such as microRNAs and histone modifications play crucial roles in dysregulated gene expression in heart failure. In contrast, the role of DNA methylation, another well-characterized epigenetic mark, is unknown. To examine whether human cardiomyopathies of different etiologies are connected by a unifying pattern of DNA methylation, we profiled ischaemic and idiopathic end-stage cardiomyopathic left ventricular (LV) explants from patients who had undergone cardiac transplantation and compared them with normal control LV tissue. We performed a preliminary analysis using methylated-DNA immunoprecipitation-chip (MeDIP-chip), validated differentially methylated loci by bisulfite (BS) PCR and high-throughput sequencing, and identified 3 angiogenesis-related genetic loci that were differentially methylated. Using quantitative RT-PCR, we found that the expression of these genes differed significantly between cardiomyopathic (CM) hearts and normal controls (p<0.01). Moreover, for each individual LV tissue, differential methylation correlated with differential expression of the corresponding gene in the predicted direction. Thus, differential DNA methylation exists in human cardiomyopathy. In this series of heterogeneous cardiomyopathic LV explants, differential DNA methylation was found in at least 3 angiogenesis-related genes. While in other systems changes in DNA methylation at specific genomic loci usually precede changes in the expression of the corresponding genes, our current findings in cardiomyopathy merit further investigation to determine whether DNA methylation changes play a causative role in the progression of heart failure.

Citation: Movassagh M, Choy M-K, Goddard M, Bennett MR, Down TA, et al. (2010) Differential DNA Methylation Correlates with Differential Expression of Angiogenic Factors in Human Heart Failure. PLoS ONE 5(1): e8564.
doi:10.1371/journal.pone.0008564

Editor: Michael Polymenis, Texas A&M University, United States of America

Received October 20, 2009; Accepted November 30, 2009; Published January 13, 2010

Copyright: © 2010 Movassagh et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This study was supported by British Heart Foundation grants FS/07/035 (to RS-YF), PG 06/101/21461 (to MRB and RS-YF) and RG04/001 (to MRB), and the NIHR Cambridge Biomedical Research Centre. Support was also received through Dr Steve Baker from Roche Ltd for 454 FLX sequencing. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: rsyf2@cam.ac.uk

January 2010 | Volume 5 | Issue 1 | e8564

DNA Methylation, Heart Failure

Introduction

The pathogenesis of heart failure involves molecular mechanisms that are becoming better understood [1,2], and studies in both experimental models and humans demonstrate the importance of dysregulated gene expression [1,3]. Furthermore, transcriptomic analyses of human dilated cardiomyopathy show a consistent and distinct pattern of gene expression [4], and dysregulated expression of both coding and non-coding genes directly affects heart failure development and progression [3,5]. Higher-order mechanisms such as microRNAs [5,6] and histone modifications [7,8] can alter gene expression in cardiovascular disease without any change in the underlying DNA sequence. The direct covalent modification of DNA cytosine nucleotides by methylation is another well-recognized epigenetic mechanism, but its role in cardiovascular disease is unknown. DNA cytosine methylation alters the accessibility of DNA to transcription factor complexes at a local level and, like histone modifications, affects chromatin structure at regional and genome-wide levels. A well-characterised functional effect of DNA methylation is thus control of gene expression [9]. In this respect, hypomethylation of the 5′ promoter end of a gene correlates with increased expression of the gene [9], whereas hypermethylation within the body of a gene is linked to increased gene expression [10,11].

Although a direct molecular mechanism to explain these observations is still lacking, basic mechanisms by which DNA methyltransferases and putative DNA demethylase complexes function have recently been uncovered [12–14]. Complementary to this, one of the most important recent findings in epigenetics is the discovery of genome-wide variation in DNA methylation between individuals [15]. Evidence indicates that this variation may be in part inherited [15] and in part acquired [16,17]. At least through its control of gene expression, DNA methylation variation may account for complex disease susceptibility or progression [15,18]. For instance, Fraga et al [19] found that although monozygotic twins share a common genotype and DNA methylation was indistinguishable in younger twins, older twins exhibited remarkable methylation differences that correlated with a pattern of differential gene expression. Similarly, consistent with the notion of drift in DNA methylation with increasing age, acquired variation in DNA methylation has been attributed to environmental, hormonal and stochastic events [16,17,20]. Differential DNA methylation, either through its influence on gene expression or through other as yet unknown mechanisms, could therefore explain differences in disease susceptibility or the phenotypic discordance seen in monozygotic twin pairs in spite of their identical DNA sequences.

In the wider population, differential DNA methylation may similarly contribute to the diversity of phenotypes, pathogenesis and progression of complex diseases. Apart from the strong association already identified between differential DNA methylation and cancer [21], there are now ongoing efforts to investigate the link between DNA methylation variation and other complex diseases such as schizophrenia [22], diabetes [23] and inflammatory bowel disease [24]. Here, we tested the hypothesis that global DNA methylation profiles in human cardiac tissue differ between cardiomyopathy and normal control, and aimed to identify a subset of genomic loci whose differential methylation is correlated with differential expression of their corresponding genes.

Results

Differentially Methylated DNA Regions in Dilated Cardiomyopathy (CM-DMRs)

As a proof of concept that DNA methylation in a subset of genomic loci may connect end-stage cardiomyopathy of different etiologies, we initially profiled a series of heterogeneous cardiomyopathic left ventricles and a single normal control (diseased: samples I, II, III; control: sample A; Table 1) using MeDIP-chip (method summarised in Figure S1). We used the Nimblegen “CpG island and promoter” microarray chip (Roche Nimblegen, WI), which covers all annotated Human RefSeq gene promoters (24,659) and CpG islands (28,226) as annotated on the UCSC genome browser. Candidate regions (CM-DMRs) were identified as those with a DMR (differentially methylated region) T-statistic > +3.0 or < -3.0, using the validated Bayesian tool developed specifically for handling MeDIP data (BATMAN) [25–26] (Figures S1 and S2). Of the CM-DMRs meeting these criteria, three candidates were identified by gene ontology analysis and GENECARDS as related to angiogenesis (AMOTL2, ARHGAP24 and PECAM1; Figure 1 and Figure S4). Next, we undertook bisulfite (BS) PCR and massively parallel amplicon sequencing for these CM-DMRs using a second, larger set of LV samples (diseased: III, IV, V, VI; controls: B, C, D, E; Table 1). Bisulfite treatment of gDNA converts unmethylated cytosine nucleotides to uracil but leaves methylated cytosine residues unaffected. This difference is then detected as a C/T difference at each CpG site by subsequent PCR and sequencing, which provides gold-standard, high-resolution information about the methylation status of a DNA region. Amplicon sequences were matched to the reference human genome, and DNA methylation (%) for each locus was determined (Figure 1).

Differential Expression of Genes Related to CM-DMR

Although epigenetic mechanisms may have complex effects, including changes in higher-order chromatin structure, DNA methylation differences are currently understood at least to control gene expression. At the local DNA level, hypomethylation in the 5′ or promoter region of a gene is associated with increased transcription, whereas hypermethylation of the body of a gene is associated with its active transcription [10,11]. We therefore set out to determine the expression of the target genes we had identified. Before quantifying target gene expression, we confirmed that our CM LV samples had at least 10-fold upregulated levels of NPPA mRNA compared to controls (Figure S3, p = 0.008). NPPA is upregulated in cardiomyopathy and represents the re-induction of the fetal gene program expected in myocardial disease [1]. However, differential DNA methylation at the NPPA gene CpG island and promoter locus was absent on MeDIP-chip (data not

Table 1. Details of patient LV samples.

Control LV samples: Asian; RTA; Hypoxic brain damage secondary to drowning.

End-stage cardiomyopathy LV samples (Code | Details | Medications):
I | LV non-compaction, no history of coronary artery disease | Bumetanide, Amiodarone, Enalapril
II | Idiopathic, no history of CAD | Aspirin, Bisoprolol, Frusemide, Spironolactone
III | Ischaemic, diabetic | Clopidogrel, Ramipril, Digoxin, Spironolactone, Bumetanide, Gliclazide, Simvastatin
IV | Idiopathic, no history of CAD | Perindopril, Carvedilol, Warfarin
V | Idiopathic, no history of CAD | Frusemide, Spironolactone, Lisinopril, Amiodarone, Atorvastatin, Warfarin
VI | Ischaemic | Clopidogrel, Perindopril, Nicorandil, Bisoprolol, Imdur, Spironolactone, Frusemide, Digoxin, Simvastatin
VII | Ischaemic, diabetic | Aspirin, Clopidogrel, Ramipril, Bisoprolol, Frusemide, Spironolactone, Frusemide, Insulin
VIII | Ischaemic | Aspirin, Ramipril, Carvedilol, Frusemide, Digoxin, Atorvastatin

RTA: road traffic accident; all LV samples were from Caucasian males (except A: Asian male). CAD: coronary artery disease. doi:10.1371/journal.pone.0008564.t001


Figure 1. Differential DNA methylation profiles for 3 candidate CM-DMRs. (A) DMR24, (B) DMR36 and (C) DMR11. DNA methylation (%) was determined for a set of 8 LV samples (4 controls: B–E; 4 diseased (CM): IV–VII) by BS-PCR sequencing as detailed in Methods. Lower panel: DMR24 and DMR36 lie within the bodies of the genes AMOTL2 and ARHGAP24, whereas DMR11 lies in the 5′ regulatory region of PECAM1. doi:10.1371/journal.pone.0008564.g001
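The per-locus DNA methylation (%) in Figure 1 rests on the bisulfite logic described in the Results: at a CpG cytosine, a C in the sequenced amplicon indicates a methylated (protected) site and a T indicates an unmethylated (converted) site. The sketch below shows that counting step only. It assumes reads are already aligned without gaps to the forward strand of the unconverted reference and ignores strand handling, conversion-efficiency checks and base quality; the function names and toy reads are hypothetical.

```python
from typing import Dict, List, Tuple


def cpg_sites(reference: str) -> List[int]:
    """0-based positions of the C in every CpG of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_by_cpg(reference: str, reads: List[Tuple[int, str]]) -> Dict[int, float]:
    """Percent methylation at each covered CpG of `reference`.

    `reads` are (start, sequence) pairs assumed to be already aligned, without
    gaps, to the forward strand of the unconverted reference. At a CpG cytosine
    a C in the read means methylated (protected from conversion) and a T means
    unmethylated (converted to U, read as T after PCR); other bases are skipped.
    """
    sites = cpg_sites(reference)
    methylated = {s: 0 for s in sites}
    informative = {s: 0 for s in sites}
    for start, seq in reads:
        seq = seq.upper()
        for site in sites:
            offset = site - start
            if not 0 <= offset < len(seq):
                continue  # this read does not cover the CpG
            base = seq[offset]
            if base == "C":
                methylated[site] += 1
                informative[site] += 1
            elif base == "T":
                informative[site] += 1
    return {s: 100.0 * methylated[s] / informative[s]
            for s in sites if informative[s] > 0}


# Toy amplicon with CpGs at positions 1 and 5, and four aligned reads.
ref = "ACGTTCGA"
reads = [(0, "ACGTTTGA"), (0, "ATGTTCGA"), (0, "ACGTTCGA"), (4, "TCGA")]
print(methylation_by_cpg(ref, reads))  # CpG 1: 2 of 3 reads C; CpG 5: 3 of 4
```

A locus-level value, as plotted per DMR, could then be taken as the mean over its CpGs; that aggregation choice is itself an assumption here.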

hypermethylation in the 5′ region of PECAM1 in dilated hearts (DMR11, Figure 1C) correlated with decreased expression of PECAM1 (Figure 2C). Similarly, consistent with the predicted effect of gene body methylation, hypomethylation within the gene body of AMOTL2 (DMR24, Figure 1A) correlated with reduced expression of AMOTL2 (Figure 2A), and hypermethylation within the gene body of ARHGAP24 (DMR36, Figure 1B) correlated with increased expression of ARHGAP24 (Figure 2B). We further analysed the correlation between gene expression and differential methylation in each individual LV tissue directly and independently of disease, and found, as predicted, that 5′

shown), indicating that NPPA gene expression may not be controlled by DNA methylation in this context, or that differential NPPA gene expression may correlate with differential methylation at a non-CpG island locus not present on the Nimblegen chip. In contrast, qPCR for transcripts of the 3 genes associated with the 3 CM-DMRs we had identified showed statistically significant differential expression between control LV and CM LV (Figure 2; p = 0.004 for AMOTL2, p<0.001 for ARHGAP24, p = 0.006 for PECAM1). Consistent with the predicted effect of hypermethylation in the 5′ regulatory region of genes,

Figure 2. Differential DNA methylation for 3 CM-DMRs correlates with differential gene expression. (A–C) Gene expression profiles for the gene corresponding to each DMR: AMOTL2, ARHGAP24, and PECAM1. Quantitative PCR was performed in a set of 11 LV samples (4 controls: C–F; 7 CM: II–VIII). qPCR experiments were performed in triplicate for each sample. ** p<0.05. (D–F) Correlation between gene expression and DNA methylation using Spearman's rank-order correlation coefficient. doi:10.1371/journal.pone.0008564.g002
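Spearman's rank-order correlation, used in Figure 2D–F, is the Pearson correlation computed on ranks, so it captures any monotonic relationship between methylation and expression rather than only a linear one. The sketch below computes rho with average ranks for ties; it does not compute a p-value, and the function names and example values are hypothetical rather than data from the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length n >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when all values are tied")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical methylation (%) vs relative expression for five samples.
print(spearman_rho([10.0, 12.0, 15.0, 18.0, 20.0], [0.5, 0.7, 0.6, 1.1, 1.4]))
```

A positive rho corresponds to the pattern described for gene-body DMR24 and DMR36, and a negative rho to the 5′ region DMR11.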


region methylation correlated inversely with gene expression (DMR11 in Figure 2F). Similarly, for DMR24 and DMR36, a positive correlation existed between gene body methylation and gene expression (Figure 2D and E). Importantly, although differential gene expression patterns were opposite for AMOTL2 and ARHGAP24 in dilated hearts (decreased and increased expression, respectively), the correlation between methylation and gene expression was positive in both. Moreover, independently of disease state, the individual LV samples with the highest or lowest expression of each gene showed the methylation state predicted by this correlation.

differential methylation corresponds to differential gene expression. This was anticipated, since several disease processes and molecular pathways, such as apoptosis, dysregulated calcium signalling, decompensated contractility, G-protein-coupled receptor down-regulation, and maladaptive angiogenesis, also connect cardiomyopathies of disparate etiologies [2]. Dysregulated gene expression and differential DNA methylation of angiogenic factors may indeed connect heart failure of different etiologies, in the same manner that raised levels of NPPA and the re-induction of a fetal gene programme mark heart failure regardless of its inciting cause [2]. As with cDNA microarray experiments using whole heart tissue with mixed cell populations, our findings may also reflect a change in the predominance of a particular cell type in each tissue sample. However, we have not detected any change in the expression of gene families that are characteristically specific to non-myocyte cell types such as fibroblasts (data not shown). At a broad genomic level, three-dimensional intra-chromosomal and inter-chromosomal DNA-DNA interactions may partition the genome into active and inactive domains [32]. These interactions characteristically involve DNA regulatory elements. Protein complexes comprising DNA-binding proteins bound to these regulatory elements may hold DNA conformations together to form chromatin centres of active transcription, or transcription factories. By altering the access of DNA-binding proteins, different DNA methylation states of these regulatory elements in disease versus control may therefore be responsible for important changes in 3-D conformation and hence in gene expression [33].
Although in this study we have examined the effect of DNA methylation on local, proximal gene expression, the 3-D model of transcriptional regulatory networks [32], in place of a linear series of genes and promoters, suggests that altered DNA methylation may also influence the expression of distal genes [33]. Nevertheless, for the control of proximal gene expression, others have previously reported that a 6% difference in methylation within the RASSF1 gene between the cerebellum and cerebrum corresponded with a 2-fold difference in RASSF1 gene expression between these 2 parts of the brain [22]. More recently, Barres et al showed that a 2–5% methylation difference in the PGC1A promoter related to a 3.5-fold difference in PGC1A gene expression in the vastus lateralis muscle of patients with type II diabetes [23]. In breast cancer, an 8% methylation difference was associated with a 1.5- to 3.5-fold expression difference of the ATM gene [34]; and in children conceived in vitro versus in vivo, 7% and 9.7% methylation differences in the COPG2 and CEBPA genes related to 2.05- and 1.77-fold changes in gene expression, respectively [35]. Using the BS-PCR sequencing strategy, we found here that differential methylation between control and CM ranged up to 32% (Figure S1 and data not shown). More importantly, we have demonstrated the biological significance of this range of differential methylation in 3 DMRs. In the case of DMR24/AMOTL2, a 2–3% methylation difference between diseased hearts and controls correlated with a 2.5-fold decrease in AMOTL2 gene expression. For DMR36/ARHGAP24, a 3–5% difference in methylation corresponded to a 2.5-fold increase in gene expression. Functionally, AMOTL2 belongs to the angiomotin family, which mediates inhibition of endothelial cell migration and tube formation by binding to angiostatin [36].
ARHGAP genes encode RhoGAP family proteins, and ARHGAP24 (also known as p73RhoGAP), identified by subtraction hybridization in endothelial cells undergoing capillary-tube formation, was also found to regulate capillary-tube formation [37]. ARHGAP24 expression was up-regulated in an angiogenic milieu but unchanged under non-angiogenic

Discussion

In eukaryotes, DNA methylation occurs by the addition of a methyl group to the carbon-5 position of the cytosine ring. In mammals, cytosine methylation is found most commonly in the sequence context 5′-CG-3′, which is also referred to as a CpG dinucleotide. In the mammalian genome, an estimated 70% of all CpGs are methylated [27]. Unmethylated CpGs, on the other hand, are largely grouped in clusters called “CpG islands” in the 5′ regulatory regions of many genes, and the frequency of CpG dinucleotides in CpG islands is higher than in other DNA regions. Notably, differential methylation of CpG islands is part of the epigenetic variation found in humans [18,27]. Consistent with previous observations correlating gene expression with DNA methylation [9–11], we found that hypermethylation within the 5′ region of the PECAM1 gene correlated with its reduced expression in different cardiac samples. Hypermethylation within the body of the ARHGAP24 gene correlated with its increased expression, and hypomethylation within the body of the AMOTL2 gene correlated with its decreased expression. Although our study was not designed to address the question of causality, current evidence suggests that locus-specific DNA methylation either permissively or necessarily controls gene-specific expression [16,17,21,27]. This may occur through the binding of methyl-CpG binding domain (MBD) proteins and polycomb group proteins, which displace the transcription machinery and thereby maintain epigenetic silencing of transcriptional activity [9]. In some cases the hypermethylation mark may be laid down after transcriptional down-regulation of a gene [28], but experiments using the DNA methyltransferase inhibitor 5-aza-2′-deoxycytidine show that demethylation of a specific gene promoter can re-establish gene expression [21,29].
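The point that CpG islands carry a higher CpG frequency than other DNA is usually quantified with the CpG observed/expected ratio. The sketch below applies the classic Gardiner-Garden and Frommer thresholds to a single window: length of at least 200 bp, GC content above 50%, and observed/expected above 0.6. It is illustrative only. The UCSC annotation underlying the array applies its own implementation of similar criteria, so results need not match, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Length, GC fraction and CpG observed/expected ratio of a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    cpg = s.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG = (CpG count * length) / (C count * G count).
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "cpg_obs_exp": obs_exp}


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply Gardiner-Garden & Frommer-style thresholds to one window."""
    st = cpg_stats(seq)
    return (st["length"] >= min_len
            and st["gc_fraction"] > min_gc
            and st["cpg_obs_exp"] > min_obs_exp)
```

A genome-wide caller would slide such a window along the sequence and merge passing windows. This function only tests one window.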
Similarly, the interaction between histone modification and DNA methylation in regulating gene expression is currently unclear and may be locus specific. In at least some cases, histone deacetylase inhibitors such as trichostatin A are also required, together with demethylating agents, to restore gene expression at an otherwise densely methylated gene locus [30]. Understanding the interaction between histone deacetylation and DNA methylation in myocardial disease will be particularly important, because the critical role of histone deacetylases (HDACs) in cardiac hypertrophy and heart failure has already been established [8,31]. This is the first demonstration that differential DNA methylation exists between human end-stage cardiomyopathic hearts and normal controls. The utility of the MeDIP-chip dataset may be limited because only a single control heart was used. However, we also used a second series of hearts and verified with a second method, bisulphite sequencing, that differential methylation exists in heart failure. Moreover, despite the heterogeneity of our samples, we found that for at least 3 genomic loci,

PLoS ONE | www.plosone.org

January 2010 | Volume 5 | Issue 1 | e8564

DNA Methylation, Heart Failure

using a hand-held homogenizer, treated with 200 µg/ml RNase A (Qiagen) for 15 min at room temperature, and then digested overnight with 1 mg/ml Proteinase K (Roche Diagnostics, Burgess Hill, UK).

conditions [38]. The role of PECAM1 (CD31) in angiogenesis, and the regulation of its expression, have also long been investigated [38,39]. Our findings implicating the differential expression of these 3 genes in end-stage heart failure may reflect adaptive or maladaptive angiogenic processes in disease pathogenesis, and will require further direct investigation. Even so, a recent transcriptomic analysis of endomyocardial biopsies from patients with new-onset heart failure revealed that prognosis may be predicted from the expression profile of a series of genes. These genes included angiogenic factors, one of which was ARHGAP26 [4]. Moreover, the important role of angiogenesis has been demonstrated previously, both in human ischaemic heart disease [40] and in experimental models of non-ischaemic heart failure [41]. Our results show that differential DNA methylation occurs in human end-stage cardiomyopathy. Gene expression is dysregulated in heart failure. The part of this dysregulation shared by end-stage disease of different etiologies may be explained by differential DNA methylation, together with other epigenetic mechanisms such as histone deacetylation. Because these epigenetic mechanisms may be altered by environment and diet, differential DNA methylation may integrate environmental/dietary signals and inherited traits to influence heart failure pathogenesis and progression. Larger studies will be needed to identify other differentially methylated genomic loci that bear a statistically significant association with heart failure. Moreover, recent evidence suggests that future analyses should also include DNA regions outside CpG islands. At least in human colon cancer, significant methylation variation was found in sequences up to 2 kb away from promoters and CpG islands, termed “CpG island shores” [42].
Unravelling these additional layers of gene expression control will improve therapeutic options and alter patient management for this complex disease.

Methylated-Cytosine DNA Immunoprecipitation–Microarray Chip (MeDIP-Chip)

We followed protocols previously described (26–28). At least 25 µg of each gDNA sample was diluted in TE buffer (10 mM Tris-HCl, pH 7.5, 1 mM EDTA) and sheared to fragments of 100–800 bp using a Bioruptor (Diagenode, Belgium). Four µg of each sample was saved as INPUT. The remainder was heated to 95°C for 10 min and immediately placed on ice. Immunoprecipitation was performed using 2.5 µg of anti-5-methylcytosine antibody per µg of sheared gDNA in IP buffer (20 mM Na-phosphate, pH 7.0, 1 M NaCl, 2% Triton X-100). Samples were rotated overnight at 4°C. Then 10 µl of 50% Protein-A agarose slurry, pre-washed in 0.1% BSA-PBS and equilibrated in IP buffer, was added per µg of DNA. Samples were rotated for a further 2.5 h and washed 3 times with IP buffer. Elution was in 250 µl lysis buffer (1 M Tris-HCl, pH 8.0, 0.5 M EDTA, 10% SDS, 280 µg/ml Proteinase K) with incubation for 2 h at 55°C. MeDIP DNA was purified with phenol and chloroform:isoamyl alcohol and precipitated. Enrichment of methylated DNA in the MeDIP samples was verified by qPCR for the normally methylated target region of OXT (Figure S1B and Figure S4). We also verified the expected depletion of the unmethylated target region of UBE2B. Four µg of INPUT and MeDIP DNA for each sample were labelled with Cy3 and Cy5, respectively, and co-hybridised to the NimbleGen “CpG island and promoter” microarray chip (NimbleGen, WI). This array comprises 385,000 isothermal probes of 50–75 mer length with a median probe spacing of 101 bp, and is based on the HG18 human genome assembly. The probes cover all reported human RefSeq gene promoters (24,659), spanning −800 bp to +200 bp relative to the transcription start site (TSS). They also cover all reported CpG islands (28,226) annotated in the UCSC Genome Browser.
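With MeDIP (Cy5) and INPUT (Cy3) co-hybridised to one array, a common first summary of each probe is the log2 MeDIP/INPUT ratio. The sketch below assumes that representation. The pseudocount, the median-centring and the function name are illustrative choices. This is not the BATMAN algorithm the authors used (see Statistical Analysis), which additionally models MeDIP signal as a function of local CpG density.

```python
import math
from statistics import median


def medip_log2_ratios(cy5_medip, cy3_input, pseudocount=1.0):
    """Per-probe log2(MeDIP / INPUT), median-centred across the array.

    cy5_medip, cy3_input: raw intensities for the same probes, in the same order.
    The pseudocount avoids log(0); centring puts a typical probe near 0.
    """
    if len(cy5_medip) != len(cy3_input):
        raise ValueError("both channels must cover the same probes")
    raw = [math.log2((m + pseudocount) / (i + pseudocount))
           for m, i in zip(cy5_medip, cy3_input)]
    centre = median(raw)
    return [r - centre for r in raw]
```

Positive values mark probes relatively enriched in the methyl-cytosine immunoprecipitate. Unlike a CpG-density-aware model, this ratio cannot separate "unmethylated" from "few CpGs to capture".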

Materials and Methods

Ethics Statement

Human myocardium was collected by a protocol approved by the Papworth (Cambridge) Hospital Tissue Bank review board and the Cambridgeshire Research Ethics Committee (UK). Written consent was obtained from every individual according to the Papworth Tissue Bank protocol.

Bisulfite Treatment of gDNA, PCR and Massive Parallel Amplicon Sequencing

Bisulfite (BS) conversion of gDNA was performed using the EZ DNA Methylation-Gold kit (Zymo Research, Orange, CA) according to the manufacturer’s protocol. PCR was performed using BS-treated gDNA samples as template. BS-specific primers of at least 20 bp were designed against selected DMRs using the MethPrimer program (http://www.urogene.org/methprimer/index1.html). Composite primers were designed by adding the 19 bp Roche FLX Primer-A and Primer-B sequences to the 5′ ends of the forward and reverse primers, respectively. Primers were tested to confirm that they amplified BS-treated gDNA only. Individual PCR products were run on 2% agarose gels to verify product size (data not shown). The concentration of PCR amplicons was determined using the QIAxcel system. Amplicons from each LV sample were pooled at equimolar concentrations and sequenced on individual lanes of a Roche 454 FLX next-generation sequencer. We obtained 4.4–7.3 million bp of reads per sample per lane of FLX sequencing. This corresponds to an average of 24,000 reads per lane at an average product size of 250 bp, and in turn to an average depth of about 1,100 reads per individual PCR product.
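BS-specific primers such as those designed with MethPrimer target the converted sequence rather than the reference. Bisulfite turns unmethylated C into U, which reads as T after PCR, while methylated C is protected. The minimal top-strand sketch below assumes every CpG cytosine is either methylated or unmethylated (set by a flag) and every non-CpG cytosine is unmethylated. Non-CpG methylation does occur [23], so this is a simplification, and the function name is hypothetical.

```python
def bisulfite_convert(seq, methylated_cpg=True):
    """In silico bisulfite conversion of the top strand of a DNA sequence.

    Non-CpG C -> T (assumed unmethylated). CpG C is kept as C when
    methylated_cpg is True, otherwise it is also converted to T.
    """
    s = seq.upper()
    out = []
    for i, base in enumerate(s):
        if base == "C":
            in_cpg = i + 1 < len(s) and s[i + 1] == "G"
            out.append("C" if (methylated_cpg and in_cpg) else "T")
        else:
            out.append(base)
    return "".join(out)
```

Because every non-CpG C becomes T, a primer matching the converted sequence mismatches untreated DNA at those positions. That is why primers can be checked to amplify BS-treated gDNA only.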

Human Left Ventricular Myocardium

Left ventricular (LV) tissue was obtained from male patients undergoing cardiac transplantation for end-stage heart failure. Normal human LV tissue, from hearts not suitable for donation, was obtained from healthy male individuals involved in road traffic accidents. At the time of transplantation or donor harvest, whole hearts were removed after preservation and transported in cold cardioplegic solution (cardioplegia formula and Hartmann’s solution). This was similar to the procedure described previously at Imperial College, London [43]. Following analysis by a cardiovascular pathologist (M.G.), left ventricular segments were cut and stored immediately in RNAlater (Ambion, Austin, TX). Individual patient details are listed in Table 1. Integrity of genomic DNA (gDNA) isolated from each tissue was verified by Nanodrop (Thermo Scientific, Wilmington, DE) and the QIAxcel system (Qiagen, Crawley, West Sussex, UK). RNA integrity was verified by Nanodrop and the 2100 Bioanalyzer (Agilent Technologies, Stockport, Cheshire, UK).

Genomic DNA Isolation

gDNA was isolated from LV samples using Genomic DNA G100 Tips (Qiagen, Crawley, UK). Samples were homogenised


Amplicon sequence reads corresponding to the BS-PCR products were aligned to the human reference genome. The extent of methylation (DNA methylation %) at each CpG site in a DMR was determined by comparing the number of Cs (methylated) with the number of Ts (unmethylated) at that site.
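Counting Cs against Ts at a CpG site amounts to taking C / (C + T). The sketch below assumes top-strand reads that have already been aligned and placed in reference coordinates. The bisulfite-aware alignment step is not shown, and A, G and N calls are ignored. The text does not say how per-site values were combined across a DMR, so the averaging helper is an assumption. Both function names are hypothetical.

```python
from collections import Counter


def cpg_methylation_percent(aligned_reads, cpg_positions):
    """Percent methylation at each CpG position: 100 * C / (C + T).

    aligned_reads: reads already placed in reference coordinates
    (index i of every read = reference position i; '-' marks no coverage).
    Returns None for positions with no C or T calls.
    """
    result = {}
    for pos in cpg_positions:
        calls = Counter(read[pos] for read in aligned_reads if pos < len(read))
        c, t = calls["C"], calls["T"]
        result[pos] = 100.0 * c / (c + t) if c + t else None
    return result


def dmr_mean_methylation(per_site):
    """Average the per-CpG values of one DMR, ignoring uncovered sites."""
    values = [v for v in per_site.values() if v is not None]
    return sum(values) / len(values) if values else None
```

At the reported depth of roughly 1,100 reads per amplicon, each per-site percentage rests on many calls. This helps when the differences of interest are only a few percent.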

Analysis for the association between DNA methylation and gene expression was performed using Spearman’s rank-order correlation coefficient.

Supporting Information

Figure S1 Methodology. (A) Workflow for MeDIP-chip and BS-PCR-sequencing using different sets of LV samples. (B) Enrichment of the methylated gene OXT, and non-enrichment of the unmethylated gene UBE2B, demonstrated by qPCR, validates the effectiveness of MeDIP. Found at: doi:10.1371/journal.pone.0008564.s001 (1.28 MB TIF)

RNA Extraction, cDNA Synthesis and Expression Quantitative PCR

At least 30 mg of frozen LV sample was thawed in 1 ml of TRIreagent (Sigma-Aldrich, St Louis, MO). It was homogenised with three 20 s bursts using Lysing Matrix (QBiogene, Cambridge, UK) in a FastPrep machine (FP120, QBiogene). The beads were then centrifuged at 3000 rpm for 3 min and the supernatant was transferred to a clean Eppendorf tube. RNA extraction was performed according to the manufacturer’s protocol. Twenty µl of cDNA was synthesised from 1 µg of total RNA using a mixture of oligo-dT and random hexamers and the “Superscript-III first strand cDNA synthesis kit” (Invitrogen, Paisley, UK). RNA integrity for all samples was checked using the 2100 Bioanalyzer (Agilent Technologies). Quantitative real-time PCR for housekeeping genes was initially performed using 4 µl of 1:20 pre-diluted cDNA in a 20 µl reaction, with Taqman Gene Expression Assays specific for 18S, GAPDH, RPLPO and TBP. Using geNorm (http://medgen.ugent.be/~jvdesomp/genorm/#PrimerDesign), we determined that RPLPO was the most stable of these housekeeping genes in both control and diseased samples. Quantitative PCR for target genes, performed with validated Taqman Gene Expression Assay primers (Applied Biosystems, Foster City, CA), was therefore normalised against RPLPO. Quantitative PCR for both target genes and RPLPO was performed at least in triplicate on the same diluted cDNA samples.
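The text states that target genes were normalised against RPLPO and, in the Statistical Analysis section, compared with a Mann-Whitney test, but it gives no formulas for either step. The sketch below is a minimal illustration of one common pipeline. It assumes the widely used 2^-ddCt method, which presumes near-100% amplification efficiency for both assays, and an exact permutation version of the Mann-Whitney U test. All function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def relative_expression(ct_target, ct_ref, calib_target, calib_ref):
    """Fold change by the 2^-ddCt method (assumes ~100% PCR efficiency).

    Each argument is a list of replicate Ct values, averaged first.
    ct_ref / calib_ref: the reference gene (RPLPO in this study);
    the calibrator is, for example, a control heart.
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)
    d_ct_calib = mean(calib_target) - mean(calib_ref)
    return 2.0 ** -(d_ct_sample - d_ct_calib)


def mann_whitney_u(x, y):
    """U for sample x: pairs (xi, yj) with xi > yj, ties counted as 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact permutation p-value for U; enumerates all splits (small n only)."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    centre = n1 * n2 / 2.0
    observed = abs(mann_whitney_u(x, y) - centre)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count splits at least as far from the null centre as the observed one.
        if abs(mann_whitney_u(gx, gy) - centre) >= observed - 1e-9:
            extreme += 1
    return extreme / total
```

With small groups the exact test has a floor. For three samples per group, complete separation of the groups gives the smallest attainable two-sided p-value, 2/20 = 0.1, however large the fold change.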

Figure S2 Global differential DNA methylation in end-stage dilated cardiomyopathy. BATMAN analysis of MeDIP-chip for one control LV vs. the average of 3 LV from end-stage cardiomyopathy hearts (CM). The heatmap represents the BATMAN DMR T-statistic score. Found at: doi:10.1371/journal.pone.0008564.s002 (4.91 MB TIF)

Figure S3 Quantitative PCR for NPPA showing >10-fold upregulation in CM samples compared to control. Found at: doi:10.1371/journal.pone.0008564.s003 (0.20 MB TIF)

Figure S4 Examples of 3 CM-DMRs identified by MeDIP-chip, shown in relation to an integrated resource of tissue-DMRs in other normal somatic and germ-line tissues (ref. 28). Found at: doi:10.1371/journal.pone.0008564.s004 (4.95 MB TIF)

Acknowledgments

The authors thank members of the Division of Cardiovascular Medicine, ACCI for helpful discussions and input.

Statistical Analysis

Analysis for MeDIP-chip was performed using the BATMAN algorithm as previously described (27). Analysis for quantitative PCR was performed using the non-parametric Mann-Whitney U test, and two-tailed p values were used to determine statistical significance.

Author Contributions

Conceived and designed the experiments: MM MRB RSF. Performed the experiments: MM MkC RSF. Analyzed the data: MM TD. Contributed reagents/materials/analysis tools: MJG. Wrote the paper: RSF.

References

1. Mudd JO, Kass DA (2008) Tackling heart failure in the twenty-first century. Nature 451: 919–928.
2. Hill JA, Olson EN (2008) Cardiac plasticity. N Engl J Med 358: 1370–1380.
3. Dorn GW 2nd, Matkovich SJ (2008) Put your chips on transcriptomics. Circulation 118: 216–218.
4. Heidecker B, Kasper EK, Wittstein IS, Champion HC, Breton E, et al. (2008) Transcriptomic biomarkers for individual risk assessment in new-onset heart failure. Circulation 118: 238–246.
5. van Rooij E, Liu N, Olson EN (2008) MicroRNAs flex their muscles. Trends Genet 24: 159–166.
6. van Rooij E, Sutherland LB, Liu N, Williams AH, McAnally J, et al. (2006) A signature pattern of stress-responsive microRNAs that can evoke cardiac hypertrophy and heart failure. Proc Natl Acad Sci U S A 103: 18255–18260.
7. Zhang CL, McKinsey TA, Chang S, Antos CL, Hill JA, et al. (2002) Class II histone deacetylases act as signal-responsive repressors of cardiac hypertrophy. Cell 110: 479–488.
8. Backs J, Olson EN (2006) Control of cardiac growth by histone acetylation/deacetylation. Circ Res 98: 15–24.
9. Klose RJ, Bird AP (2006) Genomic DNA methylation: the mark and its mediators. Trends Biochem Sci 31: 89–97.
10. Rauch TA, Wu X, Zhong X, Riggs AD, Pfeifer GP (2009) A human B cell methylome at 100-base pair resolution. Proc Natl Acad Sci U S A 106: 671–678.
11. Ball MP, Li JB, Gao Y, Lee JH, LeProust EM, et al. (2009) Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells. Nat Biotechnol 27: 361–368.
12. Wolffe AP, Jones PL, Wade PA (1999) DNA demethylation. Proc Natl Acad Sci U S A 96: 5894–5896.
13. Rai K, Huggins IJ, James SR, Karpf AR, Jones DA, et al. (2008) DNA demethylation in zebrafish involves the coupling of a deaminase, a glycosylase, and gadd45. Cell 135: 1201–1212.
14. Metivier R, Gallais R, Tiffoche C, Le Peron C, Jurkowska RZ, et al. (2008) Cyclical DNA methylation of a transcriptionally active promoter. Nature 452: 45–50.
15. Rakyan VK, Beck S (2006) Epigenetic variation and inheritance in mammals. Curr Opin Genet Dev 16: 573–577.
16. Jaenisch R, Bird A (2003) Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet 33 Suppl: 245–254.
17. Jirtle RL, Skinner MK (2007) Environmental epigenomics and disease susceptibility. Nat Rev Genet 8: 253–262.
18. Peaston AE, Whitelaw E (2006) Epigenetics and phenotypic variation in mammals. Mamm Genome 17: 365–374.
19. Fraga MF, Ballestar E, Paz MF, Ropero S, Setien F, et al. (2005) Epigenetic differences arise during the lifetime of monozygotic twins. Proc Natl Acad Sci U S A 102: 10604–10609.
20. Ingrosso D, Cimmino A, Perna AF, Masella L, De Santo NG, et al. (2003) Folate treatment and unbalanced methylation and changes of allelic expression induced by hyperhomocysteinaemia in patients with uraemia. Lancet 361: 1693–1699.
21. Esteller M (2008) Epigenetics in cancer. N Engl J Med 358: 1148–1159.
22. Mill J, Tang T, Kaminsky Z, Khare T, Yazdanpanah S, et al. (2008) Epigenomic profiling reveals DNA-methylation changes associated with major psychosis. Am J Hum Genet 82: 696–711.
23. Barres R, Osler ME, Yan J, Rune A, Fritz T, et al. (2009) Non-CpG methylation of the PGC-1alpha promoter through DNMT3B controls mitochondrial density. Cell Metab 10: 189–198.
24. Backdahl L, Bushell A, Beck S (2009) Inflammatory signalling as mediator of epigenetic modulation in tissue-specific chronic inflammation. Int J Biochem Cell Biol 41: 176–184.
25. Down TA, Rakyan VK, Turner DJ, Flicek P, Li H, et al. (2008) A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol 26: 779–785.
26. Rakyan VK, Down TA, Thorne NP, Flicek P, et al. (2008) An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs). Genome Res 18: 1518–1529.
27. Issa JP (2004) CpG island methylator phenotype in cancer. Nat Rev Cancer 4: 988–993.
28. Enver T, Zhang JW, Papayannopoulou T, Stamatoyannopoulos G (1988) DNA methylation: a secondary event in globin gene switching? Genes Dev 2: 698–706.
29. Lujambio A, Ropero S, Ballestar E, Fraga MF, Cerrato C, et al. (2007) Genetic unmasking of an epigenetically silenced microRNA in human cancer cells. Cancer Res 67: 1424–1429.
30. Hong C, Moorefield KS, Jun P, Aldape KD, Kharbanda S, et al. (2007) Epigenome scans and cancer genome sequencing converge on WNK2, a kinase-independent suppressor of cell growth. Proc Natl Acad Sci U S A 104: 10974–10979.
31. Haberland M, Montgomery RL, Olson EN (2009) The many roles of histone deacetylases in development and physiology: implications for disease and therapy. Nat Rev Genet 10: 32–42.
32. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, et al. (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326: 289–293.
33. Cedar H, Bergman Y (2009) Linking DNA methylation and histone modification: patterns and paradigms. Nat Rev Genet 10: 295–304.
34. Flanagan JM, Munoz-Alegre M, Henderson S, Tang T, Sun P, et al. (2009) Gene body hypermethylation of ATM in peripheral blood DNA of bilateral breast cancer patients. Hum Mol Genet 18: 1332–1342.
35. Katari S, Turan N, Bibikova M, Erinle O, Chalian R, et al. (2009) DNA methylation and gene expression differences in children conceived in vitro or in vivo. Hum Mol Genet 18: 3769–3778.
36. Bratt A, Wilson WJ, Troyanovsky B, Aase K, Kessler R, et al. (2002) Angiomotin belongs to a novel protein family with conserved coiled-coil and PDZ binding domains. Gene 298: 69–77.
37. Su ZJ, Hahn CN, Goodall GJ, Reck NM, Leske AF, et al. (2004) A vascular cell-restricted RhoGAP, p73RhoGAP, is a key regulator of angiogenesis. Proc Natl Acad Sci U S A 101: 12212–12217.
38. Woodfin A, Voisin MB, Nourshargh S (2007) PECAM-1: a multi-functional molecule in inflammation and vascular biology. Arterioscler Thromb Vasc Biol 27: 2514–2523.
39. Cao G, Fehrenbach ML, Williams JT, Finklestein JM, Zhu JX, et al. (2009) Angiogenesis in platelet endothelial cell adhesion molecule-1-null mice. Am J Pathol 175: 903–915.
40. Lee SH, Wolf PL, Escudero R, Deutsch R, Jamieson SW, et al. (2000) Early expression of angiogenesis factors in acute myocardial ischemia and infarction. N Engl J Med 342: 626–633.
41. Sano M, Minamino T, Toko H, Miyauchi H, Orimo M, et al. (2007) p53-induced inhibition of Hif-1 causes cardiac dysfunction during pressure overload. Nature 446: 444–448.
42. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, et al. (2009) The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 41: 178–186.
43. Adamson DL, Money-Kyrle AR, Harding SE (2000) Functional evidence for a cyclic-AMP related mechanism of action of the beta(2)-adrenoceptor in human ventricular myocytes. J Mol Cell Cardiol 32: 1353–1360.

MASTERING CHANGE

Breakthrough in 5-hmC quantitation for epigenetics

Interested in simplifying the study of DNA methylation, particularly 5-hydroxymethylcytosine? Try the EpiMark™ 5-hmC and 5-mC Analysis Kit, a robust enzymatic method for the locus-specific detection of methylated (5-mC) and hydroxymethylated (5-hmC) cytosine. As the first commercially available PCR-based kit to reproducibly identify and quantitate the presence of 5-hmC, this simple 3-step protocol will expand your potential for epigenetics research and biomarker discovery.

Advantages:
• Reproducible quantitation of 5-hmC and 5-mC
• Easy-to-use protocols
• Compatible with existing techniques (PCR)
• Amenable to high throughput

Identify and quantitate methylation states with the EpiMark™ 5-hmC and 5-mC Analysis Kit

[Figure: % 5-hmC, % 5-mC and % unmethylated C (0–100%) in Brain, Liver, Heart and Spleen]
Analysis of the different methylation states in Balb/C mouse tissue samples shows a variation in the amount of 5-hmC present at locus 12.

Visit neb.com/epigenetics to learn more, and explore the complete listing of EpiMark validated products from NEB.
