Leakey Foundation Final Report
MHCs, mate choice, and dispersal decisions in wild Cebus capucinus
Katharine Jack1 and Jessica Lynch Alfaro2
1. Tulane University, Department of Anthropology, kjack@tulane.edu
2. University of California at Los Angeles, Institute for Society and Genetics, jlynchalfaro@ucla.edu
Brief Summary: We set out to examine the role of the major histocompatibility complex (MHC) in the mating and dispersal decisions of white-faced capuchins (Cebus capucinus) in Santa Rosa National Park, Costa Rica. Fecal samples were collected from individuals in five groups (N=140). We developed and validated a new MHC protocol using bacterial cloning and 454 next-generation sequencing to examine allelic diversity at MHC-DRB exon 2 (N=75). Our future work will sequence the remaining individuals, thereby enabling us to address our research questions. We will also examine a second locus (DQB exon 3) to further investigate MHC diversity within the study population.
Publication Summary: The first publication from our Leakey-funded research will describe and compare the protocols, validation, and population-level findings of the bacterial cloning and next-generation sequencing techniques for MHC DRB exon 2 alleles in the Santa Rosa capuchin monkeys. The paper will also characterize the diversity within the population and place it in the context of the phylogeny of MHC DRB exon 2 alleles for all primates in which allelic variation has been described for this gene. We will also discuss some of the limitations of, and cautions regarding, the use of these techniques with fecal samples from wild primates.
FINAL REPORT: January 2014
PROJECT OVERVIEW
To address questions about the relationship of MHC genes to social behavior and to dispersal and mating decisions in wild primates, we developed a new protocol for processing fecal samples and sequencing them with next-generation techniques on a 454 platform. Much of our research effort during the grant period was spent developing and validating these techniques, for several reasons: very little MHC data has been published for capuchin monkeys, so we had to find the best primer combination through trial and error; MHC genes had not previously been sequenced from fecal samples of Neotropical primates, so we had to validate our techniques using tissue and blood samples alongside our target samples; and the 454 technique is also relatively novel for MHC genes, so we had to troubleshoot the preparation protocol for successful sequencing.
After the fecal samples were collected in the field and transported to our laboratory at UCLA, we developed primers to maximize recovery of allelic diversity in the population for MHC DRB exon 2. We first performed bacterial cloning of 24 colonies/sample on eight samples. Our results from the bacterial cloning (see below) indicated that within-population variation in alleles was extremely high, and that it would be beneficial to employ next-generation sequencing methods to capture thousands of sequences per individual in order to better characterize the allelic diversity within the population. The next step was to develop the methods to run the samples on 454 Junior sequencing runs. Finally, we developed quality control techniques and code to manage the sequence data, and we are currently analyzing the MHC sequence data in relation to social, life history, and other genetic information for the individual capuchin monkeys in the Santa Rosa population.
The rationale for using this new technique (“454”) is as follows: 1) it has been used in very few studies to date (so the novelty of the method may support a higher-impact paper), 2) the method is more efficient and less labor-intensive, 3) the cost is more reasonable, 4) the method enables us to avoid the problem of recovering multiple sequences of each allele to confirm novel mutations, 5) if the primers work, we can potentially capture pseudogenes, 6) we can run samples for ALL individuals for whom we have collected samples, and 7) it offers high throughput and better quality control: we get many copies of each allele sequenced (~80x coverage/gene copy based on 6 alleles/individual).
LABORATORY METHODS
Samples were placed in sterile 50 ml tubes containing lysis buffer and stored at 4 °C until use. Genomic DNA was extracted from fecal samples using the QIAamp DNA Stool Kit (Qiagen, Valencia, CA) and from tissue and blood samples using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA).
Bacterial Cloning and Sequencing: We selected four Santa Rosa Cebus capucinus individuals for cloning; we amplified Baloo, Luna, and Malfoy from fecal samples, and Hedwig from tissue. Luna was cloned twice, independently, to assess the reproducibility of our results using fecal samples. We used a combination of fecal, tissue, and blood samples to assess the feasibility of capturing MHC allelic diversity using non-invasive (fecal) sampling. We also selected three samples as outgroups: Cebus capucinus from Quepos, in southern Costa Rica (blood), Cebus albifrons (tissue; from Brazil), and Sapajus apella (blood; captive animal).
PCR amplifications were performed in 20 ul total volume reactions containing 2.0 ul 5x GoTaq Flexi Buffer (Promega), 2.0 ul 25 mM MgCl2, 0.16 ul dNTP (25 mM each), 0.8 ul (each) forward and reverse primer (10 uM), 0.16 ul GoTaq Flexi DNA Polymerase (5 U/ul), and 5 ng DNA. Primer sequences used in this study were modified from primers published by Kriener et al. (2001) (Forward: Tub1F NWP alt, 5’-TGTCATCACTTCAACGGGACGG-3’; Reverse: k5Rceb, 5’-CTCTCCGCTGCACTGTGAAGCTCTC-3’). PCR cycle conditions were as follows: a 2 minute denaturation at 94°C, followed by 35 cycles of 94°C for 30 s, 60°C for 40 s, and 72°C for 1 minute, then a final extension at 72°C for 7 minutes. The PCR products were purified using AMPure XP (Agencourt), then cloned using the pGEM-T Vector System (Promega) following the manufacturer’s protocol. Transformed cells were plated on LB plates containing 100 mg/ml ampicillin and IPTG/X-gal, then incubated overnight at 37°C. Twenty-four white colonies per individual were transferred from plates to 1.0 ml LB + amp (100 mg/ml) broth and grown overnight at 37°C. Plasmid DNA from the cultured cells was then purified using the Qiagen Miniprep kit (Qiagen, Valencia, CA) and amplified using the same PCR recipe described above, but with M13 primers.
PCR cycle conditions were as follows: 2 minute denaturation at 94°C, followed by 32 cycles of 94°C for 30 s; 60°C for 30 s; 72°C for 1 minute, then a final extension at 72°C for 7 minutes. PCR products were purified using ExoSap (Amersham Biosciences) then bidirectionally sequenced using the BigDye Terminator v.3.1 cycle sequencing kit (1/8th reaction) (Applied Biosystems) on an ABI 3730xl Genetic Analyzer (Applied Biosystems) at Cornell University. Sequence contigs were generated and analyzed in Geneious 5.5 (Biomatters Inc.). Contigs were checked for vector contamination using VecScreen (http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html). Alleles obtained at least twice, from either the same or different individuals, were identified and retained for further analyses.
454 sequencing: For the 454 sequencing, we selected samples from 75 individuals in the Santa Rosa population, and included one Sapajus apella sample as an outgroup. We used the same primers as described above for bacterial cloning. We ran duplicate samples for some individuals (Hedwig, Winky,
Fleur) to assess repeatability within and across runs. We also included six blanks as quality controls. In total, the MHC class II DRB locus was amplified in 81 samples using the polymerase chain reaction (PCR) and Roche GS FLX Titanium Fusion primers with 10 nt MID tag sequences. PCRs were carried out in two 25 ul reactions using the KAPA HiFi DNA Polymerase kit (KAPA Biosystems, Boston, MA). Reactions contained 8 ng DNA, 5.0 ul of 5x Fidelity buffer, 0.25 ul MgCl2 (25 mM), 0.75 ul dNTP (10 mM each), 1.5 ul forward + reverse primers (10 uM each), and 0.5 ul HiFi DNA polymerase (1 U/ul). Each individual was assigned a separate MID tag sequence and the appropriate primers were used for each reaction. Amplicons were generated using the following PCR cycle conditions: initial denaturation at 94°C for 4 minutes, followed by 35 cycles of 98°C for 20 s, 68°C for 20 s, and 72°C for 40 s, then a final extension at 72°C for 5 minutes. Duplicate PCR reactions for each individual were pooled and run on a 1.2% agarose gel. Bands were excised and purified using the Qiaquick Gel Extraction Kit (Qiagen, Valencia, CA). Purified product concentrations were quantified using the Qubit dsDNA BR Assay (Life Technologies). Two equimolar pools were generated and sequenced in two Roche GS Junior sequencing runs at the GenoSeq Sequencing Core (University of California, Los Angeles).
Data management and quality control: The GS Junior sequencing runs produced over 150,000 sequences for MHC DRB exon 2. We took several steps to improve the quality of the data. First, we retained only sequences of the correct length for this gene fragment given the primers used (194 bp, corresponding to a 64 amino acid sequence in reading frame 3). We then removed all sequences containing any base with a Phred score lower than 20 (a Phred score of 20 corresponds to a 1% probability that the base call is incorrect), as well as all sequences that had mismatches to the primer regions.
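The three filters described above (fragment length, minimum Phred score, exact primer match) can be sketched in a few lines of Python. This is an illustrative sketch, not the code used in the project. It assumes that reads are in the forward orientation, that the 194 bp length refers to the fragment after primer trimming (the report does not say whether primers are counted), and that the Phred cutoff is applied to every base of the read. The function and variable names are hypothetical.

```python
# Primer sequences and thresholds are taken from the report; everything else
# (function names, read orientation, primer-trimmed length) is an assumption.
FORWARD_PRIMER = "TGTCATCACTTCAACGGGACGG"
REVERSE_PRIMER = "CTCTCCGCTGCACTGTGAAGCTCTC"
EXPECTED_LENGTH = 194  # bp
MIN_PHRED = 20

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def passes_qc(sequence, qualities):
    """Return the primer-trimmed fragment if the read passes all filters, else None.

    sequence  -- read as a string, forward primer first (assumed orientation)
    qualities -- one integer Phred score per base of `sequence`
    """
    reverse_site = reverse_complement(REVERSE_PRIMER)
    if len(sequence) != len(qualities):
        return None
    # Filter: no mismatches allowed in either primer region.
    if not (sequence.startswith(FORWARD_PRIMER) and sequence.endswith(reverse_site)):
        return None
    fragment = sequence[len(FORWARD_PRIMER):len(sequence) - len(reverse_site)]
    # Filter: fragment must have the expected length.
    if len(fragment) != EXPECTED_LENGTH:
        return None
    # Filter: every base call must have Phred >= 20.
    if min(qualities) < MIN_PHRED:
        return None
    return fragment
```

The 64-residue figure quoted above is consistent with translating from the third base of the fragment: (194 - 2) / 3 = 64 codons.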
After our clean-up process, 2 of the 6 blanks still contained some sequences (5 sequences in one blank and 27 in the other; by contrast, the samples had between 500 and 2,600 sequences per individual after clean-up). The highest number of identical sequences found in a blank was eleven; apart from that one case, the highest number was four. Based on this information, we cleaned the data using three different thresholds, accepting only those alleles that were represented 25 or more times, 12 or more times, or five or more times, and compared the results obtained with each threshold. For our population-level analyses of the number and quantity of alleles and amino acid sequences present for this gene, we included only alleles that were found in at least two individuals in the population.
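The thresholding described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the project's actual code: it assumes the copy-number threshold is applied within each individual (the report does not specify this), and the function name, argument names, and toy data are hypothetical.

```python
from collections import Counter


def call_alleles(reads_by_individual, min_copies, min_individuals=2):
    """Apply a copy-number threshold per individual, then a sharing threshold.

    reads_by_individual -- dict mapping individual name to a list of cleaned
                           sequences (one entry per read)
    min_copies          -- minimum number of identical reads within an
                           individual for a sequence to count as an allele
                           (assumed to be a per-individual threshold)
    min_individuals     -- minimum number of individuals carrying an allele
                           for it to enter population-level analyses
    """
    per_individual = {}
    for individual, reads in reads_by_individual.items():
        counts = Counter(reads)
        per_individual[individual] = {
            seq for seq, n in counts.items() if n >= min_copies
        }

    # Count how many individuals carry each accepted allele.
    carriers = Counter()
    for alleles in per_individual.values():
        carriers.update(alleles)
    population = {seq for seq, n in carriers.items() if n >= min_individuals}
    return per_individual, population


# Toy data with placeholder sequence labels (not real alleles).
toy_reads = {
    "ind_A": ["x"] * 30 + ["y"] * 12 + ["z"] * 4,
    "ind_B": ["x"] * 25 + ["y"] * 5,
    "ind_C": ["w"] * 40,
}
results = {t: call_alleles(toy_reads, t)[1] for t in (25, 12, 5)}
```

In this toy data set, the population-level allele set changes only at the most permissive threshold, which is the kind of comparison across the 25, 12, and 5 copy cutoffs that the text describes.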
RESULTS
Cloning: As described above, we amplified MHC genes from three Santa Rosa individuals using fecal samples and from one using tissue. One Santa Rosa individual (Luna) was cloned twice, independently, to assess the reproducibility of our results. We also selected three samples as outgroups: Cebus capucinus from Quepos, in southern Costa Rica (blood), Cebus albifrons (tissue; from Brazil), and Sapajus apella (blood; captive animal). We found a high diversity of alleles (33 unique alleles for Santa Rosa, with 26 unique functional amino acid chains, plus four non-functional alleles with stop codons) and a significant amount of allele sharing within the Santa Rosa population, and we saw no indication that the blood or tissue samples amplified differently from the fecal samples. However, our replication of the bacterial cloning with the same individual, Luna, produced somewhat discordant results across the two experiments. The most common alleles in each experiment were shared across experiments (n = 5 alleles recovered in both cloning experiments), but six alleles were unique to the Luna 2 experiment and two alleles were recovered solely in the Luna 1 experiment. This suggested to us that allelic diversity might be so high in Cebus capucinus that bacterial cloning of 24 clones/individual might not be enough to reliably capture all the allelic diversity per individual at that locus. This was one of the main drivers of our decision to implement 454 next-generation sequencing of the MHC DRB alleles. The Cebus capucinus sequences from an individual in southern Costa Rica (Quepos) shared several alleles with the Santa Rosa capuchins, and these alleles were distributed within several of the different clades also found for Santa Rosa.
The Cebus albifrons individual did not have alleles in common with either the Santa Rosa or the Quepos Cebus capucinus individuals, but the primers did amplify six different clades of MHC alleles for Cebus albifrons, distributed throughout the tree for C. capucinus. In contrast, at least for the individual sequenced, it appears that these primers are not as good at capturing MHC allelic diversity in Sapajus species; only three clades were recovered from the Sapajus apella sequences, none of them were shared alleles with any of the gracile capuchin samples, and one of them was the outgroup to the rest of the tree. These preliminary results suggest that MHC evolution shows a distinct pattern in gracile capuchins in comparison to robust capuchins, and that the primers we have developed seem promising within the gracile capuchin clade.
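The Luna replicate comparison reported in the cloning results can be summarized with simple set operations. The following Python sketch is illustrative only: the allele labels are hypothetical placeholders, and only the counts (five alleles shared, two recovered solely in Luna 1, six unique to Luna 2) come from the report. The Jaccard index is an added summary statistic, not one used in the report.

```python
def replicate_concordance(rep1, rep2):
    """Compare the allele sets recovered in two independent cloning replicates."""
    a, b = set(rep1), set(rep2)
    union = a | b
    return {
        "shared": len(a & b),
        "only_rep1": len(a - b),
        "only_rep2": len(b - a),
        # Jaccard index: shared alleles / all alleles seen in either replicate.
        # Undefined (None) if both replicates are empty.
        "jaccard": len(a & b) / len(union) if union else None,
    }


# Hypothetical allele labels; only the set sizes reflect the report.
shared_alleles = {f"shared_{i}" for i in range(1, 6)}
luna1 = shared_alleles | {"luna1_only_1", "luna1_only_2"}
luna2 = shared_alleles | {f"luna2_only_{i}" for i in range(1, 7)}

summary = replicate_concordance(luna1, luna2)
```

With these counts, 5 of the 13 alleles seen across the two replicates were recovered in both, which gives a sense of how much diversity 24 clones per individual can miss.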
454 sequencing: After clean-up and quality control, we counted a total of 85 distinct MHC DRB exon 2 alleles captured from the Santa Rosa population through 454 sequencing. One of these alleles was nonfunctional as an amino acid sequence (with a stop codon). Allele sharing ranged from the most common allele, shared by 72 of 75 individuals, to the rarest alleles in the population, each shared by 2 individuals. Amino acid sequence variation within the Santa Rosa population is depicted in Figure 1 below, with each row a unique sequence in the population and changes in color signifying changes in amino acid sequence compared to the consensus sequence (i.e. the most common sequence when comparing across the population). As seen for other species, the MHC DRB genes are highly variable in some sections of the sequence, and highly conserved in others.
Figure 1. Amino acid sequence variation within the Santa Rosa population
Comparison of bacterial cloning and 454 sequencing: There was strong concordance between the Santa Rosa MHC alleles recovered through bacterial cloning and through 454 sequencing. The 454 method recovered more alleles and more clades than cloning, as expected considering that we analyzed 75 individuals with 454 technology and only 4 individuals using the cloning method. Only six alleles that were recovered in the bacterial cloning were not sequenced by the 454 method.
Comparison of Cebus capucinus MHC alleles to those for other primates: We compared the Cebus capucinus alleles from the Santa Rosa population, as recovered from both bacterial cloning and 454 methods, to the MHC-DRB alleles (or HLA equivalents) found in the GenBank database for other primate species (see Figure 2 below). Some of the Cebus capucinus alleles were nested in various clades with other Neotropical primate alleles: species in these clades included Sapajus apella, Callithrix jacchus, Saguinus oedipus, Saguinus labiatus, and Cebuella pygmaea. In a major clade within the tree, Callicebus moloch was the outgroup to two sister clades, one composed entirely of Cebus capucinus alleles, and the other composed of two subclades, one consisting entirely of Aotus alleles from various owl monkey species, and the other consisting entirely of alleles from Old World monkeys, apes, and humans.
Figure 2. Comparison of Cebus capucinus MHC alleles to those for other primates