Waddell et al, 1999


Syst. Biol. 48(1):119–137, 1999

Assessing the Cretaceous Superordinal Divergence Times within Birds and Placental Mammals by Using Whole Mitochondrial Protein Sequences and an Extended Statistical Framework

PETER J. WADDELL¹,³, YING CAO¹, MASAMI HASEGAWA¹, AND DAVID P. MINDELL²

¹ Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan; E-mail: waddell@ism.ac.jp (P. J. W.)
² Department of Biology and Museum of Zoology, University of Michigan, Ann Arbor, Michigan 48109, USA

Abstract. Using the set of all vertebrate mtDNA protein sequences published as of May 1998, plus unpublished examples for elephant and birds, we examined divergence times in Placentalia and Aves. Using a parsimony-based test, we identified a subset of slower evolutionary rate placental sequences that do not appear to violate the clock assumption. Analyzing just these sequences decreases support for Marsupionta and the carnivore + perissodactyl group but increases support for armadillo diverging earlier than rabbit (which may represent the whole Glires group). A major theme of the paper is to use more comprehensive estimates of divergence time standard error (SE). From the well-studied horse/rhino split, estimated to be 55 million years before present (mybp), the splitting time within carnivores is confidently shown to be older than 50 million years. Some of our estimates of divergence times within placentals are relatively old, at up to 169 million years, but are within 2 SE of other published estimates. The whale/cow split at 65 mybp may be older than commonly assumed. All the sampled splits between the main groups of fereuungulates (the clade of carnivores, cetartiodactyls, perissodactyls, and pholidotes) seem to be distinctly before the Cretaceous/Tertiary boundary. Analyses suggest a close relationship between elephants (representing Afrotheria) and armadillos (Xenarthra), and our timing of this splitting is coincident with the opening of the South Atlantic, a major vicariant event. Recalibrating with this event (at 100 mybp), we obtain younger estimates for the earliest splits among placentals. Divergence times within birds are also assessed by using previously unpublished sequences. We fail to reject a clock for all bird taxa available. Unfortunately, available deep calibration points for birds are questionable, so a new calibration based on the age of the Anseriform stem lineage is estimated.
The divergence time of rhea and ostrich may be much more recent than commonly assumed, while that of passerines may be older. Our major concern is the rooting point of the bird subtree, as the nearest outgroup (alligator) is very distant. [bird phylogeny; mammal order phylogeny; mitochondrial genomes; molecular divergence times; sequencing errors.]

It is becoming increasingly popular to infer divergence time estimates based on molecular sequences. An important statistical principle is that a single point estimate without an associated standard error (SE) is nearly useless for critical assessment. In this paper, we pay particular attention to giving more realistic estimates of errors on these divergence times by incorporating and integrating alternative sources of error. This is necessary because if molecular biologists are to have a meaningful discussion with each other, or with paleontologists, on the issue of molecular divergence times, then they must be prepared to calculate comprehensive SE estimates (Waddell and Penny, 1996).

³ Present address: Institute of Molecular Biosciences, Massey University, Palmerston North, New Zealand; E-mail: waddell@onyx.si.edu.

Here our approach is to check whether a clock-constrained maximum-likelihood (ML) tree is, or is not, rejected under the best-fitting model we have. A gamma (Γ) model is typically more appropriate for a coding sequence than is the assumption that all sites evolve at the same rate (see Swofford et al., 1996, for a review of such models). The model should allow better estimation of edge lengths, enabling more accurate inference of divergence times (Waddell and Penny, 1996). The edge lengths of the ML tree give arguably the most reliable estimates of relative divergence times because of the statistical efficiency of ML estimation (Hasegawa et al., 1985, 1987, 1989; Kishino and Hasegawa, 1990). The next step is to pick the best calibration point(s) based on fossil or other information. Lastly, model-based divergence time estimates ought to be


accompanied by SE values (Hasegawa et al., 1989), preferably those that attempt to take into account all the major sources of error that can be quantified (Waddell, 1995:473–476; Waddell and Penny, 1996). Following the techniques of the last two references, we integrate three major sources of error (fossil calibration error, finite sequence length error, and ancestral polymorphism) and quantify a fourth, sequencing error. We hope this approach will let people more clearly see the benefits and pitfalls of calibration of molecular divergence times, just as Felsenstein (1985) was able to highlight the need for good statistical support in analysis of molecular evolutionary trees.

Here, we estimate when major lineages within mammals and birds originated. It has long been recognized that these divergences may have begun well back into the Cretaceous (e.g., Gregory, 1910; Novacek, 1993; Simpson, 1945). Recently, molecular data are confirming this view (e.g., Hedges et al., 1996; Cooper and Penny, 1997; Springer, 1997; Kumar and Hedges, 1998), but with much uncertainty remaining as to exactly when and which clades formed in the Cretaceous. This is changing. For example, evidence has been found for an elephant (representing Afrotheria) + armadillo (Xenarthra) clade (Stanhope et al., 1998; Waddell et al., 1999). Results below suggest a link between this group (named Atlantogenata) and the opening of the South Atlantic Ocean.

MATERIALS AND METHODS

Data and Alignment

The data analyzed are all the mtDNA sequences published for vertebrates as of May 1998, complete for all protein genes. The principal mammalian taxa and GenBank accession numbers used in this study (see Fig. 1) were: Lagomorpha – rabbit Oryctolagus cuniculus #AJ001588; Xenarthra – armadillo Dasypus novemcinctus #Y11832; Proboscidea – African elephant Loxodonta africana (Hauf, unpubl.); Cetartiodactyla (Cetacea + Artiodactyla) – blue whale Balaenoptera musculus #X72204, cow Bos taurus #J01394; Perissodactyla – donkey

Equus asinus #X97337, horse Equus caballus #X79547, white rhino Ceratotherium simum #Y07726, Indian rhino Rhinoceros unicornis #X97336; Carnivora – cat Felis catus #U20753, grey seal Halichoerus grypus #X72004, harbor seal Phoca vitulina #X63726. For all other taxon names and accession numbers see Waddell et al. (1999). The birds used were the published sequences for Struthioniformes – ostrich Struthio camelus #Y12025 (Härlid et al., 1998), and Galliformes – chicken Gallus gallus #X52392. In addition, five previously unpublished complete sequences for mitochondrial (mt)DNA proteins are used (from Mindell et al., 1999; see also Mindell et al., 1997). These are for the taxa Passeriformes – village indigobird Vidua chalybeata #AF090341; Falconiformes – peregrine falcon Falco peregrinus #AF090338; Struthioniformes – greater rhea Rhea americana #AF090339; and Anseriformes – redhead duck Aythya americana #AF090337; and for the American alligator Alligator mississippiensis #AF069428. For details of the sequencing of these genes see either of the above papers. In addition, the complete mt 12S-rRNA sequences used (#U83709–U83787) are from Mindell et al. (1997). These data sets (called SSBAA and SSB12S, respectively) are available from the Web site www.utexas.edu/ftp/depts/systbiol/.

Sequences were converted from nucleotide sequences to inferred amino acid (AA) sequences by using the vertebrate mtDNA code. The sequences were carefully aligned by eye, and any regions of ambiguity for amniotes, frogs, coelacanths, or ray-finned fish were excluded. This data set, SSBAA, has 3362 sites. Also excluded is the ND6 gene; the only protein-coding gene encoded on the light strand, it shows distinct mutation biases. Therefore, it is not appropriate to mix it in with the homogeneous model-based methods used for analyzing the other 12 genes.

Analyses

The program PAUP* d63-64 (Swofford, 1998) was used for all parsimony and nucleotide level ML analyses using TBR



FIGURE 1. A clock-constrained ML tree of 13 slow mammalian AA mtDNA sequences. The model used the mtmam (mammalian mitochondrial) transition matrix of CodeML, and assumed site rates followed a discrete Γ distribution with 8 equal-sized rate classes (lnL = -26224.62, Γ shape = 0.249). The equivalent nonclock model had lnL = -26207.25 and Γ shape = 0.288. The date at each node, in millions of years before present (mybp), is that after considering ancestral polymorphism. The SE values shown take into account uncertainties related to the calibration point (shown in bold), finite sequence length, and ancestral polymorphism.

searches, while Phylip 3.5 (Felsenstein, 1993) was used for Neighbor Joining (NJ) trees. To count changes on each edge of a 3-taxon tree, we used MacClade 3.05 (Maddison and Maddison, 1992). Evaluations of trees on the basis of likelihood were made with ProtML in MOLPHY (Adachi and Hasegawa, 1996), whereas the clock-constrained trees were evaluated with CodeML in PAML 1.4 (Yang, 1997); in either case, the supplied empirical rate matrix for mitochondrial proteins was used. Both of these programs, plus PAUP*, allow the use of the normal approximation test of the difference of two log-likelihood (lnL) scores (Kishino

and Hasegawa, 1989). Bootstrapping followed Felsenstein (1985), except with ML for amino acids, where the Resampled Estimated Log Likelihood (RELL) approximation was used (see Adachi and Hasegawa, 1996). From the results for constrained and unconstrained trees, standard likelihood ratio tests of the clock were performed (e.g., Felsenstein, 1993; Swofford et al., 1996). Calculations of branching times (and their SEs) were made on a spreadsheet modified from that used in Waddell and Penny (1996). For checking the rooting point of the bird subtree, constant site removal (CSR) LogDet distances applied to AAs were used in conjunction with NJ. For clarity, these last two methods are described in more detail below.

RESULTS

Identifying Mammalian Sequences with Faster Rates

Judging the rates of sequence evolution on rooted phylogenies by eye when using tree reconstruction methods can be hazardous. Parsimony, especially, detects more convergent and parallel changes when there is a better sampling of sequences within a group (Swofford et al., 1996). Consequently, the long unbranched lineages are most likely to show the greatest underestimation of rates. This same factor also affects likelihood if the model has underestimated the total number of substitutions. The appearance of scaled trees suggests this factor may affect NJ less than the previous character-based methods; such trees often tend to have relatively longer terminal edges and shorter internal edges, irrespective of the distance used. A consequence of the above factor is that likelihood ratio tests will often reject the clock when in fact the data are close to clocklike, especially when branching patterns are highly asymmetric. Use of the likelihood ratio can also lead to false acceptance of the clock if the faster lineages are deeper in the tree and if these taxa have poor species sampling.

Tests that are immune to these factors can be based on three species: a nominated outgroup and two ingroup taxa (or, for a tree-based clock test, two outgroups and two ingroups). One such test, unambiguous parsimony sites, is described by Mindell and Honeycutt (1990). An essentially identical test was described by Tajima (1993). The parsimony test is as follows: Count only unambiguous changes among the three species, and assume that the lengths of the external edges to the two ingroup taxa are equal under the clock, with a binomial sampling error. In our data, because the counts of changes are well above 10 per lineage, a Pearson statistic test of the equality of the two lineages is appropriate. The test statistic is

$$X^2 = \frac{(a - e)^2}{e} + \frac{(b - e)^2}{e} = \frac{(a - b)^2}{a + b},$$

where a is the length to ingroup 1, b is the length to ingroup 2, and e, the expected value given equal rates, is (a + b)/2. Then X² is, assuming independence of sites, asymptotically chi-squared distributed with 1 d.f. (this is effectively a two-tailed test). A value of 3.84 or higher is significant at the 5% level.

The results of the relative rates tests are shown in Table 1. Taking the ostrich as the outgroup (the slowest of the published reptile mtDNA sequences, as tested against frog), we evaluated the rate difference between platypus and various other mammalian mtDNA lineages. The platypus was selected, as it has been suggested as a relatively slower rate taxon (e.g., Gemmell and Westerman, 1994, although this is contradicted by the tree they show). Our results suggest otherwise, since the rate in opossum is not significantly higher than in platypus. Further, the rate in wallaroo is apparently slower than in either platypus or opossum, although rechecking with use of a closer outgroup (platypus) showed no significant difference between wallaroo and opossum. We see no good evidence that the platypus is slower than the marsupials. Note that our finding that marsupials could be slower than either monotremes or placentals is consistent with Figure 3 in Gemmell and Westerman (1994) but is opposite to their own conclusions. The tests performed here (Table 1) show platypus is slower than many of the taxa that tend to diverge deepest in the placental subtree (e.g., hedgehog, murid rodents, primates, elephant). However, there is no evidence that the platypus is slower than the placental taxa that tend to cluster as nearest relatives in the fereuungulate group [here represented by cetartiodactyls, perissodactyls, carnivores; see Waddell, Okada, et al. (1999) for the definition of this group, which also includes pangolins or pholidotes], nor is it slower than rabbit or armadillo.
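The three-taxon test above reduces to a one-line statistic, so it is easy to check by hand or in code. The following is a minimal Python sketch, not the authors' implementation: the function name `relative_rate_test` and its argument names are hypothetical, and the p-value relies on the identity that the upper tail of a chi-square distribution with 1 d.f. equals erfc(sqrt(x/2)). The counts in the usage lines are taken from Table 1.

```python
import math


def relative_rate_test(a, b):
    """Three-taxon relative rate test on unambiguous change counts.

    a, b: numbers of unambiguous changes on the edges leading to
    ingroup 1 and ingroup 2 (hypothetical argument names).
    Returns (X2, p), where X2 = (a - b)^2 / (a + b) and p is the
    upper-tail probability of a chi-square with 1 d.f.
    """
    if a + b == 0:
        raise ValueError("no changes observed on either lineage")
    x2 = (a - b) ** 2 / (a + b)
    # For 1 d.f., P(chi2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(x2 / 2.0))
    return x2, p


# Usage with counts from Table 1 (ostrich as the outgroup).
for label, a, b in [("platypus vs opossum", 187, 191),
                    ("platypus vs wallaroo", 210, 167)]:
    x2, p = relative_rate_test(a, b)
    print(f"{label}: X2 = {x2:.2f}, significant at 5% = {p < 0.05}")
```

Because the statistic depends only on the two ingroup counts, the same function can be applied to any row of Table 1 or any cell of a pairwise table.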
This result runs counter to conventional wisdom; the implications are considered at the end of this section. Next, we shift the outgroup inwards to further assess rate changes in placentals (Table 1). The platypus now becomes the outgroup, and a fairly typical fereuungulate, the cow, becomes the fixed ingroup. We now see even stronger evidence that the lineages usually placed near the root of the placental subtree have accelerated rates (this time, including the guinea pig). The hedgehog, primates, and elephant are clearly the most accelerated. There is good fossil evidence, particularly large body size (Shoshani and Tassy, 1996), suggesting that proboscideans have had significantly longer generation times and slower metabolic rates than most mammal species (including rodents) for at least 50 million years. The results in Table 1 are the strongest evidence so far that the elephant's mtDNA protein evolution is completely at odds with generation time and with general metabolic rate theories of mutation rate. Also note the failure to detect significant rate variation among the fereuungulates and close relatives, with one exception: The whale has a significantly faster rate than cow. Like the elephant, the accelerated rate in whale contradicts the generation time and metabolic rate hypotheses, again with a fossil history of more than 50 million years (e.g., Thewissen and Madar, 1999) suggesting no recent changes in the attributes of large size and long generation time. The possibility of a faster rate attributable to adaptation to aquatic life does not hold up well, for seals apparently do not show an accelerated rate (Table 1). As a final check, we used rabbit as a still closer outgroup to the fereuungulates. All previous conclusions hold; the rate in whale is confirmed as faster, but there is no significant evidence for a faster rate in armadillo. The apparently higher rate in placental taxa found mostly near the root of the reconstructed trees is notable and is discussed further later.

Tree for Placentals Showing Slower Rates

With such a clear-cut difference in rates of sequence evolution, we analyzed the slower-rate species separately from those showing faster rates.
With ProtML, the optimal tree is identical to the ML tree and the AA LogDet NJ subtree reported in


Fig. 1 of Waddell, Cao, et al. (1999) obtained after the rapidly evolving mammalian taxa and the three atypical fish species are pruned off, except that the armadillo moves outside of rabbit (with 60% RELL bootstrap support). This new tree is shown in Figure 1. We speculate this reversal is due to a signal in the larger data set uniting rabbit with the earlier diverging rodents (a Glires signal), while the rodents themselves also show evidence of being attracted towards the root (perhaps by the long-edged hedgehog, as discussed in Waddell, Cao, et al., 1999). Strikingly, the RELL support for the carnivore–perissodactyl group is down to 54%, and even less with invariant sites removed. The early diverging position of armadillo is consistent with analyses in Waddell, Cao, et al. (1999), where the armadillo and the elephant moved outside the rodents, lagomorphs, and primates in analyses where only the more slowly evolving sites were selected. Neighbor joining shows some further changes from the ProtML tree for analyses of just the slower rate species. The armadillo diverges first of the placentals (e.g., according to observed distances, with 52% bootstrap support). The alternative of rabbit diverging first gets 44%, while the third local alternative of armadillo + rabbit gets only 4% bootstrap support. Also, Cetungulata (cetartiodactyls + perissodactyls) becomes optimal and receives 52% of the bootstrap support, while the carnivore + perissodactyl grouping receives 46%, and the third local rearrangement carnivore + cetartiodactyl just 2%. Parsimony analyses go further, with support as great as 68% for Cetungulata and 68% for armadillo diverging first. Coincidentally, support for Marsupionta drops to 70%, but the alternative of monotremes + placentals gets the remaining 30% (leaving near zero support for the traditional tree).

TABLE 1. Three-taxon unambiguous parsimony reconstruction rate tests.

Outgroup    Ingroup 1   No. of subs.   Ingroup 2      No. of subs.   X²
ostrich     platypus    187            opossum        191             0.04
ostrich     platypus    210            wallaroo       167             4.90*
ostrich     opossum     136            wallaroo        89             9.82*
platypus    opossum     158            wallaroo       127             3.37

ostrich     platypus    196            hedgehog       315            27.71*
ostrich     platypus    189            mouse          254             9.54*
ostrich     platypus    193            rat            258             9.37*
ostrich     platypus    210            guinea-pig     240             2.00
ostrich     platypus    214            human          308            16.93*
ostrich     platypus    206            rabbit         213             0.12
ostrich     platypus    221            elephant       305            13.41*
ostrich     platypus    211            armadillo      225             0.45
ostrich     platypus    219            cow            210             0.19
ostrich     platypus    229            blue whale     222             0.11
ostrich     platypus    208            horse          203             0.06
ostrich     platypus    214            white rhino    211             0.02
ostrich     platypus    202            cat            218             0.61
ostrich     platypus    211            harbor seal    216             0.06

platypus    cow         164            hedgehog       361            73.92*
platypus    cow         176            mouse          244            11.01*
platypus    cow         181            rat            251            11.34*
platypus    cow         176            guinea-pig     241            10.13*
platypus    cow         143            human          323            69.53*
platypus    cow         163            gibbon         313            47.27*
platypus    cow         163            rabbit         159             0.05
platypus    cow         141            elephant       314            65.78*
platypus    cow         153            armadillo      177             1.75
platypus    cow          87            blue whale     126             7.14*
platypus    cow          94            horse           86             0.36
platypus    cow          97            white rhino    105             0.32
platypus    cow          96            Indian rhino   106             0.50
platypus    cow          91            cat            113             2.37
platypus    cow         104            harbor seal    119             1.01

rabbit      cow         177            armadillo      194             0.78
rabbit      cow         104            blue whale     153             9.34*
rabbit      cow         128            horse           99             3.70
rabbit      cow         136            white rhino    108             3.21
rabbit      cow         134            Indian rhino   121             0.66
rabbit      cow         140            cat            124             0.97
rabbit      cow         138            harbor seal    118             1.56

* P < 0.05.

Analyzing just the faster placental sequences with all outgroups gives the same tree as the ProtML tree using all sites (Waddell et al., 1999) after the slower-rate species have been pruned off. The support for whale and elephant together (the only two "ungulates") rises to 99% local RELL support, perhaps because of some type of long-branch attraction. Under the clock constraint, and with a discrete Γ distribution (with 10 rate classes) and the mtmam rate matrix of CodeML, the ML tree was identical to the ProtML tree for the slower rate species. This tree is shown in Figure 1. Placing the rabbit earlier than the armadillo dropped the likelihood to -26230.10, or about 6 lnL units worse. Placing rabbit and armadillo together was

even worse (-26232.43), mimicking the results with other methods. The alternative arrangement with platypus as sister to the other mammals was worse by 12.2 lnL units.

Divergence Times for Placental Mammals

Figure 1 shows the optimal clock-constrained ML tree for all sequences that do not fail the 3-taxon rate test. The clock model is significantly worse than the nonclock



model (Δ lnL = 17.37 lnL units, p < 0.001). Removing marsupials and monotremes, the fit improves considerably (Δ lnL = 10.53, d.f. = 8, p = 0.01). However, by the AIC criterion (e.g., Adachi and Hasegawa, 1996), the clock is close to acceptable, and with the BIC criterion, quite acceptable. Note that even if a clock is rejected, simulations in Sanderson (1997) suggest that an ML clock model offers better estimates of relative divergence times (because of smaller sequence length-related sampling errors) than does fitting an autocorrelated model to allow for unequal evolutionary rates, provided the rejection margin for the clock is not large. The best calibration point for the species in this tree is the divergence between horse and rhino. The first horse fossils are commonly accepted to be from as long ago as the earliest Eocene, or 54.8 million years before the present (mybp); fossils that seem to represent the sister rhino–tapir ancestral lineage are of a similar age (e.g., McKenna and Bell, 1997). This is about as good as fossil calibration points get, in that there exist multiple good fossils representing both sister lineages, which appear in appropriate chronological order. Indeed, this calibration point is the only one in the whole tree of Figure 1 for which there is not considerable uncertainty (see below). To allow for possible misidentification of the very earliest fossils, we consider this split could be as young as 52 mybp. To account for the already differentiated fossils of the two lineages appearing rather suddenly (probably from migration to the fossil sites), we consider the split could be as much as 58 mybp (a conservative estimate; e.g., D. Archibald, pers. comm.). So we have a conservative 55 mybp calibration point (SE ≈ 1.5). Here, we are assuming the two extremes of 52 and 58 mybp are each 2 SE from the median of 55 mybp, a simple way of inferring standard errors that we hope to refine in the future.
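The clock comparison reported earlier in this section can be reproduced approximately from the log-likelihood differences alone. The sketch below is an illustration under stated assumptions rather than the procedure actually run in CodeML: it takes the test statistic as twice the lnL difference, uses the usual AIC and BIC penalties (2 per parameter and ln n per parameter), and assumes n is the 3362 aligned sites of the SSBAA data set. The helper names are hypothetical, and the closed-form tail probability is valid only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square with even d.f.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even d.f.")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def clock_tests(delta_lnl, df, n_sites):
    """Compare clock and nonclock models from their lnL difference.

    delta_lnl: lnL(nonclock) - lnL(clock), a positive number.
    df: number of extra free parameters in the nonclock model.
    n_sites: sample size used in the BIC penalty (an assumption).
    """
    stat = 2.0 * delta_lnl
    return {
        "lrt_stat": stat,
        "p_value": chi2_sf_even_df(stat, df),
        # The clock model is preferred when the gain in fit does
        # not pay for the extra parameters.
        "aic_prefers_clock": stat < 2.0 * df,
        "bic_prefers_clock": stat < df * math.log(n_sites),
    }


# Placentals only: delta lnL = 10.53 with 8 d.f., as quoted in the text.
print(clock_tests(10.53, 8, 3362))
```

With these inputs the likelihood ratio test still rejects the clock and the AIC comparison favors the nonclock model by a modest margin, whereas the BIC comparison favors the clock, in line with the qualitative statements in the text.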

Comprehensive Divergence Time Standard Errors

Next we follow and refine the steps used in Waddell and Penny (1996) to calculate a comprehensive estimate of SE.


Six possible sources of error affecting these calibrations are (1) error on the edge lengths of the tree, (2) error on the fossil calibration time, (3) ancestral polymorphism, (4) sequence errors, (5) erroneous model assumptions, and (6) error in sequence alignment. Here we integrate (i.e., accommodate) the first three sources of error. Factor 6 is unlikely to be a problem here, for the protein regions align well, and we have been conservative in excluding areas of ambiguous alignment. Evaluating the total error due to model assumptions, factor 5, is beyond the scope of this paper (see Waddell, 1995:344–346, for ways this may be done). Sequencing errors, factor 4, are considered in more detail later. Here we give a worked example of the steps in integrating errors 1–3, while a more comprehensive treatment of errors 1–6 is in preparation (P.J.W.). The errors on the edge lengths in the ML tree are assessed by using the delta method (Felsenstein, 1993, and references therein, as output by CodeML). The errors on the calibration point have already been mentioned, and the calibration date is 55 mybp ± 1.5. Let us assume that the average coalescent time for the mtDNA species in this tree is typical of species today. A coalescent time of 1 million years would seem to be appropriate (e.g., Hudson, 1990), if possibly a slight overestimate for some taxa (e.g., humans). If we assume that the ancestral population was in equilibrium, then the SE of the coalescent time for two lineages is equal to the mean (Hudson, 1990). The horse/rhino split is inferred to be 55 mybp, having occurred 0.05580 substitutions per site ago on the ML tree in Figure 1. Thus, ancestral polymorphism amounts to an expected 1/(1 + 55) × 0.05580 = 0.000996 substitutions per site, so the variance is 0.000996².
The inferred length of this, and all other external edges, is then adjusted by subtracting this expected effect of polymorphism, so that the estimated horse/rhino split of 55 mybp is considered to be 0.05580 – 0.000996 = 0.054804 substitutions per site before present. The calculation of the SE values in Figure 1 is now illustrated with the first split among the fereuungulates.



1. The errors attributable to edge length are combined with those attributable to ancestral polymorphism. For example, the divergence time of horse/rhino is 0.054804 substitutions per site before present, with variance 0.00338² = 0.0000114 from the ML tree. Adding in the variance from polymorphism gives 0.0000114 + 0.0000010 = 0.0000124 for the horse/rhino split. Do likewise for all other splits; for example, the split of cow from the other fereuungulates is at 0.095094, with variance 0.0000202 + 0.0000010 = 0.0000212.
2. Estimating the ratio of the cow/horse split to the horse/rhino split gives 0.095094/0.054804 = 1.735.
3. Calculate the variance of this ratio by using $\mathrm{Var}[x/y] = (E[x]/E[y])^2 \left( \mathrm{var}[x]/E[x]^2 + \mathrm{var}[y]/E[y]^2 - 2\,\mathrm{cov}[x, y]/(E[x]E[y]) \right)$. As in Waddell and Penny (1996) we ignore the covariance term; using the numbers above, we get Var[x/y] = 0.159².
4. The inferred divergence time is 55 (SE = 1.5) × 1.735 (SE = 0.159) = 95.43. Its variance is $\mathrm{Var}[uv] = E[u]^2\,\mathrm{var}[v] + E[v]^2\,\mathrm{var}[u] + \mathrm{var}[u]\,\mathrm{var}[v]$, which gives a final variance of 83.48 (with square root ≈ 9.1).
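Steps 1 to 4, together with the polymorphism correction that precedes them, can be sketched in a few lines of Python. This is an illustration only, not the authors' spreadsheet: the function names are hypothetical, the covariance term is set to zero as in the text, and the inputs are the rounded values printed above, so the output should not be expected to reproduce every published digit. In particular, with these rounded inputs step 3 evaluates to roughly 0.14 rather than the printed 0.159, so the sketch carries the printed value into step 4 to keep the final figures comparable with the text.

```python
import math


def polymorphism_correction(depth, calib_age, coalescent=1.0):
    """Expected ancestral polymorphism, in substitutions per site.

    depth: depth of the calibration node on the ML tree (subs/site).
    calib_age, coalescent: in millions of years.
    The SE is taken equal to the mean, so variance = mean^2.
    """
    mean = coalescent / (coalescent + calib_age) * depth
    return mean, mean ** 2


def ratio_variance(ex, var_x, ey, var_y, cov_xy=0.0):
    """Delta-method variance of x / y (step 3)."""
    ratio = ex / ey
    return ratio ** 2 * (var_x / ex ** 2 + var_y / ey ** 2
                         - 2.0 * cov_xy / (ex * ey))


def product_variance(eu, var_u, ev, var_v):
    """Variance of the product of independent u and v (step 4)."""
    return eu ** 2 * var_v + ev ** 2 * var_u + var_u * var_v


# Calibration: extremes of 52 and 58 mybp treated as 2 SE either side of 55.
calib_age = 55.0
calib_se = (58.0 - 52.0) / 4.0

# Polymorphism correction at the horse/rhino calibration node.
poly, poly_var = polymorphism_correction(0.05580, calib_age)
horse_rhino = 0.05580 - poly

# Step 1: add the polymorphism variance to the ML-tree variances.
var_horse_rhino = 0.00338 ** 2 + poly_var
var_cow = 0.0000202 + poly_var

# Step 2: ratio of the cow/horse depth to the horse/rhino depth.
ratio = 0.095094 / horse_rhino

# Step 3: SE of that ratio from the rounded inputs (covariance ignored).
ratio_se = math.sqrt(ratio_variance(0.095094, var_cow,
                                    horse_rhino, var_horse_rhino))

# Step 4: scale by the calibration date, using the printed ratio SE of 0.159.
age = calib_age * ratio
age_se = math.sqrt(product_variance(calib_age, calib_se ** 2,
                                    ratio, 0.159 ** 2))

print(f"ratio = {ratio:.3f}, ratio SE from rounded inputs = {ratio_se:.3f}")
print(f"age = {age:.1f} mybp, SE = {age_se:.1f}")
```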

Thus, the age and SE on the first split in the fereuungulates is estimated to be 95.4 mybp ± 9.1. Note that the SE is considerably larger than the type of figures usually reported. For example, some would infer the SE of this split by scaling up the SE on the tree by the calibration factor, which would translate to just 4.5 million years. Others might take extreme fossil dates of the horse/rhino split as, say, 53 and 57 mybp (with everything else treated as though perfectly known), which would imply an SE of only 1.7. Clearly, neither approach is adequate. Although the cow/horse split has a fairly large SE, we can still confidently say that the radiation of the fereuungulates began considerably before the Cretaceous/Tertiary (K/T) boundary (at least 77 mybp). Such times (from about 75 to 100 mybp) accord well with the Zhelestids being near the root of the ungulates (Archibald, 1996). We do not believe the ungulates are a real clade (see Waddell, Cao et al., 1999; Waddell, Okada et al., 1999), but the Zhelestids may well be near the root of the fereuungulates; early cetartiodactyl, carnivore, and perissodactyl fossils in the Asian region are consistent with this. Other features of note in this tree, including the divergence times of armadillo/elephant, whale/cow, and within carnivores, are considered in the discussion.

In addition, we can infer the whale/cow split by using the tree in Figure 1 to date the age of the separate lineage leading to cow, then multiplying this age by the ratio of the length of the external cow edge (0.072) to the sum of the external cow + common (whale + cow) edges estimated on the unconstrained ML tree. This gives 0.072/(0.032 + 0.072), or 69%, times 95 = 65 (million years). This is slightly older than the dates commonly inferred, which are 55 mybp (the oldest suggested fossil date) to 60 mybp. These latter are perhaps underestimates, as the hippo sequence is not yet available, and hippos appear to be the nearest living relatives to whales. Similarly, we can date the split of the armadillo (Xenarthra) from the elephant (representing the African clade). This gives a ratio of 0.146/(0.146 + 0.0372) = 0.80, and 80% of 152 million years is 122 million years.
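The edge-ratio dating used for the whale/cow and armadillo/elephant splits is a single proportion. A minimal sketch follows, with a hypothetical function name and argument names; the edge lengths and node ages are the rounded values quoted in the text, so the results agree with the printed dates only to within about a million years.

```python
def split_age(node_age, external_edge, shared_edge):
    """Date a split inside a lineage whose origin is already dated.

    node_age: age (mybp) of the node where the lineage arose.
    external_edge: length of the terminal edge below the split.
    shared_edge: length of the common edge above the split.
    Both edge lengths come from the unconstrained ML tree.
    """
    fraction = external_edge / (external_edge + shared_edge)
    return fraction * node_age


# Values quoted in the text for the two worked cases.
print(f"whale/cow: {split_age(95.0, 0.072, 0.032):.1f} mybp")
print(f"armadillo/elephant: {split_age(152.0, 0.146, 0.0372):.1f} mybp")
```

Using the unconstrained edge lengths only for the proportion, and the clock tree for the node age, keeps a lineage with a locally faster rate (here the whale) from distorting the date of the deeper node.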

Checking Relative Rates in Birds

Here we follow the same steps described above applied to the complete mtDNA AA sequences of six avian lineages. With passerines coming out at the root of the tree for birds in other analyses (Mindell et al., 1997, 1999; Härlid et al., 1998; Adachi and Hasegawa, 1996), an immediate question must be, "Is the evolutionary rate in this lineage accelerated?" If so, we might suspect the result to be an artifact; rate heterogeneity is widespread among vertebrates, including birds (e.g., Mindell et al., 1996; Bleiweiss, 1998; Nunn and Stanley, 1998). Since generalizing about rates across taxa and genes is problematic, it is best to make rate tests on the data at hand. As there are few taxa, we show all pairwise 3-taxon tests in Table 2. Remarkably, there are only two marginally significant results, for chicken/duck and duck/rhea pairings, and neither of these is confirmed by using the alternative outgroup. For the table overall, no general trends could be discerned, except that the passerine had the higher rate in all pairwise



TABLE 2. Relative rate test of amino acid changes. Above the diagonal the alligator is the outgroup, while below the diagonal the outgroup is platypus. In the upper-right triangle numbers are unambiguous substitutions on the edge to taxon i (row) then j (column). Results of the lower-left triangle are reversed, i.e., substitutions for taxon j then i, to allow for easier comparison with the equivalent entry in the upper triangle.

            Ostrich    Rhea       Chicken    Duck       Falcon     Passerine
Ostrich     .          47/54      95/76      77/87      99/98      117/130
Rhea        59/53      .          106/80     89/92      110/102    133/139
Chicken     90/102     87/105     .          72/101*    97/115     123/155
Duck        79/102     77/106*    85/96      .          112/101    122/125
Falcon      108/111    106/115    117/108    120/100    .          124/138
Passerine   124/153    124/156    140/157    127/133    128/154    .

* P < 0.05.

comparisons. The same tests based on first and second positions also fail to give significant results. Perhaps the outgroups are too far separated, yet clear rate differences were apparent between mammalian species tested with equally distant outgroups.

The Bird Subtree's Root

Although the mtDNA of birds shows no signs of unequal evolutionary rates by the above tests, it is useful to check the tree with what is expected to be another robust method of tree estimation, the invariant sites or CSR-LogDet distance (Waddell, 1995:118–124; Waddell and Steel, 1997; Waddell et al., 1997), combined with NJ. The CSR-LogDet is a modification of the standard LogDet (Barry and Hartigan, 1987; Lockhart et al., 1994) that allows for site rate heterogeneity, as well as unequal AA composition. Figure 2 shows the optimal tree. The result of 1000 bootstrap replicates was (((((ostrich, rhea):1000, (chicken, duck):580):959, falcon):999, passerine):1000), using the exact same data (set 2) and programs as in Waddell, Cao et al. (1999). The tree is identical to that in Mindell et al. (1997), whereas the bootstrap support shown is slightly higher in parts. Rooting so that passerine diverges first is still strongly supported. Next, weighting as zero the 1090 constant sites (in proportion to the base composition of the constant sites) estimated to be invariant by the capture–recapture method described in Waddell (1995) and Waddell, Cao et al. (1999) gives (((((ostrich, rhea):999, (chicken, duck):647):880, falcon):995, passerine):1000). That is, the support hardly changes; the bootstrap support for the clade consisting of Galliformes + Anseriformes climbs to 65%, whereas support for the ratite + chicken + duck group drops. Removal of sites that changed in particular groups of mammals (site stripping), a procedure aimed at removing the faster-evolving sites (see Waddell, Cao et al., 1999), did not change the tree. However, bootstrap support values did change. For example, removing all sites that change in the Primate clade drops the bootstrap support of the partition with ratites, chicken, and duck down to 74% (with falcon occasionally coming in), whereas the support for the anseriform + galliform clade drops below 50% (yet support for the root at passerines is unaffected). A number of other methods such as those applied to mammals in Waddell, Cao et al. (1999) were also tried, but again none changed the root. Contradictory evidence regarding the passerine root includes the rooting of the 12S–16S rRNA data at positions closer to Anseriformes and Galliformes (Mindell et al., 1997). While the support for the root with 12S–16S is weaker, it is in closer agreement with the conventional view that the earliest split among living birds separates the paleognathous birds (ratites and tinamous) from all others (e.g., Cracraft and Mindell, 1989; Sibley and Ahlquist, 1990).

Rhea and Alligator Sequencing Error Estimates

During the writing of this paper, a second complete mtDNA sequence from the same species of rhea was published (Härlid


128

SYSTEMATIC BIOLOGY

VOL. 48

FIGURE 2. A clock-constrained ML tree of bird AA mtDNA sequences. The model is that used in Figure 1 but with (lnL = –17250.42, G shape = 0.281). The equivalent nonclock model had lnL = –17249.92 and shape = 0.281. The date at each node takes into account ancestral polymorphism. The one SE shown takes into account uncertainties in the calibration point (shown in bold), the nite sequence length, and ancestral polymorphism.

et al., 1998). Its inclusion allows an assessment of intraspeci c diversity and, perhaps more importantly, a re ned estimate of sequencing errors (Waddell, 1995; Waddell and Penny, 1996). The regions we have analyzed have a total of 16 transition and 17 transversion differences. The speci c numbers of each type of change are AC 1, AG 2, AT 0, CA 0, CG 7, CT 2, GA 5, GC 7, GT 0, TA 2, TC 2, and TG 0. Surprising is the very low transition/transversion ratio, and the high rate of GC and CG transversions. Examination of the sequences shows that at least 8 of these can be traced to 4 GC order-sequencing errors, all in the H¨arlid et al. (1998) sequence—a common type of error with manual sequencing. If the unambiguous nucleotide changes are mapped by

using parsimony, 5 appear on the terminal edge to the Mindell et al. (1997, 1999) sequence (henceforth the new sequence), but 20 are on the edge to that of Härlid et al. Overall, it seems likely that the sequencing error rate is worse than 1/1000 in the Härlid et al. sequence. For AA, the probable errors show up even more clearly. The new rhea sequence has a terminal branch length of zero, while that of Härlid et al. has a terminal branch of length 10 among the 3362 sites used in our alignment (and 13 among all the protein-coding sites we aligned prior to selecting the best-aligned sites; data not shown). The effect of these errors on estimated edge lengths can be gauged by measuring the edge length with one, and then the other, removed. Under parsimony, and with only the unambiguous changes on the tree in Figure 2 used, the new sequence gives the subtree ((Rhea: 97, Ostrich: 90): 55) versus ((Rhea: 105, Ostrich: 89): 54) for the sequence of Härlid et al. These numbers can be used for a simple divergence time estimate. The time to the paleognath stem ancestor is (A + S)/A times the time to the rhea/ostrich split (a possible calibration point), where A equals (rhea + ostrich)/2 and S is the length of the stem edge (55 or 54). With the new sequence, A = (97 + 90)/2 = 93.5 and the ratio is (93.5 + 55)/93.5 = 1.588; with the Härlid et al. sequence, A = (105 + 89)/2 = 97 and the ratio is (97 + 54)/97 = 1.557. That is, the paleognath stem is inferred to be 158.8% of the time since the ostrich/rhea split when using the Mindell et al. sequence but 155.7% when using the Härlid et al. sequence. This 3.1% difference is probably largely attributable to sequence error.

Two complete alligator mtDNA protein sequences have also been submitted to GenBank (Janke and Arnason, 1997; and in June 1998, that of Mindell et al., 1997, 1999). Here too there is evidence of sequencing errors. The originally submitted sequence of Janke and Arnason (GenBank, February 1998) had differences AC 1, AG 0, AT 0, CA 0, CG 1, CT 2, GA 2, GC 0, GT 0, TA 2, TC 1, and TG 1, for a total of 10 differences in the 3362 codons we analyzed. After two updated rereleases to GenBank (May and July 1998) the Janke and Arnason sequence converged to the Mindell et al. sequence, showing just two transition differences (GA and TC). The tree of unambiguous changes clearly shows that only the last Janke and Arnason sequence is very close to the new sequence. Among the AA changes, the differences between the two sequences shrank from 5 to just 1. The tree of these alternative sequences shows that the first two Janke and Arnason releases are at least 3 AA substitutions closer to the ancestral sequence, suggesting that ambiguities may have initially been resolved by taking into account the sequences of other taxa. Thus, the Janke and Arnason alligator sequence shows clear evidence of about 1 sequencing error per 1.2 kb. Sorenson et al.
(in press) compared two ostrich sequences by these same two groups and found similar types and rates of errors. Error rates in sequencing seem to be dependent on both the individual researcher and the techniques used. All else being equal, the most reliable method is probably direct machine-sequencing of polymerase chain reaction (PCR) products (being sure that the PCR is done under strict annealing conditions). A second choice is machine-sequencing of 3 or more clones, each from a separate PCR product, and hand-sequencing of a single clone (of any origin) is likely to have the worst expected rate. For a recent publication of shark mtDNA (Cao, Waddell et al., 1998), the second choice was used. Here, sequences of ~5 kb for clones from separate PCR products of the same sample tended to have an average of 3–4 differences. The rule employed was to call a base only if it was identical for the two strands in each clone. To resolve disagreements, flanking primers were made and the resulting sequence from this amplification was used to break ties. The raw clone figures suggest a sequencing error rate of about 1 per 2 kb, which the checking should cut back by at least a factor of two, one would hope, so an error rate of 1/4000 or better may be achieved.

Possible Avian Calibration Dates

Here we forgo relying on a calibration point well outside of birds, and develop one within birds. A calibration point often used is the split of crocodilians from birds (variously 230 to 270 mybp), or mammals from birds (about 300 to 350 mybp; e.g., Carrol, 1987; Benton, 1990). The problem with these dates is not just their uncertainty, but more critically, the assumption that the rate of evolution in the long lineage leading to birds has not changed appreciably. This seems very unlikely, given the relatively rapid rates of crocodilian mtDNA evolution (Janke and Arnason, 1997) and evidence for slower rates in some birds relative to other tetrapods (e.g., Adachi et al., 1993; Mindell et al., 1996).
This also places a premium on molecular evolutionary models being able to accurately estimate many multiple hits when site rates vary appreciably, when the AA composition may shift, and when the true model of substitution is very uncertain (see discussion of Waddell et al., 1997).

A possible calibration point within birds is the divergence time of rhea and ostrich, assuming an ancient Gondwanic distribution. This could be up to 100 million years, if we take it from the first time Africa and South America were fully separated by the proto South Atlantic Ocean (Smith et al., 1994). However, this interpretation is made more difficult by (1) evidence that rhea and ostrich may not be closest relatives (Cooper, 1997; but see Lee et al., 1997), (2) fossils suggesting the first ratites were capable of sustained flight (Houde, 1986), and (3) inference of an ostrich/rhea split about 50 mybp, based on a crocodilian/bird split of 254 mybp (Härlid et al., 1998). A putative rhea ancestor fossil (a large leg bone) from South America is dated at close to 60 mybp (Feduccia, 1996:285), but no clear synapomorphies linking it to ratites, or more specifically, to the Rhea lineage, have been reported. Given the appearance of many unrelated giant birds in the early Paleocene, the assignment of this fossil unfortunately remains uncertain.

The fossil anseriform Presbyornis is potentially useful as a calibration point within birds. The oldest fossils well assigned to this taxon (the skull seems crucial for confident identification) are early Eocene from Wyoming and Utah (approximately 53–55 mybp). Livezey (1997) has analyzed the cladistic position of this taxon and concluded that the younger and fairly complete Presbyornis fossils can confidently be placed within the Anseriformes. Specifically, Presbyornis appears to be related to ducks, geese, and swans (Anatidae), to the exclusion of the screamers (Anhimidae) and the magpie goose (Anseranatidae). Assignment of older fossils to Presbyornis, including a giant form in the early Paleocene about 62 mybp (Olson, 1994), is undermined by the fossils lacking critical parts (e.g., the skull) and not showing any clear synapomorphies that link them to the Eocene taxa or even to Anseriformes in general (Livezey, 1997).
Fossil discoveries by Peter Houde and Michael Daniels from the Eocene Green River Formation and the Lower Eocene London Clay, respectively, demonstrate the existence of early Eocene, up to 55 mybp, fossil anseriforms that may be associated with the modern screamer lineage (Anhimidae) (unpubl. data, discussed in Feduccia, 1996:217). With the constraint of anseriform monophyly, the Houde fossil appears as either sister to Anseres (including Anatidae) or sister to the common ancestor of both Anhimidae and Anatidae in parsimony analyses (Houde, unpubl.). Thus, there are fossils for two distinctive anseriform lineages with modern affinities dating back to at least 55 mybp. This age is supported by the age of the fossil beds and by setting the dates for the origin of these taxa to the nearest well-dated faunal turnover boundary, here the Paleocene/Eocene transition.

Total Age of the Anseriformes Stem Lineage

A refined estimate of the divergence time of Anseriformes from other birds (in particular the Galliformes, which appear likely to be closest relatives) is now made with a molecular phylogeny that includes all the main anseriform lineages. The complete 12S rRNA data set of Mindell et al. (1997) has the necessary taxon sampling. Branch lengths from these sequences are estimated after constraining to either of the published hypotheses for relationships within Anseriformes. These are either the current molecular view of (ducks*, (screamers, magpie goose)) (Mindell et al., 1997), or the prevalent morphological view of ((ducks*, magpie goose), screamers) (Livezey, 1997). The calibration point assumed here is Presbyornis (its position marked by *) associated with the stem lineage to ducks and geese, dating this lineage to 55 mybp in our clock-constrained trees. Sequences for all Anseriformes, Galliformes, and ratites are analyzed first. Using just the transversional changes generally gives older dates, whereas using all sites may miss multiple hits among the deeper branches, since the deeper internodes become very short. The best model for transversions optimized the frequency of purines and pyrimidines, and gave Pinv = 0.47 and Γ shape = 0.38 when these parameters were optimized jointly (indicating highly unequal site-to-site substitution rates).
The likelihood ratio statistic, with and without a clock enforced, was Δ ln L = 37.98, d.f. = 19, so initially a clock is a poor fit. This outcome appears to be related to a slower rate in ratites, as their removal from the data set above gives a significant improvement in fit (Δ ln L = 30.88, d.f. = 17). Removal of two aberrant taxa, Coturnix and Anas formosa (which are misplaced in the tree when using transversions alone), improves the fit further, so the clock is now reasonable with the remaining taxa (Δ ln L = 15.34, d.f. = 15). The inferred date of the split of Anseriformes and Galliformes was 72.3 mybp when the ratites were the outgroup, and slightly older with just Anseriformes and Galliformes in the analysis. Adding in Falconiformes (minus the taxa Circus, Gyps, Gampsonyx, Pernis, Pandion, Sagittarius, and Falco, which change position when the clock constraint is applied) as the new outgroup, we find the clock again fits reasonably (Δ ln L = 22.05, d.f. = 22), and the inferred date is 66.1 mybp (or 71.8 mybp when forcing this topology to that of the morphological tree). Overall, these dates have large SE values (not least because the mt 12S rRNA site rates are so extremely unequal). However, the only estimates > 80 mybp for the age of the Anseriform lineage came by using just transversions and depend upon two possibilities: that the screamer-like fossils really are closest to screamers, and that the morphological tree is wrong.

To recap then, the absolute minimum date for the Anseriform lineage is 55 mybp, whereas most paleontologists would suspect an age at least another 5 million years older because of the fossil diversity seen at this time. Taken with the molecular calibrations, a conservative interpretation of this evidence is that the Anseriform stem lineage began between 58 and 78 mybp, giving a midpoint of 68 mybp (SE 5) according to the heuristic approach described earlier for mammals.
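The clock tests reported here can be illustrated with a small likelihood ratio calculation. The sketch below is not the authors' program: it assumes the usual chi-square approximation for the likelihood ratio statistic, and it assumes that the Δ ln L values quoted for the 12S rRNA data are the test statistics themselves (if they were raw log-likelihood differences they would need doubling, which would change the P values). The function name `chi2_sf` is hypothetical. For Figure 2 the two log likelihoods are given explicitly in the caption, so the statistic there is unambiguously twice their difference.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Figure 2 caption: clock lnL vs. nonclock lnL; the clock has 4 fewer parameters.
stat_fig2 = 2.0 * (-17249.92 - (-17250.42))
print(f"Figure 2 clock test: statistic = {stat_fig2:.2f}, P = {chi2_sf(stat_fig2, 4):.2f}")

# 12S rRNA clock tests; reported values are ASSUMED to be the test statistics.
tests_12s = [
    ("all taxa", 37.98, 19),
    ("ratites removed", 30.88, 17),
    ("Coturnix and Anas formosa also removed", 15.34, 15),
    ("Falconiformes as outgroup", 22.05, 22),
]
for label, stat, df in tests_12s:
    print(f"{label}: statistic = {stat}, d.f. = {df}, P = {chi2_sf(stat, df):.3f}")
```

Read this way, the first 12S rRNA test rejects the clock while the last two do not, which matches the description of the clock as initially a poor fit and later reasonable.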

Bird Divergence Times

Using exactly the same ML model used for mammals earlier, we find three clock-constrained bird trees with nearly equally optimal likelihoods. The optimal tree, shown in Figure 2, includes the chicken + duck clade. The tree with chicken closer to ratites has the same parsimony length (1564), but a lnL worse by 9.5 units, whereas the third tree with duck closer to ratites has a worse parsimony score (1567) and is 7.2 lnL units worse than the best tree. Alternative trees, such as ratites diverging first, are far from optimal and are strongly rejected under the clock assumption. A likelihood ratio test of the clock against the nonclock model is nonsignificant; indeed, the clock fits remarkably well, with a fit less than 1 lnL unit worse but 4 fewer parameters. Likewise, the same test based on the best-fitting ML model (GTR + Pinv + Γ) for first and second positions fails to reject the clock.

Using the internal calibration point of 68 mybp for the Anseriformes/Galliformes split, we obtain the age estimates for other bird lineages shown in Figure 2. The times and SEs on Figure 2 are calculated as for mammals. Interestingly, even though the calibration point has a considerably larger SE, the overall SE values are only slightly worse for splits of similar age (the other sources of errors being very similar in size). Using the Anseriformes calibration, the birds have nearly the same average evolutionary rate as the slower-rate mammals. For example, the ML tree branch lengths from chicken to duck (Fig. 2) are 0.09343 × 2, divided by 2 × 69 million years (including estimated ancestral polymorphism) = 0.00135 substitutions per million years, while for horse to rhino (Fig. 1) it is 2 × 0.0558/(2 × 56) = 0.00098. Using pairwise distances (ML distances estimated under the model used in Fig. 1, with shape parameter optimized by ML on that tree) in place of branch lengths gives similar rates: 0.00119 for duck to chicken and 0.00088 for horse to rhino (thus, if anything, the horse to rhino rate seems slower).

DISCUSSION

The new calibrations and error estimates are based on an explicit model and an analytical method that are expected to return minimal sampling errors (e.g., Hasegawa et al., 1989; Kuhner and Felsenstein, 1994). Note that many of the splits we are calibrating, e.g., between Cetartiodactyla, Perissodactyla, and Carnivora, are not the age of orders. Rather, they are the time of origin for superordinal groups. Generally, we believe that the term “order” in placentals and birds should refer to crown groups (the last common ancestor of the most divergent extant lineages in an “order” and all its descendants). Adherence to this rule should be observed to avoid the confusion that will otherwise follow. It does seem possible that some mammalian “crown group” orders have origins in the late Cretaceous (e.g., rodents: Springer, 1997; Kumar and Hedges, 1998), but the data are not definitive as yet. We note here that our dates for the origins of Carnivora and Cetartiodactyla, although not likely to be Cretaceous, are older than typically assumed, especially when recognizing pigs and camels as earlier splits within Cetartiodactyla (Gatesy et al., 1999).

We have shown that sequencing errors are a concern in at least some of the sequences. Even if we take the average sequencing error rate to be about half the average of that suggested for the alligator (as originally submitted by Janke and Arnason, 1997) and the rhea of Härlid et al. (1998), we will still need to consider the effect of such errors in future calibrations (Waddell and Penny, 1996). They could be incorporated approximately by combining them with the ancestral polymorphism error term (Waddell, unpubl.). We also wonder if sequencing errors could be part of the reason for the oldest splits being proportionately overestimated by the ML model. Accurate sequences are even more critical to other studies of molecular evolution (e.g., detecting deleterious mutations; Hasegawa et al., 1998). The pattern of sequencing errors discussed here can be expected to elevate proportions of nonsynonymous substitutions more within than between related species. Other errors we did not model include the possibility of having the wrong tree, and the possibility of many lineages changing their evolutionary rates together. The latter is very difficult to test, but the former should be integrated in the future.
Divergence Times of Birds

Our estimate of the rhea to ostrich split at 39 mybp is considerably younger than the 80 to 100 mybp that has been invoked by others (e.g., Sibley and Ahlquist, 1990), which is the age expected if the cause was the opening of the South Atlantic Ocean. This younger date is consistent with the hypothesis that extant ratites are independently derived from flying Eocene paleognathous birds (Houde, 1986). Our estimate is also younger than the 51 mybp estimate by Härlid et al. (1998), who used a smaller set of taxa and calibrated with the distant bird versus crocodile split (assumed to be 245 mybp). Although both rhea and ostrich might be expected to show slower rates of molecular evolution due to a general inverse correlation between body size and rate for many vertebrates, we found no clear evidence of this (based on rates of amino acid substitution; Table 2).

Other dates for lineage origins were in the middle to late Cretaceous (Fig. 2). Assuming the tree topology is correct, these dates are inconsistent with the recent hypothesis of Feduccia (1996) that diversification among extant avian orders took place after the extinctions near the K/T boundary. This is still the case if we drop all the estimated divergence dates (Fig. 2) by the magnitude of 2 SE. If the root truly falls between paleognaths and neognaths, then all bird divergence time estimates are compromised by the ratites undeniably showing a lower rate than chicken and duck, while falcon and the passerine show considerably higher rates (this is confirmed by doing the 3-taxon tests assuming ratites are the outgroup to all other birds). This will make it much more difficult to have confidence in any divergence time estimates with these genes for birds. We should not be complacent just because methods such as the LogDet support the root in this position; all methods can be wrong if the model does not suitably approximate the real substitution process. The LogDet may be especially misled by sites with different rates showing distinct base compositions (Waddell, 1995:ch. 3). So, while Mindell et al.
(1999) show that a passerine root is still favored after adding a second passerine and a turtle to the outgroups, the ratite root may claim up to 25% of the bootstrap support under the most general ML models. The unavoidable fact in rooting the bird mtDNA trees is that all possible extant outgroups are distant, with some showing accelerated rates relative to birds, including alligator, and now also snake (Kumazawa et al., 1998). Before marsupial or monotreme sequences were available, it was not possible to reliably root the placentals (e.g., Adachi et al., 1993; Cao et al., 1998). It is interesting that the small-bodied and speciose passerines and the rodents are both groups suspected of being misplaced, and are also groups for which there is argument over their antiquity. Even now, there are renewed doubts that the root of the placentals has been identified with any confidence (Sullivan and Swofford, 1997; Waddell, Cao et al., 1999; Waddell, Okada et al., 1999; and herein). It is also possible that with distant outgroups (the tests, e.g., to the whale in Table 1, show this factor) the 3-taxon rate tests for birds may have poor power to reject the clock, if the observed AA substitutions are either near saturation or at least in a plateau phase for observable divergence.

Given such contradictory evidence, it is interesting to consider whether the rooting position of birds will change with more taxa added. However, using a large set of published cytochrome b sequences, Adachi and Hasegawa (1996) obtained a subtree identical to that in Figure 2, that is, (passerines, (falcon, (chicken, duck))), while Härlid et al. (1998), using a smaller set of sequences, obtained the subtree (passerines, (duck, (chicken, (ratites)))). Stitching these trees together gives (passerines, (falcon, (duck, (chicken, (ratites))))), a tree seen in an appreciable portion of the bootstrap replicates for the complete mtDNA data. Thus, it may be some time before the addition of bird taxa alone to the mtDNA data causes a strong shift from the inferred root location.

Divergence Times of Placental Mammals

We mentioned earlier that the horse/rhino split is the one most suitable for calibrating tree 1. Other calibration points that are sometimes considered reasonable are whale/cow and cat/seal.
Problems with using the whale calibration point for these data are that the whale sequence is atypical (faster rate); further, the taxonomic position of whale is contentious, with no clearly identified fossil sister taxa. The oldest whale fossil (Pakicetus) is 51 or 52 mybp (Thewissen and Madar, 1999), and Figure 1 suggests this may be a marked underestimate of the actual divergence time from cow. For placentals, Arnason and Gullberg (1996) have chosen to make a fossil Artiodactyl/Cetacea split, assumed to be 60 mybp (the A/C 60), their standard reference point. Since we cannot yet confidently identify a close relative to whales in the fossil record, the whale split from cow could be appreciably older or younger than 60 mybp. What we obtain in Figure 1 is a Ruminant/Cetacea standard at 65 million years (or R/C 65, SE > 7). This wide (51 to 65+ mybp) uncertainty highlights the problems currently associated with placing too much emphasis on this calibration point. Adding new sequences (e.g., hippo, pig, camel) and identifying earlier fossil members of these other main cetartiodactyl lineages should help.

The first split in carnivores is feliform (cat) from caniform (seals). However, the date for this is very contentious, with some arguing for 40 to 45 mybp, and others, especially on the basis of dental evidence, suggesting a much earlier date of at least late Paleocene (or 55 to 60 mybp; Wyss and Flynn, 1993). The molecular dating in Figure 1 suggests that this divergence was, with about 95% confidence, in the range of 49 to 72 mybp. This is distinctly older than the fossils that can confidently be assigned to either feliforms or caniforms, which are less than 42 million years old (Wyss and Flynn, 1993; J. Alroy, pers. comm.). Our findings lend coincidental support to the hypothesis of Flynn and Galiano (1982) that this split was within the paraphyletic “Miacoidea” during the Paleocene. The molecular-based date is, however, also consistent with the hypothesis that the two lineages did not arise from within Miacoidea, but that then suggests that the crucial early fossils are missing (or unidentified).
Given the strongly coincident biogeographical and vicariant timing evidence for the origin of a clade consisting of Afrotheria and the South American Xenarthra, plus the analyses discussed in Waddell, Cao et al. (1999), we name this clade the Atlantogenata. The name is derived from “Atlanto”, referring to the Atlantic ocean (and also the lost mythical land to the west of the pillars of Hercules), and “genata” for generated by (i.e., the vicariant divergence generated by the Atlantic ocean). It is defined as a crown group, that is, the last common ancestor of Afrotheria (whose orders are listed in Stanhope et al., 1998) and Xenarthra plus all this ancestor’s descendants.

The estimated date of 122 mybp for the origin of Atlantogenata is about 20% too old to be coincident with the opening of the South Atlantic, as shown in Smith et al. (1994:38–39). They place the event at about 100 mybp, which is certainly within the estimated error range. If we recalibrate the deepest splits in Figure 1 with this date, then we get a divergence time for the elephant + armadillo lineage from the others at about 138 mybp, which seems more reasonable, and a split of rabbit (and Glires?) from other placentals at about 124 mybp. These later dates are in reasonable agreement with those in Springer (1997) and Kumar and Hedges (1998), who used wholly independent data sets and methods. The split of African mammals from Xenarthra (Waddell, Cao et al., 1999) is one of the first demonstrated coincidences of a geographically localized placental superordinal clade (Africa and South America) with a splitting time in the range accepted for a major biogeographic event.

If we compare the results in Table 1 with the tree in Waddell, Cao et al. (1999), some interesting patterns emerge. On the AA LogDet NJ tree (Waddell, Cao et al., 1999:Fig. 1), the deepest placental lineages all show accelerated rates. This suggests that either (a) the ancestor had a rate perhaps like the marsupials, then increased, and finally slowed down again in the lineage leading to fereuungulates, or, (b) if the tRNA tree of Waddell et al.
(1999) is closer to the truth, the common ancestor may have had the slow rate but there was a speed-up in the group comprising hedgehog + primate + rodent, while another two rate increases independently occurred in the elephant and whale lineages. In either case, these rate changes cause problems for phylogeny reconstruction and divergence time estimates. One possibility is that calibrations that include the rodent and hedgehog sequences could yield serious overestimates for the divergence times of these lineages, and also that these mammals are being misplaced deepest in some trees. Another interesting result is that the xenarthran comes out deepest of the “slow rate” placental mammals, which adds evidence that the “faster rate” mammals could be leading to distortion of the placental tree. This is consistent with at least some of the trees in Waddell, Cao et al. (1999) and is more in accord with the morphology-based hypothesis (Epitheria) that xenarthrans are the first branch amongst all placentals. If we assess the branching times on the tree with rabbit first, then the splitting times of rabbit and armadillo basically just interchange from what is shown in Figure 1, but all other times change little (i.e., they change by less than 1/4 of 1 SE). So here at least, having the wrong root may make only a small difference to our divergence time conclusions.

Regarding a possible rate change in all placental lineages, at just the K/T boundary, we know of no mechanism to explain this. It violates the neutral model and finds no support in any particular selective model. If the change is hypothesized to be a speed-up, then a slow-down, only near the K/T boundary, then the factor increase involved would have to be very large to explain the deepest placental dates we see, making this hypothesis unlikely. One possible factor is body size increase. However, many taxa do not follow this trend; for example, lagomorphs are small but have a slow rate, whereas large elephants, whales, and crocodiles have a high rate, so there is no good empirical support for this at present.
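The recalibration on the opening of the South Atlantic discussed earlier in this section can be sketched as a simple proportional rescaling. This is an illustration only: it assumes that, on a clock-constrained tree, moving one node from its estimated age (122 mybp) to a calibration age (100 mybp) rescales every other node by the same factor. The function name `recalibrate` is hypothetical, and the assignment of the abstract's oldest estimate (169 mybp) to the split of the elephant + armadillo lineage from other placentals is an assumption, not something stated in this section.

```python
def recalibrate(age_mybp, node_estimate_mybp, node_calibration_mybp):
    """Rescale a clock-based age when one node is moved to a new calibration age."""
    return age_mybp * node_calibration_mybp / node_estimate_mybp

# Atlantogenata: estimated at 122 mybp, recalibrated to about 100 mybp.
estimate, calibration = 122.0, 100.0
print(f"estimate / calibration = {estimate / calibration:.2f}")

# ASSUMPTION: the 169 mybp figure from the abstract is the deepest placental split.
print(f"169 mybp rescales to {recalibrate(169.0, estimate, calibration):.1f} mybp")
```

Under these assumptions the rescaled age comes out close to the "about 138 mybp" quoted in the text, which suggests that a proportional rescaling of this kind is at least consistent with the reported figures.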
An interesting possibility that deserves further study is that the splitting of many major placental lineages, including those leading separately to Xenarthra, lagomorphs, rodents, non-African insectivores, the Afrotheria, Fereuungulata, and Primates, occurred quite close together in an interval about 95 to 130 mybp (after recalibrating off the Atlantic armadillo/elephant split to counter a possible trend to overestimate the deepest times). Analyses here, analyses in Waddell, Cao et al. (1999) and Waddell, Okada et al. (1999), and unpublished analyses suggest that just prior to this time the main groups were possibly Glires, Atlantogenata, an emended or new Archonta [that is, Euarchonta (Primates + Dermoptera + Scandentia)], Fereuungulata plus Chiroptera, and the core insectivores or Eulipotyphla (shrews, moles, solenodons, and hedgehogs). If so, such an early burst of diversification may explain some of the extra difficulties being experienced in resolving the most-interior branches of the placental tree.

ACKNOWLEDGMENTS

This work was supported by the Marsden Fund of New Zealand (P.J.W.), grants from the Ministry of Education, Science, Sports and Culture of Japan (M.H., P.J.W., Y.C., D.P.M.), JSPS (P.J.W. and Y.C.) and the National Science Foundation (D.P.M.). We thank the two reviewers plus Christopher Austin and Peter Houde for helpful comments. We are especially grateful to Mike Sorenson for his extensive assistance in gathering sequence data for birds and alligator.

REFERENCES

ADACHI, J., Y. CAO, AND M. HASEGAWA. 1993. Tempo and mode of mitochondrial DNA evolution in vertebrates at the amino acid sequence level: rapid evolution in warm-blooded vertebrates. J. Mol. Evol. 36:270–281.
ADACHI, J., AND M. HASEGAWA. 1996. MOLPHY Version 2.3: Programs for molecular phylogenetics based on maximum likelihood. Comp. Sci. Monogr. 28, Institute of Statistical Mathematics, Tokyo.
ARCHIBALD, J. D. 1996. Fossil evidence for a late Cretaceous origin of “hoofed” mammals. Science 272:1150–1153.
ARNASON, U., AND A. GULLBERG. 1996. Cytochrome b nucleotide sequences and the identification of five primary lineages of extant Cetaceans. Mol. Biol. Evol. 13:407–417.
BARRY, D., AND J. A. HARTIGAN. 1987. Asynchronous distance between homologous DNA sequences. Biometrics 43:261–276.
BENTON, M. J. 1990. Phylogeny of the major tetrapod groups: Morphological data and divergence dates. J. Mol. Evol. 30:409–424.
BLEIWEISS, R. 1998. Relative rate tests and biological causes of molecular evolution in hummingbirds. Mol. Biol. Evol. 15:481–491.
CAO, Y., A. JANKE, P. J. WADDELL, M. WESTERMAN, O. TAKENAKA, S. MURATA, N. OKADA, S. PAABO, AND M. HASEGAWA. 1998. Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders. J. Mol. Evol. 47:307–322.
CAO, Y., P. J. WADDELL, N. OKADA, AND M. HASEGAWA. 1998. The complete mitochondrial DNA sequence of the shark Mustelus manazo: evaluating rooting contradictions with living bony vertebrates. Mol. Biol. Evol. 15:1637–1646.
CARROLL, R. L. 1987. Vertebrate paleontology and evolution. W. H. Freeman, New York.
COOPER, A. 1997. Studies of avian ancient DNA: From Jurassic Park to modern island extinctions. Pages 345–374 in Avian molecular evolution and systematics (D. P. Mindell, ed.). Academic Press, San Diego, California.
COOPER, A., AND D. PENNY. 1997. Mass survival of birds across the Cretaceous–Tertiary boundary: molecular evidence. Science 275:1109–1113.
CRACRAFT, J., AND D. P. MINDELL. 1989. The early history of modern birds: A comparison of molecular and morphological evidence. Pages 389–403 in The hierarchy of life (B. Fernholm, K. Bremer, and H. Jörnvall, eds.). Elsevier, Amsterdam.
FEDUCCIA, A. 1996. The origin and evolution of birds. Yale Univ. Press, New Haven, Connecticut.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783–791.
FELSENSTEIN, J. 1993. PHYLIP: Phylogeny inference package, version 3.5c. Department of Genetics, Univ. of Washington, Seattle.
FLYNN, J. J., AND H. GALIANO. 1982. Phylogeny of early Tertiary Carnivora, with description of a new species of Protictis from the middle Eocene of northwestern Wyoming. Am. Mus. Novit. 2725:1–64.
GATESY, J., M. MILINKOVITCH, V. WADDELL, AND M. STANHOPE. 1999. The stability of cladistic relationships between Cetacea and higher-level artiodactyl taxa. Syst. Biol. 48:6–20 (this issue).
GEMMELL, N. J., AND M. WESTERMAN. 1994. Phylogenetic relationships within the class Mammalia: A study using mitochondrial 12S RNA sequences. J. Mammal. Evol. 2:3–23.
GREGORY, W. K. 1910. The orders of mammals. Bull. Am. Mus. Nat. Hist. 27:1–524.
HÄRLID, A., A. JANKE, AND U. ARNASON. 1998. The complete mitochondrial genome of Rhea americana and early avian divergences. J. Mol. Evol. 46:669–679.
HASEGAWA, M., Y. CAO, AND Z. YANG. 1998. Preponderance of slightly deleterious polymorphism in mitochondrial DNA: Nonsynonymous/synonymous rate ratio is much higher within than between species. Mol. Biol. Evol. 15:1499–1505.
HASEGAWA, M., H. KISHINO, AND T. YANO. 1985. Dating of the human–ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 21:160–174.
HASEGAWA, M., H. KISHINO, AND T. YANO. 1987. Man’s place in Hominoidea as inferred from molecular clocks of DNA. J. Mol. Evol. 26:132–147.
HASEGAWA, M., H. KISHINO, AND T. YANO. 1989. Estimation of branching dates among primates by molecular clocks of nuclear DNA which slowed down in Hominoidea. J. Mol. Evol. 18:461–476.
HEDGES, S. B., P. H. PARKER, C. G. SIBLEY, AND S. KUMAR. 1996. Continental breakup and the ordinal diversification of birds and mammals. Nature 381:226–229.
HOUDE, P. 1986. Ostrich ancestors found in the Northern Hemisphere suggest new hypothesis of ratite origins. Nature 324:563–565.
HUDSON, R. R. 1990. Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7:1–44.
JANKE, A., AND U. ARNASON. 1997. The complete mitochondrial genome of Alligator mississippiensis and the separation between recent Archosauria (birds and crocodiles). Mol. Biol. Evol. 14:1266–1272.
KISHINO, H., AND M. HASEGAWA. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170–179.
KISHINO, H., AND M. HASEGAWA. 1990. Converting distance to time: An application to human evolution. Methods Enzymol. 183:550–570.
KUHNER, M. K., AND J. FELSENSTEIN. 1994. A simulation study of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. 11:459–468.
KUMAR, S., AND S. B. HEDGES. 1998. A molecular timescale for vertebrate evolution. Nature 392:917–920.
KUMAZAWA, Y., H. OTA, M. NISHIDA, AND T. OZAWA. 1998. The complete nucleotide sequence of a snake (Dinodon semicarinatus) mitochondrial genome with two identical control regions. Genetics 150:313–329.
LEE, K., J. FEINSTEIN, AND J. CRACRAFT. 1997. The phylogeny of ratite birds: Resolving conflicts between molecular and morphological data sets. Pages 173–208 in Avian molecular evolution and systematics (D. P. Mindell, ed.). Academic Press, San Diego.
LIVEZEY, B. C. 1997. A phylogenetic analysis of basal Anseriformes, the fossil Presbyornis, and the interordinal relationships of waterfowl. Zool. J. Linn. Soc. 121:361–428.
LOCKHART, P. J., M. A. STEEL, M. D. HENDY, AND D. PENNY. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605–612.
MADDISON, W. P., AND D. R. MADDISON. 1992. MacClade, version 3. Sinauer Associates, Sunderland, Massachusetts.
MCKENNA, M. C., AND S. K. BELL. 1997. Classification of mammals above the species level. Columbia Univ. Press, New York.
MINDELL, D. P., AND R. HONEYCUTT. 1990. Ribosomal RNA in vertebrates: Evolution and phylogenetic applications. Ann. Rev. Ecol. Syst. 21:541–566.
MINDELL, D. P., A. KNIGHT, C. BAER, AND C. J. HUDDLESTON. 1996. Slow rates of molecular evolution in birds and the metabolic rate and body temperature hypotheses. Mol. Biol. Evol. 13:422–426.
MINDELL, D. P., M. D. SORENSON, D. E. DIMCHEFF, M. HASEGAWA, J. C. AST, AND T. YURI. 1999. Interordinal relationships of birds and other reptiles based on whole mitochondrial genomes. Syst. Biol. 48:138–152 (this issue).


MINDELL, D. P., M. D. SORENSON, C. J. HUDDLESTON, H. C. MIRANDA, A. KNIGHT, S. J. SAWCHUK, AND T. YURI. 1997. Phylogenetic relationships within and among select avian orders based on mitochondrial DNA. Pages 214–247 in Avian molecular evolution and systematics (D. P. Mindell, ed.). Academic Press, San Diego.
NOVACEK, M. J. 1993. Reflections on higher mammalian phylogenetics. J. Mammal. Evol. 1:3–30.
NUNN, G. B., AND S. E. STANLEY. 1998. Body size effects and rates of cytochrome b evolution in tubenosed seabirds. Mol. Biol. Evol. 15:1360–1371.
OLSON, S. L. 1994. A giant Presbyornis (Aves: Anseriformes) and other birds from the Paleocene Aquia Formation of Maryland and Virginia. Proc. Biol. Soc. of Washington 107:429–435.
SANDERSON, M. J. 1997. A non-parametric approach to estimating divergence times in the absence of rate constancy. Mol. Biol. Evol. 14:1218–1231.
SHOSHANI, J., AND P. TASSY. 1996. The Proboscidea: Evolution and Palaeoecology of elephants and their relatives. Oxford Univ. Press, Oxford, England.
SIBLEY, C. G., AND J. A. AHLQUIST. 1990. Phylogeny and classification of birds: A study in molecular evolution. Yale Univ. Press, New Haven, Connecticut.
SIMPSON, G. G. 1945. The principles of classification and the classification of mammals. Bull. Am. Mus. Nat. Hist. 85:1–350.
SMITH, A. G., D. G. SMITH, AND B. M. FUNNELL. 1994. Atlas of Mesozoic and Cenozoic coastlines. Cambridge Univ. Press, Cambridge, England.
SORENSON, M. D., J. C. AST, D. E. DIMCHEFF, T. YURI, AND D. P. MINDELL. In press. Primers for a PCR-based approach to mitochondrial genome sequencing in birds and other vertebrates. Mol. Phylogenet. Evol.
SPRINGER, M. S. 1997. Molecular clocks and the timing of the placental and marsupial radiations in relation to the Cretaceous–Tertiary boundary. J. Mammal. Evol. 4:285–302.
STANHOPE, M. J., V. G. WADDELL, O. MADSEN, W. DE JONG, S. B. HEDGES, G. C. CLEVEN, D. KAO, AND M. S. SPRINGER. 1998. Molecular evidence for multiple origins of Insectivora and for a new order of endemic African insectivore mammals. Proc. Natl. Acad. Sci. USA 95:9967–9972.
SULLIVAN, J., AND D. L. SWOFFORD. 1997. Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics. J. Mammal. Evol. 4:77–86.
SWOFFORD, D. L. 1998. PAUP*: Phylogenetic analysis under parsimony, version 4.0 (d62-64). Sinauer, Sunderland, Massachusetts.
SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL, AND D. M. HILLIS. 1996. Phylogenetic inference. Pages 407–514 in Molecular systematics, 2nd edition (D. M. Hillis, C. Moritz, and B. K. Mable, eds.). Sinauer, Sunderland, Massachusetts.
TAJIMA, F. 1993. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135:599–607.
THEWISSEN, J. G. M., AND S. I. MADAR. 1999. Implications of ankle morphology for the phylogenetic relations among ungulates. Syst. Biol. 48:21–30 (this issue).
WADDELL, P. J. 1995. Statistical methods of phylogenetic analysis: Including Hadamard conjugations, LogDet transforms, and maximum likelihood. Ph.D. Thesis, Massey Univ., Palmerston North, New Zealand.
WADDELL, P. J., Y. CAO, J. HAUF, AND M. HASEGAWA. 1999. Using novel methods to evaluate mammalian mtDNA and detect internal conflicts: including AA invariant sites-LogDet and site stripping, with special reference to the position of hedgehog, armadillo, and elephant. Syst. Biol. 48:31–53 (this issue).
WADDELL, P. J., N. OKADA, AND M. HASEGAWA. 1999. Towards resolving the interordinal relationships of placental mammals. Syst. Biol. 48:1–5 (this issue).
WADDELL, P. J., AND D. PENNY. 1996. Evolutionary trees of apes and humans from DNA sequences. Pages 53–73 in Handbook of Human Symbolic Evolution (A. J. Lock and C. R. Peters, eds.). Oxford University Press, Oxford, England.


WADDELL, P. J., D. PENNY, AND T. MOORE. 1997. Hadamard conjugations and modeling sequence evolution with unequal rates across sites. Mol. Phylogenet. Evol. 8:33–50.
WADDELL, P. J., AND M. A. STEEL. 1997. General time reversible distances with unequal rates across sites: Mixing Γ and inverse Gaussian distributions with invariant sites. Mol. Phylogenet. Evol. 8:398–414.
WYSS, A. R., AND J. J. FLYNN. 1993. Phylogenetic analysis and definition of the Carnivora. Pages 32–52 in Mammal phylogeny: Mesozoic differentiation, multituberculates, monotremes, early eutherians, and marsupials (F. S. Szalay, M. J. Novacek, and M. C. McKenna, eds.). Springer-Verlag, New York.
YANG, Z. 1997. PAML: A program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 5:555–556.

Received 15 July 1998; accepted 15 September 1998
Associate Editor: R. Olmstead

