New Methods in Evolutionary Research


Introduction

Edited by Rob Freckleton and Bob O’Hara

When we launched Methods in Ecology and Evolution, we were keen to encompass a range of methodologies, to give authors the chance to publish as wide a variety of methods as possible, and to maximise the audience of ecologists and evolutionary biologists with access to this work. In this Virtual Issue, we highlight the variety of papers and evolutionary methods that we have published in the first 2.5 years of MEE.

The breadth of subjects is remarkable: they range from the analysis of barcodes and DNA sequences to citizen science approaches for collecting data. Population genetics and macroevolution have been well represented, while statistical methods papers have been popular, with a healthy number of submissions as well as downloads by our readers. Of course, because the journal aims to link evolution and ecology, these are only one part of the papers we publish, and in some ways the division between ecology and evolution is slightly artificial, as many methods are cross-cutting. However, here we wish to take the opportunity to showcase the excellent papers that have a largely evolutionary content.


Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring
Douglas W. Yu, Yinqiu Ji, Brent C. Emerson, Xiaoyang Wang, Chengxi Ye, Chunyan Yang, Zhaoli Ding

Summary
1. Traditional biodiversity assessment is costly in time, money and taxonomic expertise. Moreover, data are frequently collected in ways (e.g. visual bird lists) that are unsuitable for auditing by neutral parties, which is necessary for dispute resolution.
2. We present protocols for the extraction of ecological, taxonomic and phylogenetic information from bulk samples of arthropods. The protocols combine mass trapping of arthropods, mass-PCR amplification of the COI barcode gene, pyrosequencing and bioinformatic analysis, which together we call ‘metabarcoding’.
3. We construct seven communities of arthropods (mostly insects) and show that it is possible to recover a substantial proportion of the original taxonomic information. We further demonstrate, for the first time, that metabarcoding allows for the precise estimation of pairwise community dissimilarity (beta diversity) and within-community phylogenetic diversity (alpha diversity), despite the inevitable loss of taxonomic information inherent to metabarcoding.
4. Alpha and beta diversity metrics are the raw materials of ecology and the environmental sciences, facilitating assessment of the state of the environment with a broad and efficient measure of biodiversity.
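
As a rough illustration of what a pairwise community dissimilarity (beta diversity) looks like in practice, the sketch below computes the Jaccard dissimilarity between two lists of taxa. The choice of Jaccard is an assumption made here for simplicity and is not necessarily the metric used in the paper; the OTU names and site contents are hypothetical.

```python
def jaccard_dissimilarity(community_a, community_b):
    """Jaccard dissimilarity between two collections of taxa.

    Returns 0.0 when the communities share every taxon and 1.0 when
    they share none. Presence/absence only; abundances are ignored.
    """
    a, b = set(community_a), set(community_b)
    union = a | b
    if not union:
        return 0.0  # two empty communities are treated as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical OTU lists recovered from two bulk samples.
site_1 = {"OTU1", "OTU2", "OTU3", "OTU4"}
site_2 = {"OTU3", "OTU4", "OTU5"}

beta = jaccard_dissimilarity(site_1, site_2)
print(f"Jaccard dissimilarity: {beta:.2f}")
```

Here the two sites share two of five taxa in total, so the dissimilarity is 1 - 2/5.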


Barcoding's next top model: an evaluation of nucleotide substitution models for specimen identification
Rupert A. Collins, Laura M. Boykin, Robert H. Cruickshank, Karen F. Armstrong

Summary
1. DNA barcoding studies use Kimura's two-parameter substitution model (K2P) as the de facto standard for constructing genetic distance matrices. Distances generated under this model then provide the basis for most downstream analyses, but uncertainty in model choice is rarely explored and could potentially affect how reliably DNA barcodes discriminate species.
2. Using information-theoretic approaches for a data set comprising 14 472 DNA barcodes from 14 published studies, we tested whether the K2P model was a good fit at the species level and whether applying a better fitting model biased error rates or changed overall identification success.
3. We report that the K2P was a poorly fitting model at the species level; it was never selected as the best model and very rarely selected as a credible alternative model. Despite the lack of support for the K2P model, differences in distance between best model and K2P model estimates were usually minimal, and importantly, identification success rates were largely unaffected by model choice even when interspecific threshold values were reassessed.
4. Although these conclusions may justify using the K2P model for specimen identification purposes, we found simpler metrics such as p distance performed equally well, perhaps obviating the requirement for model correction in DNA barcoding. Conversely, when incorporating genetic distance data into taxonomic studies, we advocate a more thorough examination of model uncertainty.
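
To make the two distance measures named above concrete, here is a minimal sketch of the uncorrected p distance and the standard K2P distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The two 20-bp sequences are hypothetical, and skipping gaps and ambiguous bases is an assumption of this sketch rather than a rule taken from the paper.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (sites compared, transitions, transversions) for aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        n += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    return n, transitions, transversions


def p_distance(seq1, seq2):
    """Uncorrected proportion of differing sites."""
    n, ts, tv = count_differences(seq1, seq2)
    return (ts + tv) / n


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance (undefined for very divergent pairs)."""
    n, ts, tv = count_differences(seq1, seq2)
    p, q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical aligned barcodes: one transition (site 1), one transversion (site 20).
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "GCGTACGTACGTACGTACGA"

p_dist = p_distance(seq_a, seq_b)
k2p_dist = k2p_distance(seq_a, seq_b)
print(f"p distance = {p_dist:.4f}, K2P distance = {k2p_dist:.4f}")
```

For closely related sequences such as these, the K2P correction adds only slightly to the p distance, which is in line with the small differences between distance estimates reported in the summary.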


nadiv: an R package to create relatedness matrices for estimating non-additive genetic variances in animal models
Matthew E. Wolak

Summary
1. The Non-Additive InVerses (nadiv) R software package contains functions to create and use non-additive genetic relationship matrices in the animal model of quantitative genetics.
2. This study discusses the concepts relevant to non-additive genetic effects and introduces the package.
3. nadiv includes functions to create the inverse of the dominance and epistatic relatedness matrices from a pedigree, which are required for estimating these genetic variances in an animal model. The study focuses on three widely used software programs in ecology and in evolutionary biology (ASReml, MCMCglmm and WOMBAT) and how nadiv can be used in conjunction with each. Simple tutorials are provided in the Supporting Information.


jPopGen Suite: population genetic analysis of DNA polymorphism from nucleotide sequences with errors
Xiaoming Liu

Summary
1. Next-generation sequencing (NGS) is being increasingly used in ecological and evolutionary studies. Though promising, NGS is known to be error-prone. Sequencing error can cause significant bias for population genetic analysis of a sequence sample.
2. We present jPopGen Suite, an integrated tool for population genetic analysis of DNA polymorphisms from nucleotide sequences. It is specially designed for data with a non-negligible error rate, although it serves well for ‘error-free’ data. It implements several methods for estimating the population mutation rate, population growth rate and conducting neutrality tests.
3. jPopGen Suite facilitates the population genetic analysis of NGS data in various applications and is freely available for noncommercial users at http://sites.google.com/site/jpopgen/.


TempNet: a method to display statistical parsimony networks for heterochronous DNA sequence data
Stefan Prost, Christian N. K. Anderson

Summary
1. Heterochronous data have been used to study demographic changes in epidemiology and ancient DNA studies, revolutionizing our understanding of complex evolutionary processes such as invasions, migrations and responses to drugs or climate change. While there are sophisticated applications based on Markov-Chain Monte Carlo or Approximate Bayesian Computation to study these processes through time, summarizing the raw genetic data in an intuitively meaningful graphic can be challenging, most notably if identical haplotypes are present at different points in time.
2. We present temporal networks, an attractive way to display and summarize relationships within the heterochronous data so commonly used in ancient DNA or epidemiological research. TempNet is a user-friendly R script that creates journal-quality figures from genetic data in standard formats (FASTA, CLUSTAL, etc.). These figures are customizable and interactive within the R graphics window. Using three examples, we demonstrate that TempNet can deal with standard-sized datasets, as well as datasets of hundreds of sequences from fast-evolving organisms.
3. Temporal networks are flexible ways to illustrate genetic relationships through time. Furthermore, this approach is not limited to time-stamped data, but can also be used for different data partitioning strategies, such as spatial or phenotypic groupings. The R script presented here will be useful in illustrating complex genetic relationships between groups.


Accounting for uncertainty in species delineation during the analysis of environmental DNA sequence data
Jeff R. Powell

Summary
1. Defining species boundaries represents a significant challenge in biodiversity studies, especially as these studies increasingly rely on high-throughput DNA sequencing technologies. A promising approach for delineating species in environmental sequence data combines phylogenetics and coalescence theory to estimate species boundaries from distributions of lineage birth rates within multispecies coalescent trees.
2. Existing methods for interpreting these models utilize hypothetico-deductive reasoning to identify thresholds associated with a mixed speciation-coalescent model that fits the data better than a null model. Here, I describe an alternative approach that ranks and assigns weights to models based on their fit to the data using information criteria and then uses model averaging to estimate parameters and species probabilities.
3. This approach is applied to data from two independent studies that address (i) patterns of cospeciation in an aphid–bacterial symbiosis and (ii) diversity of bacterial communities associated with the human gut. In both of these cases, accounting for uncertainty during model selection allowed greater flexibility to detect variable (with respect to time) speciation-coalescent thresholds among lineages.
4. The precision of the predicted species boundaries varied among the studies, and the variance-to-mean ratio for richness estimates ranged from 0.023 to 0.079. Sample-based estimates of gut bacteria richness revealed that accounting for uncertainty during species delineation increased the variance in the estimates of population means (by individual from which the samples were taken or by sex of the individuals) by up to 7.5%.
5. In ecological and evolutionary studies, conclusions are highly dependent on the classification system that is adopted; given the uncertainty in species boundaries observed here, ignoring this source of error (as is common practice) likely results in inflated type I error rates. The approach described here represents an objective, theory-based method for predicting species boundaries and explicitly incorporates uncertainty in the classification system into biodiversity estimation, thus allowing researchers to better address the causes and consequences of biodiversity.
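
The general idea of weighting models by an information criterion and then averaging across them can be sketched as follows. This is a generic illustration using Akaike weights, which is an assumption here: the summary says only "information criteria" and does not specify the criterion or the exact averaging scheme. The AIC values and per-model richness estimates are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into weights that sum to 1.

    Each weight is proportional to exp(-0.5 * delta), where delta is the
    difference between a model's AIC and the lowest AIC in the set.
    """
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def model_average(estimates, weights):
    """Weighted mean of per-model estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical candidate models, each implying a different species count.
aic_scores = [100.0, 102.0, 110.0]
richness_per_model = [12, 15, 20]

weights = akaike_weights(aic_scores)
averaged_richness = model_average(richness_per_model, weights)

for aic, w in zip(aic_scores, weights):
    print(f"AIC {aic:6.1f} -> weight {w:.3f}")
print(f"model-averaged richness: {averaged_richness:.2f}")
```

The averaged estimate sits between the estimates of the best-supported models instead of committing to a single threshold, which is the sense in which model uncertainty is carried through to the richness estimate.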


Directed terminal restriction analysis tool (DRAT): an aid to enzyme selection for directed terminal-restriction fragment length polymorphisms
David M. Roberts, Pietà G. Schofield, Suzanne Donn, Tim J. Daniell

Summary
1. T-RFLP is an established tool for high-throughput studies of microbial communities, which can, with care and practical validation, be enhanced to aid identification of specific organisms in a community by associating T-RFs from experimental runs with predicted T-RFs from a set of existing sequences. A barrier to this approach is the laborious process of selecting diagnostic restriction enzyme(s) for further validation.
2. Here, we describe directed terminal restriction analysis tool (DRAT), a software tool that aids the design of directed terminal-restriction fragment length polymorphism (DT-RFLP) strategies, to separate DNA targets based on restriction enzyme polymorphisms. The software assesses multiple user-supplied DNA sequences, ranks optimal restriction endonucleases for separating targets and provides summary information including the length of diagnostic terminal restriction fragments. A worked example suggesting enzymes uniquely separating selected arbuscular mycorrhizal fungal groups is presented.
3. This tool greatly facilitates identification of diagnostic restriction enzymes for user-designated groups within complex populations and provides expected product sizes for all designated groups.


Testing the time-for-speciation effect in the assembly of regional biotas
Daniel L. Rabosky

Summary
1. Rates of evolutionary diversification play a fundamental role in the assembly of regional communities, but the relative balance of diversity-dependent and diversity-independent rate control remains controversial. Recent studies have reported a significant relationship between the amount of time a geographic region has been occupied and species richness, implying that feedbacks between species interactions and diversification rates may be less important than diversity-independent mechanisms in generating regional species pools.
2. Previous analyses of the regional age-diversity relationship have used a range of metrics to quantify the amount of ‘evolutionary time’ that a region has been occupied, but the relative performance of these metrics has not been quantified.
3. Here, I evaluate the performance of the most commonly used methods and data transformations for assessing the regional age-diversity relationship.
4. I find that process-based models of diversification are more appropriate than process-independent models for evaluating the influence of time on species richness. I also demonstrate that time should not be log-transformed when testing the regional time-for-speciation hypothesis, as in some recent studies.
5. Application of this framework to patterns of elevational richness in several recent studies provides support for a logistic model of diversity accumulation within elevational bands and implies that evolutionary age alone cannot fully account for current species richness.
6. These results indicate that process-based models, in concert with appropriate data transformation, provide a robust foundation for inference on the causes of regional diversity gradients.


MOTMOT: models of trait macroevolution on trees
Gavin H. Thomas, Robert P. Freckleton

Summary
1. Models of trait macroevolution on trees (MOTMOT) is a new software package that tests for variation in the tempo and mode of continuous character evolution on phylogenetic trees. MOTMOT provides tools to fit a range of models of trait evolution with emphasis on variation in the rate of evolution between clades and character states.
2. We introduce a new method, trait MEDUSA, to identify the location of major changes in the rate of evolution of continuous traits on phylogenetic trees. We demonstrate trait MEDUSA and the other main functions of MOTMOT, using body size of Anolis lizards.
3. MOTMOT is open source software written in the R language and is freely available from CRAN (http://cran.r-project.org/web/packages/).


RBrownie: an R package for testing hypotheses about rates of evolutionary change
J. Conrad Stack, Luke J. Harmon, Brian O'Meara

Summary
1. Maximum likelihood analyses for testing hypotheses about how rates of disparification might vary across clades can provide important insight into the evolutionary process. While the Brownie phylogenetic library can perform such analyses, it does so outside of a general scripting environment.
2. We present RBrownie, an interface between the Brownie phylogenetic library and the R software environment, which provides easy access to the main methods in Brownie (see O'Meara 2008; PhD Dissertation, Nature Precedings), including discrete ancestral state reconstruction. In addition, RBrownie supplies a direct interface to Brownie, allowing advanced users to construct more complex combinations of analyses and to execute any newly added Brownie functions.
3. Overall, it is a package that features evolutionary rate analyses in a flexible and familiar environment.


phytools: an R package for phylogenetic comparative biology (and other things)
Liam J. Revell

Summary
1. Here, I present a new, multifunctional phylogenetics package, phytools, for the R statistical computing environment.
2. The focus of the package is on methods for phylogenetic comparative biology; however, it also includes tools for tree inference, phylogeny input/output, plotting, manipulation and several other tasks.
3. I describe and tabulate the major methods implemented in phytools, and in addition provide some demonstration of its use in the form of two illustrative examples.
4. Finally, I conclude by briefly describing an active web-log that I use to document present and future developments for phytools. I also note other web resources for phylogenetics in the R computational environment.


How to measure and test phylogenetic signal
Tamara Münkemüller, Sébastien Lavergne, Bruno Bzeznik, Stéphane Dray, Thibaut Jombart, Katja Schiffers, Wilfried Thuiller

Summary
1. Phylogenetic signal is the tendency of related species to resemble each other more than species drawn at random from the same tree. This pattern is of considerable interest in a range of ecological and evolutionary research areas, and various indices have been proposed for quantifying it. Unfortunately, these indices often lead to contrasting results, and guidelines for choosing the most appropriate index are lacking.
2. Here, we compare the performance of four commonly used indices using simulated data. Data were generated with numerical simulations of trait evolution along phylogenetic trees under a variety of evolutionary models. We investigated the sensitivity of the approaches to the size of phylogenies, the resolution of tree structure and the availability of branch length information, examining both the response of the selected indices and the power of the associated statistical tests.
3. We found that under a Brownian motion (BM) model of trait evolution, Abouheif’s Cmean and Pagel’s λ performed well and substantially better than Moran’s I and Blomberg’s K. Pagel’s λ provided a reliable effect size measure and performed better for discriminating between more complex models of trait evolution, but was computationally more demanding than Abouheif’s Cmean. Blomberg’s K was most suitable to capture the effects of changing evolutionary rates in simulation experiments.
4. Interestingly, sample size influenced not only the uncertainty but also the expected values of most indices, while polytomies and missing branch length information had only negligible impacts.
5. We propose guidelines for choosing among indices, depending on (a) their sensitivity to true underlying patterns of phylogenetic signal, (b) whether a test or a quantitative measure is required and (c) their sensitivities to different topologies of phylogenies.
6. These guidelines aim to better assess phylogenetic signal and distinguish it from random trait distributions. They were developed under the assumption of BM, and additional simulations with more complex trait evolution models show that they are to a certain degree generalizable. They are particularly useful in comparative analyses, when requiring a proxy for niche similarity, and in conservation studies that explore phylogenetic loss associated with extinction risks of specific clades.
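
Of the four indices compared above, Moran’s I is the simplest to write down, so a minimal sketch may help show what such an index measures. The weight matrix below is an assumption made for illustration (1 for sister species, 0 otherwise, on a hypothetical four-species tree ((A,B),(C,D))); real analyses usually derive weights from phylogenetic distances, and the trait values are invented.

```python
def morans_i(values, weights):
    """Moran's I for trait values given a square proximity matrix.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where S0 is the sum of all weights. With no signal the expected
    value is -1 / (n - 1).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    total = sum(d * d for d in dev)
    return (n / s0) * cross / total


# Hypothetical proximity matrix for species A, B, C, D: sisters get weight 1.
sister_weights = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

# Invented traits: close relatives alike, versus the same values reshuffled.
clustered_traits = [1.0, 1.2, 5.0, 5.3]
shuffled_traits = [1.0, 5.0, 1.2, 5.3]

i_clustered = morans_i(clustered_traits, sister_weights)
i_shuffled = morans_i(shuffled_traits, sister_weights)
print(f"clustered: {i_clustered:.3f}, shuffled: {i_shuffled:.3f}")
```

When sister species carry similar values the index is strongly positive; reassigning the same values so that sisters differ drives it negative.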


smatr 3 – an R package for estimation and inference about allometric lines
David I. Warton, Remko A. Duursma, Daniel S. Falster, Sara Taskinen

Summary
1. The Standardised Major Axis Tests and Routines (SMATR) software provides tools for estimation and inference about allometric lines, currently widely used in ecology and evolution.
2. This paper describes some significant improvements to the functionality of the package, now available on R in smatr version 3.
3. Additions to the package include sma and ma functions that accept formula input and perform the key inference tasks; multiple comparisons; graphical methods for visualising data and checking (S)MA assumptions; robust (S)MA estimation and inference tools.


On thinning of chains in MCMC
William A. Link, Mitchell J. Eaton

Summary
1. Markov chain Monte Carlo (MCMC) is a simulation technique that has revolutionised the analysis of ecological data, allowing the fitting of complex models in a Bayesian framework. Since 2001, there have been nearly 200 papers using MCMC in publications of the Ecological Society of America and the British Ecological Society, including more than 75 in the journal Ecology and 35 in the Journal of Applied Ecology.
2. We have noted that many authors routinely ‘thin’ their simulations, discarding all but every kth sampled value; of the studies we surveyed with details on MCMC implementation, 40% reported thinning.
3. Thinning is often unnecessary and always inefficient, reducing the precision with which features of the Markov chain are summarised. The inefficiency of thinning MCMC output has been known since the early 1990s, long before MCMC appeared in ecological publications.
4. We discuss the background and prevalence of thinning, illustrate its consequences, discuss circumstances when it might be regarded as a reasonable option and recommend against routine thinning of chains unless necessitated by computer memory limitations.
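
The loss of precision from thinning can be seen in a small simulation. The sketch below is not taken from the paper: it uses a first-order autoregressive series as a stand-in for MCMC output, and the chain length, autocorrelation, thinning interval, replicate count and seed are all arbitrary assumptions. It estimates the chain mean from the full chain and from every kth value, then compares how much those estimates vary across replicate chains.

```python
import random
import statistics


def ar1_chain(n, rho, rng):
    """Simulate an autocorrelated series as a stand-in for MCMC output.

    The chain starts at 0 and is not run to stationarity first; for the
    moderate autocorrelation used here this has a negligible effect.
    """
    x = 0.0
    chain = []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


def compare_thinning(n=1000, rho=0.5, k=10, replicates=200, seed=1):
    """Variance across replicates of the mean from full vs. thinned chains."""
    rng = random.Random(seed)
    full_means, thinned_means = [], []
    for _ in range(replicates):
        chain = ar1_chain(n, rho, rng)
        full_means.append(statistics.fmean(chain))
        thinned_means.append(statistics.fmean(chain[::k]))  # keep every kth value
    return statistics.variance(full_means), statistics.variance(thinned_means)


var_full, var_thinned = compare_thinning()
print(f"variance of mean, full chain:    {var_full:.5f}")
print(f"variance of mean, thinned chain: {var_thinned:.5f}")
```

Both estimators target the same mean, but the thinned one discards 90% of the draws and so varies more from chain to chain, which is the inefficiency the summary describes.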


Quantifying individual variation in reaction norms: how study design affects the accuracy, precision and power of random regression models
Martijn van de Pol

Summary
1. Quantifying individual heterogeneity in plasticity is becoming common in studies of evolutionary ecology, climate change ecology and animal personality. Individual variation in reaction norms is typically quantified using random effects in a mixed modelling framework. However, little is known about what sampling effort and design provide sufficient accuracy, precision and power.
2. I developed ‘odprism’, an easy-to-use software package for the statistical language R, which can be used to investigate the accuracy, precision and power of random regression models for various types of data structures. Moreover, I conducted simulations to derive rules-of-thumb for four design decisions that biologists often face.
3. First, I investigated the trade-off between sampling many individuals a few times versus sampling few individuals often. Generally, at least 40 individuals should be sampled with a total sample size of at least 1000 to obtain accurate and precise estimates of individual variation in elevation and slopes of linear reaction norms and their correlation. Contrary to a previous recommendation, it is worthwhile to bias the ratio of number of individuals over replicates towards sampling more individuals.
4. Second, I considered how the range of environmental conditions over which individuals are sampled affects the optimal sampling strategy. I show that when all individuals experience the same conditions during a sampling event, sampling each individual only twice should be strictly avoided.
5. Third, I examined the case where the number of replicates per individual is constrained by their lifespan, as is common when sampling annual traits in the wild. I show that for a given sampling effort, it is much easier to detect individual variation in reaction norms for long-lived than for short-lived species.
6. Fourth, I investigated the performance of random regression models when studying traits under selection. Reassuringly, directional viability selection barely caused any bias in estimates of variance components.
7. Random regression models are inherently data hungry, and reviewing the literature shows that particularly behavioural studies have low sampling effort. Therefore, the software and rules-of-thumb I identified for designing reaction-norm studies should help researchers make more informed choices, which will likely improve the reliability and interpretation of plasticity studies.


Evolution MegaLab: a case study in citizen science methods
Jenny P. Worthington, Jonathan Silvertown, Laurence Cook, Robert Cameron, Mike Dodd, Richard M. Greenwood, Kevin McConway, Peter Skelton

Summary
1. Volunteers have helped in scientific surveys of birds and other organisms for decades, but more recently, the use of the Internet has enormously widened the opportunity for citizen science and greatly increased its practice. There is now a need to share experience of which methods work and which do not. Here, we describe how we planned and executed the Evolution MegaLab, one of the largest surveys of polymorphism in wild species so far undertaken.
2. The aim of the Evolution MegaLab was to exploit the occasion of Charles Darwin’s double centenary in 2009 to mobilize the widest possible section of the general public in Europe to help survey shell polymorphism in the banded snails Cepaea nemoralis and Cepaea hortensis. These data were then compared with historical records to detect evolutionary change that may have taken place in the decades between samples.
3. Records of polymorphism in over 7000 populations sampled throughout the natural range of the two species were captured from published and unpublished sources and added to an online database. These data could be explored by the general public via a Google Maps interface on the project website (http://evolutionmegalab.org). The website contained a welcome page that explained what evolution is and how recent changes in climate and the abundance of predatory birds (song thrushes Turdus philomelos) might have caused an evolutionary change in the shell patterns of banded snails.
4. A network of collaborators in 15 European countries was formed, with each country responsible for translating the website and associated materials, recruiting volunteers and raising any funds required locally. A total of 6461 users registered with the site, and 7629 records were submitted. We used an online quiz to train users and to test their ability to recognize the correct snails and their morphs. Every user received automated, immediate feedback that compared their data with nearby records from the historical database.
5. The critical tasks achieved by the Evolution MegaLab that any citizen science project must tackle are as follows: (i) design of an appropriate project, (ii) recruitment, motivation and training of volunteers, and (iii) ensuring data quality.

