Phylo Lab Projects

General Research Interests

The Moore lab is primarily focused on the development and application of statistical phylogenetic methods. The major component of this research program involves developing Bayesian statistical approaches for inferring the evolutionary relationships among species (phylogenetic methods), and also entails advancing methods for making inferences regarding various evolutionary processes from phylogenetic trees (comparative-phylogenetic methods).
Empirical research in the lab involves applying these statistical phylogenetic methods to explore fundamental questions concerning the evolution of species—particularly the interaction between biogeographic history, character evolution, and lineage diversification.
Here is a brief description of some of the large, externally funded projects that are currently underway in the Phylo Lab.

BETAbase: the Bayesian Evolutionary Tree Archive

Advancing the Bayesian phylogenetic revolution
Because they provide an explicit historical perspective, phylogenies have become central to virtually all areas of research in evolutionary biology, ecology, molecular biology, and epidemiology. Increasingly, phylogenies are inferred in a Bayesian statistical framework, as it provides several potential benefits, including: (1) the capacity to infer phylogenies from increasingly complex genomic datasets using correspondingly complex stochastic models; (2) the ability to incorporate relevant prior information on the model parameters; and (3) the ability to describe (and accommodate) uncertainty in phylogenetic estimates (and inferences based on those phylogenies) by virtue of inferring posterior probability distributions (rather than point estimates) of the phylogenetic parameters.

Although the adoption of a Bayesian approach has undoubtedly revolutionized the field of statistical phylogenetics, the revolution is nevertheless incomplete in several important respects, including: (1) the adherence to "vague" prior probability densities (typically invoking the default priors assumed in the phylogenetic software); (2) the casual assessment of the effectiveness of the Markov chain Monte Carlo (MCMC) algorithms used to approximate the posterior probability density; and (3) the omission of phylogenetic uncertainty from comparative phylogenetic studies (typically, inferences are based on a point summary of the posterior probability distribution of sampled trees).
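To make the second point concrete, here is a minimal sketch of one common MCMC diagnostic, the effective sample size (ESS) of a sampled parameter, estimated from the autocorrelation of its trace. It is an illustration only, not the tooling developed for this project: the function names, the AR(1) series used as a stand-in for real MCMC output, and the rule of truncating the autocorrelation sum at the first non-positive estimate are all assumptions made for the example.

```python
import random


def effective_sample_size(trace):
    """Estimate the ESS of a 1-D MCMC trace from its autocorrelation."""
    n = len(trace)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        raise ValueError("trace is constant; ESS is undefined")

    # Sum the autocorrelations until the first non-positive estimate.
    # This truncation rule is a simple assumption; other rules exist.
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def simulate_ar1(n, phi, seed):
    """Hypothetical stand-in for MCMC output: an AR(1) series with lag-1 correlation phi."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


if __name__ == "__main__":
    well_mixed = simulate_ar1(2000, 0.0, seed=1)  # nearly independent draws
    sticky = simulate_ar1(2000, 0.9, seed=1)      # strongly autocorrelated draws
    print("ESS (well mixed):", round(effective_sample_size(well_mixed)))
    print("ESS (sticky):    ", round(effective_sample_size(sticky)))
```

A strongly autocorrelated chain yields far fewer effectively independent samples than its raw length suggests, which is why a casual look at the number of MCMC samples can overstate how well the posterior has been approximated.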

This project involves the creation of the Bayesian Evolutionary Tree Analysis relational database (BETAbase), an interactive repository for the MCMC output of Bayesian phylogenetic analyses. This database will substantially advance the field of Bayesian phylogenetic inference (and by extension, the myriad fields of biological sciences that rely on phylogenetic information) by promoting more rigorous evaluation of empirical phylogenetic studies, sponsoring the development of tools to assess the performance of MCMC algorithms and providing training for their effective application. Moreover, BETAbase will enable the phylogenetic community to focus theoretical efforts on areas where existing phylogenetic methods are deficient, promote more robust comparative phylogenetic analyses, and safeguard the substantial computational investment of the systematic community. Team members of this NSF-funded project (DBI-1356737) include Bob Thomson (co-PI), Sebastian Höhna, Andrew Magee, Mike May, and Brian Moore (co-PI).

Exploring the impact of model (mis)specification on Bayesian divergence-time estimates

Many evolutionary inference problems require an ultrametric tree that confers temporal information: i.e., where branch durations and node heights are rendered proportional to absolute or relative time. Increasingly, divergence times are inferred in a Bayesian statistical framework under so-called ‘relaxed-clock models’ that accommodate variation in substitution rates among lineages. Remarkable progress in the development of stochastic models for estimating divergence times is one of the major success stories in modern biology.

The diversity of increasingly complex (and hopefully realistic) relaxed-clock models has the potential to improve divergence-time estimates, but demands careful (and computationally expensive) comparison among candidate models in order to assess their fit to a given empirical dataset. Moreover, the statistical behavior of many relaxed-clock models has not been carefully explored by simulation, so the relative merits (and pitfalls) of alternative relaxed-clock models are poorly understood. This state of affairs is particularly worrisome, as the overwhelming majority (82.5%) of empirical studies forgo formal selection among the many candidate relaxed-clock models, and instead simply assume a single (UCLN) model, which has seemingly become de rigueur in our field. The consequences of this emerging convention, both for divergence-time estimates and for the inferences based on these trees, are potentially profound but presently unknown.

This project explores the statistical behavior of relaxed-clock models and calibration methods on a large sample of empirical datasets. Specifically, we are applying all currently implemented relaxed-clock models and calibration methods to a large sample of empirical datasets to: (1) reveal the extent to which divergence-time estimates are sensitive to the chosen relaxed-clock model/calibration method; (2) explore the relative influence of the three primary model components—branch-rate priors, node-age/tree priors, and calibration approaches—on divergence-time estimates; (3) assess the relative fit of the pool of candidate relaxed-clock models to real datasets using robust Bayesian model-comparison methods; and (4) develop analytical protocols and implement pipelines that automate the efficient exploration of relaxed-clock model space for empirical analyses. Team members of this NSF-funded project (DEB-1457835) include Sebastian Höhna, Mike May, Jiansi Gao, and Brian Moore (PI).
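As an illustration of the model-comparison step in aim (3), the sketch below shows one standard way to compare candidate models once their marginal likelihoods have been estimated (for example, by stepping-stone sampling): log Bayes factors and posterior model probabilities under equal prior model probabilities. The function names, the placeholder model labels other than UCLN, and all numerical values are hypothetical and are not results from this project.

```python
import math


def log_bayes_factor(log_ml_a, log_ml_b):
    """Log Bayes factor in favor of model A over model B."""
    return log_ml_a - log_ml_b


def posterior_model_probabilities(log_marginal_likelihoods):
    """Posterior model probabilities, assuming equal prior probability for each model."""
    if not log_marginal_likelihoods:
        raise ValueError("need at least one model")
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    max_log = max(log_marginal_likelihoods.values())
    weights = {
        model: math.exp(value - max_log)
        for model, value in log_marginal_likelihoods.items()
    }
    total = sum(weights.values())
    return {model: weight / total for model, weight in weights.items()}


if __name__ == "__main__":
    # Hypothetical log marginal likelihoods for three candidate clock models.
    estimates = {"UCLN": -10234.7, "model_B": -10229.1, "model_C": -10241.3}
    for model, prob in posterior_model_probabilities(estimates).items():
        print(f"{model}: posterior probability {prob:.4f}")
    print("ln BF (model_B vs UCLN):",
          log_bayes_factor(estimates["model_B"], estimates["UCLN"]))
```

Working on the log scale matters here because marginal likelihoods for real datasets are far too small to exponentiate directly.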

How well are we playing the "dating game"?