The fine line between complexity and inaccuracy

Trait-dependent diversification models can be enticing because a simple setup can give fascinating results. However, without thoughtful model design, conclusions drawn from the models may be wrong. Model design is a balancing act: a model must be complex enough to portray a complex reality accurately, but not so complex that its results become unreliable. While we were not able to show an interesting effect of animal seed disperser and range size on diversification, our study can serve as a road map towards results that can be trusted and towards further method development.

Above: Fruit colour categories mapped onto the palm phylogenetic tree. Categories are described based on their range size and disperser group.

Inferring how certain traits affect the rates of species diversification, the net result of species formation and extinction, is an extremely powerful tool within evolutionary biology. Or rather, it would be if it could be done accurately.

Models that estimate trait-dependent diversification are relatively easy to set up and run, and exist for both discrete and continuous characters. Further, excellent publicly available trait data, such as body size, leaf area, or habitat type, exist for many taxa across the Tree of Life. These two elements made a trait-dependent diversification study in palms a good opportunity for a master’s degree project. The two palm traits we were interested in were fruit colour and range size, which are inherently linked through dispersal.

Editors’ choice: (Open access.)
Hill, A., Jiménez, M. F. T., Chazot, N., Cássia-Silva, C., Faurby, S., Herrera-Alsina, L., & Bacon, C. D. (2023). Apparent effect of range size and fruit colour on palm diversification may be spurious. Journal of Biogeography, 00, 1–13.

Different fruit colours affect dispersal by being more or less detectable to different groups of animals. Most mammals are red-green colour blind, and therefore do not see red fruits very well among green foliage. Birds are able to differentiate red from green, and also show a preference for darker fruits. Because plant species differ in fruit colour, seeds that are ingested and spread by animals can potentially be carried further by the more mobile birds, or less far by the less mobile mammals. Plants with colours preferred by birds may therefore have larger range sizes. Range size in turn affects diversification: extremely large or extremely small ranges may lead to a lower diversification rate. Our hypothesis was therefore that species with intermediate range sizes would be the most likely to have the highest diversification rates.

Given the potentially interesting diversification dynamics of fruit colour and range size, we compiled data on palm fruit colour and geographic occurrence, the latter of which we used to estimate range size. From there we used what we thought were straightforward models to estimate diversification, and found some interesting results, which validation would later show to be completely wrong.

While trait-dependent diversification studies tell an interesting story, the methods (State-dependent Speciation and Extinction models) are not as straightforward as they may seem. The results of the models are not valid unless additional steps are taken to verify them. We learned this during the review process, when we were asked to verify our results by, among other things, testing for the influence of other traits. We designed new models, far more complex than the original ones. The new models gave us results, but something was still off. Some simpler, nested models had a better likelihood than the more complex models that contained them. This cannot be the case, because a more complex model always has a likelihood at least as good as that of a less complex model nested within it. This indicated that the models were finishing their optimisation while stuck in a local optimum, without reaching the global optimum.

At this point we invited Leonel Herrera-Alsina, the lead author of the models we were using, to participate in our study. With his help we devised models that would be able to finish optimising properly. Designing good models turned out to be a balancing act: increasing complexity achieves better model fit and mimics reality more appropriately, but too much complexity means the models seemingly converge while actually being stuck in a local optimum. However, there is a potentially massive number of possible models, which could give entirely different results. There is only one globally best-fitting model, so how can we be sure that we have found it without testing all of them? As we showed that other traits better explain the apparent effect of range size and fruit colour on diversification, we instead focused on discussing how to quantify the uncertainty around whether a globally optimal model has been found.
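The likelihood check that exposed the problem can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the code used in the study: the model names, parameter counts, log-likelihood values and the function `nesting_violations` are all hypothetical, and no real diversification package is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fit:
    name: str
    n_params: int
    loglik: float  # maximised log-likelihood reported by the optimiser


def nesting_violations(fits, nested_in, tol=1e-6):
    """Return (simple, complex) name pairs where the simpler model fits better.

    `nested_in` maps a simpler model's name to the names of the more complex
    models that contain it as a special case. A properly optimised complex
    model can never have a lower log-likelihood than a model nested in it,
    so every returned pair points to an optimisation that stopped early.
    """
    by_name = {fit.name: fit for fit in fits}
    violations = []
    for simple, parents in nested_in.items():
        for parent in parents:
            if by_name[simple].loglik > by_name[parent].loglik + tol:
                violations.append((simple, parent))
    return violations


# Hypothetical values, chosen only to illustrate the check.
fits = [
    Fit("constant_rates", 2, -1520.4),
    Fit("trait_dependent", 6, -1512.9),
    Fit("trait_plus_hidden", 12, -1517.3),  # suspiciously low for 12 parameters
]
nested_in = {
    "constant_rates": ["trait_dependent", "trait_plus_hidden"],
    "trait_dependent": ["trait_plus_hidden"],
}

violations = nesting_violations(fits, nested_in)
for simple, parent in violations:
    print(f"{parent} fits worse than nested {simple}: likely a local optimum")
```

A flagged pair does not say where the global optimum lies, only that the more complex model has not reached it. One common response is to rerun that model from several different starting values, for example from the optimum found for the simpler model.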

Blindly trusting results, even from widely used methods, can lead to inaccurate conclusions. Our work also highlights the importance of the review process, which led us to spend more time validating our models than producing them, and rightly so. What seemed like a relatively easy way of producing potentially fascinating results proved more controversial and unreliable than we had expected. But our results will hopefully serve as a compass in the development of trait-dependent diversification methods and of macroecological and evolutionary models in general. In the future we will likely not shy away from using these methods, but will certainly put particular emphasis on validating any results from them.

Written by:
Adrian Hill
PhD student, University of Gothenburg (Sweden)

Published by jbiogeography

Contributing to the growth and societal relevance of the discipline of biogeography through dissemination of biogeographical research.
