David R. Anderson, ed. Springer, Dordrecht, The Netherlands, 2008,
184 pp., $39.95 (soft), ISBN: 978-0-387-74073-7
Model selection is the process of specifying the form of a model and estimating its parameters, often in a misguided attempt to estimate truth. A few statistical methods traditionally used to select a model are analysis of variance (ANOVA) and multiple and logistic regression. Information-theoretic methods of model selection date back to the mid-1970s, but researchers in the life sciences have been slow to adopt these newer methods. One advocate of using an information-theoretic approach is David Anderson, who has written extensively on the subject, including the co-authored book Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (Burnham and Anderson 2002). Two possible reasons why scientists have not adopted information-theoretic approaches are that the approaches are steeped in mathematical theory that can be challenging to understand, and that many statistics courses taught to students in the life sciences introduce only traditional null hypothesis testing. Anderson recognized the need for a practical book that offers a step-by-step guide to information-theoretic approaches and minimizes the mathematical theory.
This primer is a quick read, only seven chapters, and includes a glossary of key terms and an appendix. Each chapter begins with a short biography, primarily of a person who contributed to information-theoretic approaches, and in many cases ends with exercises meant to expand and test the reader's comprehension of the material. The first two chapters describe traditional methods of model selection, such as ANOVA and multiple and logistic regression, their limitations, and the qualities of a good model. In traditional methods, the form of the model is given, all analyses of the data are post hoc, P values are used to determine significance of the model, and the end result is one best model automatically selected by the statistical test. The problems with traditional methods are: a general lack of good science and a priori thinking (scientists "data dredge" too often, which can lead to spurious, nonrepeatable results); heavy reliance on an arbitrary P value; a lack of parsimony; and evaluation of only one model (the null hypothesis is tested and is often "uninteresting or implausible"). Anderson asserts that there is no true model, as models can only approximate truth, and that the goal of model selection is not to estimate truth per se, but rather to find models that work. The information-theoretic approach finds useful models by modeling the information in the data and separating that information from "noise."
Anderson focuses the remainder of his book mainly on Akaike's Information Criterion (AIC), which was the first tool that linked information theory to statistical theory. AIC requires that hypotheses and several plausible candidate models be formulated before collecting data. Use of AIC eliminates the problems associated with traditional hypothesis testing by calculating the relative worth of each model in the candidate set, eliminating P values (instead, numerical values represent the scientific evidence and are followed by value judgments made by the investigators), and enforcing parsimony by adding a penalty for adding useless parameters. AIC is particularly effective at resolving conflicts in all subdisciplines in the life sciences, and especially conflicts over technical issues, such as the handling of the data and biases resulting from the choice of one statistical tool over alternatives. AIC can resolve conflicts partially because conflicting parties must agree a priori on the questions to be addressed, the data to be analyzed, the specific methodology to analyze the data, and what results will support one conclusion over another. AIC also retains all of the data, providing a more complete picture, and allows the direct comparison of multiple hypotheses. In entomological research, AIC is useful for evaluating complex systems. For example, AIC can determine the relative importance of influences on population dynamics of arthropods, proximate causes of division of labor in social insects, factors influencing selection of a host plant by herbivores, mechanisms that may underlie patterns of diversity, and control measures in integrated pest management programs.
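As a rough illustration of the bookkeeping described above, the following Python sketch ranks a small candidate set using the standard definition AIC = 2K - 2 ln(L), the differences from the best model, and the resulting Akaike weights. It is not taken from the book: the model names, log-likelihoods, and parameter counts are hypothetical values invented for illustration, and the sketch assumes the maximized log-likelihood of each model has already been obtained by fitting it to the same data set.

```python
import math


def aic(log_likelihood, n_params):
    """Standard AIC: 2K - 2 ln(L), where K counts estimated parameters."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Return (deltas, weights) for a list of AIC values.

    delta_i = AIC_i - min(AIC); weight_i = exp(-delta_i / 2), normalized
    so the weights sum to 1 across the candidate set.
    """
    best = min(aic_values)
    deltas = [a - best for a in aic_values]
    relative = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative)
    return deltas, [r / total for r in relative]


# Hypothetical candidate set, specified a priori:
# name -> (maximized log-likelihood, number of estimated parameters)
candidates = {
    "null": (-120.0, 1),
    "host_plant": (-112.5, 2),
    "host_plant_plus_temp": (-112.1, 3),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
deltas, weights = akaike_weights(list(scores.values()))

for name, delta, weight in zip(scores, deltas, weights):
    print(f"{name:22s} AIC={scores[name]:7.1f} delta={delta:5.1f} weight={weight:.3f}")
```

In this made-up example the third model fits slightly better than the second but carries an extra parameter, so the penalty term leaves it ranked below the simpler model. The weights express the relative support for each model within the candidate set only; they say nothing about models that were not considered.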
Anderson makes AIC an accessible technique by using a couple of simple data sets as examples and then referring to these each time he introduces new information about AIC. Each chapter builds nicely on the previous chapter(s), and readers can use the data sets to duplicate the calculations in the book to verify their understanding of the material. The appendix is also useful because it discusses more theory, other methods such as Bayesian approaches, and common misuses and misinterpretations of AIC.
The book has limitations. For a book that is meant to introduce scientists to information-theoretic approaches and help them calculate AIC, many of the exercises are not practical calculations and analyses of data. I would like the exercises to give the reader the opportunity to calculate and analyze data sets. Anderson also could have made the book more practical by presenting an example data set and fully analyzing and interpreting it within one summary chapter or a section in the appendix. The calculations and analyses for his example data sets are spread out among the chapters and therefore do not provide a succinct blueprint for scientists who are using AIC to analyze their own data. Providing a thorough blueprint in one place would have clarified some components of AIC that get only a cursory treatment. For example, Anderson never satisfactorily describes how to calculate the parameters of models, which is essential to using AIC.
Overall, Anderson does a good job of convincing the reader of the usefulness and simplicity of AIC, helping the reader learn to calculate AIC, and explaining the pitfalls of information-theoretic approaches. I had only ever heard of AIC before reading this book, but afterward I was knowledgeable enough to apply the tool in my research. This book should be considered required reading for any scientist who has not learned the value, simplicity, and methodology of AIC.
Matthew L. Richardson
Program in Ecology, Evolution, and Conservation Biology,
University of Illinois, Urbana, IL 61801,
E-mail: mlrichar@illinois.edu
Annals of the Entomological Society of America
Vol. 102, No. 4, July 2009, Pages 737-738