Bayesian model selection for group studies
Introduction
Model comparison and selection is central to the scientific process, in that it allows one to evaluate different hypotheses about the way data are caused (Pitt and Myung, 2002). Nearly all scientific reporting rests upon some form of model comparison, which represents a probabilistic statement about the belief in one hypothesis relative to some other(s), given observations or data. The fundamental Neyman–Pearson lemma states that the best statistic upon which to base model selection is simply the probability of observing the data under one model, divided by the probability under another model (Neyman and Pearson, 1933); the logarithm of this ratio is known as the log-likelihood ratio. In a classical (frequentist) setting, the distribution of the log-likelihood ratio, under the null hypothesis that there is no difference between models, can be computed relatively easily for some models. Common examples include Wilks' lambda for linear multivariate models and the F- and t-statistics for univariate models. In a Bayesian setting, the equivalent of the likelihood ratio is the evidence ratio, which is commonly known as a Bayes factor (Kass and Raftery, 1995). An important property of Bayes factors is that they can deal with both nested and non-nested models. In contrast, frequentist model comparison can be seen as a special case of Bayes factors where, under certain hierarchical restrictions on the models, the null distribution of the test statistic is readily available.
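The relationship between log-evidences and the Bayes factor amounts to a single subtraction. A minimal sketch, with hypothetical log-evidence values and a function name of our own choosing:

```python
import math

def log_bayes_factor(log_evidence_1: float, log_evidence_2: float) -> float:
    """Log Bayes factor for model 1 versus model 2: the difference of
    the two log-evidences."""
    return log_evidence_1 - log_evidence_2

# Hypothetical log-evidences for two models fitted to the same data
lbf = log_bayes_factor(-120.3, -123.5)
bf = math.exp(lbf)  # Bayes factor: how many times more probable the data
                    # are under model 1 than under model 2
```

Because evidences are typically tiny numbers, working on the log scale and exponentiating only at the end avoids numerical underflow.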
In this paper, we will consider the general case of how to use the model evidence for analyses at the group level, without putting any constraints on the models compared. These models can be nonlinear, possibly dynamic and, critically, do not necessarily bear a hierarchical relationship to each other, i.e. they are not necessarily nested. The application domain we have in mind is the comparison of dynamic causal models (DCMs) for fMRI or electrophysiological data (Friston et al., 2003, Stephan et al., 2007a) that have been inverted for each subject. However, the theoretical framework described in this paper can be applied to any model, for example when comparing different source reconstruction methods for EEG/MEG or selecting among competing computational models of learning and decision-making.
This paper is structured as follows. First, to ensure this paper is self-contained, particularly for readers without an in-depth knowledge of Bayesian statistics, we summarise the concept of log-evidence as a measure of model goodness and review commonly used approximations to it, i.e. the Akaike Information Criterion (AIC; Akaike, 1974), the Bayesian Information Criterion (BIC; Schwarz, 1978), and the negative free-energy (F). These approximations, which are described in Appendix A, differ in how they trade off model fit against model complexity. Given any of these approximations to the log-evidence, we then consider model comparison at the group level, addressing the issue from both classical and Bayesian perspectives. First, in a frequentist setting, we consider classical inference on the log-evidences themselves by treating them as summary statistics that reflect the evidence for each model for a given subject. Subsequently, using a hierarchical model and variational Bayes (VB), we describe a novel technique for inference on the conditional density of the models per se, given data (or log-evidences) from all subjects. This rests on treating the model as a random variable and estimating the parameters of a Dirichlet distribution, which describes the probabilities for all models considered. These probabilities then define a multinomial distribution over model space, allowing one to compute how likely it is that a specific model generated the data of a subject chosen at random.
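In one common convention, both AIC and BIC can be expressed on the same scale as the log-evidence, so that higher values indicate a better model; the exact forms used in the paper are those of its Appendix A. A minimal sketch under that convention (function names are ours):

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """AIC written as a log-evidence approximation (higher is better):
    accuracy (log-likelihood) minus one unit per free parameter."""
    return log_likelihood - n_params

def bic(log_likelihood: float, n_params: int, n_data: int) -> float:
    """BIC written as a log-evidence approximation (higher is better):
    the complexity penalty grows with the number of data points."""
    return log_likelihood - 0.5 * n_params * math.log(n_data)

# Hypothetical fit: identical accuracy, but the two penalties differ
print(aic(-100.0, 5))       # -105.0
print(bic(-100.0, 5, 200))  # ≈ -113.25
```

The example illustrates the different fit–complexity trade-offs: for more than a handful of data points, BIC penalises each extra parameter more heavily than AIC does.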
We compare and contrast these random effects approaches to the conventional use of the group Bayes factor (GBF), an approach for model comparison at the between-subject level that has been used extensively in previous group studies in neuroimaging. For example, the GBF has been used frequently to decide between competing dynamic causal models fitted to fMRI (Acs and Greenlee, 2008, Allen et al., 2008, Grol et al., 2007, Heim et al., 2008, Kumar et al., 2007, Leff et al., 2008, Smith et al., 2006, Stephan et al., 2007b, Stephan et al., 2007c, Summerfield and Koechlin, 2008) and EEG data (Garrido et al., 2007, Garrido et al., 2008). While the GBF is a straightforward index for model comparison at the group level, it assumes that all subjects' data are generated by the same model (i.e. a fixed effects approach) and can be adversely affected when this assumption is violated.
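Under this fixed effects assumption, the GBF multiplies subject-level Bayes factors, i.e. sums their logs, so a single aberrant subject can dominate the group result. A sketch with hypothetical numbers:

```python
def group_log_bayes_factor(log_ev_m1, log_ev_m2):
    """Log of the group Bayes factor (fixed effects): the sum over
    subjects of each subject's log Bayes factor."""
    return sum(e1 - e2 for e1, e2 in zip(log_ev_m1, log_ev_m2))

# Hypothetical log-evidences: three subjects mildly favour model 1,
# one outlier strongly favours model 2 and flips the group verdict
log_ev_m1 = [-100.0, -101.0, -99.0, -150.0]
log_ev_m2 = [-102.0, -103.0, -101.0, -120.0]
print(group_log_bayes_factor(log_ev_m1, log_ev_m2))  # -24.0
```

Here three of four subjects favour model 1 by 2 log-units each, yet the outlier's 30 log-units in the other direction drive the GBF decisively towards model 2.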
The novel Bayesian framework presented in this paper does not suffer from these shortcomings: it can quantify the probability that a particular model generated the data for any randomly selected subject, relative to other models, and it is robust to the presence of outliers. In the analyses below, we illustrate the advantages of this new approach using synthetic and empirical data. We show that computing a conditional density of the model probabilities, given the log-evidences for all subjects, can be superior to both the GBF and frequentist tests applied to the log-evidences. In particular, we found that our Bayesian approach is markedly more robust than either of the other approaches in the presence of outlying subjects.
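The random effects idea can be implemented with a simple fixed-point iteration over the Dirichlet parameters; the following is a sketch of such a variational scheme, not a reproduction of the authors' exact algorithm, and `dirichlet_vb` with its defaults is our own illustrative construction:

```python
import numpy as np
from scipy.special import digamma

def dirichlet_vb(log_evidence, alpha0=1.0, n_iter=100):
    """Fixed-point sketch of variational inference on the Dirichlet
    parameters describing model probabilities at the group level.

    log_evidence: (n_subjects, n_models) array of per-subject
    log-evidences. Returns the Dirichlet parameters alpha and the
    expected model probabilities alpha / alpha.sum().
    """
    lme = np.asarray(log_evidence, dtype=float)
    alpha = np.full(lme.shape[1], alpha0)
    for _ in range(n_iter):
        # Posterior probability that each subject's data were
        # generated by each model
        log_u = lme + digamma(alpha) - digamma(alpha.sum())
        log_u -= log_u.max(axis=1, keepdims=True)  # numerical stability
        g = np.exp(log_u)
        g /= g.sum(axis=1, keepdims=True)
        alpha = alpha0 + g.sum(axis=0)             # update the counts
    return alpha, alpha / alpha.sum()

# Hypothetical log-evidences: three subjects favour model 2, one favours model 1
lme = np.array([[0.0, 3.0], [0.0, 4.0], [0.0, 2.5], [5.0, 0.0]])
alpha, r = dirichlet_vb(lme)
```

Note how the single dissenting subject merely shifts the Dirichlet counts rather than overturning the group-level conclusion, which is the robustness property discussed above.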
Section snippets
The model evidence and its approximations
The model evidence p(y|m) is the probability of obtaining observed data y given a particular model m. It can be considered the holy grail of any model inversion and is necessary to compare different models or hypotheses. The evidence for some models can be computed relatively easily (e.g., for linear models); however, in general, computing the model evidence entails integrating out any dependency on the model parameters ϑ:

p(y|m) = ∫ p(y|ϑ, m) p(ϑ|m) dϑ

In many cases, this integration is analytically intractable.
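For simple conjugate models the evidence integral can be evaluated both numerically and in closed form, which makes the definition concrete. A toy example of our own (not from the paper), with a single Gaussian datum and a Gaussian prior on its mean:

```python
import math

# Toy conjugate model: one datum y ~ N(theta, sigma^2) with prior
# theta ~ N(0, tau^2). The evidence p(y|m) integrates the likelihood
# over the prior on theta.
y, sigma, tau = 1.5, 1.0, 2.0

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Riemann-sum approximation of the evidence integral over theta
dtheta = 0.001
evidence = sum(
    normal_pdf(y, theta, sigma) * normal_pdf(theta, 0.0, tau) * dtheta
    for theta in (i * dtheta - 20.0 for i in range(40000))
)

# Here the integral also has a closed form: y ~ N(0, sigma^2 + tau^2)
analytic = normal_pdf(y, 0.0, math.sqrt(sigma**2 + tau**2))
print(evidence, analytic)  # the two agree to high precision
```

For nonlinear models such as DCMs no such closed form exists and the parameter space is high-dimensional, which is why approximations like AIC, BIC and the negative free-energy are needed.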
Results
In what follows, we compare classical inference, the GBF (fixed effects) and inference on model space (random effects) using both synthetic and real data. These data have been published previously and have been analysed in various ways, including group-level model inference using GBFs (Stephan et al., 2007b, 2007c, 2008).
Discussion
In this paper, we have introduced a novel approach for model selection at the group level. Provisional experience suggests that this approach represents a more powerful way of quantifying one's belief that a particular model is more likely than any other at the group level, relative to the conventional GBF. Critically, this variational Bayesian approach rests on treating the model switches mi as a random variable, within a full hierarchical model for multi-subject data (see Fig. 1), and thus
Acknowledgments
This work was funded by the Wellcome Trust (KES, WDP, RJM, KJF) and the University Research Priority Program “Foundations of Human Social Behaviour” at the University of Zurich (KES). JD is funded by a Marie Curie Fellowship. We are very grateful to Marcia Bennett for helping prepare this manuscript, to the FIL Methods Group, particularly Justin Chumbley, for useful discussions and to Jon Roiser and Dominik Bach for helpful comments on practical applications. Finally, we would like to thank the
References (43)
- et al., Connectivity modulation of early visual processing areas during covert and overt tracking tasks. NeuroImage (2008)
- et al., Integrated Bayesian models of learning and decision making for saccadic eye movements. Neural Netw. (2008)
- et al., Using nonlinear models in fMRI data analysis: model selection and activation detection. NeuroImage (2006)
- et al., Detecting and adjusting for artifacts in fMRI time series data. NeuroImage (2005)
- et al., Nonlinear responses in fMRI: the Balloon model, Volterra kernels, and other hemodynamics. NeuroImage (2000)
- et al., Dynamic causal modelling. NeuroImage (2003)
- et al., Variational free-energy and the Laplace approximation. NeuroImage (2007)
- et al., Dynamic causal modelling of evoked potentials: a reproducibility study. NeuroImage (2007)
- et al., The functional anatomy of the MMN: a DCM study of the roving paradigm. NeuroImage (2008)
- et al., Population-level inferences for distributed MEG source localization under multiple constraints: application to face-evoked fields. NeuroImage (2007)