Bayesian model selection for group studies — Revisited
Introduction
Any statistical measure of empirical evidence rests on some form of model comparison. In a classical setting, one typically compares the null with an alternative hypothesis, where the former is a model of how chance could have generated the data. Theoretical results specify the sense in which model comparison can be considered optimal. For example, the Neyman–Pearson lemma essentially states that statistical tests based on the likelihood ratio (such as a simple t-test) are the most powerful, i.e., they have the best chance of detecting an effect (see e.g., Casella and Berger, 2001). From this perspective, Bayesian model comparison can be seen as a simple extension of likelihood ratio tests, in that it allows for the comparison of more than two models. In fact, likelihood ratios are used in a Bayesian setting, under the name of Bayes factors (Kass and Raftery, 1995). These are simply ratios of model evidences, quantifying the support for one model relative to another. Having said this, established classical and Bayesian techniques may give different answers to the same question — a difference that has entertained generations of statisticians (see e.g., Fienberg, 2006).
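Concretely, given two log-evidences (as returned by any model inversion scheme), the Bayes factor is just the exponentiated difference. A minimal sketch in Python; the numbers are purely illustrative:

```python
import math

def bayes_factor(log_evidence_1: float, log_evidence_2: float) -> float:
    """Bayes factor in favour of model 1 over model 2.

    Values above 1 favour model 1; on Kass and Raftery's rough scale,
    a Bayes factor above ~20 (a log-evidence difference above ~3)
    counts as strong evidence.
    """
    return math.exp(log_evidence_1 - log_evidence_2)

# Illustrative log-evidences for two models fitted to the same data
bf = bayes_factor(-110.2, -113.4)  # exp(3.2): strong evidence for model 1
```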
In this paper, we consider the problem of performing random effects Bayesian model selection (BMS) at the group level. This was originally addressed in Stephan et al. (2009), where models were treated as random effects that could differ between subjects and have a fixed (unknown) distribution in the population. The implicit hierarchical model is then inverted using variational or sampling techniques (see Penny et al., 2010), to provide conditional estimates of the frequency with which any model prevails in the population. This random effects BMS procedure complements fixed effects procedures that assume that subjects are sampled from a homogeneous population with one (unknown) model (cf. the log group Bayes factor that sums log-evidences over subjects; Stephan et al., 2007). Stephan et al. (2009) also introduced the notion of exceedance probability, which measures how likely it is that any given model is more frequent than all other models in the comparison set. These two summary statistics typically constitute the results of random effects BMS (see, for example, den Ouden et al., 2010).
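The exceedance probability has no closed form in general, but it is easy to estimate by sampling from the (approximately Dirichlet) posterior over model frequencies. A minimal stdlib sketch, assuming hypothetical Dirichlet counts obtained from inverting the hierarchical model:

```python
import random
from collections import Counter

def exceedance_probabilities(alpha, n_samples=20_000, seed=0):
    """Monte-Carlo estimate of exceedance probabilities.

    alpha : parameters of the (approximate) Dirichlet posterior over
            model frequencies, one entry per model.
    Returns, for each model, the estimated probability that its
    frequency in the population exceeds that of all other models.
    """
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_samples):
        # A Dirichlet draw is a normalised vector of Gamma variates;
        # only the argmax matters here, so normalisation can be skipped.
        draw = [rng.gammavariate(a, 1.0) for a in alpha]
        wins[max(range(len(alpha)), key=draw.__getitem__)] += 1
    return [wins[k] / n_samples for k in range(len(alpha))]

# Hypothetical Dirichlet counts strongly favouring the first of three models
xp = exceedance_probabilities([8.5, 2.1, 1.4])
```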
While the random effects BMS procedure suggested in Stephan et al. (2009) and Penny et al. (2010) has proven useful in practice — and has been employed by more than a hundred published studies to date — some conceptual issues are still outstanding. In this paper, we extend the approach described in Stephan et al. (2009) in three ways: (i) we provide a complete picture of the statistical risk incurred when performing group BMS, (ii) we examine the formal difference between random effects BMS and classical random effects analyses of parameter estimates, when asking whether a particular parameter is zero or not, and (iii) we address the problem of between-group and between-condition comparisons.
Section 2 revisits random effects BMS, providing a definition of the null at the group level. This allows us to quantify the statistical risk incurred by performing random effects BMS, i.e. how likely it is that differences in model evidences are due to chance. En passant, we clarify the interpretation of exceedance probabilities and provide guidance with regard to summary statistics that should be reported when using random effects BMS.
Section 3 addresses the difference between random effects BMS and classical random effects analyses of parameter estimates. In principle, group effects can be assessed using a classical random effects analysis of the parameter estimates across subjects (e.g., using t-tests), or using random effects BMS (reduced versus full model). However, these approaches do not answer the same question (and therefore may not give the same answer). Here, we explain the nature of this difference and identify the situations that would yield identical or different conclusions.
Section 4 introduces a simple extension to the original framework proposed in Stephan et al. (2009). In brief, we propose a test of whether two (or more) groups of subjects come from the same population. We also address the related issue of between condition comparisons. The key idea behind these procedures is a generalization of the intuition that underlies classical paired t-tests; i.e. one has to quantify the evidence for a difference — as opposed to the difference of evidences.
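This distinction can be illustrated with a toy Beta-Binomial analogue (not the variational scheme used here): given the number of subjects best fit by a particular model in each of two conditions, one compares a model in which both conditions share a single (unknown) model frequency against one in which the frequencies differ. All counts below are hypothetical:

```python
from math import comb, lgamma, log

def log_beta(a, b):
    """Log of the Beta function, via log-Gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(k, n):
    """Log marginal likelihood of k 'wins' among n subjects under a
    uniform Beta(1, 1) prior on the model frequency."""
    return log(comb(n, k)) + log_beta(1 + k, 1 + n - k) - log_beta(1, 1)

def log_bf_difference(k1, n1, k2, n2):
    """Log Bayes factor for 'the two conditions have different model
    frequencies' against 'they share one frequency' -- i.e. the
    evidence for a difference, not the difference of evidences."""
    log_diff = log_marginal(k1, n1) + log_marginal(k2, n2)
    log_same = (log(comb(n1, k1)) + log(comb(n2, k2))
                + log_beta(1 + k1 + k2, 1 + (n1 - k1) + (n2 - k2))
                - log_beta(1, 1))
    return log_diff - log_same

# Very different counts across conditions yield positive log-evidence
# for a difference; identical counts yield negative log-evidence.
lbf = log_bf_difference(18, 20, 3, 20)
```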
For all three issues, we use Monte-Carlo simulations to assess the performance of random effects BMS in the context of key applications, e.g. Dynamic Causal Modeling (see Daunizeau et al., 2011a for a recent review).
Section snippets
On the statistical risk of group BMS
In this section, we first revisit the approach to random effects BMS proposed in Stephan et al. (2009), recasting it as an extension of Polya's urn model. This serves to identify the nature of the risk associated with model selection. In brief, we focus on the risk of stating that a given model is a better explanation for the data than other models, given that chance could have favoured this particular model. In turn, we propose a simple Bayesian “omnibus test”, to exclude chance as a likely
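The resulting "protected" exceedance probability can be written as a Bayesian model average: the exceedance probabilities under the alternative, and chance (1/K) under the null, weighted by the posterior probability of the null (the Bayesian omnibus risk). A minimal sketch with illustrative numbers:

```python
def protected_exceedance_probabilities(xp, bor):
    """Average the exceedance probabilities (xp) with chance (1/K),
    weighted by the Bayesian omnibus risk `bor`, i.e. the posterior
    probability that observed differences in model evidences are
    due to chance alone."""
    k = len(xp)
    return [x * (1.0 - bor) + bor / k for x in xp]

# With a 20% omnibus risk, a 0.90 exceedance probability is tempered
pxp = protected_exceedance_probabilities([0.90, 0.07, 0.03], bor=0.2)
```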
Random effects BMS and classical random effects analysis of parameter estimates
In this section, we focus on a specific question, namely “whether a model parameter is zero or not” at the group level. In a classical setting, this is typically addressed using a two-sided t-test on the parameter of interest. Effectively, this relies on the parameter estimate — from each subject — as a summary statistic to perform a random effects analysis; testing whether the group mean is significantly different from zero. However, one could also perform a group BMS with two models (with and
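The classical route can be sketched as follows. For brevity this uses a large-sample normal approximation to the null distribution of the t statistic (a real analysis on a small group would use the exact Student-t distribution), and the per-subject estimates are hypothetical:

```python
import math
from statistics import NormalDist, mean, stdev

def one_sample_t(estimates, mu0=0.0):
    """Two-sided one-sample t-test of H0: the group mean of the
    parameter estimates equals mu0.  The p-value uses a normal
    approximation, adequate only for large samples."""
    n = len(estimates)
    t = (mean(estimates) - mu0) / (stdev(estimates) / math.sqrt(n))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p

# Hypothetical per-subject estimates of a single model parameter
t, p = one_sample_t([0.8, 1.1, 0.6, 1.3, 0.9, 1.0, 0.7, 1.2])
```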
Between-group and between-condition BMS
In this section, we address the relationship between different treatment conditions and groups; for example, dealing with one group of subjects measured under two conditions, or two groups of subjects. Until now,
Discussion
In this work, we introduced three extensions of our original approach to random effects BMS (Stephan et al., 2009). First, we have described a protected exceedance probability, which measures how likely it is that any given model is more frequent than the others, above and beyond chance. Second, we have presented systematic simulations of various approaches to address questions about specific treatment effects on model parameters using group studies. Third, we considered approaches to between-condition and between-group BMS
Acknowledgments
This work was supported by the European Research Council (JD), by the Ville de Paris (LR), and by the IHU-A-ICM (JD, LR). KES acknowledges support by the René and Susanne Braginsky Foundation and KJF acknowledges support from the Wellcome Trust.
Conflict of interest
The authors declare that there are no conflicts of interest.
References (33)
- et al. Forward and backward connections in the brain: a DCM study of functional asymmetries. Neuroimage (2009)
- et al. Dynamic causal modeling: a critical review of the biophysical and statistical foundations. Neuroimage (2011)
- et al. Stochastic dynamic causal modelling of fMRI data: should we care about neural noise? Neuroimage (2012)
- et al. Model-based influences on humans' choices and striatal prediction errors. Neuron (2011)
- et al. Cerebral pathways in processing of affective prosody: a dynamic causal modeling study. Neuroimage (2006)
- et al. Dynamic causal modelling. Neuroimage (2003)
- et al. Mixed-effects and fMRI studies. Neuroimage (2005)
- et al. Variational free energy and the Laplace approximation. Neuroimage (2007)
- et al. Generalisability, random effects and population inference. Neuroimage (1998)
- Comparing dynamic causal models using AIC, BIC and free energy. Neuroimage (2012)
- Comparing hemodynamic models with DCM. Neuroimage
- Bayesian model selection for group studies. Neuroimage
- Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables
- Information measures and model selection. Bull. Int. Stat. Inst.
- Preserved feedforward but impaired top–down processes in the vegetative state. Science
- Statistical Inference