Elsevier

NeuroImage

Volume 83, December 2013, Pages 472-484

Brain morphometry reproducibility in multi-center 3 T MRI studies: A comparison of cross-sectional and longitudinal segmentations

https://doi.org/10.1016/j.neuroimage.2013.05.007

Highlights

  • We implemented a multi-site 3 T MRI protocol for brain morphometry at 8 EU sites.

  • We acquired across-session test–retest data from 40 healthy elderly subjects.

  • We calculated the reproducibility of cortical and volumetric FreeSurfer estimates.

  • Longitudinal segmentation was more reliable than cross-sectional segmentation at all sites.

Abstract

Large-scale longitudinal multi-site MRI brain morphometry studies are increasingly important for characterizing both normal and clinical population groups using fully automated segmentation tools. The test–retest reproducibility of morphometry data acquired across multiple scanning sessions, and for different MR vendors, is an important reliability indicator since it defines the sensitivity of a protocol to detect longitudinal effects in a consortium. There is very limited knowledge about how across-session reliability of morphometry estimates might be affected by different 3 T MRI systems. Moreover, there is a need for optimal acquisition and analysis protocols in order to reduce sample sizes. A recent study has shown that the longitudinal FreeSurfer segmentation offers improved within-session test–retest reproducibility relative to the cross-sectional segmentation at one 3 T site using a nonstandard multi-echo MPRAGE sequence. In this study we implement a multi-site 3 T MRI morphometry protocol based on vendor-provided T1 structural sequences (3D MPRAGE on Siemens and Philips, 3D IR-SPGR on GE) at 8 sites located in 4 European countries. The protocols used mild acceleration factors (1.5–2) when possible. We acquired across-session test–retest structural data from a group of healthy elderly subjects (5 subjects per site) and compared the across-session reproducibility of two full-brain automated segmentation methods based on either longitudinal or cross-sectional FreeSurfer processing. The segmentations include cortical thickness, intracranial, ventricle and subcortical volumes. Reproducibility is evaluated as the absolute change relative to the mean (%), the Dice coefficient for volume overlap, and the intraclass correlation coefficient across two sessions. We found that this acquisition and analysis protocol gives reproducibility results comparable to those of previous studies that used longer acquisitions without acceleration. We also show that the longitudinal processing is systematically more reliable across sites regardless of MRI system differences. The reproducibility errors of the longitudinal segmentations are on average approximately half of those obtained with the cross-sectional analysis for all volume segmentations and for entorhinal cortical thickness. No significant differences in reliability are found between the segmentation methods for the other cortical thickness estimates. Averaging the two MPRAGE volumes acquired within each test–retest session did not systematically improve the across-session reproducibility of morphometry estimates. Our results extend those from previous studies that showed improved reliability of the longitudinal analysis at single sites and/or with non-standard acquisition methods. The multi-site acquisition and analysis protocol presented here is promising for clinical applications since it allows for smaller sample sizes per MRI site or shorter trials in studies evaluating the role of potential biomarkers to predict disease progression or treatment effects.
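The three reproducibility measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are common textbook forms and should be read as assumptions: in particular, the one-way random-effects ICC(1,1) is only one of several ICC variants and may not be the one used in the study. All function names and the example volumes are hypothetical.

```python
from statistics import mean


def abs_percent_change(test, retest):
    """Absolute test-retest difference relative to the two-session mean, in %."""
    return 100.0 * abs(test - retest) / ((test + retest) / 2.0)


def dice(voxels_a, voxels_b):
    """Dice overlap between two segmentations given as sets of voxel indices."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # two empty segmentations overlap trivially
    return 2.0 * len(a & b) / (len(a) + len(b))


def icc_two_sessions(test, retest):
    """One-way random-effects ICC(1,1) for two sessions (assumed variant)."""
    n, k = len(test), 2
    subject_means = [(t + r) / 2.0 for t, r in zip(test, retest)]
    grand_mean = mean(subject_means)
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (t - m) ** 2 + (r - m) ** 2
        for t, r, m in zip(test, retest, subject_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Made-up hippocampal volumes (mm^3) for five subjects, two sessions.
    session1 = [3500.0, 3720.0, 3310.0, 3980.0, 3605.0]
    session2 = [3570.0, 3690.0, 3350.0, 3940.0, 3630.0]
    errors = [abs_percent_change(t, r) for t, r in zip(session1, session2)]
    print("mean absolute change (%):", mean(errors))
    print("ICC(1,1):", icc_two_sessions(session1, session2))
    print("Dice:", dice({1, 2, 3}, {2, 3, 4}))
```

Sets of voxel indices keep the Dice example dependency-free; with real label volumes the same ratio would be computed on boolean masks.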

Introduction

Methods that enable the characterization of human brain morphometry from MRI data have found important applications in neuroscience. Several reviews describe how morphometry tools have been applied to investigate a variety of populations, including, but not limited to, normal development (Silk and Wood, 2011), normal aging (Mueller et al., 2007), Alzheimer's disease (Drago et al., 2011, Fjell and Walhovd, 2012, Frisoni et al., 2010, Jack, 2011), Parkinson's disease (Kostić and Filippi, 2011), autism (Chen et al., 2011), bipolar disorders (Selvaraj et al., 2012), epilepsy (Bernasconi et al., 2011) and schizophrenia (Levitt et al., 2010). One particular example of a successful contribution of brain morphometry to the field of neurodegenerative diseases is that hippocampal volume has recently been approved as a biomarker to enrich the population selection in clinical trials that study early stages of Alzheimer's disease (EMA/CHMP/SAWP/809208/2011).

There are several methods to obtain brain morphometry estimates from MRI data. Manual segmentation of specific brain structures on MRI by trained raters, with its high inter-rater reliability, is considered the gold standard by many neuroimaging studies (Rojas et al., 2004, Whitwell et al., 2005). However, because of its time cost, manual segmentation is not practical for large studies involving many subjects and different brain structures. Various automated and semi-automated algorithms have been proposed, including atlas-based methods (Alemán-Gómez et al., 2007, Fischl et al., 2002, Lötjönen et al., 2010, Magnotta et al., 2002, Wolz et al., 2010), voxel-based morphometry with statistical parametric mapping (Ashburner and Friston, 2000), tensor-based morphometry (Leow et al., 2005, Studholme et al., 2001) and boundary shift integral methods (Camara et al., 2007, Smith et al., 2002). This list of brain morphometry analysis methods is by no means complete, nor does this paper attempt to compare and contrast them.

Automated morphometric analysis is of particular interest in longitudinal studies aimed at characterizing disease progression or the effect of therapeutic treatments, whether using established biomarkers or searching for new ones. In particular, longitudinal multi-center MRI studies are becoming an increasingly common strategy to collect large datasets while distributing the data acquisition load across multiple partners (Van Horn and Toga, 2009), and one of the largest examples is probably the Alzheimer's Disease Neuroimaging Initiative, or ADNI (Carrillo et al., 2012). One critical factor that limits the sensitivity to detect changes in any longitudinal study is the reproducibility of repeated measures. The test–retest reliability of MRI-derived morphometric estimates may be affected by a variety of factors (Jovicich et al., 2009), including hydration status of the subject (Walters et al., 2001), instrument-related factors such as scanner manufacturer, field strength, head RF coil, magnetic gradients (Jovicich et al., 2006), pulse sequence and image analysis methods (Han et al., 2006). Repeated acquisitions within a single scan session without subject repositioning may be used to characterize the best attainable reproducibility conditions from an acquisition and analysis protocol. However, the reproducibility errors present in a longitudinal study are better described by repeated acquisitions obtained in different sessions several days apart. Such across-session differences will include additional sources of variance like MRI system instabilities, differences in head positioning within the RF coil, differences in automated acquisition procedures like auto shimming, as well as potential effects from how different operators follow instructions to execute the same acquisition protocol. Across-session reproducibility is even more challenging in multi-center neuroimaging clinical studies, where comparable results are usually difficult to obtain because of the added variability from site differences in the MRI hardware, acquisition protocols and operators.
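The link between measurement reproducibility and study sensitivity can be made concrete with a standard sample-size approximation. This is a textbook relation for comparing the means of two groups, not a formula taken from this study, and the split of the variance into a biological and a measurement component is an illustrative assumption:

```latex
\[
n \approx \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,\sigma^{2}}{\delta^{2}},
\qquad
\sigma^{2} = \sigma_{\mathrm{bio}}^{2} + \sigma_{\mathrm{meas}}^{2}
\]
```

Here $n$ is the number of subjects per group, $\delta$ the effect to be detected, $\alpha$ and $1-\beta$ the significance level and power, $\sigma_{\mathrm{bio}}^{2}$ the between-subject biological variance and $\sigma_{\mathrm{meas}}^{2}$ the test–retest measurement variance. Under these assumptions, in the limiting case where measurement error dominates, halving the measurement standard deviation reduces the required $n$ to roughly one quarter; when biological variance is not negligible the gain is smaller.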

Despite the wide use of automated morphometric techniques in 3 T MRI studies, across-site test–retest reliability of morphometry measures has not been thoroughly investigated and thus its impact on statistical analysis is not clearly defined. Table 1 outlines studies that, to the best of our knowledge, have reported across-session test–retest reproducibility measures of morphometric data derived from healthy volunteers using 3 T systems. Most studies were done on a single MRI system (Kruggel et al., 2010, Morey et al., 2010, Wonderlick et al., 2009), except for one study that evaluated the effects of a major MRI system upgrade on reproducibility, therefore effectively considering two different systems (Jovicich et al., 2009). These studies have been performed on systems from only two vendors (Siemens and GE) and on three models (Trio, Trio TIM, GE Excite), which are becoming less common as the manufacturers develop newer versions. In addition, morphometry segmentation tools have also been evolving. Recently, a FreeSurfer longitudinal image processing framework has been developed (Reuter et al., 2012), showing a significant increase in precision and discrimination power when compared with tools originally designed for the FreeSurfer cross-sectional analysis. In that study the test–retest reliability of the longitudinal stream was evaluated at 3 T, but only for repeated acquisitions obtained during the same session and with a particular sequence, multi-echo 3D MPRAGE (van der Kouwe et al., 2008), which has interesting advantages relative to the standard 3D MPRAGE (Wonderlick et al., 2009) but is not yet commonly available across all vendors. To date there are no studies evaluating the across-session test–retest reproducibility of this new longitudinal analysis at 3 T, for one or more MRI system vendors, while using an MRI acquisition that is standard across vendors.

All of these issues are relevant to the PharmaCog project, a new industry-academic European project aimed at identifying biomarkers sensitive to symptomatic and disease-modifying effects of drugs for Alzheimer's disease (http://www.alzheimer-europe.org/FR/Research/PharmaCog). One of the objectives of the PharmaCog project is to investigate potential biomarkers derived from human brain structural and functional MRI, in particular brain morphometry. Within this context, the goals of the present PharmaCog study were the following: i) implement a multi-site 3 T MRI data acquisition protocol for morphometry analysis, ii) acquire across-session test–retest data from a population of healthy elderly subjects, and iii) evaluate and compare the across-session reproducibility of the cross-sectional and longitudinal FreeSurfer segmentation analyses within and across MRI sites. This work is therefore an extension of previous work (Reuter et al., 2012), evaluating the across-session reproducibility of the segmentation results (cortical thickness, intracranial, ventricular and subcortical volumes) on a variety of 3 T MRI scanning platforms (Table 1). To keep a manageable number of variables in this study we do not manipulate the acquisition sequence other than trying to implement a target common protocol across all sites, largely following ADNI recommendations. The study is focused on the comparison of the test–retest reproducibility of morphometric results derived from two variants of the FreeSurfer segmentation; comparisons with other segmentation methods are beyond the scope of this work.
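For readers unfamiliar with the two FreeSurfer variants compared here, the sketch below assembles the `recon-all` command lines usually associated with each stream: independent cross-sectional runs per session, followed, for the longitudinal stream, by a within-subject template (`-base`) and a per-session longitudinal run (`-long`). The excerpt does not state the FreeSurfer version or the exact options used in the study, so the subject identifiers, file paths and helper names are hypothetical, and the code only builds the commands without executing them.

```python
import shlex


def cross_sectional_cmd(tp_id, t1_path):
    """Cross-sectional (CS) stream: each session is processed independently."""
    return ["recon-all", "-s", tp_id, "-i", t1_path, "-all"]


def base_cmd(template_id, tp_ids):
    """Within-subject template built from all cross-sectionally processed sessions."""
    cmd = ["recon-all", "-base", template_id]
    for tp_id in tp_ids:
        cmd += ["-tp", tp_id]
    return cmd + ["-all"]


def longitudinal_cmd(tp_id, template_id):
    """Longitudinal (LG) stream: one session re-processed using the template."""
    return ["recon-all", "-long", tp_id, template_id, "-all"]


def pipeline_for_subject(subject, t1_paths):
    """Ordered command list for one subject with one T1 image per session."""
    tp_ids = [f"{subject}_s{i + 1}" for i in range(len(t1_paths))]
    template_id = f"{subject}_base"
    cmds = [cross_sectional_cmd(tp, path) for tp, path in zip(tp_ids, t1_paths)]
    cmds.append(base_cmd(template_id, tp_ids))
    cmds += [longitudinal_cmd(tp, template_id) for tp in tp_ids]
    return cmds


if __name__ == "__main__":
    # Hypothetical subject with a test and a retest session.
    for cmd in pipeline_for_subject("subj01", ["test_T1.nii.gz", "retest_T1.nii.gz"]):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps the sketch safe to pass to `subprocess.run` later without quoting problems.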

Section snippets

Subjects

Nine clinical sites participated in this study across Italy (Brescia, Verona, and Genoa), Spain (Barcelona), France (Marseille, Lille, and Toulouse) and Germany (Leipzig and Essen). The Brescia site was responsible for the coordination and analysis of the whole study and did not acquire MRI data. Each MRI site recruited 5 local volunteers within an age range of 50–80 years. The subjects' age range matches that of the clinical population that will be studied with the protocols

Results

In this study, we estimate the test–retest reliability of morphometry measures derived from structural T1-weighted 3 T MRI data and evaluate how their reproducibility errors are affected by FreeSurfer processing stream (CS, LG) and MRI site (eight 3 T MRI scanners from different vendors: GE, Siemens, Philips) on healthy elderly volunteers scanned in two separate sessions at least one week apart. This short period between the test and retest sessions was chosen to minimize biological changes that

Discussion

The main goal of this study was to investigate the effects on reliability of two variants of the automated FreeSurfer brain segmentation analysis when used in a 3 T MRI consortium. The choices of MRI data acquisition and data analysis protocols can affect reproducibility errors and are therefore crucial in longitudinal studies aimed at evaluating MRI-derived biomarkers for disease progression and/or treatment efficacy. In this brain morphometry study we show for the first time the across-session

Conclusions

This study achieved the following three main goals: i) a structural MRI acquisition protocol for morphometry analysis was implemented across eight 3 T MRI sites (3D MPRAGE, most sites using mildly accelerated acquisitions) covering various vendors (Siemens, Philips, GE) and countries (Italy, Spain, Germany and France); ii) within- and across-session test–retest data were acquired from a group of 40 healthy elderly volunteers (5 different volunteers per MRI site), generating a dataset with a

Acknowledgments

PharmaCog is funded by the EU-FP7 for the Innovative Medicines Initiative (grant no. 115009). All members of the PharmaCog project deserve sincere acknowledgment for their significant efforts, but unfortunately, they are too numerous to mention. The authors would especially like to thank the people who contributed to the early phases of this study, including Luca Venturi, Genoveffa Borsci and Thomas Günther.

Conflict of interest

The authors have no conflicts of interest to declare.

References (49)

  • V.S. Kostić et al. Neuroanatomical correlates of depression and apathy in Parkinson's disease: magnetic resonance imaging studies. J. Neurol. Sci. (2011)
  • F. Kruggel et al. Impact of scanner hardware and imaging protocol on image quality and compartment volume precision in the ADNI cohort. NeuroImage (2010)
  • A. Leow et al. Brain structural mapping using a novel hybrid implicit/explicit framework based on the level-set method. NeuroImage (2005)
  • J.M. Lötjönen et al. Fast and robust multi-atlas segmentation of brain magnetic resonance images. NeuroImage (2010)
  • V.A. Magnotta et al. Structural MR image processing using the BRAINS2 toolbox. Comput. Med. Imaging Graph. (2002)
  • M.I. Miller. Computational anatomy: shape, growth, and atrophy comparison via diffeomorphisms. NeuroImage (2004)
  • S.G. Mueller et al. Measurement of hippocampal subfields and age-related changes with high resolution MRI at 4 T. Neurobiol. Aging (2007)
  • M. Reuter et al. Within-subject template estimation for unbiased longitudinal image analysis. NeuroImage (2012)
  • S.M. Smith et al. Accurate, robust, and automated longitudinal and cross-sectional brain change analysis. NeuroImage (2002)
  • A.J. van der Kouwe et al. Brain morphometry with multiecho MPRAGE. NeuroImage (2008)
  • R. Wolz et al. LEAP: learning embeddings for atlas propagation. NeuroImage (2010)
  • J.S. Wonderlick et al. Reliability of MRI-derived cortical and subcortical morphometric measures: effects of pulse sequence, voxel geometry, and parallel imaging. NeuroImage (2009)
  • Y. Alemán-Gómez et al. IBASPM: toolbox for automatic parcellation of brain structures
  • A. Bernasconi et al. Advances in MRI for ‘cryptogenic’ epilepsies. Nat. Rev. Neurol. (2011)

1 Authors contributed equally to this work.