
NeuroImage

Volume 59, Issue 2, 16 January 2012, Pages 895-907

Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer's disease

https://doi.org/10.1016/j.neuroimage.2011.09.069

Abstract

Many machine learning and pattern classification methods have been applied to the diagnosis of Alzheimer's disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). Recently, rather than predicting categorical variables as in classification, several pattern regression methods have also been used to estimate continuous clinical variables from brain images. However, most existing regression methods estimate multiple clinical variables separately and thus cannot exploit the useful intrinsic correlations among different clinical variables. Moreover, those regression methods often use only a single modality of data (usually structural MRI), without considering the complementary information provided by different modalities. In this paper, we propose a general methodology, namely multi-modal multi-task (M3T) learning, to jointly predict multiple variables from multi-modal data. Here, the variables include not only the clinical variables used for regression but also the categorical variable used for classification, with different tasks corresponding to the prediction of different variables. Specifically, our method contains two key components, i.e., (1) a multi-task feature selection which selects the common subset of relevant features for multiple variables from each modality, and (2) a multi-modal support vector machine which fuses the above-selected features from all modalities to predict multiple (regression and classification) variables. To validate our method, we perform two sets of experiments on ADNI baseline MRI, FDG-PET, and cerebrospinal fluid (CSF) data from 45 AD patients, 91 MCI patients, and 50 healthy controls (HC). In the first set of experiments, we estimate two clinical variables, i.e., the Mini Mental State Examination (MMSE) and the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog), as well as one categorical variable (with a value of ‘AD’, ‘MCI’ or ‘HC’), from the baseline MRI, FDG-PET, and CSF data. In the second set of experiments, we predict the 2-year changes of MMSE and ADAS-Cog scores and also the conversion of MCI to AD from the baseline MRI, FDG-PET, and CSF data. The results of both sets of experiments demonstrate that our proposed M3T learning scheme achieves better performance on both regression and classification tasks than conventional learning methods.

Introduction

Alzheimer's disease (AD) is the most common form of dementia diagnosed in people over 65 years of age. It is reported that there are 26.6 million AD sufferers worldwide, and that 1 in 85 people will be affected by the disease by 2050 (Ron et al., 2007). Thus, accurate diagnosis of AD, and especially of its early stage, i.e., mild cognitive impairment (MCI), is very important for timely therapy and possible delay of the disease. Over the past decade, many machine learning and pattern classification methods have been used for early diagnosis of AD and MCI based on different modalities of biomarkers, e.g., structural brain atrophy measured by magnetic resonance imaging (MRI) (de Leon et al., 2007, Du et al., 2007, Fjell et al., 2010, McEvoy et al., 2009), metabolic alterations in the brain measured by fluorodeoxyglucose positron emission tomography (FDG-PET) (De Santi et al., 2001, Morris et al., 2001), and pathological amyloid depositions measured through cerebrospinal fluid (CSF) (Bouwman et al., 2007b, Fjell et al., 2010, Mattsson et al., 2009, Shaw et al., 2009). In all these methods, classification models are learned from training subjects to predict the categorical classification variable (i.e., class label) for test subjects.

Recently, rather than predicting categorical variables as in classification, several studies have begun to estimate continuous clinical variables from brain images (Duchesne et al., 2005, Duchesne et al., 2009, Fan et al., 2010, Stonnington et al., 2010, Wang et al., 2010). This kind of research is important because it can help evaluate the stage of AD pathology and predict future progression. Different from classification, which assigns a subject to one of two or more categories, regression needs to estimate continuous values and is thus more challenging. In the literature, a number of regression methods have been used for estimating clinical variables based on neuroimaging data. For example, linear regression models were used to estimate the 1-year Mini Mental State Examination (MMSE) changes from structural MR brain images (Duchesne et al., 2005, Duchesne et al., 2009). A high-dimensional kernel-based regression method, i.e., the Relevance Vector Machine (RVM), has also been used to estimate clinical variables, including MMSE and the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog), from structural MR brain images (Fan et al., 2010, Stonnington et al., 2010, Wang et al., 2010). Besides clinical variables, regression methods have also been used for estimating the age of individual subjects from MR brain images (Ashburner, 2007, Franke et al., 2010).

In the practical diagnosis of AD, multiple clinical variables are generally acquired, e.g., MMSE and ADAS-Cog. Specifically, MMSE examines orientation to time and place, immediate and delayed recall of three words, attention and calculation, language, and visuoconstructional functions (Folstein et al., 1975), while ADAS-Cog is a global measure encompassing the core symptoms of AD (Rosen et al., 1984). It is known that inherent correlations exist among the multiple clinical variables of a subject, since the underlying pathology is the same (Fan et al., 2010, Stonnington et al., 2010). However, most existing regression methods model different clinical variables separately, without considering their inherent correlations, which may be useful for robust and accurate estimation of clinical variables from brain images. Moreover, to our knowledge, none of the existing regression methods for estimating clinical variables exploits the class labels, which are often available for the training subjects and can aid the accurate estimation of the regression variables (and vice versa).

On the other hand, although multi-modal data are often acquired for AD diagnosis, e.g., MRI, PET, and CSF biomarkers, nearly all existing regression methods developed for estimating clinical variables are based on only one imaging modality, mostly structural MRI. Recent studies have indicated that biomarkers from different modalities provide complementary information, which is very useful for AD diagnosis (Fjell et al., 2010, Landau et al., 2010, Walhovd et al., 2010b). More recently, a number of studies have used multi-modal data for AD or MCI classification and obtained improved performance compared with methods based on single-modal data (Fan et al., 2008, Hinrichs et al., 2011, Vemuri et al., 2009, Walhovd et al., 2010a, Zhang et al., 2011). However, to the best of our knowledge, the corresponding problem in imaging-based regression, i.e., estimation of clinical variables from multi-modal data, has not been investigated previously.

Inspired by the above problems, in this paper, we propose a general methodology, namely multi-modal multi-task (M3T) learning, to jointly predict multiple variables from multi-modal data. Here, the variables include not only the continuous clinical variables for regression (MMSE, ADAS-Cog) but also the categorical variable for classification (i.e., class label). We treat the estimation of different regression or classification variables as different tasks, and use a multi-task learning method (Argyriou et al., 2008, Obozinski et al., 2006) developed in the machine learning community for joint regression and classification learning. Specifically, at first, we assume that the related tasks share a common relevant feature subset but with a varying amount of influence on each task, and thus adopt a multi-task feature selection method to obtain a common feature subset for different tasks simultaneously. Then, we use a multi-modal support vector machine (SVM) method to fuse the above-selected features from each modality to estimate multiple regression and classification variables.
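
To make the first component concrete, the sketch below illustrates a multi-task feature selection step with an l2,1-regularized linear model (scikit-learn's MultiTaskLasso), which drives whole feature columns to zero jointly across tasks so that all tasks share one common feature subset. Treating the class label as an additional continuous target, as well as the feature matrix and the regularization strength, are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of multi-task feature selection via an l2,1-regularized
# linear model: a feature is either used by all tasks or discarded by all.
import numpy as np
from sklearn.linear_model import MultiTaskLasso

def select_common_features(X, Y, alpha=0.05):
    """X: (n_subjects, n_features) for one modality (e.g., MRI ROI features).
    Y: (n_subjects, n_tasks) stacked targets, e.g., [MMSE, ADAS-Cog, label].
    Returns indices of features with a nonzero coefficient across the tasks."""
    model = MultiTaskLasso(alpha=alpha, max_iter=5000).fit(X, Y)
    # coef_ has shape (n_tasks, n_features); keep a feature if its column of
    # task coefficients has nonzero l2 norm.
    return np.flatnonzero(np.linalg.norm(model.coef_, axis=0) > 1e-8)

# Toy usage with random data standing in for MRI features and three targets.
rng = np.random.default_rng(0)
X_mri = rng.standard_normal((186, 93))
Y = rng.standard_normal((186, 3))        # MMSE, ADAS-Cog, encoded label
selected = select_common_features(X_mri, Y)
print(len(selected), "features retained for all tasks")
```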

We validate our method on two sets of experiments. In the first set of experiments, we estimate two regression variables (MMSE and ADAS-Cog) and one classification variable (with a value of ‘AD’, ‘MCI’ or ‘HC’) from the baseline MRI, PET, and CSF data. In the second set of experiments, we predict the 2-year changes of MMSE and ADAS-Cog scores and also the conversion of MCI to AD from the baseline MRI, PET, and CSF data. We hypothesize that the joint estimation or prediction of multiple regression and classification variables will perform better than estimating or predicting each variable separately, and that using multi-modal data (MRI, PET, and CSF) will yield better joint regression and classification performance than using single-modal data.

Section snippets

Method

The data used in the preparation of this paper were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (www.loni.ucla.edu/ADNI). The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies, and non-profit organizations, as a $60 million, 5-year public-private partnership. The primary goal of ADNI has been to test

Subjects

The ADNI general eligibility criteria are described at www.adni-info.org. Briefly, subjects are between 55 and 90 years of age and have a study partner able to provide an independent evaluation of functioning. Subjects taking specific psychoactive medications are excluded. General inclusion/exclusion criteria are as follows: 1) healthy subjects: MMSE scores between 24 and 30, a Clinical Dementia Rating (CDR) of 0, non-depressed, non-MCI, and non-demented; 2) MCI subjects: MMSE scores between 24 and 30, a

MRI, PET, and CSF

A detailed description of the acquisition of the MRI, PET, and CSF data from ADNI used in this paper can be found in (Zhang et al., 2011). Briefly, structural MR scans were acquired on 1.5T scanners. Raw Digital Imaging and Communications in Medicine (DICOM) MRI scans were downloaded from the public ADNI site (www.loni.ucla.edu/ADNI), reviewed for quality, and automatically corrected for spatial distortion caused by gradient nonlinearity and B1 field inhomogeneity. PET images were acquired 30–60 minutes

Image analysis

Image pre-processing is performed for all MR and PET images following the same procedures as in (Zhang et al., 2011). First, we perform anterior commissure (AC)–posterior commissure (PC) correction on all images and use the N3 algorithm (Sled et al., 1998) to correct intensity inhomogeneity. Next, we perform skull-stripping on the structural MR images using both the brain surface extractor (BSE) (Shattuck et al., 2001) and the brain extraction tool (BET) (Smith, 2002), followed by manual editing and intensity

Multi-modal multi-task (M3T) learning

A new learning method, namely multi-modal multi-task (M3T) learning, is presented here to simultaneously learn multiple tasks from multi-modal data. Fig. 1 illustrates the new learning problem with comparison to the existing standard Single-Modal Single-Task (SMST) learning, Multi-Task learning, and Multi-Modal learning. As can be seen from Fig. 1, in SMST and Multi-Task learning (Figs. 1(a–b)), each subject has only one modality of data represented as xi, while, in M3T and Multi-Modal learning
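
As a rough illustration of the multi-modal fusion idea, the sketch below combines one kernel per modality (MRI, PET, CSF) into a single weighted-sum kernel and feeds it to a precomputed-kernel SVM. The linear kernels, the fixed weights, and the data shapes are assumptions for illustration and do not reproduce the paper's exact multi-kernel formulation or weight search.

```python
# Minimal sketch of multi-modal fusion by a weighted sum of per-modality kernels.
import numpy as np
from sklearn.svm import SVC

def combined_kernel(mod_train, mod_test, betas):
    """mod_train / mod_test: lists of (n_train, d_m) / (n_test, d_m) arrays,
    one per modality. betas: nonnegative modality weights summing to 1.
    Returns the train-train and test-train combined kernel matrices."""
    K_tr = sum(b * (Xtr @ Xtr.T) for b, Xtr in zip(betas, mod_train))
    K_te = sum(b * (Xte @ Xtr.T) for b, Xtr, Xte in zip(betas, mod_train, mod_test))
    return K_tr, K_te

# Toy usage: random stand-ins for MRI, PET, and CSF features and AD vs. HC labels.
rng = np.random.default_rng(1)
train = [rng.standard_normal((100, d)) for d in (93, 93, 3)]
test = [rng.standard_normal((40, d)) for d in (93, 93, 3)]
y_train = rng.integers(0, 2, 100)

K_tr, K_te = combined_kernel(train, test, betas=(0.4, 0.4, 0.2))
clf = SVC(kernel="precomputed").fit(K_tr, y_train)
y_pred = clf.predict(K_te)
```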

Validation

To evaluate the performance of different methods, we perform two sets of experiments on the baseline MRI, PET, and CSF data of 186 ADNI subjects, i.e., 45 AD, 91 MCI (including 43 MCI converters (MCI-C) and 48 MCI non-converters (MCI-NC)), and 50 HC. In the first set of experiments (Experiment 1), we estimate two clinical variables (MMSE and ADAS-Cog) and one categorical variable (with a class label of ‘AD’, ‘MCI’ or ‘HC’) from the baseline brain data of all 186 subjects. It is worth noting that only the baseline data of MRI,
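
For reference, a minimal evaluation loop in the spirit of this validation could look as follows: k-fold cross-validation reporting the Pearson correlation between estimated and true clinical scores and the accuracy for the class label. The fold count, the stand-in models, and the synthetic data are assumptions for illustration, not the paper's exact protocol.

```python
# Hedged sketch of a cross-validated evaluation of one regression task
# (e.g., MMSE) and one classification task (e.g., AD vs. HC).
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import KFold
from sklearn.svm import SVR, SVC

def cross_validate(X, y_score, y_label, n_splits=10, seed=0):
    corrs, accs = [], []
    for tr, te in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        reg = SVR(kernel="linear").fit(X[tr], y_score[tr])
        clf = SVC(kernel="linear").fit(X[tr], y_label[tr])
        corrs.append(pearsonr(y_score[te], reg.predict(X[te]))[0])
        accs.append(np.mean(clf.predict(X[te]) == y_label[te]))
    return np.mean(corrs), np.mean(accs)

# Toy usage with random stand-ins for selected features, MMSE scores, and labels.
rng = np.random.default_rng(2)
X = rng.standard_normal((186, 50))
y_mmse = rng.normal(24, 3, 186)
y_label = rng.integers(0, 2, 186)
print(cross_validate(X, y_mmse, y_label))
```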

Experiment 1: estimating clinical stages (MMSE, ADAS-Cog, and class label)

We first estimate the clinical stages, including two regression variables (MMSE and ADAS-Cog) and one classification variable (i.e., class label with a value of ‘AD’, ‘MCI’ or ‘HC’), from the baseline MRI, PET, and CSF data. It is worth noting that the original multi-class classification problem is formulated as two binary classification problems, i.e., AD vs. HC and MCI vs. HC, as mentioned above. Table 2 shows the performances of the proposed M3T method, compared with three methods each using
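
The split of the three-class label into the two binary tasks mentioned above can be expressed as in the small sketch below; the string coding of the labels is assumed here purely for illustration.

```python
# Forming the AD vs. HC and MCI vs. HC binary tasks from three-class labels.
import numpy as np

def binary_task(labels, positive, negative):
    """Keep only subjects in the two named groups; return the mask and 0/1 targets."""
    mask = np.isin(labels, [positive, negative])
    y = (labels[mask] == positive).astype(int)
    return mask, y

labels = np.array(["AD", "MCI", "HC", "AD", "HC", "MCI"])
mask_ad_hc, y_ad_hc = binary_task(labels, "AD", "HC")      # AD vs. HC task
mask_mci_hc, y_mci_hc = binary_task(labels, "MCI", "HC")   # MCI vs. HC task
```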

Discussion

In this paper, we have proposed a new multi-modal multi-task (M3T) learning method with two successive steps, i.e., multi-task feature selection and a multi-modal support vector machine, to jointly predict multiple regression and classification variables from multi-modal data. Our proposed method has been validated on 186 baseline ADNI subjects through two different sets of experiments. In the first set of experiments, we tested its performance in jointly estimating the MMSE and ADAS-Cog

Conclusion

In summary, our experimental results have shown that the proposed multi-modal multi-task (M3T) method can effectively perform multi-task learning from multi-modal data. Specifically, it can effectively estimate the MMSE and ADAS-Cog scores and the classification label in both AD vs. HC and MCI vs. HC classification, and can also predict the 2-year MMSE and ADAS-Cog changes and the classification label in MCI-C vs. MCI-NC classification. To the best of our knowledge, it made the first

Acknowledgment

This work was supported in part by NIH grants EB006733, EB008374, EB009634 and MH088520, and also by National Science Foundation of China under grant No. 60875030.

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following:

References (55)

  • K. Franke et al. Estimating the age of healthy subjects from T1-weighted MRI scans using kernel methods: exploring the influence of various parameters. NeuroImage (2010)
  • C. Hinrichs et al. Predictive markers for AD in a multi-modality framework: an analysis of MCI progression in the ADNI population. NeuroImage (2011)
  • N. Kabani et al. A 3D atlas of the human brain. NeuroImage (1998)
  • C. Misra et al. Baseline and longitudinal patterns of brain atrophy in MCI patients, and their use in prediction of short-term conversion to AD: results from ADNI. NeuroImage (2009)
  • D.W. Shattuck et al. Magnetic resonance image tissue classification using a partial volume model. NeuroImage (2001)
  • C.M. Stonnington et al. Predicting clinical scores from magnetic resonance scans in Alzheimer's disease. NeuroImage (2010)
  • K.B. Walhovd et al. Multi-modal imaging predicts memory performance in normal aging and cognitive decline. Neurobiol. Aging (2010)
  • Y. Wang et al. High-dimensional pattern regression using machine learning: from medical images to continuous clinical variables. NeuroImage (2010)
  • D. Zhang et al. Multimodal classification of Alzheimer's disease and mild cognitive impairment. NeuroImage (2011)
  • A. Argyriou et al. Convex multi-task feature learning. Mach. Learn. (2008)
  • J. Bi et al. An improved multi-task learning approach with applications in medical diagnosis
  • F.H. Bouwman et al. Longitudinal changes of CSF biomarkers in memory clinic patients. Neurology (2007)
  • C.C. Chang et al. LIBSVM: a Library for Support Vector Machines (2001)
  • G. Chetelat et al. Mapping gray matter loss with voxel-based morphometry in mild cognitive impairment. NeuroReport (2002)
  • G. Chetelat et al. FDG-PET measurement is more accurate than neuropsychological assessments to predict global cognitive deterioration in patients with mild cognitive impairment. Neurocase (2005)
  • C. Davatzikos et al. Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification. Neurobiol. Aging (2010)
  • M.J. de Leon et al. Longitudinal CSF isoprostane and MRI atrophy in the progression to AD. J. Neurol. (2007)

    Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (www.loni.ucla.edu/ADNI). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: www.loni.ucla.edu/ADNI/Collaboration/ADNI_Authorship_list.pdf.
