
NeuroImage

Volume 59, Issue 2, 16 January 2012, Pages 895-907

Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer's disease

https://doi.org/10.1016/j.neuroimage.2011.09.069

Abstract

Many machine learning and pattern classification methods have been applied to the diagnosis of Alzheimer's disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). Recently, rather than predicting categorical variables as in classification, several pattern regression methods have also been used to estimate continuous clinical variables from brain images. However, most existing regression methods estimate multiple clinical variables separately and thus cannot exploit the useful intrinsic correlations among different clinical variables. Moreover, those regression methods often use only a single modality of data (usually structural MRI), without considering the complementary information provided by different modalities. In this paper, we propose a general methodology, namely multi-modal multi-task (M3T) learning, to jointly predict multiple variables from multi-modal data. Here, the variables include not only the clinical variables used for regression but also the categorical variable used for classification, with different tasks corresponding to the prediction of different variables. Specifically, our method contains two key components, i.e., (1) a multi-task feature selection which selects the common subset of relevant features for multiple variables from each modality, and (2) a multi-modal support vector machine which fuses the above-selected features from all modalities to predict multiple (regression and classification) variables. To validate our method, we perform two sets of experiments on ADNI baseline MRI, FDG-PET, and cerebrospinal fluid (CSF) data from 45 AD patients, 91 MCI patients, and 50 healthy controls (HC). In the first set of experiments, we estimate two clinical variables, i.e., the Mini Mental State Examination (MMSE) and the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog), as well as one categorical variable (with a value of ‘AD’, ‘MCI’ or ‘HC’), from the baseline MRI, FDG-PET, and CSF data. In the second set of experiments, we predict the 2-year changes of MMSE and ADAS-Cog scores and also the conversion of MCI to AD from the baseline MRI, FDG-PET, and CSF data. The results of both sets of experiments demonstrate that our proposed M3T learning scheme achieves better performance on both regression and classification tasks than conventional learning methods.

Introduction

Alzheimer's disease (AD) is the most common form of dementia diagnosed in people over 65 years of age. It is reported that there are 26.6 million AD sufferers worldwide, and that 1 in 85 people will be affected by the disease by 2050 (Ron et al., 2007). Thus, accurate diagnosis of AD, and especially of its early stage, i.e., mild cognitive impairment (MCI), is very important for timely therapy and possible delay of the disease. Over the past decade, many machine learning and pattern classification methods have been used for early diagnosis of AD and MCI based on different modalities of biomarkers, e.g., structural brain atrophy measured by magnetic resonance imaging (MRI) (de Leon et al., 2007, Du et al., 2007, Fjell et al., 2010, McEvoy et al., 2009), metabolic alterations in the brain measured by fluorodeoxyglucose positron emission tomography (FDG-PET) (De Santi et al., 2001, Morris et al., 2001), and pathological amyloid depositions measured through cerebrospinal fluid (CSF) (Bouwman et al., 2007b, Fjell et al., 2010, Mattsson et al., 2009, Shaw et al., 2009). In all these methods, classification models are learned from training subjects to predict the categorical classification variable (i.e., class label) for test subjects.

Recently, rather than predicting categorical variables as in classification, several studies have begun to estimate continuous clinical variables from brain images (Duchesne et al., 2005, Duchesne et al., 2009, Fan et al., 2010, Stonnington et al., 2010, Wang et al., 2010). This kind of research is important because it can help evaluate the stage of AD pathology and predict future progression. Different from classification, which assigns a subject to one of two or more categories, regression needs to estimate continuous values and is thus more challenging. In the literature, a number of regression methods have been used for estimating clinical variables based on neuroimaging data. For example, linear regression models were used to estimate the 1-year Mini Mental State Examination (MMSE) changes from structural MR brain images (Duchesne et al., 2005, Duchesne et al., 2009). A high-dimensional kernel-based regression method, i.e., the Relevance Vector Machine (RVM), has also been used to estimate clinical variables, including MMSE and the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog), from structural MR brain images (Fan et al., 2010, Stonnington et al., 2010, Wang et al., 2010). Besides clinical variables, regression methods have also been used for estimating the age of individual subjects from MR brain images (Ashburner, 2007, Franke et al., 2010).

In the practical diagnosis of AD, multiple clinical variables are generally acquired, e.g., MMSE and ADAS-Cog. Specifically, MMSE examines orientation to time and place, immediate and delayed recall of three words, attention and calculation, language, and visuoconstructional functions (Folstein et al., 1975), while ADAS-Cog is a global measure encompassing the core symptoms of AD (Rosen et al., 1984). It is known that inherent correlations exist among the multiple clinical variables of a subject, since the underlying pathology is the same (Fan et al., 2010, Stonnington et al., 2010). However, most existing regression methods model different clinical variables separately, without considering their inherent correlations, which may be useful for robust and accurate estimation of clinical variables from brain images. Moreover, to our knowledge, none of the existing regression methods for estimating clinical variables exploits the class labels, which are often available for the training subjects and can aid the accurate estimation of the regression variables (and vice versa).

On the other hand, although multi-modal data are often acquired for AD diagnosis, e.g., MRI, PET, and CSF biomarkers, nearly all existing regression methods developed for estimating clinical variables are based on only one imaging modality, mostly structural MRI. Recent studies have indicated that biomarkers from different modalities provide complementary information, which is very useful for AD diagnosis (Fjell et al., 2010, Landau et al., 2010, Walhovd et al., 2010b). More recently, a number of studies have used multi-modal data for AD or MCI classification and obtained improved performance compared with methods based on single-modal data (Fan et al., 2008, Hinrichs et al., 2011, Vemuri et al., 2009, Walhovd et al., 2010a, Zhang et al., 2011). However, to the best of our knowledge, the corresponding problem in imaging-based regression, i.e., estimation of clinical variables from multi-modal data, has not been investigated previously.

Inspired by the above problems, in this paper, we propose a general methodology, namely multi-modal multi-task (M3T) learning, to jointly predict multiple variables from multi-modal data. Here, the variables include not only the continuous clinical variables for regression (MMSE, ADAS-Cog) but also the categorical variable for classification (i.e., class label). We treat the estimation of different regression or classification variables as different tasks, and use a multi-task learning method (Argyriou et al., 2008, Obozinski et al., 2006) developed in the machine learning community for joint regression and classification learning. Specifically, at first, we assume that the related tasks share a common relevant feature subset but with a varying amount of influence on each task, and thus adopt a multi-task feature selection method to obtain a common feature subset for different tasks simultaneously. Then, we use a multi-modal support vector machine (SVM) method to fuse the above-selected features from each modality to estimate multiple regression and classification variables.
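
To make the first component concrete, the sketch below illustrates a multi-task feature selection step with an l2,1-regularized linear model (scikit-learn's MultiTaskLasso), which drives whole feature columns to zero jointly across tasks so that all tasks share one common feature subset. Treating the class label as an additional continuous target, as well as the feature matrix and the regularization strength, are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of multi-task feature selection via an l2,1-regularized
# linear model: a feature is either used by all tasks or discarded by all.
import numpy as np
from sklearn.linear_model import MultiTaskLasso

def select_common_features(X, Y, alpha=0.05):
    """X: (n_subjects, n_features) for one modality (e.g., MRI ROI features).
    Y: (n_subjects, n_tasks) stacked targets, e.g., [MMSE, ADAS-Cog, label].
    Returns indices of features with a nonzero coefficient across the tasks."""
    model = MultiTaskLasso(alpha=alpha, max_iter=5000).fit(X, Y)
    # coef_ has shape (n_tasks, n_features); keep a feature if its column of
    # task coefficients has nonzero l2 norm.
    return np.flatnonzero(np.linalg.norm(model.coef_, axis=0) > 1e-8)

# Toy usage with random data standing in for MRI features and three targets.
rng = np.random.default_rng(0)
X_mri = rng.standard_normal((186, 93))
Y = rng.standard_normal((186, 3))        # MMSE, ADAS-Cog, encoded label
selected = select_common_features(X_mri, Y)
print(len(selected), "features retained for all tasks")
```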

We validate our method on two sets of experiments. In the first set of experiments, we estimate two regression variables (MMSE and ADAS-Cog) and one classification variable (with a value of ‘AD’, ‘MCI’ or ‘HC’) from the baseline MRI, PET, and CSF data. In the second set of experiments, we predict the 2-year changes of MMSE and ADAS-Cog scores and also the conversion of MCI to AD from the baseline MRI, PET, and CSF data. We hypothesize that the joint estimation or prediction of multiple regression and classification variables will perform better than estimating or predicting each variable separately, and that using multi-modal data (MRI, PET, and CSF) will yield better joint regression and classification performance than using single-modal data.

Section snippets

Method

The data used in the preparation of this paper were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (www.loni.ucla.edu/ADNI). The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies, and non-profit organizations, as a $60 million, 5-year public-private partnership. The primary goal of ADNI has been to test

Subjects

The ADNI general eligibility criteria are described at www.adni-info.org. Briefly, subjects are between 55 and 90 years of age and have a study partner able to provide an independent evaluation of functioning. Subjects taking specific psychoactive medications are excluded. General inclusion/exclusion criteria are as follows: 1) healthy subjects: MMSE scores between 24 and 30, a Clinical Dementia Rating (CDR) of 0, non-depressed, non-MCI, and non-demented; 2) MCI subjects: MMSE scores between 24 and 30, a

MRI, PET, and CSF

A detailed description of the acquisition of the MRI, PET, and CSF data from ADNI used in this paper can be found in (Zhang et al., 2011). Briefly, structural MR scans were acquired on 1.5T scanners. Raw Digital Imaging and Communications in Medicine (DICOM) MRI scans were downloaded from the public ADNI site (www.loni.ucla.edu/ADNI), reviewed for quality, and automatically corrected for spatial distortion caused by gradient nonlinearity and B1 field inhomogeneity. PET images were acquired 30–60 minutes

Image analysis

Image pre-processing is performed for all MR and PET images following the same procedures as in (Zhang et al., 2011). First, we perform anterior commissure (AC)–posterior commissure (PC) correction on all images and use the N3 algorithm (Sled et al., 1998) to correct intensity inhomogeneity. Next, we perform skull-stripping on the structural MR images using both the brain surface extractor (BSE) (Shattuck et al., 2001) and the brain extraction tool (BET) (Smith, 2002), followed by manual editing and intensity

Multi-modal multi-task (M3T) learning

A new learning method, namely multi-modal multi-task (M3T) learning, is presented here to simultaneously learn multiple tasks from multi-modal data. Fig. 1 illustrates the new learning problem with comparison to the existing standard Single-Modal Single-Task (SMST) learning, Multi-Task learning, and Multi-Modal learning. As can be seen from Fig. 1, in SMST and Multi-Task learning (Figs. 1(a–b)), each subject has only one modality of data represented as xi, while, in M3T and Multi-Modal learning
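
As a rough illustration of the multi-modal fusion idea, the sketch below combines one kernel per modality (MRI, PET, CSF) into a single weighted-sum kernel and feeds it to a precomputed-kernel SVM. The linear kernels, the fixed weights, and the data shapes are assumptions for illustration and do not reproduce the paper's exact multi-kernel formulation or weight search.

```python
# Minimal sketch of multi-modal fusion by a weighted sum of per-modality kernels.
import numpy as np
from sklearn.svm import SVC

def combined_kernel(mod_train, mod_test, betas):
    """mod_train / mod_test: lists of (n_train, d_m) / (n_test, d_m) arrays,
    one per modality. betas: nonnegative modality weights summing to 1.
    Returns the train-train and test-train combined kernel matrices."""
    K_tr = sum(b * (Xtr @ Xtr.T) for b, Xtr in zip(betas, mod_train))
    K_te = sum(b * (Xte @ Xtr.T) for b, Xtr, Xte in zip(betas, mod_train, mod_test))
    return K_tr, K_te

# Toy usage: random stand-ins for MRI, PET, and CSF features and AD vs. HC labels.
rng = np.random.default_rng(1)
train = [rng.standard_normal((100, d)) for d in (93, 93, 3)]
test = [rng.standard_normal((40, d)) for d in (93, 93, 3)]
y_train = rng.integers(0, 2, 100)

K_tr, K_te = combined_kernel(train, test, betas=(0.4, 0.4, 0.2))
clf = SVC(kernel="precomputed").fit(K_tr, y_train)
y_pred = clf.predict(K_te)
```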

Validation

To evaluate the performance of different methods, we perform two sets of experiments on the baseline MRI, PET, and CSF data of 186 ADNI subjects, i.e., 45 AD, 91 MCI (including 43 MCI converters (MCI-C) and 48 MCI non-converters (MCI-NC)), and 50 HC. In the first set of experiments (Experiment 1), we estimate two clinical variables (MMSE and ADAS-Cog) and one categorical variable (with a class label of ‘AD’, ‘MCI’ or ‘HC’) from the baseline brain data of all 186 subjects. It is worth noting that only the baseline data of MRI,
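
For reference, a minimal evaluation loop in the spirit of this validation could look as follows: k-fold cross-validation reporting the Pearson correlation between estimated and true clinical scores and the accuracy for the class label. The fold count, the stand-in models, and the synthetic data are assumptions for illustration, not the paper's exact protocol.

```python
# Hedged sketch of a cross-validated evaluation of one regression task
# (e.g., MMSE) and one classification task (e.g., AD vs. HC).
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import KFold
from sklearn.svm import SVR, SVC

def cross_validate(X, y_score, y_label, n_splits=10, seed=0):
    corrs, accs = [], []
    for tr, te in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        reg = SVR(kernel="linear").fit(X[tr], y_score[tr])
        clf = SVC(kernel="linear").fit(X[tr], y_label[tr])
        corrs.append(pearsonr(y_score[te], reg.predict(X[te]))[0])
        accs.append(np.mean(clf.predict(X[te]) == y_label[te]))
    return np.mean(corrs), np.mean(accs)

# Toy usage with random stand-ins for selected features, MMSE scores, and labels.
rng = np.random.default_rng(2)
X = rng.standard_normal((186, 50))
y_mmse = rng.normal(24, 3, 186)
y_label = rng.integers(0, 2, 186)
print(cross_validate(X, y_mmse, y_label))
```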

Experiment 1: estimating clinical stages (MMSE, ADAS-Cog, and class label)

We first estimate the clinical stages, including two regression variables (MMSE and ADAS-Cog) and one classification variable (i.e., class label with a value of ‘AD’, ‘MCI’ or ‘HC’), from the baseline MRI, PET, and CSF data. It is worth noting that the original multi-class classification problem is formulated as two binary classification problems, i.e., AD vs. HC and MCI vs. HC, as mentioned above. Table 2 shows the performances of the proposed M3T method, compared with three methods each using
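
The split of the three-class label into the two binary tasks mentioned above can be expressed as in the small sketch below; the string coding of the labels is assumed here purely for illustration.

```python
# Forming the AD vs. HC and MCI vs. HC binary tasks from three-class labels.
import numpy as np

def binary_task(labels, positive, negative):
    """Keep only subjects in the two named groups; return the mask and 0/1 targets."""
    mask = np.isin(labels, [positive, negative])
    y = (labels[mask] == positive).astype(int)
    return mask, y

labels = np.array(["AD", "MCI", "HC", "AD", "HC", "MCI"])
mask_ad_hc, y_ad_hc = binary_task(labels, "AD", "HC")      # AD vs. HC task
mask_mci_hc, y_mci_hc = binary_task(labels, "MCI", "HC")   # MCI vs. HC task
```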

Discussion

In this paper, we have proposed a new multi-modal multi-task (M3T) learning method with two successive steps, i.e., multi-task feature selection and a multi-modal support vector machine, to jointly predict multiple regression and classification variables from multi-modal data. Our proposed method has been validated on 186 baseline ADNI subjects through two different sets of experiments. In the first set of experiments, we tested its performance in jointly estimating the MMSE and ADAS-Cog

Conclusion

In summary, our experimental results have shown that the proposed multi-modal multi-task (M3T) method can effectively perform multi-task learning from multi-modal data. Specifically, it can effectively estimate the MMSE and ADAS-Cog scores and the classification label in both AD vs. HC and MCI vs. HC classification, and can also predict the 2-year MMSE and ADAS-Cog changes and the classification label in MCI-C vs. MCI-NC classification. To the best of our knowledge, it made the first

Acknowledgment

This work was supported in part by NIH grants EB006733, EB008374, EB009634 and MH088520, and also by National Science Foundation of China under grant No. 60875030.

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following:

References (55)

  • K. Franke et al. Estimating the age of healthy subjects from T1-weighted MRI scans using kernel methods: exploring the influence of various parameters. NeuroImage (2010)
  • C. Hinrichs et al. Predictive markers for AD in a multi-modality framework: an analysis of MCI progression in the ADNI population. NeuroImage (2011)
  • N. Kabani et al. A 3D atlas of the human brain. NeuroImage (1998)
  • C. Misra et al. Baseline and longitudinal patterns of brain atrophy in MCI patients, and their use in prediction of short-term conversion to AD: results from ADNI. NeuroImage (2009)
  • D.W. Shattuck et al. Magnetic resonance image tissue classification using a partial volume model. NeuroImage (2001)
  • C.M. Stonnington et al. Predicting clinical scores from magnetic resonance scans in Alzheimer's disease. NeuroImage (2010)
  • K.B. Walhovd et al. Multi-modal imaging predicts memory performance in normal aging and cognitive decline. Neurobiol. Aging (2010)
  • Y. Wang et al. High-dimensional pattern regression using machine learning: from medical images to continuous clinical variables. NeuroImage (2010)
  • D. Zhang et al. Multimodal classification of Alzheimer's disease and mild cognitive impairment. NeuroImage (2011)
  • A. Argyriou et al. Convex multi-task feature learning. Mach. Learn. (2008)
  • J. Bi et al. An improved multi-task learning approach with applications in medical diagnosis
  • F.H. Bouwman et al. Longitudinal changes of CSF biomarkers in memory clinic patients. Neurology (2007)
  • C.C. Chang et al. LIBSVM: a Library for Support Vector Machines (2001)
  • G. Chetelat et al. Mapping gray matter loss with voxel-based morphometry in mild cognitive impairment. NeuroReport (2002)
  • G. Chetelat et al. FDG-PET measurement is more accurate than neuropsychological assessments to predict global cognitive deterioration in patients with mild cognitive impairment. Neurocase (2005)
  • C. Davatzikos et al. Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification. Neurobiol. Aging (2010)
  • M.J. de Leon et al. Longitudinal CSF isoprostane and MRI atrophy in the progression to AD. J. Neurol. (2007)

    Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (www.loni.ucla.edu/ADNI). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: www.loni.ucla.edu/ADNI/Collaboration/ADNI_Authorship_list.pdf.
