A Novel Method to Estimate Long‐Term Chronological Changes From Fragmented Observations in Disease Progression
Abstract
Clinical observations of patients with chronic diseases are often restricted in terms of duration. Therefore, obtaining a quantitative and comprehensive understanding of the chronology of chronic diseases is challenging, because the patient's disease stage at the time of observation cannot be estimated precisely. We developed a novel method to reconstitute long‐term disease progression from temporally fragmented data by extending the nonlinear mixed‐effects model to incorporate the estimation of the “disease time” of each subject. Application of this method to sporadic Alzheimer's disease successfully depicted disease progression over 20 years. The covariate analysis revealed earlier onset of amyloid‐β accumulation in male and female apolipoprotein E ε4 homozygotes, whereas disease progression was remarkably slower in female ε3 homozygotes compared with female ε4 carriers and males. Simulation of a clinical trial suggests that recruiting patients on the basis of the precise disease time of each patient will decrease the sample size required for clinical trials.
Study Highlights
WHAT IS THE CURRENT KNOWLEDGE ON THE TOPIC?
☑ Clinical databases for chronic diseases, such as Alzheimer’s disease, rarely contain observations of a subject that span the whole course of disease progression, making it difficult to obtain a comprehensive understanding of the chronology of chronic diseases.
WHAT QUESTION DID THIS STUDY ADDRESS?
☑ This study addressed whether we can build an algorithm that can estimate the long‐term disease progression from a clinical database.
WHAT DOES THIS STUDY ADD TO OUR KNOWLEDGE?
☑ The proposed algorithm gives a framework that is able to reconstruct the long‐term disease progression from temporally restricted observations, including the distribution of intersubject and intrasubject variability and the evaluation of the effects of covariates. Sex and ApoE genotype, known factors that affect the disease progression of Alzheimer’s disease, were quantitatively analyzed with our model, and the difference in the disease progression by these covariates was determined.
HOW MIGHT THIS CHANGE CLINICAL PHARMACOLOGY OR TRANSLATIONAL SCIENCE?
☑ The proposed method enables simulation of clinical trials and individual diagnosis using Bayesian posterior estimation.
In chronic diseases that progress over decades, a precise understanding of the progression of the disease is crucial, both for decision making in clinical treatment and for developing new drugs. The molecular mechanisms contributing to disease exacerbation often switch in a manner depending on the stage of the disease1, 2; thus, the clinical effectiveness of a given therapeutic agent often depends on the disease stage. Therefore, accurate estimation of the disease stage in each patient is desirable for selecting optimal treatment. Likewise, successful development of new drugs will require specification of the disease stage at which the compound is assumed to be most effective, based on its pharmacological target, and recruitment of patients at the appropriate stage of progression into the relevant clinical trials. For those reasons, accurate estimation of the population distribution in disease progression is necessary. For example, the progression of Alzheimer’s disease (AD) is thought to span decades from the appearance of the first sign (i.e., the accumulation of amyloid‐β (Aβ) in the brain) to the onset of severe clinical symptoms.3, 4 The results of a clinical trial of anti‐Aβ antibodies, which are being developed as potential disease‐modifying drugs for AD, suggested that intervention at an earlier stage of the disease would be preferable.5, 6
Despite the demand for a quantitative description of long‐term disease progression, performing a cohort study over several decades is practically difficult. However, it is feasible to obtain fragmentary time profiles from numerous patients at various stages of disease progression, because relatively short‐term cohort studies have been performed for many chronic diseases, including AD.7-13 Because the intervals between observations within each subject are known, partial disease progression within individual subjects is observed. There have been some attempts to locate patients’ data along the disease stage and to estimate the entire time course simultaneously.14-21 However, no comprehensive framework has yet combined a statistical basis with a flexible description of the time course, including nonlinear evolution, intersubject variability, and the effects of covariates such as sex and genotype.
Therefore, we developed a novel method, termed “statistical restoration of fragmented time course (SReFT),” by extending the nonlinear mixed‐effects model (NLMM) (Figure S1). NLMM is a statistical framework suitable for describing longitudinal observations with repeated measures. It can assess intersubject variability and intrasubject variability separately, assess the effects of covariates such as the sex of subjects, and enable personalized diagnosis.22 However, NLMM requires time points for the data. To describe disease progression, NLMM requires the precise elapsed time from the onset of the disease to the point of observation for each subject, which we term the “disease time” of the subject. Therefore, in the present study, we extended the conventional NLMM to incorporate the estimation of the disease time of each subject's observation, based on maximum likelihood estimation. This resolves the lack of time‐point information for each patient's fragmented data and allows the entire time course of progression of a chronic disease to be reconstituted. Moreover, because we provide the mathematical foundation for this method, the significance of the covariate effects can be evaluated on the basis of statistical criteria, as in NLMM. Covariate analysis reveals important information, such as the key factors aggravating the chronic disease, and leads to the design of a personalized treatment strategy by stratifying patients by those factors. Practically, SReFT can be regarded as an extension of the previous studies,14-21 with additional features. Herein, we report the estimated long‐term disease progression of AD by applying SReFT to the Alzheimer's Disease Neuroimaging Initiative (ADNI) databases (Table 1).7
| Variable | Normal (N = 83) | MCI (N = 242) | AD (N = 112) | Total (N = 437) |
|---|---|---|---|---|
| Age, yᵃ | 76.2 ± 5.0 | 74.2 ± 6.9 | 74.2 ± 8.3 | 74.6 ± 7.0 |
| Female, No. (%) | 42 (50.6) | 93 (38.4) | 46 (41.1) | 181 (41.4) |
| Weight at baseline, kgᵃ | 74.7 ± 16.0 | 76.6 ± 14.8 | 72.4 ± 13.2 | 75.2 ± 14.7 |
| AD treatment, a/b (%)ᵇ | 3/59 (5.1) | 86/146 (59) | 19/20 (95) | 108/225 (48) |
| ApoE | | | | |
| ε2/ε3 | 4 | 10 | 1 | 15 |
| ε3/ε3 | 44 | 73 | 28 | 145 |
| ε3/ε4 | 31 | 120 | 56 | 207 |
| ε4/ε4 | 4 | 39 | 27 | 70 |
| Observation period, yᵃ | 2.9 ± 2.2 | 2.4 ± 1.8 | 1.6 ± 0.8 | 2.3 ± 1.8 |
| CDR‐SB, n (N)ᶜ | 83 (384) | 242 (1129) | 112 (395) | 437 (1908) |
| FDG–PET, n (N)ᶜ | 69 (172) | 182 (513) | 64 (176) | 315 (861) |
| Hippocampus, n (N)ᶜ | 81 (300) | 241 (942) | 112 (340) | 434 (1582) |
| Ventricle, n (N)ᶜ | 50 (198) | 140 (619) | 92 (303) | 282 (1120) |
| CSF Aβ, n (N)ᶜ | 83 (159) | 242 (434) | 112 (205) | 437 (798) |
| Amyloid‐PET, n (N)ᶜ | 54 (68) | 134 (173) | 25 (33) | 213 (274) |
| CSF tau, n (N)ᶜ | 81 (156) | 242 (435) | 111 (204) | 434 (795) |
- Aβ, amyloid‐β; AD, Alzheimer’s disease; ADNI, Alzheimer's Disease Neuroimaging Initiative; CDR‐SB, Clinical Dementia Rating Scale Sum of Boxes; CSF, cerebrospinal fluid; FDG, fluorodeoxyglucose; MCI, mild cognitive impairment; PET, positron emission tomography; SReFT, statistical restoration of fragmented time course.
- a Values represent means ± SDs.
- b a, number of subjects with AD treatment; b, number of subjects with information of treatment in the ADNI.
- c Number of subjects (n) and number of observations (N).
Furthermore, once the population distribution of the disease progression is obtained by analyzing a database of multiple subjects with SReFT, quantitative predictions can be made in clinical practice, such as individualized diagnosis by Bayesian a posteriori estimation for a new subject; clinical trials can also be simulated by the Monte–Carlo method. As an example of the application potential of our algorithm, we propose incorporating information on the disease time of the subject into the inclusion criteria of a clinical trial. We conducted a Monte–Carlo simulation of a clinical trial of a drug modeling anti‐Aβ treatment using the population distribution obtained from the analysis of ADNI. The results suggest that simulation‐aided design is useful in clinical trials.
Results
Concept of SReFT
The fitting of the “hyperparameters” (the parameters that define the distribution in the population, such as the means or variances of the parameters of the nonlinear function, or variances of the residual error) in SReFT is achieved by maximizing the marginal likelihood function, as in NLMM.23-25 In addition to hyperparameters, SReFT also estimates the disease time of the subject: the most likely time point of an observed fragment for each subject along the disease progression. During the iterative optimization, subjects are shifted back and forth along the temporal axis to maximize the likelihood function (Figure S1).
(1)
where f is the value of the biomarker, t is the elapsed time since disease onset, {a, b, c} is the parameter set obeying a multivariate normal distribution, and ε is the residual error obeying a normal distribution. Although this function has only three parameters, it offers good flexibility for describing monotonic changes in biomarkers, such as linear‐like changes, exponential‐like changes, and sigmoidal changes. In practice, this double‐exponential function was log transformed to convert the multiplicative error mentioned above into an additive error, and treated as an NLMM. The theoretical and algorithmic details are given in the Methods and Supplementary Text, Theoretical Note.
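As a rough illustration of the idea, and not the authors' implementation, the sketch below assumes a Gompertz‐type curve as one possible three‐parameter double‐exponential form. The exact form of Eq. 1, the parameter values, and the function names `log_curve` and `estimate_offset` are all assumptions. After log transformation the error is additive, so for a single subject with fixed population parameters, maximizing a normal likelihood over the time offset reduces to least squares on the log scale.

```python
import math

def log_curve(t, a, b, c):
    """Log of an assumed Gompertz-type curve f(t) = a * exp(-b * exp(-c * t))."""
    return math.log(a) - b * math.exp(-c * t)

def estimate_offset(rel_times, log_obs, params, grid):
    """Return the offset on `grid` that minimizes squared log-scale residuals.

    With additive normal error on the log scale and fixed parameters, this is
    the maximum likelihood estimate of the offset restricted to the grid.
    """
    def sse(offset):
        return sum((y - log_curve(offset + t, *params)) ** 2
                   for t, y in zip(rel_times, log_obs))
    return min(grid, key=sse)

# Hypothetical population parameters (a, b, c); not taken from the paper.
params = (10.0, 5.0, 0.2)
true_offset = 8.0
rel_times = [0.0, 1.0, 2.0, 3.0]   # years since first visit (known)
# Noise-free observations generated at the true (hidden) disease time.
log_obs = [log_curve(true_offset + t, *params) for t in rel_times]

grid = [i * 0.5 for i in range(0, 61)]   # candidate disease times, 0 to 30 years
estimated = estimate_offset(rel_times, log_obs, params, grid)
```

In the full method the offsets of all subjects and the hyperparameters are estimated together by maximizing the marginal likelihood; the grid search here only shows how a single fragment is shifted along the time axis.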
Demonstration of SReFT analysis with virtual data sets
We first assessed the ability of SReFT to restore lost temporal information using virtual data. The parameters of the virtual data, such as number of observations, number of subjects, number of biomarkers, changes in biomarker profiles, and sampling intervals, were determined by considering actual biomarker information from ADNI. We generated virtual data sets of 400 subjects from a specific hyperparameter set with three biomarkers, assuming disease progression over a period of 20–30 years. The data were generated with a model incorporating the effects of the covariates sex and genotype (Eq. s23), using the distribution parameters shown in Table S1. The same model was used in the following analysis of the ADNI data. The observation length within a subject was chosen randomly from the uniform distribution on [4, 5, 6] at 1‐year intervals (Figure 1a). The information about the disease times was then deleted from the data (Figure 1a). In the current analysis, disease time 0 was defined as the point at which the mean progression of biomarker 3 passes through a constant value. Analysis of these virtual data sets using SReFT successfully reconstituted the disease times, with the estimated parameter values closely approximating those of the original parameters (Figures 1b and S2a). The mean and SD of the difference between the estimated disease time and real disease time were 0.89 and 2.27 years, respectively. Figure 1b shows the estimated population‐mean curves from 100 rounds of simulations for each combination of the covariates. The estimated long‐term changes in the biomarkers almost coincided with the original time profile. The estimation accuracy of the disease time improved on that of GRACE, proposed by Donohue et al.14 (Figures 1c and S2b).
These Monte–Carlo simulations suggested that SReFT can estimate the hyperparameters with acceptable accuracy from data with N = 400 subjects, a size similar to that of the real ADNI data, and also confirmed that SReFT can estimate the disease times from temporally fragmented information.
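The fragmenting step of the virtual experiment can be sketched as follows. This is a minimal illustration: the linear biomarker curve, the range of first‐visit disease times, the noise level, and the function name `make_fragments` are hypothetical, and the observation length is interpreted here as the number of yearly visits. Only the number of subjects (400), the choice among 4, 5, and 6 at 1‐year intervals, and the removal of disease times follow the text.

```python
import random

def make_fragments(n_subjects=400, seed=0):
    """Generate per-subject fragments and hide the absolute disease time."""
    rng = random.Random(seed)
    fragments, hidden_start = [], []
    for _ in range(n_subjects):
        start = rng.uniform(0.0, 25.0)      # true disease time at first visit (assumed range)
        n_visits = rng.choice([4, 5, 6])    # observation length, 1-year intervals
        rel_times = list(range(n_visits))   # years since first visit: retained
        values = [0.1 * (start + t) + rng.gauss(0.0, 0.05)   # hypothetical biomarker
                  for t in rel_times]
        fragments.append({"rel_time": rel_times, "value": values})
        hidden_start.append(start)          # kept aside, not part of the analysis data
    return fragments, hidden_start

fragments, hidden_start = make_fragments()
```

The list `hidden_start` plays the role of the deleted disease times; comparing it with the estimates afterward gives the kind of error summary reported above.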

Application of SReFT to the ADNI data set
We applied SReFT to ADNI data comprising the following biomarkers: cerebrospinal fluid (CSF) Aβ (1–42) and amyloid–positron emission tomography (PET) imaging, which reflect Aβ accumulation in the brain; CSF tau, which reflects neurodegeneration levels; the volumes of the hippocampi and ventricles, measured by magnetic resonance imaging; the Clinical Dementia Rating Scale Sum of Boxes (CDR‐SB) score, which measures both the level of cognitive impairment and disability in activities of daily living; and fluorodeoxyglucose–PET, which reflects brain glucose consumption (Figure S3a). CSF Aβ has been reported to show a bimodal distribution in the ADNI database, with a threshold value of 192 pg/ml.26 Therefore, a decrease in CSF Aβ levels below this value can be regarded as the onset of disease progression, designated as disease time 0 in this study. The estimated AD progression is shown in Figure S3b.
Next, we examined the model that incorporates covariates. Several studies have suggested that multiple factors affect the progression of AD (e.g., the ε4 allele of the ApoE gene is one of the best‐known risk factors for AD).27 It has also been suggested that the risk conferred by the ε4 allele is greater in females than in males.28-30 Thus, we incorporated the effects of sex, the ApoE genotype, and the interaction between these factors on AD progression as the covariates. Details of the covariate model are described in Supplementary Text (Eqs. s23 and s24). The final estimated covariate effects, biomarker evolutions, and parameters are shown in Figure 2 and Tables 2 and 3. The summary statistics of the estimated disease times of subjects were as follows: mean, 13.39 years; SD, 5.72 years; minimum, −1.19 years; maximum, 27.21 years.

| Variable | Male ε3/ε3 | Male ε3/ε4 | Male ε4/ε4 | Female ε3/ε3 | Female ε3/ε4 | Female ε4/ε4 |
|---|---|---|---|---|---|---|
| Effects on baseline (dY) | ||||||
| CDR‐SB | NS | NS | NS | −0.43 | −0.17 | −0.43 |
| FDG–PET | NS | NS | NS | 0.55 | 0.35 | 0.16 |
| Hippocampus | NS | NS | NS | NS | NS | NS |
| Ventricle | NS | NS | −0.57 | −1.05 | −1.05 | −1.62 |
| CSF Aβ | NS | NS | −1.04 | 0.58 | 0.03 | −0.46 |
| Amyloid–PET | NS | NS | NS | NS | NS | NS |
| CSF tau | NS | NS | NS | NS | 0.37 | 0.59 |
| Effects on progression (dT) | ||||||
| Time | NS | NS | NS | −0.21 | 0.17 | 0.10 |
- The term dY is the effect on the baseline level of each biomarker value, and has the same dimension as the real value of each biomarker. The term dT is the effect showing the relative rate of the disease‐progression speed affecting all of the biomarkers. See Supplementary Text for details.
- Aβ, amyloid‐β; AD, Alzheimer’s disease; CDR‐SB, Clinical Dementia Rating Scale Sum of Boxes; CSF, cerebrospinal fluid; FDG, fluorodeoxyglucose; NS, not significant; PET, positron emission tomography; SReFT, statistical restoration of fragmented time course.
| Variable | Mean α | Mean β | Mean γ | Variance α | Variance β | Variance γ | Residuals | dY: Sex (f) | dY: ApoE ε3/ε4 | dY: ApoE ε4/ε4 | dY: Sex × ApoE (f × ε3/ε4) | dY: Sex × ApoE (f × ε4/ε4) | dT: Sex (f) | dT: ApoE ε3/ε4 | dT: ApoE ε4/ε4 | dT: Sex × ApoE (f × ε3/ε4) | dT: Sex × ApoE (f × ε4/ε4) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CDR‐SB | −1.043 | 0.013 | 0.177 | 0.320 | – | 0.000549 | 0.084 | −0.43 | 0 | 0 | 0.26 | 0 | −0.21 | 0 | 0 | 0.38 | 0.31 |
| FDG–PET | 0.751 | −0.014 | 0.149 | 0.446 | – | 0.000161 | 0.172 | 0.55 | 0 | 0 | −0.20 | −0.39 | |||||
| Hippocampus | 1.481 | −0.046 | 0.088 | 0.410 | – | 0.000300 | 0.019 | 0 | 0 | 0 | 0 | 0 | |||||
| Ventricles | −1.232 | 0.068 | 0.056 | 0.989 | – | 0.000486 | 0.004 | −1.05 | 0 | −0.57 | 0 | 0 | |||||
| CSF Aβ | 1.746 | −0.376 | −0.199 | – | 0.028 | – | 0.182 | 0.58 | 0 | −1.04 | −0.55 | 0 | |||||
| Amyloid–PET | −3.659 | 1.205 | −0.308 | 0.732 | – | 0.000001 | 0.075 | 0 | 0 | 0 | 0 | 0 | |||||
| CSF tau | −1.370 | 0.219 | −0.135 | 0.751 | – | 0.000601 | 0.090 | 0 | 0 | 0 | 0.37 | 0.59 | |||||
- Hyperparameters for the mean, variance of parameters, the variance of the residuals, and estimated values of the effects of covariates. The parameters α, β, and γ are defined in Eqs. s12‐s15. The mean α of CSF Aβ was fixed at 1.746, which is the log‐normalized value of 192 pg/ml. The term dY is the effect on the baseline level of each biomarker value, and has the same dimension as the real value of each biomarker. The term dT is the effect showing the relative rate of the disease‐progression speed affecting all of the biomarkers. See Supplementary Text for details (Eqs. s23 and s24).
- Aβ, amyloid‐β; ADNI, Alzheimer's Disease Neuroimaging Initiative; CDR‐SB, Clinical Dementia Rating Scale Sum of Boxes; CSF, cerebrospinal fluid; FDG, fluorodeoxyglucose; PET, positron emission tomography.
As for progression rate, there was no significant difference between male ε4 carriers and noncarriers; however, the disease progression rates of female ε4 heterozygotes and female ε4 homozygotes were 17% and 10% faster than that of males, respectively, whereas the progression of female ε4 noncarriers was 21% slower than that of males (Table 2). Thus, the disease progressed approximately 40% faster in female ε4 carriers than noncarriers.
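The relation between these percentages and the tabulated hyperparameters can be checked with simple arithmetic. The sketch below assumes that the group‐level effect on progression is the sum of the sex, ApoE, and interaction terms, and that 1 + dT is the rate relative to male subjects. Both readings are assumptions that happen to reproduce the tabulated values; the exact covariate model is the one given in Eqs. s23 and s24. The dictionary keys are plain‐text stand‐ins for the genotype labels.

```python
# dT terms for progression, copied from the first row of the hyperparameter table.
SEX_F = -0.21
APOE = {"e3/e3": 0.0, "e3/e4": 0.0, "e4/e4": 0.0}
SEX_X_APOE = {"e3/e3": 0.0, "e3/e4": 0.38, "e4/e4": 0.31}

def d_t(female, genotype):
    """Assumed additive combination of sex, ApoE, and interaction effects."""
    if not female:
        return APOE[genotype]
    return SEX_F + APOE[genotype] + SEX_X_APOE[genotype]

female_dt = {g: round(d_t(True, g), 2) for g in APOE}

# Rate of female carriers relative to female noncarriers, assuming rate = 1 + dT.
ratio_het = (1 + female_dt["e3/e4"]) / (1 + female_dt["e3/e3"])
ratio_hom = (1 + female_dt["e4/e4"]) / (1 + female_dt["e3/e3"])
print(female_dt, round(ratio_het, 2), round(ratio_hom, 2))
```

Under these assumptions the female dT values come out as −0.21, 0.17, and 0.10, and the carrier‐to‐noncarrier ratios are about 1.48 and 1.39, in line with the approximately 40% figure.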
SReFT enabled the estimation of parameters for each subject using Bayesian estimation as well as overall individual changes in biomarker profiles. Using these parameters, the subjects’ age at a particular disease time point could be estimated. Figure 2c shows the calculated age corresponding to Aβ accumulation onset (t = 0) and mild AD development (CDR‐SB = 4.5),31 stratified by sex and ApoE genotype. The average age at t = 0 was 5 years lower in ε4 homozygotes of both sexes. However, mild AD emerged at a much higher age in female ε3 homozygotes, probably because of slow disease progression. In addition, no significant difference in the age of conversion to mild AD was detected between female ε4 heterozygotes and homozygotes, despite the difference in age at t = 0, probably because the progression of female ε4 heterozygotes was estimated as slightly faster than that of female ε4 homozygotes (117% and 110% of males, respectively). The validity of the computations was confirmed by analyzing the residual plots (Figure S4).
These analyses of the ADNI data were conducted without incorporating the nondiagonal elements (correlation) of the variance–covariance matrix in the model, to avoid excessive computational time. However, we confirmed that omitting the correlations in the model does not considerably affect the result (Table S2).
Bootstrap analyses
To evaluate the robustness of the estimation with the original ADNI data set, we performed a bootstrap analysis with 100 randomly resampled data sets to confirm the reproducibility of the parameters estimated by the final covariate model. The mean biomarker changes estimated by the bootstrap analysis coincided well with the changes estimated from the original data set (Figure 3). The reproducibility of parameter values and disease times, as determined by the bootstrap analyses, is described in Figure S5. The estimated disease times of all subjects and most of the parameters were within the range of the mean ± SD of the bootstrap analyses. The exceptions included the covariates of the ApoE ε2 heterozygotes, probably because of the small number of ε2 heterozygotes in the ADNI data set. The other exceptions were some nonsignificant parameters whose original‐scale values were calculated from normalized values; the variability of these parameters contributed little to the actual biomarker evolution.
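A bootstrap of this kind can be sketched generically as follows. Here `fit` is a hypothetical stand‐in for a full SReFT fit (it only computes a mean), the toy data are made up, and resampling whole subjects with replacement is an assumption about how the 100 data sets were constructed.

```python
import random
import statistics

def bootstrap(subjects, fit, n_boot=100, seed=0):
    """Refit `fit` on n_boot data sets resampled by subject with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = [rng.choice(subjects) for _ in subjects]
        estimates.append(fit(resampled))
    return estimates

# Toy data: one hypothetical value per subject; a real fit would use all biomarkers.
subjects = [{"id": i, "value": float(i % 10)} for i in range(437)]

def fit(data):
    """Placeholder estimator: the mean of the per-subject values."""
    return statistics.fmean(s["value"] for s in data)

estimates = bootstrap(subjects, fit)
original = fit(subjects)
mean_b = statistics.fmean(estimates)
sd_b = statistics.stdev(estimates)
# Check used in the text: does the original estimate lie within mean +/- SD?
within_one_sd = abs(original - mean_b) <= sd_b
```

Resampling at the subject level keeps each subject's fragment intact, so the within‐subject time intervals are preserved in every resampled data set.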

Simulation‐aided design of clinical trials

Figure 4 Simulation‐aided design of clinical trial. … (lower panel), where t is the disease time at inclusion, and a is a parameter of the function defining the steepness of the slope (upper panel). (d) The estimated number of subjects necessary to detect the drug effect (N) after 2 years of clinical trial, calculated by a simulation of a clinical trial using the distribution of male subjects. Shown is the dependency of N on parameter a and parameter T, where T defines the interval of criterion in time. The disease time of a subject at the initial inclusion is restricted to [T, T + 4.67]. The yellow line indicates the T with the smallest N for various values of parameter a. (e) The dependency of the smallest number of subjects required on parameter a for male subjects. The solid blue line shows how N for the criterion in value depends on a using the CDR‐SB value. The dashed red line shows the smallest N for the criterion in time (the N value along the yellow line in d). (f) The dependency of N on parameter a and parameter T for female ε3 homozygotes. (g) The dependency of the smallest number of subjects required on parameter a for female ε3 homozygotes. (h) The dependency of the optimal disease time for the criterion on a. Parallel simulations of clinical trials were conducted for different groups of covariates using the hyperparameters obtained from the ADNI data, and the optimal disease times are plotted separately for each group. AD, Alzheimer’s disease; ADNI, Alzheimer’s Disease Neuroimaging Initiative; CDR‐SB, Clinical Dementia Rating Scale Sum of Boxes.
(2)
where Rj is the rate of disease progression of the jth subject after administration, t0j is the disease time at inclusion of the jth subject, and a is a parameter of the function defining the steepness of the slope, corresponding to the degree of dependency of the drug effect on disease time (Figure 4c). ξ is the interindividual difference in the drug effect, obeying a normal distribution, and σdrug is the SD of ξ. We used the value σdrug = 0.5. R increases monotonically from 0 to 1 with respect to disease time. Although the model of the drug effect remains purely theoretical and conceptual in this study, it can be refined empirically in a future study when data, such as those from completed preliminary clinical trials, become available.
Figure 4d illustrates the number of subjects required to detect a drug effect using the disease–time criterion for various combinations of the parameter a and the parameter T, which defines the start of the time window of the criterion. We confirmed that the value of σdrug has little effect on the results of the simulation. The simulation result using the distribution parameters for males is shown. When the drug effect depends only weakly on disease stage (a ~ 0), setting the time window to a late phase of disease progression resulted in a small number of subjects, because the average change in CDR‐SB is greater in the late phase. On the other hand, when the drug effect depends strongly on disease stage (a ~ 0.5), fewer subjects are necessary in an earlier phase, because the drug effect is small in the later phase of the disease. Thus, the optimal time for setting the time window of the criterion showed nonlinear dependency on a and T, as indicated by the yellow solid line on the heat map (Figure 4d,f). The simulation suggested that the conventional inclusion criterion, which restricts the value of a cognitive score, requires exponentially increasing numbers of subjects as the dependency of the drug effect on disease time increases. This would explain the past failure of clinical trials to detect efficacy of anti‐AD drugs.6, 32-35 On the other hand, even when the drug effect strongly depends on disease time, far fewer subjects are required to detect a drug effect using the disease–time criterion if subjects are included at an earlier stage. The necessary number of subjects can be confined to practical values by changing the inclusion criterion for the subjects’ disease times (Figure 4e‐h).
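The qualitative trade‐off described here can be illustrated with the textbook normal‐approximation sample‐size formula for comparing two means, n = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ² per arm. This is not the Monte–Carlo procedure used in the study, and the changes in CDR‐SB, the fraction of progression removed by the drug, and the SD below are hypothetical numbers chosen only to mimic the case of a strong dependency on disease time.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sd, alpha=0.05, power=0.8):
    """Subjects per arm to detect a mean difference `delta` (two-sided test)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * (z * sd / delta) ** 2)

sd = 2.0  # hypothetical SD of the 2-year change in CDR-SB

# Early window: small placebo change (0.5), but the drug removes 80% of it.
early = n_per_arm(delta=0.5 * 0.8, sd=sd)
# Late window: large placebo change (2.0), but the drug removes only 10% of it.
late = n_per_arm(delta=2.0 * 0.1, sd=sd)
```

With these made‐up numbers the early window needs fewer subjects than the late one, because the detectable difference Δ, not the raw change in the score, drives the required sample size.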
Discussion
Our proposed method, SReFT, extends the conventional NLMM25, 36 to handle data that lack a common time zero point across subjects. SReFT estimates both the disease time and the disease progression (mean and variances) simultaneously. Using the estimated variances, SReFT automatically draws more of the information about disease time from the more reliable biomarkers with smaller variances (Supplementary Text, Theoretical Note, 4). We selected CDR‐SB alone as the biomarker for cognitive impairment, and avoided using multiple cognitive scores, such as the Mini‐Mental State Examination and the Alzheimer's Disease Assessment Scale–cognitive subscale. This is because incorporating highly correlated biomarkers would bias the estimation when the covariance matrix contains only diagonal elements. SReFT analysis may yield proper results even with strong correlations among the biomarker profiles when ample computer resources are available, because we have shown that the theory remains valid when nondiagonal correlations are considered in the model (Supplementary Text, Theoretical Note, 5).
SReFT analysis indicated that the values of CSF Aβ and amyloid‐PET changed more steeply at first, whereas the values of CDR‐SB and fluorodeoxyglucose–PET were steady during the early stage but altered markedly during the later stage (Figure S3b). The volumes of the hippocampi and ventricles changed more evenly as the disease progressed. As for CSF tau, the mean value increased moderately as a function of time; however, the values varied widely across subjects, especially during the later stage. By using plots of the SD of the posterior distribution of the disease time, we found that the volume of the hippocampi was the most informative biomarker for the disease–time estimation of subjects across all disease stages (Figure S3c). These results are consistent with those of previous reports.3, 14
Because SReFT is a maximum likelihood estimator, covariate–model selection can be conducted on the basis of statistics, such as the likelihood ratio test. Moreover, our covariate model dissociates the effects on the progression rate and asymptotic level of each biomarker (Table 2). By using these features, we could quantify the influence of the ApoE genotype and sex on AD progression (Figure 2). The overall larger effects of the ApoE allele on disease progression in females than in males are consistent with previous studies.30 For example, Altmann et al. reported that the influence of carrying the ApoE ε4 allele on the risk of conversion to a diagnosis of AD was observed in both males and females, although the influence was smaller in males.30 We confirmed that the average age of onset of Aβ accumulation was approximately 5 years younger in ApoE ε4 homozygotes than in ApoE ε3/ε3 carriers or ApoE ε3/ε4 carriers for both sexes (Figure 2c). However, it is interesting that the genotype‐dependent difference in the progression rate in males was not significant in our analysis (Table 2). In the current analysis, covariates other than sex and ApoE genotype were not incorporated into the model because of the high computational cost of covariate modeling.37 Other potential covariates could be incorporated into the model if the problem of high computational cost is resolved. Overall, the present analysis provides more quantitative results regarding the effects of covariates than previous reports, and may help to understand the molecular mechanisms of AD and to develop novel therapeutic strategies.
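As an illustration of covariate–model selection by the likelihood ratio test, the following minimal sketch compares two nested models from their maximized log‐likelihoods. It assumes the usual chi‐square approximation for nested models; the function names and the log‐likelihood values are hypothetical and are not results from this study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half_x = x / 2.0
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(half_x))  # closed form for df = 1
    else:
        k, q = 2, math.exp(-half_x)             # closed form for df = 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half_x ** (k / 2.0) * math.exp(-half_x) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Return the test statistic and p-value for two nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(max(stat, 0.0), extra_params)


# Hypothetical log-likelihoods: the full model adds two covariate parameters.
stat, p_value = likelihood_ratio_test(-1000.0, -995.0, extra_params=2)
```

A small p‐value would favor retaining the additional covariate parameters in the model.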
The intersubject and intrasubject variances estimated separately by our method offer useful applications, such as individualized diagnosis via Bayesian inference or simulation of clinical trials. By Bayesian inference using the population disease progression, SReFT enables us to estimate the current disease time of a new subject and give a subject‐specific prediction of the disease progression. Subject‐specific estimation of the time elapsed since a specific event can be conducted for any type of data when the time course of multiple biomarkers in the population is available. For example, in compliance evaluation, SReFT could be used to estimate the dosing time by observing the drug concentration and other biomarkers that show time‐dependent change after the drug is taken. We demonstrated an example of a practical application in designing clinical trials. The typical inclusion criterion used in clinical trials is the setting of cutoff values for clinical observations. However, this conventional recruitment omits potentially useful hidden information about the subject (viz., the disease time of the subject), which can now be estimated using SReFT. Our simulation results indicated that the number of subjects necessary to detect a drug effect could be reduced markedly when the disease time is incorporated into the inclusion criteria. Simulations conducted separately for different covariates revealed large differences across the groups in the optimal disease time for subject recruitment and the number of subjects necessary (Figure 4h). Particularly for female ε4 noncarriers, the optimal disease time for recruitment is generally later, and the number of subjects necessary is larger than in other groups. Covariate‐specific predictions may therefore improve the optimization of trials.
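The idea of estimating the disease time of a new subject by Bayesian inference can be sketched with a posterior evaluated on a grid. This is a simplified illustration and not the SReFT implementation: the two logistic mean curves, the residual SDs, the flat prior, and all names are hypothetical, and intersubject random effects are ignored.

```python
import math


def mean_curves(t):
    """Hypothetical population mean trajectories vs. disease time (years)."""
    return {
        "marker_early": 1.0 / (1.0 + math.exp(-(t - 8.0))),   # changes earlier
        "marker_late": 1.0 / (1.0 + math.exp(-(t - 12.0))),   # changes later
    }


RESIDUAL_SD = {"marker_early": 0.1, "marker_late": 0.1}  # hypothetical


def disease_time_posterior(observed, grid):
    """Posterior over disease time on a grid, assuming a flat prior and
    independent Gaussian residuals for each observed biomarker."""
    log_post = []
    for t in grid:
        mu = mean_curves(t)
        log_post.append(sum(-0.5 * ((y - mu[name]) / RESIDUAL_SD[name]) ** 2
                            for name, y in observed.items()))
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]  # avoid underflow
    total = sum(weights)
    return [w / total for w in weights]


def posterior_mean_sd(grid, post):
    mean = sum(t * p for t, p in zip(grid, post))
    var = sum((t - mean) ** 2 * p for t, p in zip(grid, post))
    return mean, math.sqrt(var)


grid = [i * 0.1 for i in range(251)]   # disease time from 0 to 25 years
observed = mean_curves(10.0)           # noise-free observation at 10 years

post_both = disease_time_posterior(observed, grid)
mean_both, sd_both = posterior_mean_sd(grid, post_both)

# Using only one biomarker leaves the disease time much less certain.
post_one = disease_time_posterior(
    {"marker_early": observed["marker_early"]}, grid)
mean_one, sd_one = posterior_mean_sd(grid, post_one)
```

In this toy setting, the posterior SD of the disease time is smaller when both biomarkers are used than when only one is used, because a single saturating biomarker cannot distinguish between later disease times.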
Other than disease time, SReFT also provides the expected effect of the covariates and the parameters reflecting subject‐specific progression, which are also potentially useful as inclusion criteria. We proposed a possible application of our method and used a simple and abstract model to describe the drug effects. Quantitative knowledge gained from data, such as small preliminary clinical trials or past clinical trials, would increase the predictive power of the simulation. As we found in the simulation, a criterion based on the value of a clinical score may result in the inclusion of subjects at various disease stages. Thus, analyzing clinical trial data with SReFT may reveal novel knowledge about the disease progression, drug effect, and differences between groups. However, the reconstruction of the disease progression from clinical trial data would be restricted to a certain period, because such data usually do not cover the whole disease progression. A more reliable model that describes the entire progression of the disease can be obtained if a large database with a relatively wide range of disease populations (like ADNI), rather than data from a clinical trial, is analyzed. Recently, several consortia have been created for sharing such databases,8-13 and SReFT might be used to analyze those databases.
Methods
Data included in the analysis of AD
Subjects who participated in any of the ADNI studies (ADNI1, ADNIGO, or ADNI2)7 were selected, with the following exclusion criteria: subjects with
- no information about age, sex, or ApoE genotype;
- no CSF Aβ data;
- average CSF Aβ levels >192 pg/ml;26
- mild cognitive impairment (MCI) reversion to cognitively normal (CN) or dementia reversion to MCI or CN; and
- ε2/ε4 ApoE genotype.
As a result, data of 437 subjects in total (CN: 83 (stable CN, 68; CN‐to‐MCI converter, 12; CN‐to‐dementia converter, 3); MCI: 242 (stable MCI, 143; MCI‐to‐dementia converter, 99); dementia: 112) were collected.
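The exclusion criteria listed above can be expressed as a simple filter. The following is a minimal sketch; the `Subject` record and its field names are hypothetical and do not correspond to actual ADNI column names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

CSF_ABETA_CUTOFF = 192.0  # pg/ml; subjects with a higher average are excluded
SEVERITY = {"CN": 0, "MCI": 1, "dementia": 2}


@dataclass
class Subject:
    age: Optional[float]
    sex: Optional[str]
    apoe: Optional[str]                  # hypothetical coding, e.g. "e3/e4"
    mean_csf_abeta: Optional[float]      # pg/ml; None if never measured
    diagnoses: List[str] = field(default_factory=list)  # chronological order


def has_reversion(diagnoses):
    """True if any visit shows a less severe diagnosis than the previous one."""
    levels = [SEVERITY[d] for d in diagnoses]
    return any(later < earlier for earlier, later in zip(levels, levels[1:]))


def is_included(s):
    if s.age is None or s.sex is None or s.apoe is None:
        return False                     # missing age, sex, or ApoE genotype
    if s.mean_csf_abeta is None:
        return False                     # no CSF Abeta data
    if s.mean_csf_abeta > CSF_ABETA_CUTOFF:
        return False                     # average CSF Abeta above the cutoff
    if has_reversion(s.diagnoses):
        return False                     # MCI-to-CN or dementia-to-MCI/CN
    if s.apoe == "e2/e4":
        return False                     # e2/e4 genotype
    return True


# Hypothetical example records
candidates = [
    Subject(72.0, "F", "e3/e4", 150.0, ["MCI", "dementia"]),   # kept
    Subject(68.0, "M", "e3/e3", 210.0, ["CN", "CN"]),          # Abeta too high
    Subject(75.0, "F", "e4/e4", 140.0, ["MCI", "CN"]),         # reversion
    Subject(70.0, "M", "e2/e4", 160.0, ["MCI", "MCI"]),        # e2/e4
    Subject(None, "M", "e3/e4", 160.0, ["CN"]),                # missing age
]
included = [s for s in candidates if is_included(s)]
```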
Source codes
Two sets of source code are provided (https://tokudakeita@bitbucket.org/tokudakeita/sreft). One is open‐source MATLAB code, including sample code that demonstrates SReFT; it should be easy to adapt to a user's specific application area. The other is the bundle file for the “Numeric Analysis Program for Pharmacokinetics (Napp),”38 written in Objective‐C, which specifies the model and was used in the analysis of ADNI. This code can be compiled with Xcode by Apple and used as an add‐in to the coming version of Napp, on which SReFT can then be run. Please refer to Supplementary Text, Theoretical Note, for a description of the method.
Bootstrap analyses
We performed bootstrap analysis via Monte–Carlo resampling to confirm the reproducibility of the parameters estimated by the final covariate model. We drew 100 resamples, each with the same number of subjects as the ADNI data set (n = 437), by random resampling from the original collection of subjects,39 and then performed SReFT analysis on each resample with the final covariate model. Reproducibility of all estimated parameters was evaluated by comparing the parameter values estimated from the bootstrap data sets with those estimated from the original ADNI data set.
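The resampling scheme can be sketched as follows. This is a minimal illustration that assumes subjects are resampled with replacement; the sample mean of synthetic values stands in for the SReFT parameter estimation, and all names are hypothetical.

```python
import random
import statistics


def bootstrap_estimates(subjects, estimator, n_resamples=100, seed=0):
    """Re-estimate a parameter on subject-level bootstrap resamples.

    Each resample has the same number of subjects as the original data set
    and is drawn with replacement (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n = len(subjects)
    return [estimator(rng.choices(subjects, k=n)) for _ in range(n_resamples)]


# Synthetic stand-in data: one value per subject, n = 437 as in the analysis.
data_rng = random.Random(1)
subjects = [data_rng.gauss(0.0, 1.0) for _ in range(437)]

original_estimate = statistics.mean(subjects)
boot = bootstrap_estimates(subjects, statistics.mean)

# Compare the bootstrap distribution with the original estimate.
boot_mean = statistics.mean(boot)
boot_sd = statistics.stdev(boot)
```

In practice, the estimator would be the full model fit, and the comparison would be repeated for every estimated parameter.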
Acknowledgments
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). The ADNI was launched in 2003 as a public–private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment and early Alzheimer’s disease. For up‐to‐date information, see http://www.adni-info.org. We appreciate the valuable advice of Dr. Joga Gobburu (University of Maryland), especially on naming SReFT. We also thank Dr. Yaning Wang (FDA) for his suggestions. We are grateful to Mr. Kazutoshi Yokozuka and Mr. Hidefumi Kasai for their initial work in organizing the ADNI database.
Funding
Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI; National Institutes of Health Grant U01 AG024904) and Department of Defense (DOD) ADNI (DOD award number W81XWH‐12‐2‐0012). ADNI is funded by the National Institute on Aging, The National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol‐Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann‐La Roche Ltd. and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (http://www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. This work was supported by a grant Accelerating Regulatory Science Initiative from the Ministry of Health, Labour and Welfare of Japan.
Conflict of Interest
The authors declared no competing interests for this work.
Author Contributions
T. Ishida, K.T., A.H., M.H., and H.S. wrote the manuscript. T. Ishida, K.T., and A.H. designed the research. T. Ishida and K.T. performed the research. T. Ishida, K.T., A.H., M.H., S.K., H.T., T. Iwatsubo, T.M., and H.S. analyzed the data. A.H. and K.T. contributed new reagents/analytical tools.
Appendix 1:
Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.




