Volume 10, Issue 4 p. 350-361
Article
Open Access

Identification of high-dimensional omics-derived predictors for tumor growth dynamics using machine learning and pharmacometric modeling

Laura B. Zwep
Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
Mathematical Institute, Leiden University, Leiden, The Netherlands

Kevin L. W. Duisters
Mathematical Institute, Leiden University, Leiden, The Netherlands

Martijn Jansen
Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands

Tingjie Guo
Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
Department of Intensive Care Medicine, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

Jacqueline J. Meulman
Mathematical Institute, Leiden University, Leiden, The Netherlands

Parth J. Upadhyay
Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands

J. G. Coen van Hasselt (Corresponding Author)
Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands

Correspondence

J. G. Coen van Hasselt and Laura B. Zwep, Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands.

Emails: [email protected], [email protected]

First published: 01 April 2021

Funding information

No funding was received for this work.

Abstract

Pharmacometric modeling can capture tumor growth inhibition (TGI) dynamics and variability. These approaches do not usually consider covariates in high-dimensional settings, whereas high-dimensional molecular profiling technologies (“omics”) are being increasingly considered for prediction of anticancer drug treatment response. Machine learning (ML) approaches have been applied to identify high-dimensional omics predictors for treatment outcome. Here, we aimed to combine TGI modeling and ML approaches for two distinct aims: omics-based prediction of tumor growth profiles and identification of pathways associated with treatment response and resistance. We propose a two-step approach combining ML using least absolute shrinkage and selection operator (LASSO) regression with pharmacometric modeling. We demonstrate our workflow using a previously published dataset consisting of 4706 tumor growth profiles of patient-derived xenograft (PDX) models treated with a variety of mono- and combination regimens. Pharmacometric TGI models were fit to the tumor growth profiles. The obtained empirical Bayes estimates-derived TGI parameter values were regressed using the LASSO on high-dimensional genomic copy number variation data, which contained over 20,000 variables. The predictive model was able to decrease median prediction error by 4% as compared with a model without any genomic information. A total of 74 pathways were identified as related to treatment response or resistance development by LASSO, of which part was verified by literature. In conclusion, we demonstrate how the combined use of ML and pharmacometric modeling can be used to gain pharmacological understanding in genomic factors driving variation in treatment response.

Study Highlights

  • WHAT IS THE CURRENT KNOWLEDGE ON THE TOPIC?

Pharmacometric tumor growth inhibition models represent a well-established approach to quantify tumor growth dynamics and the effects of therapeutic agents. High-dimensional molecular profiling technologies (“omics”) provide relevant predictors of interindividual variation in tumor growth dynamics.

  • WHAT QUESTION DID THIS STUDY ADDRESS?

How can high-dimensional omics-based predictors be efficiently identified for tumor growth inhibition model parameters?

  • WHAT DOES THIS STUDY ADD TO OUR KNOWLEDGE?

A two-step approach combining tumor growth inhibition modeling and least absolute shrinkage and selection operator regression allows identification of pharmacologically predictive and mechanistically relevant omics-derived molecular predictors for variation in tumor growth dynamics.

  • HOW MIGHT THIS CHANGE DRUG DISCOVERY, DEVELOPMENT, AND/OR THERAPEUTICS?

The developed approach can be used to enable personalized omics-derived predictions of optimal treatments and dosing schedules, through its integration with tumor growth modeling. The integration with tumor growth modeling also allows identification of molecular predictors specifically associated with efficacy or treatment resistance.

INTRODUCTION

Pharmacometric modeling of tumor growth inhibition (TGI) dynamics is extensively used to model the longitudinal response of tumor size in response to drug treatment in preclinical animal models or patients. Pharmacometric TGI models have increasingly been used to characterize drug-exposure response relationships using semimechanistic parameters related to, for instance, direct treatment effects or resistance to personalize drug treatment.1, 2 Using TGI models, interindividual variation in tumor growth rate, treatment efficacy, and treatment resistance can be quantified and related to patient-specific characteristics.1, 3 In recent years, TGI models have been integrated with time-to-event models to predict clinical outcomes, such as overall survival, which allow prediction of clinical outcomes based on the patient-specific tumor growth dynamics parameters.4-6

The use of high-dimensional molecular profiling technologies, including next-generation sequencing, to develop personalized treatment schedules is rapidly developing. In oncology in particular, the use of “omics” technologies to characterize tumor-specific molecular differences to predict variation in treatment response is of great interest.7 Although both omics and TGI modeling are of relevance toward personalized treatment strategies, pharmacometric TGI models are not frequently directly applied to high-dimensional covariates. In pharmacometric modeling, stepwise covariate inclusion is still the most commonly used approach to include covariates, but it is unsuitable for testing covariates in a high-dimensional setting.

Current analyses of high-dimensional “omics” datasets predicting treatment response are mostly performed using machine learning (ML) methodologies, such as sparse regression models, random forests, and deep learning, to obtain predictive signatures of treatment response.8-10 The majority of studies with ML approaches are based on either dichotomous survival response or clinical response metrics, such as those based on the Response Evaluation Criteria in Solid Tumors (RECIST) system,11, 12 wherein the observed dynamic tumor disease progression profile is reduced to a limited number of categories. These simplified categorical treatment response metrics lack biological or pharmacological relevance, because factors such as resistance and direct treatment effects are merged.13

A commonly used ML method is the least absolute shrinkage and selection operator (LASSO), which is a linear regression method with $\ell_1$ regularization that can be used for high-dimensional analysis, and results in variable selection.14 Although ML approaches, such as sparse regression models using the LASSO,15-18 have been implemented in pharmacometric modeling, they are computationally expensive due to the combination of nonlinearity and estimation of random effects, which often leads to convergence problems. The implementations of the LASSO involve alternating algorithms, which alternate between estimating the random effects and the LASSO optimization, so although LASSO is rather efficient, iterating through multiple random effect estimation steps can severely reduce computational efficiency. This can lead to long computation times and poor convergence rates, especially in high-dimensional settings.

In this study, we propose a two-step approach combining ML, using LASSO regression, with pharmacometric modeling. We demonstrate our approach using a large dataset consisting of longitudinal tumor growth profiles of patient-derived xenograft (PDX) models treated with a variety of mono- and combination regimens.19 We develop pharmacometric tumor growth models quantifying intertumor variation in growth rates, drug effect, and resistance, after which we implement ML-based LASSO models to address the following aims: (1) to predict longitudinal tumor growth profiles based on omics-derived predictors using a multivariate LASSO model; and (2) to identify biological pathways associated with interindividual variation in treatment response or resistance development using a group LASSO regression model (Figure 1).

FIGURE 1. A schematic visualization of the proposed two-step approach. First, the tumor growth curves were modeled to obtain tumor growth parameter estimates. Second, the individual tumor growth parameter estimates were regressed on copy number variations (genomics) by different least absolute shrinkage and selection operator (LASSO) techniques. The group LASSO was applied to obtain biological pathways. The multivariate LASSO was applied to predict the tumor growth parameter values, which were then inserted into the tumor growth inhibition model equations to obtain predictions of the tumor growth curves

METHODS

Data

Data from a large-scale preclinical study in PDX mouse models were used.19 This dataset consisted of over 4000 PDX experiments, which were derived from a total of 277 patients, where multiple PDX experiments were derived from the same tumor. The PDX experiments from one tumor were all treated with different anticancer agents as mono treatment or combination treatment, or left untreated (i.e., natural growth experiments). There were a total of 62 unique treatments and, for every tumor treated, one PDX was left untreated, leading to an incomplete design with multiple PDX experiments per treatment. Tumor volume was measured daily. For each unique tumor, at the start of the treatment, genomic data based on gene copy number variations (CNVs) were obtained, yielding a total of 23,852 CNVs. We included data for 174 unique tumors and 55 unique treatments, corresponding with 3244 tumor-treatment combinations. This selection was based on availability of CNV data and adequate fit (see section below). The analysis was conducted separately for every treatment, so the number of observations differed per analysis, ranging from 17 to 171 observations (Table S1).

Tumor growth inhibition model

A TGI model was fitted to the longitudinal tumor volume measurements using the nonlinear regression modeling software NONMEM,20 with first order conditional estimation with interaction.4 The TGI model captured the longitudinal tumor volume measurements, per PDX, through estimation of three parameters: growth rate ($k_g$), treatment efficacy ($k_d$), and time-dependent resistance development ($k_r$) in an ordinary differential equation (Equation 1).
$$\frac{dV}{dt} = k_g \cdot V(t) - k_d \cdot e^{-k_r \cdot t} \cdot V(t) \qquad (1)$$
with tumor volume $V$ at time $t$ and tumor growth model parameters $k_g$, $k_d$, and $k_r$. Random effects with a log-normal distribution were added to all fixed effect TGI parameters as follows: $\theta_i = \theta \cdot e^{\eta_i}$.
To fit the TGI model, we first estimated an individual value for $k_g$ separately for every tumor using the untreated PDX data (Equation 2).
$$\frac{dV}{dt} = k_g \cdot V(t) \qquad (2)$$
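
To make the model structure concrete, the sketch below evaluates a tumor growth curve under the assumption that the drug effect decays exponentially with resistance, that is dV/dt = k_g·V − k_d·e^(−k_r·t)·V, a form that has a closed-form solution. The function name, the parameter values, and the baseline volume are illustrative assumptions and are not taken from the analysis, which was fitted in NONMEM.

```python
import math

def tgi_volume(t, v0, k_g, k_d, k_r):
    """Tumor volume at time t for dV/dt = (k_g - k_d * exp(-k_r * t)) * V.

    Integrating the net growth rate from 0 to t gives
        V(t) = v0 * exp(k_g * t - k_d * (1 - exp(-k_r * t)) / k_r).
    With k_r == 0 (no resistance) this reduces to v0 * exp((k_g - k_d) * t).
    """
    if k_r == 0.0:
        cumulative_kill = k_d * t
    else:
        cumulative_kill = k_d * (1.0 - math.exp(-k_r * t)) / k_r
    return v0 * math.exp(k_g * t - cumulative_kill)

# Hypothetical parameter values (per day) and baseline volume (mm^3).
days = list(range(0, 57, 7))
untreated = [tgi_volume(t, 200.0, 0.05, 0.0, 0.0) for t in days]
no_resistance = [tgi_volume(t, 200.0, 0.05, 0.08, 0.0) for t in days]
with_resistance = [tgi_volume(t, 200.0, 0.05, 0.08, 0.1) for t in days]
```

With these assumed values, the curve without resistance shrinks throughout, whereas the curve with resistance regrows once the drug effect has decayed.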

The empirical Bayes estimates (EBEs) of the growth rate $k_g$ were extracted and included as data in the TGI model. In NONMEM, EBEs are estimates of the posterior individual random effects ($\eta_i$), based on the empirically obtained prior distribution of $\eta$ and the individual data, as previously described.21 The residual error was modeled with both an additive and a proportional error.

We observed that not all tumor growth curves showed time-dependent resistance development (e.g., regrowth), so both a full TGI model and a reduced model, without a term for resistance, were fitted, effectively allowing the resistance parameter $k_r$ to become zero. For every PDX, a likelihood ratio test was conducted to evaluate whether inclusion of $k_r$ added significantly to the model fit (at significance level 0.05). A second criterion was added to only select the full model if $k_r$ was estimated to be smaller than 1.0, because the term $e^{-k_r \cdot t}$ goes to zero very fast with $t$ for larger $k_r$, effectively making the treatment efficacy parameter $k_d$ unidentifiable.
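
The two selection criteria can be sketched as follows. This is a minimal illustration that assumes the objective function values are −2 log-likelihoods and that the full model has exactly one extra parameter, so that the test statistic is compared with the chi-square critical value for one degree of freedom. The function and argument names are hypothetical.

```python
# 95th percentile of the chi-square distribution with 1 degree of freedom.
CHI2_CRIT_DF1_ALPHA05 = 3.841

def select_tgi_model(ofv_reduced, ofv_full, k_r_full, k_r_max=1.0):
    """Return 'full' if the resistance term is retained, else 'reduced'.

    ofv_reduced, ofv_full: objective function values (-2 log-likelihood)
    of the model without and with the resistance term.
    k_r_full: resistance estimate from the full model.
    """
    lrt_statistic = ofv_reduced - ofv_full
    significant = lrt_statistic > CHI2_CRIT_DF1_ALPHA05
    below_cutoff = k_r_full < k_r_max
    return "full" if (significant and below_cutoff) else "reduced"

# Hypothetical objective function values for three experiments.
choices = [
    select_tgi_model(100.0, 95.0, 0.2),  # significant, small k_r
    select_tgi_model(100.0, 98.0, 0.2),  # not significant
    select_tgi_model(100.0, 90.0, 1.5),  # significant, but k_r too large
]
```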

To evaluate the model fit separately for each treatment, we plotted the conditional weighted residuals per treatment, which represent the goodness of fit for the TGI models.22 Due to the large number of tumor growth curves, treatments with curves with a bad model fit were removed from further analysis. The EBEs of treatment efficacy $k_d$ and resistance development $k_r$ were extracted for the treatments with good model fit.

Tumor growth profile prediction

The multivariate outcome, consisting of the log-transformed treatment efficacy $k_d$, the log-transformed resistance parameter $k_r$, and the growth rate $k_g$, was regressed on the genomic CNV data within every treatment using a multivariate LASSO (Equation 3).23 The multivariate LASSO, similarly to the standard LASSO, minimizes a loss function to estimate the linear parameters $B$. The difference from the standard LASSO is mainly due to the outcome $Y$ and the parameter $B$, which are, in this case, both matrices containing a column for every outcome. The penalty term is the root of the sum of squares over the vector $\beta_j$, the row of $B$ holding the coefficients of predictor $j$ for all outcomes; $X$ denotes the $n \times p$ matrix of CNV predictors.
$$\hat{B} = \underset{B}{\arg\min}\; \frac{1}{2n} \lVert Y - XB \rVert_F^2 + \lambda \sum_{j=1}^{p} \lVert \beta_j \rVert_2 \qquad (3)$$
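
Because the penalty acts on all coefficients of one predictor jointly, a CNV is either kept for every outcome or dropped for every outcome. The sketch below illustrates this with the group soft-thresholding operator (the proximal operator of the unsquared Euclidean norm), which many solvers for this type of penalty rely on. It is an illustration under that assumption, not the code of the software used in the study, and the coefficient values are hypothetical.

```python
import math

def l2_norm(vec):
    """Euclidean norm of a list of numbers."""
    return math.sqrt(sum(v * v for v in vec))

def multivariate_lasso_penalty(B, lam):
    """lam times the sum over predictors of the l2 norm of each row of B."""
    return lam * sum(l2_norm(row) for row in B)

def shrink_row(row, threshold):
    """Group soft-thresholding: shrink a whole coefficient row toward zero.

    Rows whose norm does not exceed the threshold are set to zero entirely,
    which removes the predictor from all outcomes at once.
    """
    norm = l2_norm(row)
    if norm <= threshold:
        return [0.0] * len(row)
    scale = 1.0 - threshold / norm
    return [scale * v for v in row]

# Hypothetical coefficients: rows are CNVs, columns are TGI parameters.
B = [[0.30, -0.40], [0.02, 0.01], [0.00, 0.00]]
shrunk = [shrink_row(row, 0.1) for row in B]
```

In this toy example the first CNV is shrunk but retained for both outcomes, while the second is removed from both.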

The LASSO hyperparameter $\lambda$, which determines the size and number of non-zero parameters, was chosen through 10-fold cross-validation, to identify the $\lambda$ that minimized the prediction error in terms of mean squared error. This minimizing $\lambda$ differed per treatment. The treatments where the minimizing $\lambda$ did not outperform the null model, which estimated no non-zero coefficients for the CNVs, were removed from further analysis, both in prediction of the tumor growth curves and the pathway selection. In a second analysis, only the log-transformed $k_d$ and the log-transformed $k_r$ were regressed on the CNV data. Prediction errors were evaluated both on the scale of the predicted parameter values and on the scale of the predicted tumor growth curves.
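
The selection of the penalty by cross-validation, and the comparison with a null model, can be sketched on a toy problem. The example assumes a single centered predictor without intercept, for which the LASSO has a closed-form solution. The simulated data, the grid of penalty values, and all function names are illustrative assumptions.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def fit_lasso_1d(x, y, lam):
    """Closed-form minimizer of (1/2n) * sum((y - b*x)^2) + lam * |b|."""
    n = len(x)
    xy = sum(a * b for a, b in zip(x, y)) / n
    xx = sum(a * a for a in x) / n
    return soft_threshold(xy, lam) / xx

def cv_mse(x, y, lam, k=10, seed=0):
    """Mean squared prediction error over k cross-validation folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    squared_error = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        b = fit_lasso_1d([x[i] for i in train], [y[i] for i in train], lam)
        squared_error += sum((y[i] - b * x[i]) ** 2 for i in held_out)
    return squared_error / len(x)

def select_lambda(x, y, lambdas, k=10, seed=0):
    """Return the penalty with the lowest cross-validated error and all scores."""
    scores = {lam: cv_mse(x, y, lam, k, seed) for lam in lambdas}
    return min(scores, key=scores.get), scores

# Simulated data: one roughly standardized predictor with a true effect of 0.5.
rng = random.Random(1)
x = [(i - 49.5) / 28.87 for i in range(100)]
y = [0.5 * xi + rng.gauss(0.0, 0.5) for xi in x]

# The largest penalty forces the coefficient to zero and acts as the null model.
lambdas = [0.0, 0.05, 0.1, 0.5, 10.0]
best_lambda, scores = select_lambda(x, y, lambdas)
improves_on_null = scores[best_lambda] < scores[10.0]
```

Changing the `seed` argument produces a different split into folds, which is one way to repeat the cross-validation over several splits.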

The individual TGI parameter values predicted from the LASSO were extracted. The ordinary differential equation (Equation 1) was solved for these predicted parameter values to bring the predictions back on the longitudinal tumor volume scale. For robustness, the cross-validation step was repeated 20 times over different cross-validation splits and the predicted curves were averaged over the 20 repetitions.

A measure of prediction error was defined on tumor curve scale through comparing the curves from the estimated parameters from the TGI model to the curves with the predicted parameters from LASSO. The prediction error was defined as the absolute fraction of the area between the predicted and the estimated curves (ABC) over the area under the estimated curve (AUC), called the scaled ABC (sABC; Equation 4).
$$\mathrm{sABC}_i = \frac{\int_0^T \left| V_i^{\mathrm{IPRED}}(t) - V_i^{\mathrm{LASSO}}(t) \right| dt}{\int_0^T V_i^{\mathrm{IPRED}}(t)\, dt} \qquad (4)$$
for individual $i$ with volume $V_i^{\mathrm{IPRED}}(t)$ estimated from the TGI model fit (IPRED) and volume $V_i^{\mathrm{LASSO}}(t)$ predicted from the multivariate LASSO. The area is considered until some cutoff $T$, which in our study was set to 56 days (2 months). The sABC was used because it is a one-dimensional and interpretable error measure. The sABC metric allowed for the comparison of the two functions produced by the TGI model fit and the LASSO parameter value prediction. The sABC of the LASSO with CNVs was compared with the sABC of the null model, to see whether the CNVs added predictive power.
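
The error measure can be approximated numerically on a time grid. The sketch below uses the trapezoidal rule, which is an assumption about the numerical method and not necessarily how the areas were computed in the study. The two exponential curves are hypothetical examples.

```python
import math

def trapezoid(values, times):
    """Trapezoidal approximation of the integral of values over times."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )

def scaled_abc(times, v_estimated, v_predicted):
    """Area between the two curves divided by the area under the estimated curve."""
    abs_diff = [abs(e - p) for e, p in zip(v_estimated, v_predicted)]
    return trapezoid(abs_diff, times) / trapezoid(v_estimated, times)

# Hypothetical curves evaluated from day 0 to day 56 in half-day steps.
times = [0.5 * i for i in range(113)]
estimated = [100.0 * math.exp(0.02 * t) for t in times]
predicted = [100.0 * math.exp(0.03 * t) for t in times]
error = scaled_abc(times, estimated, predicted)
```

Identical curves give an error of 0, and a predicted curve that is twice the estimated curve at every time point gives an error of 1. A fine time grid is advisable, because the trapezoidal rule is less accurate where the two curves cross between grid points.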

Pathway selection

To gain biological insight beyond the selection of individual genes contributing to the predictive performance of treatment efficacy and time-dependent resistance development, the log-transformed $k_d$ and $k_r$ were separately regressed on the CNVs through pathway analysis using overlapping group LASSO.24, 25 The overlapping group LASSO uses a combination of the LASSO and the $\ell_2$ norm, the square root of the sum of squares of the coefficients, which is also used for ridge regression,26 to select variables on a group level (Equation 5). Each of the $G$ groups contains a set of indices $\mathcal{I}_g$, including all parameter indices of the $\beta$s in group $g$. The size of the group is denoted as $|\mathcal{I}_g|$, which is used to scale the penalty to account for the different group sizes.
$$\hat{\beta} = \underset{\beta}{\arg\min}\; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \sum_{g=1}^{G} \sqrt{|\mathcal{I}_g|}\, \lVert \beta_{\mathcal{I}_g} \rVert_2 \qquad (5)$$

The groups were defined as the pathways from the WikiPathways ontology, which contains a comprehensive overview of biological pathways and processes.27-29 A total of 5998 CNVs was grouped to one or more pathways.
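
The group-level penalty and the handling of CNVs that belong to several pathways can be illustrated as follows. The penalty function computes a size-weighted sum of group norms for given index sets. The expansion function shows the idea behind the latent group lasso, in which a predictor shared by several groups receives one copy per group. Whether this matches the internals of the package used here is an assumption, and the pathway names, indices, and coefficients are hypothetical.

```python
import math

def group_lasso_penalty(beta, groups, lam):
    """lam * sum over groups of sqrt(group size) * l2 norm of the group's coefficients."""
    total = 0.0
    for indices in groups.values():
        norm = math.sqrt(sum(beta[j] ** 2 for j in indices))
        total += math.sqrt(len(indices)) * norm
    return lam * total

def expand_overlaps(groups):
    """List (pathway, cnv_index) pairs, one latent copy per pathway membership."""
    return [(name, j) for name, indices in groups.items() for j in indices]

# Hypothetical pathways: CNV 1 belongs to both pathways.
groups = {"pathway_A": [0, 1], "pathway_B": [1, 2, 3]}
beta = [3.0, 4.0, 0.0, 0.0]

penalty = group_lasso_penalty(beta, groups, lam=1.0)
latent_columns = expand_overlaps(groups)
```

Four CNVs with five pathway memberships lead to five latent columns, so that each pathway can be selected or removed as a whole.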

Again, 10-fold cross-validation was used to identify the $\lambda$, which determined how many pathways were selected. Although it combines $\ell_1$ and $\ell_2$ penalties, the group LASSO has only one hyperparameter.24 Subsequently, part of the discovered correlations between pathways and treatment response was researched in literature for validation. This analysis was conducted in R30 (version 3.6.3) using the library grpregOverlap (https://github.com/YaohuiZeng/grpregOverlap).

Code availability

All scripts and models used for the analysis are available on github (https://github.com/vanhasseltlab/PDX).

RESULTS

Tumor growth inhibition model development

The TGI model was fitted to the PDX tumor growth curves, separately for every treatment. For three treatments, the model did not converge; these were left out of the analysis. The model fit was evaluated through the conditional weighted residuals (Figure S1) and the visual inspection of the PDX fits (Figure S2). The visual inspection showed that the tumor dynamics for treatment TAS266 were not captured. Combination therapy LFW527 and binimetinib showed skewed residual distributions. These two treatments were excluded from further analysis. The model fit for the other treatments was sufficient.

All individual parameter estimates (EBEs) were extracted from the TGI model (Figure 2a). Figure 2b shows how the values of the parameter estimates affect the curve. The percentage of PDX experiments with non-zero time-dependent resistance development was 12.6%. The TGI model for the chosen treatments showed sufficient fits for the subsequent parameter value prediction step.

FIGURE 2. Results of the tumor growth inhibition (TGI) model estimation. (a) The distributions of the individual, estimated TGI parameters. (b) Selected tumor growth profiles showing how treatment efficacy ($k_d$) and time-dependent resistance development ($k_r$) vary for different treatments, with from left to right a very ineffective treatment, a slightly effective treatment, a very effective treatment, and a very effective treatment with time-dependent resistance development

The effect of shrinkage of the individual prediction values ($\eta$), often referred to as eta-shrinkage, was evaluated in Figure S2. The fit of the individual growth curves was not influenced by shrinkage. Because the tumor volumes were densely sampled over time, with an average of 0.3 samples per day, 50 days follow-up time, and 14 measurements per experiment, we did not expect problems with shrinkage.

Prediction of tumor growth profiles using genomic predictors

The estimated individual TGI parameters $k_g$, log-transformed $k_d$, and log-transformed $k_r$ were simultaneously predicted by the multivariate LASSO. The prediction errors for $k_g$, $k_d$, and $k_r$ were calculated as root mean square error (RMSE). Although the variation between the treatments was high (Figure S3a), overall, the RMSE was high. For $k_g$, the RMSE was 0.035, compared with a mean estimated $k_g$ of 0.0564. For both $k_d$ and $k_r$, the RMSEs of 0.044 and 0.049, respectively, were actually higher than the mean estimated $k_d$ (0.033) and $k_r$ (0.0564), indicating a bad prediction of the tumor growth dynamics from CNVs. A multivariate LASSO with only the log-transformed $k_d$ and log-transformed $k_r$ was also fitted. These two LASSO models were compared based on the prediction error of the log-transformed $k_d$ and log-transformed $k_r$ and the sABC error measure (Figure S3). The model that did not predict $k_g$ seemed to fit better, especially in the case of the combination therapies BYL719 and cetuximab, and BKM120 and LJC049, and was used for subsequent analysis.

Of 52 treatments, 33 treatments were detected with a better prediction than the null model, based on the average MSE over the cross-validation replications. For the other treatments, the predictive ability was not improved by adding CNVs as predictors to the LASSO regression. The log $k_d$ and log $k_r$ were transformed back to their original scale and the parameters were used to solve the ordinary differential equation (Equation 1) from the model.

The predictive performance of the LASSO for predicting the TGI parameter values was evaluated by comparing the curves from the predicted estimates to the curves from the TGI model fit, because the estimated curves were already shown to fit the data well. The predictions and estimations are functions rather than scalar measures, so the scaled area between the curves was calculated as the error. The overall median sABC was 0.456, meaning that the area between the predicted and estimated curves was less than half the area under the estimated curve (Figure 3a). The sABC distributions for the different treatments are shown in Figure 3b. A lower sABC indicates a lower prediction error. Of the curves, 23.6% had an sABC below 0.2, so the difference between the curves was less than 20% of the AUC of the estimated curve. The treatment LFA102 had a median sABC of only 0.153, indicating a good prediction. The worst predictions were for the treatment LGH447, with a median sABC of 0.867. Compared with the null model, the LASSO reduced the sABC by a median of 3.8%. This shows low predictive ability of the CNVs to predict tumor growth curves.

FIGURE 3. Predicted curves from the multivariate least absolute shrinkage and selection operator. (a) Tumor growth curves visualized with the area under the estimated curve (orange) and between the estimated and predicted curves (grey) and the error (in scaled area between the predicted and the estimated curves [sABC]). From left to right, the panels range from a very good prediction to a very poor prediction. (b) The distributions of the individual patient-derived xenograft sABCs for the different treatments, given by the interquartile range. Outliers are not included in the plot

Identification of pathways associated with treatment efficacy and resistance

The TGI parameter values of treatment efficacy ($k_d$) and resistance development ($k_r$) were regressed on CNVs grouped in pathways using the overlapping group LASSO. The group LASSO selected the pathways with predictive power for the 33 treatments where predictiveness was shown in the curve prediction step. Out of the 472 pathways from WikiPathways,29 71 different pathways were selected for one or more of 19 different treatments, with a total of 118 detected pathway-treatment response correlations (Figures 4, 5). The pathways were specifically correlated to either treatment efficacy or resistance development. More pathways were identified for $k_d$ than $k_r$, due to smaller variation in $k_r$.

FIGURE 4. Selected pathways obtained by the group least absolute shrinkage and selection operator for the treatment efficacy ($k_d$ = black), time-dependent resistance development ($k_r$ = orange) or both (blue) over the different treatments. The distribution of pathways found for different treatments (top)

FIGURE 5. Overview of overlapping pathways between the different treatments. The nodes are the treatments (white background) and pathways (orange background) and the edges indicate which tumor growth inhibition parameter links the pathway to the treatment

For paclitaxel, trastuzumab, encorafenib, and figitumumab, the US Food and Drug Administration (FDA) approved drugs administered as monotherapy, we compared identified pathways with literature reports to evaluate their biological validity. We identified 14 pathways for these 4 drugs, of which 9 could be confirmed in literature (Table 1), confirming that previously described mechanisms were detected by our method.

TABLE 1. Pathway-treatment correlations found in literature
Treatment Pathway Pathway description Response type Literature Relation
Paclitaxel WP2290 RalA downstream regulated genes urn:x-wiley:21638306:media:psp412603:psp412603-math-0084 Ganapathy et al. (2016)31 Paclitaxel is a mitotic inhibitor by stabilizing the microtubule and RalA has been previously shown to disrupt microtubule formation and inducing mitotic catastrophe.
Trastuzumab WP311 Synthesis and degradation of ketone bodies urn:x-wiley:21638306:media:psp412603:psp412603-math-0085 Jobard et al. (2017)32 Ketone production was shown to be increased with effective trastuzumab treatment.
WP4146 Macrophage markers urn:x-wiley:21638306:media:psp412603:psp412603-math-0086 Shi et al. (2015)33 Trastuzumab interacts with Fcγ receptors on macrophages for the killing of HER2 cancer cells
WP4225 Pyrimidine metabolism and related diseases urn:x-wiley:21638306:media:psp412603:psp412603-math-0087 Ghosh et al. (2009),34 Liu et al. (2019)35 The pyrimidine metabolism pathway has been found in previous studies to correlate with drug response to Trastuzumab, based on pathway enrichment analysis in transcriptomics and metabolomics studies
WP4191 Caloric restriction and aging urn:x-wiley:21638306:media:psp412603:psp412603-math-0088 Chappell et al. (2011)36 There is a connection between Raf/MEK inhibitors and aging
WP4186 Somatroph axis (GH) and its relationship to dietary restriction and aging urn:x-wiley:21638306:media:psp412603:psp412603-math-0089 Chappell et al. (2011)36 There is a connection between Raf/MEK inhibitors and aging
Encorafenib WP3595 mir−124 predicted interactions with cell cycle and differentiation urn:x-wiley:21638306:media:psp412603:psp412603-math-0090 Ross et al. (2018)37 Resistance to Encorafenib has been shown to be correlated to cell cycle and differentiation
WP4269 Ethanol metabolism resulting in production of ROS by CYP2E1 urn:x-wiley:21638306:media:psp412603:psp412603-math-0091 Friedlander et al. (2019)38 Ethanol metabolism resulting in production of ROS by CYP2E1 was found to have a connection to the development of melanoma, thus might be related to drug efficacy of encorafenib in melanoma treatment.
WP4495 IL−10 Anti-inflammatory Signaling Pathway urn:x-wiley:21638306:media:psp412603:psp412603-math-0092 Szczepaniak Sloane et al. (2017)39 , Sumimoto et al. (2006)40 IL-10 has been researched in the context of overexpression of Raf in patients with cancer, showing that IL-10 is an immunosuppressive factor that is decreased by MEK inhibitors
  • Scientific literature indicating previous findings on the pathways correlated to treatment efficacy and resistance, for the treatments with paclitaxel, trastuzumab, and encorafenib.
  • Abbreviation: ROS, reactive oxygen species.

DISCUSSION

In order to utilize high-dimensional omics data to further advance treatment response prediction and understanding, we developed a two-step approach combining an ML method with pharmacometric modeling.

We showed how CNVs can contribute to prediction of variability in tumor growth dynamics. This approach establishes a practical framework that could support personalized treatment selection or even dose optimization. Although we applied our approach to preclinical PDX data, TGI models have been widely used to model clinical tumor size measurements, and we expect that our approach can also be applied to such clinical tumor growth data. Pharmacometric models, including TGI models, are typically based on ordinary differential equations (ODEs), which is why we formulated our model as ODEs rather than as an analytical expression. Importantly, the use of a TGI model enables further integration with either clinical outcome prediction models41 or pharmacokinetic-pharmacodynamic (PK-PD) models for TGI, which can be used to refine dosing regimens to optimally suppress tumor growth.
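As a minimal sketch of what formulating a TGI model as an ODE looks like in practice, the Python snippet below integrates a generic single-equation model numerically and compares the result with the corresponding closed-form solution. The model structure (exponential growth with rate `kg`, a drug effect `kd` that fades at rate `kr`) and all parameter values are assumptions chosen for illustration; they are not taken from the models fitted in this study.

```python
import math

def tgi_rhs(t, v, kg, kd, kr):
    """dV/dt for an assumed TGI model: exponential growth minus a drug
    effect whose strength decays over time (a simple resistance term)."""
    return (kg - kd * math.exp(-kr * t)) * v

def rk4(rhs, v0, t_end, n_steps, *params):
    """Integrate dV/dt = rhs(t, V, *params) from t = 0 with classical RK4."""
    h = t_end / n_steps
    t, v = 0.0, v0
    for _ in range(n_steps):
        k1 = rhs(t, v, *params)
        k2 = rhs(t + h / 2, v + h * k1 / 2, *params)
        k3 = rhs(t + h / 2, v + h * k2 / 2, *params)
        k4 = rhs(t + h, v + h * k3, *params)
        v += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return v

def tgi_closed_form(t, v0, kg, kd, kr):
    """Analytical solution of the same ODE, used here only as a check."""
    return v0 * math.exp(kg * t - kd / kr * (1.0 - math.exp(-kr * t)))

# Hypothetical values: baseline volume (mm^3) and rate constants (1/day).
V0, KG, KD, KR = 200.0, 0.05, 0.10, 0.04

# Predict tumor volume at day 56 both ways.
v_ode = rk4(tgi_rhs, V0, 56.0, 560, KG, KD, KR)
v_exact = tgi_closed_form(56.0, V0, KG, KD, KR)
print(f"ODE: {v_ode:.3f}  closed form: {v_exact:.3f}")
```

For this simple structure the two agree closely, but the ODE form is easier to extend, for example by letting the drug effect depend on a PK-predicted concentration, in which case a closed-form solution is typically no longer available.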

In this study, we set a cutoff of 56 days to evaluate the ability to back-predict tumor growth profiles; however, the predictions can also be extrapolated over a longer time span, depending on the nature of the available omics data or on specific disease or treatment characteristics. In terms of the sABC, however, the CNVs did not substantially improve predictive ability compared with a null model. This was already visible in the large prediction errors for the tumor growth parameter values. We evaluated predictive ability rather than model fit in order to obtain more generalizable results. A large proportion of variance can often be explained by omics data, but the high-dimensional nature of the data makes it hard or impossible to distinguish between noise and structural differences.

To identify biological factors predictive of either treatment efficacy or resistance development, we used a group LASSO, grouping individual gene-associated CNVs into known biological pathways. We used the WikiPathways ontology to define the pathway groups, although other pathway databases can be used in a similar fashion. The pathway group LASSO yields a set of pathways predictive of two outcomes: treatment efficacy and time-dependent resistance development. Of 14 identified pathways predictive of treatment efficacy and resistance development, 9 were confirmed by a literature search. This indicates that omics pathway analysis for dynamic tumor growth responses could be a useful tool for validating pathway associations with factors responsible for treatment response, as well as for discovering new correlations with pathways. Such a pathway-oriented approach has been proposed previously, but not in the context of TGI or pharmacometric modeling.42
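The group-wise selection behavior described above can be illustrated with the block shrinkage step that underlies group LASSO solvers. The sketch below is not the implementation used in this study: the pathway names, gene indices, and coefficient values are hypothetical, the usual weighting of the penalty by group size is omitted, and genes belonging to several pathways are not handled.

```python
import math

def group_soft_threshold(beta, groups, lam):
    """Shrink each group's coefficient vector by its Euclidean norm.
    A group whose norm is <= lam is set to zero as a whole."""
    out = [0.0] * len(beta)
    for idx in groups.values():
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * beta[i]
    return out

def selected_groups(beta, groups):
    """Names of groups that keep at least one nonzero coefficient."""
    return [name for name, idx in groups.items()
            if any(beta[i] != 0.0 for i in idx)]

# Hypothetical unpenalized effects of four genes in two made-up pathways.
beta_ols = [3.0, 4.0, 0.3, 0.4]
pathways = {"pathway_A": [0, 1], "pathway_B": [2, 3]}

beta_grp = group_soft_threshold(beta_ols, pathways, lam=2.5)
print(beta_grp, selected_groups(beta_grp, pathways))
```

Because the penalty acts on the norm of each group, all genes in a pathway enter or leave the model together, which is what makes the selected set directly interpretable at the pathway level.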

In this study, we used two versions of LASSO regression for two distinct aims: variable selection and prediction. We chose the LASSO over other ML approaches because of its intrinsic variable selection property.14 Variable selection in high-dimensional data is not well accommodated by many algorithms, whereas the LASSO inherently shrinks the coefficients of noise variables to zero. The LASSO can achieve high sensitivity but can suffer from low specificity; this, however, is not considered much of a problem in exploratory analyses.
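To make the shrinkage-to-zero property concrete, the standard LASSO objective and its closed-form solution in a simplified special case are given below. The notation (response y, predictor matrix X with n rows, penalty weight λ) is generic and introduced here for illustration; the orthonormal-predictor case is a textbook simplification that does not hold for correlated CNV data.

```latex
\[
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda\,\lVert \beta \rVert_1
\]

% Special case: orthonormal predictors, X^\top X / n = I
\[
\hat{\beta}_j \;=\;
  \operatorname{sign}\!\bigl(\hat{\beta}_j^{\mathrm{OLS}}\bigr)\,
  \max\!\bigl(\lvert \hat{\beta}_j^{\mathrm{OLS}} \rvert - \lambda,\; 0\bigr)
\]
```

In this special case, any predictor whose least-squares coefficient is smaller in magnitude than λ receives a coefficient of exactly zero, whereas larger coefficients are retained but shrunk by λ.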

The use of the group LASSO allows for direct pathway selection based on omics data, which is computationally efficient and interpretable.42 The variable selection performance of the LASSO has been investigated previously and has been shown to be competitive with, or better than, that of other methods.43-45

The multivariate LASSO was used to simultaneously predict the three model parameter values. A limitation of the multivariate LASSO used in this study is that it does not take into account the dependence between the outcome variables, whereas the tumor growth model parameters are expected to be correlated. A second limitation was shown by the comparison between the predictions with and without adding one of the model parameters to the multivariate outcome: prediction of one parameter can restrict the prediction of another parameter. We expect this problem can be overcome by better modeling of the joint and marginal distributions of the multivariate outcome.
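Both limitations can be read from one common formulation of the multivariate LASSO, shown below as a sketch. The exact objective used in this study is not stated in this section, so the row-wise grouped penalty is an assumption, as is the notation: Y is the n × q matrix of outcomes (here q = 3 parameters), X the n × p predictor matrix, and B the p × q coefficient matrix with rows B_j.

```latex
\[
\hat{B} \;=\; \arg\min_{B}\;
  \frac{1}{2n}\,\lVert Y - XB \rVert_F^2
  \;+\; \lambda \sum_{j=1}^{p} \lVert B_{j\cdot} \rVert_2
\]

% A covariance-aware alternative replaces the Frobenius loss with
\[
\frac{1}{2n}\,\operatorname{tr}\!\bigl[(Y - XB)\,\Omega\,(Y - XB)^{\top}\bigr],
\qquad \Omega = \Sigma^{-1}
\]
```

The Frobenius loss is a sum of separate squared-error terms per outcome, which corresponds to treating the residuals of the outcomes as uncorrelated (Ω = I). Under the grouped penalty, a predictor is either retained for all outcomes or dropped for all of them, which is one way in which the prediction of one parameter can constrain that of another.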

The LASSO has previously been implemented in pharmacometric nonlinear mixed effect models.16-18 These direct implementations have the advantage of informing the LASSO directly within the longitudinal modeling. Models with a very high number of variables, however, become computationally demanding. To our knowledge, these LASSO implementations have not been successfully applied to very high-dimensional data, where the number of variables (p) is an order of magnitude larger than the number of observations (n), because of either convergence problems or exploding computation times. The two-step method, in contrast, is more dependent on the fit of the first model and the accuracy of the EBEs. Our method is nevertheless more feasible in high dimensions, because the estimation of the complex longitudinal model and the selection of high-dimensional predictors are separated.
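The separation of the two steps can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the workflow used in this study: step 1 uses a per-subject log-linear fit in place of EBEs from a nonlinear mixed effect model, step 2 uses a bare-bones coordinate descent LASSO, and the data (four subjects, three centered covariates, noise-free growth) are made up so that the result is easy to follow.

```python
import math

def growth_rate(times, volumes):
    """Step 1 (stand-in for EBEs): least-squares slope of log(volume) vs. time."""
    logs = [math.log(v) for v in volumes]
    t_mean, l_mean = sum(times) / len(times), sum(logs) / len(logs)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Step 2: LASSO by cyclic coordinate descent.
    Assumes the columns of X are centered and non-constant,
    so the intercept is simply the mean of y."""
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            # Correlation of column j with the residual, adding back its own fit.
            rho = sum(c * r for c, r in zip(col, resid)) / n + z * beta[j]
            new = soft_threshold(rho, lam) / z
            resid = [r - c * (new - beta[j]) for r, c in zip(resid, col)]
            beta[j] = new
    return intercept, beta

# Hypothetical covariates (rows = subjects); only the first one affects growth.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
times = [0.0, 7.0, 14.0, 21.0]
true_rates = [0.05 + 0.02 * row[0] for row in X]
profiles = [[200.0 * math.exp(g * t) for t in times] for g in true_rates]

rates = [growth_rate(times, prof) for prof in profiles]   # step 1
intercept, beta = lasso_cd(X, rates, lam=0.005)           # step 2
print(intercept, beta)
```

The longitudinal model is fitted once, and the second step only sees one estimated value per subject and parameter, so the cost of the longitudinal fit does not grow with the number of predictors. The price is that any error in the step 1 estimates is passed on to step 2 unchanged.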

The two-step approach can directly use other ML algorithms besides the LASSO. Algorithms such as random forests and gradient boosting are able to capture nonlinearity more easily and can be used to improve model prediction accuracy. There is still a challenge in modeling multiple outcomes at the same time, such as the three TGI model parameters in our study, but multivariate outcome extensions have been made in other high-dimensional methods, such as random forests,46 which can also be used to predict tumor growth parameter values, as in the second step of our approach.

In summary, we demonstrated how combining machine learning and pharmacometric modeling can be used to gain pharmacological understanding of factors driving variation in treatment response, and to enable omics-based personalized treatment regimens.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS

L.B.Z., K.L.W.D., T.G., J.J.M., P.J.U., and J.G.C.H. wrote the manuscript. J.G.C.H. and L.B.Z. designed the research. L.B.Z., K.L.W.D., M.J., and J.G.C.H. performed the research. L.B.Z., K.L.W.D., and M.J. analyzed the data.