Volume 113, Issue 6 p. 1217-1222
Review
Open Access

The Aetion Coalition to Advance Real-World Evidence through Randomized Controlled Trial Emulation Initiative: Oncology

David Merola (Corresponding Author), Aetion, Inc., New York, New York, USA
Correspondence: David Merola ([email protected])
Ulka Campbell, Aetion, Inc., New York, New York, USA
Nileesa Gautam, Aetion, Inc., New York, New York, USA
Alexa Rubens, Aetion, Inc., New York, New York, USA
Sebastian Schneeweiss, Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
Shirley V. Wang, Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
Gillis Carrigan, Center for Observational Research, Amgen, San Francisco, California, USA
Victoria Chia, Center for Observational Research, Amgen, San Francisco, California, USA
Osayi E. Ovbiosa, AbbVie, North Chicago, Illinois, USA
Simone Pinheiro, AbbVie, North Chicago, Illinois, USA
Amanda Bruno, Bayer Pharmaceuticals, Philadelphia, Pennsylvania, USA
Xiaolong Jiao, Bayer Pharmaceuticals, Philadelphia, Pennsylvania, USA
Mark Stewart, Friends of Cancer Research, Washington, District of Columbia, USA
Rachele Hendricks-Sturrup, Duke-Margolis Center for Health Policy, Washington, District of Columbia, USA
Carla Rodriguez-Watson, Reagan-Udall Foundation for the Food and Drug Administration, Washington, District of Columbia, USA
Sajan Khosla, AstraZeneca, Gaithersburg, Maryland, USA
Yiduo Zhang, AstraZeneca, Gaithersburg, Maryland, USA
Mothaffar Rimawi, Baylor College of Medicine, Houston, Texas, USA
Jenny Huang, Gilead Sciences, Foster City, California, USA
Aliki Taylor, Gilead Sciences, Foster City, California, USA
Lauren Becnel, Pfizer, New York City, New York, USA
Lynn McRoy, Pfizer, New York City, New York, USA
Joy Eckert, Reagan-Udall Foundation for the Food and Drug Administration, Washington, District of Columbia, USA
Benjamin Taylor, Aetion, Inc., New York, New York, USA
First published: 21 November 2022

Abstract

Legislative and technological advancements over the past decade have given rise to the proliferation of healthcare data generated from routine clinical practice, often referred to as real-world data (RWD). These data have piqued the interest of healthcare stakeholders due to their potential utility in generating evidence to support clinical and regulatory decision making. In the oncology setting, studies leveraging RWD offer distinct advantages that are complementary to randomized controlled trials (RCTs). They also permit the conduct of investigations that may not be possible through prospective designs due to ethics or feasibility. Despite its promise, the use of RWD for the generation of clinical evidence remains controversial due to concerns of unmeasured confounding and other sources of bias that must be carefully addressed in the study design and analysis. To facilitate a better understanding of when RWD can provide reliable conclusions on drug effectiveness, we seek to conduct 10 RWD-based studies that emulate RCTs in oncology using a systematic, protocol-driven approach described herein. Results of this investigation will help inform clinical, scientific, and regulatory stakeholders on the applications of RWD in the context of product labeling expansion, drug safety, and comparative effectiveness in oncology.

Legislative and technological advancements over the past decade have given rise to the proliferation of healthcare data generated from routine clinical practice, often referred to as real-world data (RWD).1-3 These data, which include registries, administrative claims, and electronic health record (EHR) databases, have piqued the interest of healthcare stakeholders due to their potential utility in generating evidence to support clinical and regulatory decision making (i.e., real-world evidence (RWE)).4-7

In the context of oncology, RWE studies offer distinct advantages that are complementary to randomized controlled trials (RCTs). RWD often cover large populations with rare malignancies or disease subtypes, and enable investigation of patients who may be ineligible for RCTs (e.g., those with poor performance status or multiple cancer diagnoses). They also permit the conduct of investigations that may not be possible through prospective designs due to ethics or feasibility. Evidence generation from RWD is generally more rapid and less resource-intensive than RCTs, and, as the quality and availability of routinely collected healthcare data have greatly improved over the past decade, so too has the application of these data in answering an array of clinical, regulatory, and economic questions.8

Despite its potential, the use of RWE, particularly when assessing treatment effectiveness, remains controversial.5 In addition to confounding from lack of randomized treatment assignment, RWE studies are susceptible to other sources of bias that must be carefully addressed in the study design and analysis, such as immortal time, differential surveillance, or misclassification.9-11 These studies may also face unique challenges pertaining to the origins and quality of the data.8, 10 For example, EHR data in oncology frequently have variables that are not systematically captured or have poor continuity resulting in missingness and limited visibility of care across the healthcare continuum.12-14 Although no data sources are perfect, addressing limitations with modern epidemiological and statistical methods15-17 is essential to high-quality studies in oncology.

To provide a clear standard for designing RWD studies that yield valid and generalizable conclusions about safety and effectiveness, researchers are encouraged to follow a target trial17 approach, in which the RWD study emulates the design of the hypothetical RCT that would be conducted if it were feasible. Extending this guidance, investigators have evaluated the potential of RWD studies to provide the same evidence as RCTs by emulating completed or in-progress trials.6, 7, 18 To date, only some RWE studies in oncology have demonstrated findings that were consistent with their analogous randomized counterparts.19-26 Although a promising start, more of these studies are needed to better understand the treatment settings and end points in which RWE is most likely to render valid conclusions.

Herein, we describe a framework for systematically emulating trials in oncology and comparing RWE findings to their RCT counterparts. This framework, adapted from prior emulation studies,7, 18 includes protocol pre-registration, feasibility checkpoints, power analyses, and propensity score-based methods for confounding control. The purpose of this work is to evaluate whether RWE studies can provide reliable conclusions on treatment effectiveness and thereby inform further applications of RWD in pharmaceutical product label expansion, postmarketing safety, and other purposes that are complementary to RCTs. We intend to conduct 10 studies in oncology using a systematic, protocol-driven approach to facilitate a better understanding of the utility of RWE in providing reliable conclusions on drug effectiveness. Of the 10 studies, some may be multiple emulations of a single RCT using different databases.

METHODS

Steering committee

A multistakeholder committee has been established for the purpose of informing and overseeing the design and execution of this project. The committee is composed of voting members, including representatives of policy organizations and academic institutions, as well as non-voting members from pharmaceutical manufacturers, who will collectively contribute scientific, regulatory, and clinical expertise throughout the project. Although the expertise of steering committee members may be sought throughout the conduct of this study, all final decisions on the study design, execution, and interpretation of findings will be the responsibility of the direct study team.

Randomized controlled trial selection

Our aim is to complete a total of 10 RWE studies, noting that data feasibility may limit the trials that can be emulated. Candidate RCTs will be identified from trials that led to a US Food and Drug Administration (FDA) drug approval during 2015–2020. Such trials are likely to be of high quality and clinical relevance, and thus provide the highest standard for evaluating the robustness of RWE studies for regulatory decision making. Furthermore, recent trials are more likely to be captured within contemporary candidate data sources. Trials with an active comparator arm are preferred because emulation of "placebo" treatments in pharmacoepidemiologic studies can often result in intractable confounding due to disease severity.27 Placebo-controlled trials may be included if the placebo was used with an active medication (e.g., placebo plus fulvestrant). Physicians' choice as a comparator will also be acceptable if the choice of medications is specified in the trial protocol.

To support sufficient sample size in our emulation studies, trials involving common tumor types will be preferred, particularly metastatic breast cancer, colorectal cancer, and non-small cell lung cancer. CenterWatch28 and FDA New Drug Approvals, centralized repositories containing clinical trial information, will be queried to identify trials. For each candidate trial, the definitions of the eligibility criteria, exposure/comparator, and outcome will be extracted and reviewed by the study team to preliminarily assess the feasibility of replication in RWD sources. Following the data feasibility steps described below, a final list of RCTs to be emulated will be established and reported along with a Consolidated Standards of Reporting Trials (CONSORT)-style diagram. Some RCTs may be emulated more than once using different data sources to understand the extent to which different data sources are fit-for-purpose. If the investigators determine that no fit-for-purpose database is available or that analyses would not be feasible (e.g., due to insufficient power), the emulation will not be carried out and the reasons will be documented in the protocol. However, once feasibility and at least one viable data source have been established, the emulation will proceed and results will be reported.

Data feasibility

Once a candidate RCT has been selected, its emulation will proceed through several phases. First, the study eligibility criteria, exposure, outcome, and potential confounders will be operationally defined. The data requirements (Table S1) for emulating those operationalized definitions in RWD sources will then be identified and ranked in terms of uniqueness. This will enable us to efficiently filter candidate data sources and identify those to be selected for a more detailed feasibility assessment. Administrative claims cannot be used, as these data do not contain clinical information vital to emulation of oncology trials (tumor stage, performance status, etc.). Therefore, our feasibility assessments will be focused on EHR-based data sources or registries, which are more likely to meet our minimum data requirements. Because practice patterns and standards of care may differ by country, this study will focus on data derived from patients in the United States only. Data reliability, including quality and provenance, will also be considered in this step, as this can vary greatly among EHR data sources. Based on the findings of the detailed feasibility assessment, one or more data sources deemed fit-for-purpose will be accessed for exploration, such as confirmation of variable completeness and cohort size.

Data explorations will be carried out iteratively to inform the development of the study design and analysis plan. At this stage, no analysis connecting the treatment and outcome(s) of interest will be performed. A HARmonized protocol template29 will be completed for every emulation study describing these explorations. First, an analysis unstratified by the exposure group will be conducted to examine the sample size, distribution of outcome events, follow-up time, and censoring reasons in the entire study population to explore how the potential study population aligns with that of the RCT. The distribution of patient characteristics will be compared between treatment groups using a c-statistic and absolute value of the standardized difference30 before and after matching on basic covariates (e.g., age and sex) and all prognosticators of the outcome (i.e., potential confounding variables).30 Variables selected for inclusion in the propensity score model will be based on substantive knowledge of the study team and members of the steering committee. During feasibility analyses, investigators’ knowledge of the outcome distribution by treatment group may bias decisions pertaining to study design. Therefore, where necessary (e.g., examination of censoring reasons), a dummy outcome will be used to eliminate this risk. Last, a CONSORT-style diagram31 will be created for each individual RWE study, displaying the impact of each eligibility criterion on cohort attrition.
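The covariate balance diagnostic described above can be illustrated with a short sketch. For a continuous covariate, the absolute standardized difference is the absolute difference in group means divided by the pooled standard deviation; values above roughly 0.1 are commonly flagged as imbalanced. The covariate name and values below are hypothetical and purely illustrative, not drawn from any study data.

```python
import math

def abs_std_diff(x_treated, x_control):
    """Absolute standardized difference for a continuous covariate:
    |mean_t - mean_c| / sqrt((var_t + var_c) / 2)."""
    def mean(v):
        return sum(v) / len(v)
    def var(v):
        m = mean(v)
        return sum((xi - m) ** 2 for xi in v) / (len(v) - 1)
    pooled_sd = math.sqrt((var(x_treated) + var(x_control)) / 2)
    return abs(mean(x_treated) - mean(x_control)) / pooled_sd

# Hypothetical ages in each treatment arm before matching
age_treated = [62, 67, 71, 58, 64, 69]
age_control = [55, 60, 52, 58, 61, 57]
d = abs_std_diff(age_treated, age_control)
# A common rule of thumb flags imbalance when d > 0.1
print(f"absolute standardized difference: {d:.2f}")
```

In practice this computation would be repeated for every covariate, before and after propensity score matching, to assess whether matching achieved acceptable balance.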

Statistical power considerations

Statistical power for each emulation cohort study will be computed in two stages. In the initial stage, a power assessment will be conducted in a 1:1 propensity score matched population to determine whether the RWE study is powered to detect a point estimate similar to that observed in the RCT being emulated for the outcome of interest. Patients will be selected based on the diagnosis of interest and adapted RCT eligibility criteria prior to matching. Because many unmatched patients may be discarded in a 1:1 matched population, we expect the power analysis to provide conservative estimates. Only basic covariates (e.g., age and sex) will be used for matching in this initial phase. The final stage will consist of a similar power analysis in a 1:1 matched population, but with greater emphasis on the ability to achieve covariate balance. This iterative, two-stage approach will allow us to isolate the root cause of a feasibility problem, whether it be a lack of outcome events or an inability to achieve a good match. Ultimately, this will help inform the study design (e.g., by directing us to adjust the propensity score model or perhaps relax some eligibility criteria). Relaxing eligibility criteria to improve precision of the RWE study estimates may result in a cohort that is less representative of the RCT population; therefore, such actions will be noted in the protocol and in the interpretation of results if the study is carried out. If more than one database is chosen, all feasibility, power, and statistical analyses will be carried out separately within each database and, if appropriate, pooled together with meta-analysis methods.
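One common way to approximate power for a time-to-event comparison in a 1:1 matched cohort is Schoenfeld's formula for the log-rank test, which depends only on the expected number of outcome events, the allocation ratio, and the target hazard ratio. The sketch below is a simplified illustration under those assumptions; it is not the project's actual power-calculation procedure, and the inputs shown are hypothetical.

```python
import math
from statistics import NormalDist

def logrank_power(n_events, hazard_ratio, alloc=0.5, alpha=0.05):
    """Approximate power of a two-sided log-rank test via Schoenfeld's
    formula, given the expected number of outcome events. In a 1:1
    matched cohort the allocation proportion is alloc = 0.5."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Effect size grows with sqrt(events) and the log hazard ratio
    effect = abs(math.log(hazard_ratio)) * math.sqrt(n_events * alloc * (1 - alloc))
    return z.cdf(effect - z_alpha)

# Hypothetical inputs: 400 expected events, target HR of 0.75 from the RCT
print(f"power: {logrank_power(400, 0.75):.2f}")
```

Inverting the same formula gives the number of events required for a target power, which is useful when deciding whether eligibility criteria must be relaxed to accrue enough outcomes.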

Missing data

During the feasibility analysis, the number of missing values (if present) will be described for each variable in the treatment-stratified patient characteristics table. The potential reasons underlying missing data will be discussed among study team members and subsequently classified into one of three theoretical mechanisms of missingness—missing completely at random, missing at random, or missing not at random.32 These mechanisms will be documented in the study protocol and guide our approach to addressing the missing values.
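A minimal sketch of the per-variable missingness summary described above, assuming missing values are recorded as `None`; the variable names and patient records are hypothetical. The resulting counts would feed the study team's discussion of the likely missingness mechanism (MCAR, MAR, or MNAR) for each variable.

```python
def missingness_table(records, variables):
    """Return {variable: (n_missing, pct_missing)} for each listed
    variable, treating None as a missing value."""
    n = len(records)
    table = {}
    for var in variables:
        n_missing = sum(1 for r in records if r.get(var) is None)
        table[var] = (n_missing, 100.0 * n_missing / n)
    return table

# Hypothetical patient records (None marks a missing value)
patients = [
    {"stage": "IV", "ecog": 1},
    {"stage": None, "ecog": 0},
    {"stage": "III", "ecog": None},
    {"stage": "IV", "ecog": None},
]
print(missingness_table(patients, ["stage", "ecog"]))
```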

Determination of feasibility

Based on all aforementioned feasibility analyses, as well as the collective subject matter knowledge and expertise of the steering committee and study team, the investigators and steering committee will make a final decision on whether to proceed with the emulation. If the decision is to proceed, any concerns regarding the RWE study's validity will be noted in the protocol prior to its execution.

Protocol development, registration, and execution

When a candidate trial and fit-for-purpose data source have been chosen, a protocol will be drafted describing the following elements: a brief description of the randomized trial's study design elements, all feasibility analyses (including post-matching balance assessment), initial and final power analyses, eligibility criteria, and exposure and outcome definitions. Once a study design and analysis plan—informed by the feasibility phase—have been established, the protocol will be uploaded to clinicaltrials.gov prior to execution. All analyses relating to the study, including feasibility checks, will be conducted on the Aetion Evidence Platform,33 which keeps a dated log of every contact made with the data by each investigator. This is intended to facilitate transparency and reproducibility.

Data analysis considerations

Differences in analytic approach could result in discrepancies between the results of RWE studies and RCTs. For instance, conventional outcome regression may yield a different conclusion than a marginal structural model in the presence of time-varying confounding.34, 35 Therefore, we may take multiple approaches to data analysis, designating as the primary analysis the statistical estimand that is expected to yield a result most similar to the RCT's. All details will be specified in the registered protocol on clinicaltrials.gov. Our goal is to achieve statistical comparability and regulatory agreement between the RCT and RWE studies.

Along the same lines, intention-to-treat analyses carried out in many clinical trials may yield very different results if applied in RWE studies. This is because patients in routine practice are apt to change therapy or have differential adherence due to the natural evolution of disease status and treatment response. Therefore, alternative approaches that account for treatment crossover occurring in real-world practices may be used. In addition to these principles, expertise from our steering committee will be sought to inform decision making in this domain.

Exploratory and sensitivity analyses

Data generated from this study may be valuable in answering an array of questions that are beyond the scope of the primary aim. Particularly, explorations of correlations between proxy end points and established effectiveness endpoints (e.g., time-to-next-treatment vs. progression-free survival) and analytic strategies accounting for time-varying confounding, informative censoring, or competing risks are of interest. Any such analyses that deviate from the primary aim will be reported as exploratory. Furthermore, additional analyses that are intended to examine the robustness of our primary results to assumptions will be described as planned sensitivity analyses in the protocol.

RCT vs. RWE agreement and interpretation

In this study, we will compare the results of each RWE study against the results of the RCT it was designed to emulate under the assumption that the RCT achieved valid, unbiased estimates. However, there are many reasons unrelated to validity that could explain differences between their results.36 Often termed the efficacy-effectiveness gap,37 these differences may be due to effect modification by prescribing behaviors, medication adherence, access to healthcare resources, patient characteristics, and variable measurement, which can all vary between routine care settings and the tightly controlled environments of RCTs. We will attempt to account for as many of these factors as possible through various study design strategies, including emulation of the RCT's eligibility criteria, use of analyses that account for differential adherence, and choosing a similar statistical estimand as in the trial. If there are large differences in the distributions of known effect modifiers between the study populations, standardization may also be used as a sensitivity analysis.

Despite our best efforts, it may not be possible to adjust for all differences between the RCT and RWE studies and, in some cases, this may be instructive. Particularly, some oncology end points (e.g., progression-free survival and treatment response) are inherently challenging to emulate in EHR data due to their reliance on unstructured data elements, such as radiologic imaging findings and frequency, and/or clinically actionable genomic data. Consequently, some database administrators may curate purportedly analogous variables based on abstracted information from patient charts; calibrating the performance of these so-called “real-world” end points against RCT evidence may illuminate the utility of these pre-study curated variables in assessing drug effectiveness. Furthermore, some investigators have demonstrated that heterogeneity exists in the recording of certain outcomes between different databases.38 Our investigation will permit an opportunity to explore this heterogeneity across multiple outcomes and different data sources. Aside from outcome measurement, any other known or potential reasons that could explain the observed results will be documented.

Using a similar approach to that previously described,7 we will use three metrics to assess agreement between the RCT and RWE studies: regulatory agreement, estimate agreement, and standardized difference (Figure 1). Briefly, regulatory agreement will be met when the RWE estimate has the same direction and statistical significance as the RCT estimate; estimate agreement will be met when the RWE point estimate lies within the bounds of the 95% confidence interval of the RCT estimate; and the standardized difference will quantify the difference between the RWE and RCT effect estimates, standardized by the pooled standard error of the two estimates.39

Figure 1. A depiction of the behavior of agreement metrics across various combinations of potential results. (a) Both RCT and RWE estimates have the same direction and statistical significance, but the RWE estimate does not lie within the 95% CI of the RCT estimate. (b) The RWE estimate lies within the 95% CI of the RCT estimate, but does not have the same statistical significance. (c) Both estimates have the same direction and statistical significance. (d) The RWE and RCT estimates fail to have either the same statistical significance or direction. Note: Figures are for illustration purposes only and are not drawn to scale. EA, estimate agreement; RA, regulatory agreement; RCT, randomized controlled trial; RR, relative risk; RWE, real-world evidence; SD, standardized difference.

The advantage of the regulatory agreement metric is that it provides a binary indicator of whether a similar regulatory decision might be made on the basis of the RWE study as the RCT, all else being equal. However, because RWE studies tend to have larger sample sizes and greater statistical power than RCTs, it is possible that an RWE study may not meet regulatory agreement despite having a point estimate similar to the RCT's. This is an advantage of the estimate agreement metric, which is robust to such cases (Figure 1b). Last, because the regulatory and estimate agreement metrics are dichotomous, they are uninformative of the direction and magnitude of the difference between RWE and RCT estimates. The standardized difference provides this information, which is useful for evaluating the reasons underlying discrepancies between the two studies' findings. Further details on the advantages and disadvantages of each agreement metric have been described elsewhere.7
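Under simplifying assumptions, the three agreement metrics can be sketched for a pair of hazard ratio estimates: each estimate is given as a point estimate with a 95% confidence interval, the null value is 1, significance means the CI excludes the null, and standard errors are recovered from the CI width on the log scale. This is an illustration only, not the project's specification, and the input estimates are hypothetical.

```python
import math

def agreement_metrics(rct, rwe):
    """Compute (regulatory agreement, estimate agreement, standardized
    difference) for two hazard ratio estimates, each given as
    (point_estimate, ci_lower, ci_upper)."""
    def log_se(est):
        # Recover the standard error of the log estimate from the 95% CI
        _, lo, hi = est
        return (math.log(hi) - math.log(lo)) / (2 * 1.96)
    def significant(est):
        return est[1] > 1 or est[2] < 1  # 95% CI excludes the null (HR = 1)
    regulatory = (
        (rct[0] < 1) == (rwe[0] < 1)              # same direction
        and significant(rct) == significant(rwe)  # same significance
    )
    estimate = rct[1] <= rwe[0] <= rct[2]  # RWE point within RCT 95% CI
    std_diff = (math.log(rwe[0]) - math.log(rct[0])) / math.sqrt(
        log_se(rct) ** 2 + log_se(rwe) ** 2
    )
    return regulatory, estimate, std_diff

# Hypothetical results: HR (95% CI) from an RCT and its RWE emulation
rct = (0.70, 0.55, 0.89)
rwe = (0.78, 0.68, 0.90)
print(agreement_metrics(rct, rwe))
```

In this hypothetical pair both binary metrics are met, while the standardized difference still conveys how far apart the two point estimates lie relative to their combined precision.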

DISCUSSION

In some clinical oncology settings, small patient populations and other challenges can result in a limited evidence base to guide clinical and regulatory decisions. The rapidly growing availability of data collected from routine clinical practice has opened up potential opportunities to fortify this evidence base in a manner that is complementary to RCTs. Additionally, the FDA recently issued draft guidance on the execution and use of RWE to support regulatory submissions.40 To better understand the potential of these types of data and analyses in oncology, this investigation seeks to conduct large-scale, systematic, and transparent comparisons of RWE and RCT studies. Our approach extends a previously developed framework6, 7, 18 for emulating RCTs by including oncology clinical trials and EHR-based data resources that have been emerging in the marketplace.

A major strength of our study is its broad scope, targeting a variety of common tumor types and data resources. Where multiple databases are used to emulate a single RCT, our approach will allow for an exploration of the heterogeneity of so-called “real-world” end points and study outcomes between available data resources. This can provide a sense of the overall availability and reliability of RWD resources, as well as illuminate which data attributes are most important to account for in studies of a particular tumor type. An additional strength of our study is pre-registration of the protocol and execution on a platform that records all interactions with the data. Carrying out the study in this way prevents knowledge of the study outcome from influencing study design. It also facilitates transparency and reproducibility.

There are several limitations to our approach. First, it is likely that not every element of each randomized trial can be precisely emulated. Under such circumstances, investigators will need to use personal judgment when defining certain features of the RWE study. These decisions may alter the causal question and, as such, it will be important to examine our results considering such differences between the RCT and RWE—particularly characteristics that are known effect modifiers—in the interpretation of each study. Even with perfect emulation of study eligibility criteria, it is possible that there will be differences in patient populations between the RCT and RWE emulation studies. In addition to population differences and emulation failures, it is possible for random variability to explain any observed results, as previously noted.7

Our investigation will be the first to systematically explore the reproducibility of oncology RCT findings using RWD on a large scale, with carefully designed studies that use fit-for-purpose RWD. Results of this investigation will help inform clinical, scientific, and regulatory stakeholders on the utility and applications of RWE in the context of product labeling expansion, drug safety, and comparative effectiveness in oncology.

FUNDING

This work was funded by contracts with Bayer Pharmaceuticals, AbbVie, AstraZeneca, Pfizer, and Gilead Sciences.

CONFLICTS OF INTEREST

D.M., U.C., A.R., N.G., and B.T. report employment compensation and ownership of equity in Aetion, Inc., a software development company. S.S. is a consultant to and owns equity in Aetion, Inc. and reports receiving grants or contracts from Boehringer Ingelheim, UCB, and Vertex for unrelated projects. S.V.W. reports participation in the scientific advisory board of this project with no compensation, and service on the Board of the International Society for Pharmacoepidemiology and the Editorial Board of Pharmacoepidemiology and Drug Safety. G.C. and V.C. report employment at and ownership of stock grants from Amgen. O.E.O. reports employment at AbbVie and funding for this project by AbbVie. A.B. reports support of this project by Bayer as a member of the scientific advisory board of this project. X.J. reports participation as a member of the scientific advisory board of this project and employment and stock in Bayer. R.H.S. reports employment compensation for unrelated work with the National Alliance Against Disparities in Patient Health. C.R.W. reports grants or contracts for research support from the FDA, Pfizer, AbbVie, Merck, and Novartis; direct payment from Global Genes to administer a lecture at the Rare Disease Drug Symposium; and stock or stock options from Gilead. S.K. reports providing funding for this project from AstraZeneca and employment and stock compensation at AstraZeneca. Y.Z. reports employment and stock compensation from AstraZeneca and service on the scientific advisory board of this project. M.R. reports grants or contracts from Pfizer; consulting fees from Novartis, Seagen, AstraZeneca, and Macrogenics; and a pending patent application # PCT/US21/70543 (Methods for breast cancer treatment and prediction of therapeutic response) filed and owned by Baylor College of Medicine. A.T. reports employment and stock compensation from Gilead. L.B. and L.M. report employment and stock compensation from Pfizer. J.E. reports employment with the Reagan-Udall Foundation for the FDA and a salary that comes from grants and contracts with the FDA; and unpaid membership of the Board of Directors for the National Women's Health Network from 2018–2022. All other authors declared no competing interests for this work.