Quoting some papers then giving a composite AI assessment below. This is a sidebar to an original blog here.
###
Estimating Treatment Effects From Observational Data—Calling It a Target Trial Does Not Make It One. By Krista F. Huybrechts, MS, PhD, and Sonia Hernández-Díaz, MD, DrPH.
Find it here. An open-access version may be available here.
- See also Hubbard 2024, NEJM, on "target trial emulation," here.
- And Hernán, on the target trial framework and causal inference from observational data, Ann Intern Med 2025, here. (My blog re Hernán here.)
- There's also Wilson, Real World Data - Trials to Practice, Lancet 2024, here.
- And Wang, Emulating RCTs with Databases, JAMA 2023, here. (Op-Ed by Sheldrick here.)
- Wang has already been cited 241 times.
A recent constellation of six key papers—by Sheldrick, Wang & Schneeweiss, Hubbard et al., Hernán et al., Huybrechts & Hernández-Díaz, and Wilson & Booth—forms a vibrant and evolving conversation around target trial emulation, real-world data (RWD), and the limits of randomized controlled trials (RCTs).
Below is a critical synthesis of these contributions, written [by AI] for readers steeped in public health methodology, trial design, and medical epistemology.
Converging Themes: Agreement in Principles
Across these papers, several common principles emerge:
- Target Trial Emulation as a Design Framework: All authors accept the growing importance of emulating RCTs using observational data when RCTs are unavailable, unethical, or infeasible. The conceptual framework—sometimes called “target trial emulation” or “RCT emulation”—aims to reduce biases in observational studies by adhering to RCT-like protocols: clear eligibility criteria, aligned time zero, defined interventions, and prespecified analyses.
- Complementary Roles of RCTs and RWD: There's near-universal agreement that RCTs and RWD answer different but complementary questions. RCTs provide high internal validity—“can it work?”—while RWD offers external validity—“does it work in practice?” Sheldrick, Wang, Hernán, and Wilson all articulate this dichotomy, though with different degrees of optimism about reconciling the two.
- Bias Awareness and Design Sensitivity: Authors stress that careful emulation does not eliminate bias—especially confounding by unmeasured variables—but can reduce it if thoughtful design and transparent reporting are applied. Hubbard, Hernán, and Huybrechts specifically caution against overconfidence in observational causal claims that mimic RCT language.
- Data Constraints and Iterative Design: Hernán and Huybrechts emphasize that the “target trial” must be constrained by the observational data at hand. This is not an idealized RCT, but a pragmatic trial that can be realistically emulated. Emulation is iterative: the ideal target trial is refined until it is mappable to available data.
Distinctive Contributions and Tensions
1. Wang & Schneeweiss (JAMA 2023)
A landmark empirical study from the RCT-DUPLICATE initiative: the authors emulated 32 RCTs using U.S. claims databases and reported moderate to strong agreement (e.g., Pearson r = 0.82 overall, r = 0.93 for closely emulated trials) between the database studies and their RCT analogues. (A minimal numeric sketch of this agreement metric follows the bullets below.)
- Strengths: Methodical rigor, large-scale analysis, and clear metrics of agreement.
- Limitation: Trials were hand-picked for emulatability, limiting generalizability.
- Implication: Under ideal conditions, real-world studies can mimic RCT results with surprising fidelity.
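To make the agreement metric concrete, here is a minimal sketch of how concordance between trial and emulation estimates is typically computed on the log hazard-ratio scale. The paired values are hypothetical placeholders, not the RCT-DUPLICATE results.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical (RCT, emulation) hazard-ratio pairs -- illustrative only,
# NOT the actual RCT-DUPLICATE estimates.
rct_hr      = np.array([0.80, 0.92, 1.10, 0.67, 1.05, 0.88])
emulated_hr = np.array([0.84, 0.95, 1.02, 0.72, 1.15, 0.90])

# Agreement is conventionally assessed on the log scale, where hazard
# ratios are roughly symmetric around the null.
r, p = pearsonr(np.log(rct_hr), np.log(emulated_hr))
print(f"Pearson r on the log-HR scale: {r:.2f} (p = {p:.3f})")
```

A correlation near 1 says the two designs rank and scale effects similarly; it does not, by itself, certify that any single emulation is unbiased.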
2. Sheldrick (JAMA Editorial 2023)
Sheldrick's editorial contextualizes the Wang findings within epistemological frameworks, such as Mill’s definition of causation and Pearl’s DAGs.
- Key Argument: RCTs prove contributory causes, while RWD explores sufficient causes and generalizability. He frames both as necessary to move closer to “complete causal mechanisms.”
- Audience Impact: Invokes philosophy of science, appealing to readers invested in the deep structure of evidence-based medicine.
3. Hubbard et al. (NEJM Perspective 2024)
This NEJM piece focuses on the pitfalls of target trial emulation.
- Key Concerns: Immortal time bias, selection bias, misaligned time zero, poor data quality, and lack of preregistration.
- Message: Emulation is a design discipline, not a magic bullet. RCT-like structure does not automatically confer validity.
- Example: Their analysis of the VA COVID vaccine trial underscores how emulation can succeed only if time zero and assignment are properly aligned.
4. Hernán et al. (Ann Intern Med 2025)
This is the most comprehensive formalization of the target trial framework.
- Contribution: Proposes a structured table linking:
  - Causal estimand (e.g., intention-to-treat),
  - Identifying assumptions (e.g., no unmeasured confounding),
  - Estimators (e.g., inverse probability weighting),
  - Data constraints.
- Key Insight: Design flaws, not just unmeasured confounding, explain prior observational missteps (e.g., statins for dementia).
- Epistemological Clarity: The authors distinguish between the estimand (what you want to know), the estimator (how you compute it from the data), and the identification assumptions (what you must believe).
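To ground the estimand/estimator/assumption triad, here is a minimal simulated sketch of one estimator named above, inverse probability weighting, run under the identifying assumption of no unmeasured confounding. All variable names and numbers are invented for illustration; this is not code from any of the cited studies.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000
confounder = rng.normal(size=n)                  # e.g., baseline health status
p_treat_true = 1 / (1 + np.exp(-confounder))     # healthier -> more likely treated
treated = rng.binomial(1, p_treat_true)
outcome = rng.binomial(1, 0.20 + 0.05 * treated + 0.10 * (confounder > 0))

# Estimator: fit a propensity model; validity rests on the identifying
# assumption that 'confounder' captures all confounding.
ps_model = LogisticRegression().fit(confounder.reshape(-1, 1), treated)
ps = ps_model.predict_proba(confounder.reshape(-1, 1))[:, 1]
w = treated / ps + (1 - treated) / (1 - ps)      # inverse probability weights

# Weighted (Hajek-style) risk difference targets the causal estimand.
risk1 = np.average(outcome[treated == 1], weights=w[treated == 1])
risk0 = np.average(outcome[treated == 0], weights=w[treated == 0])
print(f"IPW risk difference: {risk1 - risk0:.3f} (simulated truth: 0.050)")
```

The point of the sketch is the separation Hernán et al. insist on: the estimand is the risk difference under treatment versus no treatment; the estimator is the weighted contrast; the assumption is that nothing beyond `confounder` drives both treatment and outcome.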
5. Huybrechts & Hernández-Díaz (JAMA Psychiatry 2025)
A critique of superficial uses of the “target trial” label.
- Warning: Simply naming a study a “target trial emulation” does not make it valid. Misalignment of eligibility, treatment assignment, and follow-up timing can reintroduce bias.
- Advocacy: The target trial framework is best used to reveal when a study cannot be done, not merely to lend prestige to flawed designs.
- Focus on Psychiatry: Adds insight on vulnerable areas like psychiatric pharmacoepidemiology, where biases such as immortal time bias have caused misinterpretation.
6. Wilson & Booth (Lancet eClinicalMed 2024)
An oncology-specific and regulatory-facing piece.
- Viewpoint: Focuses on RWD’s role in regulatory submissions (e.g., FDA’s use of real-world comparators), but acknowledges deep discordance between trial and observational results.
- Unique Focus: Proposes that even when surrogacy fails, RWD may offer independent epistemic value—answering different questions, especially about effectiveness in unrepresented populations.
- Key Ethical Question: “Should a drug still be prescribed if RCTs show efficacy but RWD shows no effectiveness?”
Points of Disagreement or Debate
| Topic | Dissenting Viewpoints |
|---|---|
| Is agreement between RCTs and RWD likely? | Wang et al. (optimistic with caveats) vs. Wilson & Booth (skeptical due to methodological variance) |
| Is target trial emulation sufficient to ensure validity? | Hernán (cautious optimism) vs. Hubbard & Huybrechts (emphatic no—emulation must be accompanied by data quality, alignment, and bias checks) |
| What is the value of RWD when it contradicts RCTs? | Wilson & Booth ask if effectiveness alone should influence treatment recommendations, an ethical and regulatory question rarely addressed in the other papers. |
Conclusion: The Target Trial Framework as Scientific Infrastructure
Collectively, these articles map out the emergence of the target trial not just as a methodology but as a philosophical and regulatory infrastructure for causal inference. The conversation has evolved from debating RWD versus RCTs to integrating both in epistemologically coherent ways.
- Methodologists are urged to treat target trial emulation not as a branding term, but as a commitment to design rigor and analytical transparency.
- Policymakers and regulators are encouraged to recognize the plural nature of evidence: real-world effectiveness data is not merely second-tier, but a distinct dimension of truth.
- Clinicians and ethicists are left with a vital question: When RCTs and RWD disagree, which world should define our reality?
#####
MORE
####
Focus your discussion on HUBBARD and HERNÁN, since these are written on a quite nerdy topic (target trials in clinical data) but for general audiences (being NEJM and Ann Intern Med, respectively).
####
Certainly. Below is a focused comparative critique of the NEJM Perspective by Hubbard et al. (2024) and the Annals of Internal Medicine essay by Hernán et al. (2025)—both of which address target trial emulation in observational data. These two works are exceptional in that they aim to communicate a conceptually dense topic to a broad clinical audience through prestigious, general medical journals. They share a pedagogical ambition but diverge in tone, framing, and epistemological emphasis.
Overview and Shared Mission
Both articles tackle the central premise of target trial emulation: the idea that observational studies can be designed to mimic hypothetical randomized trials to improve causal inference. They aim to disseminate a methodology that originated in epidemiology, biostatistics, and econometrics, but which now plays a central role in how clinicians, regulators, and policy makers think about non-randomized evidence.
- Hubbard et al. (NEJM, 2024) focus on potential and pitfalls—emphasizing limitations, implementation challenges, and necessary rigor.
- Hernán et al. (Annals, 2025) offer a more formalized, constructive framework—mapping out a systematic process for target trial specification and emulation.
While both are written in broadly accessible terms, each paper targets a different flavor of medical readership:
- Hubbard adopts a skeptical, cautionary tone to temper overenthusiastic adoption.
- Hernán is an advocate and architect, building a structured bridge between theory and practice.
Tone and Framing: Caution vs. Codification
| Feature | Hubbard et al. (NEJM) | Hernán et al. (Annals) |
|---|---|---|
| Tone | Moderately skeptical | Programmatic and pedagogical |
| Audience posture | “Slow down, proceed carefully” | “Here’s how to do it, step-by-step” |
| Central metaphor | Bias trap detection | Blueprint for trial reconstruction |
| Main warning | Even well-designed emulations remain vulnerable to data limitations and residual confounding | Lack of a clear causal estimand leads to observational failure—even with good data |
| Key example | VA COVID-19 vaccine study: how even strong design can be undone by misalignment or missing data | Hormone therapy for coronary disease: failures traced to lack of a well-specified question, not lack of data |
Hubbard et al. use a pitfall-driven narrative, focusing on how biases like immortal time bias, selection bias, and poor alignment of time zero can stealthily undermine studies even when the term “target trial” is invoked. They emphasize data fragility and residual confounding, and call for sensitivity analyses, transparent assumptions, and design discipline.
By contrast, Hernán et al. treat target trial emulation as a methodological advance, worth championing if done right. They elevate the importance of asking a well-defined causal question—what they call the “causal estimand”—and map out the seven canonical features of a trial: eligibility, intervention, comparator, follow-up, outcomes, causal contrast, and analysis.
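To make those seven elements tangible, an emulation protocol can be written down as structured metadata before any analysis begins. The sketch below is a hypothetical illustration, not a published protocol; every field value is invented.

```python
from dataclasses import dataclass

@dataclass
class TargetTrialProtocol:
    """The seven protocol elements, each specified for both the target
    trial and its observational emulation."""
    eligibility: str
    intervention: str
    comparator: str
    follow_up: str        # includes time zero and censoring rules
    outcomes: str
    causal_contrast: str  # e.g., intention-to-treat vs per-protocol
    analysis_plan: str

# Hypothetical example for a statin/dementia emulation (illustrative only):
protocol = TargetTrialProtocol(
    eligibility="Age 65+, no prior statin use, no dementia diagnosis",
    intervention="Initiate statin therapy at baseline",
    comparator="Do not initiate statin therapy",
    follow_up="From first prescription (time zero) to dementia, death, or 5 years",
    outcomes="Incident dementia diagnosis",
    causal_contrast="Observational analogue of intention-to-treat",
    analysis_plan="Pooled logistic regression with inverse probability weights",
)
```

Writing the protocol down first, as Hernán et al. urge, forces the causal question to precede the analysis of the data.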
Epistemological Crux: What Fails—The Data or the Question?
A subtle yet deep difference lies in their diagnosis of failure:
- Hubbard emphasizes data problems—missingness, measurement error, and inconsistency in real-world EHR or claims data. Their view is that even good design cannot fix broken data, and that much of the optimism around emulation is misplaced unless these issues are addressed head-on.
- Hernán, however, places greater blame on design failure—arguing that misinterpretation of observational data often stems from not asking the right causal question in the first place. They cite reanalyses of hormone therapy and statins where retrospective application of the target trial framework produced findings much closer to those of RCTs.
Conclusion:
- For Hubbard, target trial emulation is a guardrail, not a safety net.
- For Hernán, it is a scaffold to reconstruct causal inference from otherwise noisy observational data.
Data Realism and Pragmatism
Both papers stress that the target trial must be realistic, given the data at hand. Yet their emphases differ:
- Hubbard warns that data often fail to support key design elements (e.g., timing of treatment initiation, outcome ascertainment), and this disconnect limits the value of even the most faithfully emulated design.
- Hernán argues that the target trial should be specified in terms of what can realistically be emulated, i.e., not the ideal RCT one wishes to run, but the best RCT one could emulate using available data. This moves the field toward a pragmatic epistemology where the ideal gives way to the feasible.
Didactic Contributions
Both articles serve pedagogical functions in the generalist journals where they appear:
- NEJM (Hubbard) acts as a brake pedal for an enthusiastic readership—emphasizing humility, caveats, and the irreducible threat of confounding.
- Annals (Hernán) functions more as a textbook chapter—laying out a protocol structure and inviting readers to apply it.
Each achieves an unusual level of clarity for general medical readers without sacrificing the integrity of the underlying methods.
Final Comparison: Who Should Read Which?
| Reader Profile | Recommended Paper |
|---|---|
| Practicing clinician or policy maker, cautious about over-reliance on EHR data | Hubbard et al. |
| Researcher or analyst designing RWE studies for regulatory or health system impact | Hernán et al. |
| Educator or graduate student seeking a didactic intro to trial emulation | Hernán et al. |
| Reviewer or editor evaluating causal claims from observational studies | Hubbard et al. |
Closing Thought
Together, Hubbard and Hernán offer the yin and yang of target trial emulation discourse in clinical journals. One warns against methodological complacency; the other equips readers to do better. Both recognize that RCTs cannot answer every question, and that with sufficient humility and design clarity, observational research can rise to meet real-world demands. But neither paper suggests this is easy—or automatic.
###
###
How are the Hubbard and Hernán papers different from the Wilson and Wang papers? The latter two don't have "target trials" in the title, although I've considered them collectively with Hubbard and Hernán.
###
Excellent question. While Wilson & Booth (Lancet eClinicalMedicine) and Wang & Schneeweiss (JAMA) don’t explicitly use “target trial” in their titles, you’re absolutely right to group them with Hubbard and Hernán—all four form a coherent intellectual discourse about emulating causal inference from observational data, albeit with different degrees of technical focus and rhetorical positioning.
Let’s explore how Wilson & Wang differ from Hubbard & Hernán, focusing on framing, emphasis, audience, and epistemological posture.
🔹 Framing and Titling: What’s Named and What’s Implied
- Hubbard and Hernán: Put “Target Trial Emulation” front and center, emphasizing it as a methodological framework.
- Wilson and Wang: Discuss the same ideas (emulation of RCTs, causal inference from observational data), but couch them under broader, system-level themes:
  - Wilson & Booth: "Bridging the gap between clinical trials and practice".
  - Wang & Schneeweiss: "Emulation of Randomized Clinical Trials With Database Analyses".
- Implication:
  - Hubbard/Hernán are meta-methodologists, aiming to formalize and regulate the new paradigm.
  - Wilson/Wang are applied empiricists or policy-minded thinkers, using or assessing methods to answer practical questions—especially in oncology and regulatory science.
🔹 Audience and Intent
| Writer(s) | Primary Audience | Primary Intent |
|---|---|---|
| Hubbard et al. | General clinicians + methodologists | Alert readers to bias traps in emulated studies |
| Hernán et al. | Methodologists + pragmatic trialists | Formalize a rigorous structure for emulation |
| Wang et al. | Regulators, payors, pharmacoepidemiologists | Empirically test whether RWE can match RCTs |
| Wilson et al. | Clinicians, funders, regulators (esp. oncology) | Argue that RWD has policy and practice value |
🔹 Epistemological Posture
Wang & Schneeweiss
They ask: Can we empirically demonstrate that observational database studies can reach the same causal conclusions as RCTs?
- 32 RCTs are emulated using claims data across 3 large datasets.
- Agreement metrics (e.g., Pearson r = 0.82) quantify when observational studies “match” RCTs.
- Key epistemic stance: “If you match the PICOT elements carefully enough, the results might be surprisingly concordant.”
- They are cautiously optimistic: under the right conditions, observational emulations can serve as proxies for RCTs, particularly where trials are impractical.
Wilson & Booth
They focus not on design mimicry but on the value of real-world data even when it diverges from RCTs.
Key epistemic stance: “Effectiveness and efficacy are separate axes. Real-world data can add value—even when it fails to replicate trial results.”
They call for:
- Acceptance of non-surrogacy between RCT and RWD endpoints (e.g., overall survival [OS] vs progression-free survival [PFS]).
- Greater emphasis on equity, access, and representativeness, which RCTs often lack.
They ask:
- What do we do when RCTs say yes and RWD says no?
- Should RWD influence reimbursement, approvals, or clinical uptake—even when contradictory?
🔹 Relationship to “Target Trial” Thinking
| Writer(s) | Target Trial Explicit? | Emulation Framed as Design Framework? | Emulation Framed as Regulatory/Practical Tool? |
|---|---|---|---|
| Hubbard | Yes | Yes | Somewhat |
| Hernán | Yes | Yes (structured formalization) | Yes, cautiously |
| Wang | No (implicit) | Yes (clear PICOT, protocol registration) | Yes, very much so |
| Wilson | No | Not method-centric | Yes, primary focus is policy and practice |
- Wang fully implements the target trial idea (complete protocols, registered at ClinicalTrials.gov), but frames it in empirical, not conceptual, terms. They are testing concordance, not theorizing structure.
- Wilson is conceptually adjacent but less method-bound. They argue that even when emulation fails, RWD has independent epistemic and ethical value.
🔹 Summary: Distinct Roles in the Discourse
| Dimension | Hubbard | Hernán | Wang | Wilson |
|---|---|---|---|---|
| Framework Builder | No | Yes | Partial | No |
| Bias Diagnostician | Yes | Yes | Somewhat | No |
| Empirical Demonstrator | No | No | Yes | No |
| Policy Strategist | No | Implicit | Somewhat | Yes |
| Epistemic Modesty | High | Moderate | Moderate | High |
🧠 Final Thought
You’re right to group all four papers as part of a shared intellectual movement—one that tries to reconcile observational data with the rigor of causal inference. But:
- Hubbard and Hernán are method architects and critics.
- Wang and Wilson are system-level thinkers, showing how those methods can be used—or challenged—in the wild.
Together, they map the tension between methodological purity, empirical pragmatism, and regulatory action in an era of real-world data abundance.
###
###
###
You mentioned Pearl's DAG paradigm; what is that?
You mentioned "statin trials in dementia" as a confounded result; what is that?
####
Excellent questions—both go to the heart of causal inference in observational data. Let's tackle them one by one.
#1. What is “Pearl’s DAG”?
“Pearl’s DAG” refers to a concept from Judea Pearl, a computer scientist and philosopher of causality, who popularized the use of Directed Acyclic Graphs (DAGs) to represent and reason about causal relationships in complex systems.
➤ What is a DAG?
- A DAG is a graphical model—a diagram made of nodes (variables) and arrows (causal paths).
- “Directed” means arrows go one way (from cause to effect).
- “Acyclic” means there are no loops—you can’t return to the same variable by following the arrows.
➤ What is the purpose of a DAG in medicine or epidemiology?
- DAGs help us visualize which variables are causes, effects, or confounders.
- They are tools for deciding what to control for (adjust) in an analysis to get an unbiased estimate of causal effect.
- Pearl’s key contribution was to give us mathematical rules (like the backdoor criterion) to decide when an association in the data can be interpreted causally.
➤ Example DAG: Statins and Dementia
Let’s say we’re studying:
- Statin use → Risk of dementia
But we know:
- Healthy behavior leads people both to take statins and avoid dementia.
So we draw:
HEALTHY LIFESTYLE → STATIN USE
HEALTHY LIFESTYLE → LOWER DEMENTIA RISK
STATIN USE → ??? → DEMENTIA
In this DAG, healthy lifestyle is a confounder. If we don’t adjust for it, we may wrongly attribute the reduced dementia risk to statins, when it’s actually due to healthy behavior.
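A short simulation makes the DAG's warning concrete. In the hypothetical numbers below, statins have no true effect, yet the crude comparison makes them look protective; stratifying on the confounder (the backdoor adjustment) recovers the null.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
healthy = rng.binomial(1, 0.5, n)                  # healthy lifestyle (confounder)
statin = rng.binomial(1, 0.20 + 0.40 * healthy)    # healthier people take statins...
dementia = rng.binomial(1, 0.15 - 0.08 * healthy)  # ...and get less dementia;
                                                   # statins do NOTHING by construction

def risk(mask):
    return dementia[mask].mean()

# Crude comparison: statins appear protective purely through confounding.
print(f"Crude risk ratio: {risk(statin == 1) / risk(statin == 0):.2f}")  # ~0.73

# Backdoor adjustment: within strata of the confounder, the effect vanishes.
for h in (0, 1):
    rr = risk((statin == 1) & (healthy == h)) / risk((statin == 0) & (healthy == h))
    print(f"Risk ratio within healthy={h}: {rr:.2f}")                    # ~1.00
```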
#2. What was wrong with the statins and dementia study?
This refers to a classic case of bias in observational epidemiology from the 1990s and 2000s.
➤ Claim:
Some large observational studies (e.g., cohort studies using insurance or registry data) reported that statins dramatically reduced the risk of dementia or Alzheimer’s disease—sometimes by 40–70%.
➤ Problem: These effects were too good to be true.
When randomized trials were later conducted, they did not confirm these benefits. The RCTs found little or no effect of statins on dementia prevention.
➤ What went wrong?
The likely causes, as described by Hernán and others, were:
- Healthy user bias: People who take statins regularly are often more health-conscious, better educated, more adherent to medical advice, and less likely to develop dementia for reasons other than the drug itself.
- Confounding by indication: Doctors may prescribe statins more aggressively to people who already show signs of better vascular health or cognitive status, leading to selection bias.
- Immortal time bias (in some studies): If patients are categorized as “statin users” based on having survived long enough to fill a statin prescription, this creates a time window in which they could not have had the outcome (dementia), thus artificially lowering risk estimates (a toy simulation of this mechanism follows below).
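As a toy illustration of that last mechanism, the simulation below builds a cohort in which statins do nothing, yet classifying people as "users" only if they survive long enough to fill a prescription hands the user group an immortal window. The parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
# Everyone shares the same true hazard: time to dementia is exponential
# with a 10-year mean, and statins have no effect by construction.
event_time = rng.exponential(scale=10.0, size=n)

# Half the cohort is destined to fill a statin prescription at a random
# time; the flawed design counts them as "users" only if they survive
# until that fill date.
fill_time = rng.exponential(scale=2.0, size=n)
destined = rng.binomial(1, 0.5, n).astype(bool)
classified_user = destined & (event_time > fill_time)

# Counting follow-up from cohort entry credits "users" with the immortal
# time before their fill, so they look spuriously protected.
print(f"5-year risk, 'users':     {(event_time[classified_user] < 5).mean():.3f}")
print(f"5-year risk, 'non-users': {(event_time[~classified_user] < 5).mean():.3f}")
```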
➤ Solution Hernán suggested:
Re-analyze the data using target trial emulation—specify the eligibility, treatment assignment, and follow-up as if it were an RCT. When this was done (see Dickerman et al., 2019, JAMA), the exaggerated protective effect vanished.
🧠 Summary
- Pearl’s DAGs are a tool for thinking clearly about confounding and bias, and for deciding what needs to be adjusted to estimate causal effects.
- The statin-dementia case is a cautionary tale of how observational data can mislead when biases like healthy user and immortal time are not addressed.
- Target trial emulation and graphical causal modeling are two modern responses to these classical problems.