In January 2026, FDA released a 29-page draft guidance on Bayesian design in clinical drug trials.
I provided that, plus about 6 to 8 documents in total (JAMA, LinkedIn, etc.), and asked for an evaluation. Here I publish the ChatGPT response first and the Claude Opus response below.
###
FDA BAYES GUIDANCE 01-2026
by CHATGPT and then CLAUDE OPUS
# # #
Write a detailed and objective article about the FDA
draft guidance, what it accomplishes, its implications, and drawbacks or
concerns.
In a closing section, feel free to go beyond the PDFs
provided and contribute other germane
thought capital to the topic of FDA usage of Bayesian designs and
analyses.
You are writing for PhD and MD-PhD professionals.
# # #
See a separate discussion "What about diagnostics?" here.
Contents
ChatGPT
Claude Opus
CHATGPT
FDA’s Draft Bayesian Guidance: A Quietly Important Shift in Regulatory Trial Design
FDA’s January 2026 draft guidance, “Use of Bayesian
Methodology in Clinical Trials of Drug and Biological Products,” is not a
wholesale replacement of the randomized controlled trial paradigm. Nor is it a
license for sponsors to substitute optimism, external data, or “modern
statistics” for persuasive evidence. Its real importance is more specific and
more technical: it gives CDER and CBER sponsors a more explicit regulatory
language for when Bayesian methods may support primary inference in
trials intended to establish safety and effectiveness. That is a meaningful
step, because Bayesian methods have long been used in dose finding, pediatric
extrapolation, adaptive designs, device studies, and selected rare-disease
settings, but their use in pivotal drug and biologic trials has remained
cautious, uneven, and often negotiated case by case. FDA’s draft guidance
attempts to move that negotiation into a more transparent framework.
The guidance is draft and nonbinding. FDA emphasizes that it
represents current thinking, does not create legally enforceable duties, and
allows alternative approaches that satisfy applicable statutes and regulations.
That matters because Bayesian trial design is not a single method. It is a
family of inferential, computational, and evidentiary approaches, ranging from
almost-frequentist analyses with minimally informative priors to highly
consequential borrowing from external or historical information. FDA’s document
therefore functions less like a cookbook and more like a regulatory grammar:
define the prior, define the estimand, define the success criterion, evaluate
operating characteristics, test sensitivity, document the computational
workflow, and make clear what part of the conclusion is coming from current
randomized data versus earlier information.
What Bayesian analysis changes
In a conventional frequentist pivotal trial, the familiar
regulatory question is framed around long-run error control: assuming a null
hypothesis, how often would a design falsely conclude success? Bayesian
analysis begins elsewhere. A prior distribution summarizes pre-study
information or uncertainty about a parameter of interest; the likelihood
summarizes the observed trial data; the posterior distribution then expresses
updated information after the trial. In the FDA’s phrasing, the posterior can
be used to draw conclusions about efficacy or safety, and it may be summarized
through posterior means, credible intervals, and posterior probabilities such
as the probability that the treatment effect exceeds zero.
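A minimal sketch of the prior, likelihood, and posterior sequence in the simplest conjugate case, assuming a normally distributed treatment-effect estimate with known standard error and a normal prior. The function name and the trial numbers are hypothetical illustrations, not values from the guidance; the summaries computed are the three named above (posterior mean, credible interval, and Pr(θ > 0 | data)).

```python
from statistics import NormalDist

def normal_posterior(prior_mean, prior_sd, est, se):
    """Conjugate normal-normal update for a treatment effect theta.

    prior:      theta ~ N(prior_mean, prior_sd^2)
    likelihood: observed estimate est ~ N(theta, se^2)
    Returns the posterior distribution of theta as a NormalDist.
    """
    prior_prec = 1.0 / prior_sd**2          # precision = 1 / variance
    data_prec = 1.0 / se**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * est)
    return NormalDist(post_mean, post_var**0.5)

# Hypothetical trial: estimated effect 0.30 with standard error 0.12,
# analyzed under a minimally informative prior N(0, 10^2).
post = normal_posterior(prior_mean=0.0, prior_sd=10.0, est=0.30, se=0.12)
lower, upper = post.inv_cdf(0.025), post.inv_cdf(0.975)  # 95% equal-tailed credible interval
prob_benefit = 1.0 - post.cdf(0.0)                       # Pr(theta > 0 | data)
print(f"posterior mean {post.mean:.3f}, 95% CrI ({lower:.3f}, {upper:.3f}), "
      f"Pr(theta > 0) = {prob_benefit:.4f}")
```

With a prior this diffuse, the posterior mean and interval nearly coincide with the frequentist estimate and 95% confidence interval; an informative prior would pull both toward the prior mean.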
That shift is attractive because it often maps more directly
to how clinicians, regulators, and patients speak. “What is the probability the
drug has a clinically meaningful effect?” is a more natural question than “How
surprising would these data be if the null hypothesis were true?” However, the
regulatory problem is not merely philosophical. FDA must decide whether a trial
result is reliable enough to support approval, labeling, or a supplement. For
that purpose, Bayesian posterior probabilities do not eliminate the need to
evaluate trial performance. They change what must be evaluated.
FDA’s guidance is particularly attentive to this
distinction. It separates trials that use Bayesian analysis inside a broadly
frequentist framework—where Bayesian posterior probabilities may be calibrated
to Type I error control—from trials that use a more fully Bayesian framework,
particularly when external information is incorporated through an informative
prior. In the first case, the Bayesian machinery may be used to create an
adaptive rule or posterior-probability decision threshold, but the design is still
judged by familiar frequentist operating characteristics: Type I error, power,
bias, mean squared error, and interval coverage. In the second case, especially
when borrowing is central, FDA recognizes that the operating-characteristic
conversation is different: the reliability of the decision depends strongly on
the prior, the fit of the data model, the design, and the measurement process.
What the guidance accomplishes
The first accomplishment is normalization. The
guidance makes clear that Bayesian methods are not exotic exceptions. FDA lists
several settings in which Bayesian methods have already appeared in development
programs: borrowing from prior clinical trials, augmenting concurrent
randomized controls with external or nonconcurrent control data, pediatric
extrapolation, borrowing across related diseases or disease subtypes, subgroup
estimation through hierarchical modeling, and oncology dose-finding. The
examples include REBYOTA, where a Phase 3 analysis incorporated data from a
prior placebo-controlled Phase 2 study; oncology platform trials such as GBM
AGILE and Precision Promise, where nonconcurrent controls and temporal effects
are methodological concerns; pediatric extrapolation examples involving
empagliflozin and linagliptin; and basket or related-disease contexts where
hierarchical models can borrow across groups.
The second accomplishment is regulatory taxonomy. FDA
forces sponsors to specify whether they are using Bayesian methods for interim
adaptation, dose selection, primary inference, borrowing from external
information, subgroup estimation, or some combination. This is useful because
“Bayesian trial” is otherwise too vague. A dose-escalation model in early
oncology development, a Bayesian hierarchical subgroup model, an adaptive
platform trial using predictive probabilities, and a pivotal trial borrowing
adult data for pediatric inference pose very different evidentiary risks.
The third accomplishment is discipline around success
criteria. FDA recognizes that frequentist success criteria may not be
directly applicable or appropriate when the primary analysis is Bayesian. A
Bayesian success criterion might be framed as a posterior probability that the
treatment effect exceeds a threshold, or it might be built into a more explicit
benefit-risk or decision-theoretic framework. FDA also notes that more complex
approaches may consider disease seriousness, availability of approved
therapies, the harm of approving an ineffective drug, and the harm of failing
to approve an effective drug. In such settings, a “loss function” could
formally combine safety and effectiveness considerations into a benefit-risk
decision rule.
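The loss-function idea can be made concrete with the simplest possible case: two actions (approve or not) and two states (effective or not). The sketch below uses hypothetical loss values and hypothetical function names; it shows only how a posterior-probability cutoff follows from a ratio of losses, not how FDA would weigh safety, disease seriousness, or available therapies in practice.

```python
def approval_threshold(loss_false_approval, loss_false_rejection):
    """Posterior-probability cutoff implied by a simple two-action loss function.

    With p = Pr(effective | data), approving has expected loss
    loss_false_approval * (1 - p) and not approving has expected loss
    loss_false_rejection * p, so approving is preferred when p exceeds
    loss_false_approval / (loss_false_approval + loss_false_rejection).
    """
    return loss_false_approval / (loss_false_approval + loss_false_rejection)

def decide(prob_effective, loss_false_approval, loss_false_rejection):
    """Pick the action with the smaller expected loss."""
    cutoff = approval_threshold(loss_false_approval, loss_false_rejection)
    return "approve" if prob_effective > cutoff else "do not approve"

# Hypothetical loss ratios, chosen only for illustration.
print(approval_threshold(39, 1))   # 39:1 -> 0.975
print(approval_threshold(9, 1))    # 9:1  -> 0.9
print(decide(0.95, 39, 1), "|", decide(0.95, 9, 1))
```

A 39:1 loss ratio reproduces the familiar 0.975 cutoff, while a ratio that weighs missed approvals more heavily, as might be argued for a serious disease with no approved therapy, lowers it.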
The fourth accomplishment is clarity about priors.
FDA states plainly that the prior distribution is the feature that
distinguishes Bayesian from frequentist approaches. Noninformative or minimally
informative priors may be used when no relevant prior information exists and
will often be overwhelmed by adequate trial data. Informative priors require
more justification because they can materially affect conclusions. Sponsors
must pre-specify and justify the prior, explain its influence, and show that
the design’s operating characteristics remain acceptable under the proposed
prior.
The fifth accomplishment is a practical review checklist.
The guidance requires attention to prior-data conflict, external-data
relevance, quality and reliability, pre-specification of source selection,
sensitivity analyses, software reliability, MCMC convergence, reproducibility,
and reporting. FDA recommends that sponsors provide results using alternative
reasonable priors and, when borrowing external information, show posterior
results under different degrees of borrowing. It also asks for model checking,
prior-data conflict assessment, convergence diagnostics, software version
details, documented code, and even random seeds for MCMC analyses where needed
for reproducibility.
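Reporting posterior results under different degrees of borrowing can be illustrated with a normal power prior, in which a fixed discount a0 between 0 and 1 scales the weight given to a historical estimate. This is a minimal sketch under stated assumptions (normal approximations, known standard errors, a flat initial prior); the function name and all numbers are hypothetical.

```python
from statistics import NormalDist

def power_prior_posterior(est, se, hist_est, hist_se, a0):
    """Normal posterior for theta under a fixed-discount power prior.

    Raising the historical normal likelihood to the power a0 (0 <= a0 <= 1)
    is equivalent to a normal prior centered at hist_est with variance
    hist_se^2 / a0; a0 = 0 ignores the historical data (flat initial prior
    assumed) and a0 = 1 pools it at full weight.
    """
    prec = 1.0 / se**2                # precision contributed by the current trial
    weighted = prec * est
    if a0 > 0:
        hist_prec = a0 / hist_se**2   # discounted historical precision
        prec += hist_prec
        weighted += hist_prec * hist_est
    return NormalDist(weighted / prec, (1.0 / prec) ** 0.5)

# Hypothetical: current estimate 0.20 (SE 0.12), historical estimate 0.35 (SE 0.15).
for a0 in (0.0, 0.25, 0.5, 1.0):
    post = power_prior_posterior(0.20, 0.12, 0.35, 0.15, a0)
    print(f"a0={a0:.2f}  posterior mean={post.mean:.3f}  Pr(theta > 0)={1 - post.cdf(0.0):.4f}")
```

In this hypothetical example the posterior probability crosses a 0.975 cutoff only once some historical information is borrowed, which is exactly the situation in which reviewers will want to see the full range of a0 rather than the single pre-specified value.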
The most important technical distinction: posterior credibility vs regulatory reliability
One of the most subtle and important parts of the guidance
is FDA’s discussion of false positive conclusions. In frequentist analysis,
Type I error is the probability of rejecting the null hypothesis when the null
is true. In Bayesian analysis, FDA emphasizes that a false positive conclusion
is conditional on a positive conclusion, not on a true null hypothesis. Put
differently, the Bayesian regulatory question can resemble: among trials that
declare success, how often is the conclusion wrong? That is closer in spirit to
a positive predictive value framework than to the classical Type I error
definition.
Haitao Pan’s commentary usefully sharpens this distinction.
A posterior probability threshold such as Pr(θ > 0 | data) ≥ 0.975 is a
single-trial statement: conditional on the observed data and the model, the
posterior probability that the treatment effect is positive is at least 97.5%.
But that is not the same as saying that, across repeated uses of the decision
rule, only 2.5% of declared successes will be false. That cross-trial
reliability question must be evaluated through design-level simulations, particularly
when priors are informative, sample sizes are small, or early stopping rules
effectively reduce information.
This is more than a statistical nicety. It is probably the
point on which many sponsor misunderstandings will turn. A posterior
probability threshold may look familiar because 0.975 resembles the complement
of a one-sided 0.025 alpha. But the number alone does not guarantee frequentist
Type I error control. The trial’s behavior depends on the prior, the
likelihood, the data-generating assumptions, the stopping rules, the endpoint
distribution, and the amount of borrowing. In Bayesian regulatory work, the threshold
is only one piece of the decision architecture.
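The point that a 0.975 threshold does not by itself guarantee Type I error control is easy to demonstrate with stopping rules. The sketch below assumes normally distributed outcomes with known variance, a flat prior (so the posterior probability equals Φ(√n·x̄)), four equally spaced looks, and an arbitrary seed; it applies the same 0.975 cutoff at every look and estimates the resulting Type I error by simulation. The function name is hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def type_i_error(looks, threshold=0.975, n_sims=20000, seed=2026):
    """Monte Carlo Type I error of the rule 'stop and declare success as soon as
    Pr(theta > 0 | data) >= threshold', checked at each cumulative sample size in
    `looks`, when the true effect theta is 0.

    Assumes outcomes ~ N(theta, 1) with known variance and a flat prior on theta,
    so that Pr(theta > 0 | data) = Phi(sqrt(n) * xbar).
    """
    rng = random.Random(seed)              # fixed seed so the estimate is reproducible
    false_successes = 0
    for _ in range(n_sims):
        total, n_prev = 0.0, 0
        for n in looks:
            # sum of the (n - n_prev) new null observations is N(0, variance n - n_prev)
            total += rng.gauss(0.0, (n - n_prev) ** 0.5)
            n_prev = n
            if STD_NORMAL.cdf(n ** 0.5 * (total / n)) >= threshold:
                false_successes += 1
                break
    return false_successes / n_sims

print(f"one final analysis:  {type_i_error([200]):.4f}")
print(f"four interim looks:  {type_i_error([50, 100, 150, 200]):.4f}")
```

The single final analysis stays near 0.025, but repeating the same cutoff at four looks yields a noticeably higher false-success rate. In the Type I error-calibrated setting, a Bayesian design has to show, through simulations like this one, that its combination of thresholds and stopping rules keeps the error rate acceptable.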
Borrowing: the promise and the trap
The most consequential use of Bayesian methods in this
guidance is borrowing—that is, using prior or external information to
strengthen inference in the current trial. Borrowing is appealing in precisely
the settings where conventional trials are difficult: rare diseases, pediatric
indications, small molecularly defined oncology populations, platform trials,
and settings where ethical or practical constraints make large randomized
controls difficult.
But borrowing is also where Bayesian methods can do the most
damage. If historical or external information is genuinely exchangeable with
the current trial, borrowing can increase precision, reduce sample size, expose
fewer patients to inferior treatment, and accelerate learning. If the external
data are biased, noncomparable, selectively chosen, or affected by temporal
drift, borrowing can import bias into the primary analysis with a statistical
elegance that makes the problem harder to see.
FDA addresses this by requiring sponsors to evaluate external data for
quality and reliability, pre-specification, relevance, study
design, and availability of patient-level data. It also emphasizes prior-data
conflict: the possibility that the observed current data are notably
inconsistent with the prior. FDA recommends exploring this conflict through
simulations across plausible scenarios, including no effect in the target
population, and planning sensitivity analyses accordingly.
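One simple way to quantify prior-data conflict is a prior predictive check: under the prior, the current effect estimate has a known marginal distribution, and an estimate far in its tails signals conflict. The sketch below is one such diagnostic for the normal case, offered as an illustration; this particular statistic is an assumption here, not something the guidance prescribes, and the prior, estimates, and function name are hypothetical.

```python
from statistics import NormalDist

def prior_predictive_pvalue(prior_mean, prior_sd, est, se):
    """Two-sided prior predictive tail probability for an observed effect estimate.

    If theta ~ N(prior_mean, prior_sd^2) and est | theta ~ N(theta, se^2), then
    marginally est ~ N(prior_mean, prior_sd^2 + se^2). A small tail probability
    means the current data are surprising under the prior (prior-data conflict).
    """
    predictive_sd = (prior_sd**2 + se**2) ** 0.5
    z = abs(est - prior_mean) / predictive_sd
    return 2.0 * (1.0 - NormalDist().cdf(z))

# Hypothetical informative prior built from historical data: N(0.35, 0.10^2).
print(prior_predictive_pvalue(0.35, 0.10, est=0.30, se=0.12))   # consistent with the prior
print(prior_predictive_pvalue(0.35, 0.10, est=-0.10, se=0.12))  # near-zero effect observed
```

The second call mimics the "no effect in the target population" scenario: a near-zero current estimate against an optimistic historical prior produces a small tail probability, flagging conflict.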
Berry Consultants’ Kert Viele broadly praises the guidance
as rigorous and constructive, but his commentary also shows how technically
complex this territory is. He notes that borrowing works best when historical
and current data are similar, and can mislead when they diverge. He also
stresses that dynamic borrowing, hierarchical models, and mixture priors can
help reduce the influence of prior information when current data disagree with
it, whereas static borrowing may be more brittle.
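The contrast between static and dynamic borrowing can be sketched with a two-component normal mixture prior (sometimes called a robust prior): most of the weight sits on a historical component, the rest on a vague component. The weights, variances, estimates, and function names below are hypothetical assumptions; "static" here means using the historical component alone at full weight.

```python
from statistics import NormalDist

def update_component(m, s, est, se):
    """Conjugate update of a N(m, s^2) prior with est ~ N(theta, se^2); returns (mean, sd)."""
    prec = 1.0 / s**2 + 1.0 / se**2
    return (m / s**2 + est / se**2) / prec, (1.0 / prec) ** 0.5

def mixture_posterior(components, est, se):
    """Posterior of a normal-mixture prior given est ~ N(theta, se^2).

    components: list of (weight, mean, sd). Each component is updated
    conjugately and reweighted by its marginal likelihood of the data,
    N(est; mean, sd^2 + se^2). Returns a list of (weight, mean, sd).
    """
    updated = []
    for w, m, s in components:
        marginal = NormalDist(m, (s**2 + se**2) ** 0.5).pdf(est)
        updated.append((w * marginal, *update_component(m, s, est, se)))
    total = sum(u[0] for u in updated)
    return [(w / total, m, s) for w, m, s in updated]

def mixture_mean(post):
    return sum(w * m for w, m, _ in post)

# Hypothetical robust prior: 80% on historical N(0.35, 0.10^2), 20% on vague N(0, 2^2).
robust = [(0.8, 0.35, 0.10), (0.2, 0.0, 2.0)]
for est in (0.30, -0.10):   # current estimate agreeing vs conflicting with history
    static_mean, _ = update_component(0.35, 0.10, est, 0.12)
    post = mixture_posterior(robust, est, 0.12)
    print(f"est={est:+.2f}  static mean={static_mean:.3f}  "
          f"mixture mean={mixture_mean(post):.3f}  weight on history={post[0][0]:.2f}")
```

When the current estimate agrees with history, nearly all posterior weight stays on the historical component; when it conflicts, the weight shifts toward the vague component and the posterior mean moves much closer to the current data than under static borrowing.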
A useful regulatory heuristic is this: Bayesian borrowing
is most defensible when the biology, endpoint, population, care pathway,
ascertainment, and trial conduct are all sufficiently aligned that the borrowed
information would be persuasive even before formal modeling. The model can
quantify borrowing; it cannot make irrelevant data relevant.
Implications for sponsors
For sponsors, the guidance creates an opportunity but also
raises the bar for statistical and clinical planning. A Bayesian proposal
cannot be treated as a late-stage rescue strategy after a trial is
underpowered. It must be designed prospectively. Sponsors will need to bring
FDA a coherent package: the clinical rationale for borrowing, the external
evidence-generation process, the prior-construction method, simulations,
operating characteristics, sensitivity analyses, computational diagnostics, and
comparison with simpler alternatives.
That comparison with simpler alternatives is important. FDA
specifically recommends that complex designs using informative priors be
compared against alternative and simpler designs so reviewers can understand
the advantages and limitations of the proposed design features. This is a quiet
but important discipline. A Bayesian design should not merely be clever; it
should solve a concrete problem better than a simpler randomized or frequentist
design.
The guidance also implies that companies will need stronger
internal cross-functional teams. Prior construction is not a job for
statisticians alone. FDA describes it as a multidisciplinary process requiring
domain knowledge to judge relevance and statistical expertise to quantify
uncertainty and select methods. In practice, the team may need clinical
development, biostatistics, regulatory affairs, translational science,
real-world evidence, data management, and sometimes payer-oriented evidence
experts in the same conversation.
There is also an operational implication: early FDA
interaction becomes even more important. If a sponsor plans to rely on an
informative prior, waiting until NDA/BLA review is far too late. The central
negotiation—what prior is acceptable, what external sources count, how much
borrowing is allowed, and what sensitivity analyses will be persuasive—must
occur before trial initiation.
Implications for FDA
For FDA, the guidance is both an invitation and a burden. It
invites more sophisticated trial designs, especially in settings where
conventional trial designs are slow, ethically awkward, or infeasible. But it
also requires review divisions to develop consistent expectations. One risk is
that Bayesian acceptability becomes division-specific or reviewer-specific:
acceptable in one disease area, resisted in another; welcomed by one review
team, treated with suspicion by another.
This is not a trivial issue. Bayesian methods make explicit
what is often implicit in regulatory judgment: prior knowledge matters. FDA
already uses prior knowledge informally in many ways—through disease natural
history, class effects, surrogate endpoint experience, external controls,
mechanistic plausibility, and prior trial results. The Bayesian framework
offers a way to quantify and pre-specify some of that knowledge. But once
quantified, it becomes contestable. Sponsors, FDA reviewers, academic methodologists,
patients, and payers may not share the same prior.
That creates a governance problem. If Bayesian designs are
to become more common in pivotal settings, FDA will need not only technical
guidance but institutional habits: model libraries, review templates,
simulation expectations, cross-division case studies, public statistical
reviews, and perhaps more transparent precedent. Otherwise, the guidance may
paradoxically increase uncertainty: everyone knows Bayesian designs are
possible, but no one knows when a given review division will accept one.
Concerns from the evidentiary-standard perspective
The JAMA commentary by Evans, Fleming, Janes, and Dodd
provides the most pointed counterweight. The authors acknowledge that Bayesian
methods have value in diagnostic medicine, prediction, early-phase drug and
device trials, and rare diseases, but warn that late-phase confirmatory use can
compromise evidentiary standards if it dilutes randomized evidence,
incorporates sponsor- or investigator-specific priors, or depends on strong
assumptions that are difficult to verify.
Their central concern is the primacy of randomized
evidence. Randomization protects against known and unknown confounding,
supports unbiased treatment-effect estimation, and provides the foundation for
valid inference. Bayesian methods do not necessarily compromise randomization;
a trial can be randomized and Bayesian. The concern arises when external or
prior information is combined with randomized data in a way that weakens the
protection randomization was meant to provide.
They also emphasize objectivity of priors. Priors
must be constructed from relevant, high-quality, objectively selected external
data. Post hoc selection of favorable evidence is especially problematic. Even
well-intended priors may differ among sponsors, regulators, payers, IRBs,
patients, and data monitoring committees. This is an important reminder: the
“prior” is not merely a technical input; it is a compressed evidentiary
argument.
A further concern is robustness. Bayesian models can be
elegant but assumption-dependent. Hierarchical models, commensurate priors,
mixture priors, dynamic borrowing, and model-based adaptive designs can all
behave well under some assumptions and poorly under others. Simulation helps,
but simulations only test scenarios the designers thought to simulate. That is
why sensitivity analysis and prior-data conflict diagnostics are not
decorations; they are central to the credibility of the approach.
Drawbacks and unresolved issues
The guidance has several limitations.
First, it is still somewhat ambiguous about when Type I
error calibration is expected and when a fully Bayesian framework will be
acceptable. FDA distinguishes the two categories, but sponsors will want more
disease-area-specific examples. Without such examples, the default posture may
remain conservative: Bayesian methods will be welcomed in principle, but
pivotal reliance on informative priors may still require substantial
negotiation.
Second, the guidance may underemphasize predictive
probabilities, a point made by Berry Consultants. Predictive probabilities
are central to many adaptive trials and development decisions: should the trial
stop for futility, expand a cohort, graduate an arm, or proceed to Phase 3? The
guidance focuses heavily on borrowing and primary inference, but a fuller
Bayesian regulatory framework would give predictive probability a more explicit
role.
Third, Bayesian methods can create a communications
problem. Posterior probability statements are intuitive but easily misread.
A 97.5% posterior probability of benefit does not automatically mean the
regulatory decision rule has a 2.5% false-positive rate. The distinction
between single-trial posterior credibility and cross-trial decision reliability
will need to be explained repeatedly to executives, clinical teams, investors,
advisory committees, and possibly courts.
Fourth, the guidance may not fully address patient-level
utility. Viele notes that operating characteristics are often framed
statistically—power, bias, coverage, probability of correct decision—whereas
Bayesian decision theory can also optimize patient-centered outcomes, such as
expected lives saved or expected morbidity avoided. This is not a minor
philosophical point. In some adaptive designs, the rule that maximizes
information is not the same as the rule that maximizes patient benefit within
or beyond the trial.
Fifth, there is a payer and HTA implication. FDA
approval based partly on Bayesian borrowing may be scientifically defensible
but still less persuasive to payers if the current randomized evidence base is
small. A Bayesian approval package may therefore need a postmarket evidence
strategy from the beginning. FDA may accept a posterior inference that a
therapy is effective; payers may still ask whether the effect is robust,
generalizable, clinically meaningful, and economically relevant in their
covered population.
Closing thought capital: Bayesian FDA review as disciplined prior knowledge, not statistical indulgence
The most productive way to view the FDA guidance is not as
“Bayes versus frequentism.” That debate is too scholastic for the regulatory
problem. The real issue is how FDA should use prior knowledge without
letting prior belief erode evidentiary standards.
FDA already uses prior knowledge. It does so when it accepts
pediatric extrapolation, interprets single-arm oncology response rates,
considers natural history controls, evaluates surrogate endpoints, uses
class-wide safety expectations, or asks whether a subgroup result is
biologically plausible. Bayesian methods can make these judgments more
explicit. They can also make them more reproducible, auditable, and
stress-tested.
But quantification is not purification. A biased prior
remains biased after being written in mathematical notation. A poorly
comparable external control remains poorly comparable after being embedded in a
hierarchical model. A posterior probability can be precise and still be wrong
if the model is wrong. The promise of Bayesian regulatory science is not that
it bypasses hard evidentiary questions. The promise is that it forces those
questions into the open.
For FDA, the mature position should be Bayesian pluralism
with evidentiary hierarchy. Minimally informative Bayesian analyses should
be broadly acceptable when they are well specified and computationally
reliable. Bayesian adaptive designs should be encouraged when their operating
characteristics are transparent and their decision rules protect trial
integrity. Informative priors and external borrowing should be welcomed only
when the external evidence is highly relevant, objectively selected, and
sensitivity-tested. In confirmatory settings, randomized concurrent evidence
should remain the anchor unless there is a compelling disease-specific reason
to depart from it.
The deeper opportunity is to build a regulatory culture of model
accountability. Every Bayesian pivotal design should answer five questions
plainly: What is being borrowed? Why is it relevant? How much influence can it
have? What happens if it is wrong? Would the conclusion survive a skeptical but
plausible alternative prior? If the answer to the last question is no, the
sponsor may still have an argument—but it should be understood as a Bayesian
argument dependent on prior acceptance, not as an ordinary replication of the
conventional pivotal-trial standard.
In that sense, FDA’s draft guidance is not permissive in the
loose sense. It is permissive in the disciplined sense. It opens the door, but
asks sponsors to show their work. For sophisticated drug and biologic
development, especially in rare diseases, pediatrics, oncology subtypes, and
platform trials, that is a valuable advance. The drawback is that the hardest
questions remain judgment calls: relevance, exchangeability, acceptable
uncertainty, and the regulatory consequences of being wrong. Those questions
cannot be solved by Bayes’ theorem. But Bayesian methods, used carefully, can
make the assumptions visible enough for regulators, sponsors, clinicians, and
patients to argue about them honestly.
# # #
CLAUDE OPUS 4.7
# # #
The FDA's January 2026 Draft Guidance on Bayesian Methodology in Clinical Trials: Substance, Implications, and Critical Reflections
Background and Regulatory Context
On January 9, 2026, the FDA released a draft guidance titled
Use of Bayesian Methodology in Clinical Trials of Drug and Biological
Products, with formal Federal Register notice appearing on January 12, 2026
(Docket No. FDA-2025-D-3217). The document, jointly authored by the Office of
Biostatistics in CDER and the Division of Biostatistics in CBER, satisfies a
PDUFA VII performance commitment to issue draft guidance on Bayesian
methodology, with the comment period closing March 13, 2026. The release fills
a long-anticipated gap: while CDRH issued Bayesian guidance for medical devices
in 2010, drug and biologic sponsors have operated without comparable formal
guidance for primary inference, navigating Bayesian designs through
case-by-case interactions with the agency.
The draft articulates FDA's current thinking on the use of
Bayesian methods to support primary inference in pivotal trials of drugs and
biologics—including INDs, NDAs, BLAs, and their supplements. It complements
existing instruments such as the ICH E9 (1998) Statistical Principles, the ICH
E9(R1) Addendum on estimands (2021), ICH E11(R1) on pediatric extrapolation,
the FDA's 2019 Adaptive Designs guidance, and the more recent ICH E20 draft on
adaptive designs. As Berry Consultants' Kert Viele observed, the document
"formalizes progress in the past two decades," codifying what had
previously been tacit knowledge accumulated through specific reviews and
approvals.
What the Guidance Accomplishes
1. Doctrinal Framework: Two Inferential Regimes
The most consequential structural feature of the guidance is
its bifurcation of Bayesian trial designs into two categories with distinct
evidentiary frameworks:
Type I error-calibrated trials retain familiar
frequentist operating characteristics—FWER controlled at 0.025 one-sided, power
calculated under fixed parameter assumptions, bias and MSE of point estimates,
and coverage of intervals. In this regime, Bayesian machinery functions, in Haitao
Pan's apt formulation, as "an engineering mechanism for constructing
decision rules with acceptable long-run error rates." The posterior
probability serves as a computational device analogous to a test statistic.
Success criteria take the form Pr(d > a) > c, where d is the treatment effect and a is a prespecified threshold (often 0), with c
calibrated to maintain the desired Type I error: algebraically for simple conjugate
models, or via clinical trial simulation for complex adaptive designs.
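For the simple conjugate case the calibration is indeed algebraic. Assuming a normal prior d ~ N(m0, τ²) and an estimate x ~ N(d, se²) with known se, the posterior variance v = 1/(1/τ² + 1/se²) does not depend on x, so the rule Pr(d > a | x) > c is equivalent to x exceeding a fixed boundary. Choosing that boundary to have size α at d = a and solving back for c gives the calibrated cutoff. A minimal sketch; the function name and numbers are hypothetical.

```python
from statistics import NormalDist

def calibrated_cutoff(prior_mean, prior_sd, se, a=0.0, alpha=0.025):
    """Cutoff c such that 'declare success if Pr(d > a | x) > c' has one-sided
    Type I error alpha at d = a, for the conjugate model
        d ~ N(prior_mean, prior_sd^2),   x | d ~ N(d, se^2).
    """
    std = NormalDist()
    v = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)   # posterior variance (same for every x)
    x_crit = a + se * std.inv_cdf(1.0 - alpha)     # boundary on x giving size alpha at d = a
    post_mean_at_crit = v * (x_crit / se**2 + prior_mean / prior_sd**2)
    z_c = (post_mean_at_crit - a) / v**0.5         # posterior z-score of d - a at the boundary
    return std.cdf(z_c)

print(calibrated_cutoff(0.0, 1e6, se=0.12))    # essentially flat prior
print(calibrated_cutoff(0.0, 0.20, se=0.12))   # skeptical prior centered at 0
print(calibrated_cutoff(0.30, 0.20, se=0.12))  # enthusiastic prior
```

A flat prior recovers c = 0.975 for one-sided α = 0.025; a skeptical prior centered at zero permits a lower cutoff, and an enthusiastic prior forces a higher one. That is the mechanism by which calibration neutralizes the prior's influence on Type I error.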
Trials not calibrated to Type I error rate represent
the more philosophically novel territory. Here, FDA explicitly acknowledges
that when informative priors meaningfully shape inference, traditional Type I
error control becomes "philosophically inconsistent given a prior which
assumes a non-zero effect" (lines 756–758). Operating characteristics are
instead calculated with reference to the prior distribution itself: Bayesian
power (averaged over a design prior), probability of correct decision (a
positive predictive value analog), expected bias, expected MSE, and expected
interval coverage. The guidance introduces the formal distinction between an analysis
prior (the single, pre-specified prior used for inference) and design
priors or sampling priors (multiple distributions used to
characterize sensitivity).
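Operating characteristics computed with reference to a design prior can be estimated by simulation: draw a true effect from the design prior, simulate the trial, apply the decision rule, and tally outcomes. The sketch below assumes normal models throughout, hypothetical priors, and a hypothetical function name; it returns Bayesian power (the success probability averaged over the design prior) and the probability that a declared success is correct, the positive-predictive-value analog described above.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def bayesian_operating_characteristics(design_mean, design_sd, analysis_mean,
                                       analysis_sd, se, cutoff=0.975,
                                       n_sims=20000, seed=11):
    """Monte Carlo operating characteristics averaged over a design prior.

    Each replicate draws a true effect theta ~ N(design_mean, design_sd^2),
    simulates an estimate x ~ N(theta, se^2), and declares success if
    Pr(theta > 0 | x) > cutoff under the analysis prior
    N(analysis_mean, analysis_sd^2). Returns (Bayesian power,
    Pr(theta > 0 | success)).
    """
    rng = random.Random(seed)                         # fixed seed for reproducibility
    v = 1.0 / (1.0 / analysis_sd**2 + 1.0 / se**2)    # posterior variance (same for every x)
    successes = correct = 0
    for _ in range(n_sims):
        theta = rng.gauss(design_mean, design_sd)
        x = rng.gauss(theta, se)
        post_mean = v * (analysis_mean / analysis_sd**2 + x / se**2)
        if STD_NORMAL.cdf(post_mean / v**0.5) > cutoff:   # Pr(theta > 0 | x)
            successes += 1
            correct += theta > 0
    power = successes / n_sims
    ppv = correct / successes if successes else float("nan")
    return power, ppv

# Hypothetical design prior N(0.15, 0.15^2): modest expected benefit, some mass at or below zero.
analysis_priors = {"minimally informative": (0.0, 10.0), "enthusiastic": (0.3, 0.1)}
for label, (m0, s0) in analysis_priors.items():
    power, ppv = bayesian_operating_characteristics(0.15, 0.15, m0, s0, se=0.12)
    print(f"{label:22s} Bayesian power={power:.3f}  Pr(theta > 0 | success)={ppv:.3f}")
```

In this hypothetical setup the enthusiastic analysis prior declares success far more often, but a smaller share of those successes correspond to a truly positive effect, which is the trade-off the non-calibrated regime asks sponsors to justify.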
This bifurcation is, as Pan emphasizes, almost certainly
deliberate rather than an oversight. The guidance does not prescribe which regime
a sponsor must adopt; instead, it shifts the justificatory burden. Sponsors
abandoning Type I error calibration must demonstrate through carefully
specified priors and comprehensive simulation that the decision rule produces
sufficiently reliable conclusions.
2. Cataloguing Established Practice
Section III provides an evidentiary backbone showing that
Bayesian methods are already embedded in FDA decision-making. Concrete examples
include:
- REBYOTA (fecal microbiota, 2022): Bayesian incorporation of phase 2 data into phase 3 inference for recurrent C. difficile infection.
- GBM AGILE and Precision Promise: Bayesian platform trials in glioblastoma and pancreatic cancer using nonconcurrent controls with adjustment for temporal drift (the "Bayesian time machine" of Saville et al., 2022).
- Empagliflozin and linagliptin pediatric type 2 diabetes (T2D) supplements: Bayesian extrapolation from adult data using pharmacometric bridging models.
- LEADER trial of liraglutide: Bayesian hierarchical model (BHM) for regional subgroup analysis.
- Dose-finding methods including CRM, BLRM, BOIN (with explicit fit-for-purpose designation), mTPI, and mTPI2.
This catalogue accomplishes something rhetorical as well as
substantive: it neutralizes the implicit objection that Bayesian methods are
novel or untested in regulatory submissions.
3. Prior Specification: A Structured Taxonomy
The guidance devotes the bulk of its substantive content
(Section V) to prior distributions, organizing them into
noninformative/minimally informative, skeptical, enthusiastic, and informative
(borrowing) priors. Within the borrowing category, it discusses:
- Static discounting (e.g., power priors with fixed discount factor)
- Dynamic discounting, including commensurate priors, supervised power priors, mixture priors, Bayesian hierarchical models, and elastic priors (Jiang et al., 2023)
Critically, the guidance frames prior construction as a
multidisciplinary process requiring evidence synthesis principles akin to those
used in systematic reviews and meta-analyses, with attention to data quality,
pre-specification, relevance, study design (preferring randomized comparisons),
and patient-level data availability. It explicitly addresses prior-data
conflict (Evans and Moshonov, 2006), recommending that simulations cover
plausible degrees of conflict.
4. Quantification and Sensitivity
The guidance recommends multiple metrics for quantifying
prior influence: prior mean, effective sample size (ESS) with appropriate
methods (Malec, 2001; Morita et al., 2008; Neuenschwander et al., 2020), and
ratios of prior ESS to prospective trial size. Notably, it explicitly discourages
using Type I error inflation as a measure of borrowing strength when
informative priors are used—a position aligned with Berry Consultants'
long-standing methodological commentary. Sensitivity analyses across
alternative priors are required, with the document endorsing what Viele
describes as "tipping point" analyses.
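For conjugate priors, effective sample size has closed forms, which makes the ESS-to-trial-size ratio easy to compute. The sketch below uses the standard conjugate formulas (a + b for a Beta(a, b) prior; σ²/τ² for a normal prior with SD τ on a mean whose per-patient outcome SD is σ; a0 · n0 for a fixed-discount power prior). The more general methods cited in the guidance are needed for non-conjugate or hierarchical priors. Function names and numbers are hypothetical.

```python
def ess_beta_prior(a, b):
    """Beta(a, b) prior on a response rate: worth a + b pseudo-patients."""
    return a + b

def ess_normal_prior(prior_sd, outcome_sd):
    """N(m, prior_sd^2) prior on a mean with per-patient outcome SD outcome_sd:
    the n whose sample mean would have variance prior_sd^2."""
    return (outcome_sd / prior_sd) ** 2

def ess_power_prior(a0, n_hist):
    """Fixed-discount power prior: the discounted historical sample size."""
    return a0 * n_hist

planned_n = 150  # hypothetical planned size of the current arm
examples = {
    "Beta(6, 14) response-rate prior": ess_beta_prior(6, 14),
    "normal prior, SD 0.25, outcome SD 1.5": ess_normal_prior(0.25, 1.5),
    "power prior, a0 = 0.5, 200 historical pts": ess_power_prior(0.5, 200),
}
for label, ess in examples.items():
    print(f"{label:42s} ESS={ess:6.1f}  ESS / planned n = {ess / planned_n:.2f}")
```

Seen as a ratio, the same prior can be a minor supplement to a large trial or the dominant source of information in a small one, which is why the guidance pairs ESS with the prospective trial size.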
5. Computational and Reporting Rigor
The guidance addresses MCMC convergence diagnostics, the
responsibility for software reliability, and detailed documentation
requirements for both protocol and clinical study report stages. It requires
that sponsors submit simulation code, seed numbers (for reproducibility), and
comprehensive simulation reports.
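The convergence and reproducibility expectations can be illustrated with a deliberately small example: a random-walk Metropolis sampler run as several chains from dispersed starting values, each with a recorded seed, followed by the basic Gelman-Rubin R-hat. The target is a normal-normal posterior with a known closed form, so the sampler's output can be checked. The model, seeds, tuning values, and function names are hypothetical; production analyses would normally rely on established software with more robust diagnostics (for example, split and rank-normalized R-hat as implemented in Stan).

```python
import math
import random
from statistics import fmean, variance

def log_post(theta, est=0.30, se=0.12, prior_mean=0.0, prior_sd=1.0):
    """Unnormalized log posterior: N(prior_mean, prior_sd^2) prior, est ~ N(theta, se^2)."""
    return (-0.5 * ((theta - prior_mean) / prior_sd) ** 2
            - 0.5 * ((est - theta) / se) ** 2)

def metropolis_chain(seed, start, n_iter=6000, burn_in=1000, step=0.2):
    """Random-walk Metropolis chain; the seed fully determines the draws."""
    rng = random.Random(seed)
    theta, lp = start, log_post(start)
    draws = []
    for i in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # accept with probability min(1, exp(lp_prop - lp)); 1 - random() lies in (0, 1]
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        if i >= burn_in:
            draws.append(theta)
    return draws

def gelman_rubin(chains):
    """Basic (non-split) potential scale reduction factor R-hat for equal-length chains."""
    n = len(chains[0])
    w = fmean([variance(c) for c in chains])          # mean within-chain variance
    b = n * variance([fmean(c) for c in chains])      # between-chain variance
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)

seeds = [101, 202, 303, 404]           # hypothetical seeds, reported for reproducibility
starts = [-1.0, 0.0, 1.0, 2.0]         # dispersed starting values
chains = [metropolis_chain(s, x0) for s, x0 in zip(seeds, starts)]
pooled = [t for c in chains for t in c]
closed_form = (0.30 / 0.12**2) / (1 + 1 / 0.12**2)   # exact posterior mean for this model
print(f"R-hat = {gelman_rubin(chains):.3f}, MCMC mean = {fmean(pooled):.3f}, "
      f"closed form = {closed_form:.3f}")
```

Rerunning with the same seeds reproduces every draw exactly, which is the practical reason for reporting them alongside the software version.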
Strategic and Operational Implications
Alignment with International Frameworks
The Alston & Bird advisory correctly highlights that
this guidance aligns with the EMA's 2025–2026 workplan to accelerate clinical
trials, with ICH E11A on pediatric extrapolation (finalized 2024), and with the
draft ICH E20 on adaptive designs. For sponsors operating across jurisdictions,
this convergence reduces regulatory arbitrage risk and increases the
feasibility of single trials supporting multi-jurisdictional submissions.
Particular Relevance for Specific Indications
The guidance is most immediately consequential for:
- Pediatric programs, where adult-to-pediatric extrapolation is increasingly expected and formally supported by ICH E11A
- Rare diseases, where conventional sample sizes are infeasible
- Oncology, particularly basket trials, platform trials, and dose-optimization studies under Project Optimus
- Programs with substantial external data, including biosimilar bridging studies, supplements to approved products, and indications with rich historical control databases
The C3TI Infrastructure
The FDA's Center for Clinical Trial Innovation (C3TI),
launched in 2024, provides operational pathways: the Demonstration Program for
non-adaptive Bayesian trials and the Complex Innovative Trial Design (CID)
Meeting Program for adaptive designs. The guidance crystallizes principles that
C3TI engagements have been developing case-by-case.
Sponsor Implications
For sponsors, several practical implications follow:
- Early FDA engagement is essential. The guidance repeatedly emphasizes
pre-trial alignment on priors, success criteria, and operating
characteristics. The cost of misalignment late in development is
substantial.
- Statistical capability must be deepened. Sponsors need expertise not only
in Bayesian inference but in simulation-based design evaluation, MCMC
diagnostics, and evidence synthesis.
- Documentation requirements are intensive. The reporting expectations
(simulation code, convergence diagnostics, sensitivity analyses across
multiple priors) exceed those typical for frequentist analyses.
- The benefit-risk framework opens new design space. Section IV.A.3, with
its allusion to loss functions and decision-theoretic approaches, formally
accommodates trial designs that incorporate utility functions (as in
AWARD-5 for dulaglutide).
Drawbacks, Concerns, and Open Questions
The Evidentiary Standards Debate
The most pointed critique appears in JAMA (May 19,
2026) from Scott R. Evans (George Washington University), Thomas R. Fleming
(University of Washington), Holly Janes (Fred Hutchinson), and Lori E. Dodd
(NIAID). Their concerns warrant serious attention:
First, they argue that Bayesian methods used in
confirmatory trials can compromise the benefits of randomization by combining
external information with high-quality randomized data. Randomization's
epistemic warrant—unbiased treatment effect estimates, valid CI coverage,
error-rate control—is conditional on the integrity of the randomized comparison
being preserved as the primary basis of inference.
Second, they emphasize the loss of objectivity
introduced by sponsor- or investigator-specific priors, and the cognitive
dissonance risk: when sponsors aim to establish rather than determine
effectiveness, selection bias in prior construction is foreseeable. They invoke
the STARTS-2 sildenafil trial, in which high-dose sildenafil increased
pediatric mortality 3.95-fold despite adult data not predicting this signal—a
cautionary example of the perils of cross-population extrapolation.
Third, they question the FDA's framing of the
guidance as "modernizing statistical methods" in its press release.
In their view, the implication that current methods are "outdated"
misrepresents methods that "are principled, based on scientific fundamentals
for protecting integrity and ensuring robustness."
The contrasting JAMA perspective by Gelman, van Zwet,
and Więcek (also May 19, 2026) is broadly supportive but raises complementary
concerns: priors should be clearly stated, their influence assessable, and the
data model itself should be subject to posterior predictive checking. They note
that the survey by Janiaud et al. (2021) of 912 FDA applications found
inconsistent citation of prior reasoning and approvals reflecting new
interpretations of existing evidence rather than new evidence—a problem they
argue Bayesian formalism could ameliorate.
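Gelman, van Zwet, and Więcek's point about checking the data model can be illustrated with a posterior predictive check. This sketch assumes a hypothetical multi-site binary endpoint analyzed with a common-rate beta-binomial model, and it uses the between-site variance of responder counts as the check statistic. The data, prior, and statistic are illustrative choices, not taken from their article.

```python
import random
import statistics


def ppc_between_site_variance(counts, n_per_site, draws=2000, seed=2026, a=1.0, b=1.0):
    """Posterior predictive p-value for between-site spread under a common-rate model.

    Model (hypothetical): every site shares one response rate p with a Beta(a, b)
    prior. Check statistic: sample variance of the site responder counts.
    """
    rng = random.Random(seed)                    # fixed seed for reproducibility
    total = sum(counts)
    n_total = n_per_site * len(counts)
    t_obs = statistics.variance(counts)
    exceed = 0
    for _ in range(draws):
        p = rng.betavariate(a + total, b + n_total - total)   # draw p from its posterior
        # Replicate each site's responder count from Binomial(n_per_site, p).
        replicate = [sum(rng.random() < p for _ in range(n_per_site)) for _ in counts]
        if statistics.variance(replicate) >= t_obs:
            exceed += 1
    return exceed / draws


homogeneous = [10, 12, 9, 11, 13, 10]      # hypothetical responders out of 50 per site
heterogeneous = [5, 8, 30, 6, 28, 7]
print(ppc_between_site_variance(homogeneous, 50), ppc_between_site_variance(heterogeneous, 50))
```

A p-value near 0 for the heterogeneous sites signals that a single shared rate is inconsistent with the observed spread. This is a misfit in the data model itself, which no choice of prior would address.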
Process and Consistency Concerns
Even Berry Consultants, broadly enthusiastic about the
guidance, identifies process gaps. Viele observes that the document is
"vague on the process for obtaining this agreement" on priors and
benefit-risk quantifications. The risks include:
- Reviewer-to-reviewer inconsistency: Different statistical reviewers may
have meaningfully different prior preferences, creating perceived inequity
between sponsors.
- Temporal drift in agreements: A prior agreed to early in development may
face pressure for revision as related trials report.
- Asymmetric leverage: Sponsors with substantial regulatory experience and
resources will navigate the discretionary terrain more effectively than
smaller developers.
Specific Technical Concerns
The Berry Consultants commentary identifies several specific technical
limitations worth highlighting:
- Predictive probabilities are largely absent from the document despite
their centrality to many real-world Bayesian adaptive designs and phase
transition decisions.
- The discussion of skeptical and enthusiastic priors sits uneasily with the
analysis prior / design prior framework introduced earlier; these concepts
may be more coherently treated as design priors used to assess what would
convince a range of stakeholders.
- The framing that informative priors "have been used mostly in pediatrics
and rare diseases" risks being read as a recommendation that they are best
suited to those areas, when in fact common diseases with rich historical
data may offer equally compelling use cases.
- The treatment of exchangeability is technically correct but practically
over-emphasized; what borrowing actually requires is closeness between
historical and current data, not formal exchangeability.
- Sections IV.A.2 and IV.A.3 (direct interpretation of posterior probability
versus benefit-risk-based criteria) are difficult to distinguish
operationally and might be more usefully merged.
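Predictive probabilities drive many interim and phase-transition decisions, so a small worked example may help. This is a minimal sketch under hypothetical assumptions:

- a single-arm binary endpoint with a uniform Beta(1, 1) prior;
- a null response rate of 0.20 and a maximum of 40 patients;
- a final success rule of Pr(p > 0.20 | data) ≥ 0.975.

At an interim look, it averages over the beta-binomial predictive distribution of the remaining patients to estimate the probability that the completed trial will meet that rule.

```python
from math import comb, exp, lgamma, log


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def post_prob_exceeds(x, n, p0, a=1, b=1):
    """Pr(p > p0 | x responders of n) for a Beta(a, b) prior with integer a, b.

    Uses Pr(Beta(alpha, beta) > p0) = Pr(Binomial(alpha + beta - 1, p0) <= alpha - 1),
    valid for integer alpha and beta.
    """
    alpha, beta = a + x, b + n - x
    k = alpha + beta - 1
    return sum(comb(k, j) * p0**j * (1 - p0) ** (k - j) for j in range(alpha))


def beta_binomial_pmf(y, m, a, b):
    """Predictive probability of y responders among m future patients, p ~ Beta(a, b)."""
    return exp(log(comb(m, y)) + log_beta(y + a, m - y + b) - log_beta(a, b))


def predictive_prob_success(x, n, n_max, p0, threshold=0.975, a=1, b=1):
    """Probability that the completed trial meets Pr(p > p0 | all data) >= threshold."""
    m = n_max - n
    a_post, b_post = a + x, b + n - x          # posterior after the interim data
    return sum(
        beta_binomial_pmf(y, m, a_post, b_post)
        for y in range(m + 1)
        if post_prob_exceeds(x + y, n_max, p0, a, b) >= threshold
    )


# Hypothetical single-arm design: null rate 0.20, 40 patients, interim look at 20.
for x_interim in (2, 5, 8):
    print(x_interim, round(predictive_prob_success(x_interim, 20, 40, 0.20), 3))
```

Unlike the current posterior probability, this quantity accounts for the data still to come. That is why it is natural for futility stopping and go/no-go decisions.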
The Operating Characteristics Question
Pan's LinkedIn analysis identifies a subtle but important
point in lines 295–297 of the guidance: "the concept of a false positive
conclusion in a Bayesian framework is conditional on a positive conclusion and
not, as in the frequentist framework, on a true null hypothesis." This
shift—from Pr(reject H₀ | H₀ true) to Pr(H₀ true | reject H₀), conceptually
akin to 1 − PPV—is fundamental but may not be widely appreciated by clinicians
and reviewers more accustomed to frequentist error rates. The risk is that posterior
probability thresholds (e.g., Pr(θ > 0 | data) ≥ 0.975) may be
misinterpreted as long-run error bounds when they are in fact data-conditional
statements whose long-run reliability depends on the realistic distribution of
true effects, prior calibration, and design features such as interim stopping.
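A short calculation shows why the two quantities differ. The sketch assumes:

- a normal likelihood with known standard error;
- a flat analysis prior, under which the rule Pr(θ > 0 | data) ≥ 0.975 coincides with a one-sided z-test at 0.025;
- a deliberately simplified two-point distribution of true effects across programs.

Interim stopping is not modeled, and all numbers are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_975 = STD_NORMAL.inv_cdf(0.975)   # about 1.96


def positive_conclusion_prob(theta, se):
    """Pr(positive conclusion | true effect theta).

    With a flat prior and a normal likelihood, Pr(theta > 0 | data) >= 0.975
    is equivalent to theta_hat / se >= Z_975.
    """
    return 1.0 - STD_NORMAL.cdf(Z_975 - theta / se)


def prob_null_given_positive(pi_null, theta_alt, se):
    """Pr(theta = 0 | positive conclusion) when a fraction pi_null of programs
    have theta = 0 and the rest have theta = theta_alt (a two-point simplification)."""
    alpha = positive_conclusion_prob(0.0, se)      # Pr(positive | H0 true)
    power = positive_conclusion_prob(theta_alt, se)
    return pi_null * alpha / (pi_null * alpha + (1.0 - pi_null) * power)


se, theta_alt = 0.10, 0.28   # hypothetical design with roughly 80% power at theta_alt
print(round(positive_conclusion_prob(0.0, se), 4))
for pi_null in (0.5, 0.9):
    print(pi_null, round(prob_null_given_positive(pi_null, theta_alt, se), 3))
```

The rule's frequentist error rate is 2.5% in both scenarios. Yet the probability that a positive conclusion is false moves from about 3% to about 22%, depending only on how often the therapies being tested are truly inactive.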
Patient-Level Data Pragmatics
The guidance prefers patient-level data for borrowing.
Viele's commentary notes that this creates a practical tension: many
historically informative studies are available only as summary statistics, and
excluding them in favor of patient-level data from less relevant studies may be
epistemically suboptimal. Real-world borrowing decisions involve trade-offs
that the guidance acknowledges in principle but does not fully operationalize.
Mixture Prior Behavior
Viele also flags an underappreciated issue with certain
test-then-pool and weight-as-function-of-discrepancy approaches: pathological
cases can arise in which a weaker observed treatment effect would lead
to a successful trial conclusion while the actual (stronger) observed effect
would not, because the assumed discounting function compensates differently for
the two scenarios. Such non-monotonic behavior is difficult to explain to
clinical audiences and can undermine confidence in Bayesian conclusions.
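The sketch below reproduces this kind of pathology in a deliberately simple, hypothetical setting. A test-then-pool rule borrows a precise external estimate of the treatment effect (for example, adult data informing a small pediatric trial), but only when the current estimate is not significantly different from it. Normal approximations are used throughout. The following values are assumptions chosen for illustration, not Viele's example:

- external effect 0.20 with SE 0.05;
- current-trial SE 0.50;
- pooling cutoff |z| < 1.28 and success cutoff z ≥ 1.96.

```python
import math


def ttp_analysis(theta_c, se_c, theta_h, se_h, z_pool=1.28, z_success=1.96):
    """Return (pooled, success) for a test-then-pool analysis of a treatment effect.

    Pool the current and external estimates (inverse-variance weighting) only if
    their discrepancy z-statistic is below z_pool; otherwise use the current
    estimate alone. Success means estimate / SE >= z_success.
    """
    disc_z = abs(theta_c - theta_h) / math.sqrt(se_c**2 + se_h**2)
    if disc_z < z_pool:
        w_c, w_h = 1.0 / se_c**2, 1.0 / se_h**2
        est = (w_c * theta_c + w_h * theta_h) / (w_c + w_h)
        se = 1.0 / math.sqrt(w_c + w_h)
        return True, est / se >= z_success
    return False, theta_c / se_c >= z_success


# Hypothetical: precise external effect 0.20 (SE 0.05); small current trial with SE 0.50.
for observed in (0.80, 0.90, 1.00):
    print(observed, ttp_analysis(observed, 0.50, 0.20, 0.05))
```

Each observed effect leads to a different outcome:

- 0.80 is pooled and succeeds.
- 0.90 is not pooled and fails.
- 1.00 succeeds again on the current data alone.

Success is therefore not monotone in the observed effect, and the jump is driven entirely by the discounting rule.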
Beyond the PDFs: Additional Considerations
The Borrowing-Replication Tension
The pharmaceutical evidence ecosystem has spent two decades
grappling with replication concerns—the so-called replication crisis affecting
biomedical research broadly. There is genuine tension between two reform
impulses: (1) increased rigor through pre-registration, larger samples, and
stricter error control, versus (2) increased efficiency through information
borrowing and adaptive designs. Bayesian borrowing, by design, makes the
primary trial's data less determinative of the conclusion. In a world where
many published frequentist findings have failed to replicate, doubling down on
borrowing from prior studies—whose own replicability may be uncertain—deserves
epistemological scrutiny that the guidance does not fully provide. The
historical priors used in extrapolation embed whatever biases (selection,
measurement, publication) afflicted the original studies.
The Question of Calibration Across the Industry
If Bayesian designs become more common, the meta-question of
how the industry's evidence base behaves in aggregate becomes pressing. Each
individual trial's local operating characteristics may be acceptable, but the
cumulative effect across many trials—each borrowing from a partially
overlapping evidence base—is not addressed in the guidance. There is potential
for correlated errors and feedback loops in which optimistic priors derived
from early studies propagate forward into confirmatory trials whose results
then reinforce the priors used by subsequent programs.
Real-World Data and the Prior Frontier
The guidance acknowledges real-world data (RWD) as a
potential prior source but defers substantively to existing FDA RWD guidances.
Yet the central scientific question—when can RWD legitimately substitute for
randomized concurrent controls?—remains contested. The combination of Bayesian
borrowing methodology with RWD risks compounding two streams of uncertainty
(the relevance of historical/external data and the comparability of
non-randomized populations) in ways that warrant more explicit treatment.
The Lecanemab Precedent and Confirmatory Trials
The accelerated approval of lecanemab and the broader
debates over the strength of evidence in Alzheimer's drug development
illustrate how disputes over confirmatory standards play out publicly. Bayesian
methods could either improve such situations (by formalizing how prior evidence
informs interpretation of borderline results) or worsen them (by providing
additional degrees of freedom for advocates and skeptics to construct
conflicting analyses). The guidance's emphasis on pre-specification is the principal
defense, but it relies on a regulatory infrastructure with sufficient capacity
to enforce it.
International Harmonization and the Limits of Convergence
While alignment with EMA workplans is a real benefit,
harmonization is not uniformity. EMA and FDA may agree on the legitimacy of
Bayesian approaches but disagree on specific prior choices, discounting
strengths, or success criteria. PMDA (Japan), NMPA (China), and Health Canada
operate with their own statistical traditions. Sponsors planning
multi-jurisdictional programs should not assume that a Bayesian design
acceptable to FDA will be uncritically accepted elsewhere—particularly when
borrowed historical data comes from non-representative populations.
Workforce and Capability Gaps
The guidance's expectations—simulation-intensive design
justification, MCMC diagnostics, hierarchical modeling, prior
elicitation—require statistical expertise that is unevenly distributed across
industry and the agency. FDA has built capability through C3TI and
through hires of methodologists, but reviewer bandwidth remains finite. The
guidance encourages early engagement, which has resource implications for both
sponsors and the agency. Realizing the guidance's potential will require sustained
investment in biostatistical training, including in academic programs that have
historically emphasized frequentist methods.
Bayesian Methods for the Regulator's Own Decisions
Gelman, van Zwet, and Więcek raise a point worth amplifying:
the guidance addresses how sponsors should conduct Bayesian analyses, but the
FDA itself makes regulatory decisions that are implicitly Bayesian (combining
trial evidence with prior beliefs about mechanism, related programs, population
needs, and historical precedent). Making the agency's own decision framework
more explicitly probabilistic and transparent—as is happening, in different
ways, in some health technology assessment bodies internationally—could
complement the guidance's industry-facing focus. The episode noted in JAMA
regarding the Moderna mRNA influenza vaccine refusal-to-file letter and its
reversal illustrates how informal regulatory reasoning can produce decision
instability that a more explicitly Bayesian framework might reduce.
The Loss Function Question
The guidance opens the door to decision-theoretic approaches
incorporating loss functions but does not develop this in depth. Yet loss
functions encode value judgments—whose preferences enter, and with what
weight—that have historically been left implicit in regulatory decision-making.
Patient preferences, payer perspectives, and population health considerations
might all be formalized within such frameworks. This is an opportunity for more
transparent value-based decision-making but also a potential vector for capture
by whoever specifies the loss function.
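A minimal sketch of why the choice of loss function matters, assuming a two-action (approve / do not approve), two-state (effective / ineffective) decision with hypothetical losses. Under this simple loss table, approval minimizes posterior expected loss exactly when Pr(effective | data) exceeds L_FA / (L_FA + L_FR). A conventional 0.975 threshold therefore corresponds to treating a false approval as 39 times as costly as a false rejection.

```python
def expected_losses(p_effective, loss_false_approval, loss_false_rejection):
    """Posterior expected loss of (approve, do not approve) under a 0/L loss table."""
    approve = (1.0 - p_effective) * loss_false_approval   # wrong if the drug is ineffective
    reject = p_effective * loss_false_rejection           # wrong if the drug is effective
    return approve, reject


def bayes_decision(p_effective, loss_false_approval, loss_false_rejection):
    """Action with the smaller posterior expected loss."""
    approve, reject = expected_losses(p_effective, loss_false_approval, loss_false_rejection)
    return "approve" if approve < reject else "do not approve"


def implied_threshold(loss_false_approval, loss_false_rejection):
    """Posterior probability of effectiveness above which approval is optimal."""
    return loss_false_approval / (loss_false_approval + loss_false_rejection)


p_effective = 0.90   # the same hypothetical posterior for every stakeholder
weights = {
    "weights favoring access (hypothetical)": (1, 3),
    "weights favoring caution (hypothetical)": (39, 1),
}
for label, (l_fa, l_fr) in weights.items():
    print(label, implied_threshold(l_fa, l_fr), bayes_decision(p_effective, l_fa, l_fr))
```

Identical evidence yields opposite decisions. The disagreement lives entirely in the loss table, which is why who specifies it matters.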
Conclusion
The FDA's January 2026 draft guidance is a substantial and
largely welcome document that formalizes two decades of accumulated experience
with Bayesian methods in drug and biologic regulation. It accomplishes three
principal things: it provides sponsors with a clearer roadmap for proposing
Bayesian designs; it articulates a defensible bifurcated framework
distinguishing Type I error-calibrated designs from those evaluated within a
fully Bayesian inferential paradigm; and it raises the documentation and pre-specification
expectations sufficiently to mitigate—though not eliminate—the risks of bias
and inconsistency that Bayesian methods can introduce.
The substantive critiques deserve engagement rather than
dismissal. The Evans-Fleming-Janes-Dodd concerns about randomization integrity,
objectivity, and robustness are not relics of statistical conservatism; they
reflect lessons learned from decades of trials whose initially promising
results failed to replicate. The Berry Consultants commentary, while
supportive, identifies process gaps that the guidance will need to address as
it moves toward finalization. Pan's analysis surfaces conceptual subtleties that
practicing clinicians may underappreciate.
For PhD and MD-PhD professionals working in or around
clinical development, the guidance creates both opportunity and obligation. The
opportunity is real: well-designed Bayesian trials can be more efficient, more
informative, and more aligned with how scientists actually update beliefs in
light of evidence. The obligation is to use these methods with the same
epistemic rigor that has made well-conducted randomized trials the gold
standard of clinical evidence—pre-specifying priors with the same discipline applied
to hypotheses, evaluating sensitivity comprehensively, and resisting the
temptation to let methodological sophistication substitute for genuine
evidence.
The guidance is, in the language of FDA practice,
nonbinding. Its real influence will emerge through how it shapes the
case-by-case interactions between sponsors and the agency over the coming
years, and through whether the comment process—closing March 13, 2026—surfaces
sufficient critical engagement to refine the document before finalization.