In a prior blog (here), I asked Chat GPT and Claude to review a January 2026 FDA draft guidance on Bayesian design in clinical drug trials. This time I asked: what about diagnostics?
Claude first, then Chat GPT.
###
###
Discuss the most important ways the analysis would be different if the topic were the use of Bayesian design in developing genomic tests [diagnostics] as opposed to biotechnology and drugs [therapeutics].
###
CLAUDE
Bayesian Design in Genomic Diagnostics versus Therapeutics: Key Analytical Divergences
The January 2026 FDA draft guidance addresses drugs and biologics, operating under CDER and CBER. Genomic diagnostics fall under CDRH (for in vitro diagnostic devices, IVDs) and increasingly under CDER for companion diagnostics co-developed with drugs, while laboratory-developed tests (LDTs) sit in a contested regulatory space following the FDA's 2024 final rule and subsequent litigation. The Berry Consultants commentary noted in passing that CDRH "has a separate guidance on Bayesian methods" — specifically, the 2010 Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. The analytical landscape differs in ways that go well beyond which Center has jurisdiction.
1. The Estimand Itself Is Categorically Different
The therapeutic guidance centers on estimating a treatment effect — a causal parameter requiring randomization to estimate without bias. The relevant Bayesian machinery concerns priors over treatment effects, hierarchical borrowing of effect estimates, and inference about whether a parameter exceeds a threshold of clinical benefit.
Genomic diagnostics are evaluated against very different estimands:
- Analytical performance: limit of detection, precision, reproducibility across sites and reagent lots, specificity for the intended target
- Clinical performance: sensitivity, specificity, positive and negative predictive values, likelihood ratios
- Clinical utility: whether using the test changes outcomes, typically requiring a clinical-utility study or modeling exercise
These are largely prediction problems and classification problems, not causal inference problems. Bayesian methods are arguably more natural here than in therapeutics, because the relevant quantities — posterior probability of disease given a test result, predictive value adjusted for prevalence — are inherently Bayesian. PPV and NPV are literally Bayes' theorem applied to diagnostic data. The Evans, Fleming, Janes, and Dodd critique in JAMA of Bayesian methods in confirmatory therapeutic trials — that they erode the benefits of randomization — does not apply with the same force, because randomization is not the foundational epistemic warrant for diagnostic accuracy studies.
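To make the "PPV and NPV are Bayes' theorem" point concrete, here is a minimal sketch. The function name and the 99%/99% test characteristics are illustrative assumptions, not figures from any guidance.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem."""
    # Probability mass in each cell of the 2x2 table, per person tested.
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    ppv = tp / (tp + fp)  # P(disease | positive result)
    npv = tn / (tn + fn)  # P(no disease | negative result)
    return ppv, npv


# Hypothetical test: 99% sensitive and 99% specific, at three prevalences.
for prev in (0.001, 0.01, 0.20):
    ppv, npv = predictive_values(0.99, 0.99, prev)
    print(f"prevalence={prev:.3f}  PPV={ppv:.3f}  NPV={npv:.5f}")
```

At 1% prevalence this hypothetical test has a PPV of about 0.5: half of all positive results are false positives, even though sensitivity and specificity are both 99%. The prevalence plays the role of the prior.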
2. The Reference Standard Problem Has No Therapeutic Analog
Diagnostic accuracy studies require a reference standard against which the new test is compared. For many genomic tests, the reference standard is itself imperfect:
- Sequencing-based variant calling is benchmarked against orthogonal sequencing platforms, with discordance resolution often involving Sanger confirmation or replicate testing
- Cancer mutation panels may have no true gold standard for certain variants, particularly at low variant allele frequencies
- Polygenic risk scores have no contemporaneous reference at all; their "truth" is a future clinical event
This creates a class of statistical problems — latent class models, imperfect reference standard methods, composite reference standards — that Bayesian methods handle particularly well but that have no parallel in the therapeutic guidance. The Bayesian framework here is not optional methodological flavor; it is often the only tractable framework for inference when the reference standard itself is uncertain.
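A small sketch shows why an imperfect reference standard matters. It is not a latent class model; it only computes the apparent sensitivity and specificity a perfectly well-behaved test would show when scored against a flawed reference. It assumes that test and reference errors are conditionally independent given true status, which is a strong simplification, and all numbers are hypothetical.

```python
def apparent_accuracy(se_test, sp_test, se_ref, sp_ref, prevalence):
    """Apparent (sensitivity, specificity) of a test scored against an
    imperfect reference.

    Assumes test and reference errors are conditionally independent given
    true disease status (an illustrative assumption, often violated).
    """
    p, q = prevalence, 1.0 - prevalence
    # P(test+, ref+) and P(ref+), summing over true-positive and true-negative cases.
    both_pos = se_test * se_ref * p + (1 - sp_test) * (1 - sp_ref) * q
    ref_pos = se_ref * p + (1 - sp_ref) * q
    # P(test-, ref-) and P(ref-), same decomposition.
    both_neg = (1 - se_test) * (1 - se_ref) * p + sp_test * sp_ref * q
    ref_neg = (1 - se_ref) * p + sp_ref * q
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical: new test is 99%/99%; reference is 90% sensitive, 98% specific.
app_se, app_sp = apparent_accuracy(0.99, 0.99, 0.90, 0.98, 0.10)
print(f"apparent sensitivity={app_se:.3f}, apparent specificity={app_sp:.3f}")
```

Under these assumed inputs the new test looks noticeably worse than it truly is, purely because the reference mislabels some samples. Latent class and related Bayesian models exist to estimate the true values when the reference's own error rates are uncertain.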
3. The Borrowing Question Takes a Different Shape
The therapeutic guidance devotes substantial attention to borrowing from historical trials, external controls, and adult-to-pediatric extrapolation. The borrowing question in genomic diagnostics is structurally different:
Borrowing across analytical conditions: When a sequencing assay is validated, performance characteristics are estimated across many variant types, depths of coverage, allele frequencies, and sample matrices. Hierarchical Bayesian models naturally handle this combinatorial space; treating each variant-condition combination as an independent estimation problem is statistically wasteful and often infeasible given the rarity of specific variants.
Borrowing across variants in panels: A multi-gene panel may contain hundreds or thousands of variants, most of which will never be observed at sufficient frequency in any validation study to establish standalone performance. Bayesian hierarchical modeling — borrowing strength across variants sharing structural features (variant type, genomic context, technical difficulty class) — is essentially mandatory for these products.
Borrowing across populations: This is where the analogy is closest. The therapeutic guidance's framework for adult-to-pediatric extrapolation has a diagnostic parallel in the recognized problem that genomic test performance, particularly for polygenic risk scores, often degrades dramatically when applied to populations under-represented in discovery cohorts. The Bayesian framework can formalize this discounting, but the ethical and equity stakes are arguably sharper than in the therapeutic setting because performance degradation in under-represented groups can entrench health disparities through the test's deployment.
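The idea of borrowing strength across variant classes in a panel can be sketched with a simple Beta-binomial shrinkage. This is an empirical-Bayes-style illustration, not a full hierarchical model. The class names, counts, and the `prior_strength` parameter are hypothetical, and the approach assumes the classes are similar enough to pool at all.

```python
def partial_pool(counts, prior_strength=20.0):
    """Shrink per-class detection rates toward the pooled rate.

    counts: dict mapping class name -> (detected, tested).
    prior_strength: hypothetical pseudo-sample size of the shared Beta prior;
    larger values borrow more heavily across classes.
    """
    total_detected = sum(d for d, n in counts.values())
    total_tested = sum(n for d, n in counts.values())
    pooled = total_detected / total_tested
    # Shared Beta(a, b) prior centered on the pooled rate.
    a = pooled * prior_strength
    b = (1.0 - pooled) * prior_strength
    # Conjugate update: posterior mean of Beta(a + d, b + n - d).
    return {name: (a + d) / (a + b + n) for name, (d, n) in counts.items()}


# Hypothetical validation counts: (detected, tested) per variant class.
counts = {
    "SNV": (980, 1000),
    "short indel": (188, 200),
    "long indel": (3, 3),
    "fusion": (7, 10),
}
shrunk = partial_pool(counts)
for name, (d, n) in counts.items():
    print(f"{name:12s} raw={d / n:.3f}  shrunk={shrunk[name]:.3f}")
```

Classes with many samples barely move, while the class with 3 of 3 detected is pulled away from a naive 100% toward the pooled rate. How strongly to shrink is exactly the judgment call a regulator would probe.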
4. Locked versus Adaptive Algorithms
The therapeutic guidance assumes a fixed protocol throughout the trial, with adaptations limited to pre-specified options. Genomic diagnostics, particularly those incorporating machine learning components (as many modern variant interpretation pipelines do), present a different problem: the Predetermined Change Control Plan framework that FDA developed for AI/ML-enabled device software anticipates ongoing algorithm updates.
This intersects with Bayesian methodology in ways the therapeutic guidance does not address. A diagnostic algorithm that updates its variant classifications as new evidence accumulates is essentially performing continuous Bayesian updating in production. The regulatory questions — when does an update require new submission, how is performance monitored post-market, what constitutes prior-data conflict in a deployed algorithm — are distinct from anything in the drug context. The therapeutic guidance's framework of pre-specified analysis priors with limited sensitivity-analysis design priors does not map cleanly onto a system designed for ongoing learning.
5. Type I Error Has Different Meaning and Stakes
The therapeutic guidance's bifurcation between Type I error-calibrated designs and fully Bayesian designs presupposes that the central inferential question is whether a treatment effect exceeds zero (or a non-inferiority margin). Diagnostic studies typically test multiple performance characteristics simultaneously against pre-specified acceptance criteria:
- Sensitivity ≥ some threshold (with appropriate confidence)
- Specificity ≥ some threshold
- Reproducibility within acceptable bounds
- Limit of detection at or below claimed value
The multiplicity structure is fundamentally different. CDRH has historically used Bayesian methods with explicit acknowledgment of Type I error inflation, often accepting somewhat higher Type I error rates when borrowing is well-justified — a posture Berry Consultants identified as a real difference between CDRH and CDER/CBER practice. The new CDER/CBER guidance does not change this; it leaves the device-side approach intact, but PhD-level practitioners working across both domains need to recognize that the regulatory calculus differs.
6. The Loss Function Is More Tractable
The therapeutic guidance gestures toward decision-theoretic approaches with loss functions but does not develop them extensively. In diagnostics, the loss function structure is often more concrete and quantifiable:
- False positives lead to specific downstream costs: confirmatory testing, unnecessary treatment, anxiety, potential harm from invasive workup
- False negatives lead to specific clinical consequences: missed diagnosis, delayed treatment, potential progression
- These can often be expressed in QALY terms or direct economic terms more readily than the loss function for approving an ineffective drug
This makes net benefit analysis — Vickers and colleagues' decision-curve framework being one prominent example — a natural setting for Bayesian methods in diagnostics. The prior over disease prevalence in the intended-use population directly enters the calculation, and sensitivity to this prior is a routine sensitivity analysis. The therapeutic guidance's relatively underdeveloped treatment of loss functions reflects, in part, that the loss function question is genuinely harder in the drug context.
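As a rough illustration of the net benefit calculation, the sketch below uses the standard decision-curve formula for a binary test: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The test characteristics and prevalence values are hypothetical.

```python
def net_benefit(sensitivity, specificity, prevalence, threshold):
    """Net benefit per tested patient at a given threshold probability.

    threshold is the risk at which a clinician would act; the odds
    threshold / (1 - threshold) weight false positives against true positives.
    """
    tp_rate = sensitivity * prevalence
    fp_rate = (1.0 - specificity) * (1.0 - prevalence)
    return tp_rate - fp_rate * threshold / (1.0 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of acting on everyone (sensitivity 1, specificity 0)."""
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Sensitivity analysis over the prevalence prior, for a hypothetical 90%/95% test.
for prev in (0.02, 0.05, 0.10):
    nb_test = net_benefit(0.90, 0.95, prev, threshold=0.10)
    nb_all = net_benefit_treat_all(prev, threshold=0.10)
    print(f"prevalence={prev:.2f}  test={nb_test:.4f}  treat-all={nb_all:.4f}")
```

Treating no one has a net benefit of zero, so a test is worth using at a given threshold only if it beats both zero and the treat-all strategy. Rerunning across prevalence values is the routine sensitivity analysis the text describes.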
7. Companion Diagnostics: The Hybrid Case
Companion diagnostics, co-developed with targeted therapies, sit at the intersection. Here the therapeutic guidance applies to the drug arm and the device guidance applies to the test, but the integrated development program raises Bayesian questions that neither guidance fully addresses:
- How should prior information from the analytical validation of the diagnostic inform the design of the therapeutic trial that depends on it?
- When the cutpoint for biomarker positivity is itself uncertain, how should that uncertainty propagate into the treatment effect estimate?
- How should the inevitable drift between the trial-use assay and the commercial assay be handled in regulatory inference?
These are inherently Bayesian questions, but the regulatory framework remains compartmentalized in ways that PhD-level methodologists working in oncology biomarker programs encounter constantly.
8. Post-Market Surveillance and Performance Drift
The therapeutic guidance is largely silent on post-market Bayesian updating, treating the trial as the inferential endpoint. For genomic diagnostics, performance can drift for reasons that have no therapeutic analog:
- Population allele frequencies shift as the test is deployed in new geographies
- The spectrum of variants encountered changes as the test moves from research to clinical use
- Reagent reformulations and platform updates introduce subtle performance changes
- The distribution of indications for testing changes as clinical guidelines evolve
Bayesian sequential monitoring frameworks for diagnostic performance — updating posterior estimates of sensitivity and specificity as confirmed cases accumulate post-market — are an active methodological area with no clean parallel in the drug guidance, which assumes that post-approval evidence generation occurs through formally structured studies rather than continuous Bayesian updating from routine use.
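One way to picture post-market monitoring is a Beta posterior for sensitivity that is updated each reporting period, with a flag raised when the posterior probability of falling below a floor gets high. The sketch below is a toy version under assumed inputs: the batch counts, the 95% floor, and the 0.90 alarm level are all hypothetical, and the tail probability is estimated by Monte Carlo.

```python
import random


def monitor_sensitivity(batches, floor=0.95, alarm=0.90, prior=(1.0, 1.0),
                        draws=20000, seed=0):
    """Update a Beta posterior batch by batch and flag likely drift.

    batches: iterable of (detected, confirmed_positives) per reporting period.
    Returns a list of (period, posterior_mean, prob_below_floor, flagged).
    """
    rng = random.Random(seed)
    a, b = prior
    history = []
    for period, (detected, positives) in enumerate(batches, start=1):
        # Conjugate Beta-binomial update with this period's confirmed cases.
        a += detected
        b += positives - detected
        # Monte Carlo estimate of P(sensitivity < floor | data so far).
        samples = [rng.betavariate(a, b) for _ in range(draws)]
        prob_below = sum(s < floor for s in samples) / draws
        history.append((period, a / (a + b), prob_below, prob_below > alarm))
    return history


# Hypothetical post-market data: detection slips in later periods.
batches = [(49, 50), (48, 50), (44, 50), (41, 50)]
history = monitor_sensitivity(batches)
for period, mean, prob_below, flagged in history:
    print(f"period {period}: mean={mean:.3f}  P(<floor)={prob_below:.3f}  flag={flagged}")
```

Because this version pools all data since launch, it reacts slowly to a recent change. A windowed or discounted update would respond faster, at the cost of noisier estimates; which is appropriate is a design choice, not something the guidance specifies.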
9. The Replication and Generalizability Critique Cuts Differently
The Evans-Fleming-Janes-Dodd critique emphasizes that Bayesian borrowing in confirmatory therapeutic trials risks compromising the randomization-based warrant for unbiased inference. This concern does not transfer directly to diagnostics, where the inferential foundation is not randomization but rather representative sampling from the intended-use population.
But a parallel concern does apply: diagnostic accuracy estimates derived from spectrum-enriched validation cohorts (e.g., samples enriched for known positives and clear negatives) systematically overestimate real-world performance. Bayesian methods can either ameliorate this (by formalizing the discounting needed to translate validation performance to deployment performance) or worsen it (by importing optimistic priors from non-representative cohorts into new validation studies). The structural risk is analogous; the mechanism differs.
10. Regulatory Sociology and Sponsor Capability
The therapeutic guidance addresses sponsors with substantial biostatistical infrastructure — large pharmaceutical companies with dedicated Bayesian methods groups. The diagnostic industry is more heterogeneous: large IVD manufacturers, mid-size sequencing companies, single-laboratory LDT providers, and academic centers all operate in this space. The implicit assumption of sophisticated sponsor capability that runs through the therapeutic guidance applies less uniformly in the diagnostic context, where regulatory burden falls on a much wider distribution of organizational types. A Bayesian methodology guidance for diagnostics — were one to be developed in parallel to the therapeutic version — would need to address this capability heterogeneity more explicitly than the current CDER/CBER document does.
Synthesis
A Bayesian methodology guidance for genomic diagnostics would not be a simple translation of the January 2026 therapeutic document. The estimands differ, the role of randomization differs, the multiplicity structure differs, the borrowing problems differ, the post-market lifecycle differs, and the relationship between Bayesian inference and the underlying scientific question differs. In some respects Bayesian methods are more natural in diagnostics: PPV is intrinsically Bayesian, hierarchical borrowing across variants is often necessary rather than optional, and loss functions are more tractable. In other respects the methodological challenges are sharper: reference standard imperfection, performance drift, equity implications of population-specific performance, and integration with continuously learning algorithms.
The therapeutic guidance is best understood as one document in what should ultimately be a coordinated family of statistical guidances. CDRH's 2010 document on medical device Bayesian statistics anticipated some of these issues but predates both the AI/ML era and the modern genomic diagnostics landscape. The most consequential gap in current FDA Bayesian methodology guidance is arguably not in the therapeutic space — which the January 2026 draft now substantially addresses — but in the diagnostic and combination-product spaces, where the inferential questions are in some ways better suited to Bayesian formalism but where formal guidance has not kept pace with either the science or the deployment patterns of modern genomic tests.
##
##
##
CHAT GPT
##
##
Yes — the analysis changes substantially if the target is Bayesian design for genomic diagnostics rather than Bayesian design for therapeutic drug trials. The same statistical vocabulary applies — priors, posteriors, operating characteristics, borrowing, sensitivity analyses — but the evidentiary object is different.
For drugs, FDA’s January 2026 draft guidance centers on Bayesian methods to support primary inference in clinical trials intended to establish the effectiveness and safety of drugs and biologics. FDA explicitly frames use cases such as interim adaptation, dose selection, and primary inference in such trials. (U.S. Food and Drug Administration) For genomic tests, the central question is not usually, “Does the intervention improve outcomes?” It is more often, “Does this test classify patients, variants, tumors, residual disease, or treatment-relevant molecular states accurately, reproducibly, and in a clinically useful way?”
1. The target of inference changes: treatment effect vs diagnostic performance
In a therapeutic trial, the principal Bayesian target is often a treatment effect: hazard ratio, response-rate difference, mean change, risk difference, noninferiority margin, dose-response function, or benefit-risk quantity. The regulatory question is whether the drug produces a sufficiently reliable clinical effect in the intended population.
For a genomic diagnostic, the key parameters are different. They may include sensitivity, specificity, positive percent agreement, negative percent agreement, limit of detection, variant-calling accuracy, reproducibility, tumor-fraction thresholds, concordance with tissue testing, positive predictive value, negative predictive value, or clinical reclassification performance. FDA’s diagnostic-test statistical guidance emphasizes that sensitivity and specificity are estimates from a subset of the intended-use population, and that PPV and NPV help characterize how test results should be interpreted. (U.S. Food and Drug Administration)
That changes the Bayesian analysis. In a drug trial, a posterior distribution may answer, “What is the probability the treatment effect exceeds zero or exceeds a clinically meaningful threshold?” In a genomic test submission, the posterior may answer, “What is the probability that sensitivity exceeds 95% at a specified allele fraction?” or “What is the posterior distribution for false-negative risk in Stage II colon cancer MRD detection?” The inferential quantity is not therapeutic benefit; it is classification performance under a specified intended use.
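A question like "what is the probability that sensitivity exceeds 95%?" has a direct numerical answer once a prior is chosen. The sketch below assumes a Beta prior (uniform by default) and hypothetical counts, and integrates the Beta posterior density numerically using only the standard library. It assumes both posterior parameters are at least 1, so the density stays bounded.

```python
import math


def beta_log_pdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def prob_exceeds(detected, tested, threshold, prior=(1.0, 1.0), steps=20000):
    """P(sensitivity > threshold | data) under a Beta prior, by midpoint rule."""
    a = prior[0] + detected
    b = prior[1] + (tested - detected)
    width = (1.0 - threshold) / steps
    total = 0.0
    for i in range(steps):
        # Midpoints stay strictly inside (threshold, 1), so the logs are finite.
        x = threshold + (i + 0.5) * width
        total += math.exp(beta_log_pdf(x, a, b))
    return total * width


# Hypothetical: 194 of 200 known positives detected at the specified allele fraction.
p = prob_exceeds(194, 200, 0.95)
print(f"P(sensitivity > 0.95 | data) = {p:.3f}")
```

When every sample is detected and the prior is uniform, the answer has a closed form, 1 - threshold^(n+1), which is a handy check on the numerical integration.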
2. The regulatory center of gravity shifts from CDER/CBER to CDRH, and from drug approval logic to device/diagnostic logic
The 2026 Bayesian draft guidance is a CDER/CBER drug-and-biologics document. Diagnostics generally sit under CDRH device/IVD regulation, although companion diagnostics and certain biologic-related assays create cross-center interactions. FDA already has a separate Bayesian guidance for medical device clinical trials, which states that it addresses the statistical design and analysis of medical device trials using Bayesian methods. (U.S. Food and Drug Administration)
That matters because devices and diagnostics often have a different development logic. FDA’s medical-device Bayesian guidance notes that prior information may be more available for devices because devices often evolve incrementally, have physical mechanisms of action, and may have evidence from prior generations or overseas use. (U.S. Food and Drug Administration) For genomic diagnostics, the analogy is not perfect, but it is relevant: prior information may come from earlier versions of the assay, comparator platforms, analytical validation datasets, orthogonal sequencing, curated variant databases, public genomic repositories, natural-history cohorts, or prior clinical studies.
Thus, the diagnostic Bayesian question often becomes: How much can we borrow from technically or biologically adjacent evidence without overstating performance in the exact intended-use population?
3. “Prior information” is more likely to be technical, platform-based, or variant-class-based
For drugs, a prior may come from a prior Phase 2 study, adult data used for pediatric extrapolation, historical controls, a platform trial, or a related disease subtype. FDA’s drug guidance includes examples of borrowing from previous clinical trials, augmenting controls with external or nonconcurrent controls, pediatric extrapolation, borrowing across disease subtypes, subgroup borrowing, and oncology dose finding. (U.S. Food and Drug Administration)
For genomic diagnostics, prior information may be more granular and technical. Examples include:
Prior assay versions. A lab may have an earlier NGS panel and now adds genes, changes chemistry, switches instruments, modifies bioinformatics, or lowers the limit of detection. Bayesian borrowing might support bridging between old and new versions.
Variant-class borrowing. SNVs, indels, CNVs, fusions, MSI, TMB, methylation signals, and fragmentomic features may have different error structures. A Bayesian model might borrow across variant classes only if the wet-lab and informatic mechanisms are sufficiently similar.
Platform or site borrowing. If the same assay is run across laboratories, Bayesian hierarchical models might estimate site effects, lot effects, operator effects, batch effects, or instrument effects.
Synthetic and contrived samples. Genomic diagnostics often rely on blends, cell lines, engineered materials, reference standards, and dilution series. Bayesian designs may formally combine contrived-sample evidence with clinical-sample evidence, but the key regulatory question becomes whether contrived samples really represent clinical specimen behavior.
That is very different from a therapeutic prior. The prior is not just “belief about drug efficacy.” It may be knowledge about measurement performance.
4. The central bias problem changes: confounding vs reference-standard and spectrum bias
In therapeutic trials, the classic threat is confounding, especially when randomized evidence is diluted by external controls or historical borrowing. That is why critics of Bayesian methods in confirmatory drug trials worry about compromising the evidentiary value of randomization.
For diagnostics, the bias problem is different. FDA’s diagnostic-test guidance emphasizes that sensitivity and specificity can be biased; simply increasing the sample size does not reduce bias; and key sources include error in the reference standard and incorporation of the candidate test into the definition of the target condition. (U.S. Food and Drug Administration)
For genomic tests, major bias risks include:
Reference-standard bias. What is the truth comparator? Sanger? ddPCR? another NGS panel? tissue biopsy? clinical adjudication? longitudinal recurrence? orthogonal methylation assay?
Partial verification bias. If only discordant or selected samples undergo deeper adjudication, performance estimates can be biased.
Spectrum bias. A test may perform well in obvious high-tumor-fraction samples but less well in low-shedding tumors, early-stage disease, low-input FFPE, degraded cfDNA, or minority ancestry groups underrepresented in variant databases.
Prevalence distortion. Enriched validation sets can estimate sensitivity/specificity but may not support real-world PPV/NPV.
Incorporation bias. If the new test influences the adjudicated truth standard, the apparent performance can be inflated.
FDA’s diagnostic guidance is explicit that when no true reference standard is available, sensitivity and specificity may not be appropriate terms; agreement measures may be required instead, and agreement is not correctness because two tests can agree and both be wrong. (U.S. Food and Drug Administration) That point becomes central in Bayesian genomic-test design.
5. The intended-use statement becomes the anchor
For drugs, the analog is the indication. For diagnostics, the intended use is even more determinative. Bayesian design must be tied to who is tested, with what specimen, at what disease stage, for what clinical decision, and against what truth standard.
A Bayesian analysis of a genomic test is incomplete unless it specifies whether the test is intended for:
therapy selection, such as detecting actionable mutations;
companion diagnostic use, where false negatives may deny an effective therapy and false positives may expose patients to an ineffective or harmful therapy;
screening, where prevalence is low and false positives can dominate;
minimal residual disease, where the key question may be recurrence risk or lead time;
monitoring, where serial dynamics matter more than a single binary result;
diagnosis of rare inherited disease, where prior probability may be shaped by phenotype, family history, ancestry, and variant interpretation;
tumor profiling, where the endpoint may be concordance with tissue, detection of actionable variants, or successful classification.
The Bayesian model must serve the intended-use claim. A beautiful posterior distribution for analytical sensitivity is not enough if the label claim is about clinical recurrence prediction.
6. Clinical utility becomes harder to separate from clinical validity
For therapeutics, the clinical utility is intrinsic: if the drug improves survival, symptoms, or another accepted endpoint, the intervention has clinical value subject to benefit-risk.
For diagnostics, accuracy is not automatically utility. A genomic test can be analytically excellent and clinically useless, or clinically interesting but not yet action-guiding. Bayesian methods may help estimate diagnostic performance, but the regulatory and payer questions may require additional layers:
Does the result change management?
Is the management change evidence-based?
Does earlier molecular detection improve outcome or merely move the clock?
Does the test identify patients who benefit from a drug, surveillance program, or de-escalation strategy?
Does a negative result safely avoid treatment, biopsy, imaging, or chemotherapy?
For FDA, some diagnostic claims may be cleared or approved based on analytical and clinical validity. For payers, especially Medicare, the clinical utility argument may dominate. A Bayesian design that efficiently estimates sensitivity and specificity may therefore be only the beginning of the commercial evidence story.
7. Bayesian borrowing may be more attractive — but also more treacherous — in genomics
Genomics is full of repeated structure: genes, variants, samples, sites, batches, instruments, bioinformatic pipelines, cancer types, and molecular subgroups. That makes Bayesian hierarchical modeling tempting. It can borrow strength across rare variants, rare cancers, low-prevalence biomarkers, and small subgroups.
This is especially useful where conventional validation becomes numerically absurd. For example, validating every rare fusion, every exon-level deletion, every low-frequency variant, and every specimen subtype with large independent sample sizes may be impossible. Bayesian modeling can help by allowing partial pooling across biologically or technically related categories.
But the danger is false exchangeability. An NTRK fusion, an EGFR exon 19 deletion, a MET exon 14 skipping event, a copy-number amplification, and a methylation classifier are not interchangeable just because they appear on the same report. Similarly, a variant called at 35% allele fraction in a high-quality tissue sample is not exchangeable with a 0.08% ctDNA signal in plasma. Bayesian borrowing is powerful only when the borrowed units are genuinely comparable.
The practical rule would be: borrow across genomic units only when the wet-lab process, informatic process, specimen type, error mode, and clinical interpretation are sufficiently aligned.
8. Adaptive designs are less about patient assignment and more about sample allocation, thresholds, and enrichment
In therapeutic trials, Bayesian adaptive design often means modifying randomization, stopping early for futility or success, dropping arms, selecting doses, or enriching a subgroup.
In genomic diagnostics, adaptive Bayesian design may look different. It may involve:
adaptive allocation of validation samples to poorly characterized variant classes;
sequential expansion of rare-variant cohorts;
adaptive refinement of cutoffs, such as tumor fraction or methylation classifier thresholds;
planned enrichment for discordant or borderline cases;
adaptive stopping when precision around sensitivity or specificity is adequate;
Bayesian monitoring of lot-to-lot, site-to-site, or software-version drift;
posterior predictive checks for performance in underrepresented subgroups.
This is a major conceptual difference. In diagnostics, the “trial” may be less like a drug RCT and more like a measurement-system validation program.
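One item from the list above, adaptive stopping once precision is adequate, can be sketched as a simulation. Samples are tested in batches, a Beta posterior for sensitivity is updated, and testing stops once the posterior standard deviation falls below a target. The true sensitivity, batch size, target, and cap are all hypothetical, and a real design would need its operating characteristics simulated and pre-specified.

```python
import math
import random


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))


def run_until_precise(true_sensitivity, target_sd=0.01, batch=20,
                      max_samples=2000, prior=(1.0, 1.0), seed=1):
    """Simulate batch-wise testing of known positives until the posterior SD
    of sensitivity drops to target_sd or the sample cap is reached.

    Returns (samples_tested, posterior_mean, posterior_sd).
    """
    rng = random.Random(seed)
    a, b = prior
    tested = 0
    while tested < max_samples and beta_sd(a, b) > target_sd:
        # Simulate one batch of known-positive samples.
        hits = sum(rng.random() < true_sensitivity for _ in range(batch))
        a += hits
        b += batch - hits
        tested += batch
    return tested, a / (a + b), beta_sd(a, b)


tested, mean, sd = run_until_precise(0.97)
print(f"stopped after {tested} samples: mean={mean:.3f}, sd={sd:.4f}")
```

The same loop could allocate the next batch to whichever variant class currently has the widest posterior, which is the "adaptive allocation" idea in the first list item.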
9. Operating characteristics should be diagnostic-specific
For drugs, the standard operating characteristics include Type I error, power, bias, MSE, coverage, and probability of correct decision. The 2026 drug guidance explicitly discusses Bayesian calculations for primary inference and adaptive rules in drug trials. (U.S. Food and Drug Administration)
For genomic diagnostics, operating characteristics should include quantities such as:
posterior probability that sensitivity exceeds a clinically acceptable floor;
posterior probability that specificity exceeds a minimum threshold;
expected false negatives and false positives per 1,000 tested patients at plausible prevalence levels;
posterior distribution of PPV and NPV across intended-use prevalence scenarios;
probability of incorrect classification near the cutoff;
probability of no-call, indeterminate, or quantity-not-sufficient results;
robustness to specimen quality, tumor fraction, ancestry, batch, instrument, and site effects;
performance drift after software updates.
This is particularly important because a genomic test may have multiple performance regimes. A ctDNA MRD assay, for example, may behave very differently by tumor type, stage, shedding biology, time from surgery, adjuvant therapy, and blood-draw schedule. A single posterior mean sensitivity may be less informative than a hierarchical posterior profile across clinically meaningful strata.
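Two of the operating characteristics listed above can be sketched together: expected false results per 1,000 tested, and a posterior distribution for PPV across prevalence scenarios. The validation counts and prevalences are hypothetical. Sensitivity and specificity are given independent uniform Beta priors, and prevalence is treated as a fixed scenario value rather than estimated.

```python
import random


def errors_per_1000(sensitivity, specificity, prevalence):
    """Expected (false negatives, false positives) per 1,000 people tested."""
    fn = 1000 * prevalence * (1 - sensitivity)
    fp = 1000 * (1 - prevalence) * (1 - specificity)
    return fn, fp


def ppv_posterior(tp, fn, tn, fp, prevalences, draws=20000, seed=0):
    """Posterior (2.5%, median, 97.5%) of PPV at each assumed prevalence.

    Sensitivity and specificity get independent Beta(1, 1) priors updated
    with validation counts; prevalence is a fixed scenario value.
    """
    rng = random.Random(seed)
    se = [rng.betavariate(1 + tp, 1 + fn) for _ in range(draws)]
    sp = [rng.betavariate(1 + tn, 1 + fp) for _ in range(draws)]
    summary = {}
    for prev in prevalences:
        # Push each joint draw of (sensitivity, specificity) through Bayes' theorem.
        ppv = sorted(s * prev / (s * prev + (1 - c) * (1 - prev))
                     for s, c in zip(se, sp))
        summary[prev] = (ppv[int(0.025 * draws)], ppv[draws // 2],
                         ppv[int(0.975 * draws)])
    return summary


# Hypothetical validation study: 95/100 positives and 980/1000 negatives called correctly.
for prev, (lo, med, hi) in ppv_posterior(95, 5, 980, 20, [0.01, 0.05, 0.20]).items():
    print(f"prevalence={prev:.2f}  PPV median={med:.3f}  95% interval=({lo:.3f}, {hi:.3f})")
```

The same validation data can support a confident PPV in a high-prevalence setting and a very uncertain one in a screening setting, which is why a single headline number is rarely enough.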
10. The false-positive / false-negative asymmetry may dominate the Bayesian decision rule
In therapeutic trials, false approval and false rejection are usually framed as approval of ineffective therapy versus failure to approve effective therapy, with safety layered into benefit-risk.
In diagnostics, the harms are more context-specific:
A false positive companion diagnostic may send a patient to an ineffective targeted therapy.
A false negative companion diagnostic may deny a highly effective drug.
A false positive MRD test may trigger unnecessary chemotherapy, imaging, anxiety, or trial enrollment.
A false negative MRD test may falsely reassure and delay treatment.
A false positive screening test may initiate cascades of imaging, biopsy, and overdiagnosis.
A false negative screening test may delay cancer diagnosis.
Thus, Bayesian diagnostic design naturally invites decision analysis. The relevant threshold may not be “posterior probability of performance > X,” but expected clinical harm under different test-result pathways. FDA’s device Bayesian guidance explicitly includes decision analysis, predictive probabilities, interim analyses, model checking, and sensitivity analysis among topics for Bayesian medical device trials. (U.S. Food and Drug Administration)
For genomic diagnostics, this is where Bayesian methods could be especially valuable: they can formalize asymmetric consequences of false positives and false negatives in a way that conventional sensitivity/specificity tables often do not.
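A toy decision analysis shows how asymmetric harms can flip the preferred cutoff. The two candidate cutoffs, their sensitivity and specificity, the prevalence, and the loss weights are all hypothetical; in practice the losses would come from clinical and economic evidence, and the performance figures would carry posterior uncertainty.

```python
def expected_loss(sensitivity, specificity, prevalence, loss_fn, loss_fp):
    """Expected misclassification loss per tested patient."""
    return (prevalence * (1.0 - sensitivity) * loss_fn
            + (1.0 - prevalence) * (1.0 - specificity) * loss_fp)


def best_cutoff(candidates, prevalence, loss_fn, loss_fp):
    """Return the cutoff label with the lowest expected loss.

    candidates: dict mapping cutoff label -> (sensitivity, specificity).
    """
    return min(
        candidates,
        key=lambda k: expected_loss(*candidates[k], prevalence, loss_fn, loss_fp),
    )


# Hypothetical operating points for one assay at two cutoffs.
candidates = {"low cutoff": (0.98, 0.90), "high cutoff": (0.90, 0.99)}

# A missed case costs 10x a false alarm, then 100x a false alarm.
print(best_cutoff(candidates, prevalence=0.05, loss_fn=10, loss_fp=1))
print(best_cutoff(candidates, prevalence=0.05, loss_fn=100, loss_fp=1))
```

With these assumed numbers, the high cutoff wins when a missed case is 10 times as costly as a false alarm, and the low cutoff wins when it is 100 times as costly. The assay has not changed; only the stated consequences have.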
11. The payer implications are different and probably more severe
For drugs, FDA approval often creates substantial market access momentum, although payers may restrict coverage.
For genomic diagnostics, FDA authorization does not automatically solve reimbursement. Medicare and commercial payers may still ask whether the test is reasonable and necessary, whether it changes management, whether the evidence supports improved outcomes, whether the tested population matches coverage criteria, and whether serial testing is justified.
Bayesian evidence may be persuasive to FDA for a diagnostic performance claim but less persuasive to payers if the clinical utility chain is incomplete. This is especially true for MRD, MCED, methylation classifiers, AI-derived genomic signatures, and recurrence-risk tests. A Bayesian validation design should therefore be paired with a payer-facing evidence plan: decision impact, outcomes, real-world management changes, and health-economic consequences.
12. The article’s tone would shift from “modernizing pivotal trials” to “formalizing evidence integration for complex measurement systems”
If rewriting the earlier article for genomic diagnostics, I would change the organizing thesis.
For therapeutics, the article is about whether Bayesian methods can support drug approval-level causal inference while preserving evidentiary standards.
For genomic diagnostics, the article would be about whether Bayesian methods can support credible performance estimation and clinical interpretation in complex, high-dimensional, low-prevalence, rapidly evolving measurement systems.
That is a different article.
The major sections would become:
1. Bayesian methods as a tool for diagnostic evidence integration.
The emphasis would be on combining analytical validation, clinical validation, orthogonal comparator data, reference materials, prior assay versions, public genomic knowledge, and real-world datasets.
2. Intended use as the Bayesian anchor.
Every prior, endpoint, and operating characteristic would be tied to a specific specimen, population, disease state, and clinical decision.
3. Reference standards and the problem of truth.
This would become a central section, not a side issue. FDA’s diagnostic guidance makes clear that when the comparator is not a reference standard, sensitivity and specificity may be inappropriate and agreement is not correctness. (U.S. Food and Drug Administration)
4. Borrowing across variants, samples, sites, and platforms.
This would be the diagnostics analog of borrowing across trials, diseases, or subgroups in drug development.
5. Bayesian decision thresholds and clinical consequences.
The article would discuss false-positive and false-negative harms for specific diagnostic contexts.
6. FDA authorization versus payer adoption.
For diagnostics, this deserves a major section because clinical validity and clinical utility are often separated in ways that are less true for therapeutics.
Bottom line
For drugs, Bayesian design asks: Can we use prior information and accumulating trial data to make a reliable causal claim about safety and effectiveness?
For genomic diagnostics, Bayesian design asks: Can we integrate heterogeneous molecular, analytical, clinical, and prior-platform information to make a reliable claim about test performance and clinical interpretability in a defined intended-use population?
The therapeutic version is mainly about causal inference under regulatory evidentiary standards. The diagnostic version is mainly about classification accuracy, reference truth, measurement validity, and clinical decision consequences. Bayesian methods may be even more naturally suited to genomic diagnostics because genomics is modular, hierarchical, sparse, and information-rich. But that same structure makes over-borrowing, hidden bias, and false exchangeability especially dangerous.