In the past week or two, I saw the Ansarian paper on the large impact that AI and LLMs can already have in the oncology clinic, with a focus on AML.
This week, I saw an even newer paper by Dellamonica on the near-future value of more assertive use of multimodal AI in the oncology clinic and in oncology clinical research.
I asked ChatGPT 5 to review the two papers and discuss.
###
AI CORNER
###
The Ansarian AML paper and the Dellamonica multimodal AI paper both argue that oncology is on the cusp of a major AI-driven shift, but they do so from opposite directions—Ansarian from a vertical, disease-specific perspective and Dellamonica from a horizontal, ecosystem-wide perspective.
Ansarian treats acute myeloid leukemia as a microcosm of what AI can do when it fully saturates a clinical service line, walking through a progression from early supervised models to deep learning on bone marrow morphology, genomic risk stratification, multimodal prognostic models, and the use of synthetic AML patient cohorts to emulate clinical trials. The paper’s strongest contribution is its attention to realistic workflow, offering a detailed clinical decision-support dashboard that displays ELN risk, AI overlays, mutation profiles, morphology assessments, and explicit physician-in-the-loop functions like “accept and document.” It therefore presents AI not merely as a set of algorithms but as a fully integrated part of AML diagnosis, prognosis, treatment planning, and documentation.
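As a rough sketch of what such a physician-in-the-loop record could look like in code, the hypothetical Python below models one dashboard entry whose AI outputs only enter the chart after an explicit clinician sign-off. All field names and the class itself are illustrative assumptions, not the paper's actual interface.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AMLDashboardRecord:
    """One patient's view in a hypothetical AML CDS dashboard.

    Field names are illustrative assumptions, not the paper's schema.
    """
    patient_id: str
    eln_risk: str            # ELN risk category, e.g. "adverse"
    ai_risk_score: float     # model-estimated risk, 0..1
    mutations: list = field(default_factory=list)
    morphology_note: str = ""
    audit_log: list = field(default_factory=list)

    def accept_and_document(self, physician: str) -> None:
        """Physician-in-the-loop step: AI output is only committed to
        the record once a named clinician explicitly signs off."""
        self.audit_log.append(
            (physician, datetime.now(timezone.utc).isoformat(), "accepted")
        )

rec = AMLDashboardRecord("pt-001", "adverse", 0.82, ["FLT3-ITD", "NPM1"])
rec.accept_and_document("dr_smith")
print(rec.audit_log[0][2])  # prints "accepted"
```

The point of the sketch is the audit trail: the AI suggestion and the clinician's acceptance are separate, logged events, which is what "accept and document" implies.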
Dellamonica, by contrast, surveys multimodal AI (MMAI) across all of oncology, arguing that fusing pathology, radiology, multi-omics, electronic health records, epidemiology, and real-world evidence into a shared multimodal representation fundamentally reshapes risk models, trial design, drug development, population stratification, and clinical workflows.
The paper highlights numerous examples—pathology-genomics fusion models like Pathomic Fusion, radiology-histology-genomics frameworks like TRIDENT in lung cancer, real-world MMAI models like ABACO in metastatic breast cancer, and large foundation models trained across vision and language domains.
Dellamonica also emphasizes data governance, privacy-preserving learning, federated and swarm architectures, and a new conceptual hierarchy of AI evidence that includes pre-specified MMAI endpoints, synthetic controls, and continuous-learning clinical systems. Taken together, the papers show that AML is an ideal disease to illustrate how these multimodal approaches can be deployed end-to-end, while the multimodality paradigm itself is clearly applicable across nearly all tumor types.
A central sidebar question is whether Dellamonica’s multimodal framework is equivalent to what the current AI discourse calls a “world model.” In contemporary AI, multimodal foundation models—LLMs and VLMs trained on text, images, omics, and structured clinical data—are extremely powerful at prediction, generation, and pattern fusion, but they do not model environment dynamics or the consequences of actions over time, which are the defining attributes of a world model.
World models, in the Ha & Schmidhuber and Yann LeCun lineage, create internal simulations of how states evolve when actions are taken, enabling planning, counterfactual reasoning, and long-horizon optimization. Dellamonica does not cross this line: the paper discusses multimodal representation, predictive analytics, and foundation models, but it does not describe reinforcement-learning-like structures, action-conditioned transitions, or explicit simulators of clinical care pathways. In other words, Dellamonica represents the maturation of multimodality—not yet the emergence of world-model AI.
If world-model approaches were added on top of the Ansarian and Dellamonica foundations, oncology AI would shift from prediction to simulation and planning. In AML, this would mean dynamic models that learn how blast counts, MRD levels, toxicity, and relapse trajectories evolve under different induction, consolidation, and transplant strategies—allowing clinicians to simulate multiple personalized treatment sequences before committing.
In solid tumors, a world-model layer would enable digital-twin trial simulators, adaptive trial steering informed by simulated arms, and health-system-level “what-if” analyses to evaluate the impact of coverage policies or biomarker adoption on survival and cost curves. This is the natural extension of Dellamonica’s multimodal fabric: a shift from fusing data to modeling the oncology ecosystem as a dynamic environment. The combination of Ansarian’s concrete vertical instantiation, Dellamonica’s panoramic multimodal synthesis, and a future world-model layer points toward an oncology AI landscape capable of planning, optimizing, and simulating entire care pathways—not just predicting outcomes, but shaping them.
###
Additional recap - bullet style
###
Both papers are trying to answer roughly the same meta-question—how far can AI actually reorganize oncology care?—but they attack it from very different angles, and neither quite crosses the line into true “world model” territory yet.
1. What each paper is really about
Ansarian (JAMA Oncol, AML) is a vertical deep dive into one disease (AML) as a test bed for the whole AI stack. It walks through:
- Classical supervised models for risk stratification from genomics and clinical features.
- Deep learning on morphology (bone marrow slides, blood smears) to detect AML, subtype it, and even infer mutations.
- Generative and synthetic-data approaches, e.g., synthetic AML patients to mimic clinical trials and augment training.
- Federated and foundation-model directions: large models for healthcare, and federated learning benchmarks like FHBench and work on federated foundation models.
All of this culminates in a concrete AML clinical decision-support (CDS) dashboard concept: integrated ELN risk + AI risk, morphology, variants, treatment suggestions, survival curves, and an explicit “accept & document” physician-in-the-loop workflow.
So Ansarian is essentially saying: “Here’s what a fully AI-saturated AML service line could look like, from lab to bedside, and what infrastructure and governance you’d need.”
Dellamonica (npj AI, MMAI) is a horizontal perspective: it treats multimodal AI as a new analytic fabric for all of oncology.
They define multimodal artificial intelligence (MMAI) as models that fuse histology, imaging, multi-omics, clinical records, and other data into a shared representation to exploit inter-scale biology. Examples span the entire cancer ecosystem:
- Early detection & risk: multimodal epidemiologic models for prevention and stratified screening.
- Diagnosis & prognosis: pan-cancer histology models that infer actionable mutations; fusion models like Pathomic Fusion combining pathology + genomics that outperform WHO 2021 risk classification; vision-language foundation models for precision oncology.
- Treatment selection: platforms such as ABACO (RWE MMAI for HR+ mBC) and TRIDENT (radiomics + pathology + genomics in lung cancer) to carve out treatment-benefit signatures beyond conventional subgroups.
- Health-system & R&D: DREAM multimodal drug-sensitivity challenges, Phase 3 trial re-analyses, MMAI-enabled subgroup finding, and a proposed new hierarchy of MMAI evidence incorporating RWE, synthetic controls, and continuous-learning systems.
They also go fairly far into federated learning, swarm learning, and privacy-preserving training, including the IP and data-governance constraints in pharma and cross-institutional collaborations.
Dellamonica is basically: “Take everything — trials, registries, images, omics, RWE — and treat MMAI as a cross-cutting analytic layer that re-prices risk, response, and value across the entire oncology ecosystem.”
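As a minimal illustration of the fusion idea behind MMAI, the toy sketch below concatenates per-modality embeddings into one shared representation (so-called late fusion). The encoders, dimensions, and random weights are invented placeholders, not any paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random projection weights stand in for trained per-modality
# encoders (a pathology CNN, a genomics model, an EHR encoder, ...).
W_HIST = rng.standard_normal((16, 8))  # 16 slide features -> 8-dim embedding
W_GEN = rng.standard_normal((32, 8))   # 32 mutation features -> 8-dim embedding
W_CLIN = rng.standard_normal((10, 8))  # 10 EHR features -> 8-dim embedding

def encode(x, w):
    """Toy encoder: linear projection plus tanh nonlinearity."""
    return np.tanh(x @ w)

def fuse(slide, muts, ehr):
    """Late fusion: concatenate per-modality embeddings into a single
    shared multimodal representation."""
    return np.concatenate([encode(slide, W_HIST),
                           encode(muts, W_GEN),
                           encode(ehr, W_CLIN)])

patient = fuse(rng.standard_normal(16),
               rng.standard_normal(32),
               rng.standard_normal(10))
print(patient.shape)  # prints (24,)
```

Downstream heads (risk, response, survival) would then read from this shared vector, which is what lets one representation "re-price" many tasks at once.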
2. How they complement each other
If you put them side by side:
- Scope:
  - Ansarian = AML as a microcosm of AI-enabled hematology.
  - Dellamonica = pan-oncology, focusing on infrastructure, economics, and ecosystem-level effects.
- Modality blend:
  - Both care deeply about multimodality. Ansarian notes multi-omics, imaging, and high-dimensional blood transcriptomics for AML prediction and monitoring.
  - Dellamonica elevates this to a formal MMAI paradigm: multi-omics, histology, radiology, clinical data, RWE, plus explicit reference to vision-language foundation models for precision oncology.
- Clinical workflow vs analytics fabric:
  - Ansarian is strongest where it gets concrete: the AML CDS interface, with ELN risk, AI overlays, and explicit clinician acceptance flows.
  - Dellamonica mostly treats MMAI as an analytics layer that different stakeholders (clinicians, trialists, payers, pharma) will plug into; it does not specify individual UI workflows but focuses on how MMAI restructures trials, RWE, and reimbursement logic.
- Evidence & regulation:
  - Ansarian leans heavily on diagnostic-test methodologic tools (ROC basics, QUADAS-AI, external validation challenges) and HIPAA de-identification, and cites foundational reviews on healthcare foundation models and federated learning.
  - Dellamonica is more about new evidence hierarchies, continuous-learning systems, and trial design where MMAI is pre-specified rather than bolted on post hoc.
You can almost think of Ansarian as an instantiated use-case chapter inside the Dellamonica book: if you inserted the AML dashboard into Dellamonica’s MMAI ecosystem figure, it would sit nicely at the “diagnosis + risk + treatment selection” nodes.
3. Sidebar: multimodality vs “world models” – how far does Dellamonica go?
In the emerging AI jargon, multimodal LLMs/foundation models and world models are overlapping but not identical species.
- A multimodal LLM/foundation model fuses text, images, tabular data, maybe even omics, into a shared latent space and can predict or generate across them, but it is usually:
  - Trained on static or weakly sequential data.
  - Supervised or self-supervised for representation and prediction (e.g., "predict relapse," "predict mutation status").
  - Not explicitly built to model environment dynamics and agent actions.
- A world model, in the Ha & Schmidhuber / LeCun sense, is a generative model of an environment that learns its dynamics. It compresses high-dimensional sensory input into a latent state and predicts how that state will evolve under different actions, supporting planning via internal simulation (arXiv). Recent commentary frames world models as the next stage beyond LLMs: systems that can reason about physics, causality, and long-horizon consequences rather than just pattern-match over text (The Wall Street Journal).
On a close read, Dellamonica does not claim MMAI is a “world model” architecture. The paper discusses:
- Multimodal fusion for better prediction (MUSK, Pathomic Fusion, TRIDENT, pan-tumor RWE MMAI).
- Vision-language foundation models for oncology.
- Federated/swarm learning and privacy-preserving training.
But it does not describe:
- Explicit modeling of clinical actions and their consequences over time.
- Reinforcement-learning-like policy optimization.
- An internal "simulator" of the oncology environment used to plan interventions.
So, to your sidebar question: Dellamonica is about broad multimodal and foundation-model-based analytics; it does not go all the way into world-model AI as LeCun, Ha/Schmidhuber, or the recent “digital twin as world model” papers define it.
4. What a “world model” layer would add beyond Ansarian + Dellamonica
If we take Ansarian’s vertical AML view and Dellamonica’s horizontal MMAI view as the current state of the art, a next step would be to add a genuinely world-model layer on top. Conceptually, that looks more like:
“AI not only predicts risk or response from multimodal data, but internally simulates the patient–disease–system over time under different treatment and system-level policies.”
That’s very close to the digital twin concept, and we’re starting to see LLM- or transformer-based systems described explicitly as world models for clinical trials and patient twins (arXiv).
Concretely, you could imagine three layers:
- Multimodal representation (Dellamonica layer)
  - Fuse histology, imaging, omics, labs, notes, PROs, etc. into a state vector S(t) for each patient and timepoint.
- World-model dynamics (new layer)
  - Learn P(S_{t+1} | S_t, A_t, C): how the state changes given treatment choices A_t (chemo, transplant, TKI, dosing changes), context C (center type, guideline era, payer rules), and latent factors.
  - This is the Ha/Schmidhuber + JEPA spirit: a predictive model of the world (here, the oncology care environment), born from longitudinal EHR, registry, and trial data (arXiv).
- Policy and planning (RL/decision layer)
  - Use offline RL or planning atop that world model to propose treatment policies, trial designs, or care-pathway redesigns that maximize a reward (survival, QALYs, cost, toxicity, equity).
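As a sketch of how the three layers fit together, the toy Python below pairs a hypothetical patient state vector with an assumed linear-Gaussian transition model standing in for the learned dynamics P(S_{t+1} | S_t, A_t), then does brute-force planning over action sequences. Every state dimension, action, matrix entry, and cost weight is invented for illustration.

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(1)

STATE_DIM = 4  # toy state: [blast burden, MRD level, toxicity index, fitness]

# Hypothetical action-conditioned dynamics standing in for a learned
# transition model P(S_{t+1} | S_t, A_t); all values are invented.
A_MATS = {
    "standard": np.diag([0.9, 0.9, 0.9, 0.9]),
    "intensify": np.diag([0.7, 0.7, 1.2, 0.95]),
}

def step(state, action):
    """One transition of the world model (linear dynamics + noise)."""
    return A_MATS[action] @ state + rng.normal(0.0, 0.01, STATE_DIM)

def rollout(state, actions):
    """Simulate a full treatment sequence through the world model."""
    for a in actions:
        state = step(state, a)
    return state

def plan(state, horizon=3):
    """Naive planner: enumerate every action sequence and keep the one
    with the lowest toy cost (burden + MRD + weighted toxicity)."""
    best_seq, best_cost = None, np.inf
    for seq in product(A_MATS, repeat=horizon):
        final = rollout(state, seq)
        cost = final[0] + final[1] + 0.5 * final[2]
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq

s0 = np.array([1.0, 0.8, 0.2, 1.0])  # invented initial patient state
print(plan(s0))
```

A real system would replace the brute-force enumeration with offline RL or model-predictive planning, but the structure (representation, transition model, planner) is the same.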
Now plug that back into the Ansarian + Dellamonica landscape:
a) AML as an early proving ground
Ansarian already cites synthetic AML cohorts used to mimic clinical trials. A world-model extension would:
- Use real AML longitudinal cohorts + synthetic augmentation to learn the dynamics of blast burden, MRD, toxicity, and relapse under different regimens.
- Let you run simulated treatment policies: “What if we intensify consolidation here?” “What if we delay transplant until MRD crosses threshold X?”
- Feed those simulations back into the AML CDS dashboard as policy suggestions with uncertainty bands, rather than mere risk estimates.
That shifts AML AI from predictive scoring (today) to decision-optimal treatment sequencing (world model).
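To make the “delay transplant until MRD crosses a threshold” question concrete, here is a toy Monte Carlo sketch of such a policy simulation. The decay model, monthly response rates, and horizon are invented for illustration, not taken from either paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_time_to_transplant(threshold, n_patients=500, horizon=12):
    """Toy Monte Carlo world model: MRD decays stochastically on therapy,
    and transplant is triggered once MRD falls below `threshold`.
    Returns the mean months until the trigger (censored at `horizon`)."""
    times = []
    for _ in range(n_patients):
        mrd = 1.0
        for month in range(1, horizon + 1):
            mrd *= rng.uniform(0.5, 0.9)  # invented monthly response rate
            if mrd < threshold:
                times.append(month)
                break
        else:
            times.append(horizon)  # no trigger within the horizon
    return float(np.mean(times))

# Compare candidate policies: "delay transplant until MRD crosses X"
for x in (0.1, 0.01, 0.001):
    print(x, mean_time_to_transplant(x))
```

Even this crude version shows the shape of the output a world-model layer would feed back to the dashboard: a distribution of trajectories per policy, not a single risk score.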
b) Solid tumors and system-level design
Dellamonica’s examples—ABACO, TRIDENT, pan-tumor RWE MMAI—are already nibbling at the edge of dynamic modeling, but stop at “who benefits most from regimen A vs B”. A world-model layer could:
- Build tumor-trajectory simulators that integrate ctDNA-based MRD, imaging, and toxicity data over time under different systemic sequences.
- Support adaptive trial design where trial steering committees simulate the impact of adding or retiring arms, changing eligibility, or switching endpoints in silico before implementing changes.
- Enable payer-level “what-ifs”: simulate how coverage of MRD testing or MMAI-guided therapy would change long-term outcomes and cost curves in a given health system.
This is very close to what some industry groups are hinting at when they talk about world-model-driven digital twins for clinical development and health-system planning (kiwi.oden.utexas.edu).
c) Clinical “world models” ≠ robotics world models, but the same idea
The key conceptual shift is the same as in robotics:
- LLM / MMAI today: “Given all this data, what is the risk, the label, or the best next single action?”
- World model in oncology: “Given everything we know and have observed, how will the entire trajectory unfold under different action sequences, and which sequence is best under the constraints we care about?”
You can absolutely reuse Dellamonica’s multimodal representations and Ansarian’s domain-specific AML feature engineering as input layers to this kind of model. The novelty is in treating the care pathway as an environment and learning its dynamics.
5. Short answer to your sidebar question
- Does Dellamonica’s multimodality already equal “world model” AI?
  No. It is a broad and sophisticated account of multimodal, foundation-model-inflected prediction and stratification, but it does not explicitly model actions and environment dynamics in the world-model sense.
- How would a true world-model approach extend Ansarian + Dellamonica?
  By adding a generative, dynamical layer that can simulate patient and system trajectories under alternative policies (treatment choices, trial designs, coverage rules) and then optimize those policies, essentially turning multimodal oncology AI into a planning engine rather than just a prediction engine.
If you wanted to write the follow-on blog to your Ansarian piece, I think the frame is:
“Ansarian shows how AI saturates one disease (AML). Dellamonica shows how multimodal AI saturates the oncology ecosystem. The next turn of the wheel is world-model AI that can simulate entire care pathways, letting us test treatments, trials, and coverage policies in silico before patients ever feel the consequences.”