Tuesday, January 27, 2026

Nature Med 2026, Li et al. (Stanford): Elaborate H&E to Proteomics to Prediction

Dr. Michael Montalto of Amgen Precision Medicine writes:

Predicting molecular phenotypes and spatial biology from H&E is not necessarily new... but this is certainly an impressive level of resolution that we have not seen before for protein predictions.

Only trained on 10 NSCLC patients. Some might say that is a limitation, but I would consider that just a start. Imagine if trained on larger data sets with continuing model improvements.

Hope we are all thinking about how to routinely get more from H&E in drug development and clinical care.

###

ChatGPT 5.2 summarizes:


This study describes a striking conceptual shift in digital pathology: the authors argue that a routine hematoxylin and eosin (H&E) slide is not merely a morphological image, but a latent molecular map from which spatial protein biology can be computationally reconstructed. Their system, called HEX (H&E to protein expression), uses deep learning to generate virtual spatial proteomics profiles directly from standard histopathology. Rather than measuring protein expression through multiplexed immunofluorescence platforms such as CODEX, which are expensive, technically complex, and difficult to scale, HEX infers the spatial distribution of 40 immune, epithelial, stromal, and functional protein markers using only morphology. The result is effectively a computational spatial proteomics assay layered onto every H&E slide.

What makes this technically distinctive is that the model was not trained on weak bulk labels but on tightly paired data: histology tiles were co-registered with true spatial proteomics measurements. This allowed the network to learn direct associations between tissue architecture and protein-level spatial organization. Prediction accuracy for proteins reached Pearson correlations in the 0.73–0.79 range, which is unusually high in this field and far exceeds typical performance reported for spatial transcriptomic inference from H&E, where correlations often hover around 0.2. In other words, the model is not making vague gene-expression guesses; it is recovering spatially structured immune and stromal phenotypes that appear strongly encoded in morphology.
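To make the accuracy metric concrete, here is a minimal sketch of how a per-marker Pearson correlation between measured and predicted protein intensities could be computed. The marker names come from the text, but the numbers, the dictionary layout, and the function names `pearson_r` and `per_marker_pearson` are illustrative assumptions, not the authors' code or data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

def per_marker_pearson(measured, predicted):
    """Map each marker name to the Pearson r of predicted vs measured values."""
    return {m: pearson_r(measured[m], predicted[m]) for m in measured}

# Toy tile-level intensities, invented for illustration (not data from the paper).
measured = {"CD8": [0.1, 0.4, 0.35, 0.8], "CD163": [0.9, 0.2, 0.5, 0.1]}
predicted = {"CD8": [0.2, 0.5, 0.3, 0.7], "CD163": [0.7, 0.3, 0.6, 0.2]}

scores = per_marker_pearson(measured, predicted)
mean_r = sum(scores.values()) / len(scores)  # summary value averaged across markers
print(scores, mean_r)
```

A summary figure like the one quoted above would then be an average of such per-marker values over the full panel, assuming the paper aggregates in a comparable way.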

However, the core innovation is not protein prediction alone but how these inferred proteomic maps are used. The authors developed a multimodal fusion framework (MICA) that integrates two complementary views of the tumor: classical histomorphology and AI-derived virtual protein spatial maps. This fusion substantially improved clinical outcome modeling. In early-stage non-small-cell lung cancer (NSCLC), prognostic performance increased by roughly 22% (C-index) compared with models based only on H&E or clinicopathologic factors. For immunotherapy response in advanced NSCLC, prediction accuracy improved by 24–39% compared with conventional biomarkers such as PD-L1 expression and tumor mutation burden. In a field where incremental gains of a few percentage points are often considered meaningful, these are large effects.

Importantly, the system does not behave like a pure black box. Because it produces spatial protein maps, the model’s predictions can be biologically interpreted. Favorable prognosis and immunotherapy response were associated with spatial niches enriched in coordinated T-cell populations—co-localization of CD8 cytotoxic cells, TCF-1–positive progenitor/exhausted T cells, and helper T cells. Poor outcomes, in contrast, were linked to regions dominated by CD163-positive macrophages, CD66b-positive neutrophils, FAP-positive fibroblasts, and extracellular-matrix remodeling—patterns consistent with immunosuppressive, pro-tumor microenvironments. Notably, it was the spatial co-localization of cell states, rather than single markers, that carried prognostic and predictive value. This aligns closely with modern understanding of the tumor microenvironment and lends biological credibility to the model.

From a clinical standpoint, the most immediate implication lies in early-stage lung cancer. Decisions about adjuvant therapy currently rely on coarse clinicopathologic variables such as tumor size and nodal involvement, which imperfectly capture recurrence risk. HEX-derived risk stratification could potentially identify biologically aggressive tumors that appear low risk by TNM staging, while sparing others unnecessary chemotherapy. The second major application is immunotherapy selection. PD-L1 expression and TMB are imperfect predictors, and the study shows that spatial immune architecture inferred from morphology may outperform them. The model identifies patients within PD-L1–high, intermediate, and even PD-L1–negative groups who are more or less likely to benefit, suggesting a path toward more nuanced immunotherapy decision-making.

Yet the work is not without caveats. The analyses are retrospective, and the immunotherapy cohort, though substantial, is not definitive. The method assumes that morphology consistently encodes molecular states across institutions and staining variability. Moreover, despite interpretability tools, this remains a complex AI-derived assay that would face significant regulatory and validation hurdles. Conceptually, the system infers proteins indirectly through morphology-correlated patterns; its robustness outside the training distributions remains to be proven.

Even so, the broader implication is difficult to overstate. If validated prospectively, this approach turns every archived H&E slide into a scalable, low-cost spatial biology resource. It collapses the traditional barrier between morphology and molecular profiling, enabling population-scale spatial biomarker discovery without additional tissue or laboratory assays. Rather than replacing existing biomarkers immediately, HEX would likely augment them at first, adding a new dimension: morphology-encoded molecular spatial context.

In essence, the study proposes that the histology slide already contains a hidden molecular atlas. The authors show that AI can decode part of that atlas and that doing so materially improves prediction of recurrence and immunotherapy response in lung cancer. Whether this becomes a routine clinical tool will depend on prospective trials and regulatory navigation, but as a scientific advance, it represents a genuine shift toward computationally derived spatial phenotyping from routine pathology.

#

______________________________________________

#

Compare to OWKIN papers by Garberis and by Gaury.

#

Nature Comm GARBERIS (OWKIN)


MedRxiv GAURY (OWKIN)



#

Li (Nature Medicine 2026) and the Owkin “RlapsRisk” papers are cousins in the same family—WSI-first models meant to make H&E do more than diagnosis—but they’re trying to solve different clinical problems with very different “biological grounding.”

What Li is, in one sentence

Li’s novelty is that it uses H&E to generate “virtual spatial proteomics” (virtual CODEX maps) with an AI model called HEX, and then fuses that with H&E morphology in a framework (MICA) to predict prognosis across cohorts and, importantly, immunotherapy response in advanced NSCLC.

The “new” part vs prior H&E models

Most H&E prediction papers jump directly from pixels → endpoint, and interpretability is often post hoc. Li adds an intermediate, biologically interpretable layer: pixels → spatial protein maps → clinically relevant predictions.

They report HEX’s accuracy for protein prediction across 40 markers with strong summary metrics (average Pearson r ~0.79, SSIM ~0.95, low MSE) and claim it beats prior GAN-style translation methods. That’s the technical “bridge” that lets them argue they’re not just correlating morphology with outcome—they’re recovering spatial immune/stromal structure that plausibly drives ICI response.

The clinical “hook”: ICI response vs PD-L1/TMB

In a cohort of 148 advanced NSCLC patients treated with PD-1/PD-L1 ICIs, MICA achieved AUC 0.82 for objective response, outperforming PD-L1 (0.66) and TMB (0.59). They also report that MICA beats either H&E-only or virtual-CODEX-only models (AUC 0.72 and 0.75, respectively).

That comparison (vs PD-L1/TMB) is exactly the kind of claim that sounds like hype—yet it’s also the kind of head-to-head result that, if replicated prospectively, becomes clinically actionable.
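For readers less used to AUC comparisons, the sketch below shows one standard way to compute an AUC for objective response: the fraction of responder/non-responder pairs in which the responder gets the higher score. The patient labels, model scores, and PD-L1 values are invented toy numbers, and `roc_auc` is a hypothetical helper, not code from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a responder outranks a non-responder.

    labels: 1 for responder, 0 for non-responder. Tied scores count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy cohort of six patients, invented for illustration only.
response = [1, 1, 1, 0, 0, 0]
model_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]  # hypothetical model probabilities
pdl1_tps = [50, 1, 80, 60, 0, 5]              # hypothetical PD-L1 percentages

auc_model = roc_auc(response, model_score)
auc_pdl1 = roc_auc(response, pdl1_tps)
print(auc_model, auc_pdl1)
```

Because both biomarkers are scored on the same patients, the comparison is head-to-head, which is the design the text describes; with a cohort of 148, confidence intervals around each AUC would matter as much as the point estimates.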

Biological interpretability (a real differentiator)

They don’t stop at “attention heatmaps.” They define spatial co-localization signatures and show that low-risk tiles are enriched for T-cell phenotypes (e.g., granzyme B+/CD8+, PD-1+/CD8+, TCF-1+/CD4+) while high-risk tiles are enriched for immunosuppressive/fibrotic myeloid/stromal niches (CD66b+/MMP9+, CD163+/MMP9+, FAP+/collagen IV+).

That’s qualitatively different from most end-to-end H&E predictors: it’s a mechanistic narrative you could build assays around (real CODEX/IHC panels; targeted spatial assays), not just “the model saw something.”
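As a rough illustration of what a dual-marker co-localization signature could look like computationally, the sketch below counts the fraction of positions in a tile where two markers are both above a threshold. The grid layout, the 0.5 thresholds, and the function name `colocalization_fraction` are assumptions made for this example; the paper's actual definition may differ.

```python
def colocalization_fraction(marker_a, marker_b, thr_a=0.5, thr_b=0.5):
    """Fraction of grid positions where both markers exceed their thresholds.

    marker_a, marker_b: equally shaped 2D lists of normalized intensities
    for one tile (for example, predicted CD8 and granzyme B maps).
    """
    total = 0
    both = 0
    for row_a, row_b in zip(marker_a, marker_b):
        for a, b in zip(row_a, row_b):
            total += 1
            if a > thr_a and b > thr_b:
                both += 1
    if total == 0:
        raise ValueError("empty tile")
    return both / total

# Toy 2x3 tile, invented for illustration only.
cd8 = [[0.9, 0.1, 0.7],
       [0.2, 0.8, 0.6]]
granzyme_b = [[0.8, 0.9, 0.2],
              [0.1, 0.7, 0.9]]

frac = colocalization_fraction(cd8, granzyme_b)
print(frac)
```

A per-tile value like this could then be compared between low-risk and high-risk tiles, which is the kind of enrichment analysis the text describes.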

What Owkin RlapsRisk is, in one sentence

RlapsRisk BC (Garberis et al., Nat Comms 2025; and the Gaury medRxiv 2025 validation) is a direct prognostic model trained on breast cancer WSIs to predict survival endpoints (metastasis-free / distant recurrence), and then optionally combined with clinical variables to improve risk stratification—especially in intermediate-risk groups.

Strength: classic clinical validation posture

Owkin emphasizes the orthodox structure of clinical validity: independent external validation, prespecified fitting before validation, and measurable incremental value over clinical models.

  • In the Nature Communications paper, the combined model (“RR Combined”) achieved Harrell’s C-index 0.81 vs 0.76 for the clinical score alone in external validation, with greater gains in intermediate clinical risk.

  • The medRxiv international validation reports external cohort performance (e.g., pooled C-index 0.78 with strong HR separation between high- and low-risk groups).
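Since the C-index carries most of the weight in these claims, here is a minimal sketch of Harrell's C-index for right-censored survival data: among comparable patient pairs, the share in which the patient with the earlier observed event also has the higher risk score. The follow-up times, event flags, and risk scores are invented toy values, and `harrell_c_index` is a hypothetical helper, not Owkin's implementation.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    times:  follow-up time per patient
    events: 1 if the event (e.g., distant recurrence) was observed, 0 if censored
    risks:  model risk score per patient (higher means worse predicted outcome)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort of four patients, invented for illustration only.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = harrell_c_index(times, events, risks)

# Relative gain implied by the figures quoted above (0.81 vs 0.76).
relative_gain = (0.81 - 0.76) / 0.76
print(c_index, relative_gain)
```

Note that a value of 0.5 corresponds to random ordering, so the step from 0.76 to 0.81 is best read against that floor rather than against zero.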

What it is not trying to do

RlapsRisk is not presenting a “virtual biology reconstruction” layer like HEX. It’s much closer to: H&E → risk score, with clinical integration, and validation across sites/scanners. That is arguably more straightforward to operationalize as a lab-developed algorithmic test because it doesn’t require a second derived modality (virtual spatial proteomics) to justify itself.

Head-to-head comparison: what’s the real delta?

1) Endpoint and near-term “actionability”

  • Owkin (RlapsRisk): prognosis in early breast cancer—actionability hinges on whether it changes adjuvant therapy decisions versus existing molecular assays/clinical tools. It positions itself as adding incremental stratification beyond clinical variables (and as a possible complement/competitor to molecular scores).

  • Li (HEX/MICA): the bold claim is predicting ICI response (and outperforming PD-L1/TMB in their cohort). If that holds up, actionability is immediate: treat vs don’t treat / choose regimen / enroll in trial.

Bottom line: Li is aiming at a decision point with very high clinical and commercial value (ICI selection), but that also raises the bar for proof.

2) “Is it biology or just correlations?”

  • Owkin: credibility comes from scale, external validation, and survival endpoint supervision. Mechanistic interpretability is less central.

  • Li: credibility comes from an additional step—reconstructing spatial protein patterns from H&E with strong quantitative performance, then showing that the risk/response signal maps onto plausible immune-stromal niches.

My read: Li’s approach is more convincing on “not just a black box,” because it creates a testable biology story. But it also introduces more moving parts (HEX quality, generalizability of CODEX ground truth, domain shift).

3) Generalizability claims

Li explicitly frames a broad evaluation: five NSCLC cohorts (2,150 patients), 12 other cancer types (5,019 patients), plus the 148-patient ICI cohort.
Owkin’s validation posture is also multi-cohort and international; the medRxiv paper describes large development and external validation sets (thousands for development; ~900 for external validation).

Difference: Owkin’s generalization is primarily about risk prediction portability across scanners/sites. Li’s is about portability of a learned cross-modality mapping (H&E → proteins) plus downstream prediction. That’s a tougher problem—but if solved, it’s more “platform-like.”

If you were thinking about “clinical test” reality

Owkin-style test (RlapsRisk)

Simpler product concept: upload H&E WSI → produce a validated risk score ± integrated clinical score. You need: WSI QC, model lock, calibration, cutoffs, and clear clinical-use labeling (e.g., ER+/HER2–, endocrine-only subgroup, etc.). Their papers already speak the language of C-index, HRs, external validation.

Li-style test (HEX/MICA)

More ambitious product concept: WSI → response probability + a biologically legible report (virtual spatial proteomics patterns consistent with inflamed vs excluded vs myeloid/fibrotic TME). They even show specific spatial dual-marker patterns linked to survival/biology.

But: for clinical adoption, the key question is whether “virtual CODEX” is treated as (a) an internal latent representation (fine), or (b) a claim that you are effectively measuring protein expression (which invites a higher evidentiary bar and likely demands orthogonal confirmation in a clinical workflow).

My “actionable vs hype” verdict

  • Owkin RlapsRisk: more immediately actionable as a deployable digital pathology prognostic test, because it is architecturally simpler and the validation narrative is conventional. The main risk is clinical utility: does it change management compared with Oncotype DX or EPclin plus clinicopathologic factors?

  • Li HEX/MICA: higher upside, higher proof burden. The immunotherapy result versus PD-L1/TMB is the kind of thing that could be transformative—but until you see prospective validation and careful handling of pre-analytic/site effects, it remains “promising, not practice-changing.” The biological interpretability is a genuine step forward, not just garnish.


###

ANDANI NATURE MACH INTEL 2025


When placed alongside the Stanford HEX work from Li and the Owkin RlapsRisk studies, the Andani paper (HistoPlexer) occupies a distinct and foundational position in the emerging field of “H&E as molecular data.” All three efforts start from the same premise, that routine histopathology contains far more biological information than pathologists can visually extract, but they diverge in what they treat as the primary objective.

Owkin’s RlapsRisk models are fundamentally endpoint-driven. They are trained directly to predict distant recurrence or survival from whole-slide H&E images, sometimes combined with basic clinical variables. Their contribution lies in demonstrating that morphology alone can support robust, externally validated prognostic stratification at scale. Biology is implicit: the model learns patterns associated with outcome, but it does not attempt to reconstruct molecular states or spatial cell phenotypes.

Li’s HEX/MICA framework takes a step deeper. It inserts a biological intermediate layer by predicting virtual spatial proteomics maps from H&E and then using these inferred protein patterns, together with morphology, to predict prognosis and immunotherapy response in lung cancer. Here, virtual molecular information becomes part of the predictive engine, and clinical utility—especially treatment response prediction—is central.

Andani’s HistoPlexer, in contrast, is not primarily a prognostic or treatment-response model at all. Its core aim is to show that multiplex spatial protein imaging can be computationally generated from H&E with high fidelity, preserving not only pixel-level appearance but also biologically meaningful spatial co-localization patterns among tumor and immune markers. The emphasis is on image realism, cross-marker consistency, and maintenance of tumor–microenvironment architecture, validated against real multiplex imaging and expert review. Clinical prediction appears as a downstream use case, but not the central claim.

What is most important in Andani, therefore, is not a particular survival AUC or hazard ratio. It is the demonstration that H&E can function as a surrogate molecular imaging modality, capable of producing slide-wide virtual multiplex protein maps that retain the spatial structure of immune infiltration and tumor organization. This establishes a platform capability: turning existing histology archives into scalable spatial biology resources. In the conceptual hierarchy of the field, Owkin shows that H&E can predict outcomes, Li shows that virtual molecular layers can improve those predictions, and Andani provides the deeper foundation: that faithful, multiplex molecular reconstruction from morphology is scientifically plausible at all.