Li et al. take lung cancer slides with H&E WSI. They generate a metalayer of data which they refer to as proteomic layers. However, this is not the clinical output. They combine and weight the "metalayers" into new algorithms that are highly predictive of important factors like drug response or relapse risk. My question is this: in principle, couldn't this be done by powerful ML-AI without the metalayer? All the info must be present in the "bottom layer" (slide) to lead to the "top layer" (e.g. relapse risk).
###
Yes — in principle, a sufficiently powerful ML/AI system could learn to predict outcomes like relapse risk or drug response directly from the raw H&E whole-slide image without constructing an explicit intermediate “proteomic” metalayer. From an information-theoretic standpoint, the metalayer does not add new information: it is a deterministic transformation of the slide, and any signal present in the metalayer must already be encoded in the morphology of the tissue itself. If an outcome can be predicted from the metalayer, then it must also be predictable, at least in theory, from the original image.
However, the decision to introduce metalayers is a pragmatic architectural choice rather than a conceptual necessity. Mapping directly from pixels to complex clinical outcomes is extremely data-hungry and unstable in real biomedical settings, where sample sizes are limited and outcomes are noisy. By forcing the model to first infer biologically meaningful intermediate representations—patterns meant to approximate underlying proteomic or pathway activity—the learning problem is constrained and made more sample-efficient. The metalayer acts as an inductive bias, narrowing the hypothesis space and helping the model focus on signals that are more likely to be biologically relevant rather than cohort-specific or artifactual correlations.
Metalayers also help disentangle weak and interacting signals that contribute to outcomes such as relapse or treatment response, including tumor–stroma relationships, immune contexture, and spatial organization of cell states. An end-to-end model can, in principle, discover these abstractions on its own, but in practice it often latches onto brittle shortcuts that fail to generalize across institutions, scanners, or patient populations. By structuring the model around intermediate representations aligned with known biology, Li et al. improve robustness and transferability, even if those intermediates are never themselves the clinical output.
In short, the metalayer is best understood not as an informational requirement but as a coordinate system chosen to make learning feasible, stable, and biologically grounded with finite data. The final predictions could theoretically be learned directly from the slide, but the intermediate layers help ensure that what the model learns is meaningful, generalizable, and usable in a clinical and regulatory context.
###
Yes — in principle, you’re exactly right.
If the only goal were to predict a top-level clinical outcome (relapse risk, drug response), a sufficiently powerful end-to-end ML model could learn that mapping directly from the raw H&E whole-slide image without ever constructing an explicit “proteomic metalayer.”
But the fact that Li et al. do construct metalayers is not accidental. It’s a design choice, not a necessity. Here’s how to think about it.
###
Below are the bullet-point notes that Chat first gave me. Below that, Chat compares Li and Andani.
1. Information-theoretic view: the metalayer is not adding new information
From a strict information perspective:
The H&E slide is the bottom layer
The “proteomic” metalayers are deterministic functions of the slide
The final clinical prediction is another function on top
So yes:
If outcome Y is predictable from metalayer M, and M is derivable from slide S, then Y is predictable from S.
Nothing magical is being injected. The metalayer is a reparameterization, not a new data source.
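To put that composition claim in symbols (notation introduced here for illustration, not taken from the paper):

```latex
% Illustrative notation: S = slide, M = metalayer, Y = clinical outcome.
% If the metalayer is deterministic, M = f(S), and the prediction is \hat{Y} = g(M), then
\[
  \hat{Y} = g(M) = g\bigl(f(S)\bigr) = (g \circ f)(S),
\]
% so the prediction is already a function of the slide alone. The data-processing
% inequality makes the same point about information content:
\[
  I(M; Y) \;\le\; I(S; Y).
\]
```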
This is the same logic that says:
You don’t need edge detectors in CNNs — the network can learn them
You don’t need phonemes — speech models can go waveform → text
2. So why bother with metalayers at all?
Because learning directly from pixels to outcomes is possible but often suboptimal in real biomedical settings.
Think of the metalayer as an inductive bias.
A. Sample efficiency
End-to-end learning from raw WSI to survival or drug response is brutally data-hungry.
Metalayers:
Compress the slide into biologically meaningful intermediate variables
Reduce the hypothesis space the model has to search
Let you train with hundreds or low thousands of patients instead of tens of thousands
This matters a lot in oncology, where outcomes are sparse and noisy.
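A minimal sketch of the sample-efficiency point, in PyTorch (the size K and the layer widths are assumptions for illustration, not values from Li et al.): the clinical model that sits on top of K inferred metalayer scores has far fewer parameters to estimate from the same scarce outcome labels than a pixels-to-outcome network.

```python
# Minimal sketch (illustrative sizes, not the paper's architecture).
import torch.nn as nn

K = 50  # number of inferred "metalayer" channels (assumption)

# Stage 2 only: the clinical model sees K pathway-like scores per case,
# so its hypothesis space is small and it can be fit on hundreds of patients.
clinical_head = nn.Sequential(
    nn.Linear(K, 16),
    nn.ReLU(),
    nn.Linear(16, 1),  # e.g. a relapse-risk logit
)

# End-to-end alternative: tile pixels straight to outcome; far more parameters
# must be estimated from the same limited number of labeled outcomes.
end_to_end = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 1),
)

n_head = sum(p.numel() for p in clinical_head.parameters())
n_e2e = sum(p.numel() for p in end_to_end.parameters())
print(f"clinical-head params: {n_head}, end-to-end params: {n_e2e}")
```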
B. Signal disentanglement
Clinical outcomes are affected by many weak, entangled signals:
tumor–stroma interactions
immune infiltration patterns
spatial colocalization effects
cellular state mixtures
A metalayer forces the model to:
Separate morphology → inferred molecular programs
Then reason about how those programs interact
Without this, an end-to-end model may:
Learn brittle shortcuts
Overfit cohort-specific artifacts
Lock onto confounders that correlate with outcome but aren’t causal
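A toy illustration of that point (dummy arrays, not the paper's method): once per-case scores for the inferred programs exist, an interaction between them can be written down explicitly instead of being left for a pixel model to rediscover.

```python
# Toy example with dummy data; variable names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
tumor_score = rng.random(300)    # inferred tumor-program activity per case
immune_score = rng.random(300)   # inferred immune-infiltration activity per case

# An explicit tumor x immune interaction term, usable by any downstream model.
features = np.column_stack([
    tumor_score,
    immune_score,
    tumor_score * immune_score,
])
print(features.shape)  # (300, 3)
```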
C. Biological alignment and transferability
This is the quiet but important part.
Metalayers:
Are closer to known biology
Can be reused across tasks (drug response, relapse, subtype; see the sketch at the end of this subsection)
Can generalize better across institutions and scanners
A raw-pixel → outcome model may work beautifully on Dataset A
…and then collapse on Dataset B.
The metalayer acts like a biological lingua franca.
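A sketch of that reuse, with assumed names and sizes (not Li et al.'s code): the same K metalayer scores per case feed several independent task heads, so the shared asset is the representation rather than the raw slide.

```python
# Illustrative only: one shared metalayer representation, several task heads.
import torch
import torch.nn as nn

K = 50  # inferred pathway-like channels (assumption)

heads = nn.ModuleDict({
    "relapse_risk":  nn.Linear(K, 1),
    "drug_response": nn.Linear(K, 1),
    "subtype":       nn.Linear(K, 4),  # four hypothetical subtypes
})

meta_scores = torch.randn(8, K)  # a batch of 8 cases (dummy data)
outputs = {task: head(meta_scores) for task, head in heads.items()}
print({task: tuple(out.shape) for task, out in outputs.items()})
```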
3. Interpretability is not the main reason — but it helps
People often say metalayers are about interpretability. That’s partly true, but it’s not the deepest reason.
The deeper reason is:
They constrain the model to learn the right kind of abstraction.
Interpretability is a side benefit:
You can ask which inferred pathways mattered
You can compare them to real proteomics
You can sanity-check failure modes
For regulators, payors, and clinicians, this matters — even if, mathematically, it’s unnecessary.
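A sanity check along those lines might look like the following (dummy arrays and a plain logistic regression, purely illustrative): fit a simple model on the inferred channels, see which ones carry the outcome signal, and compare the top channels against measured proteomics where it exists.

```python
# Illustrative sanity check with dummy data; not the paper's analysis.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((300, 50))     # 300 cases x 50 inferred metalayer channels
y = rng.integers(0, 2, 300)   # relapse yes/no labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
top = np.argsort(np.abs(clf.coef_[0]))[::-1][:5]
print("most influential metalayer channels:", top)
```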
4. A useful analogy
Think of it like this:
End-to-end model:
“Given a novel, predict the reader’s emotional response directly from the ink patterns on the page.”
Metalayer approach:
Ink → letters → words → syntax → themes → emotional response
Yes, the first is theoretically possible.
The second is how you make it work reliably with finite data and humans in the loop.
5. Bottom line
You’re correct in principle:
All the information needed to predict relapse or drug response must already be present in the H&E slide.
But Li et al.’s metalayers are:
Not redundant
Not cosmetic
Not just for explanation
They are a pragmatic architectural scaffold that:
Improves learning efficiency
Stabilizes generalization
Anchors the model in biology rather than texture statistics
If you wanted to be very sharp about it, you could say:
The metalayer is not about adding information — it’s about choosing the right coordinate system in which learning becomes tractable.
Which, frankly, is a very pathologist-friendly idea.
###
Both Andani et al. and Li et al. start from the same premise—H&E whole-slide images contain latent molecular information—but they differ in how explicitly they structure that information and what role intermediate representations play.
Li et al. use a deliberate, explicit meta-layer strategy. They first infer multiple spatially resolved “proteomic” or pathway-like layers from H&E (their so-called metalayers), and only then combine and weight those layers to predict clinically relevant endpoints such as relapse risk or drug response. The metalayers are not the clinical output; they are intermediate, biologically framed representations that constrain learning, improve sample efficiency, and provide a scaffold for downstream prediction. Conceptually, Li imposes a two-stage hierarchy: morphology → inferred molecular programs → clinical outcome.
Andani et al., by contrast, use a latent but implicit middle layer. Their model learns internal feature embeddings that correlate with protein expression or molecular states, but these representations are not formalized as named, interpretable proteomic layers that are recombined downstream. The network is trained more end-to-end, with protein or molecular prediction emerging from learned feature space rather than from an explicitly constructed meta-layer that is later reused for separate clinical tasks.
In short, both approaches rely on intermediate representations, but only Li et al. elevate those intermediates into an explicit, modular meta-layer that becomes a reusable object for clinical modeling. Andani’s “middle layer” exists architecturally inside the network, while Li’s metalayer exists conceptually and analytically as a distinct step between image and outcome.
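A schematic way to see the contrast (illustrative layer sizes, not either group's actual architecture): in the Li-style setup the intermediate is an explicit object that a separate clinical model consumes, whereas in the Andani-style setup a comparable layer exists inside one network but is never named, exposed, or reused.

```python
# Schematic contrast only; both pipelines are hypothetical stand-ins.
import torch.nn as nn

# Li-style: an explicit, named intermediate that a separate model reuses.
he_to_metalayer = nn.Linear(512, 50)     # image features -> 50 pathway-like scores
metalayer_to_outcome = nn.Linear(50, 1)  # separate clinical model on those scores

# Andani-style: the "middle layer" exists architecturally, but only as a hidden
# layer inside one end-to-end network.
end_to_end = nn.Sequential(
    nn.Linear(512, 50),
    nn.ReLU(),
    nn.Linear(50, 1),
)
```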