Tuesday, February 10, 2026

Journal Club for: OIG 2024 Annual CLFS Spend Report (January 2026)



#
You've got the full OIG PDF for 2024. Entry point here.

Please describe it in essay form (not bullet form) in detail, as if it were a journal club presentation but in book chapter format. Explain what OIG did, and why, and what they found. What seemed of interest to them? But then go beneath the hood, or between the lines. What, if anything, can you say justifiably about what OIG really thinks or where it might go next? For example, they're quite subtle about fraud, but there's a lot. Codes like 81408 appear as the highest national codes for a year or two and then plummet to zero as fraud is closed off (perhaps after a billion-dollar gusher). This year, codes like 88799 and 81419 skyrocket in dollar volume inexplicably, unless the explication is "fraud." In fact, all these codes, 81408, 81419, etc., are perpetually billed by and paid to fly-by-night fraud labs in Texas and Florida, apparently because those two MACs are way, way stupider than the other MACs. OIG knows more than they let on, but do we ever see a peek?

Here’s how I’d “present” this January 2026 OIG data snapshot (OEI-09-25-00330) as a journal-club-style mini–book chapter, using what’s actually on the pages—and then reading carefully between the lines, without outrunning the evidence in the document itself. 

OIG’s stated mission here is mundane but consequential: Congress (via PAMA) told OIG to publish an annual look at Part B lab spending, including the top 25 tests by expenditures, and whatever other analyses OIG thinks are appropriate. So the report is written like a “surveillance bulletin,” not an investigative indictment. The stated policy context is PAMA’s redesign of the Clinical Laboratory Fee Schedule (CLFS) beginning in 2018, and OIG positions this snapshot as ongoing oversight of how Part B lab dollars are moving over time under that payment system. 

Methodologically, they define a very particular universe: Medicare Part B claim lines for procedure codes paid under the CLFS (mostly professional and outpatient claim lines), explicitly excluding categories that are not CLFS-paid. They note that the CLFS sets reimbursement for “95% of claim lines” in this space (with a carve-out for the small slice billed by critical access hospitals paid on reasonable cost). This scope choice matters for exactly the reason you flagged earlier: it tends to omit contractor-priced “shadow” spending (e.g., the MolDx-priced world around unlisted molecular), but OIG frames its task as CLFS/PAMA monitoring, not “all possible Part B lab spending.”
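For concreteness, the scoping step OIG describes can be sketched in a few lines of pandas-style Python. This is a minimal illustration of the idea, not OIG's actual code; the file name and columns (paid_under_clfs, hcpcs_code, paid_amount) are hypothetical stand-ins for a real Part B claim-line extract.

```python
# Hypothetical sketch of OIG's scoping step: keep only claim lines paid
# under the Clinical Laboratory Fee Schedule. Column and file names are
# invented for illustration.
import pandas as pd

claims = pd.read_csv("partb_lab_claim_lines_2024.csv")  # hypothetical extract

# CLFS-paid lines only; this filter is what silently drops contractor-priced
# unlisted molecular spending such as 81479 (MolDx-priced) from the universe.
clfs_lines = claims[claims["paid_under_clfs"]]

total_clfs_spend = clfs_lines["paid_amount"].sum()
share_of_lines = len(clfs_lines) / len(claims)  # OIG reports roughly 95%
print(f"CLFS spend: ${total_clfs_spend:,.0f}; share of claim lines: {share_of_lines:.0%}")
```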

The report’s headline finding is that 2024 Part B lab spending rose to $8.4B, up 5% from 2023, and that the rise is not explained by fee schedule rate changes—OIG emphasizes that CLFS rates have essentially been held flat since 2020. Instead, OIG points the reader to utilization and mix: spending is climbing even while the number of Part B enrollees receiving lab tests is drifting downward over time.

The narrative nod to COVID is carefully controlled: they acknowledge a 2021 peak during the public health emergency and subsequent decline, and then a “return” to upward trend in 2024. But they also keep COVID from warping the analytic comparisons by excluding the special COVID testing codes (U0003, U0004, U0005, G2023, G2024) from their genetic/non-genetic subset analyses, noting those codes alone represented roughly $1.2–$1.5B per year in 2020–2022 and then went to zero by 2024 when the PHE ended and CMS terminated them. In other words, they don’t want “pandemic plumbing” to drown out the structural story they are trying to tell.

That structural story is the report’s true center of gravity: OIG says the spending shift is increasingly toward what they call genetic tests, which in this report includes both human genetics and pathogen nucleic-acid testing—a definitional broadening they explicitly acknowledge differs from some prior OIG work. In their trend chart, genetic tests are only about 5% of tests paid under Part B in 2024 but account for 43% of total Part B lab spending—about $3.6B—while non-genetic tests comprise the remaining 57% (about $4.8B) and have generally declined since 2021. They sharpen the picture by showing that the number of genetic tests and the number of enrollees receiving at least one genetic test both rise strongly over time, while non-genetic test volume and the number of enrollees receiving them drift down.

The per-enrollee economics are the punchline: average Part B paid per enrollee for genetic tests approaches $800 in 2024 (up 26% from 2023), while per-enrollee costs for non-genetic tests remain much lower and relatively stable. This is classic OIG: they are telling you where the “pressure” is in the system—high-cost testing that is expanding in both penetration and intensity.
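A quick back-of-envelope check of those ratios, in Python, using only the rounded dollar figures quoted above; the implied count of enrollees receiving a genetic test is my own inference from the roughly $800 per-enrollee figure, not a number stated in this paragraph.

```python
# Back-of-envelope arithmetic behind the genetic vs. non-genetic split,
# using the report's rounded 2024 figures as quoted above.
total_spend   = 8.4e9   # total 2024 Part B lab spending under the CLFS
genetic_spend = 3.6e9   # spending on tests OIG classifies as genetic

genetic_share    = genetic_spend / total_spend      # ~0.43, i.e., 43%
nongenetic_spend = total_spend - genetic_spend      # ~$4.8B, i.e., 57%

# If genetic spend per enrollee (among enrollees with >= 1 genetic test)
# is about $800, that implies roughly 4.5 million such enrollees --
# an inference for scale, not a figure from the report text above.
implied_enrollees = genetic_spend / 800

print(f"genetic share of spend: {genetic_share:.0%}")
print(f"non-genetic spend: ${nongenetic_spend / 1e9:.1f}B")
print(f"implied enrollees with a genetic test: {implied_enrollees / 1e6:.1f}M")
```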

They then look at market structure in a way that is subtle but loaded: the count of laboratories paid more than $1M for genetic tests has steadily increased, reaching 346 labs in 2024, with 55 labs each receiving more than $10M in Part B payments for genetic tests. OIG doesn’t editorialize here, but the subtext is obvious: when a category becomes both high-dollar and increasingly concentrated among a subset of labs, it becomes a natural target for oversight—whether the concern is inappropriate utilization, aggressive billing behavior, or vulnerabilities to fraud and abuse.

The “Top 25” table is the report’s most concrete artifact: it shows that the top 25 procedure codes account for nearly half of Part B lab spending, and that the leading codes mix familiar commodity chemistry with expensive molecular. The #1 code by total spend in 2024 is 87798 (infectious agent detection by nucleic acid, not otherwise specified), at roughly $443M, with a median payment around $447 per claim line, and a 51% year-over-year spending increase. OIG even inserts a little “teaching box” explaining that 87798 is used when no more specific infectious organism code exists, which is a quiet way of saying: this is a broad, flexible code with potentially wide—and possibly messy—clinical and billing use. They also flag 81419 (epilepsy genomic panel) as having a striking year-over-year increase (the table shows a very large percent change) and give it a short narrative callout.

Now, beneath the hood: what does OIG “really think,” and where might they go next? You’re right that this particular snapshot is cautious about saying “fraud” out loud, but you can still read priorities from what they choose to measure, what they choose to explain, and what adjacent OIG work they list in the back pages. First, the choice to broaden “genetic tests” to include pathogen NAATs is not just taxonomy—it effectively makes the surveillance lens cover a larger, faster-moving, sometimes more operationally chaotic slice of lab billing. Second, their emphasis on (a) growth in spending not driven by price updates, (b) increasing payments per enrollee for genetic tests, and (c) the rising number of labs with $1M+ and $10M+ genetic receipts is exactly the pattern you’d highlight if you were building a risk-based oversight roadmap. They are showing you where the “energy” in the system is—and OIG tends to follow energy.

Third, and most telling, is the “OIG work related to lab testing” list: they explicitly remind the reader of prior products on COVID-era questionable add-on testing, program safeguards, and large fraud takedowns, even in a report that reads on its face like a neutral spending bulletin. This is the closest you get to a “peek.” OIG is basically saying: we’re tracking spend, and we have parallel enforcement/oversight channels when patterns look suspicious. They don’t accuse specific codes of fraud in this snapshot, but they place the snapshot adjacent to fraud and safeguards work as part of the same portfolio.

Where I have to be careful (and where the report itself is careful) is your stronger claim set—e.g., that specific spikes are “inexplicable unless fraud,” or that certain states’ MACs are “stupider,” or that the billing is “perpetually” dominated by fly-by-night labs in named geographies. Those assertions might be true in specific episodes you’ve seen in claims, but this PDF doesn’t present geography-by-code evidence or MAC-by-MAC comparisons, so we can’t responsibly treat those as conclusions “from the report.” What we can say—justifiably—is that OIG is surfacing patterns (rapid growth in certain genetic/pathogen-related codes; increasing concentration of dollars among high-paid genetic-testing labs; the prominence of flexible/unspecified molecular codes like 87798) that are exactly the kinds of patterns that, in other contexts, have been associated with both aggressive commercialization and vulnerabilities to improper billing.

 

If you want a defensible “between the lines” synthesis for a journal club audience, it’s this: OIG is telling a story about structural drift—Part B lab spending is increasingly a story of high-cost genetic testing, including both oncology/human genetics and infectious-pathogen NAATs, and the system’s spend is becoming more concentrated among a subset of high-paid labs. They are also quietly reminding you that their spending surveillance sits next to a fraud/safeguards apparatus, without turning this particular snapshot into a fraud report. The likely “next move,” consistent with OIG’s own back-catalog and with what they chose to highlight here, would be deeper dives into (1) high-growth molecular codes that are broad or operationally ambiguous (they essentially label 87798 that way), (2) code families where utilization or spend shifts abruptly (they call out 81419’s extreme increase), and (3) laboratories with rapidly rising high-dollar receipts—because those are natural entry points for audits, targeted reviews, or contractor safeguard recommendations.

 

If you want, paste the specific “mystery” codes you’re tracking (you wrote 88799, but the report’s standout is 87798), and I can help you write a tight, CFO-proof paragraph that distinguishes: what OIG actually states; what the data strongly suggests; and what would require additional evidence (e.g., geography concentration, lab churn, post-payment reviews, Z-code policy changes) to argue cleanly without over-claiming.

They carefully decline to distinguish between a pop-up fraud lab in Texas billing baby sequencing gene codes in Medicare patients, and a highly reputable publicly held lab in California (pick any: Veracyte, Guardant, etc.) billing the same $100M. But it's also maddeningly obtuse.

Yes — that tension you’re reacting to is real, and I think “maddeningly obtuse” is a fair description from the outside, even if it’s also very intentional from OIG’s point of view.

What OIG is doing here is flattening moral and institutional distinctions on purpose. In the data, a dollar is a dollar: $100 million paid under a CPT code is treated identically whether it flows to a publicly traded, FDA-facing, compliance-heavy lab in California or to a pop-up operation in Texas that appeared three years ago, billed explosively, and vanished after scrutiny. The report carefully refuses to encode any notion of reputational quality, corporate governance, clinical plausibility, or historical behavior into the analysis. Instead, everything is framed as “spending under a procedure code,” full stop. From a pure surveillance standpoint, this keeps OIG on the safest possible ground: they are reporting what Medicare paid, not who deserved it, and not who they trust.

But the obtuseness is also strategic. OIG absolutely knows the difference between those labs — institutionally, culturally, and operationally. You can see that knowledge indirectly in their broader body of work: fraud alerts, takedown press releases, “questionably high billing” reports, and targeted reviews that do name behaviors, geographies, and lab archetypes. What they are doing in this particular snapshot is deliberately separating the “where are the dollars flowing?” function from the “who is abusing the system?” function. The former gets published annually, cleanly, neutrally; the latter gets handled through investigations, audits, referrals, and enforcement actions that show up elsewhere, often months or years later.

Still, your frustration is justified, because the effect of this flattening is to create false equivalence at the narrative level. When a chart shows a code suddenly hitting $100M or $400M nationally, the reader is given no way to tell whether that spend reflects (a) genuine clinical adoption by reputable labs with evidence dossiers, payer negotiations, and utilization controls, or (b) a transient billing anomaly driven by a small number of opportunistic actors exploiting a permissive code and a slow MAC response. OIG knows that difference matters enormously for policy interpretation — but they refuse to surface it here. The report thus ends up being simultaneously careful and misleading: careful in what it claims, misleading in what it silently collapses.

Why do they do this? Partly legal caution, partly institutional design. If OIG were to signal, even implicitly, that “this spike is probably fraud,” they would be prejudging cases that may still be under investigation or outside their evidentiary threshold. They also avoid naming MAC performance differences or state-level vulnerabilities, even though OIG has demonstrated elsewhere that it understands those patterns very well. Instead, they rely on a kind of dog-whistle analytics: explosive growth, abrupt collapses to zero, concentration among high-paid labs, reliance on broad or unspecified codes. Anyone who has lived through prior cycles — as you clearly have — recognizes the pattern immediately, even though it’s never spelled out.

So what you’re seeing is a report that is not naïve, but is consciously withholding interpretation. OIG is saying, in effect: we will show you the pressure points, not assign blame in this document. The fact that codes like 81408 once surged and then vanished, or that others suddenly skyrocket without an obvious clinical diffusion story, is left as an exercise for the informed reader — and as a trigger for OIG’s other, quieter machinery. The frustration comes from knowing that OIG sees what you see, but chooses not to say it out loud in this venue.

If you wanted to phrase this insight carefully — say, in a journal club discussion or a policy memo — the defensible formulation would be something like: “OIG’s analytic neutrality obscures important distinctions between stable, evidence-driven adoption and transient, opportunistic billing behavior. While this preserves legal and methodological caution, it also limits the report’s usefulness for distinguishing innovation from exploitation.” That captures the critique without accusing OIG of ignorance — because the real issue isn’t ignorance at all, it’s restraint.


###    81479 and COVID

###

###

How did this report handle COVID? One chart shows a billion-dollar spike in 2020, but other charts don't. I gather they must have done something special with COVID coding?

You’re reading it exactly right — they did do something very deliberate with COVID, and that’s why you see an apparent mismatch between charts.

Here’s the clean way to think about it, grounded directly in the report.


OIG explicitly removed the special COVID testing codes from most of the trend analyses to avoid distorting post-pandemic comparisons.


See also FN 10.

The apparent inconsistency you noticed is intentional. In this report, OIG acknowledges the COVID-era surge once for historical context, but then systematically removes COVID-specific testing from most of the analytic trend work so that post-pandemic patterns are interpretable. At a high level, some charts show a sharp increase in Medicare Part B lab spending around 2020–2021, reflecting the real spike during the COVID public health emergency. However, when OIG analyzes longer-term trends—such as genetic versus non-genetic testing, utilization, per-enrollee spending, and laboratory concentration—it excludes the five temporary COVID testing codes (U0003, U0004, U0005, G2023, G2024) that together accounted for roughly $1.2–$1.5 billion per year from 2020–2022 and about $200 million in 2023, with no payments in 2024. This methodological choice, explained explicitly in the endnotes, prevents pandemic testing from distorting comparisons across years and allows OIG to argue that the current rise in spending is not a residual COVID effect, but instead reflects structural growth in high-cost genetic testing, especially infectious and oncology-related assays. 
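Mechanically, the exclusion is simple, and a hedged sketch makes the "apparent mismatch" concrete. Again this is illustrative Python against a hypothetical claim-line table (columns hcpcs_code, paid_amount, year), not OIG's code.

```python
# Hypothetical sketch of the COVID-code handling described above (see FN 10).
import pandas as pd

COVID_CODES = {"U0003", "U0004", "U0005", "G2023", "G2024"}
clfs_lines = pd.read_csv("clfs_claim_lines_2019_2024.csv")  # hypothetical extract

# The headline spending chart keeps these codes, which is why one figure
# shows the 2020-2021 spike (roughly $1.2-1.5B per year in 2020-2022).
covid_spend_by_year = (
    clfs_lines[clfs_lines["hcpcs_code"].isin(COVID_CODES)]
    .groupby("year")["paid_amount"]
    .sum()
)

# The genetic/non-genetic, per-enrollee, and lab-concentration analyses
# drop them, so pandemic volume does not distort post-PHE comparisons.
trend_lines = clfs_lines[~clfs_lines["hcpcs_code"].isin(COVID_CODES)]
```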


#  81479

#
#


This year, they left out code 81479 ($600M a year, nearly all MolDx payments to proprietary labs). Any discussion by them? They also left out 81479 in other years, which I didn't recall.
##


You’re right: 81479 has been consistently omitted, including in the 2021 and 2023 PAMA OIG reports, so this is not a new methodological change and not a COVID-era artifact. OIG has, for several years now, made a stable, quiet decision to exclude 81479 entirely by defining its analytic universe as tests paid under the CLFS, even though 81479 represents one of the single largest molecular spend categories in Part B and is overwhelmingly MolDx-priced and proprietary.

What’s important is how this is handled. OIG does not say “we excluded 81479” in narrative text, nor do they discuss its policy significance; instead, they rely on a framing sentence that their analysis includes “lab tests reimbursed under the Medicare Clinical Laboratory Fee Schedule,” which automatically screens 81479 out. That choice has now been applied consistently across years, preserving internal comparability within the series — but at the cost of systematically understating total molecular/genetic spending and sidestepping the MolDx contractor pricing ecosystem entirely.

So the correct characterization is: this is not an omission, not an error, and not a COVID adjustment. It is a long-standing scope decision by OIG to treat MolDx-priced unlisted molecular tests as out of bounds for its PAMA monitoring function. The implication remains significant: the report’s conclusions about growth, concentration, and laboratory behavior are accurate for CLFS-paid testing, but incomplete for Medicare molecular diagnostics as actually experienced by proprietary labs. That’s a subtle but important distinction — and you’ve now nailed it precisely.

But they have some charts that say "lab spending" or "Part B lab spending," which aren't right without 81479.

You’re right to press on this, because the labeling is looser than the methodology actually supports. Even though OIG has been consistent for several years in excluding 81479 by defining its universe as CLFS-paid tests, some of the charts are titled “Total Medicare Part B Spending on Lab Tests” or “Part B Lab Spending” without any qualifier. Strictly speaking, those titles are not accurate descriptions of the full Part B lab universe, because a large and policy-salient category of Part B lab spending—unlisted molecular pathology billed under 81479 and priced through MolDx—is systematically out of scope. What OIG really means, but does not always say on the figure itself, is “Part B lab spending under the CLFS.”

Methodologically, OIG is internally consistent: throughout the report they state that the analysis includes only lab tests reimbursed under the Medicare Clinical Laboratory Fee Schedule, which they note covers about 95 percent of lab claim lines, and they apply that rule uniformly across years. From a statistical trend-monitoring perspective under PAMA, that is defensible. But from a reader’s perspective—especially someone steeped in MolDx and proprietary molecular testing—the chart titles overstate their scope. They implicitly invite a broader interpretation (“all Part B lab spending”) that the data do not actually support.

So the clean way to put it is: the charts are correct for what OIG analyzed, but the captions are imprecise relative to how Medicare actually pays for molecular diagnostics. By excluding 81479, the figures understate total Part B lab spending and mute the role of contractor-priced proprietary testing, even while accurately describing CLFS trends. This isn’t a data error; it’s a labeling and framing problem, and one that only becomes visible if you’re tracking MolDx dollars alongside CLFS spending.

Monday, February 9, 2026

Johnson (RNA world) and LabCorp 2026 (saturation and friction): Put them together

In adjacent blogs today, I wrote about JOHNSON (RNA world emerging in clinical oncology) and about LABCORP 2026, a 35-page white paper on how friction and complexity may saturate clinical oncology.

Put them together, what do you get?

###

AI CORNER

###

What happens when you bring JOHNSON (the RNA boom) into the clinic-level vision of LabCorp 2026 (complexity, friction, and saturation)?

###

This is a very sharp question, and you’re right to sense tension rather than simple alignment. The Johnson et al. “actionable transcriptome” vision and the Labcorp Pulse of Oncology report are not in conflict, but they are operating at different layers of the oncology stack—and RNA-omics absolutely risks amplifying several of the frictions Labcorp identifies unless the system around it changes.

Let me walk through this in paragraph form, conceptually rather than defensively.


Two documents, two vantage points

Johnson et al. are writing from the molecular frontier inward. Their core claim is that DNA-only precision oncology is leaving actionable biology on the table and that RNA sequencing—by capturing fusions, expression, splice variants, immune signatures, surfaceome targets, and pathway activation—can meaningfully expand the therapeutic search space. They explicitly argue that RNA-seq is now technically mature enough, cost-effective enough, and robust enough for routine use, even in FFPE samples, and they present a structured framework to operationalize what they call the “actionable transcriptome.”

Labcorp, by contrast, is writing from the clinic outward. Their report is not asking “what biology are we missing?” but rather “what makes oncology care hard today?” The dominant answers are time pressure, payer friction, digital fragmentation, interpretive overload, and burnout. Innovation is welcomed, but only insofar as it reduces friction rather than adding to it.

Put bluntly: Johnson et al. assume a system capable of absorbing more complexity; Labcorp documents a system already near cognitive and operational saturation.


Where RNA-omics directly collides with Labcorp’s pain points

RNA-omics does not just add signal; it adds dimensions. Johnson et al. are admirably explicit about this: RNA-seq produces multiple new classes of actionable findings—expression outliers, pan-cancer percentile comparisons, multigene signatures, immune microenvironment states, and putative drug sensitivities inferred from transcriptional programs rather than mutations. Each of these requires interpretation rules, thresholds, comparators, and confidence judgments, many of which the authors acknowledge are not yet standardized.

That lands squarely on the Labcorp problem list.

Labcorp’s oncologists already say that report clarity and interpretability are limiting factors, even for today’s DNA-centric panels. RNA-based results are inherently more probabilistic and contextual. A MET amplification is easier to explain than “MET mRNA expression in the 85th pan-cancer percentile but only the 60th percentile within tumor type X.” Johnson et al. treat this nuance as an opportunity; Labcorp’s clinicians experience it as friction unless heavily mediated.

Similarly, Labcorp emphasizes turnaround time as a moral and clinical issue. Johnson et al. note that RNA-seq turnaround times of “a few weeks” are now feasible and reasonable—which is true in an academic sense, but from the Labcorp vantage point, weeks already feels borderline when oncologists are trying to initiate therapy quickly, manage anxious patients, and navigate payer clocks.

Finally, RNA-omics magnifies payer and access friction. Many RNA-derived findings point to off-label therapies, investigational ADCs, or trial eligibility rather than clean FDA-labeled indications. Labcorp’s report makes clear that insurance coverage—not scientific plausibility—is the gating factor for test adoption. RNA-omics expands the action space precisely where reimbursement certainty is weakest.


Where the visions actually align more than they appear

Despite all that, the two documents converge in an important, non-obvious way.

Johnson et al. repeatedly emphasize that RNA-seq should not be delivered as raw data, but as a curated, framework-driven output integrated with DNA, IHC, and clinical context. They spend an enormous amount of ink on defining thresholds, benchmarks, and reporting logic because they recognize that uninterpreted transcriptomics would be unusable in routine practice.

That is exactly what Labcorp is asking for—though from the opposite direction. Labcorp’s oncologists are not anti-innovation; they are anti-friction. If RNA-omics arrives as “more portals, longer reports, more uncertainty, more prior auths,” it will be resisted. If it arrives as fewer tests, fewer sequential reflexes, clearer prioritization, and faster convergence on therapy, it becomes aligned with Labcorp’s thesis.

In other words, RNA-omics only fits the Labcorp world if it collapses complexity upstream, not downstream.


The real risk: knowledge expansion without workflow compression

Your instinct is exactly right: RNA-omics threatens to increase both knowledge burden and system friction simultaneously if implemented naïvely. Johnson et al. are intellectually honest about how much judgment remains unresolved—what expression level is “high,” which comparator matters, when RNA should override DNA, and how pan-cancer benchmarks should be operationalized. Those are not small details; they are the difference between a usable test and a paralyzing one.

Labcorp’s report suggests that oncology is no longer bottlenecked by discovery but by translation bandwidth. RNA-omics expands discovery. Unless paired with automated interpretation, standardized reporting, payer-aware decision support, and strong clinical narratives, it risks worsening exactly the pressures Labcorp documents: time scarcity, cognitive overload, and burnout.


A synthesis you could say out loud at journal club

If you wanted a single framing sentence for discussion, it might be this:

Johnson et al. describe the biology oncology could act on if unconstrained; Labcorp describes the clinical system that must absorb it. RNA-omics is not in conflict with today’s oncology reality—but without radical compression of interpretation and workflow, it will stress the system more than it helps it.

That is not bad news for RNA-omics. It is a design constraint.

And it points to where the real innovation frontier is shifting: not in generating richer transcriptomes, but in deciding which RNA signals deserve to reach the oncologist at all—and in what form.

Journal Club: LabCorp's 35-page Oncology Survey

LabCorp 35-page report

January 2026



##

 


Overview: what kind of report this is

The 2026 Pulse of the Oncologist report is not a hypothesis-driven clinical study but a practice-level diagnostic of modern oncology, produced by Labcorp to surface friction points in real-world cancer care and align them with diagnostic, digital, and service opportunities. Rather than asking whether a specific technology improves outcomes, the report asks a more upstream question: what actually slows oncologists down, complicates decision-making, or limits patient access in daily practice? Its findings are therefore best understood as systems intelligence, grounded in clinician perception and workflow realities, not clinical endpoints.

Methodologically, the report uses a mixed-methods design. Labcorp surveyed 152 practicing U.S. oncologists, spanning community practices, academic medical centers, and integrated systems, and representing multiple oncology subspecialties. These quantitative findings were supplemented by in-depth qualitative interviews and market analysis, with direct clinician quotations embedded throughout to anchor statistics in lived experience. The authors are transparent that this is perception-based data; for example, certain terms (notably “comprehensive genomic profiling”) were not rigidly defined, and results are meant to be interpreted directionally rather than as precise comparative rankings.

The report is structured around five major “trends,” each following the same arc: a real-world problem described by oncologists, supporting survey data, and an explicit “opportunity” section that frames how diagnostic partners—implicitly large national labs—could help reduce friction. This repeated structure is important: it signals that the report is not only descriptive, but also strategic.

Journal Club: 2026 Nat Rev Clin Onc JOHNSON on RNA Comes of Age

 


Authors' Abstract (Johnson et al)

Comprehensive, multiplexed RNA sequencing (RNA-seq) is increasingly being incorporated into molecular tumour-profiling assays owing to overall cost-effectiveness related to enhanced detection of clinically actionable biomarkers. RNA-seq assays are now quite robust, with turnaround times of a few weeks and reasonable costs that support integration into routine clinical workflows. In this Perspective, we propose a framework for incorporating RNA levels and other RNA-seq data into precision oncology that considers RNA levels of oncogenes, tumour suppressors and diverse therapeutic targets, as well as multigene diagnostic, prognostic and predictive signatures; chimeric gene fusion transcripts; splice variants; RNA-based variant and/or mutation calling; and oncoviral gene expression; among other factors. Additionally, we provide a list of genes representing the current actionable transcriptome focused on the predictive utility of RNA-level overexpression of oncogenes and underexpression of tumour-suppressor genes. This list also includes cell-surface targets, given that RNA profiling has demonstrated that many tumour types have clinically relevant expression levels of such targets. The RNA expression levels of certain genes might also be associated with biological features such as homologous recombination deficiency and DNA mismatch repair defects. Taken together comprehensive RNA profiling can help to identify additional actionable alterations beyond those detected via DNA sequencing and overcome the challenges faced by conventional low-plex assays (such as immunohistochemistry and in situ hybridization), thereby enhancing therapeutic opportunities for personalized cancer care.


ChatGPT Discusses

1. What kind of paper this is (important framing)

This is not a discovery paper and not a technology benchmarking study. It is a Perspective that does three deliberate things:

  1. Defines a new clinical construct: the actionable transcriptome

  2. Proposes an operational framework for how RNA-seq should sit alongside DNA-seq and IHC in routine precision oncology

  3. Normalizes RNA expression as a therapeutic biomarker, not merely a confirmatory or exploratory signal

That framing is strategic: the authors are arguing for a practice change, not a technical upgrade.

Sunday, February 8, 2026

College Debate Finals: Princeton v Brown Takes the Weirdest Turn Ever

Welcome, colleagues, coaches, alumni, donors, and those who wandered in for the free coffee, to the 85th Annual New England College Debate Finals, coming to you live from a carpeted auditorium whose acoustics were designed to flatter confidence rather than truth. 


This year’s finalists—Princeton and Brown—have arrived by the usual route: six months of red-eye flights, dropped arguments, and the quiet conviction that someone, somewhere, will finally understand what they meant by “framework abuse.”

Before we begin, a brief recap for those new to the sport. 

AI Takes on Music (African Papaoutai). Next Week - AI Takes on Cari and Manuel Selbst

Here is a fictional dialogue for a new episode of "Eine schwierige Entscheidung" ("A Difficult Decision"), set about a week later, in the style of the Easy German Podcast, B1–B2 German, with 6–8 exchanges and a clear conclusion from Manuel.

##

Cari:
Manuel, neue schwierige Entscheidung: Stell dir vor, eine KI hört alle Easy-German-Podcasts und macht neue Folgen mit AI-Cari und AI-Manuel. Gleiche Stimmen, gleicher Stil. Gut oder schlecht?

Manuel:
Okay … mein Bauchgefühl sagt erst mal: Hilfe. Aber mein Kopf sagt: Hm, interessant. Was wäre denn der Vorteil?

Cari:
Unendlich viele Podcasts. Jeden Tag neue Folgen, zu jedem Thema. Die Leute sagen: „Ich will eine Folge über deutsche Höflichkeit“ – und zack, AI-Cari und AI-Manuel reden los.

Manuel:
Das wäre fürs Deutschlernen natürlich super. Und ehrlich gesagt: Unsere Art zu sprechen ist ja fast schon wie gemacht für KI. Langsam, deutlich, ein bisschen nerdig.

Cari:
Ja, aber: Diese AI-Cari war nie krank, nie müde, nie genervt von Lüften. Sie hat keine echten Geschichten. Sie klingt wie ich – aber sie lebt nicht wie ich.

Manuel:
Genau. Und das ist für mich der Punkt. Unsere Podcasts sind nicht perfekt. Wir denken laut, wir widersprechen uns, wir ändern unsere Meinung. Eine KI kann das nachmachen – aber sie meint es nicht wirklich.

Cari:
Andererseits benutzen wir ja auch Technik. Mikrofone, Schnittsoftware … vielleicht ist KI einfach nur das nächste Tool?

Manuel:
Vielleicht. Aber nur, wenn klar ist: Das ist KI. Wenn Leute glauben, sie hören echte Gespräche, obwohl es nur eine Simulation ist, dann wird es schwierig.

Cari:
Also, dein Fazit?

Manuel:
KI-Podcasts als Lernhilfe: gern. AI-Cari und AI-Manuel als Ersatz für uns: nein. Das Original darf ruhig ein bisschen chaotisch, menschlich und unperfekt bleiben.







Friday, February 6, 2026

Digital Imaging to Outcome Prediction in Lung Cancer: Nat Med Li et al. and the "Meta layer" Approach (& cf Andani)

 


Li et al. take lung cancer slides with H&E WSI. They generate a metalayer of data which they refer to as proteomic layers. However, this is not the clinical output. They combine and weight the "metalayers" into new algorithms that are highly predictive of important factors like drug response or relapse risk. My question is this: in principle, couldn't this be done by powerful ML-AI without the metalayer? All the info must be present in the "bottom layer" (the slide) to lead to the "top layer" (e.g., relapse risk).

###

Yes — in principle, a sufficiently powerful ML/AI system could learn to predict outcomes like relapse risk or drug response directly from the raw H&E whole-slide image without constructing an explicit intermediate “proteomic” metalayer. From an information-theoretic standpoint, the metalayer does not add new information: it is a deterministic transformation of the slide, and any signal present in the metalayer must already be encoded in the morphology of the tissue itself. If an outcome can be predicted from the metalayer, then it must also be predictable, at least in theory, from the original image.

However, the decision to introduce metalayers is a pragmatic architectural choice rather than a conceptual necessity. Mapping directly from pixels to complex clinical outcomes is extremely data-hungry and unstable in real biomedical settings, where sample sizes are limited and outcomes are noisy. By forcing the model to first infer biologically meaningful intermediate representations—patterns meant to approximate underlying proteomic or pathway activity—the learning problem is constrained and made more sample-efficient. The metalayer acts as an inductive bias, narrowing the hypothesis space and helping the model focus on signals that are more likely to be biologically relevant rather than cohort-specific or artifactual correlations.

Metalayers also help disentangle weak and interacting signals that contribute to outcomes such as relapse or treatment response, including tumor–stroma relationships, immune contexture, and spatial organization of cell states. An end-to-end model can, in principle, discover these abstractions on its own, but in practice it often latches onto brittle shortcuts that fail to generalize across institutions, scanners, or patient populations. By structuring the model around intermediate representations aligned with known biology, Li et al. improve robustness and transferability, even if those intermediates are never themselves the clinical output.

In short, the metalayer is best understood not as an informational requirement but as a coordinate system chosen to make learning feasible, stable, and biologically grounded with finite data. The final predictions could theoretically be learned directly from the slide, but the intermediate layers help ensure that what the model learns is meaningful, generalizable, and usable in a clinical and regulatory context.
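To make the architectural point concrete, here is a minimal sketch, in PyTorch-style Python, of the two designs being contrasted: an end-to-end mapping from slide-level features to outcome, versus a two-stage model that first infers a "metalayer" of molecular programs and then predicts the outcome from it. All dimensions, names, and the use of precomputed slide embeddings are my assumptions for illustration; this is not Li et al.'s actual implementation.

```python
# Illustrative contrast (not Li et al.'s code): end-to-end vs. metalayer model.
import torch
import torch.nn as nn

N_FEATURES   = 768   # hypothetical slide-level embedding size (e.g., tile-encoder output)
N_METALAYERS = 50    # hypothetical number of inferred "proteomic" programs

# Option A: end-to-end -- slide features straight to a clinical endpoint.
end_to_end = nn.Sequential(
    nn.Linear(N_FEATURES, 256), nn.ReLU(),
    nn.Linear(256, 1),          # e.g., a relapse-risk logit
)

# Option B: two-stage -- infer biologically framed metalayers first,
# then predict the endpoint from that constrained representation.
class MetalayerModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Stage 1: morphology features -> inferred molecular programs.
        self.to_metalayer = nn.Sequential(
            nn.Linear(N_FEATURES, 256), nn.ReLU(),
            nn.Linear(256, N_METALAYERS),
        )
        # Stage 2: metalayers -> clinical outcome, via a small, inspectable head.
        self.to_outcome = nn.Linear(N_METALAYERS, 1)

    def forward(self, x):
        m = self.to_metalayer(x)          # intermediate, biologically named layer
        return self.to_outcome(m), m      # outcome plus the inspectable metalayer

# The inductive bias comes from supervising stage 1 against measured molecular
# data where available, and from the small stage-2 head, which restricts how
# the outcome can depend on the slide.
```

The point of the sketch is only that the metalayer is an architectural constraint, not an extra data source.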

###

Yes — in principle, you’re exactly right.
If the only goal were to predict a top-level clinical outcome (relapse risk, drug response), a sufficiently powerful end-to-end ML model could learn that mapping directly from the raw H&E whole-slide image without ever constructing an explicit “proteomic metalayer.”

But the fact that Li does construct metalayers is not accidental. It’s a design choice, not a necessity. Here’s how to think about it.


###
Below are the bullet point notes that Chat first gave me.   Below that, Chat compares LI and ANDANI.  


1. Information-theoretic view: the metalayer is not adding new information

From a strict information perspective:

  • The H&E slide is the bottom layer

  • The “proteomic” metalayers are deterministic functions of the slide

  • The final clinical prediction is another function on top

So yes:

If outcome Y is predictable from metalayer M, and M is derivable from slide S, then Y is predictable from S.

Nothing magical is being injected. The metalayer is a reparameterization, not a new data source.
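Stated as a one-line composition of functions (my notation, not the paper's):

```latex
% If the metalayer is a deterministic function of the slide, and the outcome
% is predicted from the metalayer, the end-to-end map already exists:
\[
  M = f(S), \qquad \hat{Y} = g(M) = (g \circ f)(S).
\]
% So any outcome signal reachable through M was already encoded in S.
```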

This is the same logic that says:

  • You don’t need edge detectors in CNNs — the network can learn them

  • You don’t need phonemes — speech models can go waveform → text


2. So why bother with metalayers at all?

Because learning directly from pixels to outcomes is possible but often suboptimal in real biomedical settings.

Think of the metalayer as an inductive bias.

A. Sample efficiency

End-to-end learning from raw WSI to survival or drug response is brutally data-hungry.

Metalayers:

  • Compress the slide into biologically meaningful intermediate variables

  • Reduce the hypothesis space the model has to search

  • Let you train with hundreds or low thousands of patients instead of tens of thousands

This matters a lot in oncology, where outcomes are sparse and noisy.


B. Signal disentanglement

Clinical outcomes are affected by many weak, entangled signals:

  • tumor–stroma interactions

  • immune infiltration patterns

  • spatial colocalization effects

  • cellular state mixtures

A metalayer forces the model to:

  • Separate morphology → inferred molecular programs

  • Then reason about how those programs interact

Without this, an end-to-end model may:

  • Learn brittle shortcuts

  • Overfit cohort-specific artifacts

  • Lock onto confounders that correlate with outcome but aren’t causal


C. Biological alignment and transferability

This is the quiet but important part.

Metalayers:

  • Are closer to known biology

  • Can be reused across tasks (drug response, relapse, subtype)

  • Can generalize better across institutions and scanners

A raw-pixel → outcome model may work beautifully on Dataset A
…and then collapse on Dataset B.

The metalayer acts like a biological lingua franca.


3. Interpretability is not the main reason — but it helps

People often say metalayers are about interpretability. That’s partly true, but it’s not the deepest reason.

The deeper reason is:

They constrain the model to learn the right kind of abstraction.

Interpretability is a side benefit:

  • You can ask which inferred pathways mattered

  • You can compare them to real proteomics

  • You can sanity-check failure modes

For regulators, payors, and clinicians, this matters — even if, mathematically, it’s unnecessary.


4. A useful analogy

Think of it like this:

  • End-to-end model:
    “Given a novel, predict the reader’s emotional response directly from the ink patterns on the page.”

  • Metalayer approach:
    Ink → letters → words → syntax → themes → emotional response

Yes, the first is theoretically possible.
The second is how you make it work reliably with finite data and humans in the loop.


5. Bottom line

You’re correct in principle:

All the information needed to predict relapse or drug response must already be present in the H&E slide.

But Li et al.’s metalayers are:

  • Not redundant

  • Not cosmetic

  • Not just for explanation

They are a pragmatic architectural scaffold that:

  • Improves learning efficiency

  • Stabilizes generalization

  • Anchors the model in biology rather than texture statistics

If you wanted to be very sharp about it, you could say:

The metalayer is not about adding information — it’s about choosing the right coordinate system in which learning becomes tractable.

Which, frankly, is a very pathologist-friendly idea.

###




Both Andani et al. and Li et al. start from the same premise—H&E whole-slide images contain latent molecular information—but they differ in how explicitly they structure that information and what role intermediate representations play.

Li et al. use a deliberate, explicit meta-layer strategy. They first infer multiple spatially resolved “proteomic” or pathway-like layers from H&E (their so-called metalayers), and only then combine and weight those layers to predict clinically relevant endpoints such as relapse risk or drug response. The metalayers are not the clinical output; they are intermediate, biologically framed representations that constrain learning, improve sample efficiency, and provide a scaffold for downstream prediction. Conceptually, Li imposes a two-stage hierarchy: morphology → inferred molecular programs → clinical outcome.

Andani et al., by contrast, use a latent but implicit middle layer. Their model learns internal feature embeddings that correlate with protein expression or molecular states, but these representations are not formalized as named, interpretable proteomic layers that are recombined downstream. The network is trained more end-to-end, with protein or molecular prediction emerging from learned feature space rather than from an explicitly constructed meta-layer that is later reused for separate clinical tasks.

In short, both approaches rely on intermediate representations, but only Li et al. elevate those intermediates into an explicit, modular meta-layer that becomes a reusable object for clinical modeling. Andani’s “middle layer” exists architecturally inside the network, while Li’s metalayer exists conceptually and analytically as a distinct step between image and outcome.