Summary for Medical Readers: LLMs and Sepsis
Pak et al. (JAMA Network Open, Oct 2025)
In a cohort of 104,000 adult patients admitted with suspected infection across five Massachusetts hospitals, Pak et al. tested whether a large language model (LLaMA 3-8B) could extract presenting signs and symptoms of sepsis from admission notes.
The model labeled 99% of patient histories with a balanced accuracy of ~85%, closely matching physician review and outperforming ICD-10 coding. From >400 distinct signs and symptoms, the model generated seven syndromic clusters (skin/soft tissue, cardiopulmonary, gastrointestinal, urinary tract, dizziness, back pain, constitutional). These clusters correlated with infection sources, multidrug-resistant pathogens, and mortality risk. For example, skin symptoms were associated with MRSA (adjusted odds ratio [AOR] 1.73), while cardiopulmonary symptoms were associated with death (AOR 1.30).
The study demonstrates that public, untuned LLMs can transform unstructured clinical notes into analyzable data at population scale, potentially enabling new phenotyping and antibiotic-stewardship research.
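For readers less familiar with the metric, balanced accuracy is the mean of sensitivity and specificity, which keeps a rarely documented symptom from looking well classified simply because most notes do not mention it. The sketch below illustrates the calculation for a single binary label. The counts are hypothetical and chosen only to land near the reported ~85%; they are not data from Pak et al., and how the authors aggregated the metric across symptoms is not stated in this summary.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Return the mean of sensitivity and specificity for one binary label."""
    sensitivity = tp / (tp + fn)  # share of true symptom mentions the model found
    specificity = tn / (tn + fp)  # share of symptom-free notes the model left unlabeled
    return (sensitivity + specificity) / 2


def plain_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Return the overall fraction of notes labeled correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical confusion-matrix counts for one symptom in 1,000 notes
# (illustrative only; not taken from the study).
tp, fn, tn, fp = 80, 20, 810, 90

print(f"balanced accuracy: {balanced_accuracy(tp, fn, tn, fp):.2f}")
print(f"plain accuracy:    {plain_accuracy(tp, fn, tn, fp):.2f}")
```

With these made-up counts the symptom appears in only 10% of notes, so plain accuracy is pulled toward the specificity, while balanced accuracy weights performance on notes with and without the symptom equally.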
Baghdadi & Vazquez-Guillamet Commentary (JAMA Network Open, Oct 2025)
The commentary highlights the promise and limits of this approach. The authors note that current LLM-based extraction is research-ready but not yet clinical-grade, because antibiotic selection requires contextual data (comorbidities, exposures, prior treatment) that the LLM did not assess. Still, automating symptom abstraction could vastly expand pragmatic sepsis trials that now depend on manual chart review. They caution that overreliance on AI might “flatten” the patient narrative, reducing nuance to a checklist of features, and that LLMs still struggle with diagnostic uncertainty, performing well in symptom recognition but not in complex decision-making.
Conclusion
Together, the two papers mark a pivotal proof of concept: off-the-shelf LLMs can accurately read clinical text and classify sepsis presentations at scale, opening a path toward automated cohort assembly, syndromic subtyping, and refined empiric antibiotic models. For clinicians and researchers, this signals a shift from structured-data analytics to hybrid AI systems that mine unstructured narratives, while underscoring the need to preserve clinical nuance and validate across diverse populations.