See my blog post about THE THINKING GAME, comparing it to ENRON: THE SMARTEST GUYS IN THE ROOM.
##
You might want to know more than THE THINKING GAME tells you about the ins and outs of AI proteomics. Here is some Q&A with ChatGPT.
###
Please tell me two things. First, summarize what the Nobel Prize was actually awarded for, in the words of the Nobel Prize citation. Then tell me more about Baker’s work on producing proteins.
###
1. What the Nobel Prize was actually awarded for (in the words used by the Nobel Committee)
The 2024 Nobel Prize in Chemistry was divided between two major achievements in protein science:
- One half of the prize was awarded to David Baker “for computational protein design.”
- The other half was jointly awarded to Demis Hassabis and John M. Jumper “for protein structure prediction.”
In the Nobel Foundation’s own summary, the Committee described this work as revolutionizing our understanding of how amino acid sequences relate to three-dimensional protein structure and enabling scientists to design entirely new proteins — an enormous advance in molecular science with implications across chemistry, biology, and medicine.
So, in the official formulation:
- Half the prize recognized the ability to predict how a given protein sequence will fold into a 3D structure (Hassabis & Jumper), and
- Half recognized the ability to design new protein structures from scratch (Baker).
✅ 2. More about David Baker’s work on producing proteins
David Baker’s contribution centers on what’s known as computational protein design — the inverse of the prediction problem. Rather than asking “What shape does this sequence make?”, his work asks:
“What sequence will produce a desired structure or function?”
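The inverse problem can be made concrete with a deliberately toy sketch. Everything here is an illustrative assumption, not Baker’s actual method: real design scores full 3D geometry with tools like Rosetta, whereas this toy reduces “structure” to a hydrophobic/polar pattern and brute-forces a two-letter sequence alphabet.

```python
from itertools import product

# Toy model: a sequence's "structure" is just its hydrophobic/polar
# pattern (H for hydrophobic residues, P for polar). This is purely
# illustrative; real protein design scores full 3D geometry.
HYDROPHOBIC = set("AVLIMFWC")

def fold(seq):
    """Map a sequence to its toy 'structure' (an H/P pattern)."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in seq)

# The desired "structure" we want some sequence to produce.
target = "HPHP"

# Inverse design: search sequence space for a sequence whose fold
# matches the target. (Toy alphabet: A is hydrophobic, G is polar.)
designed = next(
    "".join(s) for s in product("AG", repeat=4) if fold("".join(s)) == target
)
print(designed)  # → AGAG
```

The point of the sketch is the direction of the question: the prediction problem maps sequence to structure (the `fold` function), while design searches the other way, from a desired structure back to a sequence.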
Here’s the scientific arc of his work:
🔹 Early breakthroughs
- In 2003, Baker’s lab succeeded in designing a completely novel protein fold that didn’t exist in nature, using computational algorithms to specify the amino acid sequence that would produce that fold. This was a major landmark demonstrating that protein structures could be rationally designed rather than merely evolved.
- That first fully designed protein, named Top7, showed that computational methods could create real, stable proteins, not just variants of natural ones.
🔹 Tools and software
- Baker’s group developed the Rosetta software suite, one of the most widely used frameworks for protein modeling and design. Rosetta and its derivatives power many efforts to both predict and design protein structures, including community science projects like Rosetta@home and even the puzzle game Foldit.
🔹 Modern achievements
- Over the past decade Baker’s group has applied computational design to create proteins with new functions — not just new shapes — including:
  - Enzymes catalyzing reactions not seen in nature
  - Novel binding proteins and scaffolds
  - Vaccine components and therapeutic leads
  - Sensors and nanoscale materials
- His lab has increasingly integrated AI and machine learning to make these design steps faster and more reliable, enabling creation of proteins that can be tested experimentally with a high success rate.
🔹 Why this is transformative
Baker’s work essentially established a platform for engineering biology from first principles — specifying what a protein should do and then computing a sequence that will perform it. Prior to this, most engineered proteins were tweaks of natural ones; now scientists can start from scratch.
This opens the door to:
- New therapeutics (e.g., designed antibodies and vaccines)
- Enzymes to address environmental and industrial challenges
- Custom biosensors and self-assembling materials
- Precision tools for basic biology and cell engineering
This body of work — showing that we can both predict and design protein structure with computational precision — is what earned Baker one half of the 2024 Nobel Prize in Chemistry.
###
Let’s go back to AlphaFold and the Nobel Prize. What are the three biggest problems that were very hard to make any traction on, and how did they overcome them?
###
Certainly. I’ll keep the same intellectual structure, but recast it as a continuous essay, using bullets only where they genuinely clarify rather than fragment the argument.
AlphaFold earned the Nobel Prize not because it added incremental accuracy to an already-solved problem, but because it overcame three specific obstacles that had resisted meaningful progress for decades. These were not matters of compute or data alone; they were conceptual blockages that made protein folding stubbornly intractable despite fifty years of effort. AlphaFold’s success came from breaking all three at once.
The first was the problem of long-range interactions. Protein folding is governed less by local chemistry than by non-local constraints: amino acids that are far apart in sequence frequently end up adjacent in three-dimensional space. Classical approaches struggled here. Physics-based molecular dynamics methods were overwhelmed by the combinatorial explosion of conformations and became trapped in local energy minima. Fragment-assembly approaches, such as those used in Rosetta, could build plausible local structure but often failed to recover correct global topology. Early contact-prediction methods, while promising, produced sparse and noisy constraints that were insufficient to determine an overall fold. The fundamental difficulty was that folding depends on weak, distributed, cooperative interactions across the entire chain.
AlphaFold overcame this by reframing folding as a global constraint-satisfaction problem rather than a local energy-minimization exercise. Instead of assembling structure piece by piece, the model predicts probabilistic distance relationships between all residue pairs simultaneously and searches for a geometry that satisfies those constraints as a whole. This shift—from local refinement to global consistency—allowed the system to capture β-sheet pairing, long-range packing, and overall fold class with a reliability that had previously been unattainable.
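The reframing can be illustrated with a toy sketch. Random 3D points stand in for residue positions, the “predicted” distances are simply computed from them, and plain gradient descent replaces AlphaFold’s actual architecture; the only point is that all pairwise constraints are satisfied simultaneously rather than by assembling fragments one at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five random 3D points stand in for residue positions.
true_coords = rng.normal(size=(5, 3))

# Pretend these pairwise distances are the model's predicted constraints.
target = np.linalg.norm(true_coords[:, None] - true_coords[None, :], axis=-1)

def max_error(x):
    """Worst mismatch between current and target pairwise distances."""
    dist = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    return float(np.abs(dist - target).max())

# Global constraint satisfaction: start from random coordinates and push
# ALL pairwise distances toward their targets at once.
coords = rng.normal(size=(5, 3))
start_err = max_error(coords)
for _ in range(20000):
    diff = coords[:, None] - coords[None, :]        # (5, 5, 3)
    dist = np.linalg.norm(diff, axis=-1)
    err = dist - target
    np.fill_diagonal(err, 0.0)
    safe = np.where(dist == 0, 1.0, dist)           # avoid divide-by-zero
    grad = (4 * err / safe)[:, :, None] * diff      # d(sum of err^2)/d coords
    coords -= 0.002 * grad.sum(axis=1)

print(start_err > 2 * max_error(coords))  # the global error shrank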
The second major obstacle was the extraction of evolutionary signal from sequence data. For decades, biologists had known that co-evolving residues often indicate physical proximity in a folded protein. Multiple sequence alignments contain immense information, but turning that information into accurate structural constraints proved extremely difficult. Earlier statistical approaches relied on pairwise correlations or simplified graphical models, which were confounded by indirect effects and struggled with higher-order dependencies. Shallow alignments provided little signal, while deep alignments generated overwhelming noise. As a result, evolutionary information helped, but never enough.
AlphaFold’s breakthrough was to treat the entire multiple sequence alignment as a rich object to be learned from directly. Using attention-based neural networks, the model learns which sequences and positions are informative and how patterns of variation imply geometric constraint. Rather than explicitly coding rules for co-evolution, AlphaFold learns an implicit evolutionary grammar. This allows it to infer structural relationships even when close homologs are absent, something that earlier methods largely failed to do. Evolutionary information became not an auxiliary input, but a central driver of structure prediction.
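A minimal sketch of the kind of co-evolution signal involved (the tiny alignment and the mutual-information scoring are illustrative assumptions; AlphaFold learns far richer patterns with attention rather than explicit pairwise statistics): in the toy alignment below, column 0 and column 2 co-vary perfectly, while column 1 varies independently.

```python
from collections import Counter
from math import log2

# Toy multiple sequence alignment: columns 0 and 2 co-vary (A<->T),
# column 1 is independent of both.
msa = ["AGT", "ACT", "AGT", "ACT", "TGA", "TCA", "TGA", "TCA"]

def mutual_information(col_i, col_j):
    """Estimate MI (in bits) between two alignment columns."""
    n = len(msa)
    pairs = Counter((s[col_i], s[col_j]) for s in msa)
    pi = Counter(s[col_i] for s in msa)
    pj = Counter(s[col_j] for s in msa)
    mi = 0.0
    for (a, b), c in pairs.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

print(round(mutual_information(0, 2), 3))  # → 1.0 (strong co-variation)
print(round(mutual_information(0, 1), 3))  # → 0.0 (independent columns)
```

High mutual information between two columns is the classic hint that the corresponding residues touch in the folded structure; the hard part, as the essay notes, was separating such direct couplings from indirect and noisy ones, which is where earlier pairwise statistics fell short.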
The third and perhaps most subtle obstacle was the representation problem. Proteins inhabit a space that is continuous, three-dimensional, and invariant under rotation and translation. Earlier computational approaches relied on awkward representations—torsion angles, fragments, voxel grids, or distance matrices introduced only late in the pipeline. These representations were either too rigid to capture reality or too flexible to enforce physical plausibility. Without a representation that was simultaneously learnable, expressive, and physically meaningful, even the best algorithms were fundamentally constrained.
AlphaFold resolved this by introducing representations that embed geometric invariance directly into the learning process. The model does not merely output a structure at the end; it reasons in three dimensions throughout its internal computation. Coordinates are updated iteratively in a way that respects physical symmetry, allowing the network to integrate sequence, evolutionary data, and geometry in a single, coherent framework. This eliminated the need for explicit force fields while still producing structures that obey chemical and geometric constraints.
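The invariance requirement itself is easy to demonstrate (the random orthogonal transform built via QR decomposition is just a convenient construction for this sketch): a pairwise distance matrix is one representation that is unchanged by rigid motion of a structure, while raw coordinates are not.

```python
import numpy as np

rng = np.random.default_rng(1)
coords = rng.normal(size=(6, 3))  # stand-in for residue positions

def dist_matrix(x):
    """Pairwise distance matrix: invariant under rotation/translation."""
    return np.linalg.norm(x[:, None] - x[None, :], axis=-1)

# Random orthogonal transform (a rotation, possibly with a reflection)
# plus a random translation: a rigid motion of the whole structure.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
moved = coords @ q + rng.normal(size=3)

print(bool(np.allclose(dist_matrix(coords), dist_matrix(moved))))  # → True
print(bool(np.allclose(coords, moved)))                            # → False
```

The design question AlphaFold answered was how to get this kind of invariance not just in the output representation but throughout the network’s internal reasoning, so that sequence, evolutionary, and geometric information could be combined in one framework.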
What made AlphaFold transformative was not any one of these advances in isolation, but their convergence. Long-range dependencies, evolutionary signal, and geometric representation had each limited previous methods. When all three were solved together, prediction accuracy crossed a threshold—from often wrong to routinely near experimental accuracy. That qualitative change, rather than incremental improvement, is what triggered the Nobel committee’s judgment.
Importantly, this does not mean protein folding in the biological sense is solved. Dynamics, disorder, misfolding, post-translational modification, and cellular context remain largely untouched. What AlphaFold solved was the long-standing inverse problem of mapping sequence to dominant folded structure for a large class of proteins. In Nobel terms, it closed a problem that had been cleanly defined, deeply resistant, and foundational.
Or, put more bluntly: AlphaFold did not explain life. It removed one of its longest-standing excuses.
###
They won the 2024 Nobel Prize. When did they have the big breakthrough? I think that was winning CASP14.
###