Google DeepMind: The Podcast · 2025-11-28

AlphaFold: Grand Challenge to Nobel Prize with John Jumper

Hosts: Hannah Fry

Guests: John Jumper

AlphaFold · AlphaFold 3 · AlphaProteo · protein structure prediction · protein design · drug discovery · diffusion models · evolutionary data · interpretability · AGI · Nobel Prize 2024 · Isomorphic Labs

Why it matters

John Jumper won the 2024 Nobel Prize in Chemistry with Demis Hassabis.

Key claims

John Jumper won the 2024 Nobel Prize in Chemistry with Demis Hassabis; AlphaFold database now used by 3M+ researchers across 190 countries, cited in ~35,000 papers
AlphaFold 3's key architectural changes: replaced heavy per-block evolutionary conditioning with a diffusion-based architecture and geometric reasoning, enabling extension to DNA, RNA, small molecules, and ions
Hallucination rates are higher in AlphaFold 3 than AlphaFold 2; Jumper stresses confidence metrics and experimental follow-up rather than treating predictions as oracle outputs
AlphaFold 2's weights are still the version in practical use across most application areas; AlphaProteo is DeepMind's internal effort on protein design (binding, enzymes)

Episode summary

Google DeepMind's John Jumper, co-recipient of the 2024 Nobel Prize in Chemistry alongside Demis Hassabis, reflects on the five-year arc from AlphaFold 2's CASP breakthrough to today's widespread scientific adoption. He describes his surprise that the trained model itself—not just the underlying ideas—became a lasting tool used by over 3 million researchers across 190 countries and cited in roughly 35,000 papers. He argues AlphaFold has accelerated structural biology by perhaps 10%, rather than solving it outright.

The conversation covers the architectural shift from AlphaFold 2 to AlphaFold 3: AlphaFold 2 relied heavily on evolutionary history at nearly every network block, which worked for proteins but was inflexible. AlphaFold 3 removed that reliance from most of the network, added a diffusion architecture to handle uncertainty, and extended prediction to DNA, RNA, small molecules, and ions—achieving what Jumper calls the "wholePDB" dream. He acknowledges higher hallucination rates in AlphaFold 3 and emphasizes that confidence metrics and experimental validation remain essential. He also discusses AlphaProteo (protein design) and applications including Isomorphic Labs' drug discovery work, bumblebee conservation, sperm-egg fertilization research, and cholesterol transport studies.

On interpretability and AGI, Jumper takes a strongly utilitarian stance: useful systems matter more than philosophical debates about whether machines "think." He draws an analogy to Roman bridge-building—working partially on intuition—and says downstream users are more like engineers using a computation package than like physicists deriving equations. He envisions future biology systems fusing narrow AI tools (like AlphaFold) with large language models to draw on the full biology literature, but is skeptical of a classical "simulated cell" given we don't even have a complete parts list.

Jumper frames drug discovery as far more than protein-ligand binding—notes ~90% clinical trial failure rate reflects deeper ignorance of biology, not just structure prediction gaps
On interpretability: Jumper pushes back on demands for full understanding, arguing scientists already work with partial interpretability and trust characterizable confidence metrics
On AGI: takes a utilitarian view—rejects equating 'playing chess' with intelligence, sees narrow useful AI systems for science as the path forward, distinct from AGI research
Future direction: DeepMind exploring fusion of AlphaFold-style narrow AI with LLMs to integrate protein structure with full biology literature; rejects the 'simulated cell' framing

Source material

Transcript

[MUSIC PLAYING] I think we'll get this ability to poke the cell in exciting ways, to interrogate it.

And every time we develop that, we'll develop a more interventional understanding of the cell that we will bring forward to medicine and synthetic biology.

Welcome to Google DeepMind: The Podcast.

I'm Professor Hannah Fry.

Now today we are talking about AlphaFold, one of the most extraordinary technological breakthroughs in modern science, a tool that has been described as the most useful thing that AI has ever done.

And in truth, that might be an understatement.

This is a Google DeepMind AI system that solved one of biology's grandest challenges, predicting the 3D structures of proteins, the fundamental building blocks of life.

Its latest version, AlphaFold 3, can now model the structure and interactions of all of life's molecules with unprecedented accuracy.

And the impact has been seismic.

AlphaFold has mapped hundreds of millions of protein structures.

And more than 3 million researchers across 190 countries now use its database.

It is transforming drug discovery.

And in 2024, the Nobel Prize in Chemistry was awarded to Google DeepMind's Demis Hassabis and John Jumper, who is our guest on today's podcast.

Now this is a story that we have been following on this podcast since season one, which was nearly eight years ago, long before it hit the headlines.

So if you are coming to AlphaFold for the first time and wondering what all the fuss is about, you can find our previous explainer episodes linked in the description.

Welcome to the podcast, John.

Oh, that's exciting.

I don't think I've interviewed you since you won your Nobel Prize.

Tell-- where were you when you found out about it?

I stayed home because I was nervous enough I thought there was a chance.

Like a one-in-10 chance.

Yeah.

And so I figured I would be disappointed at home.

And I was kind of just sitting in the bed.

My original plan was I'll sleep through it and a phone call wakes me up, then I've got the Nobel, but I couldn't sleep.

Because you knew the day that the thing happened, right?

You know the day.

I knew, in fact, the kind of time that it was scheduled to be announced at 11.

I knew that winners were called about an hour beforehand.

So by about 10:30, I said, oh, well, I guess not this year.

And I told my wife.

And she goes, no, no, wait.

And as she's telling me to wait, my phone lights up with a phone call from Sweden.

And thankfully, it was not the world's meanest prank call.

And yeah, it was just kind of this extraordinary thing.

And you answer, and they say, is Dr. John Jumper available?

Yes.

I have some wonderful news.

Great.

Can you please hold?

Right, so that they get a-- But they make you hold.

Well, I think they were trying to read-- part of the problem was they didn't have either Demis or my phone number initially.

So anyway, they ended up calling us very late.

But then they were finally arranging, and they pulled the person on.

And he says, I have some life-changing news.

And they don't say the word Nobel for like 60, 90 seconds, which was the longest minute of my life as there is no other explanation for this call.

And I remember the very first thing I did is run to get a shower because I knew I was going to get no time for the rest of the day.

But after that, it was announced.

You come in.

You see the team.

You have this amazing kind of celebration.

We bought the local Waitrose out of sparkling wine.

Only the best will do.

I'm not a connoisseur.

And we were celebrating with friends.

And there was just this incredible kind of party just across the floors of our building.

It was amazing.

The thing is, it's an extraordinary story of you as an individual, right?

Because your first PhD, your physics PhD, you dropped out, right?

Yeah, yeah.

And so going from that, which I think must have been quite a hard experience to live through to being a Nobel Prize winner and having your tool being used in tens of thousands of academic papers.

I mean, I will say dropping out was a very lucky thing for me.

I was doing the wrong thing.

I didn't really want to.

And so I just left.

And because I left, I actually fell into this computational biology group that was doing amazing work on custom computer chips to simulate proteins.

And then I go back and I do my PhD now in chemistry by another set of accidents.

And I didn't have those great computers.

So why not get into AI?

Why not try and use sophisticated algorithms to make up for a lack of compute?

I have to be the first person to get into AI because of a lack of computational capability rather than an abundance.

And then I got lucky enough to kind of find a job that had something to do with everything I had ever tried to do in my past.

And it worked out.

And I get a Nobel.

Do people react differently to you now, then?

Oh, I mean, well, I think there's all sorts of people.

There are the people that I did my chemistry PhD with, who knew me as a pretty good physicist and a lousy chemist.

There are the people that I work with every day.

And I'm still, I think, just John, but now John with a Nobel, so he's busy.

But then there are all the people I meet.

I mean, I would get on phone calls.

And a surprising number of my phone calls start with, it's such an honor to speak with you.

And I sometimes think, and also with you. There's a certain type of deference, or at least excitement.

And it's a symbol of this giant AI world and what it can mean in terms of applying AI to solve real world problems.

And then you're a Nobel Prize winner.

So you're allowed to have an opinion on anything.

And it's supposed to be a bit valid, even if it's not.

So people want you to show up to things just so that you can symbolize that a Nobel Prize was won.

And then you're done, which is not a very satisfying thing to do as a scientist, but you have this platform that maybe you can use to affect how the public thinks about science, how it funds science.

So all of these things kind of roll together in this wild combination.

I would say I'm roughly at the midpoint of my career, in terms of time since undergrad and time until rough retirement.

And so I've got to figure out what to do in the second half.

And that's a fair amount to live up to.

- Yeah, absolutely.

A lot of pressure going on in that.

The thing is that we're still only five years on from that CASP breakthrough, really, when AlphaFold 2 smashed the prediction challenge.

Did you realize at the time the potential significance of the work that you were doing?

- We were sure of two things and totally unsure of some others.

I think the two things we were pretty sure of.

We were very sure it worked even before we entered CASP.

We had measured very well.

We knew about how we would do in CASP.

That we understood, and we were careful.

We knew that we had solved this grand challenge.

But the normal thought in a grand challenge in science is that you'll solve it and it'll be a great celebration.

And then you will go build effective, useful systems that use the ideas that enabled you to solve the grand challenge.

And that this was kind of the beginning of an era.

I think the real shock to me is that those weights that we trained, that system, that piece of computer software, has been so incredibly practically important to scientists working in this field to this day. The actual bit of software is what's used, and it makes this difference in all these different application areas, with all this different science published on top of it as a black-box computer program.

And the extent to which that has entered into scientific practice has been really, I think, beyond my imagination.

- Yeah, I mean, it's really difficult to overstate the genuine significance that this has had.

I saw one thing where AlphaFold was described as the most useful thing that AI has ever done.

Right, that hasn't sort of landed with the public yet, has it?

- I think it's hard for people to appreciate, and you work in science communication, how very hard science is, how very hard curing disease is.

We have to work extremely hard to get small bits of knowledge about how, say, the cell works or how the body works. Take protein structure, which is what AlphaFold predicts.

I think it's hard for people to appreciate this process takes a year in the lab.

I've seen PhD theses that are progress toward determining the structure of X.

And that doesn't mean they finished it, just they feel like they're a little bit closer and they need to graduate.

I think--

- Of one protein.

- Of one protein.

Then the notion that we'll turn that work into a machine that gives you a really good answer in five minutes.

And then that enables so much more work downstream of it.

I think there's something like, I haven't looked at the recent number, but 30,000 to 35,000 different scientific papers that cite AlphaFold.

35,000 different contributions to our understanding of biology that build on top of this advance.

And I think the right way to think of AlphaFold is certainly not that we've solved all problems in biology; we very much haven't.

I think for this slice of biology that cares about what structures in the cell look like, structural biology, maybe we've made it 10% faster overall across the whole thing.

We've amplified this enormous effort in societal expense.

And then ultimately we will have transformative science.

And there's certain narrow areas, say protein design, that are just being transformed by this understanding.

- I think one of the ways for me that really demonstrates just how important this was as a breakthrough was the way that biologists reacted when you published the 200 million protein structures.

Just tell me a little bit about that.

'Cause you just put it out there, right?

- Oh yeah, yeah.

So the original release was a bit smaller, but was still, I think it was like 400,000.

What I remember was there was maybe a week in between when we had put out our code and the real experts were playing with it.

And they were like, this really works on hard problems.

But all the other biologists were, no, no, no.

These can't be real hard problems like I work on.

And then of course we put out this huge database, the AlphaFold Database.

And people are like, well, let me just see how dumb the AI engine was.

And click on their protein of interest, I think expecting to make fun of it.

And then they sat there and they were amazed.

I saw this comment from someone on Twitter saying, how did they get a copy of my structure?

How did DeepMind get this thing that I had done and not yet published?

Like they couldn't believe that this was like literally a machine doing years of painstaking work.

Like this all at once kind of a flash.

And what was, I think, also amazing about it is how rapidly this turned into a community understanding of what AlphaFold does and what it doesn't do; for example, that it's not sensitive to single amino acid changes.

And how to build this into their workflow and do work on top of it.

I thought it would take kind of years for people to really figure out what's the right way to build it in.

How do I make sure that I look at the confidence measures, where AlphaFold is saying, this looks like a reliable answer, this doesn't?

This happened within a matter of months. Science is a community, and people developed this incredibly rapid, not totally perfect, but actually really good understanding very, very fast.

And so people were doing excellent science on top of AlphaFold. You know, we released this in, I think, July or something like that.

And people were almost immediately doing really excellent science on top of it.

That really cool work was coming out by the end of that year.

And I think that's just a testament to how much scientists are looking for really effective tools that help them push knowledge forward.

And then when they find it, they use it well.

- How do you stay on top of the work that people are doing with AlphaFold now?

I mean, because it's so embedded in the way that people do biology now that, I mean, presumably they're not sending you an email every time.

- Thank goodness.

I still pretty often just, yeah, put the word AlphaFold into, you know, search on X or whatever and just see the random work.

What I love is the random things that pop up that says, oh, we use this to do this weird thing.

I think the other way, one of the nice things about being at a company is if really cool things happen, someone will notice and they'll post it to this chat room that we have to collect various, you know, things that people find cool.

And so it's so valuable to collect that experience and it's so much fun, right?

You feel this kind of vicarious ownership of a little bit of that work.

- Well, okay, tell me then, what is the most random unusual use of AlphaFold that you've come across?

- One that I really love is this protein in bumblebees. They're trying to understand bumblebee populations, you know, their reproduction and their biology, to try and enable pollination and understand things like colony collapse.

And so there were some important proteins involved in the honeybee life cycle that they were studying with AlphaFold, and you can see ultimately how this kind of leads on to bee conservation.

And I think it's so interesting to see the structural biology of this echoing into all these things that we care about, from food to industrial production, to everything else, they're all connected because it's all the same biology, right, you know, plants, animals, they have basically the same proteins.

We were definitely thinking most about human health.

We weren't thinking, how am I going to help honeybee populations?

But here we are.

There was another really nice story that people were trying to understand human fertilization when an egg and a sperm meet and come together and eventually fuse.

- Right, so they want to find the exact proteins that were involved in sperm sticking to eggs.

- In sperm sticking to egg.

And they, I think, had the full picture of all the egg proteins, but not of all the sperm proteins.

And in fact, there were two independent groups that did this and they said, well, there are only 2000 proteins that we know that are on the outside of sperm.

Why don't we just try all of them and see which ones stick to the proteins that we know are on egg?

And if you think about doing this experimentally, that's like, well, I'll spend the next two millennia.

You know, a year at a time, $100,000 each, on into the next millennium, and I'll get a nice paper in Nature.

That's not a feasible approach.

But AlphaFold is pretty fast and they had some computers available.

So they tried all of them.
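
As a rough illustration of the all-against-all screen described here, the sketch below scores every candidate surface protein against every known egg protein, ranks the candidates, and then redoes the back-of-envelope cost of the experimental route. It is a minimal sketch and not the researchers' pipeline: `rank_binding_candidates`, `score_pair`, the protein names, and the toy scores are all hypothetical. A real screen would presumably plug a structure-prediction run and its interface confidence into `score_pair`. The 2,000 proteins, one year, and $100,000 per structure are the figures quoted in the conversation.

```python
from typing import Callable, Iterable, List, Tuple


def rank_binding_candidates(
    candidates: Iterable[str],
    targets: Iterable[str],
    score_pair: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (candidate, target) pair and return the top_k candidates,
    each with its best score against any target."""
    targets = list(targets)
    best = {c: max(score_pair(c, t) for t in targets) for c in candidates}
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Made-up scores for illustration only; unlisted pairs score 0.0.
TOY_SCORES = {
    ("sperm_protein_A", "egg_protein_X"): 0.12,
    ("sperm_protein_B", "egg_protein_X"): 0.81,
    ("sperm_protein_C", "egg_protein_Y"): 0.30,
}


def toy_score(candidate: str, target: str) -> float:
    """Hypothetical stand-in for a predicted-interaction confidence."""
    return TOY_SCORES.get((candidate, target), 0.0)


ranked = rank_binding_candidates(
    ["sperm_protein_A", "sperm_protein_B", "sperm_protein_C"],
    ["egg_protein_X", "egg_protein_Y"],
    toy_score,
)
print(ranked)

# Back-of-envelope cost of the purely experimental route, using the
# figures from the conversation.
N_CANDIDATES = 2000
YEARS_EACH = 1
COST_EACH_USD = 100_000
serial_years = N_CANDIDATES * YEARS_EACH
total_cost_usd = N_CANDIDATES * COST_EACH_USD
print(f"{serial_years} years one at a time, ${total_cost_usd:,} in total")
```

The top-ranked candidate is only a hypothesis; as the conversation goes on to say, it still has to be confirmed experimentally.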

And then they both came out with this one protein, TMEM something, I can't remember the number.

But this was the one that they didn't know what it did before and now they find out.

AlphaFold says it sticks to the egg, and that this is kind of the first step of fertilization.

And so of course they don't just trust AlphaFold, right?

It's a computational system.

So they went and they said, well, what happens if I remove that protein or if I change that protein? They find that if they change or remove that protein, then sperm and egg will get close, but they won't fertilize.

So you go from this broad hypothesis, there's some protein on the surface of sperm that does this.

AlphaFold says, I think it's this one.

And then you go do your detailed experiments to confirm and now you can think about questions like infertility.

Now, if you see mutations in that protein, maybe that's a cause of infertility.

Maybe we can think about treating that.

And we go from kind of rough hypothesis, AlphaFold in the middle, confirm with experiment.

And now maybe we can think ultimately about something like drug design on top of this, but we have to get this biological understanding first that we bring meaning to all those pieces in the cell.

And that's what AlphaFold, I think, really helps with in the early stages is bringing meaning to the parts of the cell.

And then later companies like Isomorphic use it in order to build small molecules that have targeted effects.

- We should talk about the difference between AlphaFold 2 and AlphaFold 3 though, right?

Because I mean, AlphaFold 2 was like predicting the structure of proteins from these strings of amino acids.

But of course biology isn't only about proteins, right?

You've got all of these other biomolecules.

You've got DNA, you've got RNA, you've got like small drug molecules, for example.

You've got ions, charged particles, et cetera.

Like all of these are interacting with each other.

So how early in the process did you know that you needed to change the fundamental model of AlphaFold 2 in order to incorporate those additional molecules?

- So even before the world knew about AlphaFold 2, we were sitting there and dreaming.

And we were dreaming for two reasons.

One is that we had a lot of, for example, proteins that exist naturally in what are called complexes, multiple proteins stuck together.

And sometimes there's no real way to predict their structure without predicting them all together.

So we're already thinking about this multiprotein problem.

But we also almost immediately said, well, a lot of proteins, for example, as you say, bind drugs, small molecules, maybe 20 atoms, you can think of aspirin, right?

It sticks to a protein.

And we knew that this was really important.

We said, but later.

And we started to talk about this kind of dream about this goal we would call wholePDB, right?

So the Protein Data Bank, the PDB, is the data source we use.

But we take it in and we throw away a lot of the things.

Oh, this has RNA or DNA attached to the protein.

Well, let's throw away the RNA or DNA and just have AlphaFold predict the protein.

- Because you can't handle those extra molecules.

- We couldn't handle that complexity.

We were very driven by the fact that we have 20 amino acids, which produce 20 types of structures.

And then we will predict and all our code was kind of based around that.

And we're like, eh, it's a challenge for later.

But eventually we'll start doing it.

And one of the things we almost immediately realized is a lot of the decisions we made in AlphaFold were very good and very helpful and very annoying to extend to more complicated things.

The other bit of work is that we were trying to figure out how to simplify AlphaFold.

And we thought, okay, well, AlphaFold is complicated, but maybe there are some things we can remove.

- How has the architecture shifted from AlphaFold 2 to AlphaFold 3 then?

- So there's a lot of changes, but I would say there's two big themes of changes.

When we were trying to handle much more of the kind of the DNA, small molecules, et cetera, we adopted this thing called a diffusion architecture, a different way in which we handle our uncertainty.

And I can tell you more about that.

But then I think the other one was really thinking a lot about the role of evolution and evolutionary data.

- Well, let me ask you about that then, because this is one of the things that I remember being quite key to AlphaFold 2, this idea that actually proteins have evolved in lots of different creatures numerous times.

And actually there is like some clues about the evolutionary history that will indicate where amino acids are likely to end up in the final folded shape.

So that even if you're starting with a string of amino acids, you're not going in totally blind because, I mean, this sort of stuff's happened before, right?

And like that ended up being quite a key part of the model, but it was also, I think potentially one of the parts that made the model quite, I don't know, like inflexible to other molecules, is that fair?

- So AlphaFold 2 used evolutionary information in this exuberant way: at kind of every part of almost every block it was saying, and here's the evolutionary information in case you need it.

But a lot of what we studied in AlphaFold 3, that we knew we were moving toward, didn't have evolutionary information.

So we were shouting at it with nothing, and we were kind of worried that this, you know, was both slowing down the network and also possibly leading to some bad dynamics in how it works.

And so we decided to just take that out of most of the network and otherwise emphasize the geometric information, the thing that really is always there.

And that turned out to work exceedingly well, actually better than we expected.

- I wonder if there's an analogy that we can use here, okay, for the difference in this architecture, right?

Let's imagine that you're planning a wedding and you got to do the seating plan for the entire wedding, right?

And you have all of these guests, those are your amino acids, and you have to sort of work out where each one's got to sit.

And there's a couple of different ways you can think about this.

So you could think about pairwise interaction.

So like this person sitting next to this person, is that a good interaction?

But then what does that mean for this person over here or that table over there?

But you could also potentially think about the history of what you know about those people.

How am I doing so far as an analogy?

- I can go for this.

I think, you know, you know, some people went to school together, some people used to date and had a terrible breakup, right?

Those might be-

- Don't sit them together.

- You probably don't want to sit them right next to each other unless you're really looking for sparks.

But yeah, I think before we just talked about where the wedding guests sit, now we think about where the flower arrangements are.

- That's nice.

- Right?

You know, that we think about all these other things that come together to become the reception dinner.

- In this analogy then, AlphaFold 2 was very focused on the history of the guests, right?

Was sort of like continually checking, looking at where they might best fit based on their past.

And that's great for proteins, but it's difficult once you try and include other elements of the reception, like the flowers, like the food.

And once you start bringing in other biomolecules, you don't want to focus so much on history.

- I think that's all true.

I think one thing though I would say is that we kind of always made this history available, this evolutionary history, or in the analogy, kind of what we know about their past.

And what we would find is that AlphaFold, we think was not relying on it much other than at the very beginning, that it was like saying, oh, well, these people should probably be together.

These people should probably be apart.

I know a couple of things, but then it kind of trained itself to ignore it.

And so by inspecting and seeing that we're probably not using that information, maybe we should stop attaching it constantly into the processing.

- But then as a result, you managed to massively simplify the model.

- Well, I wouldn't say we massively simplified, but we made it massively more accurate.

And suddenly we were doing new problems.

And in fact, we made light adjustments and then we made a much better model.

And then it turned out that even that protein-protein problem, something that has nothing to do with ligands or nucleic acids or anything else, even that protein-protein problem got massively better from this kind of science and improvement.

- Also in AlphaFold 3, you have this diffusion element.

And I mean, I know that that comes up a lot in video generation, for instance.

How does that fit in here?

- So diffusion is this different idea in how you train a neural network.

The AlphaFold 2 system was really heavily based around the kind of shape of a protein backbone.

In AlphaFold 3, we went to diffusion where you basically say, here's a blurry image of the protein.

I kind of took all of the protein and added some noise, some error, like you looked at it in the wrong prescription glasses.

And then guess the right answer.

And you have it constantly refined.

And so what this gave us was a really great understanding of local geometry, of how to make things extremely precise, 'cause that's what it does at small scales, and this way of tackling big systems.

And that gave us a kind of new approach that we didn't have to get so involved in the details of exactly how proteins look, 'cause they're different than DNA.

They're different than RNA and small molecules.

And the upside is that it made it really, really easy to kind of handle this wide universe of things that we study.

The downside is that it led to a higher rate of hallucination, of weird stuff appearing.

And so then we needed to handle that in different ways.
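
To make the "blurry image, then guess the right answer" description concrete, here is a toy sketch of the two pieces just described: building a training pair by adding noise to known coordinates, and refining a noisy structure step by step from coarse to fine. It is an illustration of the general idea only and not AlphaFold 3's code. Every name in it (`add_noise`, `training_example`, `refine`, `stand_in_denoiser`), the four-point "structure", and the noise levels are assumptions made for this example. In particular the stand-in denoiser is not a trained network: it cheats by looking at the true coordinates so that the loop is runnable.

```python
import math
import random


def add_noise(coords, sigma, rng):
    """Blur a structure: add Gaussian noise of scale sigma to every coordinate."""
    return [tuple(v + rng.gauss(0.0, sigma) for v in atom) for atom in coords]


def rmsd(a, b):
    """Root-mean-square distance between matching points of two structures."""
    total = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(total / len(a))


def training_example(clean, sigmas, rng):
    """One training pair: a noisy input and its noise level, plus the clean target."""
    sigma = rng.choice(sigmas)
    return add_noise(clean, sigma, rng), sigma, clean


def refine(start, denoise, sigmas, rng):
    """Go from large to small noise: take the denoiser's guess, then re-blur
    it at the next, smaller noise level (no noise after the last step)."""
    x = start
    for i, sigma in enumerate(sigmas):
        guess = denoise(x, sigma)
        next_sigma = sigmas[i + 1] if i + 1 < len(sigmas) else 0.0
        x = add_noise(guess, next_sigma, rng)
    return x


rng = random.Random(0)
clean = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0), (4.5, 0.5, 1.0)]
sigmas = [8.0, 2.0, 0.5]  # coarse to fine, arbitrary units


def stand_in_denoiser(noisy, sigma):
    """NOT a trained model: peeks at `clean` and removes 70% of the error.
    A real system would use a neural network trained on pairs like the one
    produced by training_example."""
    return [
        tuple(n + 0.7 * (c - n) for n, c in zip(noisy_atom, clean_atom))
        for noisy_atom, clean_atom in zip(noisy, clean)
    ]


noisy_input, sigma_used, target = training_example(clean, sigmas, rng)
start = add_noise(clean, sigmas[0], rng)
final = refine(start, stand_in_denoiser, sigmas, rng)
start_error = rmsd(start, clean)
final_error = rmsd(final, clean)
print(f"error before refinement: {start_error:.2f}, after: {final_error:.2f}")
```

Because noise is re-added between steps, two runs with different random seeds end in slightly different places, which is the stochasticity the conversation turns to next.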

- Well, this is one of the big differences between AlphaFold 2 and AlphaFold 3, right?

That you have this introduction of stochasticity, the potential for hallucinations.

How much should people be concerned about that when they're using it?

Is there a danger that, because AlphaFold 2 was so on the money, they think of AlphaFold 3 as though it's some kind of oracle?

- I think one of the wonderful things about biologists is that, as scientists, they're deeply skeptical of their tools.

AlphaFold 2 did have an advantage that wrong answers often look stupid.

No one looked at that and said, "That's definitely a protein."

Whereas wrong answers in AlphaFold 3 are sometimes more plausible, but I think that people have gotten really good, not uniformly perfect, but really good at saying, "Well, AlphaFold 2 is also telling me how accurate it thinks it is in the confidence measure.

I should also use that."

And so it's this kind of social knowledge.

There's no experiment or tool that scientists use that is without limitation, right?

Even experimental structure determination has all these known faults.

And so scientists, I think, use it relatively well.

I haven't seen any uptick in really bad conclusions from people using AlphaFold 3.

I think because it's now such a part of the education and community of scientists that you, when you use computational methods, here are the confidence measures you look at.

We color our proteins by confidence.

And ultimately, we also think of it as a tool where we'll induce hypotheses and we'll test those hypotheses experimentally.

And AlphaFold 2 is not perfect by any means.

It's just very, very useful.
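
For readers who want to see what "coloring by confidence" can look like in practice, here is a small sketch. It relies on two facts from outside this conversation: AlphaFold's per-residue confidence score is called pLDDT (0 to 100), and PDB-format files from the AlphaFold Database store it in the B-factor column. The band thresholds follow the AlphaFold Database color legend. The function names, the `fake_ca_line` helper, and the four-residue demo data are hypothetical and made up for this example.

```python
from collections import Counter


def plddt_per_residue(pdb_text):
    """Read one confidence value per residue from the B-factor field
    (columns 61-66) of each alpha-carbon ATOM record."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            residue_number = int(line[22:26])
            scores[(chain, residue_number)] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Map a pLDDT value to the band names used in the AlphaFold Database legend."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def fake_ca_line(serial, resname, chain, resnum, plddt):
    """Hypothetical helper: build a fixed-width ATOM record for the demo."""
    return (
        f"ATOM  {serial:>5}  CA  {resname:>3} {chain}{resnum:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


# Made-up demo data, not a real prediction.
demo_pdb = "\n".join(
    [
        fake_ca_line(1, "MET", "A", 1, 95.5),
        fake_ca_line(2, "ALA", "A", 2, 82.0),
        fake_ca_line(3, "GLY", "A", 3, 61.3),
        fake_ca_line(4, "SER", "A", 4, 34.9),
    ]
)

scores = plddt_per_residue(demo_pdb)
bands = Counter(confidence_band(value) for value in scores.values())
print(dict(bands))
```

A residue in the low or very low bands is one to treat as unresolved rather than as a structural claim, which is the habit Jumper describes here.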

- How important is interpretability in all of this?

I mean, this idea that humans want to understand why AlphaFold is folding a protein in a particular way.

- There's a lot of interest in it.

And you will hear people sometimes make very confident pronouncements that, you know, we can only use AI systems if we perfectly understand what they do.

And they almost mean if I can write down an algorithm that I could use instead of that AI system.

And I think it's this desire of, that's an annoying black box.

What if it just wasn't a black box?

And I kind of feel like that narrow demand for we must understand it perfectly is honestly kind of a weird demand.

And I think about cases in which we've been perfectly happy not having that in science.

One, for example, is just experimental science in general.

If you look at, say, how someone crystallizes a protein, early in crystallography, it wasn't clear if those structures were gonna look just like a free protein floating in liquid, and more experiments kind of said most of the time it's about right.

So we've always in science totally worked in this kind of partial interpretability way.

I think there's really good applications of interpretability when we think about, okay, I wanna understand the network so that I can change it and make a better version of, say, AlphaFold.

I described some stories earlier about how we do that kind of work.

Some people will say, well, I want interpretability so I can trust it.

And I think more important than that is if you really wanna know whether you trust an answer, well, we have pretty good characterization that our confidence metrics are a reliable guide and people use them in practice to decide when an answer is probably true.

I'd love to see a lot more interpretability work go on for AlphaFold.

What exactly leads it to generalize so widely?

I think there could be more done, but that won't necessarily give people what they think it will give them.

- I think sometimes if like, okay, the Romans, for example, were building bridges and aqueducts without having a full understanding of gravity, right?

They didn't have like Newton's equations.

Is there a way in which AlphaFold here is like Roman engineering, but for biology?

Like we are able to build stuff now with the tools that you and your team are creating, even though we don't necessarily have full insight into why they're working.

- I mean, Romans are a way back in order to do this.

- I mean, this feels very--

- Think about a modern jet airplane.

- Yeah.

- Or modern car.

- Yeah.

- We understand, you know, the Navier-Stokes equations of fluid dynamics, et cetera, a bit about turbulence.

Despite that low level understanding, we both build wind tunnels to measure flow and we build simulations that for this precise wing geometry show you how the air goes over it.

I think the Roman bridge building is maybe a better analogy of how we do AI development, where we have some intuitions, just like the Romans had intuitions and they built some beautiful bridges, but they didn't have all the equations and full understanding, and yet they built things they needed, and yet they were able to drive carts across the bridge they built.

So we are in that sense operating partially intuitionally in AI, but downstream the people using tools like AlphaFold, I think it's more like having a great computation package that maybe you don't exactly understand.

Your expertise is not exactly in how this air flow results in turbulence, but then you figure out how to change it and adapt and you work with this tool to do your larger scale science.

And I think that's really the slightly less Roman version of what AI users are doing.

- Hey look, the Romans, there's no shade on the Romans.

- What have the Romans given us, yes.

- Exactly, exactly right.

Okay, well let's talk about some of the downstream applications, because earlier this year I got to speak to Max Jaderberg and Rebecca Paul from Isomorphic Labs about how they're using your AlphaFold tool in drug design.

What has that been like to see this thing that you guys built actually being implemented in drug design in that way?

- I think it's just really extraordinary to see it carried so far and to be a part of, one of the things about drug design is it's not just protein structure prediction.

I like to remind people that a protein structure costs about $100,000 and a drug costs about a billion.

So that can tell you that it can't all be protein structure determination.

I think it's really exceptional to see people trying to build on and take these ideas further and really find also a way to integrate it into application. And we see this across the pharma industry, like how are we gonna build processes around this that enable us to ultimately end up with molecules that are dosed in patients, that pass all these different things, some we'll help with, how does the molecule stick, or what is the biology, is this protein a target at all, and some that we have very little to do with.

Will this drug be metabolized in the liver?

Maybe there's a protein-small molecule interaction that you can use to help that, but for the most part, AlphaFold is probably not the tool. And I think it's really important that we have both the work in how do we understand biology and then the work in specifically how do we make molecules to drug targets. It's an exciting combination.

- I think one of the things that I hadn't quite appreciated until having that conversation with them is like finding a molecule to bind to a particular protein target is such a small subset of curing disease, right?

I mean, Alzheimer's is an example where we know that proteins are involved, but there isn't even necessarily a place to target yet.

- Well, we don't even know, we still don't know if amyloid beta accumulation is in the causal chain.

Is it a symptom?

Right, that's a protein, but breaking that up, I mean, there's been some starting to see a bit of effect.

I think it's an example of one of the most important things to say, if you think about finding a drug that sticks to a protein, finding a drug that's non-toxic, at least for the most part in animal models, doing all these hard stages of early drug design that we rightly say are very, very difficult, take people often years. But the bigger problem to me is like even when you do all that, 90% of drugs fail in clinical trials, so even though you do all those things right, they still don't work or they're still not safe, we determine this experimentally. And a lot of this is our grand ignorance of biology, right, that we don't know the causes of Alzheimer's, of autism.

Even when we have ideas of cause, you know, for example, Huntington's with very clear genetic correlates, still making a molecule that actually makes those patients' lives better is so very difficult.

There are so many giant problems left.

We're only starting this world of computational biology or, well, continuing, let's say, but still, there's so much left.

You know, one case study I like from AlphaFold, one that's somewhat recent, people were trying to understand how cholesterol is moved around the body.

This protein that is involved in the transport of fatty molecules from one location to another, I believe it is also found in some of the plaques that build up that are correlated to heart disease.

And even as we start to understand that biology, and we have this nice piece that AlphaFold contributed, the detailed structure of this molecule, that they could only take an extraordinarily fuzzy picture with a method called cryo-electron microscopy, which is not an uncommon outcome for that technique, but then that fuzzy piece actually really well matched with the AlphaFold structure.

And now you can say, okay, well, this is that thing that is moving cholesterol.

Maybe I can interfere with or change how it moves cholesterol.

Maybe I can add a small molecule.

But of course, your first thing you might say, well, that's the protein.

Why don't you just add a drug that blocks it?

And I think you would immediately find out that would be really bad.

Your body didn't have this protein by accident.

The purpose of this protein is not to cause heart disease.

The purpose of this is to move fatty molecules where they need to be in the cell.

- You sort of need that, right?

- You sort of need that.

You're gonna sort of need that.

So what you're actually gonna need to figure out how to do is how might, this gives you some new ideas, how to change how this behaves in the cell without killing the patient and making their life better.

And I think AlphaFold is a part of that story.

It's not the end.

- It feels like there's a natural next step to all of this.

So if you are like predicting the shape of proteins and then using those models to interpret the function of proteins in the human body, does it then go on to designing new proteins?

- Oh yeah, people have wanted, they've looked at these beautiful proteins and they said, I wish humans could do that, right?

And so there's been all this exceptional work and in fact, a lot of it done at David Baker's lab who with Demis and I won the Nobel.

And AlphaFold has actually been shockingly transformative at this at saying, well, how do we go from, now we've built these computational systems that understand it, how are we gonna design our own proteins?

And in fact, a large portion of new approved drugs are proteins, normally antibodies discovered initially in very interesting ways, injecting mice or llamas with something that you want to build a protein against and using their natural immune system to find it.

But we are starting to talk very seriously about how are we going to design proteins to have the effect we want?

And it turns out that the most important part of that is that you can design many things you think might work.

It's extraordinarily, it takes a long time.

It's difficult, it's expensive to test in the lab.

So what's been so important there is using AlphaFold as a proxy for nature, trying to say, how do we integrate AlphaFold's understanding of how proteins stick together when they do?

How do we use that to maximally make a signal for protein design?

And people have gotten extraordinarily successful.

They've gotten really, really good at getting proteins to stick just where they want.

- Okay, but hang on, because that wasn't the original intention of AlphaFold to like see how proteins stick together.

- It wasn't the intention to see how they stick together.

In fact, that was an early surprise from Twitter where two different people said, if you wanna know if two proteins stick together, yeah, we were busy making a multi-protein, like properly done system.

They said, well, just take those two proteins and put some random amino acids in the middle and see if they stick together that way.

And that was the best system in the world for seeing if proteins stick together.

- Wow.

- We didn't think we were making a system that could help people design proteins in a really deep way.

We thought we would go use this fundamental breakthrough and go on and do it.

And then people said, actually, it already works.

And I think it was this grand story that does show up again and again in AI.

So maybe we should have expected it that if you train a model to be really, really good at a task, it has to learn a lot of deep facts.

If you wanna be really good at structure prediction, learn some deep facts about how proteins interact.

And if you do just the right experiments, you can kind of access that knowledge.

But there was this whole field of what I think people started to call AlphaFoldology, where people would find out which things work.

They would just treat it as this really cool black box that they could start experimenting with and try their own ideas on.

I think there was a lot of really great science that has been and continues to be done in that vein.

We're still figuring out, and there's starting to be work on how do we make enzymes, proteins that do chemistry, how do we do really complicated, sophisticated stuff.

Nature would still laugh at us on our ability to design proteins.

But we are starting to develop these really interesting tools that are maybe therapeutics that are also maybe ways to interrogate the cell, that you can bring two proteins together and see how the cell changes because you do that.

I think we'll get this interplay in not only the tools we use for therapeutics, but now our ability to poke the cell in exciting ways to interrogate it.

And every time we develop that, we'll develop a more interventional understanding of the cell that we will bring forward to medicine and synthetic biology.

- Is this AlphaProteo that you're describing here?

- So AlphaProteo is Google DeepMind's internal effort to do protein design and thinking about problems in binding and enzymes and really trying to figure out, especially for these super hard problems, how do we get reliable systems?

And I think what we're seeing is in the design space still is a lot of success, but when you're actually designing proteins, you have to go to the lab and test them.

There's no other way to find out.

And then finding out the right ways to predict if they're going to work.

And the AlphaProteo work has shown that we can get further and further in doing this.

- Give me an example then of some of the type of proteins that would be nice to be able to design.

- You know what?

I think if you ask any protein designer, they will have a favorite.

And their favorite is really, can we make proteins do things like carbon capture?

Can we actually build enzymes that meaningfully contribute to address climate change?

I think other ones that you really see, for example, degrading microplastics or environmental plastics.

I think one of the things also though, I'll say as a caution is that for all of these, when you talk about doing a real application, just like people's conception of drug design is get molecule to stick, drug design done.

And that's not the case, right?

There's so many more properties you need.

You need it to be tolerable in all these ways.

You need it to be pill formulatable.

You need all these other things.

Similarly in enzymes, you might think, oh, well, you just need to make this reaction happen.

An enzyme is a protein that catalyzes the chemical reaction.

But no, actually, you need to be able to do this many times enough that you're not constantly having to make new proteins for each reaction.

You need it to be fast enough.

You need it to not do certain other reactions.

You have all these other properties.

And I think there's a lot more to be done as we think about going from, oh, maybe this is kind of interesting to this really, really works.

Although in fairness, actually, interestingly on synthetically evolved enzymes, people are already using them.

There's a lot of washing powder that has designed proteins, which I find fascinating.

I think it's one of the few applications of designed proteins in something people would recognize.

- Yeah, absolutely.

How much harder is it, though, to sort of engineer biology, to design, than it is to just predict?

- I'm very empirical.

You should ask me in three years and we'll know.

It's easier and it's harder.

One analogy I like to say is, if you're trying to figure out what an object is, and you might say, is this a bicycle?

And I would see two wheels, a chain, some handlebars, and I would say, yeah, that's a bicycle.

But having two wheels, a handlebar, and a chain doesn't make something a working bicycle.

So when you're designing something, you have to get all the details right enough that it actually works.

And I think we're still figuring this out in proteins.

And right now, protein structure prediction is, let's say, "solved, star," right?

It's a very, very useful system.

It's not perfect.

Design is not yet solved, but I think that it is advancing rapidly.

And I don't think we'll be still talking about protein design is incredibly difficult in 15 years.

- Well, okay, let's just zoom out a little bit on AI and biology more generally, because this whole conversation, it's reminded me of something that you said when we last interviewed you a few years ago.

And I've got a little clip that I can play you of what you said.

- I think it's really important to remember that these are really powerful techniques that we've developed that are still far short of a real artificial intelligence that you can talk about thinking and making decisions and everything else.

- I think that's so interesting.

So that was 2022, right?

I wonder how you reflect now on that.

Do you think that machines are beginning to sort of understand biology in an intelligent way now?

Have you changed your mind?

- I think that whether or not they can think, they're extraordinarily useful for solving problems.

How far they are from AI or AGI, I think that's almost beside the point.

I think the really, really interesting point is where we can characterize these systems as reliable enough, do we find useful things for them to do?

I think we need to be much more utilitarian about it.

And certainly machines like AlphaFold, I wouldn't necessarily apply the word think.

And I don't know if we're in the situation, right?

That we used to say, okay, intelligence was playing chess and we should work on chess because once we have machines that play chess, we've basically got intelligence.

And of course we got machines that played chess really well, at the superhuman level; 1997 was the Kasparov match.

And that wasn't the path that led us to machines that can read and write.

And so I think we always reach for these problems and say, well, this is the problem, or like, people rather optimistically name something Humanity's Last Exam, problems so hard that if you solve them, there's no point in posing problems to machines anymore.

And I'm very interested in how do we find those problems that turn out to be so easy in a certain sense that we can do incredibly on them and build very useful systems before we build AGI.

Those are the kind of science problems.

And of course you wanna use related techniques to the people trying to build AGI, they're powerful techniques, but we don't have to get tied up in the philosophy.

We can just build useful systems.

In fact, I think the whole industry is thinking a lot about how do we build useful systems that matter for people doing software development, that matter for people doing writing, that expand the nature of the problems we solve.

And then we'll see if we end up with AGI, but we will certainly end up with useful systems.

- So how about the most useful system of all in biology?

I mean, you have DeepMind, you could have all of these different systems for lots of aspects of human biology, like, you know, AlphaFold, AlphaGenome, you know, AlphaProteo and so on.

Can you bring those together in a single system?

I mean, is there a goal here to sort of build like a simulated cell?

- You know, I used to work in simulation, and simulation is that I will write down the rules for how all the little pieces do their little thing locally, and then I'll put it all, mash it together, and turn a big crank, and then I will get it.

But we don't even have a parts list for the cell.

We have all these effects that I think are not gonna give us like a classical simulation simulated cell.

I think what we're gonna do is build really useful systems that draw information from AlphaFold, that draw information from the literature, that draw information from the genome, and use that to say really useful things about biology that matter.

And I think quite possibly actually one of the core technologies of that will be finding the right fusion of what we understand in narrow AI systems, and what we're understanding about broad machine learning in terms of large language models.

- Well, so is that, I mean, how do you bring those systems together?

Is there sort of ideas from large language models that can be applied?

- I think what's very easy to say is, oh, well, we'll just have your large language model call AlphaFold.exe as a tool.

But I think there are all these other problems, like, okay, well, if AlphaFold produces a structure, can these large language models actually understand structure really well?

To what extent can they understand these 3D coordinates as well as a human, better than a human?

How do they bring in information from, say, DNA sequencing from all these others?

I think it's far from trivial.

How do we get these deep integrations so that a model can understand as much about proteins and protein structure as AlphaFold, but also understand the entirety of, say, the biology literature?

I'm kind of hopeful we'll get there, but we have to build it.

- Do you think that there are aspects of biology that are gonna resist computational prediction?

- I think there will certainly be aspects that if you ask deep questions about evolution, you know, or the origin of life, what data are you learning from?

What experiments?

You're gonna have to draw data very far away to answer that question.

You might know something about chemistry.

You might be able to do these experiments a bit faster, but you're certainly not directly learning from data.

Or we talk about evolution and we draw phylogenetic trees, but ultimately we just have the DNA of the species that exist right now and a little bit into the past.

These kinds of things, I think, will be very hard.

I think the other thing that will happen is that as we build these AI tools, the space of kind of reasonable hypotheses will narrow, that it will say probably not that for this reason, probably not that, and our experiments will be better, you know, in a certain kind of Bayesian sense.

Our prior over what are reasonable biological answers will narrow because of our computational tools, and experiments will help resolve them.

And I think this interplay will get tighter, and as we do more experiments, or as we use AI to do things like protein design that gives us more tools to poke the cell, then we will learn more and we will do more.

But I think we'll just see some things will be harder and some things will be easier, and the easier things will happen first.

- The easier things will happen first.

John, thank you so much for joining us.

- It was fun.

(upbeat music)

- I think it's really easy to have a very romantic idea of science, right?

That it's about uncovering the hidden truths of the universe, that your aim as a researcher is to build this picture piece by piece that can help to understand the mechanisms of life.

And that, I think, is what makes John's ideas about interpretability completely fascinating because that is turning things completely on their head.

You know, AlphaFold is unashamedly not about the why here.

Instead, this is a tool that can just reliably be used to accelerate the work that scientists can do.

And then I think when you remember that John Jumper is only halfway through his professional career as a scientist and he's already got one Nobel Prize, you realize he isn't necessarily defending an old paradigm here.

He is literally building the next one.

And if John's focus is completely on utility rather than understanding, when the person who built the most useful thing that AI has ever done tells you that that is what really matters, well, you have to wonder if he's just showing all of us where science is headed next.

You have been listening to Google DeepMind: The Podcast, with me, Professor Hannah Fry.

If you have enjoyed this episode, then please do leave us a comment or a review.

And I should tell you that coming up, we have interviews with two of Google DeepMind's co-founders, Demis Hassabis and Shane Legg.

Trust me, you will not want to miss them.

So why not take the opportunity to subscribe to our YouTube channel?

See you soon.

(upbeat music)