Training Data · 2025-05-16

Jeff Dean on the Coming Transformations in AI

Hosts: Bill Coughran

Guests: Jeff Dean

Gemini, TPUs, Pathways, AI agents, Frontier model competition, Distillation, AI for science, Hardware specialization, Model architecture, Vibe coding, Junior engineer AI agent

Why it matters

Agents are currently useful but limited; a clear path exists to expand capability via more RL and agent experience

Key claims

Frontier model market will consolidate to a 'handful' of general-purpose labs, with many specialized distilled models built on top
Agents are currently useful but limited; a clear path exists to expand capability via more RL and agent experience
Google's TPU line is now on Ironwood, and Pathways (single Python process across tens of thousands of chips) is now being offered to Cloud customers
Dean estimated a 24/7 AI 'junior engineer' agent is 'probably possible in the next year-ish'

Episode summary

Google chief scientist Jeff Dean, speaking at Sequoia's annual AI conference, framed the current moment as the product of roughly 12 years of steady progress in scaling neural networks, with three reinforcing drivers: better algorithms (including post-training and RL), faster specialized hardware, and multimodality. He argued that agents are real but early—useful today but not yet capable of most tasks—and described a clear trajectory toward more capable virtual and physical (robotic) agents through more RL and real-world experience.

On the competitive landscape, Dean predicted only a "handful" of frontier general-purpose model labs will remain, with distillation enabling many lighter-weight specialized models layered on top. He defended Google's TPUs (now on the Ironwood generation) and the Pathways software stack—used to train Gemini with a single Python process driving tens of thousands of chips—and confirmed Pathways is now becoming available to Cloud customers. He also flagged analog and highly specialized digital inference hardware as a major efficiency frontier.

On AI's scientific impact, Dean cited the Nobel Prizes to Hinton and to DeepMind's Hassabis and Jumper as evidence that ML is reshaping science, especially by approximating expensive simulators with neural nets. He predicted a 24/7 "junior engineer" AI agent within roughly a year, said algorithmic improvements have been at least as important as hardware scaling across Gemini generations, and sketched a more organic future model architecture—irregular sparsity, mixture-of-experts style partitioning, on-the-fly distillation and parameter reuse—rather than today's rigid monolithic designs.

Algorithmic improvements across Gemini generations have been as important or more important than raw hardware scaling
AI is already transforming scientific workflows by replacing expensive simulators with fast neural approximations
Future model architectures should move toward irregular sparsity, MoE-style partitioning, and dynamic distillation/compaction rather than today's rigid monolithic designs
Inference hardware (including analog approaches) could become 10–1,000x more efficient than today

Source material

Transcript

Hi, and welcome to Training Data.

We are mixing it up for this week's episodes and dropping a conversation that was filmed live at Sequoia's annual AI conference in San Francisco with Google's chief scientist and AI lead, Jeff Dean.

Jeff is interviewed by our partner and Google alum, Bill Coughran.

We hope you enjoy this special conversation with Jeff about the future of model development and compute, whether or not he likes vibe coding (hint: he does), and his expected timelines for a 24/7 software developer agent.

[Music] We have Jeff Dean, and if you read Jeff's bio, he's run everything at some point in Google, including overseeing the genesis of this industry and the BERT paper that kind of sparked things so many years ago.

And we're very fortunate at Sequoia to have our partner, Bill Coughran, who spent about a decade before Sequoia running most of engineering at Google with Jeff.

And so, please welcome Jeff and Bill.

[Applause] Thank you.

And Jeff, it's great to see you.

We got to work together for a few years, and Jeff still is occasionally willing to talk to me, which I'm very proud of.

We have an occasional dinner, which is great fun.

Yeah, no.

He's now the chief scientist, I think, at Alphabet.

So, I thought we'd start.

Obviously, a lot of the people in the room are excited about AI and what's happened.

Google clearly introduced a lot of the tech that the industry is based on, Transformers and other things.

Where do you see things going these days as you look out, both within Google, but also in the industry as a whole?

Yeah, I mean, I think this sort of period has been a fairly long time in developing, even though it's sort of come into popular visibility only in the last three or four years.

But really, starting maybe in 2012 and '13, people were starting to be able to use these really, at that time, what seemed like large neural networks to solve interesting problems.

And the same sort of algorithmic approach would work for vision and for speech and for language.

And that was pretty remarkable and kind of brought attention to machine learning as a way to solve those problems rather than sort of more traditional handcrafted approaches.

And one of the things we were interested in in 2012 even was how can you scale and train very, very large neural networks.

So, we trained a neural network that at the time was 60X larger than anything else.

And we used 16,000 CPU cores because that's what we had in our data centers and got really good results.

And that really cemented in our mind that scaling these approaches would really work well.

And there's been a whole bunch of evidence of that and hardware improvements to help increase our ability to scale to larger and larger models, larger data sets.

We had an expression, bigger model, more data, better results, which has been sort of relatively true for the last 12 or 15 years.

And where things are going, I think now the models that we have are capable of doing really interesting things.

They can't solve every problem.

They can solve a growing set of problems year over year because the models get better.

We have better algorithmic improvements that show us how to train larger models with the same compute cost, more capable models.

And then we have scaling of hardware.

We have increasing compute per unit of hardware.

And also we have reinforcement learning and post-training kinds of approaches that are making the models better and sort of guiding them into the ways that we want them to behave.

And that's really exciting.

I think multimodality is another big thing, like having the ability to put in audio or video or images or text or code and have it sort of output all those kinds of things as well is pretty useful.

The industry is, I think, mesmerized by agents right now.

How real do you think agents are?

I know Google introduced an agent framework.

Some of this stuff, not Google's necessarily, but some of the agent stuff seems to be a little bit vaporware to me.

Sorry, folks.

I'm a little direct as some folks will tell you.

It's all good.

I think there's a lot of promise there because I do see a path for agents with the right training process to eventually be able to do many, many things in the virtual sort of computer environment that humans can do today.

Right now, they can sort of do some things, but not most things.

But the path for increasing the capability there is reasonably clear.

You get more reinforcement learning going.

You have more agent experience that it can learn from.

You have early nascent products that can do some things but not most things but are still incredibly useful for people.

I think similar things will happen in sort of physical robotic agents as well.

Like right now, we're probably close to making that transition: robots in messy environments like this room kind of don't quite work today, but you can see a path where in the next year or two, they'll start to be able to do 20 useful things in this room.

That will introduce pretty expensive robotic products that can do those 20 things.

Then learning from experience, they will then get cost engineered to now have something that's 10 times cheaper and can do a thousand things.

That's going to engender even more cost engineering and more improvement in capability.

It's exciting.

It is.

It does seem like it's coming even though it's very important today.

I guess one of the other things that comes up, I think, with a lot of young companies is what's happening with large models.

Clearly, Google has Gemini 2.5 Pro and Deep Research and so forth.

Then there's OpenAI and a number of other players.

I think there's an open debate about how many large language models, open source, closed source, where are things going.

How do you think about that?

Obviously, Google has a strong position and wants to, I'm sure, dominate in that area.

How do you see the landscape?

I think clearly it takes quite a lot of investment to build the absolute cutting edge models.

I think there won't be 50 of those.

There may be a handful.

There are an awful lot.

Once you have those capable models, it's possible to make much lighter weight models that can be used for many more things because you can use techniques like distillation, which I was a co-author on and which got rejected from NeurIPS 2014 as unlikely to have impact.

I've heard that technique may have helped DeepSeek.

That's a really nice technique if you have a better model and then you can put it into a smaller scale thing that actually is pretty lightweight and fast and all the kinds of properties you might want.

I think there will be quite a number of different players in this space because different shape models or models that focus on different kinds of things, but I also think a handful of really capable general purpose ones will do pretty well.
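[Editor's illustration] The distillation idea mentioned above can be sketched in a few lines. This is a minimal toy, assuming the common soft-target formulation (a student trained to match a teacher's temperature-softened output distribution); the logits, the temperature value, and the function names are made up for illustration and are not taken from the talk.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

# Hypothetical logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]   # roughly agrees with the teacher
poor_student = [0.5, 1.0, 4.0]   # prefers the wrong class

loss_good = distillation_loss(teacher, good_student)
loss_poor = distillation_loss(teacher, poor_student)
print(f"loss (good student): {loss_good:.3f}")
print(f"loss (poor student): {loss_poor:.3f}")
```

The student that tracks the teacher's full distribution gets the lower loss, which is the signal a small model would be trained on in this setup.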

Fair enough.

I guess hardware is the other thing that's interesting.

It looks to me like every large player is building their own hardware.

Obviously, Google has been very public about the TPU program, but Amazon has their own.

Rumors are Meta has one.

Rumors are OpenAI is building one.

There's lots of hardware and yet the industry seems to only hear about NVIDIA fairly.

I'm sure that's not true in your office, but how do you think about that?

How important is specialized hardware for this stuff?

Yeah.

Well, I mean, it's very clear that having hardware that is focused on machine learning style computations, I like to say accelerators for reduced precision linear algebra are what you want and you want them to be better and better generation over generation and you want them to be connected together at large scale with super high speed networking so that you can spread your model computation out over as many compute devices as possible.

I think it's super important.

I helped bootstrap the TPU program in 2013 because it seemed obvious we would want a lot of compute for inference at that time.

That was the first generation and then the next generation of TPUs, TPUv2 was focused on both inference and training because we saw a big need there and I think we're on now, we stopped numbering them for some annoying reason.

So now we're on Ironwood, which is coming out any day now, and Trillium before that.

Be careful.

That sounds like an Intel chip naming strategy, which hasn't worked that well.

Small edit distance to Itanium, which is a little scary.

I guess going a little bit off topic and then maybe we'll open up to questions from folks in the room.

I have a lot of friends who are physicists.

They were a little surprised when Geoff Hinton and his colleagues won the Nobel in physics.

I guess how do you see AI?

Some of the physicists I know are sort of offended that a non-physicist is starting to win Nobel Prizes.

How far do you think AI is going to go in various fields at this point?

Pretty far, I think.

Also this year, my colleagues Demis and John Jumper won it for...

I almost forgot that.

Yes, yes.

Double Nobel Prize celebration Monday and Tuesday or whatever it was.

I think that's a sign that really AI is influencing lots of different kinds of science because at its core can you learn from interesting data?

A lot of parts of science are about making connections between things and understanding them.

If you can have AI assisted help in doing that, one of the things I've seen in many different fields of science is many disciplines often have incredibly expensive computational simulators of some process like weather forecasting is a good example or fluid dynamics or quantum chemistry simulations.

Often what you can do is use those simulators as training data for a neural net and then build something that approximates the simulator but now is 300,000 times faster.

That just changes how you do science because all of a sudden, well, I'm going to go to lunch and screen 10 million molecules, that's now possible instead of I would have to run that for a year on compute I don't have.

I think that just fundamentally changes your process of how you do things and will make faster discoveries.
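[Editor's illustration] The "go to lunch and screen 10 million molecules" remark is easy to check with arithmetic. The 300,000x speedup and the 10 million molecules are the figures quoted above; the assumption that the full screen takes exactly one year on the original simulator is ours, chosen to match the "run that for a year" framing.

```python
SPEEDUP = 300_000             # figure quoted in the talk
NUM_MOLECULES = 10_000_000    # figure quoted in the talk
SECONDS_PER_YEAR = 365 * 24 * 3600

# Assumption: the full screen takes one year on the original simulator.
simulator_total_s = 1 * SECONDS_PER_YEAR
surrogate_total_s = simulator_total_s / SPEEDUP

per_molecule_sim_s = simulator_total_s / NUM_MOLECULES
per_molecule_surrogate_s = surrogate_total_s / NUM_MOLECULES

print(f"simulator: {simulator_total_s / 86400:.0f} days total, "
      f"{per_molecule_sim_s:.2f} s per molecule")
print(f"surrogate: {surrogate_total_s:.0f} s total, "
      f"{per_molecule_surrogate_s * 1e6:.1f} microseconds per molecule")
```

Under that assumption the learned approximation finishes the whole screen in under two minutes, which is why the workflow changes rather than just getting faster.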

I think it's probably the most interesting for the questions from the audience at this point.

I have other questions for Jeff.

Just to quickly follow up on that, Geoff Hinton famously left Google after studying the differences between digital and analog computing as a future platform for inference and learning.

I'm wondering is the future of inference hardware analog?

It's definitely a possibility.

I think analog has some nice properties in terms of it being very, very power efficient.

I think there's a lot of room for digital things to be much more specialized for inference as well.

It's a little bit easier to work with typically.

I think there is a general direction of how can we make inference hardware that is 10, 20, 50, 1,000 times more efficient than what we have today.

That seems eminently possible if we put our minds to it.

It's actually something I'm spending a bit of time on.

I was just going to ask about developer experience versus hardware.

I think the TPU hardware is extremely impressive, but there's a lot in the zeitgeist about how CUDA or different technologies are easier to use than the TPU layer.

I'd be curious for your perspective on that.

Is that something you've been thinking about or getting a lot of angry emails about?

I don't connect with cloud TPU customers all that much, but definitely the experience can be improved.

One of the things we started working on in 2018 is a system called Pathways, which is really designed to enable us to take lots of different computing devices and then give a really nice abstraction over those, where you have a virtual-to-physical device mapping that is managed by the underlying runtime system.

We have support for that for both PyTorch and JAX.

We primarily use JAX in-house, but what we have is a single JAX Python process that just looks like it has 10,000 devices on it.

You just write your code as you would as an ML researcher, and off you go.

You can prototype it with four or eight or 16 or 64 devices, and then you change a constant and you run against a different Pathways backend with a thousand or 10,000 chips, and off you go.

Our largest Gemini models are trained with a single Python process driving the entire thing with tens of thousands of chips, and it works quite well.

Pretty good developer experience, I think.

One thing I would say is to date we had not offered that to Cloud customers, but we just announced at Cloud Next that we're now going to have Pathways available for Cloud customers, so then everyone else can have the delightful experience of a single Python process with thousands of devices attached.
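[Editor's illustration] The "virtual-to-physical device mapping" and "change a constant" ideas can be pictured with a toy. To be clear, this is not the Pathways or JAX API: ToyRuntime, PhysicalChip, run_sharded, and chips_per_host are hypothetical names invented here, and the shards run sequentially in one process purely to show the shape of the abstraction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhysicalChip:
    host: int   # which machine the chip lives on
    slot: int   # position of the chip within that machine

class ToyRuntime:
    """Hypothetical stand-in for a runtime that owns the device mapping."""

    def __init__(self, num_devices, chips_per_host=4):
        # virtual device id -> physical location, decided by the runtime,
        # so user code only ever refers to virtual ids
        self.mapping = {
            v: PhysicalChip(host=v // chips_per_host, slot=v % chips_per_host)
            for v in range(num_devices)
        }

    def run_sharded(self, fn, batch):
        """Split the batch round-robin across virtual devices and apply fn
        to each shard (sequentially here; a real system would run in parallel)."""
        n = len(self.mapping)
        shards = [batch[i::n] for i in range(n)]
        return [fn(shard) for shard in shards]

NUM_DEVICES = 8  # the one constant to change when scaling up

runtime = ToyRuntime(NUM_DEVICES)
partial_sums = runtime.run_sharded(sum, list(range(1000)))
total = sum(partial_sums)
print(total, runtime.mapping[5])
```

The user-level code (the last four lines) does not change if NUM_DEVICES becomes 64 or 1,000; only the mapping held by the runtime does.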

I agree.

That's a much better experience than managing 64 processes for your 256 chips.

Why would you want to do that?

I love using the Gemini API.

It would be even easier if it got one API key rather than the Google Cloud credential setup.

Do you guys have a plan to unify the Google Cloud Gemini stack with the Gemini project setup right now that's more for testing stuff?

Yeah, I think there's a bunch of streamlining that is being looked at.

It's a known problem, not something I spend a lot of time on personally, but I know Logan and others on the developer side are aware of this friction.

We'd like to make it frictionless to use our models.

Is that working?

Okay.

It's an interesting time in computing.

You've got the confluence of Moore's Law and Dennard scaling being completely dead with AI just scaling like crazy.

You have a pretty unique position in the world of driving these supercomputers and infrastructure that is being built.

You know how to map the workloads onto these things, which is a unique sort of scale.

What do you think the future of computing is going to look like?

What is the computing infrastructure heading towards, like from an asymptotic thought experiment level?

Yeah, I mean it's really clear that we have dramatically changed the kinds of computations we want to run on computers in the last, say, five years, ten years.

That was initially a small ripple, but it's pretty clear now that you want to run incredibly large neural networks at incredibly high performance and incredibly low power.

You also want to train them.

Training and inference are pretty different kinds of workloads.

I think it's useful to think of those two as you probably want different solutions for the two or somewhat specialized solutions.

I think you're going to see all kinds of adaptation of compute platforms for this new reality that you really just want to run incredibly capable models.

So some of that will be in low power environments like your phone.

You'd like your phone to run incredibly good models with lots of parameters super fast so that when you talk to your phone, it just talks back to you and it can help you do all kinds of things.

You're going to want to run these on robots and autonomous vehicles.

We already do somewhat, but even better hardware for that will make those systems much easier to build, much more capable physical agents in the world.

Then you want to run them at incredibly large scale in data centers.

You also then want to use lots of inference time compute for some kinds of problems but not others.

It's pretty clear you want to use 10,000 times as much compute for some problems as for others.

That's a nice new scaling knob we have that can make your model much more capable or give you much better answers or make the model capable of doing things with that much compute that it can't do with one x as much compute.

But you shouldn't spend 10,000 times as much compute on everything.

So how do you make your systems work well for that?

I think that's a combination of hardware, system software, model and algorithmic tricks, distillation, all these things can help you make amazing models come to life in small compute footprints.
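[Editor's illustration] One simple way to picture the inference-time compute "knob" is best-of-N sampling with a per-problem budget. This is a toy sketch, not a description of how any Google system works: the candidates are random numbers standing in for sampled model outputs, and pick_budget is a hypothetical policy.

```python
import random

def score(candidate):
    """Stand-in for a verifier or reward model: higher is better."""
    return candidate

def solve_with_budget(num_samples, seed=0):
    """Draw num_samples candidate answers and keep the best-scoring one."""
    rng = random.Random(seed)
    candidates = [rng.random() for _ in range(num_samples)]
    return max(candidates, key=score)

def pick_budget(difficulty):
    """Hypothetical policy: spend more samples only on harder problems."""
    return {"easy": 1, "medium": 100, "hard": 10_000}[difficulty]

results = {
    difficulty: solve_with_budget(pick_budget(difficulty))
    for difficulty in ("easy", "medium", "hard")
}
for difficulty, best in results.items():
    print(f"{difficulty:>6}: budget={pick_budget(difficulty):>6}, best score={best:.4f}")
```

Because the same seed is reused, the larger budgets see a superset of the smaller budgets' candidates, so the best score can only stay the same or improve as the budget grows. The cost is the 10,000x compute, which is why it should be reserved for the problems that need it.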

One thing I've noticed is the computer science, at least traditionally, when people are studying algorithms and computational complexity, it was all op count based.

I think as people are rediscovering hardware and details of hardware and system design, I think one of the things that's come back into focus is you need to think about network bandwidth and memory bandwidth and so forth.

And so I think a lot of the kind of traditional algorithmic analysis needs to be completely rethought just because of realities of what real computation looks like.

Yeah, one of my office mates in grad school did his thesis on like cache aware algorithms because the order of magnitude, big O kind of notation didn't account for the fact that some operations are 100x worse than others.

Yeah, no, that's right.

And I think in modern ML computing, you care about data movement at the incredibly small level, like moving things from SRAM into accumulators costs you some tiny number of picojoules, but it's way more than the actual operation costs you.

So it's important to have picojoules at the tip of your tongue these days.
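[Editor's illustration] The point that op counts alone miss the real cost can be shown with a back-of-the-envelope energy model. The two picojoule constants below are assumed placeholders chosen only to reflect "movement costs more than the operation"; they are not measurements of any chip, and the reuse parameter is a simplified stand-in for tiling or caching.

```python
# Assumed, illustrative energy costs in picojoules; not measurements.
PJ_PER_MAC = 1.0            # one multiply-accumulate
PJ_PER_BYTE_MOVED = 10.0    # moving one byte from SRAM to the compute unit

def matmul_energy_pj(m, k, n, bytes_per_value, reuse):
    """Estimate compute and data-movement energy for an (m x k) @ (k x n) matmul.

    reuse: how many MACs each fetched operand value serves before it
    has to be fetched again.
    """
    macs = m * k * n
    operand_fetches = 2 * macs / reuse   # two operands per MAC
    compute = macs * PJ_PER_MAC
    movement = operand_fetches * bytes_per_value * PJ_PER_BYTE_MOVED
    return compute, movement

# Same op count, two different data-movement patterns (2-byte values).
compute_naive, movement_naive = matmul_energy_pj(128, 128, 128, 2, reuse=1)
compute_tiled, movement_tiled = matmul_energy_pj(128, 128, 128, 2, reuse=64)

print(f"naive: compute={compute_naive:.0f} pJ, movement={movement_naive:.0f} pJ")
print(f"tiled: compute={compute_tiled:.0f} pJ, movement={movement_tiled:.0f} pJ")
```

Both variants have identical operation counts, so a big-O analysis treats them as equal, yet under these assumptions the naive one spends 64 times more energy on moving data.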

One other quick question.

Do you vibe code?

I've been trying it a little bit.

It actually works surprisingly well.

Yeah, I mean, we've had some nice we have a like a little demo chat room.

Actually, we have a lot of chat rooms.

We sort of run Gemini via chat room.

So I'm in like 200 chat rooms.

And when I wake up and brush my teeth, I get like nine notifications because my London colleagues are busily doing things.

Like we had one where people can send out cool demos of things I've seen.

And one that was particularly cool was you feed in a YouTube educational oriented video.

And the prompt is just something like, please make me an educational game that uses graphics and interactivity to help illustrate the concepts of this video.

And you know, it doesn't work every time.

But 30% of the time, you get something that's actually kind of cool and related to differential equations or traveling to Mars or, you know, doing some kind of cell aspect thing.

And you know, that's just an incredible sign for education.

Like the tools we now have and will have in the next few years, really have this amazing opportunity to change the world in so many positive ways.

So I think we should all remember that as kind of what we should be striving for.

>> Would you mind passing there and then maybe there?

>> Yeah, I would love to hear your thoughts about the future of search.

And especially given Chrome, such big distribution, right?

And especially Chrome already know the credentials, like payments and then web signing credentials.

Have you thought about getting Gemini directly into Chrome and making the Chrome app Gemini app instead of a separate app?

I say this because I'm a long term Googler.

So just think about --

>> Yeah, I mean, I think there are definitely lots of interesting downstream uses one could make of the core sort of Gemini models or other models.

You know, one is can it help you do stuff in your browser or on your full computer desktop by observing what you're doing and, you know, doing OCR on tabs or maybe it has access to the raw tab contents.

You know, that seems like it will be incredibly helpful.

And, you know, I think, you know, we have some early work in this area that we publish public demos of in video form that seem pretty useful, things like Mariner and things like that.

TBD.

>> You passed the --

>> Jeff, question for you.

So thank you for your comments.

Very insightful.

You know, earlier you mentioned, like, you know, the number of sort of foundational model players, you know, will likely only be a handful.

And, you know, this is largely because of, you know, kind of the infrastructure costs and sort of the scale of investment to sort of remain at that cutting edge.

And so sort of as this, like, battle for the frontier unfolds, like, how do you see the -- like, where do you see this end game going?

Like, you know, where is like the -- where does this lead us?

Like, is it just whoever writes the biggest check to build the biggest cluster wins or is it, you know, better -- you know, you just talked about, like, better utilization of, like, unified memory optimization and sort of, you know, different, you know, efficient uses of what you already have or, you know, is it the consumer experience or like how -- where does this all -- where does this arms race lead us?

>> Isn't it just whoever gets the Skynet first, the game's over?

>> Yeah.

I mean, I think it's going to require really good insightful algorithmic work as well as really good systems, hardware, and infrastructure work.

I don't think either one of those is more important than the other because what we've seen in, say, our Gemini progression to -- from generation to generation is the algorithmic improvements are as important or maybe even more so than the hardware improvements or the more, you know, larger amount of hardware we're putting to the problem.

But both are incredibly important.

And then I think from a product standpoint, you know, what -- it's -- there's sort of early stage products in this space, but I don't think we've collectively hit on what is the thing that -- or it's probably going to be many things that become the daily used products for billions of people, right?

I think there's probably some in the educational space or in, you know, general information retrieval that is search-like but sort of taking advantage of the strengths of, you know, large multimodal models.

I think probably helping people get stuff done in, you know, whatever work environment they find themselves in is going to be an incredibly useful thing and how will that get manifested in product settings.

You know, how do I manage my team of 50 virtual agents that are off doing things and they'll probably be mostly doing the right thing but occasionally they'll need to consult, you know, with me about some choice they need to make.

I need to give them a bit of steering.

How do I manage, you know, 50 virtual interns?

It's going to be complicated.

>> Hi, Jeff.

Thanks for being here.

Right here.

>> Oh, sorry.

>> I literally cannot think of anyone better in the world to ask this question.

How far do you believe we are from having an AI operating 24/7 at the level of a junior engineer?

>> Not that far.

Yeah.

>> Is that six weeks or six years?

>> Every year in AI seems like dog years, seven or something.

I will claim that's probably possible in the next year-ish.

Yeah.

>> Hi, you talked about scaling pre-training and now scaling RL.

How do you think about, like, the future trajectory of these models?

Will it be, you know, one large model with all the compute or a constellation of smaller models that have been distilled from these larger models both working in parallel?

How do you see, you know, the future landscape?

>> Yeah.

I mean, I've always been a big fan of models that are kind of sparse and have different parts of expertise in different parts of the model because, you know, from our weak biological analogies, that's partly how our real brains get so power efficient is, you know, we're 20 watts or whatever and we can do a lot of things, but our Shakespeare poetry part is not active when we're, like, worried about the garbage truck backing up at us in the car.

And I feel like there's -- we do some of that with mixture of expert-style models.

You know, we did some of the early work in that space where we had, like, 2,048 experts and showed that it gave you dramatic improvements in efficiency, like, 10 to 100x more efficient sort of model quality per training flop.

And that's super important.

But it feels like we're not really fully exploring the space yet because right now the kinds of sparsity people tend to do is incredibly regular.

Like, it feels like you want paths through your model that are, like, 100 or 1,000 times more expensive than other paths and you want experts or pieces of your model that are tiny amounts of compute and some that are very large amounts of compute.

Maybe they should have different structures.

And I think you want to be able to extend your model with, like, new parameters or new bits of space.

And maybe you want to be able to compact parts of your model running a distillation process on this piece of it to make it one quarter the size.

And then you have some background garbage collection thing that is now, like, oh, great, I have more memory to use.

So I'm going to put those parameters or put those, you know, bytes of memory somewhere else and make more effective use of them somewhere else.

And so that, to me, seems like a much more organic, continuous learning system than what we have today.

So I, you know, the only problem with this is what we're doing today is incredibly effective.

So it becomes a bit hard to completely change what you're doing to be more like that.

But I really do think there are huge benefits to doing things in that style rather than the sort of more rigidly defined model that we have today.
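[Editor's illustration] The idea of paths through a model that differ by 100x or 1,000x in cost can be sketched with top-1 routing over experts of very different sizes. Everything here is hypothetical: the experts are trivial functions, the relative costs are made-up labels, and the router is a hand-written linear scorer rather than anything learned.

```python
# Hypothetical experts: (name, relative compute cost, function).
EXPERTS = [
    ("tiny",  1,     lambda x: x + 1.0),
    ("mid",   100,   lambda x: 2.0 * x),
    ("large", 1_000, lambda x: x * x),
]

# Hypothetical router: one (weight, bias) pair per expert.
ROUTER = [(-1.0, 1.0), (0.0, 0.5), (1.0, -1.0)]

def route(x):
    """Return the index of the expert with the highest router score (top-1)."""
    scores = [w * x + b for w, b in ROUTER]
    return max(range(len(scores)), key=scores.__getitem__)

def forward(x):
    """Run only the selected expert and report which path was taken."""
    name, cost, fn = EXPERTS[route(x)]
    return fn(x), name, cost

inputs = [0.0, 1.0, 3.0]
outputs = [forward(x) for x in inputs]
total_cost = sum(cost for _, _, cost in outputs)
dense_cost = len(inputs) * sum(cost for _, cost, _ in EXPERTS)

for x, (y, name, cost) in zip(inputs, outputs):
    print(f"x={x}: expert={name}, cost={cost}, output={y}")
print(f"sparse cost={total_cost}, dense cost={dense_cost}")
```

Only the input routed to the large expert pays the large cost; running every expert on every input would cost about three times as much in this toy.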

I think one more question and then we'll probably wrap up.

I wanted to return to the junior engineer inside of here.

I'm curious, what advancements do you think we need to get there?

Like, obviously, just maybe code generation gets better.

But like outside of code generation, what do you think gets us there?

Tool use, agentic planning?

Yeah, I mean, I think they, you know, this hypothetical virtual engineer probably needs a better sense of many more things than just writing code in an IDE.

Like, it needs to know how to like run tests and like debug performance issues and all those kinds of things.

And we know how human engineers do those things.

They learn how to use various tools that we have and can make use of them to accomplish that.

And they, you know, get that wisdom from more experienced engineers, typically, or reading lots of documentation.

And I feel like, you know, junior virtual engineer is going to be pretty good at reading documentation and sort of trying things out in virtual environments.

And so that seems like a way to get better and better at some of these things.

And, you know, I don't know how far it will take us, but it seems like it'll take us pretty far.

Jeff, thank you for coming and sharing your wisdom.

Thank you.

[applause] [end of transcript]