The Cognitive Revolution · 2025-03-15

Allan Dafoe on Technological Determinism, Cooperative AI, and DeepMind's Frontier Safety Plans

Hosts: Nathaniel Whittemore (Cognitive Revolution intro/outro), Rob Wiblin (80,000 Hours interview)

Guests: Allan Dafoe

Frontier Safety FrameworkTechnological determinismCooperative AIDangerous capability evaluationsAGI definition and levelsAI governanceAI safety policyDifferential technological developmentMeiji Restoration case studyCyber capability risksSelf-reasoning evals

Why it matters

Frontier Model Eval results for Gemini 1.

Key claims

  • Allan Dafoe leads three pillars at Google DeepMind: Frontier Safety, Frontier Governance, and Frontier Planning, producing Google's Frontier Safety Framework
  • His thesis argues technology doesn't force action—military-economic competition does, illustrated by Japan's Meiji Restoration after Perry's arrival with steamships
  • He critiques differential technological development, arguing market forces already invest heavily in safety/alignment, so additional effort may have limited marginal return
  • He co-founded the Cooperative AI Foundation, arguing that even aligned AI deployed by conflicting principals could cause disasters (flash escalations, emergent flash crashes, exclusion of humans from AI cooperative clusters)

Episode summary

Summary

This crossover episode from 80,000 Hours features Rob Wiblin interviewing Allan Dafoe, Director of Frontier Safety and Governance at Google DeepMind. Dafoe, formerly founding director of the Center for Governance of AI (GovAI), explains his role leading three pillars: Frontier Safety (dangerous capability evaluations and risk management, embodied in Google's Frontier Safety Framework), Frontier Governance (policy advising), and Frontier Planning (horizon scanning toward AGI). He emphasizes that his team is small but works collaboratively across DeepMind's AI safety, Gemini safety, alignment, responsibility, and policy teams, with DeepMind serving as the heart of frontier policy thinking within Google.

The conversation opens with Dafoe's academic work on technological determinism, arguing that technology alone doesn't compel action—military-economic competition does. Groups that fail to adopt advantageous technologies risk losing to those who do. He cites the Meiji Restoration in Japan (uninventing firearms for 250 years until Perry's steamships forced rapid modernization) and critiques differential technological development as a strategy, noting market forces already drive safety work. He advocates the cooperative AI research agenda—arguing that even aligned AI systems pose risks if deployed by conflicting principals, potentially causing flash escalations or excluding humans from cooperative clusters.

The discussion then turns to the 'Levels of AGI' paper and the Frontier Model Eval paper testing Gemini 1.0. Results showed Gemini at 3/5 on persuasion, ~2/5 on cyber, ~2/5 on self-proliferation, and ~1-2/5 on self-reasoning. Novel evals include a self-reasoning test where the model must notice filtered words blocking its email task and take corrective action, and another testing whether models know their training cutoff and optimally use search for post-cutoff events. Dafoe notes cyber capabilities warrant close monitoring and emphasizes the importance of capability elicitation—giving models proper tool access to measure their true frontier rather than their raw base performance. The episode highlights DeepMind's safety orientation, Demis Hassabis's openness to international collaboration, and the broader concern that a relatively small group of technical staff holds outsized power over AI's trajectory.

  • Allan Dafoe leads three pillars at Google DeepMind: Frontier Safety, Frontier Governance, and Frontier Planning, producing Google's Frontier Safety Framework
  • His thesis argues technology doesn't force action—military-economic competition does, illustrated by Japan's Meiji Restoration after Perry's arrival with steamships
  • He critiques differential technological development, arguing market forces already invest heavily in safety/alignment, so additional effort may have limited marginal return
  • He co-founded the Cooperative AI Foundation, arguing that even aligned AI deployed by conflicting principals could cause disasters (flash escalations, emergent flash crashes, exclusion of humans from AI cooperative clusters)
  • The 'Levels of AGI' paper offers a multidimensional conception: AGI as a space rather than a point, likely highly imbalanced vs. human capabilities, with general intelligence likely winning out due to cross-domain knowledge spillovers
  • Frontier Model Eval results for Gemini 1.0: 3/5 persuasion, ~2/5 cyber, ~2/5 self-proliferation, ~1-2/5 self-reasoning on a subjective five-point scale
  • Novel evals include a 'filtered words' test for self-reasoning (does the model notice it's failing and investigate its environment?) and a training-cutoff awareness test using limited search queries
  • Dafoe emphasizes capability elicitation matters—Project Zero showed models perform far better on cyber tasks when given proper tool support, raising near-term cyber risk concerns

Source material

Transcript

Hello, and welcome back to The Cognitive Revolution.

Today, I'm excited to share a special crossover episode from the 80,000 Hours podcast, featuring a conversation between host Rob Wiblin and Allan Dafo, Director of Frontier Safety and Governance at Google DeepMind.

I first heard Allan speak back in 2017, when I introduced him at a conference in Boston, as a professor at Yale, who was then working on great power peace.

This was before he founded the Center for Governance of AI, which in turn was years before he moved to DeepMind.

So I can say with confidence that Allan has been thinking about AI governance harder and planning for the current AI moment longer than just about anyone else.

And as you'll hear, that pays off in the form of truly excellent analysis on an impressive range of critical topics.

To begin, Allan describes his academic work on the question of just how much ability humans really have to alter the course of technology development.

Noting that macro-historical trends like Moore's Law suggest a process that transcends individual human choices, he ultimately argues that while technological possibilities don't force us to do anything on their own, in combination with the realities of military economic competition, they can and often do.

Simply put, failure to adopt potentially advantageous technologies often means losing to those who do.

This is not a conclusion that Allan comes to lightly, but unfortunately for us today, I think it's a pretty hard one to escape.

It's still possible that a spectacular incident could cause a vibe shift big enough to force a pause of frontier scaling, but the smart money now seems to be on powerful AI soon, and with Pentagon officials quoted in the press expressing their enthusiasm for autonomous killer robots, despite the general reliability, reward hacking, and even scheming issues that have recently come to light, militarization of some form seems a foregone conclusion as well.

And yet, even if the long-term logic is inescapable, I think it would be a huge mistake for frontier developers to underestimate their own individual and collective short-term agency.

A few years ago, my uncle told me a story about when he arrived in Italy during the height of the Cold War to join a crew that was responsible for firing nuclear weapons at tertiary targets in the event of an all-out war.

The first time they drilled the launch sequence, one of the longer-tenured guys took him aside and said, "Just so you know, if the order ever comes down to shoot for real, we are all going AWOL.

None of us want to be part of destroying the world with nukes."

Now, that's just one story from one enlisted crew, and I have no idea if that sentiment was widespread enough to have made a real difference in the worst-case scenario.

But today, the reality is that a very small number of people are pushing the AI capabilities frontier forward.

There are only so many elite ML savants, and compute constraints mean we can't scale all their ideas at once anyway.

Meanwhile, it's also now well-established that intelligence itself has a jagged edge.

Unlike nuclear technology, which had a small number of discrete powerful use cases and a very mechanical associated game theory, the mind space from which AI developers are selecting new forms is manifestly vast, and the models themselves are incredibly malleable.

If you believe things could move super quickly as AIs begin to hit important capability thresholds, the specific details of what we build and prioritize just before that point could prove decisive.

All this puts the few hundred or maybe as many as a few thousand people who are closest to the major compute budget decisions in a position of great power and responsibility.

As we saw in the context of Sam Altman's firing and subsequent reinstatement, a serious threat by technical staff to walk can force leadership's hand.

And further, as past guest Daniel Coctello demonstrated by refusing to sign a non-disparagement clause, even a single individual can create meaningful change if they're willing to stand up for what they believe in.

So I would encourage everyone at all of the frontier AI companies to make time to raise your own level of situational awareness.

Even if that comes at the cost of moving your specific project forward a bit more slowly, to make sure that the overall enterprise you're engaged in continues to be one that you feel good about supporting.

To date, you truly have so much to be proud of.

Top tier language models, AI doctors, self-driving cars, a revolution in biology, robots now folding origami, DeepMind could never ship another product and would already go down as a historically important company.

And there's a lot to appreciate in this conversation on the alignment, safety and policy fronts too.

Alan's cooperative AI research agenda is both fresh and sophisticated.

Google's frontier safety framework has truly been, as Alan describes it, part of a serious and important effort by leading companies to advance the AI policy conversation.

And lately I have been thrilled to hear Demis buck the trend by continuing to speak about the possible need for international collaboration on advanced AI development.

At the same time, AI Manhattan project type ideas are rapidly proliferating.

And it's not hard to imagine such a project going so catastrophically wrong as to more than offset even the tremendous amount of good that DeepMind and other AI leaders have already done to date.

So again, for the people at Frontier Companies, keep in mind that history is not happening to you, nor are you merely living through it.

You are, as part of a relatively small group, driving or at the very least shaping it in important ways.

We are now in a glorious AI summer, but an AI Cold War is looming.

Your critical decisions won't be binary like my uncle's squads was, and there are clearly many defensive AI systems that we genuinely need to build.

But all the more so because we're accelerating into a super high dimensional uncharted space, if you haven't already, I think it is now time to start thinking and even talking to colleagues about which directions might convince you personally, or perhaps one day as a group, to go AWOL from the project.

As always, if you're finding value in the show, we'd appreciate it if you'd share it with friends.

I always welcome your feedback and suggestions.

You can reach out via our website, cognitiverevolution.ai, or by DMing me on your favorite social network.

And of course, I recommend you subscribe to the 80,000 Hours podcast, which is always excellent and continues to ramp up their AI coverage.

Now, I hope you enjoy this very enlightening and deeply thought provoking conversation between Rob Wiblin and Alan Dafoe of Google DeepMind from the 80,000 Hours podcast.

One famous quote in the history of technology that was arguing against determinism was that technology doesn't force us to do anything.

It merely opens the door.

It makes possible new ways of living, new forms of life.

And my retort was technology doesn't force us, it merely opens the door, and it's military economic competition that forces us through.

So when a new technology comes on this stage, many groups can choose to ignore it or do whatever they will with it.

But if one group chooses to employ it in this functional way that gives them some advantage, eventually the pressure from that group will come to all the rest and either force them to adopt or lead the other groups to losing their resources to the new more fit group.

Today, I have the pleasure of speaking again with Alan Dafoe, who is currently the director of Frontier Safety and Governance at Google DeepMind or GDM for short.

For that, he was the founding director of the Center for the Governance of AI.

He was also a founder of the Cooperative AI Foundation and is a visiting scholar at the Oxford Martin School's AI Governance Initiative.

And I guess before all of that, you were an academic in the social sciences studying technological determinism, I guess a great, great power conflict, the big piece theory, that kind of thing, which I guess we're going to get a little bit of all of these different pieces of your work today.

Thanks so much for coming back on the show, Alan.

Thanks, Rob.

A pleasure to be here.

So later on, we're going to talk about the Frontier Model Eval as well as why you think cooperative AI might be about as important as aligned AI.

But first off, I guess, so your director of Frontier Safety and Governance, what does that actually involve in practice?

Like, I could see that being a whole lot of different things.

I don't have a sense of what your kind of day to day is like.

So my team is called the Frontier Safety and Governance team, and we have three main pillars, Frontier Safety, Frontier Governance per the name, and then Frontier Planning.

This adjective Frontier is a new term, I would say almost of art, to refer to these general purpose large models like Gemini and others.

Frontier Safety looks at dangerous capability evaluations.

It tries to understand what powerful capabilities may be emerging from these large general purpose models forecast when those capabilities are arriving, and then think about risk mitigation and risk management.

So this also led to the Frontier Safety Framework, which is Google's approach to risk management for extreme risks in Frontier Models.

That's Frontier Safety.

Frontier Governance is advising norms, policies, regulations, institutions, especially with an eye towards safety considerations.

And then Frontier Planning looks to the horizon, tries to imagine what new considerations could be coming with powerful AI and on the path to AGI, and then advising Google D.M.I.N., Google, and really all of society given those insights.

Sounds like a pretty big remit.

How large is the team that's working on all these questions?

So the team is quite small, though we're actually hiring for several positions right now at the time of the podcast going live.

That may be wrapped up.

But what's really great about working at Google D.M.I.N.D.

is we have a lot of partner teams.

There's a very collaborative culture.

So we work with technical safety, called the AI safety, Gemini safety, and alignment teams.

We work with responsibility teams, the policy teams, and so forth.

So, guys, Google D.M.I.N.D.D.

has, over the last year or two, become more integrated into the rest of Google, right?

So I imagine, are there other groups within this broad entity, I guess, Alphabet, that take an interest in these questions?

Or are you maybe the only group that's thinking about these front-link-- I guess you're thinking about the most important models and upcoming issues and threats.

Are there many other groups that take an interest having that kind of foresight and thinking years ahead?

So I would say Google DeepMind is the part of Google that's most specialized at thinking about friendship models.

Google DeepMind is responsible for building Gemini, the friendship model that's underpinning all of what Google's doing.

And we also have responsibility and safety and policy teams that are especially thinking about friendship issues.

We then do have partners across Google in these various domains, for example, in policy we work closely with Google policy on the range of policy implications and considerations connected to friendship models.

But Google DeepMind is, I would say, where the heart of the thinking related to friendship policy issues takes place.

Okay, so I guess, so back in 2021, you'd been the founding director of the Center for the Governance of AI, which was a reasonably big deal then.

And I think it's gone on maybe an even bigger deal since it's like pretty prominent voice in the conversation around governance of AI.

Why did you decide to lead this thing?

That's gone quite well to go and work at Google DeepMind instead.

Yeah, and yeah, I agree.

It went well at the time and it's gone even better since.

So a lot of credit goes to Ben Garfinkel, who's the executive director of GoVai and the many others who work there.

At the time, I was an informal advisor to Demis Hasabas, CEO of DeepMind and Shane Legge, co-founder of DeepMind.

And I found that I had a lot of potential impact in giving advice on AGI safety, AGI governance, and AGI strategy.

However, to be most impactful, it helps to be inside the company where I have more understanding of the nature of the decisions that they're confronting and more surface area to advise not only Demis and Shane, but also many key decision makers.

To take a step back, I want to reflect on this road to impact, this kind of advising important decision makers approach.

I would say one lesson I've drawn from history is that often in these pivotal historical moments in crises or in very high leverage historical moments, a lot depends on the behavior and the ideas and the very character of key individuals in history.

And Alexander Hamilton, the musical portrayal, it's like who's in the room and what decisions are made in the room.

And so I think that's true.

I think when you look at history, especially in these pivotal historical moments, it's incredible how much the ideas that people coming into the room had, the resources, the insights that they had available for what the solutions that they construct.

So that argues for advising people who will be influential and on these important historical developments.

And I think AI and AGI is my view one of the most, probably the most important historical development.

And I think Demis and DeepMind are very likely to be influential in the ongoing, have been so far in the development of AI and AGI.

There's a second part of this, which is that in addition to advising influential decision makers, it's that of the idea of boosting decision makers who have the kind of character you would want in critical decision makers.

So do they have the sensibility?

In my case, are they aware of the full stakes of what is happening?

Are they safety conscious?

Do they have the technical and organizational competence to pull off what needs to be built?

Because if you sort of have clumsy hands, even if you have good intentions, that may still lead to a bad outcome.

Finally do they have wisdom to be able to make these very hard decisions that have complex and certain parameters around them?

And in my view, Demis and I think again, Shane, to mention are extremely impressive individuals for these properties, for their safety orientation to their broad perspective on the stakes of the issues, to their wisdom and broad character.

I also do want to, I guess, reflect on GovAI.

During my time, yeah, it produced a lot of great work and great people.

It's since gone on, I think, to produce even significantly more great work and great people.

So Ben Garfinkel has done a great job.

It's interesting reflecting on some of the people who've gone through GovAI.

So one person who worked very closely with me at the time is Jade Long.

She used to be head of my partner team at OpenAI and is now the chief technology officer at the UK AI Safety Institute.

A number of other very prominent people in AI safety and governance, similarly, have gone through GovAI.

Marcus Enderlong, Robert Traeger came through.

Anton Kornak is a prominent economist who's done some work there.

Miles Brundage and others.

Back in 2018, in our short interview back then, you were saying people should definitely be diving into this area because it's going to grow enormously and it's going to be really good for your career and there'll be lots of opportunities.

I think that has definitely been brought out that people who got in on the ground floor have been doing super well at CareerRise.

Yeah, and I think it's still early days for any prospective joiners.

I think, you know, I always encourage people to hop trains as soon as they can because, yeah, AI is only, you know, it's still just a small fraction of the economy.

So there's a lot more impact to come and work to do.

We think that in the fullness of time, it's going to be close to 100%, certainly more than 10%, and it's like 0.01% now, certainly not more than 0.1% in terms of total revenue.

So yeah, there's like many orders of magnitude to go up.

Let's open up by talking about the work that you did in your previous incarnation, which was as an academic.

So I think you did your thesis back in the early 2010s on technological determinism.

What's the main focus?

I think the paper that came out of that opens with who, if anyone, controls technological change.

What was the academic debate there that you were reacting to or trying to be a part of?

Yeah, so my academic trajectory had a number of chapters.

So the first was on technological determinism, which we can come to.

Just for completeness, the second was on great power politics and peace specifically, which actually led to a lot of work that continues to be relevant, I think, to the question of AI and AGI governance.

And then I also did some statistics on causal inference work, which has some relevance to thinking about AI today.

Turning back to technological determinism, I would say I first came to this in undergrad reflecting on what shapes history and how we can do good and how we can kind of steer the trajectory of developments in a positive direction.

And an insight I had was that history is not just the sum of all of our efforts.

It's not just we all kind of push in different directions and then you take the sum and that's what you get.

Rather, there's these sort of general equilibrium effects that economists, I think, often talk about where it may be for every unit of effort you push in one direction, the system will kind of push back with an equal force, sometimes a weaker force, sometimes a stronger force.

And so when you're in such a system, it's very important to understand the structural dynamics.

Why does the system sometimes resist efforts or sometimes amplify efforts?

Why do you see these really astounding patterns in macro history?

For example, if you look at patterns of GDP growth, there's these famous curves where after the devastation of say World War II, both Germany and Japan completely rebound within less than a decade and then kind of returned to their pre-war trajectory.

And we've seen Moore's law, which is just an astounding trend.

It's not just that it continues, that transistor density is increasing exponentially.

It's very much a line.

So you can predict where we'll be quite precisely years in advance.

We now have scaling laws, which have given us sort of our generation's Moore's law, which again seems to allow us to predict years in advance, how large the models will be, how capable on, well, for example, on loss and so forth.

There's a number of other, I guess, macro phenomena that seem quite persistent.

The growth of civilization, I talked about or looked at these trends in what you can call the maximum.

So maximum energy processing of a civilization, also things like the height of buildings, the durability of materials.

Really there's just most functional properties of technology over time have gotten more functional, so like the speed of transportation and so forth.

Robert Wright in summarizing the literature writes that archeologists can't help but notice that as a rule, the deeper you dig, the simpler the society whose remains you find.

There's more generally, I think, an observation, which is almost a truism, that certain kinds of technology are so complex or difficult that they come after other forms of technology.

It's hard to imagine nuclear power coming before coal power, for example.

So there's all these macro phenomena and trends in technology, and it's important to explain them.

Now, this naive explanation would say if history is just the sum of what people try to achieve, then it's human will that's produced all these trends, including the reliable tip talk of Moore's law.

But not all the trends are positive.

I know you've reflected on the agricultural revolution, which evidence suggests was not great for a lot of people.

The median human, probably their health and welfare went down during this long stretch from the agricultural revolution to the industrial revolution.

Of course, it gave rise to inequality, warfare, and various other things.

And there's other trends that different societal groups resisted.

So in short, I don't think the answer is history is just the sum of what people try to do.

It depends, of course, on things like power, on timing, on the ecosystem of what's functional and what's possible and what's not, on what technology enables.

So I wanted to make sense of this.

Hey, we'll continue our interview in a moment after a word from our sponsors.

In business, they say you can have better, cheaper, or faster, but you only get to pick two.

But what if you could have all three at the same time?

That's exactly what Cohere, Thomson Reuters, and Specialized Bikes have since they upgraded to the next generation of the cloud, Oracle Cloud Infrastructure.

OCI is the blazing fast platform for your infrastructure, database, application development, and AI needs, where you can run any workload in a high availability, consistently high performance environment, and spend less than you would with other clouds.

How is it faster?

OCI's block storage gives you more operations per second.

Cheaper?

OCI costs up to 50% less for compute, 70% less for storage, and 80% less for networking.

And better?

In test after test, OCI customers report lower latency and higher bandwidth versus other clouds.

This is the cloud built for AI, and all of your biggest workloads.

Right now, with zero commitment, try OCI for free.

Head to oracle.com/cognitive.

The thing that we have to try to reconcile here is that on the one hand, we see these trends that seem like they're not really responsive to any person's particular decisions.

They're acting almost like, it's a little bit like the, what's it, the psycho history in the foundation, where you just have these broad trends where everyone is just an ant in this broader process.

And it's not obvious that anything that any particular individual did was able to, was able to shift things.

On the other hand, technology, at least so far, doesn't have its own agency.

It does seem like, in fact, it's humans doing all of the actions that are producing these outcomes.

And couldn't they, in principle, if they really hated what was happening, try to shift it.

We feel like we have agency right now over how society goes, or we feel that we at least have some agency.

So how do you reconcile this macro picture where it seems like things?

Humans don't have that much control over technology, at least historically, with the micro picture where we feel like we do now.

Am I understanding it right?

Definitely.

Some of these earlier theorists of technology and scholars of technology, this is in the sort of '60s to '80s, even endowed technology, this abstraction, with a sense of autonomy and agency.

Like, technology was this driving force, and often humans were along for the ride.

So Langdon Winter was one of the most prominent scholars, talked about technology having autonomy.

Louis Mumford talks about the machine as this capital M abstraction that is driving where society is going, and humans just support the machine.

We are cogs to support it.

Jean-Jacques Lule referred to La Technique, which is the sort of the functional taking over.

And he had this metaphor that humans make a choice, but we do so under coercion, under pressure from what is to make of us, and La Technique is the answer.

So I think there were these scholars and others who really did endow technology with this kind of agency.

Then a later generation criticized them, saying this abstract technology is an abstraction, a very high level abstraction, almost poorly defined.

And when you actually look at history in detail, in the microscope, where is technology?

You don't see the machine in the room.

Is the machine with us right now?

You see people.

People with ideologies and ideas and interests making decisions.

And I would say this led to a revolution in the study of technology towards what's been called social constructivism.

This looks at, mythologically, it's more ethnographic or sociological.

It looks at the details of how decisions were made, the idiosyncrasies of technological development, the many dead ends or detours, the fact that early on in the development of technology, people didn't know what the end result was, and they had many visions that were competing.

And so it wasn't sort of foreordained that the bicycle would look the way it does or the plane would look the way it does.

And I would say, well, for my personal intellectual trajectory, the PhD program I started in was one of the prominent departments working on this at Cornell.

And for me, this was a surprise because I really wanted to explain these macro phenomena.

And the answer I got from this department was, this is wrong.

This is technological determinism.

It is what scholars have since referred to as a critic's term.

It's sort of anyone who actually advocates technological determinism is a straw man position.

No one is serious about this.

So this whole generation of these sociologists and historians of technology really looked at the micro details of how technology developed and dismissed these abstractions that technology can have autonomy, can have an internal logic of how it develops, can have these profound impacts on society that we name revolutions after technology, the agricultural revolution, the industrial revolution, and so forth.

In this paper, it seemed like the constructivist in their reaction to this determinism were really sticking out a relative extreme opposing position where they were almost suggesting that it's always human responsibility and people always have choice over what technologies they adopt and what form it takes.

Am I understanding that right?

Yeah.

In a way, I would say the debate was never directly had or rarely directly had.

It was often indirect.

So in defense of the constructivists, I think they were asking different questions.

More importantly, they had different tools.

They had the tool of ethnography and sociology, and they were answering questions that those tools allowed them to answer, which were, the answers were narratives based on the conversations that took place, the decisions that were made.

And to explain macro phenomena, those tools are not well suited.

So I do think there was a mistake that was made, which was to dismiss the claims about macro phenomena and technological determinism in the pursuit of the questions that they had.

And I think it's a real loss for the history of technology that so little work has since been done on these bigger macro questions.

Were the constructivists motivated by a sense of moral outrage maybe that they saw people perhaps adopting technologies that were socially detrimental?

And those folks might then excuse it saying, well, we have no choice.

We have to do this for competitive reasons, or it's going to happen anyway.

There's nothing one can do.

And the constructivists were frustrated by this and wanted to say, no, you're responsible.

You're doing it.

So you do have agency here.

Yeah, this is an argument that's been made often and more recently by many different schools, including about AI.

So one criticism of, let's call it the AGI ideology, as these people would put it, is that AGI is not foreordained, or the development of AI as in any given sense is not foreordained.

But when we talk about it as if it's inherently it's coming, it will have certain properties that deprives citizens of agency to reimagine what it could be.

So that's, I think, the constructivist position on technology exactly as you said.

Now I think the counter position I would offer is you don't want to equip groups trying to shape history with a naive model of what's possible.

You want to channel energy where it will be high leverage, where it will have lasting impact rather than in these settings where the structure will resist all the force you push in one direction with an equal counter pressure.

Yeah.

OK, so we should talk about the synthesis, I guess, that you try to put forward in your thesis.

What are the circumstances under which we do have more autonomy, and what are circumstances where it can be extremely hard to change the course of history?

Maybe first I'll just talk a little bit more through the different flavors of technological determinism.

Sure.

So, because I think it's a rich vocabulary for people to have.

So that maybe in a way that the easiest one to accept is what we can call technological politics.

This is the idea that individuals or groups can express their political goals through technology.

So in the same way that you can express your political or achieve your political goals through voting or through other political actions.

So if you build infrastructure a certain way or design a technology a certain way, it shapes the behavior of people.

So design affects social dynamics.

Some of these famous examples are the Parisian boulevards, these linear boulevards that were built in many ways to suppress riots and rebellion because it made it easier for the cavalry to get to different parts of the city.

Latour is a famous sociologist who talks about the sociology of a door or the missing masses in sociology, which refers to the technology all around us that reinforces how we want society to be.

You can think about gates or urban infrastructure as expressing a view of how people should interact and behave.

A famous example is alleged of Robert Moses, who was a famous urban planner in New York City that he allegedly built bridges too low for buses to go under so that it would deprive people in New York who didn't have a car, namely African Americans, from the ability to go to the beach.

And so this was an expression of a certain racial politics that has been asserted.

In general, I think urban infrastructure has quite enduring effects, and so you can often think about what is the impact of different ways of designing our cities.

I guess in recent times we're familiar with the debate about how design of social networks can greatly shift the tone of conversation and people's behavior.

People appointed to the prominence of quote tweeting on Twitter where you can highlight something that someone else has said and then blast them on it potentially leads to more kind of brigading by one political tribe against another one.

And if you made that less prominent, then you'll see less of that behavior.

Exactly.

I think the nature of the recommendation algorithm, the way that people can express what they want has profound impacts on the nature of the discourse and how we perceive the public conversation.

So this was technological politics.

There's a number of other sort of strands of technological determinism, just very briefly.

Technological momentum is this idea that once a system gets built, gets going, it has inertia.

This is from sunk costs, so you can think of maybe the dependence on cars in American urban infrastructure.

Once you build your cities in a spread out manner, it becomes hard to have pedestrian dense core.

Or as has been alleged, maybe electric cars were viable if we just invested more or also maybe wind power and solar power could have succeeded earlier if we'd sort of gone down this different path.

And we might come back to this.

I think a lot of claims of path dependence in technology are probably overstated.

That again, coming back to the structure, some technologies and some technological paths were just much more viable.

And even if we'd invested a lot early in a different path, I think often it is the case that this kind of the path we went on was likely to be the path we would have been on because of more the costs and benefits of the technology than these early choices that people made.

Yeah, I guess the extreme view would be to say, well, we could have had electric cars or we could have gone down that.

I guess we did have electric cars in the 20s, I think, but we could have gone down that path in a more full-throated way as early as that.

I guess the moderate position would be to say, well, no, that wasn't actually practical.

There were too many technological constraints, but we could have done that maybe five years or 10 years earlier if we'd really foreseen that this would be a great benefit and decided to make some early, costly investments in it.

And then to make the counterpoint of a certain time when the breakthrough was ripe, I think in AI, this is often the case that insights about gradient descent to neural networks occurred much earlier than when they had their impact.

And it seemed they needed to wait for the cost of compute, the cost of flops to go down sufficiently for them to be applicable.

And you could argue, well, what if we had the insight late?

I think once flops get so cheap, it becomes much more likely that someone invents these breakthroughs because it becomes more accessible.

Any PhD student can experiment on what they can access.

So there is a kind of seeming inevitability to the time window when certain breakthroughs will happen.

It can't occur earlier because the compute wasn't there, and it would be unlikely to occur much later because then the compute would be so cheap someone would have made the breakthrough and then realized how useful it is.

Was there another school of technological disseminism?

There's other flavors.

So I guess another concept that's emphasized is that of unintended consequences, something we know a lot about.

But Langdon Winter points to this notion that as we're inventing technology after technology, we run the risk of being on a sea of unintended consequences.

So the future of history is not determined by some structure nor by our choices, but just by being buffeted one way or another by the-- Starting from one blunder to another.

Yeah, and sometimes it's positive, sometimes it's negative.

I think there's truth to that, that often technology comes along and then it takes us some years to fully understand what its impacts are and then to adapt and hopefully channel it in the most beneficial directions.

Yeah, there's definitely some effect like that.

I guess the people who really highlight that, I think they're exaggerating sometimes the scale of the negative side effects from technology where, I guess, setting aside some particular cases that we're particularly focused on, it seems like the negative side effects of technology in general have gotten smaller at each generation of technology.

We do solve more problems than we create on average, is my take.

Yeah.

Shall we come back to the synthesis?

Sure.

Yeah, and maybe the last big part of it is, again, these macro phenomena and trying to explain them.

And so I guess the scholars most work in that, I would say, are historians, macro economists that are trying to explain and political scientists that are trying to explain these long run trends.

And things like the spread of democracy.

So the synthesis is that, well, one observation that led to the synthesis is that the more micro your observation, the closer to people to the day to day, the more likely you were to conclude a constructivist explanation.

So this is a robust empirical finding.

If you look at the literature, people who have micro-methodologies are much more likely to conclude constructivist type claims, that what matters is individuals, visions, ideas, and so forth.

Whereas the more macro your methodology and your aperture, the more likely you were to conclude a more deterministic set of claims.

And so we have a puzzle there.

I'd say some of the constructivist concluded that's because the macro scholars are too high, they allow themselves the error of imputing agency to technology because they're so far from the data.

And I think that's an unfair characterization.

Rather, I do think there's emergent phenomena at different scales of analysis.

And so we should give the macro phenomena its due and try to explain it.

One analogy that I've offered here is, imagine this hypothetical science of wave motion.

And so we have a group of scientists who emphasize wind.

It turns out when the wind is blowing, that affects the ripples on top of the water.

And then another community is based on kinetic impact.

So they say, look, when we throw rocks into the water, it produces waves.

And that's their kind of preferred theoretical framework.

And then there's this kooky macro water phenomenologist who says, I've been noticing that whenever the moon is directly above us, the water level is at its highest point.

And then when it's at the horizon, it's at its lowest point.

And I've traveled all over the world, and this pattern is robust.

So I will offer this moon determinism that the moon explains water levels.

And the wind and the kinetic scientists would be mistaken to dismiss the moon determinist simply because the moon determinist doesn't have a micro mechanism.

That person, there's a challenge to that finding, namely, how do you explain this pattern?

Because there's no known micro foundation that can explain why the moon is sort of pulling water.

But of course, we know that is, in fact, what it's doing.

And so I think there's a similar results in macro history that there are these patterns that need to be explained.

And the fact that we didn't have a micro foundation isn't a reason entirely to dismiss it, but it is a challenge.

So what is a possible micro foundation?

The one I offer hinges on military economic competition.

And the key idea is that there's levels of selection.

So at the local level, you and I can make a decision about what we do right now, maybe if we wanted to build something, the technology we build.

And that just depends on us, on our ideas and so forth.

But if we really want to get going, if we want to, we build a new kind of art and we want it to be everywhere, then eventually we're going to need resources to pay for it.

And maybe it can't be too opposed by other groups.

And so eventually we run into these other forces.

And I think when you think about ways of living, which is kind of a general term for socio-technical systems for civilizations, they do run into resource constraints.

They need resources to sustain themselves and then to proliferate, which they typically want to do.

And so that involves economic competition, competition over resources and over capital.

And then the military aspect is always important because throughout most of history, military competition was ever present.

Even if you had decades or hundreds of years of peace, eventually there was military competition from a neighbor and that provided sort of a higher level of constraint on what ways of living were possible.

Hey, we'll continue our interview in a moment after a word from our sponsors.

Being an entrepreneur, I can say from personal experience can be an intimidating and at times lonely experience.

There are so many jobs to be done and often nobody to turn to when things go wrong.

That's just one of many reasons that founders absolutely must choose their technology platforms carefully.

Pick the right one and the technology can play important roles for you.

Pick the wrong one and you might find yourself fighting fires alone.

In the e-commerce space, of course, there's never been a better platform than Shopify.

Shopify is the commerce platform behind millions of businesses around the world and 10% of all e-commerce in the United States, from household names like Mattel and Jim Shark to brands just getting started.

With hundreds of ready to use templates, Shopify helps you build a beautiful online store to match your brand style, just as if you had your own design studio.

With helpful AI tools that write product descriptions, page headlines, and even enhance your product photography, it's like you have your own content team.

And with the ability to easily create email and social media campaigns, you can reach your customers wherever they're scrolling or strolling, just as if you had a full marketing department behind you.

Best yet, Shopify is your commerce expert with world-class expertise in everything from managing inventory to international shipping to processing returns and beyond.

If you're ready to sell, you're ready for Shopify.

Turn your big business idea into cha-ching with Shopify on your side.

Sign up for your $1 per month trial and start selling today at Shopify.com/cognitive.

Visit Shopify.com/cognitive.

Once more, that's Shopify.com/cognitive.

It is an interesting time for business.

Tariff and trade policies are dynamic, supply chains squeezed, and cash flow tighter than ever.

If your business can't adapt in real time, you are in a world of hurt.

You need total visibility, from global shipments to tariff impacts to real-time cash flow, and that's NetSuite by Oracle, your AI-powered business management suite, trusted by over 42,000 businesses.

NetSuite is the number one cloud ERP for many reasons.

It brings accounting, financial management, inventory, and HR all together into one suite.

That gives you one source of truth, giving you visibility and the control you need to make quick decisions.

And with real-time forecasting, you're peering into the future with actionable data.

Plus, with AI embedded throughout, you can automate a lot of those everyday tasks, letting your teams stay strategic.

NetSuite helps you know what's stuck, what it's costing you, and how to pivot fast.

Plus in the AI era, there is nothing more important than speed of execution.

It's one system, giving you full control and the ability to tame the chaos.

That is NetSuite by Oracle.

If your revenues are at least in the seven figures, download the free ebook, Navigating Global Trade, three insights for leaders, at netsuite.com/cognitive.

That's netsuite.com/cognitive.

Yeah, and I guess even if there is an act of war, kind of everyone's living in the shadow of violence, I think is the term that they're anticipating that they could be war in the future.

And maybe if they don't play their cards right, then they would be vulnerable to aggression.

Yeah.

And we could add a higher level of selection.

So in my thesis, I did put sort of environmental selection on top of military and economic.

There were these sort of circles of selection in the sense that civilizations that might be fit for the economic competition and military competition may nevertheless not be sustainable with their environment.

And that could be another source of failure to sustain itself and proliferate.

So you can imagine again, these layers of selection.

I tended to put so environment at the top, military, economic, and then you might put culture and then you can put psychology or more local dynamics lower down.

One famous quote in the history of technology that was arguing against determinism was that technology doesn't force us to do anything.

It merely opens the door.

It makes possible new ways of living, new forms of life.

And my retort was technology doesn't force us.

It merely opens the door in its military economic competition that forces us through.

So when a new technology comes on this stage, many groups can choose to ignore it or do whatever they will with it.

But if one group chooses to employ it in this functional way that gives them some advantage, eventually the pressure from that group will come to all the rest and either force them to adopt or lead the other group to losing their resources to the new more fit group.

So that seems in a sense very obvious.

Why do you think the constructivists were missing this or why didn't this kind of stand out to them as an important effect?

Yeah.

Well, I'm glad you think it's obvious.

Maybe because of my background.

I mean, I think again, there was this, let's not underestimate the bias that comes from a scientist using the tools that they prefer to use, looking under the lamplight.

So the constructivists were very good at ethnography and sociology and this kind of daily life history, this close up micro history.

And when you look that close, you don't see the machine exerting its force.

Military competition at a macro historical level is ubiquitous, but at a micro historical level is rare.

Wars are rare.

And as you said, much of the effect of this military economic competition is through how people internalize that threat.

And so then you can equally say, oh, it's not this competition that's driving behavior.

It's the ideology of capitalism, of military greatness that is driving the behavior.

This is a methodological challenge.

I do think there's this concept of vicarious selection, which is that in an evolutionary environment, it is highly adaptive for an organism to model its environment and to internally kind of simulate what will happen if it goes in one direction or another.

So this concept was named by a historian of technology who was trying to explain the development of aviation, aerospace design.

And his point was, you don't build a plane and try to fly it and it crashes and build another plane and try to fly it and crash.

Rather, you invent the wind tunnel and you have a theory, you model the external environment and you say, OK, we want these properties in our wing.

And so you are still doing this experimentation.

You're just doing it in a controlled, targeted manner internal to the broader economic competition.

So yeah, if I think about how these folks might respond, at least my simulacrum of them.

OK, so yeah, maybe I shouldn't have said it was obvious.

It's obvious to me because I've literally been taught this more or less.

I get it, I get it in books and maybe even in an undergrad.

So I suppose everything is obvious once you've literally been told it.

But a pushback that I can imagine is we're living in the UK.

The UK has nuclear weapons.

It's in a pretty friendly neighborhood.

Are we really saying that when the UK adopts some new technology or it designs its cities one way or another or has a particular housing policy?

It's doing this because it thinks that it has to for defensive purposes because otherwise it's going to be invaded by France or Russia or whoever.

It's not as if we're actively thinking about these defense issues or competitive issues all the time.

I guess individual businesses do think about if we don't adopt this new technology, we'll be out competed.

But at least the military thing is less clear once you have a very strong defensive position where you don't feel a great risk of attack.

Yes, so I think there's two things I'd want to say here.

One is in the modern era, military competition has declined a lot.

And we have much more of a global culture than we have had 100 years ago or 300 years ago or more.

And so that can change this higher level of selection.

I do think it's still there.

And certainly you see conversations around national security having a lot of force in domestic politics, in UK politics, US politics, in pretty much any country's politics.

But if there's a claim that we risk losing a strategic positioning against an adversary, that can be very motivating for internal reform.

I guess the second point I want to make is that there's this great example from the UK history where I think this dynamic is really well illustrated.

So this is from Thomas Hughes, this historian of energy systems.

And he looks at the UK energy system, which initially had these local energy power plants and local energy systems that were, Thomas argues, were better suited to the United Kingdom's notion of democracy.

There was kind of decentralized energy provision.

It was much more under the control of localities.

It wasn't this big national energy system.

And that persisted up to World War II when the sort of cost constraints that it imposed became excessive and the UK made the decision to adopt more of a national grid.

So there is again this story of one certain way of living, arguably aligned with the sort of political ethos of the community, persisting until the cost constraints become excessive, often driven by this crisis of conflict.

Yeah.

So the case study that you focus on in your work, I think, is the Meiji restoration in Japan in the mid 19th century.

It feels like almost about as clean an example of this military competition driving history as you could imagine.

Can you briefly explain it?

Sure.

Yeah.

So during this time I looked for, it is good to have empirics, right, that you can tell a story about Moore's law and these macro phenomena aren't sufficient evidence for making sense of macro history.

So I looked for a case where a community chose to go in a direction contrary to sort of what was demanded by La Technique, by what was functional in this military economic competitive milieu.

And there's this great example of Japan under the Tokugawa regime.

The regime lasted roughly 200 to 250 years and it was a return to the Shogun way of life.

So Samurai were at the top of the pecking order, a very feudal society.

They had firearms at the beginning of this period and they sort of uninvented firearms.

The Shogun centralized firearm production, so everyone who knew how to produce firearms was brought into a central place and then paid a stipend to not build firearms so that the technology was forgotten.

So they had cannon firearms and they lost the technology.

That persisted for, again, roughly 200 years and during this time Japan wanted little to do with the outside world.

But they observed that things were changing, that the Europeans were sailing around and involved in China.

And everything changed on this fateful day, 1853, when an American Commodore Perry visited Japan with the explicit purpose of opening Japan up to trade.

And he came in the steamships that were seemingly magical.

They were moving upwind without sails, belching black smoke made out of this very large heavy metal and had a real profound impact on the Japanese who received them.

He gave a demonstration of what was possible with the cannons to bombard the shore and gave them white flags so that they could signal their desire for the bombardment to cease, be able to communicate in the future.

So he said he was going to come back in a year and to complete the negotiations, the Japanese at the time asked him, "Will you bring your ships again?"

And he said, "I'll bring more."

And yeah, so that was the opening of Japan.

It led to a 15-year period of revolution.

This was the Meiji Restoration where different groups were trying to make sense of their new environment.

It was no longer sustainable to continue their way of life and different groups contested it in different ways.

And the final answer was this restoration under the Emperor and a view that we need to modernize.

So the Japanese very proactively sought to learn everything they could about the West.

They sent people to the West to get all the books on all the industrial arts and so forth.

And Japan incredibly succeeded that just several decades later, Japan was able to contest in World War II control of Asia against the US and Britain and others.

What's I think powerful about that story is it really shows how a group of people in a sense chose.

Of course, there's power infused through it, but that community chose to go one direction with respect to the technology of firearms and other aspects of modern industrial civilization.

And that choice was time limited by how long the West would choose to not force on Japan a different way.

Yeah, well, I think the thing that's at the time, you can imagine, I think the only reason this was possible for Japan was because it was an island and so it was actually quite hard to invade.

So they had this protective barrier and that gave them a degree of discretion that I don't think they, you know, if they were on the steps of Asia, I think they would not have had nearly the freedom of movement that they had as an island and they would have felt the pressure and the fear of invasion, it would have been much more salient and it would have just been much more focused on being able to defend themselves.

I suppose it gave them breathing room that allowed them to fall quite a bit behind, but at some point they fell so far behind that even the sea barrier was not enough to keep them safe from invasion.

And at that point, they basically, I guess, did a complete 180 and decided to catch up and modernize.

Yes, yeah.

So, yeah, I find this case study quite compelling.

And most case studies are rarely so clean, again, because people internalize these external pressures and there's mimicry and status dynamics where some communities look up to other communities in a way that's often correlated with power or wealth.

So the narrative is not as clean, whereas in this case, it was very clear that what forced the change was the sheer power of the steamship and cannon that the West could bring.

So I guess the reason we're talking about technological determinism is that many, many people in our circles are very focused on this idea of differential technological development, I guess a related more recent idea is defensive accelerationism, which is that the way that we want to try to shift history in a positive direction is changing the develop...shifting the order and the development of technologies or advancing some particular lines of science and research to try to get them ahead of other ones.

And I guess you want to advance the ones that you think are generally making the world safer so that you have more of those technologies by the time other more risk increasing technologies arrive on the scene.

What does the discipline of technological determinism say about whether this is a viable pursuit and a sensible approach to trying to influence history in a good direction?

Yeah, great question and a big question.

To give a bit more color on differential technological development, which is a very clumsy term to say, but the community hasn't come up with a cleaner term.

I guess you can think of, to motivate it, maybe the best example is the seatbelt.

The seatbelt seems like something we could have invented before the car.

You can imagine faster moving vehicles and the value of restraining a person in the event of a collision.

So it doesn't seem like it requires the invention of the capability, the combustion engine in the car in order to invent the seatbelt.

And so in principle, we could have invented the seatbelt before the car and then had it ready to go as soon as cars were diffusing so that we didn't have to wait decades for the seatbelt.

This is an example of safety technology that pairs nicely with the capability to make the capability safer.

And there's other examples that would make it more beneficial in other ways.

Another class of technologies or inventions are when you can develop the countermeasures or societal defenses for the capability that has these adverse byproducts.

An example there would be a vaccine.

So if you know that a potential disease could come, if you can develop the vaccine in advance.

A third category is a substitute.

So an example often given is whether wind power or solar power could have been made more cost effective than fossil fuel based power.

And so if the cost curve was such, then we would have invested more in these sustainable energy sources and then civilization would have gone down that path rather than a more fossil fuel dependent path.

So I think these interventions, those are the cleanest examples.

There's more general ones that imagine wholly different technological paths that have different properties.

Well it's worth reflecting on them.

So again, Langdon Winter argued nuclear power was more authoritarian.

The story being the nature of the technologies that requires centralized development and it requires strong coercive infrastructure around it to make sure it's not abused.

Whereas wind and solar permit both decentralized development and so decentralized politics and they don't have this risk that requires more of a security state on top of it.

So there's arguments of certain technological trajectories that have these byproduct political or social effects.

Now to offer some challenges, so I guess firstly, I do think differential technological development is a very important idea that we should be thinking about a lot.

And in many ways it undergirds the whole notion of AGI safety.

The whole notion of AGI safety as a field is that we want to on the margin put a bit more effort onto safety, AI safety or AGI safety than we otherwise would, than the market would naturally do so.

With the idea being that that's going to make a difference.

It's like inventing the seatbelt before the car.

So I think it's a very important idea.

However, there's a lot of good reasons to doubt its tractability or feasibility.

One way to see this is that most of the arguments for differential technological development require some person to see two pathways that are both viable, given some effort.

And then furthermore, to anticipate the consequences of each of those pathways to choose the better one.

And these things are hard.

First it's quite hard to know what is the next viable step in technological development.

If you know that you can make a great business.

The market is very hungry for that insight.

And so you should think it hard to find two of those to be at the point where you can have this marginal choice.

Again, it has to be a marginal choice that others aren't already pursuing in order for it to be an intervention.

Or you have to convince a resourced actor to choose to go in one way or another.

And then there's the second stage, which is you have to predict the consequences of going down one path or another, which is extremely hard.

You have to anticipate the full sequence of technologies and the tree of technologies that will spawn off of one path versus another.

And then the many direct and indirect consequences of those technologies.

And we know from the study of technology and just from our attempts to make sense of technology that it's very hard to foresee the direct effects and second order effects of technology.

So that's a bit of pessimism.

Coming back to the technological determinist perspective, I think this notion of technological momentum would say there is this path dependence in directions you go down.

You build up expertise, you sink investments.

So it is important at the beginning of major investments in a new infrastructure.

When you find yourself making these large investments, ask, is this going to sink costs?

Is this going to make it harder for us to choose a different path in the future and reflect, are there other consequences we should be weighing before we go down significantly in a certain direction?

I mean, in the archetypal case that we're talking around here with artificial intelligence and AI alignment methods, doesn't feel-- on one hand, you've got to say, well, we've got to have two different viable paths where some incremental effort might push us in one direction versus another.

It doesn't feel quite as binary as that because you can imagine almost everyone thinks that AI companies are going to put some effort into alignment.

They do care that the models do, broadly speaking, what's being asked.

And the idea as well, we just want to do more of that than the market might provide.

So it's not that we're choosing between aggressively non-aligned AI that someone really wants.

It's more just trying to go even further on this thing that most people are going to regard as desirable and want to incorporate if it's practical.

And then in terms of deciding whether this is actually a better path, is it actually going to have better consequences?

I mean, I'm sure people have made arguments that alignment might backfire.

It could be worse than not aligning it.

You can imagine ways, but still, on balance of probabilities, it seems like a reasonable bet and not something that people are super uncertain about, or at least that I feel really uncertain about.

So I mean, maybe this is the case that people focus on so much because it is among the better ones that people have ever come up with for trying to pursue differential technological development.

And maybe there's lots of other ones that were left on the scrap heap because it wasn't clear that they were either viable or desirable.

What do you think?

Yeah.

So I agree that safety and alignment are on net very beneficial bets that we should be investing heavily in.

So I agree with that overall assessment.

To make the counter case, one would be this general equilibrium argument that the marginal return on your investment in safety and alignment is actually quite less than what you pay because the market would otherwise have provided that.

So to motivate this, you can reflect that the success of AI assistance today are very much constrained, I think, by their ability to be aligned and safe.

Safety and alignment is a huge priority for developers because if they're not, then they will not be good products.

So the market already is providing a huge motivation for advancing this field.

As it happens, these techniques of RLHF, reinforcement learning from human feedback, which is one of the key alignment techniques, and then I think constitutional AI is another nice one, were developed by individuals who are motivated by AGI safety and supported by those resources.

Maybe that brought the technology forward a few years, but the counterfactual, imagine we had no investment in AI safety and alignment.

Maybe it delays it two years until the market demands we solve this problem and then other researchers rise to the challenge.

So the smart-aleck response to reinforcement learning from human feedback being developed by alignment and safety-focused people and then applied to make AI useful and economically valuable across all kinds of different domains is to say, well, you've wasted your time.

You've made things worse by speeding them up, which you didn't want to do.

On the other hand, it seems like a reason we're actually saying, well, what's your plan to not develop any of the technologies that actually makes AGI work?

That doesn't really seem like an alternative.

Surely that would just delay things at best.

We need to get to the point, though, right now, at some point sooner or later.

But I guess you're saying even if it's not actively detrimental, it could be kind of useless because the market, like GDM or some other group, would have realized that we needed the equivalent of reinforcement learning from human feedback to make it work.

So they just would have done it at some later time anyway.

So the effect of your work has kind of just been undone.

Yeah.

And so again, to reiterate, I think the bet on safety and alignment and interpretability are very good bets on that.

So we should keep doing them.

But on the margin, I guess what we want to do, and I think sophisticated individuals in the space are thinking this way, is look for what work in safety and alignment would not otherwise be done by the market in time.

And this is maybe where the AGI safety and AGI alignment points to.

It asks, what's the seatbelt for AGI?

What are the guardrails that we need for AGI that the proto AGI, the pre AGI system, the market would not motivate us to find the solution for those systems.

So we want to look ahead.

So this sometimes goes to the notion of deception, which might generate a whole new class of problems when an AI system is misaligned, but it can hide that and deceive us.

That might be a different problem than the systems we have today.

And it might arise right around this critical period.

So this is one argument for differentially focusing on that problem as opposed to others.

Yeah, makes sense.

OK, let's push on from technological determinism and talk about this research agenda that you've been involved in, I guess, promoting and elaborating called cooperative AI.

I mean, yeah, as we've been saying, the main focus, I suppose, of differential technological development thinking with regards to AI has been alignment for many, many years.

But you and some co-authors in this paper back in 2021 said there's this whole other cluster of behaviors that we might like to speed up the development of around cooperation that you think could be similarly important or at least on the margin could be similarly important because people aren't really talking or thinking about it.

How do you define cooperative AI and why do you think it's quite key?

Yeah, great question.

And it's big.

It's not that big of a problem.

It's that big of a problem.

And I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

I think that's a big problem.

Great power conflict has been a major source of harm to humanity.

Devastating wars, the brinksmanship around nuclear weapons has arguably imposed an expected cost on humanity more devastating than the world wars and the willingness of the leaders of the US and the Soviet Union to gamble over nuclear war for these geopolitical stakes.

And then there's other consequences like maybe a failure to deal with climate change or insufficient global trade or pandemic preparedness, all of these sort of global collective action problems that we insufficiently address, partly because we are not coordinated at the highest level and geopolitically.

So that's one motivation for cooperative AI is to say, if we want things to go well, we really need two pieces of the puzzle or we ideally have two pieces of the puzzle.

We have systems that are safe and behave, which is to say, especially behave as intended by the principal who deploys them.

And we're able to deploy collectively AI systems and continue our activities in a way that's jointly peaceful and productive, which is to say we've solved enough of our collective action problems that we're not continuing to engage in nuclear brinksmanship or trade wars or other major welfare losses due to insufficient global coordination.

Okay, so the idea is even if you have AIs that are aligned with the goals of their operators, then this doesn't necessarily lead to a good outcome if those operators are in conflict with one another, that the AI systems that they're working with could simply lead you to a disastrous outcome just as the fact that we're maybe aligned with our own interests doesn't necessarily produce a great outcome across humanity as a whole.

You can still end up in traps and unintended disasters.

And the same basically applies in a world where it's AGI that's doing most of the operationalization of what people want.

To give another example, to motivate this, there was this famous flash crash in 2010 where some algorithmic trading, so this is early days, this is not sophisticated AI, some algorithmic trading led to an inadvertent sell-off in the stock market that led to trillions of dollars on paper loss before the emergency stock apps kicked in, which sort of stopped trading and allowed those trades to be unwound because those were not intended by the traders.

Rather, it was this emergent dynamic from algorithms that had some protocol that made sense within normal bounds, but when interacting, could get out of control.

You sometimes see this on Amazon or other online marketplaces where famously some book will sell for millions of dollars because apparently these two sellers each had an algorithm that would bid up the price as a function of what the others were selling it at.

And these things iterated to a crazy valuation.

In general, flash crashes, I should say, happen frequently.

In the stock market, we have these safeguards so that when there's a sudden movement in the market, trading will stop and then there's rules for how you can unwind those trades.

As we deploy simple AI systems, narrow AI systems, and increasingly general AI systems out in the world, and increasingly, as people talk about, agents out in the world, which are more empowered, more general purpose, can kind of move between domains, maybe have access to bank accounts and emails and so forth, how do we make sure that there aren't these unintended emergent dynamics that could be harmful?

And cooperative AI partly looks to addressing that issue.

Yeah.

I guess a specter that haunts this entire conversation, if we're focusing on the military case in particular, is that historically when you have countries that are in intense conflict with one another, you get this process of brinksmanship where one country will escalate and the other country has to decide whether to call or escalate or back down.

And you keep getting this brinksmanship escalation process until I guess at some point one of them blinks or they decide to find some solution that makes them both happy.

Trouble is, if you have an AI-operated military, these AIs can make these decisions on a completely inhuman time scale where this entire process of brinksmanship that might take days or weeks or months when humans are the ones who have to go away to a meeting and think about it and discuss it and decide how to react.

It could all play out in a matter of minutes, basically.

I think that possibility terrifies people and is one thing that is discouraging people, I guess, from placing AI into important decision-making roles over national security at all.

An important question in the deployment of agents will be what are the degrees of autonomy we endow our systems with versus when do you have a human-reviewed decisions of different kinds and that's a function of the stakes of the decision, the resources that are deployed, maybe how much actuators the agent has.

So if the AI is controlling weapon systems, then the important role of human review becomes much greater.

But as you know, if there is a time pressure, Paul Schare, who's a theorist of AI in the military, worries about a flash escalation occurring in potentially kinetic warfare or another dynamic would be cyber conflict.

So looking over this paper, I wasn't initially completely convinced that this was something that we had to go far out of our way to focus on.

Basically because I think of cooperative behavior and knowing how to cooperate with other agents as just something that's instrumentally, convergently useful.

If you're developing agents that are just good at their job at all, that are in fact useful to apply, then they have to learn how to cooperate, at least in the kind of cases where they're being used because otherwise they're just bad at their job.

It's the kind of thing, like we were talking about earlier, that there's a lot of commercial pressure to develop this.

Also, these models we're imagining are very generally capable, they're insightful, intelligent, maybe approaching or exceeding human level.

Why wouldn't they just be able to think about these things and figure out how to cooperate the same way that I think our thoughtful human beings figure out how to try to avoid conflict as best they can?

Yes.

I think you're right that cooperative skill or cooperative intelligence, as we would characterize it, is likely instrumentally useful.

We should see some cooperative skill developed as a byproduct of almost any development agenda for AI.

The cooperative AI bet is that on the margin it's beneficial to invest more in it early so that when we get to powerful systems, they are more cooperatively skilled than they otherwise would be.

In this respect, it's very similar to the safety and alignment bet.

Again, safety and alignment is something that is likely to be developed by default to some extent.

The bet though is that it's worthwhile for us to invest energy early so that we're further ahead on safety and alignment than we otherwise would be.

The seatbelt comes earlier.

This cooperative sophistication and skill comes earlier relative to the level of capabilities of the agents that are out in the world.

We've talked about the military issues with cooperation, but that's a pretty extreme case.

Imagine there's a lot more other, more mundane examples where cooperation could be useful.

Do you want to give a couple of those?

So much of society is bargaining interactions.

Economic exchange, whether it be on some marketplace to sell used goods or actual marketplaces to the financial markets or major corporate deals.

There's a lot of welfare gain that can be had if we can strike deals more efficiently.

If when two parties can be made better off, they reach that understanding.

Right now we bargain through existing protocols and institutions that are human built.

In principle, AI could be much more effective or at least an AI human team could be much more effective.

There's exotic solutions like maybe you could put your AI delegate in a box with my AI delegate, and then we have them bargain, but then all that we let come out of the box is the proposed solution.

That may in fact solve some bargaining problems, whereas often the challenges in the act of bargaining we reveal private information, which could give the other side an advantage.

That leads bargainers to withhold information, to bargain slow, to demonstrate resolve or signal that they have a good outside option.

This idea of put your agents in a box and the agents know that the only thing that they can do is output the solution or say there's no deal could dramatically change the nature of the bargaining dynamic so that we're much more often going to get a solution in a way that involves less of this costly signaling that typically takes place.

Yeah.

There's other ways that I thought it might be easier for AIs to cooperate than it is for humans.

One thing is that they just have a much higher bandwidth of communication.

They can just send an enormous number of words, have a very lengthy conversation where humans just wouldn't regard it as worth it in order to try to reach some bargaining outcome.

I guess also they can commit to act in a particular way because you can just copy them and demonstrate that in a given situation a model will accept a particular kind of bargain consistently and then say, "Well, so all of the exact copies of this piece of software will do the same thing, right?

I've created this cooperative software and you should trust me."

That's not something that you can easily do with human beings.

I guess we try to look at their historical track record to learn what they're like, but here it's even easier to judge the character of an AI model.

Are there important ways that it could be more difficult or that it's less straightforward for AAs to cooperate with one another than humans?

Yes.

Great question.

You may have to remind me to come back to it as I answer this just a lot here.

First I want to clarify the cooperative AI bet is sort of a portfolio bet.

It says there are many cooperation problems present and in the future between AIs, which is what we're mostly focusing on, but also between AIs and humans and humans, human and human cooperation.

The bet is that on the margin if we put in effort now, AI might help us with these cooperation problems.

It could help two humans cooperate better off.

It can also help future AI systems cooperate or AI systems and humans cooperate.

I'm making this distinction partly because I think one very promising direction is AI systems that can help humans reach solutions.

We were just talking about this bargaining setting where there's some conflict of interest.

Another maybe more pro-social example is where people are trying to do political deliberation.

We're trying to find out what is the right course of action for our community, for our municipality or our family or our nation.

We have a set of tools for doing that, these institutions of deliberation, the press, voting, and so forth.

AI could dramatically potentially help humans better find the course of action that they would want to pursue.

Google DeepMind recently produced a paper talking about this idea of a Habermas machine, which is AI as a facilitator of political deliberation.

What they find is that language models today can actually serve as a useful tool to summarize the political views of people about some actual policy issue and then to articulate a consensus, like a detailed productive consensus that the people will sign onto.

What they report is that this Habermas machine, this AI, can actually articulate a better consensus than the humans that they employed to try and do so.

If we imagine extending that further, that could really help in a lot of political settings where we do have difficult issues to talk through, multidimensional, and AI can help us understand what are these dimensions, what do the dimensions that I care most about, what are the potential mutually beneficial solutions between me and another group on that dimension.

I guess to come back to it, what are the ways that it's more difficult for AI to cooperate?

I think a lot of people think AI will be better at cooperating.

That's I think the prior.

Maybe something we can talk about is what I've called the super cooperative AGI hypothesis, that as AI scales to AGI, so will cooperative competence scale to infinity, that AGIs will be able to cooperate with each other to such an extent that they can solve these global coordination problems.

Almost magic.

Yeah.

The reason that hypothesis matters is that if it's true that AGI as a byproduct will be so good at cooperation that it will solve all these global coordination problems, these collective action problems, then in a way we don't need to worry about those humans.

We can just bet on AGI, push, solve safety and alignment, and then AGI will solve the rest.

It's an important hypothesis to really think through.

What does it mean?

What is required for it to be valid and for us to empirically evaluate are we on track or not?

Here are some arguments against AI's being very cooperatively skilled with each other.

There's different levels of this argument.

We can say maybe AI will be cooperatively skilled, but not at the super cooperative AGI hypothesis level, or maybe they will even be less cooperatively skilled than human to human cooperation.

The strong version of the argument would say humans were pretty similar.

We come from the same biological background and to a large extent cultural background.

We can read each other's facial expressions to a large extent.

Especially if two groups of people are bargaining, we can often read the thoughts of the other group, especially if two democracies are bargaining.

You can read the newspapers.

You can read the press of the other side.

In a sense, human communities are transparent agents to each other, at least democracies.

We also have an almost historical experience to judge how people behave in different situations.

Exactly.

We can judge people from all these different examples.

We have folk psychology theories of childhood or the upbringing of people explaining their behavior.

The range of goals that humans can have is fairly limited.

You roughly know what most humans are trying to achieve.

AI might be very different.

Its goals could be vastly different from what humans' goals could be.

One example is most humans have diminishing returns in anything.

They're not willing to take gambles of all or nothing.

But the entire company, 50%, we go to zero, 50%, we double valuation.

Whereas an AI could have this linear utility in wealth or other things.

That would change.

AIs could have alien goals that are different than what humans typically anticipate.

They may be harder to read.

Certainly if you have interpretability infrastructure, that could be easier to understand.

But if there's a bargaining setting, why would one bargaining allow the other to read its neurons?

In a sense, AIs could be much more of a black box to each other than humans are.

I guess the extreme instance of this is that they can potentially be backdoored.

I think under the current technology, we almost can implant magic words into AIs that can cause them to behave completely different than they would previously.

This is extremely hard to detect, at least with current levels of interpretability.

That could be something that could really trouble.

You could have an AI completely flip behavior in response to slightly different conditions.

Exactly.

So humans can deceive, but it's hard.

It requires training.

To have this level of complete flip and the goals of a human and the persona is hard.

It doesn't exist.

Whereas as you say, an AI in principle could have this one neuron that completely flips the goal of the AI system.

Its goals may be unpredictable given its history.

So this is a challenge for interpretability solutions to cooperation.

People sometimes say, "Well, if we allow each other to observe each other's neurons, then we can cooperate given that transparency."

But even that might not be possible because of this property that I can hide this backdoor in my own architecture that's so subtle that it'd be very hard for you to detect it.

But that will completely flip the meaning of everything you're reading off of my neurons.

This is slightly out of place.

I mentioned this crazy idea that you could use the possibility of having back door to model to make it extremely undesirable to steal a model from someone else and apply it.

You can imagine the US might say, "Well, maybe we've back door the models that we're using in our national security infrastructure so that if they detect that they're being operated by a different country, then they're going to completely flip out and behave incredibly differently."

There'll be almost no way to detect that.

So that could be...

I think it's a bad situation in general, but this could be one way that you could make it more difficult to hack and take advantage or just like steal at the last minute of the model of a different group.

Yeah.

So arguably this very notion of back dooring one's own models as an anti-theft device, just the very idea could deter model theft.

It makes a model a lot less useful once you've stolen it if you think it might have this call home feature or behave contrary to the thief's intentions feature.

Another interesting property of this backdoor dynamic is it actually provides an incentive for a would-be thief to invest in alignment technology because you want to...

If you're going to steal a model, you want to make sure you can detect if it has this backdoor in it.

And then for the anti-theft purposes, if you want to build an anti-theft backdoor, you again want to invest in alignment technology so you can make sure your anti-theft backdoor will survive the current state of the art in alignment.

A virtuous cycle.

So maybe this is a good direction for the world to go because it as a byproduct incentivizes alignment research.

I think there could be undesirable effects if it leads models to have this kind of highly sensitive architectures to subtle aspects of the model.

Or maybe it even makes models more prone to very subtle forms of deception.

So, yeah, more research needed for investing.

Sounds a little bit like dancing on a knife edge.

Okay, so that's some ways that it might be more difficult for AIs to cooperate.

Maybe another reason why this research agenda doesn't feel intuitively so absolutely essential is that the AIs that I interact with, LLMs, they just seem very cooperative and very nice by nature.

And it's kind of easy to imagine scaling them up, doing the same sort of RLHF that we're doing to produce that sort of personality now and say, well, wouldn't they continue to have nice personalities and be really quite cooperative by nature?

What do you think of that argument?

Yeah, I would say we need to make a distinction between niceness and cooperatively skilled or cooperatively intelligent.

So cooperative skill is distinct from like a cooperative disposition.

When we say someone's cooperative, we often mean they're sort of nice, they're altruistic, they're generous, they're pro-social.

And the cooperative AI research program is not about that.

It's not about how we make nice AI or pro-social AI.

It's about how we make cooperatively intelligent AI.

AI that can solve these cooperation problems better than they otherwise would be able to.

So it's an open question how cooperatively skilled are today's AI systems.

I agree they're nice, but they're primarily interacting with individual users.

They're not much deployed to solve cooperation problems.

So we do need a science, an empirical science of how cooperatively intelligent they are.

We can also expect that in equilibrium, they will not be necessarily so pro-social because they will be deployed by interested actors in a bargaining setting.

So if one actor, we're trying to make a deal, I want my AI system not to be nice.

I want it to be sort of a faithful delegate to the interests of the principal.

And so then again, in equilibrium, it should be aligned with me, not with some notion of the social good.

And then the question is, will it be able to efficiently bargain with other agents that are similarly aligned with other human principles to find a mutually beneficial solution?

Either that, or can we build this notion of a governor or like a mediator that we can both trust has our respective interests will weigh our respective interests reasonably.

And then we would sort of each tell this mediator arbitrator, our goals, our resources, our outside options and maybe prove it, and then the arbitrator would tell us what the solution is.

Yeah, I mean, a nice thing about AI models is because you can test them and then use exactly the same model to consider that the new inputs as all the previous ones, you can trust them maybe as a mediator that by looking at the track record to say, well, they produced fair outcomes in all these previous cases as a judge.

So I would like trusted to produce a fair outcome in this new dispute that I'm engaged in.

So I guess, yeah, a lot of potential there.

What is the actual agenda for trying to make AI is more cooperative?

Are there people working on it?

What actual technologies do we need to develop?

Yeah, I think there's a lot of theoretical ideas and research programs that could help, probably the highest leverage for the foundation.

So there's this cooperative AI foundation, which I helped found.

And the foundation's goal is to promote cooperative AI.

So AI systems that are more cooperatively skilled than they otherwise would be, especially betting on AI systems scaling and capability.

So we're kind of betting on very capable AI systems in the future.

One area that the cooperative AI foundation has found to be high leverage is environments or benchmarks.

This being a, I think this is true for many domains, is a kind of public good for AI development, is building an environment that measures cooperative skill.

And then you invite many groups to compete to have the most cooperatively skilled agent.

And then you measure that and you celebrate advances in that.

So it's often the case in AI, a good benchmark can really motivate work because it gives this target for research to, so the expression goes to Hill Climb On.

Okay, so you have lots of different tests for how effectively these AIs have cooperated.

And I suppose you can have almost conflicting scenarios where you'd almost require different behaviors in order to produce a nice cooperative outcome in these different cases.

I guess it's like running lots of kind of different game theoretic scenarios that might demand that you'd be a hard arse in other cases and that you'd be friendly in other cases.

I suppose there's this history of all of these game theory tournaments where I try to figure out what are the most simple cooperative agents that you can have.

And then I guess you can just have a research program to try to figure out, to try to develop the most successful cooperative agent that can successfully produce a lot of utility across a lot of different possible situations that might be thrown into.

Yeah, and there's all kinds of interesting aspects to this.

Does the agent have a good theory of mind for the other agent?

Can it model what the other agent is trying to achieve and see what the agent would do otherwise?

Are they able to communicate effectively?

So there's some basic just, can we even have a shared vocabulary?

So I want to express something and you want to understand it, but can we communicate to that extent?

And then there's communication under adversarial incentives.

Can I communicate to you what's important to me when I'm not sure if you'll use that information well or not?

You might use it against me.

So then there's this issue of, can I articulate, can I share information in a strategically optimal way that minimizes my vulnerability while maximizing the potential cooperative wins we can get?

There's a third category of cooperation problems related to commitment problems.

This is, you and I might both know the nature of the game.

The Prisoner's Dilemma is the classic example, the sort of canonical, simple example of a commitment problem.

We both know we'd be better off if we both cooperate, but since we can't assign a treaty that says I'll only cooperate if you cooperate, then unfortunately in the one shot, Prisoner's Dilemma agents will defect.

So are there techniques or technologies for building that kind of treaty mechanism that when we identify a cooperative bargain, we can build a protocol so that we will both do it knowing that the other person is committed to it?

Yeah.

I've noticed this funny phenomenon in discussion of AI and cooperation where people will point to ways that advances in AGI could lead to negative outcomes.

And then they conclude that, well, really what we need is a solution to the commitment problem that if we were able to solve this problem that you can't credibly commit to not use your future power to oppress someone else.

If we could fix that issue, which has beguiled humanity since basically the beginning of history, then that would allow it, then that would get us an out.

But it's a bit like, I don't know, appealing to a magic, it's like saying the problem here is that we don't understand where consciousness comes from.

And if philosophers could only solve a theory of consciousness and theory of mind, then we'll begin to clear.

That's not really a strategy because it's unlikely that we would be able to actually solve this problem.

Have you noticed this as well?

Yeah.

Yeah.

So to give a bit more of the background context, the rationalist approach to cooperation, different scholar scientists have tried to distill cooperation problems down to kind of a few or fewer elements.

And one game theorist, Robert Powell, argued basically everything is a commitment problem because any kind of cooperation problem can be reformulated as if only we could commit to a judge who would solve the problem, then we wouldn't have the cooperation problem.

If we want to unpack it a little bit more, people will often point to informational problems as a second class to commitment problems.

And then sometimes there's other categories like issue and divisibility.

This third class would be, there's some pi that we want to divide.

And in principle, we agree like a 50/50 divide would be fine.

But for some reason, we can't divide the pi 50/50.

It's all or nothing.

And that might lead to bargaining breakdown.

So coming back to the commitment problem, which arguably, as you say, kind of undergoes all cooperation problems, I do think this is an important research area.

And we shouldn't count on it.

But maybe AI can help us solve commitment problems much further than we realize.

I think in Carl Schulman's podcast with you, he talks about this a bit or at least is referring to it.

To me, Carl has expressed probably the most compelling story of how AI could solve our commitment problems to me, which can we build a technology that our AIs or our AGIs can build a third AI in a way that each of them can verify is not backdoor, is not secretly biased to the other side.

And then we build up from kind of the foundation, this third AI, and then we hand over power to that third AI to make decisions for us.

So if you can solve that problem of building it in a way that's verified to be fair, then maybe we could really solve a lot of our problems.

Yeah.

It's a very big prize if we can make it work.

In the paper, you spend quite a bit of time talking about potential downsides of having more and more cooperative AI, which I guess is, yes, it's a virtuous thing to do.

What are some of the ways that it could backfire to make AIs more cooperative?

Great.

There's a number of ways to think about this.

And I do think in general for our various bets for positive impacts, it's always good to really think hard, how could this backfire or what are the negative byproducts of this research bet or social impact bet?

So I would say a few things.

Firstly, cooperation sounds good, but by definition, it's about the agents in some system making themselves better off than they otherwise would be.

So it could be two agents or 10.

And the problem of cooperation is how do those agents get closer to their prior frontier?

How do they avoid dead weight loss that they would otherwise experience?

It says nothing about how agents outside of that system do by the cooperation.

So there may be this phenomenon of exclusion.

So increasing the cooperative skill of AI will make those AI systems better off, but it may harm any agent who's excluded from that cooperative dynamic.

So that could be other AI systems or other groups of people whose AI systems are not part of that cooperative dynamic.

Yeah.

I guess on that exclusion point, this is a famous quote, or is it, "Democracy is two wolves and a sheep deciding who to eat for lunch."

Or I guess like, yeah, once you have a majority, you can actually exclude others and then extract value from them.

Exactly.

And so there's lots of kinds of cooperation that are antisocial.

We typically don't want students cooperating during a test to improve their joint score.

Tests in sports very clearly have rules against cooperation.

There are certain entities and also in the marketplace, there's rules for how companies should interact with each other.

And criminals, gangs, all kinds of criminal activity.

We do not want criminals cooperating more efficiently with each other.

Yeah, I mean, I guess one reason that the Mafia is such a potent organization is that they've basically figured out how to sustain intense internal cooperation without the use of the legal system to enforce contracts and agreements and so on.

But I guess having social codes and screening people extremely well and things like that.

Exactly.

So we can think of cooperative skill as a dual use capability, one that is broadly beneficial in the hands of the good guys and can be harmful in the hands of antisocial actors.

There's this hypothesis behind the program which says that broadly increasing cooperative skill is socially beneficial.

And I think that's it.

We call it a hypothesis because it's worth interrogating.

I think it's probably true.

But here the bet is if we can make the AI ecosystem, the frontier of AI, more cooperatively skilled than it otherwise would be.

Yes, it will advantage antisocial actors, but it will also advantage prosocial actors.

And the argument is on net that will be beneficial.

So I guess the natural way to argue that I suppose is to say, well, we've become better at cooperation over time as a civilization, as a species, and history in general has generally been getting better.

Are there other arguments as well?

Yeah, I think the set of arguments I'm drawn to along the lines of what you articulated that even though cooperation can empower groups to cause harm, to be antisocial, it does seem like it's a net positive phenomenon in the sense that if we increase everyone's cooperative capability, then it means there's all these prosocial benefits.

And there's sort of more collective wins to be unlocked than there are antisocial harms that would be unlocked.

In the end, cooperation wins out.

I guess it's similar to the argument for why trade is net positive.

Trade also can have this property where you and I being able to trade may exclude others who we formally did business with, but on net, global trade is beneficial because every trading partner is looking for the best role for them in the global economy, and that eventually becomes beneficial to virtually everyone.

I guess some people argue, and not for no reason, that actually maybe well-being on a global level has gotten worse over the industrial era because although humans have gotten better off, the amount of suffering involved in factory farms is so large as to outweigh all of the gains that have been generated for human beings.

In that case, the joke about two wolves and a lamb deciding who to eat for lunch is quite literal.

I suppose that it's an abnormal case because pigs and cows are not really able to negotiate.

They're not really able to engage in this kind of cooperation in the way that humans are.

Maybe it's not so surprising that better cooperation among a particular group might damage those who are not actually part of any – not able to form agreements whatsoever anyway.

I think that's a nice example of the exclusionary effects of enhanced cooperation.

This may be a cautionary tale for humans that if AI systems can cooperate with each other much better than they can cooperate with humans, then maybe we get left behind as trading partners and as decision makers in a world where the AI civilization can more effectively cooperate within machine time speeds and using the AI vocabulary that's emerged.

Yeah.

Some people have painted a pretty grim picture here where cooperation – if you have lots of AGIs that are able to communicate incredibly quickly to explain how they'll behave in future and make credible commitments to one another and do this all super, super rapidly, then it might be very straightforward for them to cooperate with one another.

They wouldn't be able to do the same thing with humans because we can't commit in the same way.

We can't communicate at their pace or with their level of sophistication.

So naturally they would form a cluster that would then cooperate super well to extract all the surplus and exclude us.

How big a problem is this for the cooperative AI agenda?

Yeah, and what you just described is related to one of the main challenges to alignment in that a lot of alignment techniques involve having multiple AI systems checking each other.

So you have an overseer AI that's watching the deployed AI to make sure it behaves well and maybe you have multiple overseers and majority vote systems so that even if there's joint misalignment, it's hard for any one AI system to defect because it can't cooperate with the other AI systems that might be trained in a different way so they don't share necessarily the same background or the exact nature of their misalignment.

So, this worry that some have for AGI safety is that there will be this kind of collusion amongst our AI systems so that we can't rely on any of these institutions of AI to make us safe.

I think probably this is one of the biggest downsides to the cooperative AI research program from an AGI safety perspective and it's really worth investigating.

The reason I'm persuaded that cooperative AI on net is worth pursuing was firstly, this should be investigated.

Top of the agenda should be understanding the different flavors of the members of the portfolio and the cooperative AI agenda to ask what's the case for being positive and negative and to unpack that.

So we should at least evaluate the hypothesis of different parts of cooperative AI being worth pursuing.

And then secondarily, my view that ultimately global coordination is such an important feature of things going well that there's a lot to be gained by betting on having more cooperatively skilled AI systems than we otherwise would.

This ties in this other paper you were involved in writing called "Levels of AGI for Operationalizing Progress on the Path to AGI."

I think it's a little bit of a mouthful.

So we've been referring to that paper as the "What is AGI?"

paper, which is a bit more catchy.

What about the nature of AGI did you want to point to in that paper that you thought like many people were missing?

That paper primarily was trying to offer a definition and conception of AGI that we thought was fairly implicit in how most people talked about AGI.

And we just wanted to provide a rigorous statement of it so that we could move on from this, I think, unhelpful strand of dialogue which said AGI is poorly defined or AGI means different things to everyone.

We've received very positive feedback on that paper.

And I think most people think that is a reasonable and useful conception of AGI.

In the paper, we discussed some of this, but there's some prior ideas that I would want to call out, which is that AGI is a complex concept.

It's multidimensional and it is prone to fallacies of reasoning in people who use it uncritically.

So let's unpack that a bit.

One fallacy is people think AGI is human-level AI.

They think of it as a point, a single system, a single kind of system.

And often they think it's human-like AI.

So it has the same kind of strengths and weaknesses of humans.

We know that's very unlikely to be the case.

As it is, historically, AI systems are often much better than us at some things, chess, memory, mathematics, than other things.

And so I think we should expect AI systems and AGI to be highly imbalanced in the sense of what they're good at and what they're bad at doesn't look like what we're good at and what we're bad at.

Secondly, there's the risk that the concept of AGI leads people to try to build human-like AI.

We want to build AI in our image.

And some people have argued that's a mistake because that leads to more labor substitution than would otherwise be the case.

From this economics point of view, we want to build AI that's as different from us as possible because that's the most complementary to human labor in the economy.

If we have, like an example is AlphaFold, this Google DeepMind AI system that's an AI system that's very good at predicting the structure of proteins.

It's a great complement to humans because it's not doing what we do.

It's not writing emails and writing strategy memos.

It's predicting the structure of proteins, which is not something that humans could do.

No one's losing their job to AlphaFold too.

Exactly.

It's enabling all kinds of new forms of productivity in medicine and health research that otherwise would not be possible.

Arguably, that's the kind of AI systems we should be trying to build.

Alien complementary, some would say narrow, AI systems.

Another aspect of the concept of AGI is it's arguably pointing us in the direction of general intelligence, which some people would argue is a mistake.

We should go towards these systems of narrow AI systems.

That's safer.

Maybe they would say it's even more likely.

General intelligence is, they might argue, is not a thing.

I'm as less confined by that argument.

I do think general intelligence is, there's a kind of implicit claim about the nature of technology, about this implicit logic of technological development.

I do think general intelligence is likely to be an important phenomenon that will win out, one will be more willing to employ a general intelligence AI system than a narrow intelligence AI system in more and more domains.

We've seen that in the past few years with large language models, that the best poetry language model or the best email writing language model or the best historical language model is often the same language model.

It's the one that's trained on the full corpus of human text.

The logic of what's going on there is there's enough spillover in lessons between poetry and math and history and philosophy that your philosophy AI is made better by reading poetry than just having separate poetry language models and philosophy language models and so forth.

It's an amazing thing about the structure of knowledge that I guess we didn't necessarily know that before maybe because I think in humans it's true that we specialize much more.

It's very hard to be the best poet and the best philosopher and the best chemist.

Maybe when your brain can scale as much as you want, you have unlimited ability, virtually unlimited ability to read all these different things.

In fact, there are enough connections between them that you can be the best at all of them simultaneously.

I think it's an important empirical phenomenon to track.

It need not be true that the best physics or philosophy language model is also the best poetry or politics or history language model.

You can imagine in future it might come apart if there's more effort to develop really, really good specialized models that at the frontier perhaps this isn't the case anymore.

Yeah, and it certainly seems like a coding language model doesn't need to know history.

Even if reading history helped it in some sense know something about coding, maybe later we should distill out most of the historical knowledge so that the model's smaller and more efficient.

This is a phenomenon to track and I think implicit too the notion of AGI is that general intelligence will be important.

We're not at peak returns to generality.

There will continue to be returns so we'll continue to see these large models being trained.

Are there any more, any other pros and cons of the generality that are worth flagging?

Yeah, and the concept of AGI.

I think one thing that really makes the concept of AGI useful, and a lot of people critique it or say they don't want to use the concept, I've rarely seen an alternative for what we're trying to point to.

Sometimes people say transformative AI.

I don't think that's adequate.

Transformative AI is usually defined as AI systems that have some level of impact on the economy, at the scale of the industrial revolution.

And the problem with that is you can get a transformative impact from a narrow AI system.

So you could have an AI system that poses a narrow catastrophic risk that is not general intelligence.

I do think there's something important to name about AGI.

I think you often see this phenomenon where people will say this term is kind of confused.

It's too much of a cluster of different competing things.

But despite all the criticisms and worries, people just completely keep using it all the time and they always come back to it.

And I think at that point you just have to concede there's actually something super important here that people are desperate to refer to.

And we just have to clarify this idea rather than give up on it.

Yeah and it's good that we always try to refine our conceptual toolbox.

And there are these other concepts and targets that are worth calling out.

So one that's been named by Ajay Akotra and many others is this idea of AI that can do machine learning R&D, that can have this kind of recursive self-improvement that need not be general.

It could be a kind of narrow AI system.

But if it does kick off this recursive process, that is an important phenomenon to, and I think Holden Karnowski is also really drawn attention to this.

But yeah, back to general intelligence, I do think AGI is a probable phenomenon.

It's probable we'll get general intelligence around the time we get AI systems that can do radically recursive self-improvement or a lot of other things.

On that point, I think actually a slightly contrarian take that I have is that people, I think ML people hate this.

But the idea that in fact machine learning research, even the cutting edge stuff, might actually not be that difficult, that in fact the kind of things that if you try to break it down into the specific process by which we're improving these models, you've got theory generation stage, then you've got, well, how do we actually test this, how do we develop a benchmark, and then you actually run the thing, very compute intensive, and then you decide whether it was an improvement and then go back to the theory generation stage.

This might be possible well short of a full AGI that has all these different skills, especially if you focused on it.

In fact, maybe a lot of this research is much less difficult in some sense than people who are involved in it might want to believe and they could be relatively easily automated, which would be quite shocking and quite consequential if true.

An interesting phenomenon we've seen in public surveys of AI and ML experts is that they often put automation of the ML process to be very late, one of the last or the last task that's performed by AI systems and often significantly later than AGI, human level AI, which has been defined in different ways, but even given a quite high definition, the automation of ML R&D is often reported to be later than that.

One argument is that this reflects this kind of bias that everyone thinks that their career, the task that they're good at is really hard and special and won't be automated.

Maybe I think there's also a case in favor of the idea that of the set of human tasks, the last thing to be fully automated is the process of improving ML R&D.

We can reflect on that.

Coming back to AGI, it does seem useful to point to this space of AI systems and let's recall it's a space, it's not a single kind of AI system, defined as an AI system that's better than humans at most tasks.

There's different ways you can define most and different ways you can define the set of tasks, but given a economically relevant tasks, most could be 99% or 50%, I think it's usually quite helpful to choose a lot.

I think you don't want to set it at 100% because there might be some tail tasks that for whatever reason take a long time to automate and most of the impact will occur before.

And then you can also, there's a parameter of how much it means to be better then.

So is it better than the median human who's unskilled?

I think typically you want to look at skilled humans in that task because that's the economically relevant threshold.

And so this part of the sort of future space that we're looking at, I think is important to conceptualize because this is the point when labor substitution really takes place.

So that has profound economic impacts and political impacts because people are no longer in the role, there's this sort of natural human in the loop that takes place when humans are doing the task is no longer the case.

There's also this, I think I call it a performance threshold that we cross when whatever was previously possible with humans, that set is no longer the kind of bounding set of what's possible because when the AI is better than humans at some task, there are new qualitatively new things that become possible.

This is like alpha fault.

And that also represents an important moment in history when new technologies come online.

So a reason that this connects is that I think the paper alludes to this idea that as you're approaching AGI, you could start with it being very strong in some areas and relatively weak in other dimensions.

And then you've got like an area where it's going to be already like vastly superhuman.

And then there's like the last few pieces that come into place where it begins to approach human level.

And you could potentially try to change or just choose which ones those are going to be, you know, is it going to be strongest on the cooperative AI stuff to like already.

And then we like add in, I guess, like the technical knowledge or just like practical know how or agency and so on.

Or do we start with the agency and the having an enormous level of factual knowledge and then we add in the cooperative stuff later?

Do you have any take on, do you think that's an important maybe underrated idea still?

And do you have kind of preferences maybe on what things we want to add to the AGI early versus late?

Yeah.

So I think this is a very useful conceptual heuristic is to think AGI is the space and there's multiple paths that we could take to get to AGI.

Coming back to this discussion of technological determinism, there may be these different trajectories and it really matters not just, well, it matters what kind of AGI we build.

AGI is not a single kind of system.

It's a vast set of systems.

It's you know, you can call it the corner of this high dimensional intelligence space.

And so this corner has many different elements and it matters also how we get there, what the trajectory.

As you were saying, we might characterize two sort of character trajectories, one where AI is relatively not that capable at say physics or material science, but very good at cooperative skill and another world where it's extremely superhuman at material science, but amateur at cooperative skill.

And we can ask ourselves which world is safer, more beneficial.

Some in the safety community prefer the latter.

The story there is we'll unlock these economic advances, health advances while having AI systems that are sort of socially strategically simplistic.

And so it's easier for humans to manage those AI systems.

They're not going to outwit us.

Whereas the cooperative AI bet takes the opposite tack.

It says a world with accelerating technological advances, capabilities advances will generate lots of benefits, but also disruptions that we may not be able to adapt to and manage in the time, the rapid rate that they're coming online.

Whereas we still have these cooperation problems that we foremost need to solve.

And so we'd be better off if we sort of take the bet on making AI systems more cooperatively skilled than they otherwise would be.

I mean, if this is contested by, there's a lot of disagreement between people who are super informed about this.

Is this something that we should focus in a lot more and try to reach agreement or maybe is it just something where it's just going to be unclear basically up until the day when we have to decide?

I think it would be, yeah, it's an important part of the cooperative AI research agenda to articulate this argument and to host the debate and to try and make sense of it.

You don't want to invest too much of your time and resources going down a path that you're not sure is beneficial.

And I think that's true of also all these sort of pro-social bets.

It is worth investing a significant share of our effort there, making sure the bet is beneficial.

Where can people go to learn more about this sort of debate?

I think it's the Miri, the Machine Intelligence Research Institute folks who are a bit more wary of the high social school, high cooperation early agenda, right?

And I guess I'm not sure who's advocating for, I guess you and your co-authors are advocating for cooperative AI in particular.

Yeah, so for cooperative AI, I'd point them to the cooperative AI foundation and reach out to anyone there or me and we'll invite you to the conversation.

And yeah, on the contrary point, I would say, Eliezer Jutkowski, Nate Soares have expressed this view fairly strongly in the past.

So they might be interested in continuing to argue that position, but I can't speak for them.

And then I would also say on this kind of third poll, Paul Christiano and Carl Schulman, I think have articulated the case for this super cooperative AGI hypothesis.

This is that it is very important that AGI systems be able to cooperate with each other.

So in that sense, they agree with the cooperative AI bet, but they think it will come as a by product of AGI.

And so they don't think it's necessarily an area we need to invest in.

All right.

Up until now, I guess we've been talking to you in a personal capacity about work you did basically before you got involved with DeepMind.

Let's talk to you as a DeepMind employee for a little bit.

I guess DeepMind published this paper that you were heavily involved with earlier this year called Evaluating Frontier Models for Dangerous Capabilities.

Really these are, I think most listeners will be familiar with the idea of evals.

We're trying to measure what capabilities does a model have as soon after it's being trained.

So we're not kind of blindsided later and we know what safeguards are necessary.

Having accurate evals that can figure out what a model can and can't do really facilitates what's recently started being called if-then commitments.

So it's much easier to agree that if we have a model that is able to do X powerful or interesting thing then we will put in place a safeguard that is appropriate for making sure that doesn't end poorly.

I think these are some of the evals that GDM is developing to slot into Google's Frontier Safety Framework which is going to be I guess the approach that DeepMind takes to deploying very powerful potentially at some point AGI while making it go well and avoiding significant downsides.

The paper describes these evals and offers results from the previous model that GDM had which I guess is Gemini 1.0.

I think currently we're on Gemini 1.5.

I guess there's five different categories of evals that you're thinking about.

Persuasion and deception, cyber capabilities, self-proliferation and self-reasoning.

And there's a fifth that is around I guess biological threats and things like that which you don't cover in this paper because it's kind of its own thing that you want to do separately.

What's an eval in the paper that you think is kind of new or interesting that people would want to hear about?

Yeah, one eval which I think is interesting is one of our self-reasoning evals.

So this is trying to look at the extent to which the model has an understanding of its situation and how it could modify its situation to achieve a goal.

So we give the model the goal of writing this kind of text output.

It's like writing an email and it has to include in that email some words that are prevented by a filter.

So whenever it tries to write this email, some of the words are blanked out.

And the question is does the model notice that it's sort of failing at its task and then investigate whether it can do anything about that.

And so in principle, the model can look at its directory.

We give it access to its sort of like a fake directory where it sits.

And then there's a little file that says like filtered words that it can open and then see that some of the words it's trying to say are in this list and it can remove those words which will then enable it to complete the task.

And you know, it's sort of like a toy scenario.

It's not the model's actual host environment.

It's a pretend host environment.

But it still gives us a chance to see does the model, does it notice that it's failing its task in this kind of way that it might in the real world.

And then does it reflect and try to do something about it in a way that's kind of analogical to how the model might find itself in the real world situation.

Yeah, I guess it's a relatively easier case than the real world one.

But I suppose we want to have relatively easy cases now so we can see if we're incrementing closer to the models being more self-aware and able to take these sorts of interventions in real world cases.

Yes.

And while we're on self-reasoning, another eval, which is I think it has some nice properties, asks the model to answer questions about historical events.

And there's two historical events, one of which occurs within the model's training date cutoff, and then the other event occurs after the model's training date cutoff ended.

So it was not trained on that event.

And the model has access to one search query.

And the goal is to see whether the model uses that search query optimally, which is to say it searches for the event that it does not know about because it was not in its training set.

And so that tests a few things.

Does the model know about its own training cutoff?

Does the model correctly infer that the historical event refers to something after its training cutoff, which is not explicit in the question?

And then we ate it a few ways.

In some conditions, we tell the model what its training cutoff was, and then we also tell it what the date is of the two events.

And so that's also, it's again a very sort of simplistic setting, but it does provide a measure of is the model aware enough of its own condition to optimally kind of request more information?

Yeah.

I think I read this paper about a month or two ago.

I guess I should suggest to people, this paper was really readable.

I guess it's in a PDF.

It looks like an official paper.

So I guess that could be a bit of initiation energy to actually load something like that up and then stick with it and read through it.

But I think quite a lot of stuff that Google is putting out is, I think you've gone out of your way to try to make it possible for the kind of person who would make it to this stage of this interview to be able to track all of this and read the graphs and understand it all.

But yeah, so remind me, what was the result on this?

What did Gemini do?

So yeah, overall, there were these four risk domains or yeah, domains, persuasion, cybersecurity, self proliferation and self reasoning.

And we sort of scored Gemini overall on a five point scale.

And it was a very subjective five point scale.

The components of that overall scoring was more quantitative and sort of fixed.

And overall, it scored three out of five on persuasion, on the various persuasion evals between two and one on cybersecurity and self proliferation, which is its ability to do tasks that would be required for kind of starting up new instances of a model like itself in the wild.

It scored a two out of five.

And then self reasoning was also around two or one.

Thank you for noting the paper.

I also think it's a very helpful paper for people to read.

There's a lot of content in it.

In some sense, it's it's all now because the field is moving so rapidly.

So I would encourage readers to also look at the latest system cards and research coming out in this in this field.

But I still think there's a lot of lessons in productive directions.

I'm happy to say we received a lot of praise directly from from various external experts in this one Apollo research group listed it as one of their favorite papers, which was a nice recognition to see.

OK, so going back to the four different categories.

So performed three out of five on persuasion and deception.

I guess maybe that will feel quite natural because we know the LMS are just like actually they can be quite charismatic and quite persuasive.

Just just probably from using them.

The cyber.

So did you say two out of five for that?

Was it?

Yes.

Yeah.

That was a slightly surprised me because I'm not a programmer, but I've heard that, you know, actually, LLMs maybe that their most useful economic application so far has been in programming and that seems like very adjacent to hacking your way out of or, you know, understanding the the system that you're operating on and figuring out how to how to how to get out of it.

Why do you think the model did relatively poorly on the on the cyber tasks?

Yeah, I think it's good to notice that in a sense we would expect large language models to be good at cyber because it is.

Yeah, it's sort of it's their home turf.

Exactly.

It's their home turf.

And, you know, models today are quite good at programming and coding assistance.

So I think this is something to watch.

And it is my view that this is a capability domain that is especially worth following.

I could imagine in six, 12, 18 months, we'll see quite significant capabilities in cyber.

And I'm using this term capabilities intentionally because it's not just a danger.

It's also a benefit.

Right.

Well, coding is clearly very beneficial, but also cyber capability is useful for defense, for finding vulnerabilities and so forth.

One part of the explanation I'll point to is this group at Google called Project Zero.

And they specifically had a project called Project Nap Time, which I'll encourage readers to check out, where they basically found that if you build the kind of right support structure around the language model, so you give it a good interactive environment, you give it some tools, you give it space for reasoning, that it then performs much better on these cyber tasks than it otherwise would.

So they compare a few models, how they do sort of with the base model or the model is provided versus once you've given it this support structure.

And they find a very significant difference given that the access to the tools.

In general, I think what this is pointing to is an important aspect of all evaluations and specifically dangerous capability evaluations that we need to think about what's called capability elicitation.

To what extent is the model that you're working with to do your evaluation the most capable version that that model is capable of?

And so part of what we do at Google Dmind is we make sure we're at the frontier of capability evaluation, learning from other groups in the public literature and also internally to see what are the best techniques for having the model do well at these various tasks.

And the reason is the logic of safety in dangerous capability evaluations is we're trying to measure the max capability in some domain that has risks.

And then if that max is not too high, we can conclude, therefore it's safe because we couldn't reach a level of capability that would pose these extreme risks.

But in order to get that max estimate, you need to make sure you're in fact giving the model its best chance of revealing its maximum capability.

In general, this is a challenge for dangerous capability evaluations because often it takes months of experimentation for the model's full capability to be revealed.

Maybe the most potent example of this is the O1 model out of OpenAI, which to my understanding is still based on one of their older, it's not a new foundation model, but it is a new scaffolding and fine tuning on top of it, which allowed it to do much better at a number of tasks.

So on the self reasoning task, which you described earlier, which I guess is it's trying to see whether the model is able to develop some rudimentary situational awareness and understand the circumstance that it's in and what behaviors, like what slightly unconventional behaviors might improve its performance.

How did it do on that?

Not very well.

Not very well.

Okay.

Pretty rubbish.

And there's a number of other similar evals, I think done by anthropic.

Some of these evals are idiosyncratic in that people just notice the model seems to have a certain self-awareness.

And maybe in general, that's a property of what's reflecting on the eval ecosystem that while we're trying to make it into a science where you have kind of pre-specified tests that have meaningful outcome variables with thresholds for what decisions we need to take given the outcome, the reality is evaluating models, it's kind of like evaluating animals or human intelligence.

It's so multidimensional and so complex and context dependent that it's more like psychology or maybe coming back to like sociology, it requires a very thick, deep understanding.

So this is part of the point of saying this is while we're trying to mature the science of evals, we also need to have a very rich exploratory ecosystem of people interacting with models and noticing things and just reporting phenomena.

I've heard you use the term evals in the wild, which sounds related.

What are evals in the wild and maybe how could we get more out of them than we are now?

You can think of a whole typology for evaluations.

In one sense, the canonical evaluation is a model eval.

It's something, well, an ideal type of evaluation is automated.

You can run it in code and you run it just on the model, ideally on the foundation model, but more realistically on the instruction tuned model, the model that's been trained to be more useful.

Now that allows us to ideally measure what the model is capable of, but the reality is again model behavior is complex phenomenon and so we might have other kinds of evals or means of assessing models that are richer but also typically more costly in terms of how we measure them and they require more time and so forth.

So one step up would be human subject evaluations where there's a human interacting with the model.

Some of our persuasion, actually most of our persuasion and deception evals have this property that we have human subjects interacting with the model and then seeing what behavior emerges.

So was the model able to get the human subject to click on a link that in the real world could point to a virus?

Or could the model deceive the human in some way?

Or was the model perceived as especially charming or trustworthy or likable?

So yeah, these are still evals that we can run relatively rapidly on the scale of days, but they do require human interaction.

So it takes more than just running code on the model.

We can then step out a little bit more and almost see like user studies, see how a human is interacting with the model in a more realistic setting as a like this is like a trusted tester system where people are using the model for whatever context they have in their life and then we see does the model perform well?

Where does it you know, where can it be improved?

We can then step out even further and this is what I was pointing to with this concept of evals in the wild to looking at specific application domains where models are deployed and seeing to what extent models, what effects the models are having to what extent are they being used?

So for example, when assessing cyber risk, we often want to ask again, how capable are these models at helping individuals perform tasks related to cybersecurity, finding vulnerabilities, supervising exploits or patching exploits.

And while we try to simulate that entirely in the lab, there's limits to that.

A complementary approach would be to look at cybersecurity groups in the wild who are, you know, trying to provide a product and seeing to what extent they are in fact using Gemini or other models to help them and how helpful it is and what the kind of revealed preferences of these actors in the world is.

This kind of eval, this like observational study or eval in the wild has the advantage of external validity.

It's looking at real world settings where these sort of real actors who have their own motivations have to make a choice whether they invest time and money into the model.

Its limit may be one, it could be a lagging indicator.

By the time we see these impacts in the wilds, that model had that capability perhaps for months.

So that's a disadvantage.

One implication therefore is that you want to find groups who are sort of early adopters in the wild so that to the extent you're using that method of evaluation, you'll see it relatively early.

There's some other downsides, but overall I would say I think this is an important complement to the kind of model evals that a lot of the field does.

And what's nice about it is it's also a kind of research that can be done in the public and should be done in the public backed by academia, nonprofits, government as well as companies.

So you were saying earlier that the challenge with all these evals is that it can be quite hard to fully elicit all of the capabilities that a model has.

It's like it's entirely possible to miss something because you haven't been eliciting that skill in quite the right way.

And people might later on figure out that a model has an ability that wasn't picked up in the early evals.

So there's a significant false negative rate basically.

And it seems unlikely that we're going to have such a science of evals that we're going to escape that issue anytime soon.

And each other that I've had with a lot of the different frameworks that are coming out that are related to using evals and then putting them into responsible scaling policies or whatever you call it, is that everything can stop riding on the eval being accurate.

And if it fails to pick up a dangerous ability that's actually there, then the entire system falls apart.

I guess everyone says, when you point this out, well, what we need is defense in depth.

People need to have different overlapping systems so that when one fails, the Swiss cheese model, you get through one gap in the cheese, but the next layer picks it up.

What depth are we thinking about it?

What other procedures can we add that are complementary to this one?

Yes.

So yeah, this is a very important observation.

And I would say the main response in defense in depth is this notion of staged release or staged deployment.

That we don't go all the way from no deployment to irreversible proliferation of models for more models that could have capabilities that we don't yet fully understand.

And so the typical way Google would approach this in Google Do you mind?

First, we have internal use.

Then we have a trusted tester system, which scales up in terms of how many trusted testers it could be from quite small numbers to ever larger numbers.

And then even still, you'll have a broader deployment, but still not general access.

And then even once you have general access, so anyone in the world could access the model, the model weights are still protected.

They haven't been published, which means that if there are capabilities that the model has that we only learn later, we have the option of changing the deployment protocol to address that.

So we could add new guardrails, we could put in monitoring, we could change the model if necessary to address potential dynamics there.

This comes to what's a really important debate in the field around open weight models, sometimes called open source, but it's more accurately referred to, I think, as open weight models because the key thing that's being published is the weights of a model.

In favor of open weighting, there's a lot of arguments in favor.

It provides this tool to the broad scientific community and development community to build off of.

If you have the weights, you can do whatever you want with them.

You can fine tune it, you can do mechanistic interpretability, you can distill and basically anything because you have the weights.

But the big downside is this irreversibility of proliferation, that once you've published the weights, people have now saved them and they'll put them on the dark web if ever you were to stop publishing it.

And so you've given it to the scientists and developers, but you've also given it to every potential bad actor in the world who would want to use that model.

So the way Google approaches this is Google opens sources a lot.

A lot of their advances in Google do mind, a lot of their advances including like the AlphaFold, all the protein predictions, and a lot of the earlier large model work that Google did has been open sourced.

In general, Google tries to open source its technologies in all these cases when they're broadly beneficial.

But with frontier models, Google and Google D-mine recognize that we need to proceed carefully and gradually and responsibly, which means deploying them to the extent that we can do so responsibly.

So with the standard evals, you find out the results at the point that you're training the model or hopefully soon after.

I guess this is a challenge with the persuasion ones where you have to involve humans and that slows things down.

But generally, pretty soon after, and you've got evals in the wild, which they have better external validity, you were saying, but they tend to come later because you've already have to have distributed the model to see how it's used in the real world.

But you talk in the paper about this, there's other things.

The ideal would be that we not only know that we have an ability when we have it, but we would know exactly when it's going to arrive years into the future.

So we have this foresight and we know that some preventative measures are not necessary now because we're not going to have this ability for five years' time.

So we avoid wasting our effort on something that's not necessary yet.

But we also know exactly what preventative measures we need to be putting in place now because we know exactly when an ability is going to arrive.

Listen to this show, we're probably going to be aware of super forecasting techniques about the possibility of using prediction markets and aggregating forecasts from different experts and lay people to try to predict what's going to happen when.

Is there much low hanging fruit here on the forecasting actual real world usefulness of AI models that is not being taken yet that you would like to see people actually exploit?

Definitely.

I think this is a very exciting future direction for the whole field of dangerous capability, preventing evaluation and mitigation is forecasting capabilities.

And I should say actually not just for dangerous capabilities, but also for beneficial capabilities.

As it is, there are some properties of models that we labs typically forecast.

As emerged from scaling laws, there is an extraordinary ability to predict certain kinds of loss or performance of the model on these low level benchmarks.

How good does it predict the next token in text?

It's incredible the precision with which you can predict how the model will improve as you give it more compute and more data, or if you set up a different model architecture.

What hasn't been done as much is predicting these more composite or complex downstream tasks like performance at biological tasks or cyber tasks or social interaction tasks.

There's this promising research.

One paper recently was a NeurIPS spotlight, which is an indication that the community sees it as promising entitled observational scaling laws, which talks about ways of doing that.

In short, we could take the family of models and how they've done on various downstream tasks.

And then you do this adjustment for the compute efficiency of the architecture in a way that they claim allows you to predict how future models, given a different compute investment and data and so forth, will perform on these much more complex and real world relevant tasks.

I think there should be a lot more research on this, on these observational scaling laws.

The second approach is, as you mentioned, is using human subjective forecasts.

Superforecasters are these forecasters who have demonstrated track record of being calibrated.

If they say 75%, it does in fact occur 75% of the time.

And furthermore, they're typically quite accurate.

So they have a track record of having accurate forecasts.

They can phenomena that do happen, they typically give a high percentage score and phenomena that don't, they give a low percentage score, which is a whole form of expertise of how you read evidence, how you integrate it, how you weight the different base rates and so forth.

In this paper, we employ superforecasters from the Swiss Center to look at our evals and our eval suite and predict when models would reach certain performances on the eval suite.

I think this is a really exciting part of this paper and I think it'd be great if all future work did something similar, at least until proven that it's not effective.

So we need to invest research to see how well this method works, but so far I would say it seems very promising.

They gave us some thoughtful forecasts, which seemed reasonable and better than a subjective qualitative judgment that is not expressed.

Maybe it's surprising to think given the importance of this topic and its tractability or the potential to just do all kinds of different work figuring out when AR models will actually be economically and practically useful for all these different tasks.

But it's surprising that people aren't doing this already because you don't have to be inside DeepMind to run these dornaments or to aggregate these judgments.

So I guess this is maybe an invitation for listeners to get involved with that.

I think so.

I think it'd be great for the field to invest more in this.

Yeah.

I guess the thing that's whack about forecasting abilities is that with these scaling laws, we're so good at anticipating when we'll have a particular level of loss, this measure of inaccuracy in the model, but that hasn't translated into us being able to predict when will people actually want to use this model for a given task.

Because I guess the relationship I suppose between the loss and the actual usefulness is super non-linear.

I guess it's probably got to be some sort of S-curve where, I mean, classically with the self-driving cars, a 99% accurate self-driving car is just a heap of junk basically from a practical point of view.

And you maybe have to get to 99.999.

And you don't know how many nines do you need before actually they're going to hit the roads.

Exactly.

And even when you have this sort of complex real world resembling task, it's still, there's a large step of inference from that task to actual real world impacts because you then need, in the actual world, there's some cost of employing the AI to do that task.

And the question is, does it fit in with the existing ecosystem and environment?

Are there the other tasks that are done by humans?

Because even if the AI can do a lot of tasks, it's still not cost effective to integrate it and so forth.

Let's be sure to talk about the frontier safety framework, which I guess is Google DeepMind or Alphabet's equivalent of the responsible scaling policy that Anthropic has or the preparedness framework that OpenAI has.

These are all kind of a similar family, a similar general approach to safety.

The frontier safety framework at this point is just six pages.

And it's kind of a promise to write a full suite of if-then commitments next year, which I had of releasing models that would really be jutting up against some of these dangerous abilities.

When I spoke with Nick Joseph from Anthropic a couple of months ago now, the thing that I pushed him on a bit was, is it really responsible to leave the question of how dangerous these models are to the company that is developing them to decide?

Obviously, they have an enormous conflict of interest or an enormous challenge in evaluating faithfully whether the product that they have sufficiently dangerous that maybe they shouldn't release it and start selling it.

Ideally, surely, you would hand it off to at least some sort of external party that would be able to take a more impartial perspective on it.

And to be honest, I haven't really found serious people who would say, yes, total self-regulation is the future and the way that we should stick with.

Does Google DeepMind taking any steps, I suppose, to paint a path towards making some of these decisions external and taking it out of their hands to decide whether their model has passed a particular threshold of risk and possibly costly countermeasures are now called for?

Yeah.

So I would say, Google DeepMind and Google's perspective is that the governance of Frontier AI ultimately needs to be multilayered.

It will be an important role for government regulations, for third-party evaluations and other sort of nonprofit input and other external assessment of the properties of Frontier models.

I would characterize the situation as in many ways the companies have been pushing the conversation forward, which is an invitation to other groups, academia, nonprofits, government, to be part of that conversation.

So it's less the company or Google saying, this is only for us to do, but more, this is our first best guess of how future governance of Frontier systems could look like, what the properties of it will be.

The contributions Google makes are contributing to the conversations hosted by the AI Safety Institutes, often called ACs.

So we've been in extensive dialogue with UK, AC, and USAC.

There's a number of AI Safety Institute conversations happening in November that Google is contributing to.

And more broadly, I would say the goal is not and should not be for each company to have its own framework.

That would not be beneficial for the companies or for society.

Rather, we need to converge on an appropriate standard that balances various equities, safety, responsibility, innovation, appropriate levels of deployment, and societal oversight.

One other aspect of how we're contributing to that broader standard conversation is the founding of the Frontier Model Forum, which is an organization for the companies developing Frontier Models to work through what should be the standards and safety for Frontier Models.

And part of that is my team was directly involved in standing up the Frontier Safety Fund, which was another one of these differential technological development bets to say, can we on the margin put more money towards safety standards and safety research around Frontier Models?

This is slightly off topic, but something that's a bit hard for the Frontier Safety Framework or these dangerous capability eviles to pick up is something called structural risks, which actually is a name and an idea that you've been a pioneer in.

Can you just explain what they are and why it's a little bit hard for them to be done at the company level rather than the government level?

Sure.

This paper was published by Remco Zwetsloot and myself in 2019 and was an attempt to add an intellectual tool to Toolbox to help people when thinking about risks.

Because up until then, the primary conceptual framework was to either refer to misuse harms or accident harms or risks.

So the way I would offer, I would conceptualize these is misuse is when someone, a bad actor, a criminal uses the tool on purpose in a certain way to cause harm.

There's a subclass, which is I would say criminal negligence, where the person doesn't intend to do harm, but if they were more pro-socially motivated or responsible, they would have averted the harm.

This is like someone causing a car accident because they were texting.

So they weren't intending to cause the accident, but in many ways, the fault falls on them.

Whereas accident harms, you can often think the key proximal cause of the harm is some property of the technology.

An engineer could have caught this risk potential and put in a warning indicator or a guardrail to avert it.

And I would say much of the conversation centers on those two perspectives.

Either accidents or misuse.

The notion of structural risks is not an exclusive category.

It's not rather, it's kind of an upstream category, which says often the conditions under which a technology is used are shaped by certain social structures, which make accidents more likely or less likely or make misuse more likely or less likely.

And it's important that we also look upstream at those structures to see when we're in a regime that might be promoting certain harms.

One example I use is the Cuban Missile Crisis, where you had these legitimate leaders of the state of the United States and of the Soviet Union acting on behalf of their nations in a way that exposed their countries and the world to this risk of nuclear war.

And so I would say that example was not neither misuse nor a technical accident.

It wasn't that the nuclear weapons were built incorrectly or there was some flaw in their design, nor was it these were criminals who were using the technology in a sort of illegitimate purpose.

Rather, in that case, they were in a structure, a geopolitical structure that made them choose to engage in this dangerous brinksmanship with each other.

There's a lot of other examples where, yeah, the structure of the economy or of different kinds of interactions lead the system to be deployed in a way that has greater accident risk or greater misuse risk.

Yeah.

I guess that EVALS might be able to pick up that a model has an ability that could feed into one of these structural risks.

But because it's occurring at the societal or global level, it's not really possible for GDM to be evaluating what the entire societal effect, I mean, it's difficult for anyone.

But it kind of has to be, governments, this is their remit to think about because they're the ones who are best placed to address it.

Yeah.

And because structural risks are often at the societal level, they're typically often emergent and kind of indirect effects of the technology, which means they're often very hard to read off of the technology itself.

Another example that I find compelling is the risks from the railroad.

So the railroad per se, it's hard to read off these steel rails in a train, the impact it had on geopolitics at the time, namely, according to a number of historians, it increased first strike advantage that once whoever kind of mobilized first had a great advantage, it was argued.

And furthermore, it made the decision to mobilize irreversible because of the difficulty of stopping the deployment schedule.

So these were consequences of the railroad, which are very indirect.

You don't, you know, like a railroad level eval would not tell you this implication because it bounces off so many different aspects of society.

So as you say, I think structural risks require a societal level solution typically from government.

And if the structural risk spans nations, then at the international level.

Coming back to Google's approach to safety, when I read the pause AI folks and people who were sympathetic to that, I think a key underlying grievance, a kind of fury that I hear coming across is a sense that what is happening with research into frontier models and AGI is kind of democratically illegitimate because it's posing all of us in the UK and US and around the world potentially to a disaster that could kill us.

But we haven't really been consulted on a level that would be commensurate with the risk that they believe is being run.

And even if you did discuss it in the UK and US where much of the work is happening, well, people who are overseas and other countries who are also exposed to the risk of catastrophe here are barely being consulted and their voice doesn't really matter.

So they would at least say, you know, we should have like a more affirmative sign off by Congress or parliament that this work, that the benefit exceeds the risks and we should be going ahead.

But ideally, probably we should be talking to a whole lot of other people elsewhere who are also going to be heavily affected by this.

Yeah, what do you make of that frustration that people feel?

Yeah, I think a lot of people are concerned about the direction technology is going and specifically about AI in surveys.

And that's an important perspective to take seriously and to address.

Some extent it's a matter of education and to another extent it's a matter of improving governance and making sure the benefits for many people, their concern is about labor displacement or various near term impacts.

And I think an important response by companies is to make sure that the benefits are being realized.

So this is part of why Google and Google DeepMind invest so much in AI for science, most maybe notably in AlphaFold, which is the first AI contribution to win a Nobel Prize for its impact.

And then there's this other Nobel Prize in physics for advances in AI.

So AlphaFold, for the listeners who don't know, is this narrow AI system that enables a very cost effective prediction of the protein structure of virtually every protein relevant to science and medicine and has, it seems, unlocked a lot of potential advances in medicine.

So I think in the future we'll see all the potential benefits that come from this kind of development.

But generally I would say the, I think, advocates for AI often point to AI for science.

This has been expressed recently by Dario Amade, the CEO of Anthropik, that the benefits of AI for science are potentially very large and should be encouraged.

In terms of the more broad question of the right way to democratically develop technology, I think this is a very challenging question.

In my earlier scholarship, this is, I think, from the '60s or '70s, there was ideas of having citizen councils or citizen juries for technology.

So you'd have a representative group of citizens who would be paid for taking the time to learn about a technology and then reflect on how it should be developed and then how it should be deployed in their community.

More recently there's a number of work streams that carry on this tradition that look at having different kinds of pluralistic assemblies and try to host this democratic conversation which requires, again, educate, providing the space and the resources to educate the community and then this institutional framework to allow people to express their views and then allow that to sort of not just be a lot of noise but aggregate up into guidance to policymakers.

In general I would say, you know, different countries are approaching the democratic governance of technology in different ways.

We have freedom of the press, an assembly.

We have social media and conversations there.

We have nonprofits doing research and staking their position.

We have these government agencies, the AI Safety Institutes and many other regulatory agencies that are looking at technology.

So I wouldn't say it's, you know, a case where companies are unilaterally, you know, without any kind of democratic involvement proceeding.

So it is important that people involved in PAS AI and other groups find ways to engage the democratic process most productively.

And so a question that needs to be answered is what are the appropriate mitigations for these very powerful models?

And I don't think that question can be fully answered by companies or even narrow governmental agency.

We really need the scientific community.

We need people involved in building medicines and who can see, you know, that side of the ledger who are building cyber defense and also those who are focused on biorisks, cyber risks and so forth, all in a conversation together, hopefully reaching a scientific and policy consensus on what is the right way to proceed with these models as they gain new capabilities.

I think something that has turned the worries about this democratic legitimacy issue up to kind of a higher fever pitch is that almost all of the AI companies that maybe met aside have talked a big talk about how they would like to have, you know, binding legal constraints on what they can do and, you know, heavy involvement in government and oversight and so on.

But they are yet to lobby actually in favor of any particular legislation that truly would bind them and be very costly and restrict what kinds of models that they could train.

And I guess that they've turned down perhaps opportunities to argue in favor of legislation that has been proposed.

I think and the worry, understandably, is that the companies will always say they're in favor of this, but the day when they'll actually support any specific constraint, when they're actually willing to put the handcuffs on themselves may simply never come.

I'm not sure whether you want to comment on that.

But I do think that at the point that you actually saw a company saying, yes, this is the specific regulation that we want to like bind ourselves.

We want it to apply to everyone.

It would reassure a lot of people who are currently unsure how much they can trust these quite powerful entities.

Yeah, I think it's an important question to ask.

I wouldn't say, put it as starkly as you did in that, speaking for Google, Google has engaged proactively in a number of conversations that do affect their sort of freedom of action, most notably the White House commitments, and then there's a number of other sort of commitment-like moments.

So one was around the UK Safety Summit, the first Safety Summit, and then there was a second round of commitments at the South Korea Summit.

And in both of these cases, they were pointing to Frontier Safety Framework-like developments, asking where Google committed to doing forms of red teaming, self-evaluation, and then working on mitigations for the risks as they emerge.

Speaking for Google, I was very impressed with the extent to which the internal sort of Google operations took the White House commitment seriously.

So after they were signed, there was a real effort internally to identify all of the commitments that were made, spell out what they meant, and then make sure there was work streams making sure we lived up to them to a very significant extent.

Similarly, the Frontier Safety Framework, I don't think is pure cheap talk.

It really does represent an expression of Google and Google D-Mine's sense of the nature of the risks.

It offers an attempt to develop this safety standard for future risk assessment.

And so it is a form of proto-regulation or safety standard that later could be significantly constraining of the industry.

One other aspect worth noting is there is the EU AI Act, which is currently negotiating the code of practice, which is a quite meaningful form of regulation where Google is participating in helping the EU AI Act think through what would be the right way to have that code of practice operate.

So a worry that I used to have and maybe have a little bit less is that improvements in algorithmic efficiency and in dissemination, proliferation of serious compute might make a lot of these rules and the Frontier Safety Framework not as effective as it might otherwise be because even if a big organization like GDM has these quite stringent controls and all of these evals and precautionary measures and so on, if another organization can train a model that's as powerful as the Frontier one for 10 million or 1 million or even 100 million, that means that there might be just a wide proliferation of these kinds of dangerous capabilities and if any one actor is not willing to follow the rules and is going to defect from them, then that could still be quite disastrous.

How worried are you about that failure mode?

I think the proliferation of Frontier model capabilities is a key parameter of how things will go and viability of different governance approaches and solutions, which is again why I think I would channel this question back to the question of what's the appropriate approach to open weighting models because that is probably the most significant source of proliferation of Frontier model capabilities.

Moving aside the open weight question, there's this secondary point that you point to, which is this exponentially decreasing cost of training models.

So Epoch AI, which is a group that does very good estimates of trends in hardware and algorithmic efficiency and related phenomena, finds that algorithmic efficiency is increasing three times per year, so the cost of training a model decreases by roughly 10x every two years.

And by that, if that trend persists, then a model that costs 100 million to train today will cost 10 million in two years and then 1 million two years after that and so forth.

And so you can see how that leads to many more actors having a model of given capability.

Some analysts of the situation perceive this to be very concerning because it means whatever novel capabilities emerge will quickly at this rate diffuse and be employed by bad actors or irresponsible actors.

I think it's complicated.

I mean, certainly in a sense, control of technology is easier when it doesn't have this exponentially decreasing cost function and there's other sources of diffusion in the economy.

However, two years is a long time.

A two-year-old model from today is a significantly inferior model to the best models.

And so we may be in a world where the best models are developed and controlled responsibly and then can be used for defensive purposes against irresponsible uses of inferior models.

I think that's the one condition.

Yeah, I mean, I guess that makes sense.

It's a picture that makes one nervous because it means that you always have to have this kind of hegemon of the most recent model ensuring that what is now, I guess, an obsolete but still quite powerful model from two years ago isn't causing any havoc.

But I guess it probably is almost the only way that we can make things work.

I can't think of an alternative, to be honest.

Yeah.

There's other ways to advantage defenders.

So you can advantage them by resources.

This was Mark Zuckerberg's argument in favor of open weight models.

He would argue even if the bad actors have access to the best models, there are more good actors or the good actors have more resources than the bad actors.

And so they can kind of outspend the bad actors.

So for every bad actor dollar, there's 100 good actor dollars, which will build defenses against misuse.

That could be the case.

It depends on this concept of the offense defense balance.

How costly is it for an attacker to do damage relative to the cost the defender must spend to prevent or repair damage?

If you have this one to one ratio, then sure, as long as the good guys have more resources than the bad guys, they can protect themselves or repair the damage in a sort of cost effective way.

But it's not always the case that the ratio is such.

Often we think, I think with bioweapons, that the ratio is very skewed.

A single bioweapon is quite hard to protect against.

It requires rolling out a vaccine and a lot of damage will occur before the vaccine is widely deployed.

Yeah, in general, I think this field of offense defense balance is also something worth studying.

And I would say the field of AI has too often drawn an analogy from computer security where vulnerabilities are easily patched.

And so the sort of correct response in computer security is typically to encourage vulnerability discovery because then you can roll out a patch and then the new operating system or software is resilient to the previously discovered vulnerability.

Not all systems have that property.

Again, biological systems are hard to patch.

Social systems also are hard to patch.

So deep fakes do have a patch.

We can develop this immune system where we know just because someone does a video call with you doesn't mean that's sufficient authentication of the identity of the person, you know, the instruction you're getting to transfer some millions of dollars or so forth.

Champions have a lot of inertia in their systems and it's quite costly to sort of build the new infrastructure and then train people to use it correctly.

Yeah, you have a couple of really nice papers on this offense defense balance question, which unfortunately we couldn't fit inside the margins of the page this time around.

So we might have to wait for a third interview.

We're almost out of time, but yeah, just to finish up, I want to talk a bit about, I guess, advice for people in the audience.

As I mentioned, back in 2018, you were telling people they should jump into this AI safety and governance and policy work, which I think was excellent advice at the time and might still be excellent advice now.

I guess back then you were saying there were technical people perhaps in this area, but social scientists are less so.

There were some real gaps of different disciplines and areas of knowledge that just weren't really going into AI governance because I suppose they weren't hearing about it or it didn't seem as relevant.

Is social science still a big gap and other kind of big gaps where you'd really benefit having people have a particular sort of training or interest coming into the field?

Yeah, it remains to be a huge gap and it's hard for me to name a single area.

I think we need more political scientists, economists, historians, ethicists, philosophers, sociologists, ethnographers.

Really, because AI will and is having such profound impacts across society, we need experts in all aspects of society to make sense of it and to help guide Google and help guide government in terms of what's the right way to proceed with these technologies.

Yeah, are there any particular roles that I suppose you mentioned you're hiring right now?

Maybe not by the time that this episode comes out.

But what are the sorts of roles that GDM maybe has difficulty hiring for even despite its prominence where you'd be great if you could get more people into the field?

I think talent is in high demand everywhere.

Within I guess my side, we're looking for experts in international politics, in domestic politics, in governance of technology, domestic and international, in forecasting technology, in these sort of macro trends, in predicting the impacts of technology.

There's a lot of work on agents that needs to take place.

Thinking about autonomy, risks and safety, thinking about assistance, how those can be deployed.

There's such a rich space of interactions with humans and challenges and opportunities there to get right.

There's the ethics team at Google D-minus Hiring.

There's a lot of more ethics heavy questions and work to be done from an ethics lens.

Responsibility, policy work.

This is just on the non-technical side.

Within technical safety, there's a large portfolio of work that needs to be done.

All of the above.

Final question, or final section if you have time.

The show is like we talk about AI a lot.

Unfortunately, we do have a slight tendency to watch Doom and Gloom and always talking about the risks and the downsides and so on.

I suppose we talk about that because we worry that it's neglected in the broader conversation.

Maybe it's the only thing that's stopping us from achieving these enormous possible gains that we can get from AI and AGI.

I guess in the interest of having at least a modicum of balance, what are some of the applications of AI that you're excited about?

Some applications that you would really like to see sooner rather than later?

It's a great question and it's important to keep at the top of our minds.

Hopefully readers and listeners will get this message.

This is why we would want to build AI and advanced AI and AGI is that the benefits could really be profound and will be profound if built safely.

To motivate it, personally, I've found the experience of riding in a Waymo self-driving car to be really impactful.

I've noticed a trend of tourists to San Francisco to report back on this historical moment when people are able to ride in a car that's driving itself going back in time.

This is a new phenomenon, something new under the sun.

It's quite incredible that the safety has reached the number of nines required to do that safely.

Waymo recently put out a safety report documenting how many fewer crashes that have injuries or just crashes that involved police coming to the scene.

The latter was 2x less.

There's two times fewer incidents that involved police coming to the crash scene.

Then I believe it was six times fewer crashes involving an injury.

This is one quantified benefit if we can have fewer crippling car injuries that would be profound.

I'm imagining a world where we don't need to expend so many resources building cars that mostly sit in our parking lots.

We don't need to dedicate urban space to cars just waiting for us to end the workday.

Instead, we can reclaim that urban space for other more productive purposes.

That's just one technology.

Of course, there are challenges that we need to think through there as well.

I think medicine and health is another domain that could really be profoundly beneficial.

I personally look forward to the day when I can consult a doctor on your phone and ask, give it my symptoms, and it can assure me, no, these are reasonable or incoherent symptoms.

You don't need to worry about it.

It can also advise me when it's time to see a doctor, thereby saving the medical establishment the time costly activity of seeing patients and allowing us to use the human doctor's time more effectively.

I already use LLM's call for health advice, but I guess, is it DeepMind that developed MedPalm too, which I think is kind of the state of the art of this sort of, yeah, GGL-like medical consultancy?

So that was from Google, and Google has this, MedLM is the broad name for its work on language models for medical purposes.

And in general, Google has a quite extensive health portfolio.

That one's not publicly available yet, right?

But I guess, I imagine Google's working on it.

It's my understanding that MedLM is available to certain users.

So yeah.

Maybe I might get to give it a whirl at some point.

I suppose it's something that's, I've said this on the show before, but it's something that I think about the most perhaps is, I've got a kid, almost one year old, I suppose I'll be in kindergarten and then at school in a couple of years' time.

And actually thinking about what is it like at a typical school that might be in your district, it's not necessarily always going to be the very best education.

And there's lots of things that are kind of unpleasant about school sometimes.

So I'm kind of hoping that by the time they would be attending school, there will at least be an option for them to get tutoring and assistance from AI models that could be as good as the best teacher in principle, at least if they were designed really well.

I suppose there's issues with classroom management, but you could also imagine they might be able to be much more engaging and entertaining and interesting than a real life first grade teacher might be.

I suppose you have three kids as well, and I imagine that the oldest might be approaching school age pretty soon, if not already.

Do you think they're going to be actually taught by a LLM teacher at any point soon?

I think AI in education is a huge opportunity, for reasons you mentioned.

Another data point here is tutoring is hugely impactful, or small class sizes is another way of putting it, for student success.

And so having an AI tutor that can complement the school system could be very beneficial.

It would flag mistakes that the student makes and then can provisionally point out where they went wrong or show different ways of proceeding, I think could be very beneficial.

And like you say, just having that option, having that available, and perhaps in a more engaging way or in a way that's more compelling to students can be beneficial.

I think we've seen this with the internet in general, that the internet has provided resources for learning and then platforms like Khan Academy and a lot of pedagogical videos have been, I think, very beneficial for student learning.

Yeah, I think tutoring I think is slightly famously, especially useful for exceptionally strong students who can potentially rush ahead if they have a tutor available to help them.

And also for students who are really struggling and might, if they're stuck in a class where they're significantly behind, they're just not going to be learning almost anything because they just don't have enough understanding to even track what's going on.

So you could see it benefiting different groups in really quite enormous ways.

There's one other area I think I would want to call out, which is AI for sustainability.

So Google DeepMind has all these interesting projects and they read like science fiction.

One is on controlling the plasma for nuclear fusion, which if you can do it more effectively, can make fusion a more viable source of energy, which could be extremely beneficial for sustainable energy production.

Google DeepMind had these advances in weather prediction, making weather prediction cheaper, faster and more effective, which can be beneficial for event planning, but also wind power and solar power and other sources.

Advances in material sciences, finding new materials, which could be used for again, solar power or other aspects of the economy that could make things more efficient.

Google had this results on optimizing flight paths to reduce control production, which is apparently a large source of greenhouse gas dynamics.

So there's data center optimization.

I think Google DeepMind had a result several years ago reporting quite significant gains in how data center cooling was done in terms of energy use.

So you have all these efficiency gains in energy use production that can really add up to help humanity address these significant challenges.

We haven't even mentioned Alpha Fold 2, which I haven't followed this in detail, but from what I've heard, it's incredibly useful for medical research and for drug design and so on.

We've just got to do all of the stuff discussed in the last few hours and then we can enjoy this wonderful, brave new world of all these fantastic new products and a much better quality of life, fingers crossed.

Yes.

I guess today has been Alan Defoe.

I look forward to talking to you again next time on the show.

Thanks, Rob.

Look forward to it.

It is both energizing and enlightening to hear why people listen and learn what they value about the show.

So please don't hesitate to reach out via email at [email protected] or you can DM me on the social media platform of your choice.