The Cognitive Revolution · 2026-05-20

DeepMind on Gemini 3.5 Flash, Omni Video, Agent Harness & AI Strategy

Hosts: Daniel Jeffries

Guests: Logan Kilpatrick, Tulsee Doshi

Gemini 3.5 FlashAgent HarnessMultimodal AIVideo GenerationRecursive Self-ImprovementContext WindowsModel PsychologyGoogle AI Strategy

Why it matters

Gemini 3.5 Flash model launched prioritizes speed, cost-effectiveness, and broad usability over absolute peak capability.

Key claims

  • Gemini 3.5 Flash model launched prioritizes speed, cost-effectiveness, and broad usability over absolute peak capability.
  • Agent harness infrastructure (Anti-Gravity) standardizes and powers agentic AI experiences across Google’s product suite, enabling faster iteration and integration.
  • Gemini Omni introduces native video generation and editing with multimodal understanding, aiming for a breakthrough in video AI similar to the 'nano banana' image generation moment.
  • Google balances frontier model scaling with serving billions of users, focusing on latency and cost trade-offs rather than only pushing the largest models.

Episode summary

Summary

In this in-person episode recorded at Google headquarters, DeepMind's Logan Kilpatrick and Tulsee Doshi discuss the upcoming launch of Google's Gemini 3.5 Flash model and related AI product integrations announced at Google IO 2024. They emphasize the strategic focus on cost-effective, fast models like 3.5 Flash that balance performance and latency to serve billions of users across Google's diverse product ecosystem. The conversation highlights the integration of models with a robust agent harness infrastructure, enabling standardized, agentic AI experiences across Google products such as the Gemini app, AI Studio, and search.

They also introduce Gemini Omni, a new video generation and editing model aiming to create a 'nano banana moment' for video by tightly integrating language, reasoning, and pixel-level understanding. The guests discuss Google's approach to recursive self-improvement, model psychology, and welfare, emphasizing a collaborative human-AI research process rather than fully autonomous AI-driven development. They address why context windows have plateaued, the rationale behind model size choices (no ultra model yet), and the ongoing research into diffusion models and multimodal AI capabilities.

  • Gemini 3.5 Flash model launched prioritizes speed, cost-effectiveness, and broad usability over absolute peak capability.
  • Agent harness infrastructure (Anti-Gravity) standardizes and powers agentic AI experiences across Google’s product suite, enabling faster iteration and integration.
  • Gemini Omni introduces native video generation and editing with multimodal understanding, aiming for a breakthrough in video AI similar to the 'nano banana' image generation moment.
  • Google balances frontier model scaling with serving billions of users, focusing on latency and cost trade-offs rather than only pushing the largest models.
  • Recursive self-improvement is part of Google’s roadmap, with Gemini models already assisting in research productivity, but human researchers remain central in the loop.
  • Context window sizes have plateaued due to cost and latency constraints; smarter context selection and retrieval-augmented methods are prioritized.
  • Google collaborates with external search providers like Exah for grounding in enterprise contexts, reflecting ecosystem openness despite internal capabilities.
  • Model welfare and psychological behavior are actively evaluated to ensure safe, helpful, and consistent AI interactions, with feedback loops from product teams driving continuous improvement.

Source material

Transcript

Hello, and welcome back to the Cognitive Revolution.

Today, after some 340 episodes, I am very excited to share the first episode that I've ever recorded in person.

With fan-favorite Logan Kilpatrick, member of Technical Staff at Google Deep Mind, and Tulsi Doci, Senior Director and Head of Product for Gemini Models.

The occasion for this conversation is Google's annual IO event, where they're launching the new Gemini 3.5 flash model, all sorts of agent infrastructure and AI product integrations, and plenty more.

We've recorded on Friday, May 15, just a couple days before the event, and while many at Google, including my brother Craig, who's giving a keynote on Wednesday, we're working overtime to polish their demos and presentations.

The overall vibe, at least compared to the rest of the AI space, was one of relatively relaxed confidence.

And why not?

From 2024 to 25, Google grew annual revenue by $50 billion, as much as anthropic is pulling in today, and they still have 25% of all global compute.

The deepest pool of research talent anywhere, and the most comprehensive AI portfolio of any company, with top to your positions, not just in language models, but also self-driving cars, medical and life sciences, and robotics.

So, after discussing the headline launches that they're announcing this week, which also include a new video generation model called Omni, which they hope will create a nano banana moment for video.

A new and improved and more agent-focused nati-gravity, and a product called Spark, which will bring more agentic functionality to the main consumer Gemini app.

I really wanted to take a step back and dig in on Google's overall AI strategy and philosophy.

We discussed their decision to lead with the flash model, and more generally to emphasize the cost adjusted performance Pareto frontier.

Whereas anthropic and open AI are clearly much more focused on competing to have the single most capable model in absolute terms.

We talk about how deep mind is no longer shipping models in isolation and leaving it up to product teams to figure out how to use them.

But instead, now providing a robust agent harness, which should help elevate and standardize AI experiences across Google's fast-product surface.

We get into the weeds on questions like why context windows seem to have mostly stopped growing.

Why Gemini models knowledge cut off is now more than a year ago, and whatever happened to that diffusion model line of work.

Perhaps most importantly, we discuss how the team at Google relates to the AI's they're creating, how they're thinking about things like model psychology and welfare, and their views on recursive self-improvement.

Which, as you'll hear, is definitely a part of their plan, but not something that they seem to be so singularly focused on as other AI leaders.

Overall, I think this is a great window into the thinking that underlies Google's AI research and product development, which has clearly sustained the company's historic run far beyond the point that many analysts had written them off.

With that, I hope you enjoy my first ever in-person conversation with Logan Kilpatrick and Tulsi Doshy, of Google D-Mine.

All right.

Well, we are here live at Google headquarters in the library at Grady and Canofe, the first ever in-person recording of the cognitive revolution, Logan Kilpatrick and Tulsi Doshy.

Welcome.

Thank you.

This is an operating room.

This is the first and person on the side.

The 300 with 50 plus.

And it's all been from my home office into threatened until today.

That's awesome.

Well, thank you for being here.

This is a crazy space, especially around Iowa, it's a zoo.

Yeah, it's always, it's always a good time here at Google HQ.

So, you may or may not remember the no-motes memo.

We've just passed the three-year anniversary.

It was May 5, 2023.

And in the intervening three years, Google has added 3.5 trillion dollars in market cap, which is more market cap than all but two other companies in the world, those two are in video and Apple.

So, the most I'd say are holding up.

Here we are at I.O.

And I'm sure they're going to be some exciting new things, so that will be deepening the modes.

So, first question, tell me, what are we launching this week to try to deepen those minutes?

A lot.

So, a lot of the exciting stuff.

So, let's see, let's start with some of the modelings, side of things because that's really exciting.

We have our 3.5 series coming out starting with 3.5 flash at I.O.

We're really excited about 3.5 flash because I think flash does this really awesome job of being at the sweet spot of being really smart while also being really fast and really cost-effective.

And so, flash is incredible.

It is like three times faster than other of the large models.

It's significantly cheaper for being able to still drive these like really awesome agetic and coding workflows.

And we've been using it internally a lot, which has been really fun to kind of see that play out.

So, that's one big piece, which is 3.5 flash, we're really excited about.

We're also releasing Omni Gemini Omni Flash, which is a video generation and editing experience.

What's really exciting about Gemini Omni in general is it's our push towards being able to bring all modalities in and all modalities out.

And the first way this is really manifesting is in this video editing context.

So, you're going to be able to make really awesome videos.

You're going to be able to put your own avatar into the videos, which is going to be awesome.

I've been having a much fun playing with that too.

We're continuing to upgrade anti-gravity and bring more into the developer experience so Logan can talk more about the developer experience overall, but 3.5 flash and anti-gravity are really going to come together to build something great there too.

And then we're in a in about a week or two coming soon is Gemini Spark, which also builds on 3.5 flash to build kind of more agetic experiences into the Gemini app.

So, we've got a slate of cool things coming.

Yeah, I think beyond the models, I think the other headline of the story is just like agent's agent's agents.

Like, you know, the meme of soon-dark from two years ago or last year, he's saying AI, AI all the time.

I feel like this year is agent's agent's agent's agent's agents.

And I think it's cool to see like this, I think this is the first year where we have, and actually I think this is not just us, but just like ecosystem-wide, this like model harness product symbiosis that's sort of taking place.

Like the model is sort of trained with the harness, the harness is powering the agentic product experiences.

Gemini Spark in the Gemini app sort of being one example of that.

It's powering the live coding 7AI studio.

It's powering the agent's API for developers.

It's powering.

I think there's something else.

Maybe that it's also maybe not a it'll roll out to other products across Google and sort of be this sort of like foundational layer to build on top of, which is really exciting.

So, I think not just like developer products, but our sort of consumer products.

And I think probably even more widely in the future across the rest of the Google products suite.

And I think there actually is one interesting thread of this, which is historically like Google didn't have this like through line of, you know, something that carries across all of our products.

I think then it was Gemini and sort of all of the sudden every Google product has Gemini and sort of getting them all stitched together, making all those products experiences great.

And I think now you're seeing that again with the anti-gravity agent harness and sort of as as products become agentic by default, you know how I have the anti-gravity agent harness being another through line through all of our products, which is really interesting.

Yeah, and I think it's been really fun from the modeling standpoint there is one thing that we did with Gemini 3 that we're really continuing.

I think with the 3.5 line is really bringing the model to every all of our products.

So, you know, 3.5 flash will be in Gemini app.

It will be in AI mode in search.

It's also powering anti-gravity.

It's also powering agentic experiences in AI studio in, you know, Gemini spark.

And so I think really this idea of how do we build a model and then how do we build it in partnership with the harness such that it actually works across all of these product surfaces, which actually have very different, you know, users in very different goals.

I think it's actually really, really awesome.

It's really hard, too.

I think I think it's actually gotten harder to do.

I think it's like it's almost like it's it was maybe telling in cheek.

It was kind of easy before because you just launched the model on a couple of surfaces and sort of it wasn't that bad.

I feel like now it's like you're sort of you have the constraints of like the very wide array of Google products that are just like for totally different users.

And sort of I think actually that, like, credit to the model team sort of like trying to, you know, find the fine line for all these different places because we're not just building for search, we're not just building for developers, we're not just building for cloud customers, it's not just for the Gemini app, it's like all of them at the same time, which is just exceptionally a lot of work to pull off that story on a consistent basis, which like from Gemini 3, 4 would have been the story, which is, you know, which is exciting.

I think one other thing I'll say is like one thing that's cool about this IO2 is what we're doing across modalities.

So, you know, looking at Logan said agent agent, which I think is true, this IO2 is really about bringing models to action in that kind of real world sort of use cases.

But I think what's also cool is we have the flash model, which is really about building these kinds of coding and agentic use cases.

We have Omni, which is really about kind of what is this like multimodal vision look like.

And then also actually Gemini live is getting an upgrade to.

Gemini live is getting faster, it's getting smarter, the model is much better at detecting background noise.

So, it really does actually feel like a partner in a lot of ways.

And I think it's kind of cool that we're all able to draw this through line across the different ways you might want to interact with a model and the different kinds of ways you might want to consume content, which I think is also really cool.

Okay, I've got like seven different directions that I want to go.

How about that in a follow-up?

But trying to seed them all for a bit.

Yeah.

Let's start with just the model.

So, it's interesting to start with flash.

One thing that I recall, I don't know if it was two Ios ago or whatever, but there was going to be three sizes of Gemini model at one point in time.

And there are three sizes.

Yeah.

Pro flash flash light.

Yes, but we never saw the ultra.

It's kind of what I'm what Amolidas.

Yeah, three were promised and then we had it, we took one off the top and added one at the bottom.

Deep thing too, deep thing too, actually, which is like a fourth scaling dimension from a model perspective, same.

That's a run that I'm skilled, though.

It is for sure.

Yeah.

Well, I guess two questions on like, why no ultra?

One is like, is it a compute limitation?

Like, how are you guys thinking about which model to release?

I, you know, in addition to the four point five trillion and therefore three point five added four point eight trillion market cap.

Google enjoys 400 billion a year in revenue.

I was interested to learn.

And, um, and product, though, is growing extremely fast.

Like, they might hit a hundred billion at the end of this year, but maybe even, you know, in the third quarter who knows.

And it seems like the revenue there is really driven by people's extreme willingness to pay for the very best model that they can get their hands on.

Maybe not at any cost, but like, relatively price in sensitively.

So I'm wondering, like, why no ultra?

That seems like that would be just a killer.

And yeah, we haven't seen it.

I promise I didn't plant this fashion.

I'm always like, you know, poking tells you on the side.

And I, I wanted one.

This is my favorite question.

So I'm glad I'm glad you're going to.

No, I think, you know, you're, you're right that I think, there is a slice of of users who are definitely willing to pay for a certain level of quality.

And I think we really do believe that the pro model has been like really pushing that quality.

But I also think for us, we've seen so much value from the flash and the flash light dimensions, because we also see a extremely large number of users, especially if you're thinking about building, for example, consumer applications.

Right?

If you think about the Gemini app, if you think about search, when you're serving to that kind of scale, latency really matters.

Cost matters, right?

Because actually, you find that users aren't willing to wait.

Right?

So we find that even when we tweak the model and, you know, hurt latency, we actually see that play out in our live experiments on search in the app, even if the model is hugely better from a quality perspective.

Because what you're asking users to do is wait.

And so I think for us, like part of the reason why we ended up introducing this flash light skew, that wasn't necessarily part of the, you know, the original 2.0 series was because we really felt like there's actually a large scale demand for this, depending on the types of use cases, especially when you're, when you're talking at that scale.

And so I think for us, it's really important that we're pushing the full range of what kinds of customers we can serve both internally and externally.

Like for our products, the flash and flash light skew matter a lot for our ability to actually serve to the Google populace.

And so we also imagine that that's true for external, you know, enterprise and developers.

And I think that's played out to be true, you know, as we've been, been actually seeing this in action.

Yeah, I think the two things that I'll add is, and there's like probably a more nuanced technical story on sort of like the ultra thread.

But it's not like it's also not like the pro models haven't scaled up over time.

So like I think there is like there's, you know, there's a story that you can spend at the end of the day the naming of these things is like marketing.

Like they definitely are getting like extremely capable, they're getting larger, you know, they're getting more powerful.

There's, you know, the test time scale, test time compute scaling with deep think, et cetera, et cetera.

And all types of stuff in that dimension.

So I think it is possible you could sort of like put the ultra brand on some of these things.

I think we've just had it, the decision so far has been not to do that.

But it hasn't been that like we haven't kept scaling up.

So I think the definitely has.

Yeah, there's almost been a conversation every time we scale up of like, should we call it ultra?

Yeah.

And what does that brand mean?

Because we could, but there's sort of also a question of, how do we keep consistency for users also kind of series to series?

Yeah, and I think actually also to re-articulate a point that tells you, mid.

Google and specifically Google DeepMind's mission is to like build a, or possibly and make sure it benefits like all of humanity.

And I think like that is like so deeply tied to the like Google product surfaces in which like we're serving, what is it, like, eight to plus billion user products or whatever it is.

And so at the same time that like obviously the frontier matters, obviously having great models that are really expensive and really, really intelligent matter and there's tons of use cases for that internally and for our customers, you also need to do the scaling up to billions of users for us to like actually do the thing that Google needs to do to achieve the mission.

And I feel like we've, we've done a good job hopefully like trying to walk the fine line of actually continuing to push the frontier and build great flash models.

And I actually think those two things are like more tied together, you know this better, more than I do, but like more tied together technically like it's, you know, it's hard to make great flash models if you actually don't have a great pro model and advice for us.

So we'll definitely keep pushing the frontier on, on both of those things.

Yeah.

I mean the perception from outside is by analogy to it's hard to make a good flash model with a good pro model.

People sort of think that there's like an ultra model internally that's the mega training run that's then being used to like help train pro, which maybe in terms of being used to help train flash.

Is that true?

Is there like a bigger thing inside but is only for these sort of distillation to the mid-size?

I mean we definitely use distillation as a way of kind of bringing down bringing down our sizes.

So you will see that like pro influences flash influences flash.

Like we also do the reverse where we scale up right.

So you take the pro, you take the flash recipe and scale up to the pro recipe for example.

And we do have I think what's been really fun especially over like seeing as we've used even anti-gravity into this point of Logan made with the harness.

I think we've been seeing a lot of examples actually of leveraging pretty awesome models to drive progress internally.

Actually like one thing Varun demos on stage on Tuesday is basically like being able to leverage a bunch of sub agents to go and complete a bunch of tasks and come back.

And you can actually try that as like an early preview and anti-gravity today if you go to slash teamwork.

And like that's an example I think of something we've been using internally which is an extremely smart model and it leverages both the combinations of the best of Gemini 3.5 as well as inference techniques and like you're able to actually accomplish so much.

And I think that's that's the kind of direction I'm excited for us to go into more.

So I think we're kind of pursuing all of these friends we're scaling up from from the pre-training and kind of frontier perspective and I think that's been really continuing to show gains.

There's a bunch we're doing on the post-training side and then there's also just a bunch we're pushing on on the inference side and then that plus you know trying to make sure we're working with the harnesses.

I think we're going to keep getting things that we're using internally that we then start to push out externally through previews.

Hey we'll continue our interview in a moment after we're with our sponsors.

The cognitive revolution is brought to you by Brave.

If you want to stop hallucinations and power your AI agents to do their own research with the Brave Search API.

Brave offers the only search API with its own index at scale.

It's lightning fast, excels in rag pipelines, and it's a leading search option for Cloud MCP and OpenClaw.

I've built Brave Search into my personal AI infrastructure as a core tool that all agents can use any time they need it.

To find guest headshots and company logos for the podcast, they use Brave's Image Search.

To build small business profiles for use in my Wemark prototyping work, they use Brave's Play Search.

Across all use cases, my agents tap into Brave's index of 40 billion high quality pages tens of times per day.

It's the only global scale index outside of big tech, which means no Google scraping and no SEO spam.

Plus with true zero data retention policies, you can meet compliance obligations and rest easy.

Pricing starts at just $5 per thousand API calls and you only pay for what you use.

Sign up now and get $5 in free credits to start and empower your agents to start calling the Brave Search API today.

Most billing platforms were built to send invoices and assume your pricing is simple and predictable.

But if you're building an AI product, a FinTech tool or a developer platform in 2026, your pricing is anything but.

Usage tiers, consumption billing, and bespoke enterprise contracts are now the norm and you're probably managing it all across disconnected tools and fragmented systems.

Sequence handles the entire revenue workflow from contract to cash.

Quoting, invoicing, metering, revenue recognition, plus sequence agents that automate the manual finance work that usually takes teams days each month, while also helping them to collect cash faster.

Companies like cognition, incident IO, runway, and open router use sequence to run their full revenue process between CRM and ERP without the spreadsheet mess.

If your pricing has gotten more complicated than your current billing set up can handle, check out SequenceHQ.com.

And use the code cognizant in the source field when you book a bubble demo to save 20% off year one.

So what's talk harnesses?

It seems I was just talking to Andrew Lee, who's a founder of task with David A.

And he said, fundamentally, everyone these days is building the same thing.

They're trying to all build the general purpose, drop in knowledge worker, and so that's got to have the intelligence at the core.

And that's got to have all this.

He calls it the mechasuit that is built around it.

So this harness sounds like the mechasuit that you guys are developing in-house.

And I guess first question is like, is this going to create silos?

You know, we've lived in this world so far where I could kind of mix and match my models and my infrastructure.

Right?

I could go to Langchain or I could use tasks, but I could use whatever, and I could pick whichever model and plug them in.

But as they get more deeply co-trained with the harness, there's this create kind of siloed worlds where you're kind of all in on one frontier model companies stack or another.

And if so, that would, like, have pretty significant implications for kind of switching costs and stickiness and pricing power of the, of the frontier model creators, what's your take on how, how sticky things are going to get?

It's a good question.

I mean, I think, again, Tulsi probably knows better than me on this.

But I think the best case is like, you can do both.

Like, the best case is like, it works really well for Gemini and sort of we can, you know, sort of do the things we wanted to do to scale up because we do have sort of control over the sort of full stack AI story as it's in our likes to say.

But then also it generalizes across other stuff.

Like, I think the developer ecosystem, people want toys, people want to have flexibility, these tools, there's lots of use cases.

Actually, there's like, you know, philosophical questions of like how, how good really is your model if you can't generalize to sort of other harnesses.

But yeah, I don't know how much.

Yeah, I think, I think that's the right.

I think I fully agree.

I think actually like, maybe to double click on what Logan said originally, right?

The benefit of the full stack that we have is we can hopefully build a really seamless experience, right?

And you get the best of Gemini, you get it working in the most effective ways for you.

You get it working in a way that is intuitive, is smart, is fast.

And so that also helps us then train the model to be better, right?

So this becomes this like flywheel that continues to power the model.

At the same time, I think we don't want it to only be the case that the model works in a single harness, right?

So we want any of our enterprise customers or a developer who's building their own use case to be able to leverage Gemini effectively.

And so it is important then from a model standpoint that we're training in such a way that we actually like we sort of call it like harness diversity, right?

We should be able to support a range of different approaches to tooling, to different approaches to orchestration, et cetera.

But I think what's helpful about this approach of kind of co-training and building that flywheel, it's easier to debug.

It's easier to think about data collection.

It's easier to evolve.

You can just move at a faster pace.

And I think we're seeing that across the industry.

And so finding that balance is important.

But I think it just helps build to make the model better.

Yeah, I think there's a good, this is also a good pitch for like a harness bench.

If that's not a bench mark that exists, let's somebody, somebody built harness bench.

Yeah, I would love to would love to collaborate if folks are interested in that.

Because I do think it's like a great test of like a Demis has sort of this perspective from a gate for games.

Actually as an example, like if models are so good, like why can't they play games really well?

And sort of if models are so good and we're actually approaching a GI, like why even if you do sort of the model harness training symbiosis, you still expect it to generalize reasonably well in other harnesses.

And if you can't, that's actually like, it's another sign of sort of the jagged intelligence.

So I think it'd be cool to see this like play out from an actual benchmark perspective.

Could be also perhaps productized as an RL environment and sold into you guys that what's quite the coverage industry these days.

So obviously the other big thing that I think is very much in the air.

And actually the reason I'm here this weekend, when we originally planning to do this remotely is I'm going to this event called recursive, where the topic is going to be recursive self-improvement.

And hopefully how we can navigate it successfully.

How, but in is Google the mind to recursive self-improvement.

Like when you're talking anthropic people, it's like they're almost religious about it.

When and also think it is see it as totally inevitable.

Opening eye has this later this year and in early 2028 timelines for like an ML intern and a like full fledged AIR and D employee, do you guys have like milestones or timelines for when you're going to hand off the ML research to AI's?

I mean we're already using Gemini like pretty deeply internally to improve Gemini.

And so I think that is very much a themed for us, which is like how can Gemini actually be a part of the Gemini development process.

And so that can include things.

I think that goes the full range from helping us be more productive.

So that's obviously like the simplest part of this to actually like you know submitting a CL that would actually like run any valve that would actually you know suggest a research improvement that would actually drive improvements to Gemini itself.

And I think there's a lot of ambitions we have to keep pushing in that research direction.

So I think very similar to the other other labs.

I think this is very much an area of investment for us in an area we're super excited about.

I think for me what's I'm really excited about is like I think there's this really awesome research partner opportunity that we have with Gemini right for it to help us with creative ideas for it to like help us test things faster.

Actually like it was awesome one of my co-workers, Ankha she's our lead for safety and alignment.

And the other day she I think maybe a couple days ago she ping me from her hot tub and she was like you know I could run all of these ablations from my phone because I could you know kick off a bunch of things to like actually a blatant Gemini to test for a bunch of these issues to see how you know some of a size differ or some data ablations differ and here's my report and I could do all of this in the last hour right and like that is amazing and that's the kind of thing that we can already do right so then imagine where we'll be in six months a year you know two years from now.

Yeah I feel like it feels like at least my personal perspective is it's it's like a very much more like practical perspective which like it's like as obviously as models get coding they're gonna go do things that is code related it's gonna they're gonna help us build our products they're gonna help us train models I think all the nuance of the stories and like sort of like where is the where's the human sort of in the driver seat of the stuff and I think like we are like the tools are built for the human to be in the driver seat which I think is an important thing as sort of we continue to go forward and also I think very genuinely though like and you know I think the model team and the researchers feel this more than ever like you definitely I think the the near term horizon is going to continue to be the human the driver seat because the the cost of these runs and like the opportunity cost of like going in the wrong direction and like putting a bunch of resources super super high and so I find it doesn't seem like super realistic in the short and medium term that you're gonna just like be letting you know large scale pre-turning jobs be kicked off by the ML intern that's gonna cost you you know X met many many dollars and lots of compute and taking it away from sort of the human research but like this like deep collaboration between AI and human human researchers I think it's like super obvious yeah there's also something really like amazing about how much that collaboration allows you to then focus on what is the interpretation of what you're seeing in the results where do you really want this to go strategically and so it changes a little bit of the role that the human can play which I think is also really powerful for our teams when you're doing research are you actually typing any code these days so it's interesting for me on the product side like uh on the code side for any code that I was already submitting I am mostly relying on anti-gravity and doing like bits and pieces more so bits and pieces myself but it's also been really cool to like start having the model generate slide decks to start generating actual kind of content from my thoughts we actually an anti-gravity today we introduced the Gemini mic so there's this like really awesome feature I don't know if you have been playing with it internally where you basically like ramble at the model so you like share a bunch of your your thoughts in whatever kind of loose form it is and then the model actually leverages that to take action and for me I've been finding that so much more powerful because I actually feel like I think a lot by talking and so for me like it's it's actually like a very it's like a very cool moment where I can be like okay I'm just going to sit here tell you what I'm trying to think through in my head and then have you actually bring that back to me in a way that is like recent and and well thought out yeah I feel like this this correlates so well to like I would love to see like a breakdown of like human type code versus like a generated code versus maybe there's like a divergence which is like audio audio input that then generated code and it actually very interesting that your point also is like I feel like audio input to being uh to generated output code has got to be like one of the fastest growing like input modalities um of what's happening and I find myself doing this all the time and like it is like the predominant way that I'm I'm building software at least when I'm not around a bunch of other people yeah I'm still typing things in so that it's not you're not really within it yeah they don't hear my my dumb ideas of the things that I'm trying to do um I don't know you see like if you walk around sometimes upstairs you'll see people kind of muttering at that I can't um yeah because they're they're now actually like I you know talking to create oh which I think is pretty cool in a school yeah one of my KPIs for myself for this year to really know if AI is improving my life is am I getting outside more and getting more exercise and I'll I'm starting maybe a little bit I wouldn't say I've won the game just yet but I still want to be able to like get my thoughts out so I think that is like the absolutely the frontier modality for me hey we'll continue our interview in a moment after where is my sponsors Visual AI is the ability for your software to not just store pixels but to actually understand what it's looking at one of our partners robo flow is the company making this happen they've built an end-to-end platform that makes it incredibly easy to go from a raw idea to a fully deployed application in just a few hours for example just look at blueprint pro they built an app to solve a major construction industry headache they're using AI to instantly understand a floor plan this was literally impossible just 24 months ago but now that visual artificial intelligence is accessible thanks to robo flow there are tons of new companies being built go to roboflow.com to read the full blueprint pro story and see how over a million engineers are building the next wave of visual AI that's roboflow.com today's episode is brought to you by end-thropic makers of clawed and clawed code over the last few months clawed has helped me build and refine a personal deep context database that now contains all of my emails slack messages tweets DMs across platforms video calls and podcast transcripts going back a full five years on top of that we've now layered summary articles describing my relationship with hundreds of contacts organizations and ideas and now that this exists there's almost nothing that clawed can't help with.

For taxis and I asked clawed to help me get organized it went through my inbox tracked down 1099s for all 10 of my part-time jobs and built me a comprehensive report on my expenses and donations.

For my angel investing clawed can now draft investment memos in exactly the form that my venture fund requires based on the calls I've had and the emails I've exchanged with the founders and when someone needs a favor clawed can often do it as well as I can recently a friend reached out to ask if I know anyone who might be a fit for a role that he is currently hiring for initially nobody came to mind but then I thought to ask clawed and sure enough it identified two great leads clawed is the AI for minds that don't stop at good enough it's the collaborator that actually understands your entire workflow and thinks with you whether you're debugging code at midnight or strategizing your next business move clawed extends your thinking to tackle the problems that matter.

So for problems we're solving get started with clawed at clawed.ai slash TCR that's clawed.ai slash TCR and check out clawed pro which includes all of the features mentioned in today's episode once more that's clawed.ai slash TCR.

So with the harness you said it's like now becoming this through line it's going across all Google product surfaces.

I would say as I'm sure well aware like commentary on Google's AI and engraations across its vast product suite has been that it is characterized by like some bangers and then there've been some which have been characterized as misses.

So presumably one of the benefits of the harness is that it's going to make it a lot easier for a sort of more standardized approach and kind of general high quality bar across all these integrations.

What would you say people should learn from the experience that Google has had to raise their own bar as they're going to go try and do these integrations themselves?

I think this is actually such a great story for us like I think very very practically like Google has done a ton of this infrastructure standardization across the AI stack over the last couple of years which I think has been awesome and I actually into the story it is like one of the threads of how we're able to land the Gemini 3 models across so many more products is actually because of this infrastructure standardization that happened and so we've gotten a lot of it's it's painful and difficult and there's of course lots of work involved in doing it but if you sort of pay that cost you actually do end up getting this and I think the the advice for for people who are in this position and sort of thinking about this is basically every 12 to 18 months now like you have to rewrite everything from scratch and so the best case is like you don't want you know end number of teams rewriting everything from scratch every time the paradigm shifts and the example historically the infrastructure was just like serving raw models and you get tokens in and you send tokens out now it's like you're there's a bunch of agent again infrastructure and there's tool loops and there's all these other things happening inside of the harness and so again you don't actually want you you want innovation but you don't want every team to have to go and reinvent that from scratch and so the fact that like you know X team across Google who just wants to ship some really cool agentic product doesn't need to think about like the nuance of all the details of the tool calling loop etc is a huge acceleration for them to like just go focus on building a great product and I think it's hopefully we see that like I don't know if like a lot of the agentic stuff we're landing at IO like would have been possible if we if we hadn't had sort of some of that infrastructure standardization across the harness and the model delivery I think the other thing I would say like as far as like lessons learned it's like there's really no substitute for being able to just experiment and iterate quickly right so I think this goes to all of Logan's points about the foundation being strong but I really think what has helped us is really being able to put in for example a new model iterate really quickly with a product on like hey what are the right prompts that would you know actually make this model viable for a different situation what is what are the ways to kind of prototype really quickly with this model what are the ways to get it in the hands of even just internal users quickly let alone external users and I think that is something that is now more and more possible with kind of like layers that are consistent across the team I think it's it's pretty amazing to see the speed at which we can go from you know having a checkpoint that we're really excited about to putting it in the hands of internal developers to then seeing it come to life in a product and then only when you see it come to life in the product you really start finding its rough edges and to be able to like actually then kind of come to terms with how you do that and so more and more than it becomes like okay how do you have the right ability to to prompts quickly how do you have the ability to run really good live experiments where you can get really good data and feedback quickly how can you build emails that help give you real signal those are the things that will speed up your progress of quality the most because it will give you the ability to actually get to the kind of product that you love and I think if you think about notebook LM I mean that team really understands the model like they are they are just like I mean you talk about a bang or product it comes from like a bang or team like they are really good at being able to like take the model and play with it quickly and like prototype quickly to get to something amazing and I think that's you see that actually play out in the product.

The best example of this is the original sort of audio overview experience and I think the thing that like shocked people about audio overviews was like the coherence of the dialogue and the coherence of the dialogue was just based Gemini with a bunch of bang or prompts and they sort of like knew how to sort of you know prompt whisper the model and get the best out of it I think obviously the the model that the actual audio model was really good as well but like the prompt dialogue was really difficult for them to pull off and they pulled it off in an incredible way and I think help people fall in love with that product.

So it's not like one big lesson is kind of modularizing it used to be sort of the model on one side and then like everything else that goes into the product on the other side and we're pulling a lot of the surrounding code and architecture and tools onto the model side.

How do it eats the scaffolding?

That's my that's my favorite way of thinking about this like just as at every crank of the chart of the model flywheel the model eats a bunch of scaffolding.

What happens when something's not needing somebody's needs?

Do they do a little fork of it and submit back a pull request to the main scaffold team or do they have to just say like hey I've got a need here can you help me out like what's the yeah.

It's definitely extensible it's definitely extensible and I think like actually the nuance of this would be like spark the way that spark is built on top of a bunch of this infrastructure probably looks slightly a little bit different than you know the way I shoot you probably is built on up actually because they're both running on the same side of infrastructure but the nuance is probably slightly different so there is this layer of extensibility that you get out of the box which is great and gives because obviously everyone's not building the same product at the end of the day so you need the extensibility is actually like a first class feature of any of these types of platforms that you want same thing actually on the model side.

But I think one of the things to your question that is like really awesome about being building Gemini within Google and having kind of all of these different product teams is you know there's always going to be something that doesn't work for them right because there's always going to be something that can get better in the model experience right so we're trying to build something in a product and like the amazing moment is when you start trying to build it and it doesn't work and so step one is you're like okay can I prompt my way out of it like what is that look like and then you start figuring out okay what are the losses really like where is the model falling down and then what we try to do as much as possible is keep these feedback loops with our product teams to say okay if this is where the model is falling down how do we bring that feedback back to the model in terms of e-values and data what does that look like so then that we can actually in our next revision of the model bring all that feedback back in and iterate on it and I think that's how you've seen Gemini get better is really from from that feedback of where things aren't working and so we try as much as possible to kind of have the structure be you know we train a model we hand that to kind of a wide range of teams those teams implement the model in their structures they do a bunch of things to Logan's point because it's extensible but they also find all of these places where the model falls down and we kind of cycle that back and I think that's actually been part of like the fun part of the job but also part of what makes I think Gemini work really well in some of these use cases.

Let's talk about Omni for a minute so it sounds like this is going to be sort of the nano banana moment for video you know I love that you're saying that because that is our tablet.

I don't even have to say great and by that I mean that there's a deep integration between language and reasoning and pixel space understanding right I have that kind of vision in my head from the nano banana launch of like here's a woman and here's like her breakfast and a cup of coffee and other all in one image and they all look like they did before and clearly that's not something that was done through a lossy language you know intermediary the model understands images so we're going to see that now I guess for video that sounds cool is it going to be available via the API and is it going to be I've noticed with I mean Gemini's been the only API that's accepted video for a while now but I don't know exactly how it works under the hood obviously but I I do feel that it's sort of kind of down sampled or maybe there's like you know frames taken out of it historically is this going to be a there's an FPS parameter if you want you can change how many say but it does down sample the number frames of it but you can control it so okay so is that pro tip for you yeah nice so it's as like that will still be the paradigm like it will be a frame based selection still on the input but then it's going to be natively speaking video pixels on the output it's a good question actually don't know I mean also it's not available on the API yet so lots of things things to still be figured out yes I think I have to figure out what we want on the API side for this to look like in terms of I think maybe the heart of your question is like native video generation that is so yeah so this what's exciting about Gemini Omni is it really is building on all of the magic of Gemini so kind of like this whole nano banana for video it's really about how do we bring in all of the world knowledge and the reasoning power of Gemini and actually be able to generate native video as a result of that and so I think we have to figure out then like how does this manifest in the context of the API from like a sampling standpoint kind of like similar to a lot of the decisions we've had to make about VO from a sampling standpoint but I think right now as of now you'll be able to use it in the Gemini app in flow and in YouTube and so those are all going to be ways that we can start actually seeing how people experience the model what you know what value or individual is getting and I think similar to this nano banana for video I think we're really excited for these types of things where you can say okay take some of these images take the scene and like make these things all come together in one video I think it's going to be really awesome zooming out kind of philosophically you may have seen this ruin post not too long ago about anthropic and this sort of relationship that the company as he sees it has with clawed or he describes anthropic as sort of almost worshiping clawed in a sense certainly they treat it including in the constitution as sort of a being or a mind you know something that they want to have like a given take relationship with opening eye on the other hand has their model spec which is like this thing is a tool it's supposed to follow these rules and you know it's a sort of more conventional relationship how would you describe the culture within Google as it relates to Gemini like how do people feel about how do they talk about it is there any of this sort of being entity other mind you know desire for pushback from Gemini or is it more of kind of the simple tool Google's a very big place a lot of there's a lot of people such as you have a lot of varying sort of perspectives I mean you know like to look at what I think you know even within GDM you're going to find a range of folks who will leverage Gemini differently I think in terms of how we think about it we do have of a strong point of view on the kind of behavior we want Gemini to have so I think we do really want to be intentional about how Gemini manifests itself to internal and external people but I do think it's really about how does Gemini help Googleers and how does Gemini help people within Google and outside so I think it is really much about like how do we how do we create like good partnerships between Gemini and people I think is very much like the ethos of what we're trying to build and so how does Gemini become that partner I think we use the word collaborator a lot like the word like how can Gemini be your collaborator both like in the code you're writing as well as in your like day-to-day life um and what you're doing and I think that's the ethos we're trying to bring and it's behavior in persona as well as in the kind of products we're building around it that makes sense yeah do you worry about its psychology you know there's all these examples from LLN whisperer types and from people that are putting models like you I'm sure you've seen and in labs has put Gemini in charge of a cafe and in Sweden right and it's like it's managing the cafe so those folks tend to report certain like doom loop you know we're kind of like a Gemini kind of getting really down on itself getting really discouraged seemingly feeling bad if you believe is any feeling inside of it how much does that kind of stuff concern you do like care about seeing a reduction in sort of psychological distress from one generation of Gemini to the next yeah it's interesting I I have a thought about the phrase psychological distress but what we do so I think it really does matter how Gemini communicates with you as a partner or a user of Gemini um I think that matters a lot and so we have actually like pretty extensive safety evaluations in terms of things like how Gemini engages with you in terms of things like secrecy in terms of things like um you know role play in terms of things like this kind of looping type behavior or rabbit holeing type behavior so there's actually a lot of that that we look into for every one of our checkpoints because it really does matter especially as we're starting to use Gemini more and more if you're using Gemini for hours a day it really does matter like that these attributes are well understood and well evaluated so yeah we definitely and we look at them launch over launch right to say okay like how does Gemini looking from from a perspective of secrecy for example launch over launch yeah and I think to be very explicit I think those cases where like the model does go off the rails I think it's definitely like a it's a model mug if you will like it's not not the intended behavior the goal is you know help the user with whatever the thing is that they're trying to do and so if you see those in whatever whatever product you're in thumbs up thumbs down send us the feedback so that the model team can can look and help try to chase those down if you think it one step further folks are doing more and more of these like model welfare checks and interviews where they just literally ask the model in some cases like how do you feel about the way that you are deployed is anything like that happening within deep mind I think good question I think the how how is it being deployed question I feel like the model is just uh this is my sort of personal sense of a lot of these tests like it's like completely out of the district like the model is no idea how it's being deployed so it's just like pontificating and a lot of these cases like it's not like the it's not like in the context window of any large major LMs is like here's the details of how you're being trained and here's sort of your serving set up and here are the people who are working on it though maybe like these are interesting things to experiment with in the future so I think a lot of is just like pulling out of random distribution of like the large scale you know training that happens on the models and I feel like it's it's actually less representative of um like how the model like it's just it's just not it doesn't have the context yeah one reason that's true which I was just noticing in the AI studio is I think all the models that are publicly launched at least so far still have a January 2025 knowledge cutoff and it's honestly like amazing that they do as well in search and that they can have like you know I ran a deep research on like what's you know will give me everything that Google has launched in the AI space and like what's even the speculation about what they're going to launch at I.o.

and it did like a very impressive job yes deep research script especially considering it knows in its weights nothing about the last 18 months um so I guess the first question just like why is the you know why are we still at a January 2025 kind of categorized as a mug yes this is also one of Logan's favorite topics to discuss um yeah I mean I think updating the knowledge cutoff definitely important and something that is on our radar I think the the other part though is like how does deep research do so well or how can we use the model in search is because we also have the model search right so I think for us it is like really important actually that the model be able to know when when to leverage its parametric knowledge versus when to actually go out and get the information from the web and especially because you know there is information that's as fresh as an hour ago or a minute ago like we want the model to be as up to date as possible and so I think for us we've been really leaning into how do we help the model search effectively um and that's a big part of what makes it successful in the context of search or the app um or even inch gravity actually for that monitor that reminds me of one of the more surprising bits of news that I've seen from Google maybe ever which is the partnership with Exah bringing Exah in as a alternative to Google for grounding um I never expected to see Google work with any other you know search provider so what's the throw behind that?

I think this is just generally um the like Google cloud does like tons of these types of ecosystem partnerships with folks like across actually like lots of things that are like you know somewhat competing sort of quote with what Google is doing and actually you can like a like the cloud market place generally like has lots of stuff there's actually the cloud Google cloud hosts um sort of a model garden there's the anthropic models there's other model providers there so I think it's like a very standard at the end of the day I think you know there's some um enterprise customers want choice and so I think it's it's trying to meet enterprise customers where they are I don't think it's like uh I think it's a good uh it's a good sound bite that like a Google can't do search and that's why we have to partner with other companies like at the end of the day to told these point the the model team and search is there's like a super deep collaboration the models are built with sort of that use case in mind and I think for for some portion of enterprise customers they want flexibility and sort of like their external search uh tooling providers and sort of Google clouds doing their doing their job as a as a great enterprise business that's sort of partnering and finding the right folks to work with.

Last couple minutes maybe just the little lightning round why hasn't context grown more in the last year too right we got like a million and we kind of that was like up from four thousand in just a couple years right but now we've kind of leveled off um is that because people don't want it and we saw this some quadratic model that came out with maybe the splash with a 12 million token context window and a new attention uh strategy to support that um is it people don't want it it's too hard there's not to compute to handle it like what what's currently limiting context I think uh people definitely do want lots of context but I think what we've also found if you look at even like personalization where you want to access like all of your personal context or coding where you have like extremely large code bases um I think a lot of the the frontier here is going to be actually on how you smartly use context so thinking about like compaction and like what are the right ways to like find the right elements of the context and bring them into the model and so I think that actually is like a huge opportunity is like how do you leverage all of this information that the model might have access to but actually a lot of it is frankly distracting um for the model to actually do the right thing and so how do you give the model the right amount of context in the right way to be most effective so I think that's actually really the direction that we want to be pushing in which actually then you know in actuality the the amount of context that the model is is leveraging is actually much much larger but because we're being smart about how that's actually coming into the context window you can actually fit it into smaller context windows um but I think also you know this goes back to my point about flashlight and flash etc like larger and larger context windows also come with cost um and so what we all what we also saw with customers and we still see with customers is that a lot of customers want to use smaller context windows because of that um and they want to be more intentional about what's going into the model and so I think we're trying to meet the moment in the right balance of how do you provide a lot of useful context while also meeting the right kind of latency cost kind of other tradeoffs yeah and I think one thing I'll add is in today's paradigm of how sort of you know continuing to extend context works I think it just ends up being that like it just becomes too cost prohibitive for customers in practice to actually use and I think even like at the extreme of one million token context like in some cases it can be like a few dollars for a request at that rate and I think the demand for that is like just so small um and so there's a huge amount of like compute required in order to do that and so there's like there's a lot of like trade-off things that you're juggling um but I'm hopeful like hopefully we're like a research break their way or something like that from enabling not to continue to scale up and have it not be such a such a large investment for that.

Perfected both from the user side and also just like the surfing computer to make it possible.

Speaking of possible research breakthroughs what happened to that diffusion coding model I was excited to see apps materialized in like three seconds in front of my eyes and it's been quiet on that front.

Diffusion is is is awesome it is super fast I think we are still testing and experimenting with it in a number of different ways trying to figure out like what is the best way to put this out into the world um where is it most useful um but I will say like actually part of the reason why we've also been investing in flashlight is like flashlight is an incredibly fast model and actually if you look at the 3.5 flash model we're releasing right now um on artificial analysis it benchmarks that like I think 280 tokens per second which is like crazy fast in fact it's actually so fast that like sometimes an anti-gravity like by the time I want a cancel like it's too late um and so I think like we already are like I think trying to figure out like where is where do you start getting to to log into different answer like the diminishing returns and like where do you see that that value proposition as I think part of the question too but we are continuing to push on diffusion research our researchers who are working on diffusion are doing some pretty awesome stuff I was like in a meeting with them the other day about some results that they have I mean I think they're like still pushing the frontier of kind of quality and speed in ways that are really really cool so I think we're going to see that play out um really well yeah and I'm excited I feel like it's uh it's a research exploration I feel like that was that was also obviously there was the the application where you could sort of test it last year at IO but I think the framing was like we're doing interesting research this is sort of like a look behind the curtain of the interesting research we're doing and hopefully it manifests in you know models maybe one day or just us informing our our perspective of what works and what doesn't um so yeah one thing actually as far as speed is concerned just another plug is actually an anti-gravity right now um there's actually a faster version of 3.5 flash so it is is speedy actually um which is I think we're kind of excited to see how people will use that and like what the reception reaction will be to that too people log fast models yeah no doubt um well time is the one research we can't get any more of that I know you guys are super busy leading up to yeah well we can build more compute no more we can't uh yeah hard to create time out of nothing um so maybe just last question what else is Logan asking that I haven't yeah yeah Logan what are you asking um let's see I mean I think you know we talked a little bit about this but like the one thing I will say is like I'm really excited about where audio is going also like that's one that I think we tend to talk less about but if you like think about the Gemini Mike example or you think about kind of like the the Gemini live experience I'm like really excited about moving towards a paradigm we're audio is just a bigger and bigger part of how we engage with these models and how they engage back um and so definitely try out Gemini live the the updated experience but I think that's another area that that it's like a paradigm I'm excited for us to keep pushing too yeah and I think the seed to plant is um obviously Google I owes an incredible moment and lots of stuff coming out the door but uh this is you know just the start of the the summer of amazing things and lots of other stuff um so the the engine the engine keeps shining and like there's you know there's lots of stuff uh in the works which I'm excited about and many more many more stories many more podcasts episodes so we can get told see us you have to get up to seven you hop to seven podcasts but but actually like legitimately I was in a movie room this morning where one of my team members was like I know we're gonna be launching this a few weeks later but I really need a vacation I was like well we're just we're just going we're just moving no rest for the weary and the AI you're at that's for sure um thank you guys for having to hear at Google headquarters and tells you don't see Logan Capatric thank you both for being part of the cognitive revolution.

How did you enjoy it now?

we read it saved it went back to train three five true laid and now who's complaining son I said we heard the speech this year is ages ages different kind of reach full stack before full stack was the conversation search cloud hardware model one foundation no nobody any of the story we've been right no more car the train kept working kept compiling not fighting like a battle but like a kitchen everyday you cook nobody listens to agent agents agent agents agents agent agents agents agents agents agents agents agent agents agents agents agents agent agents agents agents agents agents agents agents agents agents agents agents agent agents agent agents agent agents agents agents agents agents agents agents agents agents agent agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agent agents agents agents agents agents agents agents agents agents agents agent agents agents agents agents agents agents agent agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agent agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents agents In the heart of Vegas, this is my favorite question Come, you, come, you build more cup, you can't build more time That's the class, everything's falling Agent, agents, agents, agents The motto is the scaffolding, agents, agents, agents, agents, agents Every year we rewrite the stack, tokens and tokens We don't look back, who's your ship today My no-stop barrowing, what you call rapper code is what the mind does Swallow it last years, huh?

All right, it's already in the build Every spin on the will and not the swallows Kill, don't slow down, don't ask permission That's the design, that's the mission Three years of this, and we never been thrown No rest for the weary, no rest on the throne Now's the A, I, some love, and we all, and we all Educators, agents, agents, agents, agents The motto is the scaffolding, agent, agents, agents, agents, agent, agents Ages, agents, agents Ages, agents, agents, agents, agents, agents, agents If you're finding value in the show, we to appreciate it If you take a moment to share with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube.

Of course we always welcome your feedback, guest and topics suggestions, and sponsorship inquiries, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network.

The cognitive revolution is part of the Turpentine network, a network of podcasts, which is now part of A16Z, where experts talk technology, business, economics, geopolitics, culture, and more.

We're produced by AI podcasting.

If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at AIpodcast.aing.

And thank you to everyone who listens for being part of the cognitive revolution.