Training Data · 2025-06-03

Google I/O Afterparty: Veo, Mariner, and NotebookLM leaders on human-AI collaboration

Hosts: Unknown

Guests: Thomas Iljic, Jaclyn Konzelmann, Simon Tokumine

Google Labs · Google I/O · Veo / video generation · Whisk and Flow · Project Mariner / computer-use agents · NotebookLM / Audio Overviews · Gemini models · Human-AI collaboration · Generative media · Agentic e-commerce

Why it matters

Veo 3 cited as a major step up due to native audio co-generation.

Key claims

Google Labs leaders describe current moment as 'end of chapter one, start of chapter two' driven by model gains and product breadth unveiled at I/O
Thomas Iljic positions Whisk (consumer) and Flow (AI filmmakers) as a spectrum, with Flow built around the metaphor of a 'generative AI camera' for world-building
Veo 3 cited as a major step up due to native audio co-generation; remaining R&D focus is on the abstraction/UI layer (diarization, voice-to-character, reshoots) rather than core model capability
Project Mariner moved from foreground Chrome control to background virtual machines, supports up to 10 parallel tasks, and now bridges user browser context to VM execution via companion extension

Episode summary

Summary

Recorded shortly after Google I/O, this episode gathers three Google Labs product leads to discuss why public perception of Google in AI has shifted and what comes next. The hosts frame this moment as the end of "chapter one" and the start of "chapter two" for Google, citing strong leaderboard performance and rapid model improvements finally meeting a broad product rollout. All three guests describe years of internal groundwork that is now surfacing.

Thomas Iljic (Whisk and Flow) argues that creation must be iterative and "show and tell" rather than long text prompts. He predicts the line between video generation, simulation, and games will blur into world-building experiences where users can pause, reshoot, and regenerate — eventually sharing scenes others can step into. He sees Veo 3's co-generated audio as a major unlock and expects the next 12 months to be defined by remixable video content.

Jaclyn Konzelmann (Project Mariner) details the evolution of Google's computer-use agent from a foreground Chrome extension to a background virtual-machine runner capable of 10 parallel tasks. She frames the future in three layers — smarter agents, omnipresent environments (local device, VM, other devices), and agent-to-agent ecosystems — and believes agents will dismantle e-commerce friction, potentially reshaping business models since agents won't be distracted by ads. Simon Tokumine (NotebookLM) explains how Audio Overviews became an accidental viral hit, validates the "audience of one" thesis for personalized media, and previews more show types, mobile-native features, and longer-running project workflows.

Google Labs leaders describe current moment as 'end of chapter one, start of chapter two' driven by model gains and product breadth unveiled at I/O
Thomas Iljic positions Whisk (consumer) and Flow (AI filmmakers) as a spectrum, with Flow built around the metaphor of a 'generative AI camera' for world-building
Veo 3 cited as a major step up due to native audio co-generation; remaining R&D focus is on the abstraction/UI layer (diarization, voice-to-character, reshoots) rather than core model capability
Project Mariner moved from foreground Chrome control to background virtual machines, supports up to 10 parallel tasks, and now bridges user browser context to VM execution via companion extension
Jaclyn Konzelmann predicts a 'universal cart' agent paradigm and that e-commerce conversion rates will rise sharply once checkout friction is removed
Simon Tokumine says NotebookLM's breakout validates 'content for an audience of one' — personalized AI as a new media category
NotebookLM international Audio Overviews now run on native Gemini audio; mobile apps launched; roadmap includes new show types (e.g., LinkedIn feedback) and long-running project workflows
Group prediction: remixable video content is the most likely breakout AI application area in the next 12 months
Recurring contrarian bet across labs: show-and-tell multimodal interaction will displace long-form text prompting

Source material

Transcript

- I was talking to a founder, and he gave me the analogy that you want the user to be almost like a director directing the cast and crew: change the lighting here.

Like, can you say this with a little bit more of an accent there?

Almost like natural language, the way that a director would direct the cast and crew.

What do you think is the right way to mold the clay?

- I still think it's show and tell everywhere.

So I don't think you do everything from text.

I think it's kind of actually counterintuitive to have to transcribe everything.

So I think there's a lot of like showing and acting and mimicking or giving a reference just as inspiration in addition to the text.

- Yeah.

- But the one thing that's starting to become more clear, at least for me, is kind of video generation, simulation, games, they're kind of like the same thing in this new world.

And what that means is basically you're kind of world building.

You're saying, this is the stage, these are the assets, these are how things are supposed to look.

And then you shoot in it, and then you can reshoot and refine and pause and correct something and go back in time and regenerate.

I think that's where this is heading and UI is gonna be fairly novel.

- Yeah.

(upbeat music) - Welcome to Training Data.

Fresh off of Google I/O, we're exploring some of the exciting AI updates with three leaders from Google Labs who are the leads on Google's product experiments around generative video, computer use, and NotebookLM.

Thomas Iljic of Whisk and Veo reveals why the future of content isn't just about generation.

It's about remixable experiences where the line between movies and games blurs and where your creations become starting points for others.

Jaclyn Konzelmann of Mariner explains how computer use agents will fundamentally change e-commerce by removing human friction from purchasing.

And Simon Tokumine of NotebookLM shares why personalized AI content, designed for an audience of one, represents a completely new media category.

You'll discover why these teams feel like a new chapter is starting for AI at Google.

Enjoy the show.

(phone ringing) - It's been exciting to see Google just cooking in AI.

And I/O last week was very exciting.

And it seems like the court of public opinion has just turned on its head so quickly.

And right now everyone's just like, Google's out in front in AI.

Why do you think that is?

Why did the public opinion change so quickly?

- I mean, the models, you know, to start with.

- Okay.

- I think they have a big part to play.

- Good, good answer.

Definitely the models, and I think just the number of products that we have, and seeing all of these breakthroughs in technology and AI come out into all of those products, but also all the net new products that we're launching and the net new experiences.

It was a lot last week, and not just at I/O, but the week leading up to it, I think.

You had a big moment the day before.

- Yes, yeah, I did, I did.

Yeah, it's definitely validating to see the, you know, public opinion on the models and Google's position in AI changing maybe recently.

It does feel to us on the inside at least that it's kind of, it's the result of a lot of work though.

So it feels like we've been improving to me at least for at least the last three years in this area of Gen AI.

And maybe what we're seeing externally is people seeing what we've been up to.

It helps that we're number one on many of the leaderboards, and it helps that some of the stuff the models can do is state of the art and, I think, is only possible with some of the Google models.

But I think it totally, it just feels like the end of chapter one and the start of chapter two, yeah.

- Wonderful.

So here's what I'd love to do today.

We have three of the leaders from Google Labs in the room with us for those in the audience.

What I'd love to do is spend a little bit of time on each of the topics that you are responsible for and then we can round up with some overall thoughts and I think that sounds good.

Three topics we'll cover.

We'll go into Whisk, Flow, and Veo with Thomas, which are Google's video models and kind of creative image generation playgrounds, for lack of a better word.

We'll go into Mariner, Google's computer use agent, with Jaclyn, and then we'll close on NotebookLM with Simon, and everyone knows NotebookLM.

So that needs no introduction.

- Awesome, I'd like to hear that.

- Okay, Thomas, let's start with you.

Tell us about the history of how you all have been cooking and building and experimenting in the creative image video generation space and how long have you been experimenting with these products and what have been the key milestones so far?

- Sure, yeah, it's been a really exciting space.

I think that's a very long question so I'll probably rant a little bit.

I think the, I mean, we've had for a long time like good image models.

There was Imagen, there's been DALL-E obviously externally, et cetera, but something like two, three years ago is when, at least for us in Labs, when we were thinking about products, we had the ControlNet paper, for people who remember.

So it's kind of like how do you take the model and start channeling it where you want.

So it's not just like a push button thing.

You can start saying, I want the pose to be like this or the scene to be like this. That was one.

And then the second thing was LoRAs, where you can kind of show the model a range of things and then suddenly you're able to pull from the image and be like, what's the range of possibilities for that particular piece?

And so that like iteration, the sense that you can start controlling the outputs, that felt like the right moment for us to start exploring the creative process.

- When was that?

- Probably two and a half to three years ago.

And so then a lot of stumbling and trying things and failing. I think we had things where we trained a bunch of our people internally to see what they could do with ComfyUI-type workflows.

We even had a little animation thing going on where we could have an episode with artists and we published the things that not so super villain if you want to check it out on YouTube.

And then more recently, we ended up with like a bunch of convictions out of that exercise.

So we had things like creation has to be iterative.

So we need to build kind of these controls next to the models.

Media comes with the blueprint, which is this idea that like if I generate something, you're able to kind of pick up where I left off.

And then the third one was like, it should be show and tell.

So basically this driving force was instead of just telling the model with very long prompts, I can actually like show you images, say like it should do kind of like this and we can build off of that.

So this is where we started with Whisk on the consumer side for imagery and Flow for everything that's a high-end filmmaking exercise.

- Really cool.

And do you imagine Whisk and Flow will be end consumer products in the Google portfolio of billion-user-scale consumer products eventually, or is it your playground for testing model UX and how best to bring this magic to users?

- Yeah, I think we see it as a spectrum.

So I think Whisk is kind of our play in the, you know, really consumer space and thinking about like everybody now has this visual language at their fingertips.

They might not have it necessarily like the most advanced ideas in terms of like storytelling, but they can quickly remix each other's things.

And so we're trying to see what those dynamics look like.

So I think that's kind of our exploration space with Whisk.

We'll see how it picks up.

I think a lot of the lessons will probably also graduate in just how we, you know, deal with user inputs and entry those across multiple surfaces.

And then Flow is the other side of like, you have a vision, you know what you want.

And it's kind of like, how do we give you all the tools to create, you know, the best version of this in video?

- Yeah, okay, super cool.

Who's the ideal user, do you think, for Flow and Whisk?

- For Flow, I think it's pretty clear for us.

We're starting with AI filmmakers.

And the reason is we want to build this kind of, we call it the generative AI camera.

Like, you know, you're doing world building, and you're shooting inside this world.

How do we actually develop the DSLR camera of generative AI video?

And then we'll distill kind of the Android version, you know, the Pixel camera version, out of it.

Whisk is much more consumer.

There's a wide range of audiences.

You know, is it you creating something funny with your friends in a chat?

Is it kind of more, you know, inside the company, you're trying to create some visuals for slides?

This kind of like, all this, this whole range that we're exploring, we'll see where it lands.

- Yeah, so cool.

Okay, you said AI filmmakers.

Is that a thing?

Are people calling themselves AI filmmakers now?

And does it tend to be existing filmmakers that are, you know, looking to be more AI savvy?

Are you seeing, you know, net new creators come in and try to create feature films?

- I think it's certainly an ill-defined term.

But the reason why I like to say AI filmmakers versus filmmakers is I think if you take the extreme end of the spectrum, these are people who need very bespoke tools.

They have like entire workflows and processes, and you need to develop very specific ideas.

There's one tier under that, which maybe I classify as AI filmmaker, what potentially is, you know, pre-visualizations, where you're trying to quickly get like a version out, and maybe then you do the full process.

Or people who just don't have the budget.

So they're like, I don't have $100,000 to, you know, put my idea out there, but now I can at least take a shot at it.

And so those people are interesting to us because like you can really start from the ground up thinking of like, if you had this generative AI camera, what would the user flow look like?

Like how would you fit those pieces?

- Yeah, your answer to my initial question of, you know, the models are the reason that the court of public opinion has flipped so quickly.

It's been amazing to see Veo's progress and Veo 3.

And, you know, for me, I don't know what evals you all look at to look at performance, but for me, it's the Will Smith spaghetti eating test.

And like, we seem to have passed that.

So like, are we at video AGI?

Or like, how do you think about the quality and the performance and what's ahead?

- There's still some room, but it's pretty cool.

I mean, GDM has done really great with Veo 3.

I think the joke last week was that it beat Veo 2 in the rankings.

So it's kind of Veo beating Veo, so people were very happy about this.

I think it's, you know, adherence is going up.

Yes, we don't have the six finger problem.

Physics are getting pretty good.

There's still things where, you know, if you want to have, for example, multiple characters and kind of choreography of characters, and have full consistency across multiple scenes, that's where there's still a lot to come.

How do you refine your output?

Can you propagate changes across clips?

There's going to be still a lot of like improvements, but in general, yeah, a huge step up.

And the biggest reveal this time was audio.

So being able to co-generate audio with the video, you know, a video is more than an image, and a video with sound is way more than a regular video.

That certainly has opened up like a lot of virality.

- Of the R&D left to do to make the ideal tool for the craft, how much do you think is in the product and in the UI, and how much do you think is going to need to happen at the model research layer, in things like steerability?

- I think it's both, but at least, and I'm sure people will have a wide range of opinions, but it's almost like we're at a state where everything we imagine in terms of controls, I think we have visibility in how it can be built.

You know, you want to have consistency of characters, of scenes, of location.

There's like different ideas around this.

You want to reshoot.

So that part, I think the part that's hard is still the abstraction of all of it.

So how do you put this into, what are the inputs that you want from users? In the context of audio, for example, where do I define the voice?

How do I attach the voice to the character?

How do you define the mannerisms?

How does that propagate?

So I think there's going to be a lot of work in that abstraction layer on top of the models and on top of the controls.

- Oh, so interesting.

So you think most of the model kind of R&D is almost a solved problem, which is maybe too strong of a word.

- Not solved, but I think we,

- We know how to do it.

- It will happen.

I think it's pretty clear that it's moving very fast.

And then we see a lot of things just like week after week coming up, but how we do the connective tissue on top, I think is still like pretty much open.

Audio is one of those new frontiers, for example, of like, should I be talking and driving the audio, then changing my voice?

Should I be typing the text?

How do I do diarization?

There's a lot of like, what are the inputs?

How do you give, how do you let people mold clay, you know, with all these models?

- What's your guess for how people will mold clay in that future?

And I was talking to a founder, he gave me the analogy of, you know, you want the user to almost be like the way that a director would direct the cast and crew of, you know, change the lighting here.

Like, can you say this with a little bit more of an accent there and like almost like natural language, the way that a director would direct the cast and crew.

What do you think is the right way to mold the clay?

- I still think it's show and tell everywhere.

So I don't think you do everything through text.

I think it's kind of actually counterintuitive to have to transcribe everything.

So I think there's a lot of like showing and acting and mimicking or giving a reference just as inspiration in addition to the text.

- Yeah.

- But the one thing that's starting to become more clear, at least for me is kind of, how would I say it?

You know, the video generation, simulation games, they're kind of like the same thing in this new world.

And what that means is basically you're kind of world building. You're saying, this is the stage, these are the assets, these are how things are supposed to look, and then you shoot in it, and then you can reshoot and refine and pause and correct something and go back in time and regenerate.

Like I think that's where this is heading and UI is gonna be fairly novel.

- Yeah.

You mentioned games.

I wanted to ask about this.

It feels to me like the existing way that we consume games versus movies is because there's such a tremendous fixed upfront cost of producing a movie.

If you imagine that in a world where every movie frame is generated, not pre-rendered, and that entire story arcs can unfold, it does feel like the movie and the game worlds start to merge.

How do you think that plays out?

- I think with, I mean, so for example, we have the Genie model that's been really interesting.

So you give an image and you can kind of move your character and the world builds in front of the eyes.

But what's gonna be really interesting is how do you ground it, but games are fun because there's like very set constraints, movies are good because there's like very small details that matter, you know, the expression and the moment and the timing.

And so I think it's all about, it's almost about like the constraining of the capabilities towards what we need.

So I don't know, I think, and the other thing that strikes me and I think a couple of people on the team is like, it's not clear that is, we think in terms of these static formats that we have today, like an image, a video, and a game, is there something in between almost?

And what does that mean?

And kind of, you know, where is that gonna be distributed and interacted with?

Like I can share an image with you, but you can instantly turn it into a scene that you're walking into.

So am I sharing an image or am I sharing an experience?

- Yeah.

- Lots of questions, I guess.

- It does feel like, you know, the story is almost the common thing that makes a game and the movie good.

So, and that's different from an image is just a visual, right?

- Yeah, exactly.

The setting, the constraints, the, you define the rules of the game basically, and then you let other people enjoy themselves in it.

- Really cool.

My understanding is that video is still expensive and somewhat slow to generate.

Is your sense that that's getting solved quickly and like, will we have, everybody's gonna be able to generate, you know, two hour films in their, you know, in their pocket in a couple of years time?

Or is your sense that this is a, you know, longer, we got a lot of efficiencies that we need to build in order to make this kind of cost practical?

- I think, I mean, we've seen in imagery and we've seen in video kind of like the same speed of cost reductions that we've seen in other places, both, you know, the hardware is getting better.

I think the efficiency to your point, we have like the regular models and then we learn how to distill them to kind of, so that they just take less processing to get to whatever you asked for.

So I'm actually pretty optimistic that, you know, the costs are just going to keep coming down and the speed is going to increase kind of aligned with what we're seeing with other models.

- Yeah, got it.

Fantastic.

What do you think is ahead for AI in the creative space, at least from a Google Labs perspective?

- Well, we just launched Flow.

We have a lot of things to do to just like, you know, deliver on that promise of like keeping you iterating.

I think that's the first thing.

Refinement of like outputs and you know, keeping there, going there and like insertion editing, reshooting, I think is really interesting to us.

But I think the Holy Grail will be some of these new formats and experiences.

Like what does it mean as a creator to share something with you that you can interact with?

That's something that we want to explore.

- Really cool.

I want to be able to talk to Will Smith as he's eating the spaghetti.

(laughing) Really cool.

Thank you so much for sharing what you all are doing over in the creative sphere.

- Of course.

- Okay.

Jaclyn.

- Yes.

- I would love to talk about computer use and Mariner.

Maybe to start us off, why is it called Mariner?

- Great question.

So we wanted to give the project a name that really embodied what we were trying to do with this space, which was enable users to just go out and explore, enable agents to go out and explore.

And Mariner is sort of this whimsical open-ended name that just sort of embodies the spirit that we have on the team right now.

- I love that.

You all actually, you guys have really good product names across Google Labs.

These are all really whimsical.

- I'm still trying to get rid of the LM bit on NotebookLM.

Apart from that.

(laughing) - I'm pretty happy with Whisk and Flow.

I think we need to be there.

- We're evolving our approach to naming.

- That's what we evolved at I/O, naming.

- That's very good.

- That's the statement of improvement.

- Yeah.

- Oh, that's funny.

Can you say a little bit about how Mariner works?

Like is it computer vision model behind the scenes?

Like it just feels like, you know, pure magic in a box, but give us a peek under the hood.

- I will take pure magic in a box any day.

So the way it works is really leveraging the power of Gemini.

That's kind of, you know, it's an action-tuned model on a recent version of Gemini.

But what that means is that we have all of the multimodal capabilities that Gemini gives us.

So it's able to plan and reason when a user enters in a task.

We're able to understand that.

We're able to come up with a plan on how we should actually fulfill that task.

And then the way it actually works is taking that and understanding the screenshots.

So this is where the multimodality of the Gemini model really comes in handy.

We're able to continue to take screenshots, continue down the trajectory of what it is that we're trying to achieve from the task the user gave us, and bring it all together that way.

- Yeah, got it.

Super interesting.

What's the history of the project and when do you anticipate you'll be rolling it out on this?

- So the project initially started last year, shortly after this time.

Actually, if we go back to I/O last year, we kind of graduated Google AI Studio and the Gemini API out of the Labs team onto the developer team.

And that freed us up to start exploring what we thought was coming next.

And that happened to be agents that could actually take action on behalf of users, not just answer questions or generate content.

So the team started working on it at that point. We started grouping up with a bunch of different folks across Google to bring together what we launched in December last year, which was Project Mariner as a Chrome extension that took action on your browser.

And then we continue to iterate on it based off of a lot of the feedback that we got from the trusted testers of that initial launch.

So we actually had a large group of trusted testers that we would be talking with regularly and understanding what was working well for them, what wasn't.

And we took that feedback and iterated on the most recent launch of Project Mariner, which we announced last week at Google I/O.

Really cool.

What was some of the feedback?

What are the magic sparks when people really are like, this is a game changing product for me?

Yeah, great question.

So it's funny.

One of the initial kind of magic moments that everybody had was watching Project Mariner take control of the mouse on the browser and being able to click, scroll, and type text into text boxes. It actually felt net different when you realized it was an agent doing it.

But quickly as you were using the initial version, the feedback became, this is super cool.

Can I please use my browser again?

I'd also like to be able to do work, which makes a lot of sense.

And so that was one of the big motivations behind moving towards this idea of users entering a task in the web app that could then run in the background on virtual machines.

But one of the key things that we did also to keep true to the initial vision was how can we start to think about bridging the context that a user had on what they were doing in their current environment to the task that they were sending to the VM and Mariner executing in the background.

And the way we tried to do that was if you install the companion extension now, it'll actually be able to see all the tabs you have open.

So when you're giving Project Mariner a task, let's say you happen to be looking at a recipe on a recipe site, and you're like, oh, wouldn't it be great if I could, canonical use case, add all these ingredients to my Instacart cart?

Now when you go to Project Mariner, you could say, hey, add all the ingredients from this chicken recipe to Instacart, and you can select the tab that you have open with that chicken recipe.

And Mariner will understand that context.

We'll be able to revisit that site on the VM and complete the task with the context that you had in your local browser as well.

And it's almost superhuman in a way, because as a human, it's hard to context switch between browser tabs.

Yes.

And you're able to see everything in the tabs all at once.

Yeah, I think a big net win also was the ability for Project Mariner to do 10 tasks at once, not just one.

And that was really a big net unlock.

I was using it the other day, and I'd just come back from running an errand, and there was a bunch of stuff on my mind that needed to get done.

And the first thing I did was open up Project Mariner, enter in three different tasks for it, and then just sent them off to start making progress.

And I was able to jump back into the document that I happened to be working on.

And it was this magic moment of just, OK, not only is progress being made on these things, but I just got it off my mind.

I didn't have to keep thinking about it.

Do people want to see the computer mouse moving around first for a while before they're like, OK, I trust that thing to go off and do things for me?

If they do, they have that opportunity in the current Project Mariner experience.

You can go into full screen mode.

You can see the agent moving around and clicking on things, entering text.

You could also pause the task at any point and be able to take it over.

So giving the user the ability to take over and/or provide oversight on these tasks is something that we think is still very important when we have an open-ended platform like this or an open-ended experiment like this that really lets it up to-- or leaves it up to the user to try out different things.

And what's the user behavior you're seeing?

Like, are they like, please, just take the wheel.

I don't want to deal with it?

Or do they actually want to backseat drive and watch the agent and make sure it's doing what it's supposed to be doing?

That's a great question.

I think initially watching it is this fun element, but also it develops a comfort for knowing how the agent is thinking and what it's doing.

But one of the pieces of feedback we also got from the initial launch was at the end of a task being complete, we just saved the entire conversation history, and it can get quite long.

And what users ended up wanting was just a summary of what did Project Mariner do to complete this task so I can make sure it did it correctly.

And that really kind of points to the question you're getting at, which is, I want to just hand the task off to this agent, but then I want to be able to just verify what it did at the end of the task, not sit there the entire time and watch it.

Yeah, yeah.

So interesting.

What do you think are the solved and the unsolved technical problems so far with computer use?

Because computer use still feels like to me, we're maybe in the Will Smith, the spaghetti is still sort of disappearing a little bit phase, and maybe that's an unfair characterization.

But I'm curious where you think we are on the evals and the performance so far for computer use, and what are the unsolved problems right now?

I think that's actually a totally valid comparison.

There's a reason we launched this as a research prototype with the experiment label on it right now.

I think we've seen really big gains from December to what we launched last week.

That said, there's definitely still model quality improvements to go.

I think there's also just application level improvements to go.

There's more seamlessly being able to have the user provide context upfront, which will make the agent more capable of understanding what it is it should be doing.

And then there's just more planning and reasoning that we could do at inference time or at the application layer time that's sort of in addition to the model improvements, improved system instructions, improved checks and calls to different models.

And then, of course, right now Project Mariner entirely completes a task by actuating or taking action on a browser.

You want an agent that has more skills than that.

You want an agent that knows when to call the right tools, that has memory, that's able to take advantage of a lot of the other stuff that we already see out there.

So I think it's just integrating a lot of that in and starting to innovate and climb on that.

And then, of course, right now Project Mariner is in the browser.

People use computers.

So we call this computer use.

So there's that entire dimension as well that I think we're going to continue to see innovations in.

Really cool.

Were there any contrarian opinions you all took in building Mariner?

So for example, I think some people have said screenshots.

It's going to be too slow.

It's not going to be fast enough.

You should use the website DOM or any contrarian bets you guys made.

So the reason we went with the screenshot is we wanted to make sure that it was a skill that we could develop that could be applied across things that aren't just websites.

I think the other aspect of that is DOM versus accessibility settings or accessibility trees, which is another lever.

We're kind of betting on this one right now, but I would say everything's evolving.

So we're just willing to take pivots if and when it makes sense.

Yeah.

Makes sense.

What is it capable of doing today?

And what is the speed?

If I tell it to go-- the canonical go order me a pizza from Domino's.

Can it do that?

And how long does it take?

The speed is definitely an area that we want to keep hill climbing on is what I would say.

But it's interesting you say that because one of the things that I-- so I was recently using Mariner to help me complete a task, which was come up with-- let me take a step back.

I have a three-year-old at home.

She is going to be four soon.

Part of that means organizing a birthday party for her and being able to figure out loot bags for kids at a four-year-old's birthday party.

This task, as you can imagine, involves understanding what to put in the loot bag and then actually buying all of those things or finding links somewhere to go buy them.

And I gave Project Mariner this task.

And it was basically a personal research task that turned into an action-taking task, which is find me the links and save them.

And the thing that really resonated the most with me on that one is as it was performing this task, first it did a search for good ideas to go in a loot bag.

And then it just remembered those five items, and that's something any of us could do.

That itself wasn't impressive.

But the first one was, I think, temporary tattoos.

So then it started looking for temporary tattoos.

It found a great link for it.

Instead of having to copy that link and paste it in a doc somewhere else, it could just remember it.

It could remember this massive URL.

And then it moved on to the next one.

And then at the end of these five items, it just gave me all five URLs that it had been able to inherently store.

So when we talk about speed and efficiency, I think there's two dimensions.

One is just the model calls and taking action and how do we improve it with different tool use.

But then the other one is how can agents just do things in a different way that are inherently faster than the way we would do things?

And I think we're going to continue to see improvements on both dimensions.

Yeah, I wish I could remember five URLs.

Oh, gosh.

OK, good point.

Let's see.

What do you think is ahead for Mariner?

Where do you see it evolving from here?

I think there's a couple of things.

Number one, we had a bunch of announcements last week around Project Mariner capabilities making their way into different Google products.

And I think that this is a kind of core capability that you'll start to see emerge everywhere from the Gemini app to AI mode in search.

So I definitely see a lot more coming to Google products with the stuff that we're doing right now in Project Mariner and kind of paving that path forward.

And then I think for Project Mariner itself, I actually like to think of things in three categories.

There's the agent itself.

I think that's going to get smarter.

That's going to get better.

That's a better model.

That's tool use.

That's memory.

That's context.

Then there's the environment.

We talked about how in December it operated on your local desktop in your Chrome browser.

So that's in the foreground.

Then we moved towards this idea of Project Mariner operating in virtual machines, which meant that it's now operating on VMs.

I think there's this middle layer, which is an agent that can still operate on your device, but in the background.

And there's a bunch of reasons and types of tasks where that becomes a really important kind of way for the agent to operate.

And then, of course, there's all the other devices.

But really what you want is a capable agent that's able to operate in a way that is omnipresent across all your devices, locally and on VMs.

And then the last one is the ecosystem part, which is where you start to get into the agent-to-agent interaction and how does your agent interact with all of the things that exist outside of its own world, essentially.

Yeah, so cool.

I think the canonical examples for computer use are book me a flight or order me a pizza.

Is that your sense of what computer use agents will actually be really good for?

Or what do you think-- I'm sure you spent a lot of time thinking about what applications will actually be the bullseye here.

How do you think that shapes out?

So I think we default to those because they're just easy to understand.

The travel planner, I mean, literally, it's a travel agent.

It couldn't be more analogous when you think of agents right now.

But no, the way I like to think about it is on a spectrum where you have tasks that are sort of-- in what I would consider do it with me, where you have your agent alongside and you can easily offload certain tasks to it.

But it's really working in unison with you.

And then you have these do it for me tasks, which is, hey, I just want to give my agent a bunch of stuff to go do and it will run it in the background.

I think part of the reason we see these tasks being used is twofold.

One, they're just incredibly easy to understand and everybody kind of gets what that use case is.

And they're usually starting from scratch.

There's no context you need up front.

You can just send an agent out to go do it.

And the demo as a result is pretty easy to put together.

And then the other one is just where the capabilities are at today.

And so as agents get more capable and you start to have more of these realizations on what they are actually able to do, you'll see much more advanced use cases or much more complex use cases.

And that also requires the user having more trust that they can give to the agent.

So I think that that will evolve over time.

And we'll see people come up with even more interesting use cases that they're willing to give an agent to do on their behalf.

Yeah, totally.

It's also going to require, I guess, it's going to inspire, I think, a shift in business model.

Because if you have a bunch of agents going off and browsing, trip planning, for example, they're not necessarily looking at the ads and the first things that show up.

And so I think it's going to create some business model evolution as well.

I agree.

I think there's a lot of evolution that's going to happen across business models, across how websites work, across how users will always want to use the internet going forward.

Like, there's a lot of joy I think we all get in it, from content creation to consumption.

But there's also a lot of other tasks that it's just ripe for disruption in a lot of ways.

Yeah, yeah.

I'm thinking humans are suboptimal in some ways.

We see the ad, we get excited, distracted, and I go and buy the dress.

And my agent, maybe I can instruct it to ignore the ads.

Maybe it actually knows it's going to find the best content regardless of what's showing up on the page.

So it's interesting to think about how that future plays out as agents do more of our browsing.

It's super interesting.

I will say that the dress that maybe you got distracted, I always get distracted by things too and end up purchasing stuff that gets sent my way.

But I'm always happy with it by the time I do end up purchasing it.

So I think that there's new opportunities to think about how do you actually involve agents in this new sort of business model ecosystem.

And hence that third bucket of there's going to be a lot of evolution happening in that space.

And I think that that's where we need to evolve as an entire ecosystem.

And it's not just like one player that's going to say, this is how it's done.

So it's been interesting just talking to different companies and different people who are also thinking in that space right now.

Yeah, really cool.

I mean, I do think also just as a user as well, I often don't buy things on the internet because it's such a pain.

Oh, I've definitely dropped off my stuff too.

I cannot.

I can't navigate this thing.

Either I don't understand it.

That happens quite a lot.

Or it's just like, I've just not got time.

That happens as well.

Or I can't be bothered.

Maybe it's just me.

But I'm not a fan of shopping, let's put it that way, in the real world and online.

But I'm a fan of what I get.

I'm a fan of the outcome.

And so I don't know.

I kind of feel like I might do more.

I would probably do more online shopping, I think, if I didn't have that barrier of actually having to do the shopping bit. That would be me though.

No, I agree.

What's really interesting is I don't know about you.

There are certain stores that I'll go on to.

And I'll just accumulate stuff in my cart.

And I won't want to pull the trigger until a little bit later on when I've had a chance to think about it.

But then I end up with a bunch of half built carts across a bunch of different websites.

And part of me also wonders, is there a world where my agent is that universal cart, essentially, where I'm like, add all this stuff to it, or create this aggregate area of all the items that I might be interested in buying.

And it can be across any site at this point, because the agent represents me.

And it can remember which sites to go on.

And then when I'm ready, it's sort of like, OK, one click, make this entire purchase, basically.

And it can go and check out on all of the different sites or all the different stores.

So that'll be an interesting area to think about.

Yeah.

OK, what I just heard from you guys is e-commerce conversions are about to skyrocket then.

I mean, on my computer, it'll go up.

That's what I'm saying.

I don't know about anyone else.

Lots of diversity as well.

I go to the same old sites.

But I would love suggestions.

Yeah.

Yeah, it's like once you democratize computer use, then the laziness of humans to get through checkout is no longer the determining factor of which e-commerce companies will do well.

It's just like the best product wins.

So interesting.

OK, cool.

Thank you for sharing.

You're welcome.

OK, Simon.

Hey.

Hey, Les.

Hi.

Notebook.

NotebookLM or Notebook?

We'll go with NotebookLM.

We're still NotebookLM.

I think it's been so long now that it's definitely NotebookLM.

There was a period where we were like, OK, it's now the time.

But I think we've gone through multiple hockey-stick moments, which we can talk about.

Yeah, it's going to be hard to remove it.

I like it, though.

I mean, maybe every product that has an acronym or some weird letters after it, and there are a couple of them in the AI space, regrets that.

But at the same time, they become part of the team and the identity.

Totally.

Yeah, it's nice.

I like it.

I love that.

OK, so NotebookLM was one of Google's biggest viral hits last year or this year?

It went viral last year.

Yeah.

But the team had been building it for a while before it took off.

Totally.

Yeah.

Tell me about how it's evolved in the last year.

Yeah, yeah.

Well, so firstly, the viral moment.

So my way into NotebookLM was through audio overviews.

So me and the team had a kind of-- we were also exploring the future of content, but from a different angle, I think.

And notebook was the perfect balance of user control, but also the power of the technology.

And our hypothesis was that there was an opportunity for personal content.

So not content that is for everybody, actually content that's for an audience of one, maybe two, maybe three, small group, maximum.

And that was kind of how we shaped the product.

We didn't think it was going to-- looking at the notebook user base back then, we thought that it was a great place to kind of test PMF, just kind of iterate on the product.

We were totally unprepared for the massive success of audio overviews, and then through that NotebookLM as well.

So it was honestly the first couple of months was really just kind of hanging on for dear life.

Firstly, it was making sure that the TPUs don't fully melt, as Raiza, I think, had a GIF out back then.

But there was also just a lot of iterations and fixing things and improving things.

And that was really the first couple of months.

I think since the start of this year, maybe we've managed to take stock.

At the end of last year, we launched the join mode, the ability to join in a podcast, an audio overview, I should say, and talk with the host and ask questions and all this kind of stuff.

But at the start of the year, we kind of took stock.

And we've really been thinking about what is a notebook for the Notebook users, how are, you know, users really leaning into notebooks once they've come in the front door through audio overviews.

And we've started to think about-- and Jaclyn, you kind of touched on this, I think-- the criticality of context in really enabling these AI systems to be genuinely useful for you.

And we found that a lot of users, when they're using notebook, they use them for these kind of more longer running, almost like projects that they have.

So either they're hobbies, or if they're in the world of work, they can be ongoing projects, or they can be projects with a goal.

Like I've got to prepare for a presentation or something like that.

And so a lot of what we've been really doing is retooling how we look at notebook, and also building a strategy as well that leans more into-- I think there's more sort of longer running opportunities that we see in the notebook user data.

Of course, we've done a whole bunch of improvements too.

So we've just launched the mobile applications finally.

So they came out last Monday.

And we also launched international audio overviews as well, which was kind of the end-- it was the end of a long road, honestly, of upgrading the underlying AI infrastructure and models away from the very first, almost like research grade model that we used for the initial launch to native Gemini audio.

So what you hear now in the international audio overviews at the very least is native Gemini audio.

And that was a big push for many teams across Labs and also GDM.

Yeah, super cool.

It feels like audio overview was almost the viral hook.

And you guys have been building out a lot in almost like the RAG UI and just imagining what that workspace looks like.

What do you think the actual just audio overview podcast thing becomes?

And actually, I'm curious how you even ended up on the shape of two podcast hosts talking to each other.

It's just like-- it's such an engaging format.

I'm curious how you even landed on that.

And I feel like it's only in its infancy still in terms of-- I would love podcasts every morning that hype me up for my day and things like that.

And so how much of your time is thinking about Notebook, the kind of RAG workspace environment, for lack of a better word, versus Notebook, the podcast killer? Training Data is going to be built on Notebook in the future.

Yeah, yeah, yeah.

Well, I hope not.

But maybe it can help.

So the way that we're starting to increasingly look at Notebooks is they're comprised of kind of three-- they give you sort of like three superpowers.

So one of them is they help you really accumulate information over time.

And that's-- there's a lot of amazing underlying database technologies that we apply that I think lean on first party Google technologies in a pretty unique way.

The second is they bundle in intelligence.

And when we launched last year, we used the old Gemini 1.5 Pro model back at that time.

But obviously now we've got thinking models and so on.

But the third thing is the ability for content and information to be adaptive to your situation.

And so podcasts or audio overviews, a conversation, it's one form that information might take.

But you can imagine many other forms that that information or knowledge might take as well.

So you might imagine it coming at you in the form of a comic book, or maybe a short movie, or maybe a mind map, which we've also launched.

But you can imagine many other types of media that fit the right circumstance and form and function for the moment for you to understand information, to be able to analyze it, make decisions with it, do something with it.

So that's the mindset that I think we have when we're thinking about the different-- you might hear us talk about transforming information from one state to another.

I think that's a fine word.

It's a little bit technical, to be honest.

It's more like adapting to you and fitting you, I think.

That's really what we're going for.

But in terms of just going back to your actual question around audio overviews and where it's going, there is a huge amount, I think, of room left in that technology.

So I enjoy audio overviews, and I use them a fair amount.

But I also-- every now and then, I'll be like, that's weird.

Why did they say that?

Well, they've kind of lost the plot there.

I didn't quite get the right narrative.

Sometimes, the uncanny valley or the illusion is broken when you're listening to them.

And while it might seem like there's a small amount of work we might need to do to fix that last step, there's actually a ton of work that we've got to do.

And so there's a lot of effort being placed into all of the various components that you'll need to make the experience feel like something where you suspend your disbelief more completely.

And alongside that, there are many other different show types.

We've kind of had one show type for a bit too long, I think, actually.

And we're bringing more out.

So we're actually working on some really cool things.

A lot of them inspired by users, honestly.

So one of the things that we saw users do right back at the start, but you keep on seeing it, is users putting in their LinkedIn.

They're putting in their LinkedIn.

Well, why?

Number one, it's kind of fun to hear people talk about you.

But a lot of users are using it to get feedback, like to understand another person's perspective that they may not have access to.

Feedback is truly a gift.

Real feedback is hard to find.

So how would somebody else look at me?

How would somebody else talk about my strengths?

And how might somebody else talk about areas to improve?

This is something that we see users already using audio overviews to access that sort of content or sort of information.

We think we can make that easier for people.

So a lot of what we're thinking about now are different show types that lean into some of the more viral successes that we've seen our users explore online.

And also think about brand new formats as well, that I think are going to be fun.

OK, so we're going to have Training Data, the comic strip.

I'm not saying we're definitely going to have it.

But I think not everything has a story.

And so applying different adaptations will almost be sort of context dependent, I think.

But oftentimes, it does help.

So one of the things that we were looking at the other day was a 150-page PhD dissertation.

It was invasive wolves, I think, in some part of Europe.

And yeah, you could have looked at a mind map.

You could have maybe listened to an audio overview if you had like 10, 15 minutes to spare.

But actually getting a kind of a comic book rendition of that PhD was really helpful just to kind of understand the overall narrative within it.

So we're still working on things like that.

But I think there's a lot of opportunity there.

And of course, comic books are very similar to storyboards, and that intersects with the work that Thomas is doing too as well.

So yeah, there's a lot of interesting ways that I think Labs projects intersect.

And we'll continue to explore them.

You can create a hero's journey comic book of somebody's LinkedIn career arc.

I mean, for an audience of one person and one person only, that's probably going to be the most awesome movie that they'll have ever seen.

So maybe, yeah, maybe.

That's awesome.

Really cool.

Where do you see Notebook going from here?

Yeah, well, like I said, we're really-- I think our focus is, aside from a whole bunch of different adaptations, we're really thinking about how we can be more useful to our users over their more longer running projects.

And so both in the world of the knowledge worker, but also in the world of students, these are our kind of core users, I think.

The project is really an area where those users both need the most assistance.

But it's also where-- it's also the point of highest value, I think, for them.

So if you're in the world of work, the project is where value accumulates.

Yes, it's like a unit of work.

Yeah, right.

It's a real unit of work.

We actually call them units of knowledge, but it's a great way of putting it.

And the same for a student as well.

The project, if it's a project with a goal, like passing a test, that's a big deal.

Or if it's an ongoing lifelong learning thing, that's also really important as well.

So I think really focusing on use cases in those domains is something we're thinking a lot about.

I'll say the other thing is I think one of the things I'm personally very excited about-- I've been in the consumer product space for many, many years.

And I guess one of the things that we did at Google when we went mobile first in the mobile first era is we moved a lot of our desktop products to mobile.

And if you look at those mobile products, many of them are the desktop products shrunk down to a small screen.

And that's OK.

And I think because we were one of the first, because we built Android and a lot of our big products basically got mobile-ified at that point, we found it hard to change from that point forwards.

But I've always been really interested in thinking about, if you have a desktop experience, what is a companion mobile experience that doesn't have to just be a carbon copy of the desktop experience, that maybe leverages the form factor, the sensors, the fact that it's with you at all times, to deliver an additive experience on top of a desktop experience.

So we've just launched the mobile experience after a fair amount of time and development, fair to say.

But what I'm really most excited about there is the opportunity to actually iterate on that novel mobile experience going forward.

For example, wouldn't it be cool if I was-- maybe I'm in a discussion with some amazing, really smart people, and I've popped a notebook down.

I've opened its native voice recorder, and it's just able to record the conversation for me.

And then I can transform that at later dates and accumulate them and all this kind of stuff.

That's the thing that is probably going to be weird if I open my laptop and push record on my laptop.

But for the mobile device, it's the perfect opportunity.

Totally.

Really cool.

Thank you for sharing.

Yeah, I know.

We're going to close it out with some predictions on AI as a whole.

Please jump in.

Hot takes, welcome.

Let's see.

Let's start with, what are your favorite Google Labs projects that we didn't talk about today?

What are the gems right now that you're most excited about?

The unreleased Google products that we're not allowed to talk about.

Not the unreleased products.

But you guys just announced 50 things.

There has to be others beyond the three we talked about today.

I have one, which is still in this video and image space.

But I think the virtual try-on stuff that we presented, there's a lot of exploration in it.

And I think that one to me is really nice because I think it meets a real direct user need.

It's the strength of Google, obviously, where, you know, we have all the inventory and know how to connect this.

And it's just so fun to just see things on your-- so I'm very excited about that one.

I think this has like a big potential.

That's my favorite as well.

That's so funny.

OK.

I think that was a good one.

Stitch, I think, is really cool to be able to just talk to the product and describe what design you want and have it actually come out with that front end design.

I'd been using it a little in dogfood before it was launched.

And so it's just-- I want to spend more time using it now that it's actually live.

Really cool.

What about you?

Well, mine is going to be Stitch.

So I'm going to have to think about it.

Two votes for Stitch, one vote for shopping.

Yeah, yeah.

I'm with you.

Two votes for shopping.

Yeah, there we go.

There we go.

What-- I guess, what areas do you think will be hottest in the application space for AI broadly in 2025?

Like, I think coding was maybe the breakout application in the last 12 months.

What do you think will be the breakout application in the next 12 months?

Video.

Yeah, I think there's something around this remixable content.

You generate something.

I take your thing.

I just riff off of it.

Yeah.

There's something around this that I think is going to pop up somewhere.

I hope it's us.

But that part feels really interesting.

It feels like Whisk is heading a bit that way.

Veo obviously can power a lot of this in video.

Yeah, I think that's going to be something this year.

As you look back at past predictions of what you thought was going to be interesting in AI, where have you been really right, and where have you been really wrong?

We were just-- Well, let's say where we were really wrong, all together.

Three, two, one.

Timing.

OK, same, Mark.

I think there's been several examples where we definitely felt like we were on to something, and we were on to something.

We were just too early into the space.

And so it's been fun to see projects kind of go on pause, or stop for a little bit.

And then some of them are starting to even come back around again at this point.

And so sometimes we just were a little too early.

But it just gives us a jump start when the models and the capabilities are ready.

Good problem to have.

What do you think you've been really right on and sticking to your convictions on?

I think this, at least for me in my space, the show and tell piece.

This idea that you shouldn't ask users to write two pages of text to describe, for example, an image.

The idea is you should be able to show and tell like you would with a friend or an artist that's working with you.

I think that has stuck in this.

It's kind of moving people away from prompting and towards kind of instructing and relying on the intelligence that lives behind.

So I think that's one.

That's one where I'm sticking to my guns, and I think it's here to stay.

Yeah.

I mean, this is probably obvious at this point.

But when we all started in labs, there was no Google LLM API.

Google didn't have a functional instruction tuned language model or anything like that.

And believe it or not, back then, in fact, I think the general consensus was that these were not really things that are easy to build a business around because of their cost.

And I think one of the things that we've all done actually is we've kind of stuck with the technology.

And now it's obvious.

But in the early days, it certainly was not obvious.

So yeah.

And we got that bit of timing right.

Yeah.

So the conviction was just riding that curve: capabilities up, costs down.

And what will you build assuming that those curves continue?

Yeah, exactly.

In fact, when we joined, one of the traditions inside Labs is "write to think," which George started, actually.

And a lot of the docs that we write were around, well, what happens in two years?

And of course, yeah, that curve is something that I think inspired a lot of us.

Yeah.

Fantastic.

Thank you all so much for joining to share what you're doing across the creative sphere, the computer use sphere, and the-- what do I call the notebook sphere?

The podcast killer slash-- Lucky.

Let's not say podcast killer.

But yeah, we can say knowledge.

Knowledge, creation, transformation space.

It's really, really cool what you all are building.

And you guys have such a cool job getting to kind of cook in the little test kitchen of Google.

And thank you for giving a preview of some of the stuff that's coming down the pipeline.

Thanks for having us.

Very welcome.

Thanks for having us.

Thank you.
