
The Cognitive Revolution · 2025-02-06
Gemini 2.0 Flash GA, Flash Lite & Pro Strategy with Logan Kilpatrick (Google DeepMind)
Hosts: Nathan Labenz
Guests: Logan Kilpatrick
Why it matters
Google DeepMind's Logan Kilpatrick on Gemini 2.
Key claims
- Gemini 2.0 Flash hits GA at $0.10/M input and $0.40/M output tokens; Flash Lite preview preserves 1.5 Flash pricing for cost-sensitive workloads.
- Experimental Gemini 2.0 Pro released; positioned around coding as Google's frontier domain, with pricing likely following prior Pro patterns.
- No Ultra-tier model: Kilpatrick says it's an open research question given how fast smaller models are improving and reasoning's compute trade-offs.
- Gemini app, AI Studio, and the Gemini API are now consolidated inside DeepMind for end-to-end research-to-production ownership.
Episode summary
Logan Kilpatrick joins The Cognitive Revolution to announce the general availability of Gemini 2.0 Flash at $0.10/$0.40 per million input/output tokens, alongside a preview of Flash Lite (a smaller, cheaper variant intended to preserve the 1.5 Flash price point) and an experimental release of Gemini 2.0 Pro. He frames the lineup as Flash Lite, Flash, Flash Reasoning, and Pro, noting Pro is positioned around coding excellence and a roughly Pro-tier pricing pattern, while flagging that an "Ultra"-class model is an open research question given how quickly smaller models are catching up and the compute trade-offs involved.
On product direction, Kilpatrick explains that the Gemini app, AI Studio, and the API have been consolidated into a single DeepMind organization that owns research, modeling, and production end-to-end—accelerating the path from research to developer access. He highlights text-to-app builders (bolt.new, Cursor, lovable, V0) as the most exciting near-term frontier and claims Flash delivers ~40x cost reductions vs. incumbents for those workloads. He also discusses the Multimodal Live API for co-present, real-time interaction (currently session-limited to ~10 minutes), native multimodal output (image/audio in early access), Imagen 3, and the thesis that reasoning is the unlock that will make very long context truly useful.
On the broader ecosystem, Kilpatrick calls for a unified benchmark/leaderboard platform and personal eval tooling (Kaggle is pushing on this), pushes back on the "commoditization" narrative, and says Google will start talking more publicly about reinforcement learning and reasoning work. Fine-tuning for 2.0 Flash is not yet available. Via his solo fund, he flags vision-language startups replacing domain-specific CV, agent-enabling reasoning, evals infrastructure, and "agent-to-website" middleware (attribution, negotiation) as the areas he'd most like to see pitched.
- Gemini 2.0 Flash hits GA at $0.10/M input and $0.40/M output tokens; Flash Lite preview preserves 1.5 Flash pricing for cost-sensitive workloads.
- Experimental Gemini 2.0 Pro released; positioned around coding as Google's frontier domain, with pricing likely following prior Pro patterns.
- No Ultra-tier model: Kilpatrick says it's an open research question given how fast smaller models are improving and reasoning's compute trade-offs.
- Gemini app, AI Studio, and the Gemini API are now consolidated inside DeepMind for end-to-end research-to-production ownership.
- Text-to-app builders (bolt.new, Cursor, lovable, V0) cited as the standout use case, with claimed ~40x cost reduction vs. competitors.
- Multimodal Live API supports real-time co-present interaction; currently session-limited to ~10 minutes while scaling memory/state and cost.
- Reasoning framed as the unlock that makes very long context (potentially "infinite context" per Jeff Dean) actually useful.
- Fine-tuning for 2.0 Flash not yet available; Google plans to expand RL/reasoning storytelling publicly; Kilpatrick highlights vision-language, agents, evals, and agent-website middleware as startup opportunities via his solo fund.
Source material
Transcript
We sort of released the experimental first iteration of Gemini 2.0 Flash back in December.
Today we brought Gemini 2.0 Flash, an updated version of it, into production so that developers can actually continue to build with it.
We announced pricing, 10 cents per million input tokens, 40 cents per million output tokens, which is I think a huge accomplishment for us to pull that off.
We're going to have the world's best coding model at Google, and I still believe this deeply.
And I think like Pro is going to be that model.
And a bunch of the reasoning work that we're doing is going to be that model that continues to push the frontier for us.
The world needs a platform in which it's hosting all of the sort of publicly available benchmarks and sort of leaderboards and stuff like that.
I find it incredibly difficult to just like navigate and get a snapshot of like, how good is this model?
There's like 20 random benchmarks here and 50 random ones here.
They're all split out over the place, and it's just like hard to keep track as a developer.
Logan Kilpatrick from Google DeepMind, product manager of the Gemini API and AI Studio.
Welcome back to The Cognitive Revolution.
Thank you for having me, Nathan.
I'm excited.
I'm hopeful that I'm getting close to the record for the most times on your podcast.
So I appreciate you for all this.
I think this might be setting the record at four if my count is correct.
So, yes, congratulations.
That's rare air and well deserved.
So it's launch day.
We'll get to everything that you've launched and what we should be thinking about building with it.
Quick little detour though, before we get there, you're now part of DeepMind.
So, you know, Google obviously is a vast company and is continuing to, I don't know, align, restructure, streamline itself to focus more and more on AI.
What's the story from the inside on what it's like to be at DeepMind now specifically?
Yeah, I'm super excited about this.
So we've been, you know, I joined Google 10 or 11 months ago, literally from day one.
It's been a deep collaboration with DeepMind.
DeepMind's gone through all these evolutions over the last few years, transitioning from an organization doing fundamental research to sort of actually productionizing models.
And then within the last three months, with the Gemini app moving over and then AI Studio and the Gemini API, it's now actually an organization that end-to-end does the research, creates models, and then actually brings them to products inside of Google.
And I think that's been a shift for them.
But from my personal vantage point, like I think this is the thing that makes the most sense.
Like being really close to research and we already were really close to research through this collaboration we've had, but removing as much friction as possible for us to bring the researchers who like actually know how to bring in many cases, like get the most capabilities out of the models, like bringing those two things together makes a lot of sense and it's going to be a ton of fun.
So as an external person who doesn't care about Google reorgs, which is most of the world, the thing that you'll hopefully see is an acceleration of model progress, but also an acceleration of product progress because we bring these two teams together.
Well, it sure seems from my vantage point on the outside that everything is accelerating and we've had previews of some of the stuff that is now going general availability today over the last few months.
And of course, there's been just one advance after another from DeepMind and others over the last few months.
Looking back a little bit, what would you say are the customer success stories and/or just the coolest apps that you have seen come online that have been built with the Gemini API recently?
Yeah, I think the thing that I'm most excited about and also feels like we have the biggest opportunity here still is around all these like text to app creation softwares and like there's a bunch of examples of these like bolt.new just went live.
I think yesterday with Gemini support, Cursor has Gemini support now and is using 2.0 Flash.
Hopefully, we'll see others like lovable and V0, etc., etc., have that support as well.
If you look at just like the economics of running those products, it's extremely cost intensive, especially as like the number of people who know that you can actually do that use case today of put in a text prompt and like get a basically working app/website for free essentially is like a very small number of people and it feels like that's like this new frontier use case/product paradigm that I think is going to be picked up across like all the big players are going to do this, but I think also there's going to be a ton of startup activity in this like how do you just build domain specific software for people without those people actually having to know how to code.
So I'm really excited about that use case and I think we have a lot more model progress to still do to become the world's best model at doing coding, but I think even for 2.0 Flash and 2.0 Pro, from where we were six months ago, it's just an incredible amount of progress. And I think trying to keep pushing on progress in the context of Flash without increasing the price in any dramatic way has been the biggest win for us. I've seen tweets of the LLM usage cost for some of these startups, and it's on the order of $40,000 or $50,000 a month, and you can imagine what those costs would be with Flash.
It's like probably $1,000 or less or something like that.
It's like a 40x cost reduction, which is just crazy.
So that's one of them that I'm excited about.
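The cost comparison above can be reproduced with simple arithmetic. The following is a minimal sketch, assuming the Flash prices quoted in the episode ($0.10 per million input tokens, $0.40 per million output tokens); the function names and the monthly token volumes are hypothetical and chosen only for illustration.

```python
# Gemini 2.0 Flash prices as quoted in the episode (USD per million tokens).
INPUT_USD_PER_M = 0.10
OUTPUT_USD_PER_M = 0.40


def monthly_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate a month's spend at the quoted Flash prices."""
    return (input_tokens / 1_000_000 * INPUT_USD_PER_M
            + output_tokens / 1_000_000 * OUTPUT_USD_PER_M)


def cost_reduction_factor(old_monthly_usd: float, new_monthly_usd: float) -> float:
    """How many times cheaper the new bill is than the old one."""
    return old_monthly_usd / new_monthly_usd


# Hypothetical workload (not from the episode): 5B input and 1B output tokens per month.
flash_bill = monthly_cost_usd(5_000_000_000, 1_000_000_000)
print(f"Estimated Flash bill: ${flash_bill:,.2f}")

# The rough figures mentioned in the conversation: ~$40,000/month versus ~$1,000/month.
print(f"Reduction factor: {cost_reduction_factor(40_000, 1_000):.0f}x")
```

Real bills depend on the input/output mix, so the ratio for a given product may differ from the rough 40x figure.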
I think we also previewed in December the Multimodal Live API, which allows you to do this sort of collaborative real-time conversational video text interface with the models, and that feels like it is getting us closer to the future that I think we've sort of all been promised with AI, which is this co-presence that's able to see the things that you do and interact with the services you do.
So I'm really interested, and I've gotten the entire spectrum of outreach, from people who are using it to help them do coding to people who aren't developers who are just using this product experience and they're blind and they're actually just, you know, trying to navigate their daily lives using this tool. And it's crazy that the product that we're building sort of as a demo for developers to drive adoption of the API is actually helping people who are just trying to live their daily lives with this product.
So I think it speaks to where we are in the adoption of this technology, which is like everyone is sort of waking up and like trying to find the best way to use these tools, which is really interesting to see happen.
Let's put a pin in the lovable and bolt discussion, because actually you've teed me up perfectly: I think immediately after this episode, the next two episodes are going to be with the founders of those two companies.
I love that.
We're calling it Software Supernova because it does seem we're kind of mid-transition, mid-tipping point right now.
I feel like where it is becoming like pretty realistic for people that don't know how to code to create at least basic full stack applications.
Obviously, they're not going to create you know, enterprise platforms just yet, but you know, the models continue to advance.
So it feels like we're in a very different world quite soon with products and paradigms like that.
In terms of living my daily life, I have tried both the AI Studio version, which is the desktop experience where I can share my screen with the multimodal Gemini and I've also been using on the mobile and I don't know why but for whatever reason, OpenAI only has that experience enabled on the mobile app as far as I know.
So I've been trying both of them.
This is too stupid, but it's also too good of an example.
So I got my kids the Nintendo switch for the holidays and we're going through historical video games, right?
Because I feel like they're young and the modern games are like I feel like too over stimulating and plus like let's burn through the old catalog first.
We'll work our way up.
So we're doing you know, original Nintendo games Nintendo 64.
So we're playing Mario 64 for the original Nintendo 64, which is like this open world game where you like go around and hunt stars and whatever.
But the frustration for me is I often don't know what to do.
So I've been sitting there with the advanced voice mode on and sometimes, you know, also showing it the screen and just telling it what level I'm in and having it tell me what to do in the game.
Like what is my objective?
You know, where should I go to find the star? And my kids are really getting used to this, and even my one-year-old, you know, sometimes now comes and wants to take the phone and he's like, try to talk to AI.
So it is, I think, and for seniors too, you know, my grandmother, I think about all the time this switch from you type to it and it gives you, you know, text back, to it really can be co-present with you everywhere.
I feel like I've only still you know, only dipped a toe in that world.
But man, does it feel like a very different world for people that are not anchored to their desk all day, you know, doing computer work.
It's like, wow, that could really just be... imagine putting it in glasses too.
I'm sure you know from the company that made Google Glass.
I'm sure there's a lot of thought going into that sort of thing.
So I can yeah, that's my model.
Yeah, I feel like and my guess is we're actually going to see a lot of this, you know, I think if folks have been watching this closely, like, you know, it takes time for this progression to actually happen.
I think like text was the best example of this.
Like it was kind of a toy demo a year and a half ago, and now at large scale, text LLM applications are broadly being used throughout the world at billion-user scale.
I think this co presence like there's it's going to take us like to be sort of humble of the place that we're in today, like we're going to have to iterate on the API, we're going to have to bring the cost down, we're going to have to make it so that it's like actually something that developers can build because you can imagine the like co presence cost of having AI with you all the time is like probably pretty expensive.
Like there's a reason that we limit the sessions to only 10 minutes right now, because there's a whole lot of challenges to scale this up beyond that and have it maintain memory and state and context of all the things that you were just talking to it about.
But to me, it's like very clear that that's the direction that we're going to go in and I think it'll be interesting.
It does feel like everyone, you know, again, like two years ago was talking about like, oh, everything's a wrapper.
There's not a lot of value to be created and it just feels like that continues to be wrong.
Like it feels like there's a new thing and then like all of a sudden like all of these new things that were not possible before can now be created which continues to just get me excited about like the future and also like making sure that we're enabling those next things so that people show up and create the experiences that don't yet exist today.
So it's going to be fun.
I'm excited and hopefully send over feedback as you keep using the real-time mode.
Cool.
Yeah, we'll do.
Just taking one more beat on the cool stuff with the Gemini API.
Last time we talked about a couple things I wanted to check in on the status of.
One was really long context.
Another is just insane affordability as you mentioned like a huge drop compared to other options.
And then there's a sort of unclear exactly how sort of natively the video is being consumed but you can just like you know feed video audio multimodal inputs into the Gemini API as well.
What would you say are kind of the status of those things right now?
Are you seeing use cases where people are like 200,000 tokens is not enough?
I really need to provide more and therefore like Gemini is the one thing I can use or you know cost wise this just wouldn't be affordable if I didn't have this 10 cent per million input token pricing.
Yeah, this is a great question.
I had a long conversation with Jack Rae yesterday, who's one of the co-leads for the reasoning efforts, and he previously worked on sort of the long context breakthroughs with Gemini and enabling that from a research perspective, and is now one of the co-leads with Noam on the reasoning models.
And we sort of opined for a long time about how it is funny that maybe the real unlock for long context ends up being reasoning, because long context is extremely impressive and it's really useful and we do see people using it in production, and I do think there's cases where 200,000 makes a lot of difference versus a million or two million.
But like one of the inherent challenges is just like how many things can the model attend to in the context window?
And like it works really really well if you're just asking questions about a couple of things that are in the context window but if you're trying to put together a thousand different things that are in the two million context window it gets really hard to do that just because the inherent nature of how the models are trained and set up.
I think reasoning is maybe where this starts to change.
So I think like us having this really long context window where the models can actually just like think through and perhaps in the future like use tools and like bring information in and out of the context window I think is where long context is going to start to make a lot more sense and like truly become like an enabler.
That's where my head's at from a long context point of view.
I think video stuff, yeah, video and audio are still happening natively, image still happening natively inside of the models. With the 2.0 release we showcased some of the next steps of this native multimodality, which is the models actually being able to output images and audio, and that's available to early testers.
I don't know if you're in the early access program or not Nathan if you've played around with it yourself.
I have.
It's pretty good.
I think there's more quality work that needs to happen still. We're about to, actually, at the time of this recording, tomorrow we'll roll out Imagen 3 in the Gemini API, which I'm super excited about. And if you sort of look at why we still have state-of-the-art image generation models when we know that the models can sort of natively have this capability themselves, there's definitely this quality trade-off in some domains where you trade off quality for world knowledge. And I think the world knowledge case is actually where this is really interesting: there's a ton of models out there that generate really, really pretty pictures and can do all types of cool stuff, but they lack this world knowledge that the Gemini models have because it's this native capability that's coming as part of the training process.
So I think there's going to be this whole new onslaught of use cases that we'll see happen with native image generation, which didn't work before because the models weren't smart; they were just good at generating images.
So hopefully we'll get that out soon as we continue to hill climb on quality.
Cool.
Yeah, that's interesting.
I think your point about the need to have long chain of thought in order to really fully take advantage of super long context is quite interesting.
I have been noticing for myself in like application development.
I do find like I want the hardest thought right so I'll go to the model that's going to think for me the longest and then I sometimes have to like contort myself or contort my inputs to get it to fit into the context but I do feel like yeah, I wouldn't 10x dump context into a model that is going to like immediately jump to the answer and I hadn't really put together why that might be but I think that's a pretty interesting hypothesis that I could imagine.
I look forward to dumping my full million token code bases into the flash reasoning sooner rather than later.
Yeah, I'd love to see if you have use cases that haven't worked well historically for long context.
I'm super curious if like also folks in the audience have.
Yeah, like you can do the compare mode right in AI Studio.
Try it with like 2.0 Pro with long context and try it with reasoning with long context and see whether or not like the extra reasoning steps actually make a difference and my intuition is that it is going to which is exciting.
Yeah, cool.
Okay.
I like that.
Have you seen anything passive?
This is another area where I feel like flash, you know, very, very minimally famously said more people should try to spend a dollar a day on flash and you know, that is a lot of tokens.
And so it's kind of like you need passive applications to for most people to get there.
So I've been thinking about a couple people have asked me about could I use a vision language model to like monitor my factory floor for, you know, safety incidents or you know, policy violations or my grandmother lives in a senior living community and you know, they seniors like hate to wear their fall monitors.
So that's a constant battle and they probably a lot of them would, you know, to the degree that they have and my grandmother does have the ability to like make her own decisions on this probably would, you know, accept like not having to wear that thing if you know there was another sort of visual monitor that could, you know, keep track of her state at any given time.
Have you seen anything like that where people are just kind of truly passively almost Internet of Things like sending signals into the API?
This is a great question.
I think it is so core to my thesis of like what's going to happen for a lot of these domain specific because like and if you take a step back like how would people solve that problem today like you would need to go and buy some custom software that does that, which probably is expensive.
It might not generalize well like in my past life.
I was a machine learning engineer and we did a bunch of stuff with security cameras and like what it would look like.
How would you track someone moving from one frame where a camera is visible to another frame and how to keep object permanence of that person and like it's incredibly hard like it is not an easy problem to solve with traditional computer vision technologies and I think vision language models like just do this task incredibly well and the cost basis is now with flash like so low.
I haven't talked to anyone who actually has this in production, but I have to imagine that this is the opportunity that people are going after. And it's actually not just the bounding box use case and image understanding use case, which I think is really, really powerful, like being able to know here's where an object is. We have a good demo of this in AI Studio if folks haven't tried the bounding box capabilities.
If you go to starter apps, there's, I think it's spatial understanding or maybe it's just called bounding boxes.
I forgot what the name of it is, but there's an example in there and you can put in images and ask it, you know, to identify the objects and it'll draw bounding boxes pretty much just like you would get out of one of those custom bounding box models that you could probably find in open source or something like that.
I think these use cases just take time, but I also think my guess is as vision becomes more and more prominent like we're going to see the whole like YC startup wave go after all of these ecosystems and industries where they're using domain specific vision models and not using like a general purpose model and your cost basis is just going to be wildly different.
And also you unlock all these use cases which those models are just like not actually capable of doing.
They're very very rigid and like can't, you know, be fault tolerant in a lot of those cases so I'm super excited about this.
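For readers who want to use bounding boxes returned by a vision language model, here is a minimal sketch of the post-processing step. It assumes the model returns each box as `[ymin, xmin, ymax, xmax]` normalized to a 0 to 1000 scale; that layout is an assumption not stated in the episode, so check it against the current API documentation. The helper name `to_pixel_box` is hypothetical.

```python
from typing import Sequence, Tuple


def to_pixel_box(
    box: Sequence[int],
    image_width: int,
    image_height: int,
    scale: int = 1000,
) -> Tuple[int, int, int, int]:
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel (left, top, right, bottom).

    Assumes coordinates are normalized to 0..scale (assumed to be 1000 here).
    """
    ymin, xmin, ymax, xmax = box
    left = round(xmin * image_width / scale)
    top = round(ymin * image_height / scale)
    right = round(xmax * image_width / scale)
    bottom = round(ymax * image_height / scale)
    return (left, top, right, bottom)


# Hypothetical box on a 1920x1080 camera frame.
print(to_pixel_box([250, 100, 750, 500], image_width=1920, image_height=1080))
```

The resulting tuple uses the (left, top, right, bottom) order that common imaging libraries expect for cropping or drawing rectangles.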
So let's get into then what you're launching today.
I saw an interesting tweet; you mentioned YC, and the idea was basically that every time the frontier advances, all the YC companies go and see, like, can this work for my use case, and the report was every time a new model comes out, some subset of the current YC batch companies' products start to work.
Others are kind of just like waiting, you know, and continuing to build all the other stuff with the expectation that they're going to get a model that's going to like, you know, tip them from kind of not quite working to working.
So what are you launching and you know, to the degree you can speculate like what is it going to make work that wasn't previously working?
Yeah, it's a great question.
I think one of the and just to draw a broader point, I think one of the really interesting observations I've had is thinking about like because of how much excitement there is about AI, how the like resource constraint has not made people think as deeply as they need to about this problem.
And I think like maybe this is me somewhat, you know, just because of like Gemini models like being at the frontier of cost per intelligence if you look at that as a ratio, but it is really like the YC companies are just like so well funded that like we bring the cost of intelligence down some reasonable factor.
And like, it actually in a lot of cases, like doesn't move the needle for the startups, like just because they have millions of dollars.
And I think it's interesting to see what the outcome of that trend is going to be.
And like if I had to guess, actually, I think it gives a lot of power to these individual developers who don't have this like large amount of, you know, financial backing from like tier one VCs is an example that can actually push the frontier of some of these use cases and capabilities, which I think is like a really interesting and cool feature.
Interesting and cool phenomenon.
But to answer your question specifically about what we're launching.
It's a whole sort of suite of Gemini 2.0 models.
So we sort of released the experimental first iteration of Gemini 2.0 Flash back in December.
Today we brought Gemini 2.0 Flash, an updated version of it, into production so that developers can actually continue to build with it.
And then we announced pricing, 10 cents per million input tokens, 40 cents per million output tokens, which is I think a huge accomplishment for us to pull that off, and then announced a preview of Flash Lite, which is the smaller variant of the Flash model, which we intend to make available for production use very soon, and the pricing for that.
And then we released the sort of experimental variant of 2.0 Pro, which is the most capable frontier model we have, rounding out the full offering with the Flash reasoning model.
So now we have the sort of reasoning Flash model.
We have Flash Lite, the smallest model; Flash, which is sort of the most, you know, performant cost trade-off wise; and then Pro, which is the most capable model.
And then the... well wait, so let's go down the availability too. In my typical work, which is usually focused on proof-of-concept type stuff,
I honestly love my life these days: I never have to really worry about, you know, the harder work of making something actually production ready, and much more get to focus on that easier, faster-ascent portion of making the proof of concept work.
So, when I go to the AI studio and I, you know, grab code and go do stuff with it.
With one exception which I'll mention maybe in a minute.
I basically never hit rate limits.
So even when something still in kind of preview.
For my purposes like the limit is there's enough headroom there for me to like do usually all the testing that I want to do.
But if I'm Bolt or Lovable or certainly, you know, Cursor or whatever, then they would be hitting those limits.
So, what is like super scalable now versus is still in kind of experimental access and just give us like the concrete on how much of these different models we can use.
Yeah, so 2.0 Flash during experimental... I think the AI Studio UI has some limits; we don't publish the limits because they can change dynamically based on how much capacity we have. We don't usually change them, but we sort of try not to publish them just in case we need to and don't want to make people sad.
But in the API, I think it's 10 or 15 requests per minute on the free tier and then I think like 4 million tokens per minute, which is a lot of tokens per minute to be honest with you, so I think that's probably why a lot of people don't hit it.
So unless you have users or are trying to run... a bunch of people have internal evals and things, or people who are running leaderboards as another example, those are usually the use cases where people reach out and are getting rate limited.
So with production availability now, the free tier stuff all stays the same: you can keep using the model, keep tinkering, doing all that.
If you're on the paid tier, there's no daily RPD limit, so you can send as many requests as you can.
So I think it defaults to 2,000 requests per minute and still stays at 4 million tokens per minute, and then tonight at midnight we're rolling out new quota tiers, so as you, you know, continue to scale usage you can unlock things like 10 million tokens per minute and 10,000 requests per minute as well, to help those who need to keep scaling up.
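A client that wants to stay under per-minute quotas like these can keep its own budget before sending requests. The following is a minimal sketch of a sliding-window limiter, assuming the paid-tier defaults mentioned above (2,000 requests and 4 million tokens per minute, both stated tentatively in the conversation). The class name `MinuteBudget` and its methods are hypothetical and not part of any Google SDK.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class MinuteBudget:
    """Client-side sliding-window budget for requests and tokens per minute."""

    def __init__(
        self,
        max_requests: int = 2_000,      # assumed default from the episode
        max_tokens: int = 4_000_000,    # assumed default from the episode
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_s = window_s
        self.clock = clock
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self._tokens_in_window = 0

    def _evict_expired(self, now: float) -> None:
        # Drop requests that have aged out of the window and release their tokens.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Record a request of `tokens` size if it fits in the current window."""
        now = self.clock()
        self._evict_expired(now)
        if len(self._events) + 1 > self.max_requests:
            return False
        if self._tokens_in_window + tokens > self.max_tokens:
            return False
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True


budget = MinuteBudget()
if budget.try_acquire(tokens=12_000):
    print("ok to send request")
else:
    print("over budget, wait and retry")
```

The server remains the source of truth for quotas; a local budget like this only reduces how often a client runs into rate-limit errors.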
The infrastructure behind that is truly an incredible accomplishment.
Yeah, lots of TPUs to make all this stuff happen. I think for people who aren't at LLM companies, you don't think a lot about this, but there's just a lot of complexity in how many models there are. You know, I see a lot of memes online about the bad naming conventions that we have with our models and stuff like that, which I appreciate, but I think it actually hits at a different point, which is there's just so many different model variants.
And it's really difficult, and this is why a lot of the feedback about our experimental model release train has been like, hey, we love these models, let us use them in production. And the challenge is we have to be really picky about what model we use in production, just because of the compute footprint that it takes to actually make it so that, you know, the Cursors, the Bolts, the other YC startups, the developers trying to scale and build companies can get the compute they need. It's hard, and it takes a lot of compute.
So we have to be a little bit more intentional about how we do it.
Ideally, we would just GA every model and like everyone would take every model of the production and we wouldn't need to worry about it.
But yeah, there's just a lot of constraints with with doing that.
And then, hey, we'll continue our interview in a moment after a word from our sponsors.
Yeah, if you're going to support millions of tokens a minute, I can imagine why you'd need to be prepared for it.
So, maybe help me develop my intuition for how I should think about Flash Lite as it relates to Flash. My totally candid initial reaction was: Flash is pretty cheap and pretty fast. Flash Lite comes in 25% cheaper and presumably faster, but also slightly weaker. You've got the table, of course, of all the benchmarks, and it's a little less on most of them.
Have you been getting demand for an even cheaper, faster model than Flash? That's hard for me to wrap my head around, to be honest.
So I think the positioning is twofold. One, technically the 2.0 Flash price is by default more expensive than the 1.5 Flash price.
And especially given how much we had leaned into the low cost per intelligence of the models, we really wanted to give people a one-to-one option. If seven and a half cents per million tokens was the thing that was enabling your business, and we showed up and said, hey, by the way, now it's 10 cents, that just didn't feel like a great story for developers given how much we leaned into that narrative. So we wanted an option that was not only a better model but also the exact same cost as you were getting before. For the 2.0 Flash model, because of a bunch of constraints, it just wasn't going to be possible for us to keep that same price.
So it's really about that: wanting to make sure we didn't mislead customers into thinking they'd be able to keep pushing on whatever cost-versus-intelligence Pareto frontier they care about.
And yeah, there are also some other things. As we think about how to make the cost lower for these models, there's a bunch of features that aren't supported, more of the high-end things. Those Flash Lite models, for example, will never be able to do native image generation or native audio generation. There are a lot of things we can do to keep the cost down when serving those models at scale, which the default 2.0 Flash version doesn't have.
It's also kind of similar to, I don't remember when we last talked about this, but we also put out that Flash 8B model, the 8 billion parameter one.
We're not releasing the size of the Flash Lite model publicly, but you can think about it as another version of the small model train that we'd previously done with Flash and with Flash 8B.
Flash 8B was, if folks have looked at OpenRouter before, the highest token volume model on OpenRouter, which again is a reasonable proxy for model usage in some contexts.
The clear feedback to us was that developers love low-cost models, and there's a huge number of new use cases you can unlock by continuing to bring the cost down. So if we could have made an even cheaper model, I would have pushed for that as well.
I would love to hear from anybody listening who fits that description, where a 7.5 to 10 cent price change per million tokens would have made a meaningful difference for what you're trying to do in the world.
Reach out to me, I want to hear that.
I find that pretty hard to imagine. I can easily imagine how people might just look at a menu and say, I'll take the cheapest one, because I'm just processing whatever, extracting addresses out of some stream of data. So I can imagine choosing the cheapest by default, but I have a hard time imagining how a business model gets disrupted by that kind of change. But I'd welcome that.
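To put the 7.5 versus 10 cent question in concrete terms, here is a small sketch of the arithmetic. It uses only the per-million input-token prices mentioned in the conversation ($0.075 and $0.10); output-token pricing is left out, and the monthly volumes in the loop are made-up illustrations, not real customer figures.

```python
FLASH_LITE_INPUT_PER_M = 0.075  # USD per million input tokens, as quoted in the conversation
FLASH_INPUT_PER_M = 0.10        # USD per million input tokens, as quoted in the conversation


def input_cost(tokens, price_per_million):
    """Cost in USD for a number of input tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


def monthly_difference(tokens_per_month):
    """Extra monthly spend on input tokens when paying the higher price."""
    return (input_cost(tokens_per_month, FLASH_INPUT_PER_M)
            - input_cost(tokens_per_month, FLASH_LITE_INPUT_PER_M))


def relative_discount():
    """Fraction by which the lower input price undercuts the higher one."""
    return 1 - FLASH_LITE_INPUT_PER_M / FLASH_INPUT_PER_M


if __name__ == "__main__":
    # Hypothetical monthly input volumes, from hobby scale to very heavy usage.
    for volume in (10_000_000, 1_000_000_000, 100_000_000_000):
        print(f"{volume:>15,} tokens/month: +${monthly_difference(volume):,.2f}")
```

The relative gap matches the "25% cheaper" figure from the question, while the absolute gap only becomes large at very high token volumes, which is the intuition being debated here.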
I think you're right about this.
I think a lot of this is just about how we tell the story to the world of how we show up. It'd be easy for that to become the narrative we don't want, especially given how much we've pushed on reducing the cost for developers: Google's raising the price for developers. That's not the narrative we want, so it was really important to us to preserve continuity and make sure we have that low price point available for developers who care about it. But generally I agree with you, especially given the other signal we've gotten. The story of a lot of the models in the ecosystem is that if you have great models, people will pay for them. Price is actually not the limiter in a lot of cases, for sure.
They're all cheap compared to human labor. That's, I think, such a striking two-by-four to the forehead, and it often gets glossed over.
Let's do the Pro side.
So how should I think about Pro? You can compare it to Flash, you can compare it to other frontier models that are out there.
But how should I understand it in the increasingly busy constellation of available models?
Yeah, I think Pro is where you're not bound by cost in a lot of ways. We didn't release the price of the Pro model yet, because it's still experimental, but it's probably going to roughly follow the pattern of past Pro models, which is just a lot more expensive.
If you look at the traditional advice for developers building frontier applications, it's: go with the model that's best, even if it's at a premium, make your use case work, and then figure out a way to bring the cost down over time by switching to a smaller model, optimizing, doing fine tuning, whatever it is.
So I think it's important for us to continue to honor that flow, which I think actually works. This is the default experimentation path that developers go on today.
Specifically, the use case where we're seeing the best performance relative to the other domains is in coding. I had a glib tweet a while ago, I don't even remember when it was, I think the day before o3 got announced, about how we're going to have the world's best coding model at Google, and I still believe this deeply.
And I think Pro is going to be that model, and a bunch of the reasoning work that we're doing is going to continue to push the frontier for us in coding. It's a domain we need to win, especially if you believe in the text-to-app creation trend we were talking about before continuing, and in the acceleration of developers continuing from an internal software engineering productivity standpoint.
So I think that's probably the best use case for it.
It still has a 2 million token context window; maybe it'll have longer context in the future.
So that might be something as well.
Yeah, we've heard whispers of infinite context.
You and I were sitting in that room together, I'm pretty sure, when Jeff Dean said we were getting infinite context at some point in the future.
He didn't put a date on it, at I/O last year, in that session we were in together. So yeah, I think that's going to be a huge unlock for folks once we land it.
What would you guide people to think about the non-release of Ultra? It was Ultra, right, that was the largest scale? We've seen the same thing from Anthropic, where it's like, wait a second, what happened to Opus?
Is there anything you can share to help people understand why we seem to have gone from small, medium, and large to now just small and medium?
That's a good question.
So, the historical context on Ultra was basically trying to prove out the research direction, that this scaling was going to continue.
The reality is they proved with Gemini 1.0, when the original Ultra model candidate came out, that that was the case. But then there was all of this continual rapid innovation: all of a sudden the Pro model was better, and now I'm pretty sure Flash Lite is better than the original Ultra model. So it becomes a question of what cost and infrastructure trade-offs to keep in mind. You could imagine a world where we have an Ultra model that's 5% better on every benchmark, five times larger, and costs five times more than Pro. You start to do the math and think about where, from a research perspective, it makes sense to spend our time and energy, and where it makes sense from an infrastructure footprint perspective. We continue to see gains in making models much higher quality at the same size, or even at a fraction of the size of previous models. And now, especially with reasoning, there are even more question marks in my mind. If we could get a model that was even 10% better on every benchmark, does that make sense given a world where we're not going to keep scaling pre-training as much, and there's just a lot of low-hanging-fruit work that we can do there?
I think back to the conversation I had with Jack.
Whether or not we officially release an Ultra model is, I think, an open technical research question, just because of all of those constraints.
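The "do the math" thought experiment can be written out directly. This sketch uses the hypothetical figures from the conversation (a model 5% better on benchmarks at five times the cost of Pro); the baseline score of 100 and cost of 1.0 are arbitrary normalizations chosen for illustration, not real benchmark or pricing data.

```python
def score_per_cost(score, cost):
    """Benchmark score obtained per unit of serving cost."""
    return score / cost


# Normalized, illustrative numbers only.
pro = {"score": 100.0, "cost": 1.0}
ultra = {"score": pro["score"] * 1.05, "cost": pro["cost"] * 5.0}

pro_efficiency = score_per_cost(**pro)
ultra_efficiency = score_per_cost(**ultra)

# Ratio of the two efficiencies: 1.05 / 5, i.e. about 0.21 under these assumptions.
ratio = ultra_efficiency / pro_efficiency
```

Under these assumptions the larger model delivers roughly a fifth of the score per unit of cost. That is only one lens on the trade-off; it does not capture cases where the last few points of quality are worth a large premium.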
And it's also kind of tongue in cheek, but we could rename all the models and say that we have the Ultra model if we wanted to. Maybe that would have been the right thing to do historically: make Pro Ultra and then remap all the other names. Trying to honor the essence of what was intended with that Ultra model is, I think, part of the thinking.
Yeah, I'm on the side of anything that maintains any clarity in naming schemes, so I'm with the consistency. That's the only reason I even know to ask this question, right? If it was all renamed I would be even more confused than I am.
So, okay, let's go back to coding, and think about the likes of Bolt and Lovable and Cursor and Devin.
There are many, many options. I can't even keep up with all my AI coding paradigms these days.
And they probably can't keep up with all the models.
So the question I have for you is: okay, we've got all these benchmarks, and the new model versus the old one probably wins on some benchmarks and loses on others, whatever.
The coding thread that I wouldn't say is dominant, but has definitely had some weight behind it over the last month or two, has been:
Yeah, o1 is great.
o3-mini is great, potentially Gemini Pro is great, but for some reason people still seem to think Claude 3.5 Sonnet (new) is the best coder and the best coding assistant. In fairness, they haven't had time to compare the newest releases against it.
But how would you suggest people think about this? The simplest thing to do would be swapping a new model into exactly the same setup they currently have, where some other model is performing best. But you can very easily say, well, you're probably leaving some performance on the table, because these two models are probably not going to have maximum performance under the exact same conditions. They're each going to have different conditions that elicit the best from them.
But how do you find those conditions, and how do you know how much time to invest in that? It seems very difficult, and it's difficult even for me as an obsessive hobbyist.
What's the best practice, or how should one think about absorbing a new Gemini Pro and comparing it to a Claude 3.5 Sonnet and an o3-mini in today's world?
Yeah, this is such a tough problem space, and I just have an incredible amount of empathy for developers and founders and people who are building stuff, because there is no silver bullet.
So, I'll give a couple of reactions to this general thread. First, my own personal example, which I think underscores the second point that I'll make.
I was doing the normal web developer thing the other day of trying to get the corners of a table rounded, and I was smacking my head on this problem because of a bunch of weird constraints in the environment I was working in. I was using the Gemini models, and at one point I got fed up and thought, you know what, maybe the Gemini models just aren't that good, I'm going to go try Claude and see what it does. I left AI Studio, went into Cursor, tried it on this very simple prompt, and it worked single shot. Everything just worked.
And I started to ping a bunch of people, frustrated, and I was like, we need to keep making coding better, yada yada yada.
And then, just for my own sanity, I went and reran the exact same prompt with the Gemini models, and it also worked.
And I think this was a good example: I was just doing a bad job prompting. I pulled myself out of the environment I was in, this iterative loop with the model, started over from scratch, formatted the question and the context in a different way, and it worked with both models.
And I think this point underscores the vibes-based, incredibly unstructured, not very scientific way in which people make these decisions today.
I won't name names of products, but I work super closely with lots of teams who have LLMs in production, and you would be surprised, insert your favorite LLM product, in a lot of cases how few evals people actually have to understand what metrics matter for them as they build their product and service.
And I think there are two sides to this coin. One, the world needs a platform that hosts all of the publicly available benchmarks and leaderboards and stuff like that. I find it incredibly difficult to navigate and get a snapshot of how good a model is. There are 20 random benchmarks here and 50 random ones there, and I've got to go look at the Minecraft one, because that benchmark's really cool, I love that benchmark, but they're all split out all over the place and it's hard to keep track of as a developer. To your point, this is my job, this is what I spend every waking moment of my life doing, and it's still difficult to keep track of. You can imagine what it's like for people who have much less time and are doing other things.
It's just hard to keep up with, so I think someone needs to build this platform. Maybe you and I should do the Y Combinator call-for-startups thing, and that is the call for startups: build this platform to help people bring all this stuff together.
And then separately, the folks at Kaggle are actually pushing on this notion of personal evals: building a platform where, as new models are released and made available to the world, your personal evals just run behind the scenes on those new models, and you get an email that says, hey, based on what you've told us, this model might be one you should spend your time checking out, because it's really good at the things you indicated you care a lot about. I think that type of platform and product experience, taking the burden off of developers, is going to be awesome.
The challenge in that case is you still have to make your personal eval to begin with, but I think that one-time cost is a lot less than paying the spin-up cost every time a new model hits the market.
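The personal-evals idea can be sketched in a few lines. This is not Kaggle's platform or any real product API: the `EvalCase` type, the `run_personal_evals` and `models_worth_a_look` functions, and the stand-in model callables are all hypothetical, and a real setup would call actual model endpoints instead of the toy functions used here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the model output is acceptable


def run_personal_evals(
    models: Dict[str, Callable[[str], str]],
    cases: List[EvalCase],
) -> Dict[str, float]:
    """Run every case against every model and return the pass rate per model."""
    results = {}
    for name, generate in models.items():
        passed = sum(1 for case in cases if case.check(generate(case.prompt)))
        results[name] = passed / len(cases) if cases else 0.0
    return results


def models_worth_a_look(results: Dict[str, float], baseline: str) -> List[str]:
    """Names of models that beat the pass rate of the model currently in use."""
    return sorted(n for n, rate in results.items() if rate > results[baseline])


# Toy stand-ins for real model calls (assumptions for illustration only).
def current_model(prompt: str) -> str:
    return prompt.upper()


def new_model(prompt: str) -> str:
    return prompt.upper() + "!"


cases = [
    EvalCase("hello", lambda out: out.startswith("HELLO")),
    EvalCase("round the table corners", lambda out: out.endswith("!")),
]
results = run_personal_evals({"current": current_model, "new": new_model}, cases)
```

The one-time cost mentioned here is writing the `cases` list. Once it exists, rerunning it against a newly released model is a single function call, and `models_worth_a_look` is the part a notification could be built on.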
Yeah, it almost seems like if I were going to take one practical rec out of all that for developers, I might say let your users choose, because at least that way you can get some data, they can feel a little more agency, and maybe there's something good to find there.
I think the personal benchmarks too, but the things I care about typically don't have a right answer. At Waymark it's always the same challenge: what makes a good video?
We can have a language model judge, but now we're in hall-of-mirrors territory. I'll cop to it: we don't really have evals that work for us. We can detect outright failure to follow the structure, or that it's too long.
Some guardrails are the clear ones, I call them the thou-shalt-nots of the task. We can detect those, but past that it is really still vibes.
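The "thou shalt not" checks described here (outright structural failure, output that is too long) are the kind of thing that can be tested mechanically. A minimal sketch, assuming the model is asked to return a JSON object with specific fields; the field names and the word limit are invented for illustration and are not Waymark's actual schema.

```python
import json

REQUIRED_FIELDS = ("title", "script")  # hypothetical output schema
MAX_SCRIPT_WORDS = 120                 # hypothetical length limit


def guardrail_violations(raw_output):
    """Return a list of hard-rule violations; an empty list means none were found."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    violations = []
    for field in REQUIRED_FIELDS:
        # A field must be present, a string, and not blank.
        if not isinstance(data.get(field), str) or not data[field].strip():
            violations.append(f"missing or empty field: {field}")
    script = data.get("script")
    if isinstance(script, str) and len(script.split()) > MAX_SCRIPT_WORDS:
        violations.append("script is too long")
    return violations
```

Passing these checks only rules out clear failures. Whether the output is actually good remains the judgment call discussed in the conversation.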
And yeah, it's interesting, we don't actually let our users choose, though that could be interesting. Our users also don't even know the models, they're not in this world, so the developer use case is more what I had in mind when I said let your users choose.
I think you hit the nail on the head with that trend. I don't even know how much of this is a conscious decision that founders are making, but if you look across, again, choose your favorite LLM products, with the exception of Waymark, developers and actually end users have this choice today. You go in, there's a model drop-down in most of those products, and a lot of the ones not created by large model providers have most if not all of the models available. A timely example is Copilot. Copilot was historically just powered by GPT models two years ago, and given where the developer community and the world have moved, it now has a model drop-down and you can choose the Gemini models or the OpenAI models. I think more and more products are going to go that route.
This is maybe a corollary to a point that a lot of people make: people talk all the time about commoditization, or the contraction of the delta between these models. I think that's actually not true. I think there's a lot of weird nuance that is going to continue to sprawl out over time, where you'll get substantially different answers from, insert your favorite model provider, even if the capability is similar on paper. There's still all of this stuff that will be different.
And I think it'll be important that people continue to have the choice, because that subtle difference makes a lot of impact on the end product experience people are going to have.
Yeah, no doubt about that. Even DeepSeek over the last couple of weeks has, I think, shown a very different profile. Perhaps not surprisingly given its source, closest to a base model is probably my best description of it, and for that reason it can sometimes write in ways that are extremely compelling, but it's also much less behaviorally refined than the models from the Western, at least top-tier, model providers.
And that's all with very similar benchmark results, right? There's a lot beneath the surface that remains to be unpacked with any major new model release.
Let's talk for a minute. I know you've got to go before too long.
The last things I wanted to cover were, briefly, fine tuning, because as a developer that's something I'm always interested in, and I'd love to know the status there.
Then, what's up with reinforcement learning at Google? We've seen the thinking model.
It struck me that there wasn't much actual mention of reinforcement learning, whereas the other developers are coming out and saying, we did a reinforcement learning model.
So DeepMind has kind of positioned it differently, although I assume that a lot of the same techniques are happening under the hood, but I don't know.
And then I was actually going to ask for your call for startups too, because I know that you have recently raised a solo venture fund, and we'll see if we can get you some deal flow.
So, yeah: fine tuning, reinforcement learning, your fund.
Take as long as you have.
I continue to be incredibly bullish on fine tuning. I think of a future where everyone in the world is using their own fine-tuned model, a version of the model that has the context they need it to have, without that context being overly impactful on the priors of how the model makes decisions. That's the rough explanation of how I think about fine tuning. We don't have it yet for 2.0 Flash. I think we need to. We've been having a lot of internal discussions about what size of investment we want to make in fine tuning.
To me personally, I think this is one of the biggest opportunities. To your point about developers wanting this, I completely agree.
So we'll keep pushing on it. It's not available yet; hopefully it will be available soon. And beyond that, there are a lot of limitations on how we do fine tuning with 1.5 Flash today: you can't do images, and there's a bunch of rough edges. We need to solve all those things and make it a really first-class experience.
Your second question, reinforcement learning: yeah, RL is part of making those reasoning models.
Part of why we historically haven't talked a lot about it is that we took the normal low-key approach to the release, which is what we've historically done for our experimental models. As the world gets more and more excited about what's happening with reasoning models, we're going to start talking a lot more about that work.
And yeah, I'm excited for us to tell more of the story about the work that we're doing. The reasoning narrative inside of Google is one of the narratives that gets me most excited about the direction we're going in.
There's been so much progress and a ton of breakthroughs, so hopefully soon we'll tell that story to the world, and we'll get you some people on the podcast to talk about it.
And then yeah, startups. I still think this is the moment to build impactful, interesting companies.
Vision is one of those things that I'm really excited about. Take the entire ecosystem that's built on domain-specific computer vision models: I think all of that is up for grabs with vision language models, and there's going to be a huge amount of startup activity in that space.
I think reasoning is going to make agents work, which I'm excited about. I'm not an agent investor in a lot of ways, but there are just so many companies trying to tackle problems with agents, and a lot of them just don't work today. I think that's actually going to be the biggest unlock. We were talking before about vision models enabling some percentage of Y Combinator startups to actually work all of a sudden.
I think reasoning is going to be that continual breakthrough that, more than anything else that happens in the next two years, is going to make those companies' products actually work, which is really exciting for them and exciting for the world as we figure these things out.
I think there's a whole class of startups that don't yet exist, and I think it's really exciting for them to be able to do that.
All of this very basic experience of how we engage with the internet is going to change over the next few years.
I think there are a lot of people who have businesses and websites and companies who aren't going to be able to solve that problem themselves.
They're going to need somebody else to solve that problem for them.
I think there are some interesting companies that can be built to enable that: how your website talks to agents, how attribution happens, how you're able to capture your slice of the value creation that's taking place.
I'm interested to see what happens in that space.
Yeah, that seems like a really good candidate for what's going to change the world most in the not too distant future.
First of all, it seems like it's really likely to work with a reinforcement learning paradigm because you're going to get a pretty clear signal from a lot of tasks as to whether or not it succeeded.
I've been watching Payman a little bit recently.
Any specific companies you would suggest people look at, or any specific problems where you'd say, come pitch Logan if you're working on this?
Yeah, I think evals is the other one.
Going back to the point you were making: I would classify you as incredibly sophisticated in this space. You understand what's happening in the ecosystem.
You understand why evals might be important.
The fact that it's still so hard for humans to turn their taste and perspective on some of these problems into something that can be checked programmatically is, I think, really, really interesting.
I don't know how that problem gets solved. But if you've spent too much time in the evals rabbit hole like I have, one of the big realizations is that most of the problems in life end up being eval problems.
It's just very interesting, if you follow that chain of thought, how things end up happening.
Yeah, for now I just do expert demonstrations. For Waymark, we just go and say, creative team, write us a bunch of good stuff.
Fine tune on that, put those into few-shot examples, and hope for the best.
And from there it becomes vibes.
But yeah, it would be nice to have something better.
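The few-shot half of that workflow can be sketched as plain prompt assembly. The `Demonstration` type, the prompt layout, and the sample text in the usage note are assumptions for illustration; the conversation does not describe Waymark's actual prompt format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demonstration:
    brief: str   # what the creative team was asked for
    output: str  # what they wrote


def build_few_shot_prompt(instructions: str, demos: List[Demonstration], new_brief: str) -> str:
    """Assemble instructions, expert examples, and the new request into one prompt."""
    parts = [instructions.strip(), ""]
    for i, demo in enumerate(demos, start=1):
        parts.append(f"Example {i}")
        parts.append(f"Brief: {demo.brief}")
        parts.append(f"Output: {demo.output}")
        parts.append("")
    parts.append(f"Brief: {new_brief}")
    parts.append("Output:")
    return "\n".join(parts)
```

For example, `build_few_shot_prompt("Write a short ad.", [Demonstration("bakery", "Fresh bread daily.")], "bike shop")` yields a prompt that ends with the new brief and an empty `Output:` line for the model to complete. The same demonstrations could also serve as fine-tuning data, which is the other half of the workflow mentioned here.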
So yeah, well, where can folks find you if they want to pitch you? And you're incredible, as everybody knows, at responding online to questions, concerns, and issues around the API.
So I don't want to bring more of that to you than you already have, but where should folks find you if they want to either point out an issue or pitch you a startup?
I saw someone respond to one of my tweets the other day and they said, I miss the old Logan, he used to reply to all of his replies on Twitter. And I then went and looked at how much time every week I spend on Twitter.
And I was like, my time doesn't scale here.
I'm already putting in.
Yeah, you're gonna need an agent.
Yeah, way too many, way too many hours.
But yeah, Twitter, LinkedIn, everywhere that the internet exists.
Hopefully I'm there helping people with Gemini stuff.
Cool.
Well, Gemini 2.0 Flash is out today.
General availability.
It is good, fast and cheap.
So definitely one to check out.
Logan Kilpatrick from Google DeepMind.
Now, thank you again for being part of the Cognitive Revolution.
Thanks for having me, Nathan.
This was fun.
Always a pleasure.
It is both energizing and enlightening to hear why people listen and learn what they value about the show.
So please don't hesitate to reach out via email at tcr@turpentine.co.
Or you can DM me on the social media platform of your choice.