
The a16z Show · 2025-11-28
OpenAI's Sherwin Wu on Building for 800M Users, Model Proliferation, and Why Determinism Beats Pure Agents
Hosts: Martin Casado
Guests: Sherwin Wu
Why it matters
Sherwin Wu on OpenAI's model proliferation strategy and 800M weekly ChatGPT users
Key claims
- ChatGPT has reached roughly 800 million weekly active users—about 10% of the globe—making first-party distribution unmatched, but the API's end-user reach is also enormous and at points has exceeded ChatGPT itself.
- OpenAI has fully abandoned the 'one model to rule them all' thesis; the future is a proliferation of specialized models (Codex, GPT-OSS, vertical-tuned variants), and this is also why OpenAI invested heavily in fine-tuning and open-sourcing.
- Reinforcement fine-tuning (RFT) is the recent unlock—moving beyond tone-level SFT to actually improve models to SOTA on specific use cases like medical coding or agentic planning, and OpenAI is piloting data-for-compute trades.
- Models are 'anti-disintermediation technology': users emotionally notice model changes (e.g., GPT-5 launch), and developers build harnesses so tightly coupled to a specific model that API stickiness is surprisingly high.
Episode summary
OpenAI's Sherwin Wu, who leads engineering for the developer platform (API), sits down with a16z's Martin Casado to discuss how OpenAI operates at massive scale, now reaching roughly 800 million weekly ChatGPT users, or about 10% of the globe. Wu explains why OpenAI treats ChatGPT as a first-party app and the API as a horizontal platform simultaneously, framing the tension as manageable because Sam and Greg have been principled about it since day one, and because "growth solves a lot." A recurring theme is that models behave as anti-disintermediation technology: users know and care which model they're using (GPT-5, Composer, Cursor's tab model), making traditional software abstraction layers surprisingly brittle.
The conversation traces how the industry's mental model has shifted—from expecting one model to rule them all toward a proliferation of specialized models like Codex, GPT-OSS, and vertical-tuned variants. Wu highlights reinforcement fine-tuning (RFT) as the recent unlock that lets customers actually leverage their proprietary data, citing examples like medical insurance coding and agentic planning. OpenAI is piloting data-sharing arrangements where customers who contribute data get discounted inference or free training. Pricing has stayed usage-based (a "one-way ratchet" per Ben Cott's framing) because it correlates heavily with outcomes at the test-time-compute level.
Wu also defends OpenAI's open-weight strategy (GPT-OSS) as low cannibalization risk and ecosystem-positive, and dives into why the Agent Builder uses deterministic, node-based workflows: much real-world work is SOP-driven (customer support, sales, regulated industries), not the undirected, exploratory work software engineers do. He draws a parallel to NPCs in games, where the response catalog and game logic live in code while the model handles language understanding—exactly the constrained pattern OpenAI's agent tooling is designed for.
- Prompt engineering as originally conceived is dead; the discipline has shifted to context engineering—deciding what tools, data, and retrieval the model gets and when.
- Usage-based pricing is a 'one-way ratchet' and approximates outcome-based pricing well when test-time compute is heavy; outcome-based pricing remains hard to implement outside specific verticals.
- Open-weight releases (GPT-OSS) show negligible cannibalization: different use cases, different customers, and inference at scale is itself a moat OpenAI's inference team provides.
- Agent Builder leans deterministic with node-based workflows because large swaths of real work (support, sales, marketing, regulated industries, even game NPCs) is SOP-driven and benefits from enforced structure rather than free-form agency.
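To make the deterministic, node-based pattern concrete, here is a minimal Python sketch. It is not OpenAI's Agent Builder API: the node names, the `classify_intent` stub, and the `WORKFLOW` table are all hypothetical, chosen only for illustration. The idea it tries to capture is the one in the NPC analogy: the response catalog and the transitions live in code, and the model's only job is to map free-form text onto a known intent.

```python
from dataclasses import dataclass
from typing import Dict


def classify_intent(message: str) -> str:
    """Hypothetical stand-in for a model call.

    In a real system a language model would map free-form text to one of a
    fixed set of intents; a keyword match keeps this sketch self-contained.
    """
    text = message.lower()
    if "refund" in text:
        return "refund"
    if "cancel" in text:
        return "cancel"
    return "other"


@dataclass(frozen=True)
class Node:
    name: str
    response: str                    # fixed, pre-approved response text
    next_by_intent: Dict[str, str]   # deterministic edges: intent -> node name


# Hypothetical SOP for a support flow. Every reachable response is listed here.
WORKFLOW: Dict[str, Node] = {
    "start": Node(
        "start",
        "How can I help?",
        {"refund": "refund_policy", "cancel": "confirm_cancel", "other": "handoff"},
    ),
    "refund_policy": Node("refund_policy", "Refunds are handled per the policy.", {}),
    "confirm_cancel": Node("confirm_cancel", "Please confirm the cancellation.", {}),
    "handoff": Node("handoff", "Routing you to a human agent.", {}),
}


def step(current: str, message: str) -> Node:
    """Advance one node: the model picks the intent, the code picks the path."""
    intent = classify_intent(message)
    target = WORKFLOW[current].next_by_intent.get(intent, "handoff")
    return WORKFLOW[target]


if __name__ == "__main__":
    print(step("start", "I want a refund please").response)
```

Because every edge is enumerated, anything the classifier cannot place falls through to the `handoff` node instead of producing an unreviewed response, which is the kind of enforced structure the SOP-driven use cases call for.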
Source material
Transcript
We want ChatGPT as a first-party app.
A first-party app is a really great way to get 800 million WAUs or whatever now.
Tenth of the globe, right?
Yeah, yeah 10% of the globe uses it every week.
Every week?
Even within OpenAI, the thinking was that there would be one model that rules them all.
It's definitely completely changed.
It's becoming increasingly clear that there will be room for a bunch of specialized models.
There will likely be a proliferation of other types of models.
Companies just have giant treasure troves of data that they're sitting on.
The big unlock that has happened recently is the reinforcement fine tuning.
With that setup, we're now letting you actually run RL, which allows you to leverage your data way more.
OpenAI sells weapons to its own enemies.
Every day, thousands of startups build on OpenAI's API, many trying to compete directly with ChatGPT.
It's the ultimate platform paradox.
Enable your competitors or lose the ecosystem.
Sherwin Wu runs this high-wire act. He leads engineering for OpenAI's developer platform, the API that powers half of Silicon Valley's AI ambitions.
Before OpenAI, he spent six years at Opendoor, teaching machines to price houses where a single wrong prediction could cost millions.
Today, Sherwin sits down with a16z general partner Martin Casado to explore something nobody expected:
that the models themselves are becoming anti-disintermediation technology.
You can't abstract them away.
And every attempt to hide them behind software fails because users already know and care which model they're using.
It's changing everything about how platforms work.
Sherwin and Martin talk about why OpenAI abandoned the dream of one model to rule them all, how they price access to intelligence, and why deterministic workflows might matter more than pure AI agents.
Sherwin, thanks very much for joining.
So we're being joined by Sherwin Wu.
It'd be great actually if you provided the long form of your background as we get into this, just for those that may not know you.
I mean, I view Sherwin as one of the top AI thought leaders, so I'm really looking forward to this.
Yeah, thanks for having me.
I'm really excited to be on the podcast.
Yeah, it's a little bit more of my background.
So maybe I can start from present day and go backwards.
So I currently lead the engineering team for OpenAI's developer platform.
So the biggest product in there, of course, is the API.
Is there more for the developer platform than the API?
It's kind of a similar, synonymous.
Well, so I also think about other things that we put into our platform side.
So technically our government work is also like offering and deploying this into different areas.
Yeah, like I've talked about.
Oh, like so you have like a local deployment.
Yeah, so we actually do have a local deployment at Los Alamos National Labs.
It's super cool.
I went to visit it.
It's very different than what I'm used to in a classified computer with our model running there.
So there's that, but like mostly the API.
Did you go to Los Alamos?
We didn't.
Yeah, I did go to Los Alamos.
It's great.
They showed us around.
They showed us some of the historic sites.
Real history.
Yeah.
I used to work at Livermore, man.
So I've got like a new one.
Oh, yeah, yeah, yeah.
I researched them out of college.
Right, right, right.
You saw them next.
Yeah, well, we hope to.
Yeah, so I work on the developer platform.
I've been working on it for around three years now.
So I joined in 2022.
I was basically hired to work on the API product, which at the time was the only product that OpenAI had.
And I've basically just worked on it the entire time.
I've always been super interested in the developer side and kind of like the startup story of this technology.
And so it's been really, really cool to kind of see this evolve.
And so that's my time at OpenAI.
Before OpenAI, I was at Opendoor for around six years.
I was working on the pricing.
So I had my general background before.
I think it's such a dissonance.
Pricing at Opendoor to, like, running the API.
It's so different.
It's been fascinating actually for me to see the differences between the companies.
Like they're run so differently.
They both have "Open" in the name.
So some overlap.
But that's pretty much it.
But yeah, I was there for around six years working on the pricing team.
So our team basically would run the ML models.
This is actually pricing the assets on Opendoor.
Yeah, yeah.
The inventory.
Exactly.
So yeah, Opendoor would buy and sell homes.
And their main product was buying homes directly from people selling them with all cash offers.
And so my team was responsible for how much we would pay for them.
And so it was a really fun, like ML challenge.
It had a huge operational element to it as well because not everything was automated obviously.
Yeah.
But it was a really fascinating technical challenge.
And.
Is there any sense of that on the API side, like GPU capacity buying or is it just totally unrelated?
On the API side, there is a small bit of like how we price the models.
But I don't think we do anything as sophisticated as Opendoor.
Opendoor was just like such a hard problem.
It's like such a like expensive asset.
The holding costs are very expensive.
You're like holding onto it for like months at a time.
There's like a variability in the holding time.
And there's a long tail of potential things that could go wrong.
Long tail, yes.
And like try to think about it from a portfolio perspective.
And if you're holding onto just one of them for two years, it blows everything up, like it goes negative.
So it's a very, very different.
Six years?
Different challenge.
Yeah.
Yeah.
Six years there.
Wow.
Lots of up and downs.
Saw a lot of the booms.
Saw a lot of the struggles.
And then we IPO'd for a lot.
Yeah.
But yeah, just in general, it was a very great experience.
I think for me, it was also just had such a very like business operations and like a very like by the book type of culture.
Whereas OpenAI is like very different.
So interesting.
I was just thinking about it now.
It's like even for a company like that, like you don't think about it as a tech company.
But if there is a deep technology problem, it actually is the pricing, right?
It's like an ML problem.
Yeah, that's what attracted me to the company.
It's not like the website.
It's not the platform.
It's not the API.
It's literally that.
Yep.
And that's what attracted me to it.
I think that was interesting.
It's also a way lower-margin business than OpenAI because you're making a tiny spread on these homes.
They talked about basis points, like eating bips for breakfast and all that.
So anyway, I was at Opendoor for around six years.
And then before that was my first job out of college, which was at Quora, Adam D'Angelo's company.
No kidding.
Yeah.
So I was working on the newsfeed.
So worked on newsfeed ranking for a bit, worked on the product side.
That was actually my first exposure to like actual ML and industry and learned a lot from the engineers at Quora.
We basically hired a lot of the early feed engineers.
Was Charlie still there when you were there?
Charlie was not there when I was there.
See like right after you.
Yeah, yeah, yeah.
That was a really legendary team.
It's still known to be kind of this super iconic founding team.
Yeah, yeah.
The early founding team was really solid.
I still think that even while I was there, I was still amazed at the quality of the talent that we had.
Phenomenal.
I think that was when the company was like 50 to 100 people.
But yeah, a bunch of the Perplexity team was there.
Dennis was on the feed team with me.
Johnny Ho, Jerry Ma.
Yeah, that's right.
This is crazy.
And then Alexandr, of Scale.
Yeah, that's exactly the way.
Yeah, yeah, that's crazy.
I was there between high school and college.
It was an incredible team.
I think I kind of took it for granted when I was there.
Yeah.
It was a good group.
How did you get to Quora?
What did you study in an undergrad?
Yeah.
So before that, I was at MIT for undergrad.
I studied computer science, did like one of those like computer science and the master's degree kind of like grounded in.
Yeah, yeah, yeah.
I ended up at Quora because I got in what we call an externship there.
So at MIT, you actually get January off.
So there's like the fall semester and then January is off.
That's good.
And then you have the spring semester.
And so it's called independent activities period.
So some people just like take classes.
Some people just do nothing.
But some people will do like month long internships.
And some crazy companies will offer a month long internship to a college student.
Yeah, yeah.
And it really is just kind of like a way to get people into it.
Did you come out here from Boston?
Yeah, yeah.
It was crazy.
So you had to apply.
I remember, yeah, this is I think 2013, January or something.
You had to apply.
And I remember the Quora internship was the one that just paid the most.
They paid, I think it was like $8,000, $9,000.
And it was like, wow, that was a lot for a month.
And you're just kind of ramping up like half the time.
I can eat for a year.
Yeah, yeah.
As a college student, it was like great.
And yeah, they would kind of like fly you out here.
So I did the interviews and then luckily got an offer.
And so, yeah, I came out for a January.
That was right when they moved into their new Mountain View office.
And I basically, honestly, just ramped up for like two weeks and then had two weeks of good productivity working on the feed team.
So that was that like user facing product work?
Yeah, I distinctly remember my externship project for those two weeks was just to like add a couple features to our feature store.
And that would make its way into the model.
I remember my mentor there was Tudor, who's now running, I think it's called Harmonic Labs.
Yeah, yeah, yeah.
Crazy team.
Crazy team.
That's unbelievable.
I mean, by the way, I think it's one of the untold stories of Silicon Valley.
It's like how good that original team at Quora is.
I mean, a lot of them are still there and still good, but the diaspora from Quora is everywhere.
Yeah, yeah.
That's actually how I ended up at OpenAI too, kind of fast forwarding from there because OpenAI kind of kept a quiet profile ish.
I'd always kind of kept tabs on them because a bunch of the Quora people I knew kind of ended up there.
It's kind of like checking in on it and they were like, yeah, something crazy is happening here.
You should definitely check it out.
So yeah, I definitely owe a lot to Quora.
But yeah, part of the reason why I went there versus other options as a new grad was the team was just so incredible.
And I just felt like I could learn a ton from them.
I didn't think about everything afterwards.
I was just like, man, if I could just absorb some knowledge from this group of people, it would be great.
Awesome.
So one place I wanted to start is something that I find very unique about OpenAI is it's both a pretty horizontal company.
Like it's got an API.
Like I would say we've got this massive portfolio of companies right now.
I would say a good fraction of them use the API.
And then it's also a vertical company in that you've got full on apps, right?
Yeah.
Like everybody uses ChatGPT, for example.
And so you're responsible for the API and kind of the dev tool side.
So maybe just to begin with, is there an internal tension between the two?
Like, is that a discussion?
Like the API may, whatever, help a competitor to the vertical version, or is it not?
Are things just growing so fast that it's not an issue?
I would just love to hear how you think about that.
By the way, it's very unusual for companies to have both of that.
There's two things this early.
It's very unusual.
Yeah, yeah, I completely agree.
I think there is some amount of tension.
I think one thing that really helps here is Sam and Greg just from a founder perspective have since day one just been very principled in the way in which we approach this.
They've always kind of told us we want ChatGPT as a first-party app.
We also want the API.
And the nice thing is I think they're able to do this because at the end of the day it kind of comes back to the mission of OpenAI, which is to create AGI and then to distribute the benefits as broadly as possible.
And so if you interpret this, you want it on as many surfaces as you can.
And the first-party app is a really great way to get, you know, it's like 800 million WAUs or whatever now.
Like 800 million WAUs?
Yeah, it's pretty, it's actually mind boggling to think about.
I think many people listening to this don't understand how big that is.
Yeah, it's crazy.
It's going to be like actually historic for the time it's taken to get to 800 million.
It's historic.
It's also just like, yeah, the amount of time and just like how much we've had to scale up.
A tenth of the globe, right?
Yeah, yeah, 10% of the globe uses it every week.
Every week.
And it's growing.
And it's growing.
So like at some point, you know, it'll hit like, you know, it'll go even higher than that.
And so, so yeah, obviously the reach there is unmatched.
But then also just like being able to have a platform where we can reach even more than just that.
Like one thing we talk about internally sometimes is like, what is our end user reach from the API?
Like, it's actually really, really broad.
It's hard to say because ChatGPT is growing so quickly, but at some points it was definitely larger than ChatGPT.
And the fact that we're able to tap into all of this and get the reach that we want, I think, is really good.
But yeah, I mean, there's definitely some tension sometimes.
I think the, I think it's come up in a couple of places.
I think one of them is on the product side.
So as you mentioned, you know, sometimes there are competitors kind of building on our platform who, you know, might not be happy if ChatGPT launches something that competes with them.
Yeah.
I mean, that's a tale as old as the cloud or operating systems or whatever.
So, you know, I think it's more like, does ChatGPT worry about the competitor type thing?
Like, you know, you enabling a competitor.
Yeah.
Yeah.
So I mean, the interesting thing is like, I would say not particularly mostly just because we've been growing so quickly.
It's like, you know, it's like, you know, force right now.
Yeah.
Yeah.
Growth solves so many, so many different things and like, and then the other way we think about it is like everyone's kind of building, building around AGI, building towards AGI.
Of course, there's going to be some overlap here.
So yeah, I mean, but, but I would say like, at least in my position, I feel more of this tension from the customer, like the API customers themselves, right?
It's like, oh my gosh, you know, you're like, are you going to build this thing that I'm working on?
Yeah, that story is as old as the computer system.
There's never not been a computer platform that didn't have that problem.
So, so, okay.
So I kind of go back and forth on this one.
I want to try one out on you, which is the problem historically with, you know, offering core services and APIs: you can get disintermediated, right?
And so I can build on top of it, but then, you know, the user doesn't know, like, whatever, I build on top of the cloud, I disintermediate from the cloud and then I can switch to another cloud or whatever.
And it occurs to me that that's kind of hard to do with these models because the models are so hard to abstract away.
Like they're just, they're just unruly, right?
If you try to like have traditional software drive them, they just don't kind of manage very well.
So part of me thinks that it's almost like this, like anti-disintermediation technology that you kind of have to expose it to the, to the user directly.
Does that make sense?
And so I'm wondering, like, even if I think ChatGPT is really just trying to expose the model to the user, the API is kind of just trying to expose the model to the user.
So I think there's almost this argument.
It's like if the real value is in the models, it doesn't really matter how you get it to them because it's going to be very tough for someone to abstract it away in the classic sense of computer science of like they don't know that they're using the model.
Like you always know you're using GPT-5.
Yeah.
And the interesting thing is I think like the entire industry kind of has slowly changed their mind around this too.
I think like in the beginning, we kind of thought like, oh, these are all going to be interchangeable.
It's just like software.
Yeah, exactly.
So the APIs are there so that you can just swap them out.
Yeah.
But I think we're learning this on the product side with, you know, the GPT-5 launch and 4o, and how so many people liked o3 and 4o and all of that.
I felt that when it changed, I'm like, I'm like, you're not as nice to me.
Like I like the validation.
Yeah, it's actually funny because I really loved GPT-5's personality, but I think it's that the way I used, you know, ChatGPT was very utilitarian.
Oh, it's like, you know, mostly for work or just like information.
Yeah, I've definitely come around just, you know, but like I actually felt a dissonance when it changed.
It's like, you know, there's this emotional thing that goes on, but it's almost like it's an anti-, you know, disintermediation technology.
Like you kind of have to show this to the user.
Yeah.
Yeah.
And then you see a lot of, you know, more successful products like Cursor do this directly, especially the coding products where users want more control.
We've even seen some like, you know, like more general consumer products do this.
And so it's definitely been true on the consumer side.
The interesting thing is I think it's also been true on the API side.
And that's also something that I think.
Exactly.
That's exactly what I'm saying.
So like the argument could be that I could use the API to disintermediate you.
But like you don't see that happening because it's so hard to put a layer of software between a model and a person.
You almost have to expose the model.
Yes.
Yes.
And I think if anything, I think the models are almost like diverging in terms of like what they're good at and like their specific use case.
And I think there's going to be more and more of this.
But yeah, basically the retention of people building on our API is surprisingly high, especially when people thought you could just kind of swap things around.
You might have, you know, even tools that help you swap things around.
But yeah, the stickiness of the model itself has been surprising.
And do you think that is because of a relationship between the user and the model?
Or do you think it's more of a technical thing, which is like my evals work for, like, OpenAI.
And it's, you know, unlike the correctness maintains.
Yeah.
Yeah.
I think it's both.
So I think there's definitely an end user piece here, which is what we've heard from some of our customers.
Like they just get familiar with the model itself.
But I also think there's a technical piece, which is that as a developer, especially with startups, you're really going deep with these models and really iterating on it, trying to get it really good within your particular harness.
You're iterating on your harness itself.
You're giving it different tools here and there.
And so you really do end up like building a product around the model.
And so there is a technical piece where, you know, as you kind of keep building with a particular product like GPT-5, you're actually building more around it so that your product works uniquely well with that model.
So I use Cursor, and I use it a lot for something like writing blogs, because, you know, we're investors.
And I use it sometimes for coding.
And it's remarkable how many models I use in Cursor.
So literally my go-to model is GPT-5.
I love GPT-5.
I think it's phenomenal, you know, and then I use max mode with GPT-5 for planning.
But then, you know, I like the tab-complete model that's in Cursor, and, you know, the new model they just dropped is good for some basic stuff.
Yeah, Composer 1 is good.
Yeah.
And so like, you know, and I think that like kind of reflects this too, because it's like a particular model for each particular usage.
Like I've talked to a bunch of people who use the new Composer model and it's just really good for a fast first pass.
Exactly.
Like keep you in flow kind of thing.
And then you kind of like bubble out to another model if you want like, you know, deeper.
I mean, let me sit down.
I literally sit down with GPT-5 to help me plan something out and it's really good at that.
And then, you know, when I'm coding and I'm doing the quick chat thing, then I'll use Composer, and if there's, whatever, some crazy bug or something like that, I use something else. So, you know, do you remember in the early days of all of this where there was going to be one model? I mean, even investors were like, we will never invest in a model company because there will only be one model and it's going to be AGI.
But like the reality of it feels like there's this massive proliferation of models like you said before.
They're doing many things.
And so maybe two questions, maybe too blunt or too crass.
But the first one is what does that mean for AGI?
And the second is what does that mean for OpenAI?
Like does that mean that like you end up with a model portfolio?
Do you select a subset?
Do you think this all gets superseded by some god model in the future?
Like how does that play out?
Because it's against what most people thought.
Most people thought this is all going towards one large model that does everything.
Yeah, I think the crazy thing about all of this is just like how everyone's thinking has just changed over time.
Like I distinctly remember this like and the crazy thing is not that long ago.
It's just like three like two or three years ago.
I remember even within OpenAI, the thinking was that there would be one model that rules them all, and it's like, why would you, I mean, this kind of goes to the fine-tuning API product.
It's like why would you even have a fine tuning product?
Why would you even want to like iterate on it?
There's going to be this one model that subsumes everything, and that is also the most simplistic view of what AGI will look like.
And yeah, it's definitely completely changed since then.
But then the other thing to keep in mind is that it might continue to change from where we are today.
But it's like becoming increasingly clear.
I think that there will be room for a bunch of specialized models.
There will likely be a proliferation of other types of models.
I mean, you see us do this with like the Codex model.
Yeah, totally.
And so I think that's the thing that's really important.
Because you don't have, you know, winner-take-all consolidation dynamics, right?
I mean, you just have to have a healthier ecosystem, a lot more solutions you can provide a lot.
Yeah.
Yeah.
And as the ecosystem grows, it generally is helpful.
Like, this is one thing we actually think about a lot too, is as the general, like AI ecosystem grows, like OpenAI just stands to benefit a lot from this.
And this is also why we've, like some of our products, we even started opening up to other models, right?
Like our evals product now allows you to bring in other models.
To all of this, we think it's like any rising tide generally helps us here.
But yeah, I think as we move into a world where there would be a bunch more models, this is why we've kind of invested in our model customization product with the Fine Tuning API, with the reinforcement Fine Tuning, opening that up as well.
It's also part of why we open sourced GPT-OSS as well, because we want to be able to, you know, facilitate that.
I want to talk about that in just a bit, because the open source is actually very interesting.
I mean, actually I thought the open source model was great.
Yeah.
But clearly it's something that a company has to be careful with.
Yeah.
And before that, I want to talk a little bit about the Fine Tuning API.
So I've noticed that you are moving towards kind of more sophisticated use of things like fine-tuning, which, you know, in a way you can read as a bit of a capitulation, no? Like, you know, there is product-specific data and there are product-specific use cases that a general model won't do, to your point, right?
So like as opposed to proliferation model you do that.
It seems like a lot of that data is actually very, very valuable, right?
And so, you know, to what extent is there like interest in almost a tit-for-tat where you can like expose, you know, the ability to get product data into Fine Tuning and then you also benefit from that data because, oh yeah, the vendors provide it to you, versus like this is 100%, you know, like they keep their own data and there's kind of no interest in that.
Because that feels to me like the next level of scaling, this is kind of where we're at.
And so, I'm just kind of curious how...
Yeah, so I mean, maybe even like taking a step back, the main reason why we even invested in a Fine Tuning API in the very beginning is, one, there's been huge demand from people to be able to customize the models a bit more.
It kind of goes into prompt engineering, and I think the industry has changed their mind on that as well.
The second thing is exactly what you said, which is the companies just have giant treasure troves of data that they are sitting on.
That they would like to utilize in some fashion in this AI wave.
And the simple thing is to put it in some vector store, like do RAG with it or something.
But there's also, if they have a more technical team, they do want to see how they can use it to customize the models.
And so that is actually the main reason why we've invested in this.
The interesting thing was way back, kind of back in like '22, '23, our Fine Tuning offering was, I'd say like too limited, so that it was very difficult for people to tap into and use this data.
So it was just like a supervised fine-tuning API.
And like, oh, you can kind of use it, but in practice, it really is only useful for like, as honestly just like instruction following plus plus, you like kind of change the tone, you're just really instructing it.
But I think the big unlock that has happened recently is with the reinforcement fine tuning model, because with that setup, we're now letting you actually run RL, which is more finicky and it's like harder and you know, like you need to invest more in it.
But it allows you to leverage your data way more.
By the way, this is just a naive question for me, which is it feels from just my understanding from my own portfolio, it feels like there's two modalities of use.
One of them is I've got a treasure trove of data that I've had for a long time and I create my model on that treasure trove of data and all that happens offline and then I deploy that.
There's another one, which is like I actually have the product being used in real time, I've got a bunch of users.
And like I can actually get much closer to the user, I can kind of A/B test and decide which data, and like it's kind of more of a near real time thing.
Is this focused on like more product stuff or more treasure trove?
So the dream with the fine tuning API was that we should be able to handle both, right?
It's like we actually had this dream and we have this whole like LoRA setup with the fine tuning inference where we should just be able to scale to like millions and millions of these fine-tuned models, which is usually what would happen if you have like this online learning thing.
In practice, it's mostly been the former.
In practice, it's mostly been like the offline data that they've like already created with experts or something and like using their product that they're able to use here.
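The LoRA setup mentioned here is not described further in the conversation. As a rough illustration of why low-rank adapters make it plausible to serve very many fine-tuned variants, the sketch below keeps one shared base weight matrix and stores only a small pair of matrices per customer. The class names, the `customer-a` identifier, and the toy numbers are all hypothetical, and the usual LoRA scaling factor is omitted for brevity.

```python
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoraAdapter:
    """Low-rank update delta_W = B @ A, with A of shape (r x d_in) and B of shape (d_out x r)."""

    def __init__(self, a: Matrix, b: Matrix) -> None:
        self.a = a
        self.b = b

    def delta(self, x: Vector) -> Vector:
        # Apply A first, then B, so the full (d_out x d_in) update is never materialized.
        return matvec(self.b, matvec(self.a, x))


class SharedBaseLayer:
    """One base weight matrix shared by every fine-tuned variant (hypothetical design)."""

    def __init__(self, base: Matrix) -> None:
        self.base = base
        self.adapters: Dict[str, LoraAdapter] = {}

    def register(self, model_id: str, adapter: LoraAdapter) -> None:
        self.adapters[model_id] = adapter

    def forward(self, x: Vector, model_id: Optional[str] = None) -> Vector:
        y = matvec(self.base, x)
        if model_id is None:
            return y  # plain base model
        d = self.adapters[model_id].delta(x)
        return [yi + di for yi, di in zip(y, d)]


# Toy example: a 2x2 identity base and a rank-1 adapter for one customer.
layer = SharedBaseLayer(base=[[1.0, 0.0], [0.0, 1.0]])
layer.register("customer-a", LoraAdapter(a=[[1.0, 1.0]], b=[[0.5], [0.0]]))
```

Each extra fine-tuned model costs only the storage of its `A` and `B` matrices, while the base weights stay in memory once.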
But the main thing I was trying to say around the reinforcement fine tuning API is it kind of changes the paradigm away from just like small incremental, like tone improvements, which is what SFT did, to actually improving the model to potentially SOTA level on a particular use case that you know about.
Like that's where people have really started using the reinforcement fine tuning API and that's why it's gotten more uptake.
Because if the discussion is less like, hey, I can make this model, you know, not like speak in a certain way better, it's less compelling.
But if it's like, hey, for like, you know, medical insurance coding or for like coding planning, agentic planning or something, you can create the world's best model using your data set with RFT and it becomes a lot more...
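Reinforcement learning needs a reward signal, and for a task like medical insurance coding one plausible way to supply it is a scoring function that compares the model's predicted codes with expert-labeled ones. The sketch below is only an illustration of that idea: `grade_codes` is a hypothetical helper, not part of any OpenAI API, and the F1 metric is an assumed choice of reward.

```python
from typing import Iterable


def grade_codes(predicted: Iterable[str], reference: Iterable[str]) -> float:
    """Return an F1 score in [0, 1] between predicted and reference code sets."""
    pred = {c.strip().upper() for c in predicted if c.strip()}
    ref = {c.strip().upper() for c in reference if c.strip()}
    if not pred and not ref:
        return 1.0  # nothing to code and nothing predicted counts as correct
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A graded score like this rewards partially correct answers, which gives the training loop more signal than a pass/fail check would.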
And will you ever like, or maybe do you, will you ever like find ways to get access to that data like...
Yeah, so the...
If I had the data and I wanted cheap GPUs, I'd trade you for it.
I don't know.
Yeah, I mean, we've talked about this and we've actually been piloting some pricing here too where it's like...
Because this data is like really helpful and it's kind of hard to get.
And if you actually build with the reinforcement fine tuning API, you can actually get discounted inference and potentially free training too, if you're willing to share the data.
It's always kind of, you know, it's up to the customer there.
But if they do, it is helpful for us and there'll be benefits for the customer as well.
That's awesome.
Okay, you said that the views on prompt engineering have changed.
Yeah.
Actually, I wasn't aware of that.
All the other things I wasn't aware of, this one I wasn't.
Yeah, I mean, I think the prevailing view, this is back in 2022.
I remember I was talking to so many people and they're basically...
I mean, this is similar to like the single model AGI view as well, which is like, like prompt engineering is just not going to be a thing and you're just not going to have to think about what you're putting in the context window in the future.
Like the model would just be so good that it will just, like, know.
It'll know what you need to do.
Yeah, that's definitely not a thing.
Yeah, but like that, like, I don't know, maybe people forget it, but like that was like a very common belief back then.
Yeah, it was great points to save.
Because like scaling laws or something, with scaling laws and like you'll just mind meld with the model and like, you just like, like prompting and like instruction following will be so good that you won't really need to do it.
And if anything, like, yeah, it's like clearly been wrong.
Yeah, yeah, yeah.
But it is interesting because I think it's a slightly different world that we're in now where the models have gotten really, really good at instruction following relative to the, you know, like GPT-3.5 or something.
Yeah.
But I think the name of the game now is less on like prompt engineering as we had thought about it two years ago.
It's more of like, it's like the context engineering side where it's like, what are the tools you give it?
What is like the data that it pulls in?
When does it pull in the right data?
Well, this is very interesting.
I mean, to reduce it to like an absurdly simplistic level, like the weird thing about RAG, for example, the classic use of RAG is like you're using like cosine similarity to choose something that you're going to feed into a super intelligence.
Yeah.
So like, you know, you're like, I'm going to randomly grab this thing based on like fucking embedding space.
It doesn't really, you know, and then, you know, when you want the super intelligence to decide the thing to do.
And so it's like pushing intelligence in that retrieval clearly is something that makes a lot of sense.
It's almost like the pushing the intelligence out in a way.
Exactly.
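For readers unfamiliar with the classic RAG step being criticized here, a minimal sketch follows: documents are ranked purely by cosine similarity between embedding vectors, with no model judgment involved. The document IDs and two-dimensional vectors are hand-made placeholders; a real system would get embeddings from an embedding model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          corpus: Sequence[Tuple[str, Sequence[float]]],
          k: int) -> List[str]:
    """Return the IDs of the k documents whose embeddings are closest to the query."""
    ranked = sorted(corpus,
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy corpus with made-up embeddings.
corpus = [
    ("refund-policy", [1.0, 0.0]),
    ("shipping", [0.0, 1.0]),
    ("returns", [0.9, 0.1]),
]
```

Whatever `top_k` returns is what the model sees, which is the point being made: the selection is done by geometry, not by the model.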
And to be fair, I think like RAG was kind of introduced when the models were like, it's like pre-reasoning models.
So it was like, you only had to kind of like one shot to like do this and it wasn't that smart.
But now that we do have the reasoning models, now that we have, I mean, if you like, one of my favorite models is actually o3 because it was like one of the most diligent models.
It was like o3.
It would just like do all these tool calls.
And it's like really the intelligence itself trying to like do the, you know, tool calls or RAG or anything like that or write the code to execute.
And so the paradigm has shifted there.
But yeah, because of that, I think like, context engineering, prompt engineering, what you put, what you give the model is like extra important.
Yeah.
Okay.
So you have API, which is horizontal.
You've got ChatGPT and other products, which are vertical.
We haven't even talked about pixels.
This is all just language.
Are agents a new modality?
Is that something else?
Like, you know, like a Codex or...
What do you mean by modality here?
Like, I mean, they feel both vertical and horizontal to me in a way.
Like to me, ChatGPT is a product, right?
It's like it's a product and like my mom uses it, right?
Yep.
And an API is a dev thing.
You kind of give it to a developer and like a CLI is kind of somewhere in between to me.
It's like, is it a product?
Is it like it is horizontal?
Yeah.
How is it handled internally?
Is it a totally separate team that does agents or...
No.
So it's...
Yeah, it's interesting because like I think the way that you frame it just now almost seemed like agents was like this like singular concept that like, you know, might have its own particular team.
Maybe a better question is what is an agent to you?
Yeah, yeah, yeah, yeah.
It's like even getting a language is like important for this conversation.
So I actually don't even know if it would be helpful for me to share, but my general take on agents is it's an AI that will take actions on your behalf that can work over long time horizons.
And I think that's the pretty general...
Pretty utilitarian.
Yeah, yeah.
Definitely.
But like if you think about it that way, yeah, I mean, maybe this is what you mean by modality, but it is just a like way of like using AI and it is a, I guess it could be viewed as a modality, but we don't view it as like a separate thing separate from AI.
Well, let me just try and kind of, you know, give you a sense of where this question is coming from.
Like I know how to build a product, like, and we know how to do go-to-market for products.
We know how to do like, you know, we know the implications of turning them into platforms.
Like it's just we've been doing this for a very long time, right?
We know how to do the same thing for APIs, right?
We know how to do billing.
We know like the tension of like people build on top of it and all of that stuff.
And like what I've been trying to...
And this is just maybe a personal inquiry.
It's just not clear for me for an agent if it sits in one of those two camps, is it more like the product camp?
Is it more like the...
Or is it...
Because it's kind of both.
Like I could like literally give you coding.
Yeah, yeah.
And like as a user and then you just talk to it or I could like build in a way, kind of embed it in like my app.
And so like, but then that means something to you as far as like, you know, how do you price it and what does it mean for ecosystem?
Like, for example, like would you be fine if I started a company and just like built it around Codex?
Is that a thing?
Starting a company and building it around Codex?
Yeah, yeah.
I actually think that'd be great.
Like it's a...
We like release like the Codex SDK and we like want people to be able to build on it and hack on it.
Yeah.
Actually, I think this might be what you're getting at, which is...
And this is like a kind of a unique thing about OpenAI and kind of reflects on how it's run, which is at the end, like at the end of the day, OpenAI is like an AGI company.
It's like an intelligence company.
Yeah, for sure.
And so agents are just like one way in which this intelligence can kind of be manifested.
And so the way that I'd say we actually think about it internally is all of our different product lines, Sora, Codex, API, ChatGPT are just different interfaces and different ways of deploying this.
So you don't really...
So there's no like single teams like this is, you know, like thinking about agents.
I would say the way that it manifests itself more is like each product area thinks about like what is, you know, this intelligence is actually turning into a form where like it can actually...
Agentic behavior is more possible.
What would that look like in a first party product like ChatGPT?
What would that look like?
That's actually why Codex ended up becoming its own product site.
What would it look like in a coding style product?
Like we explored it in ChatGPT, like kind of worked there, but like actually the CLI interface actually makes a lot more sense.
That's another interface to deploy it.
And then if you look about the API itself, it's like this is another interface to deploy it.
It's...
You're thinking about it in a slightly different way because it's the developer first mindset.
We're helping other people build it.
The pricing is slightly different.
But it's all these like different manifestations of this core like intelligence that is the agent behavior.
It is so remarkable how much of this entire economy is basically just token laundering.
In a sense, it's literally like anything I can do to get like English shit or like a natural language in and then like the intelligence out.
I mean, it's because things are so resistant to layering.
It's so hard to layer language out.
Like, you know, I could even do it pretty easily with like Codex.
I could just like use it as a component of a program and just basically launder intelligence.
I mean, of course, I'd be charged to do that.
So, my view of this and having seen now so many kind of launches of different products, I've seen agent launches and the definition that you have.
I've definitely seen APIs and I've seen products on these.
It's like they're actually quite different than like what we're used to.
Like the COGS is different.
The defensibility is different.
So we're kind of rewriting it.
And so it's kind of like, you know, you came from a kind of pricing background.
I mean, you're working on a model of pricing.
Now you have the API.
So I just love your thoughts on like, I mean, how have you evolved your thinking and how do you price these, you know, access to intelligence where, you know, you don't know how many people can use it.
It's almost certainly usage based billing, not something else.
Like, can you talk just a bit about like philosophy around pricing on these things?
It's different for product versus API.
Yeah, I think the honest truth here is like it's evolved over time as well.
Like I actually think the simplest, like the reason why we've done usage based pricing on the API, honestly, is because it's been like, it's closest to how it's actually being used.
And so that's kind of how we started.
I actually think usage based pricing on the API has like surprisingly held strong.
And like, I actually think this might be something that we'll keep doing for quite a long time, mostly because I'm just so high.
I don't know how you don't do usage based.
Yeah, yeah, yeah, yeah.
I just don't know how that.
Yeah.
And then and then there's also the strategy of like how we price it.
And internally, one thing we do is we always make sure that we actually price our usage based pricing from a like cost plus perspective.
Like we're actually just like trying to make sure that we're being responsible from a margin perspective.
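The cost-plus approach described here can be sketched as simple arithmetic: take a per-token serving cost, add a markup, and bill in proportion to tokens consumed. Everything in the block below is a hypothetical illustration; the `Rate` type, the cost figure, and the markup are made-up values, not OpenAI's prices or billing logic.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class Rate:
    cost_per_million_tokens: Decimal  # hypothetical serving cost, in dollars
    markup: Decimal                   # fraction added on top of cost, e.g. 0.25 for 25%

    @property
    def price_per_million_tokens(self) -> Decimal:
        """Cost-plus price: cost multiplied by (1 + markup)."""
        return self.cost_per_million_tokens * (Decimal(1) + self.markup)


def bill(tokens_used: int, rate: Rate) -> Decimal:
    """Charge in dollars for a given token count, rounded to the nearest cent."""
    amount = rate.price_per_million_tokens * Decimal(tokens_used) / Decimal(1_000_000)
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Made-up numbers purely for illustration.
rate = Rate(cost_per_million_tokens=Decimal("2.00"), markup=Decimal("0.25"))
```

`Decimal` is used instead of floats so that rounding to cents is exact, which matters once bills are aggregated over many requests.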
But this is a huge shift in the industry in general, just because I remember the shift from on-prem to recurring.
Yeah, that was a big, big deal.
Like that created Zuora, like created whole companies.
Yeah, yeah, yeah.
It's like whole books on it.
And like, yeah, yeah, yeah.
Consultants on how you do this.
It changed.
Yeah.
I think the shift to usage is as big or bigger.
And it's also a really hard technical problem.
Yeah.
I can't even imagine 800 million WAU.
Like how do you build?
Yeah.
Well, 800 million WAU is a little easier because it's not usage-based pricing.
It's subscription.
So it's like now it's like way easier.
That's way easier.
But I mean, there's still like a lot of users on the API that we need to like, you know, manage all the billing side.
There's some like overages or stuff you've got to deal with on that or?
What do you mean by overages?
Like, I don't know.
I guess people have quotas and then we'll kind of like, oh, I see.
They're like max quotas that we don't let people go over.
But like in practice, these quotas are like pretty, pretty massive.
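The "max quota" behavior described here, a hard ceiling rather than billed overages, might look roughly like the sketch below. `UsageMeter` and `QuotaExceeded` are hypothetical names, and a production system would also need persistence, concurrency control, and reset windows, none of which are shown.

```python
class QuotaExceeded(Exception):
    """Raised when a request would push usage past the account's hard cap."""


class UsageMeter:
    """Tracks token usage against a fixed maximum (hypothetical, in-memory only)."""

    def __init__(self, max_tokens: int) -> None:
        if max_tokens < 0:
            raise ValueError("max_tokens must be non-negative")
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, tokens: int) -> int:
        """Record usage and return the remaining allowance; reject if over quota."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.max_tokens:
            # Hard cap: the request is refused and usage is left unchanged.
            raise QuotaExceeded(
                f"request of {tokens} tokens exceeds remaining "
                f"{self.max_tokens - self.used}"
            )
        self.used += tokens
        return self.max_tokens - self.used
```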
That would literally be like one of the most complex systems somebody's ever built, to do usage based at like that scale.
I mean, these are very, very, very, and like you have to be correct.
Like these are very hard systems to scale.
Yeah.
Yeah.
Yeah.
I mean, we have a whole team thinking about this now internally.
Yeah.
I mean, usage-based pricing is also interesting.
So there's, um, uh, we acquired this company, um, uh, called Rockset a while ago, a founder, uh, his name is Venkat.
He's very good.
Awesome.
Venkat.
Awesome.
Incredible.
He's one of the best, like, Venkat, if you're listening.
We're huge fans.
I'm a huge fan.
He's going to, he's going to love this.
Um, so he's great.
He's a legend.
Anyways, I was talking to him about pricing as well.
And his take is, uh, that pricing is kind of like a one way ratchet.
And like, basically once you get a taste of usage based pricing, you're never going to go back to like the per-seat, the like per-deployment type pricing.
And I think that's definitely true.
And I think it's just cause it's getting, it gets closer and closer to like the true utility you're getting out of this thing.
The main pain point is like, you have to maintain all this infra.
Yeah.
To like get it to work well.
Yeah.
But if you do have it, he thinks it's like a one way ratchet where like there's just like no, no going back.
And then, and I think the hot new thing now is like, oh, with AI, you can now kind of measure like outcomes.
And so that's like another, you know, like step forward.
And if that works, like maybe it's a one way ratchet.
Um, so we, we thought about that as like, you know, is there some type of like outcome based pricing?
This is more on the first party side on an API.
It's kind of, yeah.
That's very hard.
I mean, that's hard because you end up having to price and value non computer science infrastructure, right?
Like you're literally going into verticalization now.
Like you're like, I mean, listen, if it's like porting a code base, maybe you'd have some expertise, but if it's like, whatever, like increasing crop yields.
Like some level you need to like, But there could be a world where like the AI is like, you know, where it can like actually, you know, make judgments of these and do it in an accurate enough way where it can tie to billing.
I think this is a problem with AI conversations because like at any point in time, you're like, but it could get good.
Yeah.
It's not a problem anymore.
Yeah.
Yeah.
At some point it'll be solved.
It's so much like the prompt engineering and the single AGI thing from before.
Yeah.
Yeah.
It's like when you, when you reach that level of, of when you push it that far, everything's kind of solved on outcome based pricing.
It sounds very appealing.
Like if it can work and it can work.
But one thing that we've started realizing is, um, it actually ends up correlating quite a bit with usage based pricing, especially with test time compute.
Like if the thing is just like thinking quite a bit, like actually, you know, if you charge just by usage, usage based and not outcome based, you're like basically approximating outcome based at this point.
Um, if the thing is like thinking for like so long, it's like highly correlated with, with what it's doing.
It's just adding more value.
Yeah.
Yeah, exactly.
Exactly.
And so like maybe at the end of the day, like usage based pricing is all you need and so we're just going to like, you know, live in this, in this world forever.
Um, but, uh, yeah, I dunno, it's, it's constantly evolving.
I think our thinking has evolved here as well.
Um, I, I personally am like keeping track of if, um, the outcome based pricing setups can actually work here, but at least on the API side, I think, you know, it's, it's such a usage based setup.
We have to get infrastructure around this.
And so I think we'll probably stay with that for a while.
So how do you think about open source?
I mean, you know, I think you're the only big lab that's releasing open sources.
I know Google has some of theirs.
Okay.
Yeah.
It's mostly smaller models on their side.
Yeah.
That's right.
Yeah.
So how do you think about open source vis-a-vis, you know, competition, cannibalization, you know, like what's the, what's the strategic goal?
What's the complexity?
Yeah.
Yeah.
Um, so, uh, I personally love open source.
Like I think, I think it's great that there's a, all of us grew up with it, right?
Yeah.
I grew up with it.
Like the internet wouldn't exist without it.
Like, you know, so much of the world is built on top of it.
Cloud wouldn't exist without it.
Yeah.
Nothing would exist without it.
Yeah.
Except for maybe Windows.
Yeah.
It's interesting because like, I felt like over the last, there was before we launched the open source model, I know Sam feels this way as well.
Yeah.
It's like, there's this like weird, like, you know, uh, mindset where because OpenAI hadn't launched anything.
It just seemed like it was super like anti, like OpenAI was like super anti open source.
Um, but I'd actually been having conversations with Sam ever since I joined about open sourcing a model.
We were just trying to think about like, well, how can we sequence it?
Compute is always a hard thing.
It's like, do we have the compute to kind of like train, train this thing?
So we've always wanted to kind of do this.
Yeah.
And so we were able to finally do it.
I think it was early.
Was it earlier this year?
I like lost some time.
AI time is so great.
Yeah.
I was like, was it last year?
No, it's this year.
Yeah.
When GPT-OSS came out.
Um, and so I was just really glad that we did that.
The way that I generally think about it is one, um, I think as a, um, this is also particularly true for, um, for OpenAI because as you said, we are a vertical and a horizontal company, is like, we want to continue investing in the ecosystem and just from like brand perspective, I think it's good.
But then also I think from OpenAI's perspective, um, uh, if, uh, the AI ecosystem grows more and more, it's like a rising tide.
It's always been like, yeah, it's all like really helpful for us.
Uh, and if, uh, if we can launch an open source model and it helps like unlock a whole bunch of other use cases in the other industries, I think that's, uh, you know, that's, that's actually net good for us.
Also, I'll say what, what people don't talk about a lot is like how well these open source AI business models actually work because like this is very like, like the cannibalization risk is actually very low.
Yeah.
Um, and like you don't really enable competitors a lot because I mean, when we say open source, you really mean open weights, right?
It's not like they can recreate it, right?
You know, and like, if I can distill your API as well as I can distill, like you giving me the weights in some way, like, and so like it doesn't really change that dynamic a lot.
Um, but yeah, I mean, to be, to be clear, like we have not seen cannibalization at all.
Yeah, of course not.
It's like, it's like, seems like a very different set of use cases.
Um, the customers tend to be like slightly different.
The use cases are very different.
Um, and by the way, it turns out inference is super hard to actually have the available fast performance.
That's a hard, hard problem.
Yeah.
So, so like I'd say the way that I personally think about open source in relation to the API business in particular is well, one, it hasn't shown cannibalization risk.
So, you know, I'm not particularly worried about that, but also like, especially for all these major labs, like there are usually like two or three models where like that is where you're making all of your impact, all of your revenue.
And those are the ones where we're throwing a bunch of resources into improving the model and these tend to be the larger ones that are like extremely hard to inference.
Yeah.
We have a really cracked inference team at OpenAI and so my sense is like, even if we just like, you know, open source, like if we just literally open sourced GPT-5 or something, it would be really, really hard to inference it at the level that we are able to get it to do.
There's also by the way, like feedback loop between the inference team, like the training team too.
So like we can kind of like optimize all of that.
Can you, can you, like, is it possible to verticalize models for products?
Like, train models specifically for products.
Yeah.
I mean to actually, yeah.
I think, I mean, we've kind of done this with GPT-5-Codex, right?
Or do you mean like even more verticalization?
I mean, like, like, deep, deep, deep verticalization where like, you know, like, like, like the, the released model wouldn't, you know, it's like actually part of a product.
I think we're like basically starting to move in that direction.
I think there's a question of how deeply you verticalize it.
I think most of what we've done is mostly at like the post training, like the tool use level, like Codex is particularly good at using the, sorry, GPT-5-Codex is particularly good at using the Codex harness.
But there's like even deeper verticalization you can do like that.
And that one I think is more of an open question.
Yes.
Well, like a lot of my, I mean, a lot of my mental model, this comes from the pixel space, which is like you, you know, you can LoRA a bunch of image models, right?
And you can, you can do a bunch of stuff to make it better, more suitable for some products, for example.
But like these open source models are really, really good.
And like, I, you would believe that you could like verticalize a model for like editing or cut and paste or this or that, you know, like that's actually part of this, but you actually don't see that happen.
Yeah.
It's almost always like you're just kind of exposing like a, a model, not something like specific to a product.
Yeah.
I think, I think, so I think there's a distinction to be made between the, like the, the image model space and the text model space.
Yeah.
And like, image models tend to be way smaller and like you can iterate on it a lot faster.
Like that's why you get that crazy cool proliferation of like image model side.
Whereas like, I don't know, for the text models, there's always going to be this like really big fat pre-training step that like you have to invest in.
And then even the post training side is like, you know, it's not the, it's not like the easiest thing.
Like it's, you know, we all wait, like just from a compute perspective, obviously it's much smaller, but like it's still pretty heavy to do like a full mid train or like a post training run.
And so I actually think like that's one of the biggest, bigger bottlenecks.
Because I think you're, you're, you are right that like on the image side.
Yeah.
You can like fine tune it.
Like image diffusion model to be like extremely good at like editing faces.
Yeah.
Like something very specific.
And then you build a product around that.
Yeah.
Yeah.
Yeah.
And it's like, yeah, you can just kind of put all these resources into it and iterate on that one specific model.
Whereas it's much, it's a much heavier motion.
It seems like on the text side.
I got to say it is a bit of an anti pattern to do both languages and like language based models and diffusion like pixel models in the same company.
Like most that have tried like it sounded very clunky to do it.
But I mean, you and Google are the two kind of counter examples for this.
And so like, is it possible to even like converge the infrastructures on these things?
Like, I mean, is it totally different orgs?
Is it shared infrastructure?
Like, yeah.
How do you operationalize?
Yeah, I think, I think you're totally right, it's an anti pattern.
It's pretty tough to pull off.
I think honestly, like props to Mark on our research team for like, you know, structuring things in a way where we're able to do it.
From my perspective, I think the biggest thing is I think our like image, like our, I think, all like the world simulation team, like the team that built Sora and all that under Aditya is just extremely solid.
Like they are probably, it's like the highest concentration of like talent that I've seen in a while.
But is it the same?
Is it the same?
Is it like, are they like totally separate infrastructure?
Do they use the same infrastructure?
Yeah, yeah, yeah.
So it's actually like pretty separate.
So and I think that's part of the reason why we're able to kind of do this.
Well, it's like one is like the team needs to be extremely strong, which they are.
And then two is they're run very separately.
They're kind of like thinking about their own particular roadmap.
They think about productization very separately as well.
All right, which is how like the Sora app kind of came out of that as well.
And then, yeah, even like the inference stacks are slightly different, are kind of like different.
They own a lot more around their inference stack and they optimize their inference stack pretty, pretty separately.
And so I think that contributes to helping us run things in parallel.
But it's pretty hard to pull off for sure.
Maybe you can educate me on this.
Like so I think about APIs as mostly text based for OpenAI.
Do you guys do actual, do you do pixel based stuff?
Yeah, yeah, we do.
We have a bunch.
So DALL-E, DALL-E is in the API.
The OG model, DALL-E 2, is in the API.
That was like the first real text-to-image.
All right.
Yeah, yeah, yeah.
Yeah, that was actually the model that got me to go to OpenAI because it was the summer when I was looking for, I was thinking about something new.
It's when DALL-E 2 came out and it just completely blew my mind.
And I distinctly remember I was like asking it to do the simplest thing like draw a picture of a duck or something.
It was like the simplest thing now and it just like it generated a picture of a you know, like a white duck.
And so that was actually the thing that kind of got me to OpenAI in the first place.
But yeah, we have a bunch in our API, the image gen model as well is in our API and then Sora 2 is in our API, we launched it at DevDay.
It's actually been a huge hit.
I've been very very surprised.
Need more GPUs for that.
But the amount of use cases.
And then from your standpoint, like you can converge that like the API infrastructure probably like that.
Yeah.
So there's yeah, I'd say on the API side, a lot of the infrastructure shared for those.
But once you reach the inference level, they're separate, right?
Because you got to inference them differently.
Yeah.
And it is that team that has just like been really laser focused on making that side particularly efficient and, yeah, work well, separate from the text models.
Yeah, yeah, we have image and we have video gen and we'll continue adding more today.
So it feels like we've been evolving our our our thinking as an industry on a bunch of stuff, right?
Like one of them for sure is like the models like we've talked about.
The other one is like context engineering.
It seems to me that like actually how you build agents and expose them has evolved too, so maybe you can talk a bit about that.
Yeah.
Yeah.
I think, so at DevDay this year when we launched our Agent Builder, I got a bunch of questions around this because the Agent Builder is like, yeah, it's like a bunch of different nodes and it's like a deterministic thing.
And I was like, oh, is this really like the future of agents?
And we obviously put a lot of thought into this when we were thinking about building that product.
But the way I think about it is do they came from a point of being constrained by the way, they're like, oh, this is too constraining and like, yeah, I think people are like, it's too constraining.
It's not like AGI forward, you know, like at the end of it again, at the end of the day, the AGI will do everything.
And so like, why not?
Why have nodes in this like node builder thing?
Just tell them what to do.
Yeah.
And so I think there's like two things at play here.
One of them is like there is a like practicality component.
And then the other thing is I think there are actually like different types of work that exist out there that could be automated into agents.
And so on the practicality side is yeah, like the models today just like maybe in some future world instruction following would be so good that you just like ask it to do this four step process.
And it like always does the four step process exactly.
We're still not there yet.
And in the meantime, you know, this entire industry is being born and a lot of, you know, people still want to use these models.
What can you build for them?
So there's a practicality component of it.
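The "bunch of different nodes" idea can be pictured as an explicit graph in which each node does one step and names the next one, so the control flow is fixed even if individual steps call a model. The sketch below is a generic illustration and not how Agent Builder is implemented; the node names are hypothetical and the model call is replaced by a keyword check so the example stays self-contained.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
# A node reads or updates the shared state and returns the next node's name,
# or None when the workflow is finished.
Node = Callable[[State], Optional[str]]


def run_workflow(nodes: Dict[str, Node], start: str, state: State,
                 max_steps: int = 50) -> List[str]:
    """Run nodes in the order they dictate and return the path that was taken."""
    path: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(path) >= max_steps:
            raise RuntimeError("workflow exceeded max_steps")
        path.append(current)
        current = nodes[current](state)
    return path


def classify(state: State) -> Optional[str]:
    # Stand-in for a model call: a keyword check decides the branch.
    is_refund = "refund" in str(state["message"]).lower()
    state["intent"] = "refund" if is_refund else "other"
    return "lookup_order" if is_refund else "escalate"


def lookup_order(state: State) -> Optional[str]:
    return "draft_reply" if state.get("order_id") is not None else "escalate"


def draft_reply(state: State) -> Optional[str]:
    state["reply"] = f"Refund started for order {state['order_id']}."
    return None


def escalate(state: State) -> Optional[str]:
    state["reply"] = "Routing to a human agent."
    return None


NODES: Dict[str, Node] = {
    "classify": classify,
    "lookup_order": lookup_order,
    "draft_reply": draft_reply,
    "escalate": escalate,
}
```

Because the edges are written down rather than left to the model, the same input always follows the same path, which is the practicality argument being made here.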
When did you launch that?
Dev day.
So it feels like forever ago.
Earlier this month, October, October 6 or something.
Yeah.
Yeah.
Yeah.
So less than a month ago.
Yeah.
Okay.
It's been crazy seeing the reception to it, by the way, like I think the video where Christina on my team demos Agent Builder is like one of the most viewed videos on our YouTube channel now.
I will say I will say just anecdotally from kind of my perspective.
People love it.
That's great.
But I also saw the dissonance too. When it came out, people were like, wait, what is this?
Yeah, exactly.
No code.
Yeah, exactly.
That's another low code thing.
Yeah.
And now people love it.
Yeah.
Yeah.
Yeah.
So there's a practicality piece.
There's another piece, which came up when we were talking to our customers. At the end of the day, a lot of this agent work is just trying to automate work, what people do in their day-to-day jobs.
And I realized there are actually two different types of work.
There's the work that we think about, which is maybe what software engineers do, which is very undirected.
It's a high-level goal.
And then, you know, you have your Cursor and you're just writing code.
And you're kind of exploring things and going towards an objective.
That's, I don't know, more like knowledge-based work. Data analysis and coding are kind of like this.
Yeah.
But then there's another type of work, which we realized is maybe even more prevalent in industry than software.
We're just not aware of it. It's work that tends to be very procedural, very SOP-oriented.
Customer support is a good example of this.
In customer support, there's very clear policy that these agents and people have to follow.
And it is actually not great for them to deviate from this and try something else.
The people running these teams just really want these SOPs to be followed.
And this pattern actually generalizes to a ton of different work.
A standard operating procedure?
Yeah, sorry.
So it's the way in which you need to operate the support team.
But this extends to marketing, this extends to sales, it extends to way more than it has any right to.
And what we realized is there's a huge need on that side to have determinism, for which an agent builder with nodes that helps enforce this ends up being very, very helpful.
But I think a lot of us, especially in Silicon Valley, don't really appreciate that there's a ton of work that actually falls into this camp.
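To make the "nodes that enforce an SOP" idea concrete, here is a minimal sketch in Python. It is not how Agent Builder is implemented; the `Ticket` type, the node names, and the keyword-based `classify_intent` stub (standing in for a model call) are all assumptions made for illustration. The point is only that the model contributes a narrow judgment at one node, while the order of steps is fixed in ordinary code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ticket:
    message: str
    order_found: bool
    trace: List[str] = field(default_factory=list)  # nodes visited, in order
    reply: str = ""


def classify_intent(message: str) -> str:
    """Hypothetical stand-in for a model call that labels the request."""
    return "refund" if "refund" in message.lower() else "other"


# Each node performs one step and returns the name of the next node.
def triage(t: Ticket) -> str:
    t.trace.append("triage")
    return "lookup_order" if classify_intent(t.message) == "refund" else "handoff"


def lookup_order(t: Ticket) -> str:
    t.trace.append("lookup_order")
    return "refund_reply" if t.order_found else "handoff"


def refund_reply(t: Ticket) -> str:
    t.trace.append("refund_reply")
    t.reply = "Your refund request has been logged."
    return "done"


def handoff(t: Ticket) -> str:
    t.trace.append("handoff")
    t.reply = "Routing you to a human agent."
    return "done"


NODES: Dict[str, Callable[[Ticket], str]] = {
    "triage": triage,
    "lookup_order": lookup_order,
    "refund_reply": refund_reply,
    "handoff": handoff,
}


def run(t: Ticket, start: str = "triage", max_steps: int = 10) -> Ticket:
    """Walk the graph; the path is decided by code, not by free-form generation."""
    node = start
    for _ in range(max_steps):
        if node == "done":
            return t
        node = NODES[node](t)
    raise RuntimeError("workflow did not finish")
```

Calling `run(Ticket("I want a refund", order_found=True))` visits `triage`, `lookup_order`, and `refund_reply` in that order every time, which is the kind of repeatability a support team asking for SOP compliance would want.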
I got to say like there's a pattern that's similar to this.
I'm wondering if you've seen it, but I've seen where some regulated industries actually can't let any generated content go to a user.
Yeah.
And so what they do, and I think it's so interesting,
is they'll pass in, like, a conversation tree, and it's like, you can choose something from here.
Yeah.
So there's some human element to it.
So it's part of the prompt.
They're like here are the viable things you can say, choose which one to say.
So the language reasoning is done by the model, but nothing generated comes out.
Interesting, interesting.
Does that make sense?
Yeah, yeah, yeah, yeah.
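The pattern described here, where the model reasons over language but only pre-approved text reaches the user, can be sketched roughly as follows. The catalog contents, the `respond` helper, and the `choose` callable (a placeholder for whatever model call is used) are hypothetical names introduced for illustration, not any vendor's API.

```python
from typing import Callable, Dict

# Hypothetical pre-approved catalog; a compliance team would own the real text.
APPROVED: Dict[str, str] = {
    "BALANCE_HELP": "You can view your balance under Accounts.",
    "ESCALATE": "Let me connect you with a licensed representative.",
}
FALLBACK_ID = "ESCALATE"


def build_prompt(user_message: str, catalog: Dict[str, str]) -> str:
    """List the viable replies in the prompt and ask for an ID only."""
    options = "\n".join(f"{key}: {text}" for key, text in catalog.items())
    return (
        "Pick the single best reply for the user. "
        "Answer with the reply ID only.\n"
        f"Replies:\n{options}\n"
        f"User: {user_message}"
    )


def respond(
    user_message: str,
    choose: Callable[[str], str],
    catalog: Dict[str, str] = APPROVED,
) -> str:
    """Return catalog text only; model output is never shown to the user."""
    raw = choose(build_prompt(user_message, catalog)).strip()
    reply_id = raw if raw in catalog else FALLBACK_ID
    return catalog[reply_id]
```

Because the return value is always looked up in `APPROVED`, a model that ignores the instruction and writes free text simply triggers the fallback reply instead of leaking generated content.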
And then another one I've seen is, like, actual pseudocode.
They'll pass in, like, a Python function.
And then they'll ask a human to use the pseudocode to write actual code that makes it in, or?
It actually has a response catalog as part of it.
And it has like the logic to apply.
Interesting.
And so like the model takes the language in from the human user.
And then, well, the logic of how to respond is in Python code.
It just turns out that like there's been a lot of code written for these types of things.
And then it actually includes the responses that you would send out.
Does that make sense?
Actually a lot of NPCs are done this way.
And they're done in PC.
So yeah.
So because the way that I think about it is like, you know.
So with the NPCs, the actual code being generated by the model is not what ends up making it to the end user?
Just to the...
No, the code is not being generated by the model.
It's the prompt has the code.
So let's say that I have an NPC.
And I want the NPC.
Like let's say you're the gamer.
And so you're coming in and you're talking to my NPC.
But my NPC has some logic that it needs to do.
Like if you say a certain thing, I'll give you a key.
Or maybe a little barter.
Describing the game logic in English just doesn't work,
actually, if you try and do it.
And then actually scripting the output doesn't work either if you need to use it in a game context.
Like, you'd have to, you know, give a specific direction or specific this or that.
So how do you make these things behave in a more constrained way?
People will pass in functions.
Like they'll actually describe the logic in Python.
So like my prompt will be like you're an NPC in a video game.
The user just asked you a question.
Here's the logic you should go through.
If the user says this then do this.
It's like the pseudocode.
Like if the user has this in the belt do this.
Like whatever whatever whatever.
And then here are the set of valid responses.
And so you're almost constraining.
I see.
And then when it actually does do a response you can validate that it's one of those responses.
I see.
It's like highly structured.
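A rough sketch of the NPC prompt described above: the game logic is written as Python, pasted into the prompt as text, and the model's answer is checked against the set of valid responses. The logic body, the action names, and the `model` callable are assumptions made for illustration; the embedded function is never executed, it only serves as an unambiguous description of the rules.

```python
import json
from typing import Callable, List

# Game rules expressed as code and included in the prompt as plain text.
NPC_LOGIC = '''
def npc_turn(player_said_password, player_has_coin):
    if player_said_password:
        return "GIVE_KEY"
    if player_has_coin:
        return "OFFER_BARTER"
    return "REFUSE"
'''

VALID_RESPONSES = {"GIVE_KEY", "OFFER_BARTER", "REFUSE"}


def build_npc_prompt(player_message: str, inventory: List[str]) -> str:
    return (
        "You are an NPC in a video game. The player just spoke to you.\n"
        "Here is the logic you should go through:\n"
        + NPC_LOGIC
        + f"\nPlayer inventory: {json.dumps(inventory)}\n"
        f"Player message: {player_message}\n"
        f"Reply with exactly one of: {sorted(VALID_RESPONSES)}"
    )


def npc_act(
    player_message: str,
    inventory: List[str],
    model: Callable[[str], str],
) -> str:
    """Ask the model for an action and reject anything outside the valid set."""
    action = model(build_npc_prompt(player_message, inventory)).strip()
    if action not in VALID_RESPONSES:
        raise ValueError(f"action outside the allowed set: {action!r}")
    return action
```

The validated action can then be handed to the game's own item or barter systems, so the traditional program stays in charge of what actually happens.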
Yeah.
So the NPC, the space that it can act in is still only within the space of the program that you give it.
Yeah.
Well the logic is in there.
So we can have a normal conversation.
But inasmuch as you're trying to guide the logic for game design or game logic.
So you see this with NPCs but you also see this with regulated industries.
I literally can't have it like.
Yeah I was gonna say what you described kind of sounds like giving the SOPs to like your set of human operators to like.
Yeah yeah yeah.
Stick to it please.
Yeah you must say these three things and here's like the discussion.
And like you cannot give a refund if it's like less than this amount.
Yeah yeah yeah.
Very interesting.
I mean, yeah, I don't want to equate them to NPCs, but this is similar.
I'm just saying, if you really want to guarantee what happens,
there's a set of techniques that you use.
And like there's some situations where you want to constrain what they do.
It could be from a regulatory standpoint it could be because you want it to run for a long time.
And it also could be because I actually have game logic and my game logic is a traditional program.
Like I have like a monetary system.
I have an item system.
I have a battle system.
Like you can't describe that in English.
Like you have to kind of give it to them so it can behave within that.
Yes and that is exactly the problem I think we're trying to solve here.
Right.
So if you do not give it any of this, it can just kind of go off and do whatever.
And yet there are regulatory concerns around this.
And that is the exact use case that I think we're trying to target with Agent Builder.
That's awesome.
Well listen we're running out of time and there's a million more things I want to ask you.
But I really appreciate you taking the time to come in.
It was great surveying what's going on.
And particularly teasing apart horizontal versus vertical in this phase.
Yeah.
That's what I really wanted to do.
So thank you so much.
Yeah.
Thank you.
Thanks for listening to this episode of the A16Z podcast.
As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund.
Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments, please see a16z.com/disclosures.