
Training Data · 2025-06-10
OpenAI Codex: From Autocomplete to Async Agents
Hosts: Unknown
Guests: Hanson Wang, Alexander Embiricos
Why it matters
Codex runs in its own cloud container with a terminal.
Key claims
- Codex-1 is o3 with additional reinforcement fine-tuning: same base model, specialized training for professional SE output (PR descriptions, codebase style adherence, proof of testing).
- Codex runs in its own cloud container with a terminal; receives a task and returns a completed PR, designed as a 'thought experiment' in delegation rather than pairing.
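- Onboarding focuses on an 'abundance mindset': the team treats a new user running about 20 tasks in a day or an hour as a sign they have understood the tool, and hard tasks are often launched several times to pick the best output. One highlighted story: an animation bug was fixed at 1 a.m. before launch by running the same task four times, with one of the four rollouts producing the fix.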
- Training and production use identical compute environments, eliminating 'works on my machine' problems and enabling agents to learn from realistic, messy real-world repos.
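The "launch the same task several times" habit from the claims above can be sketched as a best-of-n loop. This is a minimal Python sketch under stated assumptions, not Codex's API: `attempt` and `verify` are hypothetical callables standing in for "run the agent once" and "check the result" (for example, by running the test suite), and `fake_attempt` is a stand-in so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def best_of_n(
    attempt: Callable[[int], str],
    verify: Callable[[str], bool],
    n: int = 4,
) -> Optional[str]:
    """Run `attempt` n times in parallel; return the first verified result.

    "First" means lowest attempt index, not fastest to finish.
    Returns None if no attempt passes `verify`.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map preserves input order, so results[i] belongs to attempt i.
        results: List[str] = list(pool.map(attempt, range(n)))
    for result in results:
        if verify(result):
            return result
    return None


def fake_attempt(i: int) -> str:
    """Hypothetical stand-in for one agent rollout: only attempt 2 succeeds."""
    return f"patch from attempt {i}" if i == 2 else "no fix found"


def is_fix(result: str) -> bool:
    """Hypothetical verifier; a real one might run the project's tests."""
    return result.startswith("patch")


if __name__ == "__main__":
    print(best_of_n(fake_attempt, is_fix, n=4))
```

As in the 1 a.m. story, most attempts can fail as long as one passes verification; the cost is extra compute, which is the trade the 'abundance mindset' accepts.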
Episode summary
Hanson Wang (researcher) and Alexander Embiricos (product lead) from OpenAI's Codex team join Training Data to discuss the evolution of Codex from the 2021 line-autocomplete model (which powered GitHub Copilot) to today's cloud-based asynchronous coding agents. They explain why they 'resuscitated' the Codex brand and how the new Codex agent operates in its own containerized cloud environment with a terminal, returning completed pull requests to the developer.
The core model insight is that Codex-1 is essentially o3 with additional reinforcement fine-tuning, not a brand-new architecture. The key differentiator is alignment with professional software engineering practices: writing good PR descriptions, matching codebase style, and showing proof of testing. Hanson contrasts this with o3's strength in competitive programming, noting that competitive-coding prowess doesn't translate to mergeable enterprise code without this specialized training.
The team articulates a clear paradigm shift from 'pairing' with AI in-flow (like Copilot autocomplete) to 'delegating' tasks to agents working on their own computers. This requires an 'abundance mindset' — running many tasks in parallel, sometimes launching the same task multiple times to pick the best result. Top internal users at OpenAI are producing 10+ PRs per day. The long-running nature (up to 30+ minutes) introduces new UX challenges around helping users specify intent for tasks whose requirements they may not fully know upfront.
Looking forward, both guests expect the number of professional software developers to grow rather than shrink, as lowered barriers create demand for more bespoke software. They envision a future where ChatGPT serves as a single general assistant, with specialized interfaces (like Codex) for power users, multiple OpenAI agents (Operator, Deep Research, Codex) collaborating, and developer workflows eventually resembling a TikTok-like vertical feed of agent-generated suggestions to swipe through.
- Codex is among OpenAI's longest-running agents (30+ minutes per task); key UX challenge is helping users specify intent for tasks whose full requirements they don't yet know.
- Vision: ChatGPT becomes a single general assistant, with Codex as a specialized surface for developers; multiple agents (Operator with browser, Deep Research, Codex with terminal) will eventually collaborate like a human teammate with multiple tools.
- Expectation that the number of professional developers will grow substantially as bespoke software demand explodes, not shrink.
- Project codename 'WHAM' was chosen specifically to be easy for agents to grep — intentional codebase design (typed languages, tests, unique identifiers, good docs) makes code more agent-addressable.
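The point about grep-friendly names can be illustrated with a small search. This is a hedged sketch, not the team's tooling: the file names and contents below are invented for the demonstration, and only the codename 'WHAM' comes from the episode. It counts matching lines the way a recursive grep would, to show that a distinctive identifier yields a narrow set of hits while a generic word matches widely.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def count_matching_lines(root: Path, needle: str) -> int:
    """Count lines under `root` that contain `needle` (case-sensitive)."""
    total = 0
    for path in root.rglob("*"):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += sum(1 for line in text.splitlines() if needle in line)
    return total


def demo() -> Tuple[int, int]:
    """Build a tiny hypothetical repo and compare a unique vs. generic term."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical file contents, invented for illustration only.
        (root / "agent.py").write_text(
            "WHAM_MAX_MINUTES = 30\n\ndef run(task):\n    return task\n"
        )
        (root / "utils.py").write_text(
            "def run_task(task):\n    return task\n"
        )
        unique_hits = count_matching_lines(root, "WHAM")
        generic_hits = count_matching_lines(root, "task")
        return unique_hits, generic_hits


if __name__ == "__main__":
    print(demo())
```

An agent searching for the unique codename lands on one line; searching for a generic word returns every incidental use, which is more for the agent to read through.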
Source material
Transcript
In my opinion, the easier it is to write software, then the more software we can have.
Right now, we think of, I bet you if we pull up our phones, well, you folks are investors, but if you're not an investor, I bet you if you pull up your phone, most of the apps on it are apps that are built by large teams for millions of users.
And there's very few apps that are built just for us in the specific thing that we need.
And so I think as it becomes more and more practical to build bespoke software for people or teams, we'll end up having higher and higher demand for software.
[MUSIC] Welcome to Training Data.
Today, we're joined by Hanson Wang and Alexander Embiricos from OpenAI's Codex team for a fascinating look at the future of software development.
Codex is OpenAI's series of AI coding tools that helps developers delegate tasks to cloud and local coding agents.
Unlike the original OpenAI Codex, which was developed in 2021 to autocomplete lines of code, the latest evolution of Codex can complete entire tasks for you autonomously in the background.
The key difference between o3 and Codex is that while o3 is great at competitive programming, Codex has been RL-tuned to be great at day-to-day enterprise development tasks.
Alexander and Hanson share more about the backstory for Codex and the broader paradigm shift from snappy autocomplete to longer-running background agents.
Plus, they share their surprising vision for how developers will interact with AI in the future as sync and async experiences merge.
Hint, it might look more like TikTok than your current IDE.
[BEEPING] Thank you guys for joining us.
It's wonderful to have you here.
Hey, thanks for having us.
Great to be here.
We'd love to hear a little bit more about what you guys work on.
Tell us about the Codex team and your story.
Cool, yeah.
I'm Hanson, I'm one of the researchers that helped train the Codex-1 model.
And I'm Alex, the product lead.
I think for me, the name Codex is such a great callback to the original Codex model.
That was kind of like an aha moment for me when it first came out, because I think GPT-3 was really cool.
But then Codex was like the first moment where I felt like, wow, this can really do something that is going to change the world.
And it's actually kind of like how I got into the whole startup space.
And one of the first couple demos I did was using Codex to do data analysis.
I think it's actually a funny story.
I was here as part of Sequoia's Arc program.
I talked about Lauren.
And then in the demos we did, we actually used OpenAI Codex to do data analysis.
And that's how I started in the startup space.
And I think as time went on, as the later versions of GPT came out, it became super clear that using AI for agentic use cases was going to be the future.
And so I joined the company to work on agentic coding efforts.
Yeah, and this was, as per standard OpenAI style, where we like the naming to be as easy to follow as possible.
This is the Codex of like, I think it was 2021.
Yeah, this is pre-ChatGPT, right?
Exactly, yeah.
So it was actually like the model powering GitHub Copilot.
And then recently, as we were working on this product, which we'll talk about, we thought this is like a super fun brand, also a very apt name, code, codeX, code execution.
So we decided to sort of resuscitate the brand and keep using it.
You said resuscitate.
So was Codex dormant for a while?
And then you all resuscitated it for the-- We haven't used the brand recently.
OK, really cool.
Can you tell us a little bit about Codex, the agent, and what it does?
Yeah, I think so.
Basically, Codex is a coding agent that has its own container and its own terminal, kind of like fully in the cloud.
You give it a task, and it comes back to you with a PR in this sort of one-shot style.
And we actually experimented with a lot of different form factors kind of along the way, but kind of in the end decided to settle on this one.
Yeah, so we've been working on a bunch of agents, and we've been working on a bunch of coding products as well.
And basically, in our mind, Codex is like this thought experiment for how it would work to code with AI, but where we sort of put all our effort into thinking about what that would feel like if the AI is working on its own computer independently from you.
And so you're delegating to it rather than pairing with it.
And so some of the things that we're really proud of with this Codex launch are thinking about the compute environment and how we set it up so that the agent can actually work on its own but be productive, and creating the model, which I have to talk more about.
Basically, a model that isn't just good at writing code that looks good or is functional, but is also really good at writing code that is useful for professional software engineers and, ideally, mergeable without them even touching their own computer.
So what is the difference between Codex and Codex CLI?
Yeah, we've definitely gotten some questions about that.
I promise this is all going to make even more sense over time.
So basically, Codex for us is like our brand for agentic coding.
And we have this vision of like, you know, we're going to have this agent, and mostly the agent will work on its own computer, but it should also be able to meet you in any of the tools that you use, wherever you work, be that your terminal or your IDE or your management tool.
So Codex CLI is basically like Codex in your terminal.
So CLI stands for command line interface, right?
So it's like in your terminal, you can work with Codex.
That's like your environment.
And then Codex, or Codex in ChatGPT, is basically Codex working on its own computer.
Today, those are just distinct things.
As a brief aside, one of my favorite things about working at OpenAI is how willing we are to cut scope and just like launch things quickly.
But over time, we'll actually bring those things closer together.
So you can really think of it as just like, it's just Codex and, you know, it can be in ChatGPT or it can be in your CLI.
Very cool.
And so what did you have to do differently for the model to make it useful beyond just writing the next line of code?
Yeah, so I think one of the most interesting progressions.
So if you go back to, you know, o1, the first reasoning model that we launched, we highlighted how good it is at math and even coding competitions.
Like, I used to be a competitive coder, and as of now it's better than me at competitive coding; it's better than almost all people at OpenAI at that.
But I think one of the things that we saw was that, you know, despite being good at these programming competitions, it wasn't actually that good at producing mergeable code.
And so we even highlighted this in the blog post: with models like o3, the code that it generates often, you know, isn't quite to the taste or style that a professional software engineer would expect.
So a lot of the effort that we spent on training this model was aligning the model to basically like the taste or the preferences of professional software engineers.
And that's something that took a lot of, I guess, specialized training.
Yeah, I have this very producty analogy that I like, which is, if you take our reasoning models, they're great at coding, but it's kind of like this really precocious competitive programmer college grad who doesn't have many years of job experience being a professional software engineer on a team.
Right.
And so a lot of the work we did to go from o3 to Codex-1 was actually like the equivalent of those first few years of job experience, where it's like, hey, what does a good PR description look like?
You know, PR titles, like how do you read the style of the code base and then make sure your code is in the same style?
How do you test well?
How do you show that you tested well? Stuff like that.
What's typically the aha moment for when somebody uses Codex?
Yeah, I think one of the things we have in the onboarding is like find and fix a bug in the code base.
I think that's one of the areas where Codex really shines, specifically bug fixing, just because it can independently try not just to see if something looks a bit off, but it can actually go and verify that: okay, I can try and reproduce a particular issue.
And so I think even, you know, leading up to the Codex launch, there were a couple of bugs where we were sitting there kind of wondering what's going on, and honestly, sometimes the easiest thing to do is just paste a description of the issue into Codex.
And we were surprised how frequently it could actually end up with a usable fix.
Yeah, like fun story here.
Hopefully this doesn't give away too much.
But at 1 a.m. the night before launch, or the morning of launch, we were looking at a bug with like an animation, a lot of animation.
And, you know, this is the kind of thing like, okay, I guess we could cut it from launch scope.
It'd be okay to launch without it.
But we really wanted to get in and we just couldn't figure this out.
And so an engineer ended up like describing what the bug was and putting it into Codex and actually like a fun pro tip for anyone who's like using Codex is that if there's a really hard task, it can be useful to ask Codex to like take multiple cracks at it.
So they pasted that description in and ran it four times like, hey, there's this bug.
We can't figure out what's going on.
And three of those rollouts did not work.
And then one of the four was just like the fix of the bug that we were stuck on for like hours at 1 a.m. before launch.
And so landed the fix, you know, deployed the code and the animation was in for launch.
That's awesome.
Maybe tell us more about how you all are using it internally at OpenAI.
Like is every engineer, is every researcher using Codex now in their workflows?
Yeah.
And actually, can I give you the other like kind of magic moment?
Oh, yeah, please do.
Definitely.
So like one of the interesting things about Codex is that it's a very different form factor from maybe what people are used to.
Like a lot of the products that people are used to, especially in software, maybe GitHub Copilot was the first really good one.
They're really things that kind of work with you in flow, and you're just kind of seamlessly going back and forth.
You're kind of pairing, and it's flavors of pairing.
And we think that's awesome.
And like the Codex CLI is a tool that you can use in that way.
But for Codex, you know, we really wanted to push this idea that you're delegating, because in the future, we imagine that the vast majority of coding is actually going to be done independently from the human working on their computer, who can only do one thing at a time.
And so, you know, it'll be done.
But basically, it'll be done by agents working on their own computer.
And so that is a very different thing to delegate to an agent than it is to pair with sort of an AI model that's like in your tooling.
And so you have to kind of use it differently.
And so when we actually were working on an alpha before launch, we would just give this agent to people and be like, hey, like just use this however you want.
And we noticed that many, many of the people trying to use our alpha of Codex were just like not really finding it super useful.
And then we're like, that's interesting.
Let's look at how people like at OpenAI are using like internal tooling like Codex.
And we realized there was like a big difference, which is the mindset of using it.
The mindset that works really well for Codex is like kind of this like abundance mindset and like, hey, let's try anything.
Let's try anything even multiple times and like see what works.
It saves me time.
And so we've kind of shifted the way that we even onboard people into the product to try to create this aha moment, which is running many tasks in parallel.
So like for us, if we see someone trying it out and they've run like 20 tasks in like a day or an hour, that's amazing.
And like, they've basically understood how to use the tool.
Fascinating.
How does that change the role of the human when you have to review all of this code?
Like if two of the three work, then what do you do?
Yeah, I think we put a lot of focus on also making the outputs easy for people to review.
So like one of the things that we're proud of, and we haven't seen this in too many other tools, is the ability for the model to cite its own work.
So not just like the files that are changed, but also even like the terminal output.
So like if it ran a test and, you know, for some reason the test wouldn't work.
It actually tells you that, and it tells you, here's the exact terminal command I ran.
Here's the output. That makes it much easier to verify the outputs.
But it is a great point.
I think we're shifting to a world where a lot of the time that we spend, you know, normally coding, a lot of that is going to shift to actually reviewing the code.
Do you need humans to review the code?
Because I think of code as one of those things where, you know, it compiles or it doesn't.
And once it compiles, you can go and check if it does the thing it was supposed to do.
Like, do you even need humans to do the code review?
I think, yeah, I mean, for the foreseeable future, at least, I do see that to be the case.
I mean, I think a lot of it's also just like building trust with the early users.
I think people really need to have a feeling for like, you know, what things are working well, what things are not.
And I think there's always just some external context about, you know, what makes this code correct that might be beyond what you initially provided as context.
Yeah.
Like if you think of what, you know, what a developer does and this is obviously oversimplifying, but there's like, OK, there's coming up with what things maybe should be done, discussing them with the team, maybe deciding what to do.
You call that ideation, you know, maybe then there's design like, OK, what are we actually doing and then like planning, how are we going to do it?
Then there's implementing and then validating, you know, testing those changes.
And that's basically a loop, and that small loop of implementing and then testing is what Codex is great at right now, although we can talk about how you can use it for planning too.
And then there's actually deploying the code and then maybe maintaining the code, writing documentation, et cetera.
And so, you know, I forget the exact stat, but the one that I remember recently is that engineers spend maybe like 35 percent of their time coding.
It's not actually the majority of even what engineers do.
And so, you know, the future that we're trying to build towards is one where, if you're a software developer, or even in any profession, all the work that is easily automatable, that's usually the grungier type of work, you're not doing; you're delegating that.
And then the work that is more interesting because maybe it's ambiguous or maybe because it's really hard, that's the work that you're driving.
So we're trying to build towards that world.
And I think we have to get there iteratively.
So, for example, right now, if you're a human and you write code, another human is going to review that code.
Right.
And so we're not going to come in and just like try to change that.
And we're like, OK, let's plug into that.
So, you know, the way the product works right now is like you, the developer are being accelerated by the tool.
You ask for some code to be written.
You decide if it's good and you want to push it out to your team and then your team can review it.
And then over time, we'll basically kind of expand what we can do.
So we'll help more and more with like planning, maybe even designing, maybe even thinking about what to do in response to things that are happening in your app or at work.
And then we'll push to make review easier and easier, as Hanson was describing.
Yeah.
And I do think I see a future where you have, you know, like multiple agents collaborating together.
So you have, you know, the Codex agent write the code, and then maybe the Operator agent is the one that's testing it.
And all the different agents that we've been working on at the company can kind of come together.
That's awesome.
Now that you can delegate writing code, have you seen people beyond the engineering team start to use Codex? It's the beginning of the world of vibe coding.
You guys are helping bring us further down that hole.
Yeah, this is actually super funny.
We were so the answer is yes, but I'll tell you a story.
We were working on our launch blog post with Lindsay here, and we were talking about like what quotes to, you know, to quote from customers.
And we had a customer that wanted to say, yeah, like we on the engineering team love this and also it's like a power tool for PMs.
And I remember looking at that quote and being like, this is a really cool quote because I'm on the product team and I use it to just like avoid having to bug an engineer about things or to answer questions.
But I remember looking at that quote and being like, do we want that in the launch blog post because the target audience for what we're building is like specifically professional software engineers, not vibe coders.
So I think we ended up not including that exact line.
But I think over time, as we have agents that can help us code, I would expect more and more people to be able to contribute to codebases.
Yeah.
Do you think the number of professional software developers goes up or down over time?
This is just my opinion, but I think it goes way up.
Huh?
Not vibe coders, professional software developers?
Yeah, yeah, I think so.
But, you know, yeah, in my opinion, the easier it is to write software, the more software we can have.
Right now, I bet you, if we pull up our phones, well, you folks are investors, but if you're not an investor, I bet you if you pull up your phone, most of the apps on it are apps that are built by large teams for millions of users.
And there's very few apps that are built like just for us and the specific thing that we need.
And so I think as it becomes more and more practical to build like bespoke software for people or teams, we'll end up having higher and higher demand for software.
Yeah.
As I think about how I use it, I think it just really is a multiplicative factor right now, rather than any sort of replacement.
Especially looking at the patterns of our internal power users, there's a really dramatic difference: the top users of Codex are doing, you know, like 10 plus PRs every day.
And it's just like really such a multiplicative factor that I can't see like a world in which like it's like lowering the bar to creating software so much.
That said, I mean, I think this is a really important question.
And to be completely honest, like we don't know.
And so this is something that we as a company pay a lot of attention to.
I want to talk a little bit about what's happening under the hood on the technology side.
As you mentioned, one of the things that makes the model different from competitive programming is that you've made it good at the things that a professional software developer would do; is that the biggest difference on the model side?
Or should we think of that as, you know, a close cousin of o3?
Yeah.
So it's definitely the same model as o3 with additional reinforcement fine-tuning.
But that's it?
Yeah, I think so.
Part of it is kind of like these more like qualitative aspects of what makes a good software engineer versus simply like a good, let's say, like coder, you know, like style, even like how it writes comments.
I think that's one of the things that people have noticed with other models.
And then on top of that, I also want to highlight one of the big challenges was like making good environments for the agent to kind of learn in.
And so if you think about like real world software repositories, it's like so varied and complicated.
Like think about like how much DevOps has to go into like setting up a repository and that's something we're kind of like learning the hard way with our environment setups.
But should we talk about the multi repo I was showing you yesterday?
Oh, yeah, like I was showing Hanson the repo for the startup that, you know, OpenAI acquired, and so we joined.
And so we were looking at that repo together, thinking about it for use as an environment and Hanson's like, so like, where are the unit tests?
Because the agent uses unit tests to verify it.
And I was like, this is a real startup that has no unit tests.
So I can't complain.
So yeah, like you have all these like really messy environments.
So yeah, over the course of training, we have to basically generate these really realistic environments for the agent to learn from.
And I think one of the reasons that we're able to make such an end-to-end product work is that the environments we use during training and the containerization infrastructure we use to serve in production are the same.
So, you know, we're running our own compute environments.
When users use Codex, they're running in the exact same environments that we're using in training.
So you don't have the agent saying, but it works on my machine.
Exactly.
I think these are also the longest-running agents I've seen out of OpenAI; Deep Research was maybe the previous one that was longest running.
And my understanding is, you know, Codex can sometimes spend 30 minutes on different tasks.
Are there any kind of surprising challenges and things you've encountered just getting inference time to scale up on, you know, query for so long?
Maybe I'll start with the product side.
And then there's many on the model side.
But on the product side, actually, the thing that I think the most about is like user intent.
It's like, actually, you know, if you imagine someone using like autocomplete in their IDE, it's like, not super hard necessarily, and obviously difficult, but it's not super hard to predict like, what are they trying to do like right now for the next like microsecond.
But for a task that takes 30 minutes, it's actually fairly difficult to help a user describe the task; they may not even know exactly what they want for 30 minutes' worth of work.
And so something that we spent a while debating, and it's still a thing we debate, is what is the right granularity of a task for someone to give to Codex?
And like, how can we make it easy so that Codex can be really flexible, where you can use it for one-line changes, you can use it for big refactors where you know exactly what you want, or larger features where you know what you want?
Or maybe can you use Codex when you don't know exactly what you want?
And so maybe you should ask Codex for a plan, and then you can have Codex suggest tasks and then do those tasks afterwards.
So that's still a topic of debate.
Yeah, iteration for us.
Yeah, I think that's actually a good pro tip for using it.
It's actually like really good at coming up with its own plans.
And then you know, sometimes it's really tedious to specify everything you want upfront.
And that's kind of one of the unique challenges: if you want it to work for, you know, an hour at a time, you kind of do have to specify a lot upfront, which means that you have to spend, I don't know, 10 or 20 minutes coming up with that.
But you can actually use the ask mode to first, you know, generate a high-level plan of what you want to do, and then you can iterate on that with the model before you, you know, send it off for an hour.
It really is like working with an intern.
Yeah.
What about on the model side, anything that's surprising in terms of model behavior as it starts to run for so long?
Yeah, I think our models have gotten a lot better at kind of sticking on task, especially with these longer rollouts.
I will say there are cases where, you know, there is a limit to the model's patience, even though it's quite high.
So it can be frustrating sometimes, you know, it goes off for like 30 minutes.
And then, you know, and this is a case that we're working to get better at, it's kind of like a human: it comes back to you.
It's like, sorry, this is too much.
I don't have enough time to do this.
Actually, like that's one of the things it says.
Very, very human like, yeah.
Yeah.
I'm curious how you think about the right interaction patterns and how they evolve and how the suite of products around this evolve over time.
We have Codex, we have Codex CLI.
What else do you think is out there in the design space for engineering and building products?
Yeah, so the Codex as we launched it is really just like, you know, it's a research preview.
It's a thought experiment, a useful one, but it's still very early.
And what we're most proud of with Codex is the model and, you know, the beginning of this like foundation for computer environments.
And the UI we shipped is one that we iterated towards, and there's some fun stories there.
But it's definitely not the final form factor.
And for those listening, basically the UI we shipped is an interface in ChatGPT where you can submit a task and ask Codex to either answer your question or write code.
And then you kind of have this like something that looks a little bit like a to do list of like things that you can like go look at merging.
Really, we built that to lean hard into this idea of an asynchronous agent that you delegate to.
But what we want to build towards is a setup where you don't have to think about whether you're delegating or whether you're pairing with an agent.
And it should just feel like working with a teammate, where that teammate is ubiquitously present in all the tools you work with.
So you should be able to pull up any tool that you're working in, be it your terminal, your IDE, your issue management tool, maybe your alerting tool, you know, the tool that shows you errors, and just ask for help.
Maybe even Codex has already taken a look before you even got there and it has like an opinion there.
And you should be able to ask something, be it a short question or a long question, and it'll just appropriately decide how much time to spend before answering you and help you land those changes.
So basically, we want to kind of blend this idea of like pairing and delegation.
But the first thing we shipped was just like the purest thought experiment.
The other thing I'll add to this is, one of the unique things about working at OpenAI is that we are the makers of ChatGPT, which is sort of, you know, the AI that most people use.
And so we don't actually see a future where, as you go about your day, you're deciding whether to use the Codex agent or, I don't know, your shopping agent or the taxi ordering agent (by the way, I'm just naming random things here) or your marketing agent.
Actually, the way we think this should work is you should just have like one assistant that you talk to, and you can ask it anything about anything, and it can just like, do the things you need.
And so that's, you know, that's ChatGPT; that will become our assistant.
And then if you're a power user of a certain type of tool, so let's say you're a software developer, you spend a lot of time in certain functional tools, then you can go into that tool and have like a bespoke interface with buttons with lists that you can use to like efficiently go about your day.
Do you think we'll still use IDEs?
Yeah, for sure, but they'll evolve, right?
Like right now, they're like very focused on writing code.
And like as Hanson was saying, like probably agents will be writing more and more code.
And so it's gonna become like, there'll be a shift in emphasis towards like landing code or reviewing code or like validating them, or maybe even like a shift in emphasis towards planning like bigger arcs.
Yeah, I think we're already seeing a lot of people on the team.
They kind of like first thing in the morning, they come in like they make coffee, and then they like kick off a few tasks just to kind of get a starting point.
And then, you know, they come back after their breakfast, and they look at the PRs that got generated. Then they'll take those, and the IDE is kind of the place where you take them; maybe it'll get you like 80% of the way there, hopefully, or even more.
But then there's always this like last mile where you go in and really like fine-tune based on kind of like your own vibes.
How do you see the broader market evolving?
Like within OpenAI, you have so many different strategies here.
And as you think about async tasks, as you think about some of the things that you mentioned moving into ChatGPT, we're seeing an explosion of other tools and specialized models.
You obviously are biased, but I'm curious what your read is of the broader market.
Yeah, it's a crazy time to be a developer right now.
Like there are just so many new tools that are just so helpful.
Like, a fun story: recently I was on an airplane and there was no Wi-Fi, and I had thought that I was gonna maybe write some code and build a thing, and there was no Wi-Fi, and I was like, you know, it's great.
Like it's just not worth my time to like even try to write code anymore.
Whereas, you know, the startup that I was working on like many years ago, like part of the genesis of that startup was like me writing some code without Wi-Fi in an airplane.
And I just wouldn't even do that anymore because like the market is just like it's just changed so much.
And I think we're gonna see an equivalent shift in an equivalent amount of time.
So like in the next two years, coding will look completely different.
I think right now most of the tools that people spend, you know, that people find the most value from are tools that work really closely with you like in your development environment, you know, like basically pairing.
And I think the shift that we're gonna see, but we have to figure out how this will happen.
But the shift that we're gonna see is that actually the majority of code will be written by agents and those agents won't be working in your environment where you can do one thing at a time, but they'll be working in their own environments.
And they won't just be triggered by you like thinking of specific tasks, but they'll be like connected into the tools you use doing work there.
And so I think we'll see basically that shift towards agents.
I think we're gonna have to figure out a lot about code review, as you were asking about. Personally, I don't exactly know how that's gonna work, but I do know that even already at OpenAI we're seeing much more code merged by agents, and actually even more code generated by agents, as folks are, you know, say, kicking off tasks four times to choose their favorite implementation.
And so it's like not a hundred percent clear how we should even like manage all this code that is being written.
Some things that I will say though in case it's useful to the audience is that there are definitely things you can do to your code base to make it more addressable for agents.
This isn't necessarily particularly novel, but you know obviously using like typed languages is really helpful.
Another thing that's very helpful is like having like smaller modules that are like better tested.
Like we joke about...
Having good tests at all, yeah.
Yeah, having tests. Like, we joke about my startup's repo, but I bet you we would have written it differently if we were writing it today.
And there's even small things, like the codename for this project is WHAM.
This is the codename for Codex.
It's like WHAM and when we named it we were very intentional in doing so because we knew we would have code like in the server, like for the website, in various other places, and we wanted it to be really easy for the agent to like search for WHAM related code and find it.
So we named the project, you know, WHAM, and we grepped the code base first to figure out how often it was there.
Like if we would have called it something like code or codex or agent, you can imagine like it would have been really hard for the agent to...
And now you called it Codex and now the agent's gonna be confused.
Well, so in the code, this is kind of my point, right?
Like intentional design.
Like in the code we use the term WHAM like a lot because that's actually much easier for the agent to find.
Obviously if we didn't use a word like that, the agent could still find its way, but it would have to spend much more time to find the right files.
Yeah, it is cool that, you know, a lot of the things that actually make the code base easier for humans also tend to make it easier for agents.
Like good tests, for example, writing good docs is another good example where like now I think there's even more of an incentive to do that because like not only does it make your life easier, it makes the agent's life easier.
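The codename check described above (grepping the code base for a candidate name before adopting it) could be sketched roughly as follows. This is an illustrative sketch only, not the team's actual tooling: the helper names `count_term` and `rank_codenames`, the case-insensitive substring matching, and the sample files are all assumptions.

```python
import re
import tempfile
from pathlib import Path


def count_term(root: str, term: str) -> int:
    """Count case-insensitive substring occurrences of `term` in text files under `root`."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    total = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        total += len(pattern.findall(text))
    return total


def rank_codenames(root: str, candidates: list[str]) -> list[tuple[str, int]]:
    """Return (name, existing_hits) pairs, fewest existing hits first."""
    counts = [(name, count_term(root, name)) for name in candidates]
    return sorted(counts, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical two-file repo, just to show the shape of the result.
    with tempfile.TemporaryDirectory() as repo:
        Path(repo, "server.py").write_text(
            "class Agent:\n    def run_agent(self): pass\n", encoding="utf-8"
        )
        Path(repo, "web.ts").write_text(
            "const agentCode = 'code';\n", encoding="utf-8"
        )
        for name, hits in rank_codenames(repo, ["agent", "code", "wham"]):
            print(f"{name}: {hits} existing occurrence(s)")
```

A name with few or no existing hits, like the distinctive codename in the story, means a later search for it returns only the project's own code, which is the property the speakers say makes files easier for an agent to find.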
Okay, sorry to be the annoying VC, but Claude Code and Jules are also, I think, agentic coding experiences from others.
Do you think the mark...
I'm curious how you think your experiences compare today and then do you think the market is probably going to converge towards the same vision of what you know, sync and async coding look like and in that version of the future, what do you think OpenAI wins on?
I think we're going to see a little bit of everything, right?
Like even in what you mentioned, like there's like tools that are working on your computer, there's tools that are working on their own computer.
Like as I mentioned, like I think we're going to see the majority of work being written where like the agent has its own computer but it will still be really important for us to invest in accelerating developers who are doing work on their own computer too, right?
So ideally we get the best of both worlds there but most work is done in agent compute.
I think the way I see it as well is, one of the hardest parts of software engineering really is taking all the context from the world and encoding it in these requirements, these design docs, and then the implementation. As we alluded to earlier, not actually that much of the life cycle is spent on that physical coding.
And so I think where ChatGPT shines is, it is this assistant that, you know, has memories now, it has access to a lot of different connectors to all the different tools you use.
We have Operator and Deep Research that have all these different capabilities.
And so I think the vision where that all comes together is where, you know, a tool like Codex can really shine. Once it has access to all that knowledge, it's able to make use of that, and I think with that it should be able to do a much more effective job at, you know, just the coding part.
Yeah like imagine like hiring a software engineer and like the only thing that that software engineer can do is like take a task from you and produce a PR, right?
Or you know it has like these like very well defined features and it can exactly do those things.
And then you ask for like a random thing like oh hey like the team is getting together like do you mind like also I don't know getting a meeting room and like leading a brainstorming like it just would just be so frustrating if like you hired a teammate and they refuse to do that kind of work, right?
And so similarly I think like it's really like we're building towards a future where like agents that you're working with are a little bit more generalized.
Like, you know, to reference what Hanson was talking about, Operator and Deep Research: if you think about it, Operator has a web browser, Deep Research has a different flavor of a web browser, Codex has a terminal. Really, your teammate has pretty similar tools to a human teammate, right?
And so like the goal for us eventually is to like pick places where we want to really invest in a specific audience to like make rapid progress.
So obviously we're doing that with coding, with Codex or like GPT-4.1, where we generated specific evals for that audience and then made a better model for them, for developers.
But then over time like generalize these things into like simple things that everyone can use.
So I think, again, with us, you know, with OpenAI and ChatGPT, I feel like that's a place where the products we build will look very different from something that's only specifically for coding.
What do you think will be the primary UI that developers use to interact with Codex?
Do you think it'll be ChatGPT, the CLI, the IDE, all of the above?
Yeah it does.
I think a mix of all the above I think we just kind of like want to meet developers where they are in that moment.
So it might not even be in the editor or in the terminal; it might be on Slack, like, you know, someone messages you like, hey, there's a bug, and you're just like, hey, go fix it.
I'll give you my like fun future UI that is like not at all serious but maybe the future of like working with agents if you're like you know a startup founder in the future and you have like a team of just you or you and a couple of founders and many agents actually looks like TikTok.
You know, maybe you have like a vertical feed, and basically an agent has produced a video that you can watch with an idea, like, hey, a customer wrote in with this request, I think we should fix it. And then you swipe right to say, yeah, let's fix this, let's do this; you swipe left to say, no, we shouldn't do that.
Tinder or TikTok?
Sorry it's a hybrid.
I didn't say this was gonna make a lot of sense.
I like it.
And then you press and hold to provide feedback so you feel like yes like do it but you know make sure the font is in italic and so basically you have all these agents who are like subscribed to information at your company or on your team and they're proactively coming up with ideas and you know doing them and then giving you updates and you're kind of just like curating the work that is being done.
And they show you little previews of what the world could look like.
Yeah obviously that's a half joke though you know I think that'll be like kind of the arms like working with agents and then there's like it's definitely gonna be really important for people to be able to like go do the work themselves and like pair with agents in.
I get that it's a half joke but it is like it's a really cool visual because I think everyone agrees conceptually with this idea of you know collaborating and reviewing all the different changes an agent makes is going to look very different from how we code today but like nobody's actually given me a visual of what that might look like so that's a really cool idea.
I love it.
Awesome should we wrap with the lightning round?
Let's do it.
Okay, a recommended piece of content or reading for AI fans?
For me that's immediate: that's The Culture by Iain Banks.
You read it?
Yes it's amazing.
Yeah it is a science fiction series started being written in the 80s and it is unusually positive in its view of like how a future spacefaring like human and non-human like race could kind of look and there's a lot of questioning about like what is the purpose and meaning of life when we have AGI.
Yeah I think for me it's like Anything by Richard Sutton.
I think that was my introduction to reinforcement learning, and it's kind of a joke here that we read The Bitter Lesson like every single day. That's kind of the philosophy of OpenAI. I think, you know, even with Codex, we give it a terminal and it literally uses POSIX tools; that's the most Bitter Lesson way of working with the computer.
And your favorite AI apps?
Gotta be ChatGPT.
Okay, either it could be a new feature that you guys have released other than Codex, or something outside of OpenAI.
Okay, so it's funny, I don't really think of AI apps, right? But I do like it when my life gets easier. So some things that I like are when you're using AI but it's kind of invisible. I'm in product, so I often file bugs, and Linear has a really elegant integration where, when you file a bug from a Slack conversation, it just generates the bug from the Slack conversation, but they never say AI anywhere. You actually kind of don't even notice that it's using AI.
Oh wait I came up with an answer for favorite AI app Waymo.
Ah yes. Yeah, I mean, I think for me Copilot has definitely been the thing that, you know, keeps delivering value every single day for me.
Okay robotics, bullish, bearish.
Bullish?
Yeah.
Which new application or application category do you think will break out in 2025?
Other than coding.
Yeah I mean I think I think when you had Yusa and Josh on it's kind of the same answer but 2025 is definitely the year of agents I think we're gonna see agents take off in a lot of different categories.
Yeah I have to agree with that.
What type of agents are you most excited about?
Aside from coding agents.
Yeah.
That's a good question.
Well, I mean, so my take would be (I know it's meant to be rapid fire, right?) that the way we think of agents is: you have reasoning models, and then you give those reasoning models access to tools of the trade, and then you figure out how to train that agent to do the specific function.
So it's not just about writing, it's about journalism; or it's not just about coding, it's about software engineering. That's kind of what we're doing.
And in my mind, the reason I'm so excited about agents this year is because we now have a few agents shipped from OpenAI, and other companies are shipping agents too, and so we're starting to see what the shape of this is and starting to identify the primitives.
And so specifically what I've been excited about is, as we bring this together, you come up with an agent that you don't have to provision separately for every single function, but it's an agent with a computer that has a browser and has a terminal, and it can do multiple things without you having to exactly specify, like, "you are my coding agent" or something.
Really cool.
Thank you so much for joining us.
Congratulations on what you've built at Codex, and thank you for giving us a preview of how you think the coding market will evolve, and also giving us a peek into how, you know, long-running async agentic experiences will play out.
Really appreciate it.
Thank you.
Thanks for having us.
Thank you.