Training Data · 2025-11-06

OpenAI Sora 2 Team on Generative Video, World Models, and Co-Evolving Society

Hosts: Constantine, Sonya

Guests: Bill Peebles, Thomas Dimson, Rohan Sahai

Sora 2 · Diffusion Transformers · Space-Time Tokens · World Models / World Simulators · Physics in Generative Video · Cameos and Likeness · Feed Ranking and Creation vs Consumption · API and Developer Ecosystem · IP Rights and Monetization · Iterative Deployment / AI Safety · Digital Clones / Agents · Future of Filmmaking

Why it matters

OpenAI Sora 2 uses space-time tokens for physics-consistent generative video

Key claims

Bill Peebles pioneered diffusion transformers (DiTs) for video; Sora 2 uses space-time tokens so attention spans the full video, enabling object permanence and physics consistency.
Sora 2 is trained to defer to physics on failures (e.g., missed shots rebound realistically) rather than 'cheerleading' the prompt—a semantic agent-failure mode not seen in prior models.
The Sora social app was built in roughly three months; the cameo feature emerged from internal play and became the central hook that humanizes AI video.
Thomas Dimson is explicitly designing Sora's feed to optimize for creation over consumption, learning from Instagram's ranking challenges—claiming ~100% day-one creation and ~70% return-creation rates.

Episode summary

Summary

The OpenAI Sora team (Bill Peebles, Thomas Dimson, Rohan Sahai) sits down with Sequoia's Training Data to discuss the technology and product thinking behind Sora 2. Bill explains how diffusion transformers operate on 'space-time tokens'—cuboid patches that span both spatial and temporal dimensions—which gives Sora global attention over an entire video and enables emergent properties like object permanence. He frames Sora 1 as the 'GPT-1 moment' for video and Sora 2 as the 'GPT-3.5 moment,' with Sora 2 specifically trained to respect physics rather than bend reality to satisfy a prompt (e.g., a missed basketball bounces off the backboard instead of magically going in). He envisions a future 'GPT-4' leap for video that could enable scientific experimentation inside world simulators.

The product team describes the rapid, ~3-month build of the Sora social app, emphasizing that the cameo feature—allowing users to insert their likeness into generated videos—was an internal accident that became the product's emotional core. Thomas Dimson draws on his Instagram ranking experience to argue that Sora's feed is explicitly optimized for creation rather than passive consumption, citing that ~100% of new users create on day one and ~70% continue creating on return. The team also discusses exposing Sora 2 via API to support long-tail use cases like filmmaking and CAD, and outlines an in-progress framework to monetize cameos featuring licensed IP for rights holders.

Looking forward, the team describes cameo as a 'lowest bandwidth' way to give the model information about a user—a foundation they expect to grow into persistent digital clones operating in a Sora-mediated alternate reality. They explicitly invoke OpenAI's 'iterative deployment' philosophy, saying releasing Sora 2 now (rather than waiting) is meant to let society begin establishing norms around AI likenesses before the technology reaches agents that act in the physical world. Future roadmap items include pet and object cameos and longer-form narrative work.

Sora 2 is exposed via API in the same state as the app, targeting long-tail integrators in film, gaming, and tools rather than a first-party social platform.
OpenAI is actively building a monetization framework so IP rights holders are compensated when their characters appear in user cameos.
Long-term vision: cameos as the foundation for persistent digital clones and a Sora-mediated alternate reality where 'copies of yourself' perform tasks—framed as why OpenAI is deploying iteratively now.
Team explicitly compares Sora 1 to GPT-1, Sora 2 to GPT-3.5, and says a 'GPT-4' step is needed before world simulators are reliable enough for scientific discovery (e.g., turbulence/fluid dynamics).
~7 million generations per day, reached the top of the App Store, with a notably broader user base than expected (beyond the AI Twitter/filmmaker early adopters).

Source material

Transcript

For OpenAI across the board, it's really important that we kind of like iteratively deploy technology in a way where we're not just like dropping bombshells on the world when there's like some big research breakthrough we want to co-evolve society with the technology.

And so that's why we really thought it was important to like do this now and like do it in a way where, you know, we've hit this, again, this kind of like GPT-3.5 moment for video, let's make sure the world is kind of aware of what's possible now.

And also, you know, start to get society comfortable in like figuring out the rules of the road for this kind of like longer term vision, where there are just copies of yourself running around in Sora and the ether, like just doing tasks and like reporting back in the physical world, because that's where we are headed long term.

Today on Training Data, we sit down with the team behind OpenAI's Sora: Bill Peebles, Thomas Dimson and Rohan Sahai.

You'll hear about space time tokens, building internal world simulators and how optimizing for creation instead of consumption is just better for social platforms.

This conversation goes way beyond video generation and into questions about how society will co-evolve with powerful simulation technologies.

We promise that this was an actual real world conversation and not a video generation.

We don't know how to prove that to you.

Let's jump in.

Hey guys, thank you for being here at Sequoia.

Congratulations on Sora.

Maybe you could tell us a little bit about yourselves and how you got to OpenAI and Sora.

Yeah, I'm Bill.

I'm the head of the Sora team at OpenAI.

I had a pretty traditional path: came through undergrad doing research on video generation, then continued that work at Berkeley.

And then started at OpenAI working on Sora from the first day I joined.

And I'm Thomas.

I work as an engineering lead inside of Sora.

I have a bit of a longer story, but I worked at Instagram for about seven years doing some of the early kind of machine learning systems and recommender systems there.

But it was a very tiny company.

It was about 40 people.

Then I quit, did my own startup for a while, which was Minecraft in the browser, which we've talked about a couple times.

And I think that OpenAI noticed that we had a very cracked product team there.

And so they acquired our company.

And I've been bouncing around different products inside of OpenAI and on the research side as well on post training.

But super happy we landed kind of together on Sora to bring this thing to life.

It was a really cool product in between too, like the Global Illumination product.

Oh, yeah, I still believe in it.

Yeah, me too.

Awesome.

I'm Rohan.

I've been at OpenAI for about two and a half years.

Started as an IC on ChatGPT.

But then as soon as I saw the video gen research, I got quickly Sora pilled and made my way over there.

And so currently we have the Sora product team.

Before that, just startups, big companies within kind of the Valley, a bunch of random stuff.

Yeah.

Cool.

Well, Bill, you are the inventor of the diffusion transformer.

Can you tell us what that is?

Yeah.

So most people are pretty familiar with autoregressive transformers, which is the core tech that powers a lot of language models that are out there.

So there you generate tokens one at a time and you condition on all the previous ones to generate the future.

Diffusion transformers are a little bit different.

So instead of using autoregressive modeling as kind of the core objective, you're using this technique called diffusion, which at a very high level basically involves taking some signal, for example, video, adding a ton of noise to it, and then training neural networks to predict the noise that you applied.

And this is kind of a different kind of iterative generative modeling.

So instead of generating token by token, as you do in autoregressive models, diffusion models generate by gradually removing noise one step at a time.
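
To make the noise-prediction objective described above concrete, here is a minimal sketch in plain Python. It assumes a common DDPM-style blend of signal and Gaussian noise controlled by a hypothetical `alpha_bar` parameter; the conversation does not say which parameterization Sora uses, so the function names and the formula are illustrative assumptions, not OpenAI's method.

```python
import math
import random

def add_noise(signal, alpha_bar, rng):
    """Forward step (assumed DDPM-style): blend the clean signal with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noisy = [math.sqrt(alpha_bar) * s + math.sqrt(1.0 - alpha_bar) * n
             for s, n in zip(signal, noise)]
    # The sampled noise is what the network would be trained to predict.
    return noisy, noise

def recover_signal(noisy, predicted_noise, alpha_bar):
    """Invert the blend, given a prediction of the noise that was applied."""
    return [(x - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
            for x, n in zip(noisy, predicted_noise)]

def mse(a, b):
    """Mean squared error, the usual regression loss for noise prediction."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
clean = [0.5, -1.0, 2.0, 0.0]          # stand-in for a flattened video signal
noisy, noise = add_noise(clean, alpha_bar=0.3, rng=rng)

# A hypothetical perfect predictor has zero loss and recovers the clean signal.
recovered = recover_signal(noisy, noise, alpha_bar=0.3)
```

In a real sampler the prediction is imperfect, so the noise is removed over many small steps rather than in one inversion, which matches the "one step at a time" description.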

And in Sora 1, we really kind of popularized this technique for video generation models.

So if you look at all of the other competitor models that are out there, both in the States and in China, most of them are based on DiTs, diffusion transformers.

And a big part of that is because DiTs are a really powerful inductive bias for video.

So because you're generating the whole video simultaneously, you really solve issues where quality can degrade or change over time, which was kind of like a big problem for prior video generation systems, which DiTs ended up fixing.

So that's kind of why you're seeing them proliferate within video generation stacks.

When I try to visualize it, I mean, for each diffusion, you have a matrix of pixels, and then you do the entire video at the same time, which you can basically see as different frames, I imagine.

Can you visualize that as matrix of matrices that basically transforms over time?

Yeah, that's a good question.

So we really kind of consider things at the granularity of like space-time tokens, which is sort of like an insane phrase.

But whereas, for example, characters are a very fundamental building block for language, for vision, it's really this notion of a space-time patch.

You can just imagine this little cuboid that composes both X and Y, like spatial dimensions, as well as a temporal locale.

And that really is kind of like the minimal building block that you can build visual generative models out of.

And so diffusion transformers sort of consider these, almost you can think of it like voxel by voxel.

And in the traditional versions of these diffusion transformer models, you have all of these little space-time patches talking with all the other ones.

And that's how you actually are able to get properties like object permanence to fall out, because basically, you have full global context of everything going on in the video at every position in space-time, which is like a very powerful property for a neural network to have.
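
The "little cuboid" idea can be sketched as a patchify step. This is a toy illustration in plain Python, assuming a single-channel video stored as nested lists and hypothetical patch sizes `pt`, `ph`, `pw`; real systems typically work on learned latents rather than raw pixels, and nothing here reflects Sora's actual tokenizer.

```python
def patchify(video, pt, ph, pw):
    """Split a T x H x W video (nested lists) into flattened space-time cuboids."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "sizes must divide evenly"
    tokens = []
    for t0 in range(0, T, pt):              # step through time
        for y0 in range(0, H, ph):          # step through rows
            for x0 in range(0, W, pw):      # step through columns
                tokens.append([video[t][y][x]
                               for t in range(t0, t0 + pt)
                               for y in range(y0, y0 + ph)
                               for x in range(x0, x0 + pw)])
    return tokens

# Toy 4-frame, 4x4 video where each value encodes its own (t, y, x) position.
video = [[[100 * t + 10 * y + x for x in range(4)]
          for y in range(4)]
         for t in range(4)]

tokens = patchify(video, pt=2, ph=2, pw=2)
```

Each token covers two frames and a 2x2 spatial window, so a single token carries both "where" and "when" information.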

And is that the equivalent of the attention mechanism?

Is the objects movement throughout the video?

Yeah, that's right.

So in our Sora 1 blog post, "Video generation models as world simulators," we kind of laid out some visuals, which sort of go into exactly your point here, which is really attention is like a very powerful mechanism for sharing information across space-time.

And if you represent data in this way, where you patchify it into a bunch of these space-time tokens, as long as you're properly using the attention mechanism, that allows you to transfer information throughout the entire video all at once.
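
As a rough illustration of "all of these space-time patches talking with all the other ones," here is a minimal single-head self-attention sketch in plain Python. It assumes identity query/key/value projections and tiny hand-written tokens purely for readability; a real diffusion transformer uses learned projections, many heads, and many layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def global_attention(tokens):
    """Single-head self-attention with identity projections (no learned weights)."""
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Every token scores itself against every token in the video.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # The output is a weighted mix of all tokens, so context is global.
        outputs.append([sum(w[j] * tokens[j][i] for j in range(len(tokens)))
                        for i in range(d)])
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # stand-ins for space-time tokens
outputs, weights = global_attention(tokens)
```

Because every attention weight is strictly positive, each output token depends on every input token, whichever frame or region it came from.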

What are the biggest differences between Sora 1 and 2?

And I remember with the original Sora 1, you were already seeing kind of emergent properties where the more you scale, the more it's able to do things like understand physics.

Is Sora 2 purely a function of scaling or what are the biggest differences?

Yeah, that's a great question.

We've spent a long time really just doing like core generative modeling research since Sora 1 launched to really figure out how we get the next step function improvement in video generation capabilities.

We really kind of operated from first principles, right?

So we really want these models to be extremely good at physics.

We want them to kind of feel intelligent in a way that I'd say like most prior video generation models don't.

So by that, I really mean, if you look at kind of any of the previous set of models that were out there, you'll notice a lot of this kind of like effects that happen.

Like if you try to do any sort of complicated sequence of like physical interactions, right?

For example, like spiking gymnastics classic.

Riding a dragon like you did.

Riding a dragon.

That was fun.

That happened for real actually, Constantine.

Not generated.

You know, there are like very clear problems with the past generation of models that we really like set out to solve with Sora 2.

And I think one thing that's really cool about this model compared to prior ones is that when the model makes a mistake, it actually fails in a very unique way that we haven't seen before.

And so concretely, for example, if let's say like the text input to Sora is a basketball star wants to like shoot a hoop, right?

Shoot a free throw.

If he misses in the model, Sora will not just like magically guide the basketball to go into the hoop, right?

To be over optimistic about respecting what the user asked for.

It will actually defer to the laws of physics most of the time.

And the basketball will actually like rebound off the backboard.

And so this is a very interesting distinction, right?

Between like model failure and like agent failure.

Agent as in the agent that Sora is like implicitly simulating as it's generating video.

And we haven't really seen this very unique kind of like semantic failure case in like prior video models.

This is really new with Sora 2.

It's kind of a result of, you know, just the investment we put in like really doing like the core generative modeling research to like get this massive improvement in capability.

Okay, so not purely a function of scale.

You're actually, you know, there's some concept of agents implicit in this, there's things you're doing beyond just scaling.

Well, the notion of agents, I'd say is actually mostly like implicit from from scale.

Like, you know, in the same way where we kind of showed that object permanence, right, begins to emerge in Sora 1 pre-training, once you hit some like critical FLOPs threshold, we see similar kinds of things happen as we like push the next frontier, right?

So you begin to see these agents act more intelligently.

You begin to see the laws of physics be respected in a way that they aren't at like lower compute scales.

How does the concept of a space time latent patch relate to a space time token relate to object permanence and how things move through the physical world?

Yeah, that's a great question.

So I'd say space time patch and space time token are more or less synonymous with one another.

I'll use them interchangeably.

You know, what's really beautiful, right, is when people started scaling up language models, from like GPT-1 to GPT-2 to GPT-3, we really began to see the emergence of like world models internally in these systems.

And what's kind of beautiful about this, right, is there's incredibly simple tokenizers that actually go into like creating the data that we train these systems on.

But despite this very simple representation, right, you know, like BPE characters, what have you, when you put enough compute and data into these systems, like, in order to actually solve this task of predicting the next token, you need to develop an internal representation of how the world functions, right, you need to like simulate things.

And like, you know, the models make lots of mistakes right now at like low compute scales.

But as you continue pushing it from three to four to five, you just see these internal world models get more and more robust.

And it's really analogous for video, right, and in many ways more explicit.

So I think it's easier to picture what like a world model or a world simulator looks like with video data, right, because it is literally representing like the raw observational bits of like, all of reality.

But what's really remarkable, right, is because these space time patches are just this like very simple, and like highly reusable representation that can apply to like any type of data, right, whether it's just like video footage of like this set, whether it's like anime cartoons, like whatever it is, you're just able to build like one neural network that can operate on this vast, extremely diverse set of data and really build these like incredibly powerful representations that model like very generalizable properties of the world, right, it's useful to have a world simulator to predict like how a cartoon will unfold.

And likewise, it's useful for predicting how you know, this conversation might unfold.

And so that really puts a lot of optimization pressure on Sora to like grok these like core fundamental concepts in a very like data efficient way.

Did you have to put effort into selecting the data such that it reflected the physical world? For example, I'd imagine if you have data from the physical world, it all abides by the laws of physics.

But you mentioned anime, that might not always abide by the laws of physics.

Did you have to be selective?

Or did it naturally find patterns that separated that out?

That's a really great question.

We did spend a lot of time, you know, really thinking about, you know, what does like the optimal data mix for like a world simulator kind of look like?

And to your point, you know, I think, in some cases, we'll make decisions that, you know, maybe are like making the model really fun.

Like, for example, people love generating anime, but, you know, it does not necessarily like perfectly represent, like the laws of physics that are like directly useful for like real world applications.

So like to put it another way, right, I think in anime, there are certain primitives that are simplified that are actually probably useful for understanding the real world, you know, people still locomote through scenes, for example, like if there's like some crazy dragon that's like flying around, that's probably like not so useful for like grokking aerodynamics or something.

Dragon Ball Z is more or less how I learned athletics, you know, there you go.

Motion and Super Saiyan.

I think it is an interesting question, like, that I do not know the answer to, whether somehow like pre-training on simplified representations of like the visual world, whether that's like sketches or like some other modality, like, you know, makes you more efficient at like grokking these concepts.

I think it's actually a very interesting scientific question that we need to understand better.

Do you think we're close to exhausting the number of pre-training tokens there are out there?

Or do you think video data is just video is just so massive and it's actually one of the more untapped vats of data?

Yeah, the way I kind of think about this is the intelligence per bit of video is much lower than something like text data.

But if you integrate over all of the data that really exists out there, the total is much higher.

So to directly answer your question, you know, I think it's hard to imagine ever fully running out of video data.

There's just like so many ways that it exists in the world that like, you know, you will be in a regime where you can continue to just like add more and more data to these pre-training runs and continue to see gains for like a very long time, I suspect.

You think we'll ever discover new physics?

Because in the LLM world, you know, Einstein thinking at the whiteboard is all I'm thinking.

There's also just the, if you develop a perfect simulator and you just simulate physics better and better, you might learn things about the world that we haven't learned yet.

I totally think that this is like bound to happen one day.

And like, you know, I think we probably need even like we probably need one more step function change, I'd say in like model quality to like really get to a point where, for example, you can think about doing like scientific experiments in models.

Like you could imagine right one day you have a world simulator that is like generalized so well to the laws of physics that like you don't even need like a wet lab in the real world anymore, right?

You can just like run biological experiments within Sora itself.

And like, again, this needs like a lot of work to like really get to the point where you have a system that's robust enough to do this reliably.

But you know, internally, like again, we've viewed Sora 1 as kind of being like the GPT-1 moment for video, it was like really the first time things started working for that modality.

Sora 2 we really view as like GPT-3.5 in terms of like it really being able to like kickstart, you know, the world's creative juices and like really like break through this kind of usability barrier where we're seeing like mass adoption of these models.

And we're going to need a GPT-4 breakthrough to really get this to the point where this is useful for like sciences as we're seeing now with GPT-5, right?

Like I feel like every day on Twitter, I see another like convex optimization lower bound get like improved by GPT-5 Pro.

And I think eventually we're going to see the same thing happening for the sciences with Sora.

Do you think you need physical world embodiment to get there?

Or do you think a lot of it can be done effectively in sim?

You know, I am like always amazed, you know, every time we like push another like 10x compute into these models, like what just like magically falls out of it, with like very limited changes in kind of like what we're training on and like the fundamental like approach to what we're doing.

You know, I suspect some amount of like physical agency will certainly help.

I have a hard time believing it will like make you worse at like, you know, modeling like collisions or like something else.

Video only is like quite remarkable though.

And I wouldn't be surprised if it's actually kind of like AGI complete for like building like a general purpose world simulator.

So for this concept of a general purpose world simulator, a world model, where you can do science experiments in that world, you think that video is the sole or some combination of video and text are the combined data inputs, and you train it on this type of this type of model or is it going to be does it have to be based on more structured laws of physics that are understood and laws of biology that are understood?

I think it probably depends a lot on the specific use case you're kind of envisioning for for the world simulator.

Like, for example, if you just really want to build like an accurate model of how like a basketball game is played, I actually think like only video data and like maybe audio as well, like kind of sufficient to build that system.

Not of me playing basketball.

That would be an inaccurate, very bad player of basketball.

You know, you actually like Sora's current understanding of how people play basketball Constantine may be at your level.

Wow.

Okay, that makes you know, it's possible.

It's possible.

He does this to you.

It's accurate.

But it's better than mine Constantine.

For once.

That was like a Sora 1 situation.

Well, you're at Sora 2.

We'll toss some hoops.

Is that what they'll say?

Like, you know, I'm down.

Okay.

I'm down.

Yeah.

Thanks.

Thomas is first, first statement in the five.

You know, I think it is an interesting question.

Like, what are all of the modalities that should be present in like this kind of general purpose system?

Like, certainly, you know, if you add more modalities, I have a hard time believing it will decrease the intelligence.

I also think there's an argument to be made that just, you know, adding more and more does not provide like significant marginal value compared to like, you know, full mastery of like video and audio, for example.

I think it's an interesting open question.

I'm not actually sure right now.

And it's something we need to understand more.

Yeah, so cool.

Sonya a minute ago mentioned Einstein at a whiteboard.

And obviously, that makes me think of you Thomas and your hair.

Me too.

It had to come.

Like if if any hair gives the feeling of space time tokens, definitely yours.

At some point, you know, Bill, you're the creator of this revolutionary technology that has changed the way that AI video is created.

At some point, from Sora 1 to Sora 2, you said, hey, altogether, there needs to be an application around this, there's some benefit to an application, and you brought together some of the best product people in the world.

How did that crew come together at OpenAI?

Yeah, it's a, I mean, the story is never as linear as you might think it is.

So I think that I mean, we've had a product team on Sora, since the get go, Rohan was like, spearheading that effort in the Sora one days.

But I think Bill's right when he says it was really like a GPT one kind of moment, we're seeing pockets of very interesting things there.

But the models were not like, you know, models without sound videos without sound, it's like a very different kind of environment.

And so we were working on that surface, mostly targeting kind of like a prosumer demographic.

And separately, I mean, Rohan can probably go into more details on all that.

Separately, we're also just kind of exploring different social applications of AI inside of OpenAI and like, what that could look like, we had a lot of prototypes, most of which were quite bad.

And when we started to see some of the magic was actually with image gen, before it had been released, we were playing with it internally in a social context.

And the social context was really interesting to see that what people were doing is you'd sort of like, take an image, and then you'd have like a chain of remixes of that image, where I don't know, there was a, it's a duck, and then now the ducks on somebody's head, and now everything's upside down, and they're smoking a cigarette, like, just a lot of weird things.

Yeah, it's like, and we were seeing this, we're like, oh, this is kind of like a very interesting thing that like, you know, nobody can really do that with like, social media, because it's so hard to create something or riff on something, it like is such a high barrier to entry action.

Maybe you have to go get a camera set up, and it's not just like thinking of the idea, there's actually a lot of things involved.

And so we were like, okay, this is a very magical behavior, how can we kind of productize that behavior?

And we're mostly thinking about away from Sora, some of the Sora research was still ongoing.

And I mean, there were signs of life, but it wasn't like quite there yet in productized form. Bill probably had it in his head somewhere, since he can see the future, but that's fine.

I'm a little bit more cats can't quite see the future yet.

So we were just exploring that, I think we tried a few things.

And then at some point, the research was really just showing very clear value of even iterative deployment style value of like, oh, this is something that people will really want.

And so we went into this project, like, two or three months ago, it wasn't very long, it was like July 4.

That's when you disappeared, Thomas.

That's when I disappeared.

Yeah.

And we just kind of locked in like, okay, we're finally doing it, you know, that's always a moment.

And we started without any magical features, just like, okay, let's just try to get a native video environment where you can hear the audio full screen.

And we did some quick generations, things were showing very, they're very cool, very fun, very interesting.

And because of that image gen experience, we sort of had thought like, okay, what's the magical thing here is that like, barrier to entry is very, very low for creation.

Coming from Instagram, that's like, it's impossible to get people to create on Instagram.

And that's the most valuable thing that people do.

So what does that unlock?

And it's like, okay, well, that remix thing from image gen that kind of could still apply here.

And so we brainstormed all these things about how could remixes work?

And like, what does a remix mean here?

One of those was this like cameo thing, which I think Bill also had in his head, but this was in the ether, in the ether for sure.

But we just were like hacking together things on the product, just to see if this works.

I, I didn't think it would work at all.

But it was on the list.

And there were a few other things on the list.

Some of them were pretty crazy.

It was like, why didn't you think it would work?

I am bad at predicting technology.

Like, it wasn't super clear to me that you could like, you know, take a likeness of a person and have that kind of imagined into a video form.

And whether it would work or not.

And so we had early prototypes of different things of like people reacting in the video corner or stuff like that.

But when we saw cameos just start to work, and even playing internally, like, Rohan, you remember that day where we're like, yeah, the feed is entirely cameos.

Yeah, it's entirely.

It just went from, you know, we didn't have that feature.

Once we had that feature product market fit on the team, all everything we were generating was all of each other.

You must have seen the meme potential.

I mean, yeah, that's, I think at first, we were just like, this is hilarious.

This is amazing.

And then like a week later, we were like, this is still all we do.

Is there something here?

Yeah.

I mean, at first we were actually a little bit like, is this good?

Like, yeah, hey, the cameos, it's just all cameos.

Now, does anyone else care about those?

People care about other people doing stuff.

And we kind of got to the point where we're like, no, no, this is actually good.

Like, it's actually, it feels like I'm coming back to see.

And it really humanized it a lot where like a lot of AI video is just kind of static scenes that are quite beautiful, quite interesting might have extremely complicated things going on.

But they lose that human touch.

And it really felt like it was coming back into it.

So another learning from image gen too: like, image gen took off and had viral moments because I think you could put yourselves in these scenes in accessible ways that weren't possible before.

Obviously, this massive like, put me in a Ghibli scene.

People taking selfies with their idols and stuff like that.

And so the once you once you're actually kind of thought about it, it's like, yeah, cameo feature makes a lot of sense.

You put yourself in all these scenes.

That's way more exciting you and your friends.

It's novel.

It's like, that's something you couldn't do before.

Yeah.

And then that combined with remixes, I mean, cameos are kind of a remix to begin with.

But then you start to think about, OK, well, now I can riff on Rohan doing something or whatever it is.

Like with Bill, Eddie wrapped in an action figure package.

And it was that's been remixed like an insane number of thousands of times.

Yeah.

So like just very, very crazy things that kind of go on and very emergent.

A lot of stuff that I would have never thought of.

Actually, how many generations of you guys have been like publicly posted at this point?

I have no idea.

I know I'm 11,000 or so.

I was like a little less than that.

Wow.

Yeah.

What does surprise you about the types of users that are really sticking with Sora?

Who is it really a hit with?

If you just go to the latest feed, which is just like the fire hose of everything.

Yeah, it's space time Thomas mode.

It's wild out there, but that gives you a pretty good snapshot into like just everything happening.

I mean, I think we have like almost seven million generations happening a day.

So you can imagine there's just a ton of information there.

It's one of my favorite ways to just get product feedback.

It is so diverse.

The type of stuff people are doing, the type of people; there'll be a complete variety of ages.

Some people are just envisioning themselves in scenes that seem motivation-oriented.

People just memeing with their friends, people cameoing some of the public figures on the platform that have done cameos.

So I think the diversity has surprised me.

I was kind of expecting this sort of like, you know, the Twitter AI crowd to like heavily dominate the feed.

They definitely dominate like the press cycles, at least the ones that, you know, we're most exposed to.

But in terms of people actually using this, it's quite a wide variety.

And last thing I'll say is it's a bigger departure from the sort of niche AI film crowd that existed before, which is great early adopters. I thought it would start there.

But it felt like it started with just a way wider range of people.

I think getting to the top of the app store helps with that and just get people who are like browsing and see this thing.

My mother keeps cameoing Thomas.

Thomas, you wrote the original algorithm, if I'm right, for the Instagram ranking algo.

There was a lot in the Sora 2 blog post about how you guys are clearly being very intentional about how you want to do ranking in the algo.

Can you talk a little bit about lessons learned from Instagram and how you're approaching it over at Sora?

Yeah, I mean, there's a lot to cover in that.

I think that the first thing to think about when we think about these platforms or think about Sora specifically is the thing I was mentioning before about creation.

So Sora enables basically everybody to be a creator on this platform.

And that is a very, very different environment than something like Instagram, where you have this extreme power law of the people that are creating.

And the power law just naturally gets more narrow.

What's the right word there?

But more head heavy.

Yes.

So sometimes I feel like I have to defend myself on the Instagram algorithm side.

We actually did it for, I mean, we did it for a reason.

It was to solve a problem.

It wasn't just kind of like a random decision to optimize for ads or something like that.

And the reason we did that was that we noticed that like, what was happening on Instagram over time was because it was chronologically ordered, every single person that posted was guaranteed to have the top slot of all their followers.

And so if you think about that for a second, the incentive for somebody in that environment is actually to create constantly because they are guaranteed distribution when they create.

And over time, because of this like power law becoming heavier and heavier, or like more head heavy, those type of people, which are great, they provide a lot of value to the ecosystem, but they start to crowd out people you really care about.

And so maybe you follow National Geographic or something. Not to dunk on National Geographic, I love them.

But, you know, if they're posting 20 times a day and your friend's not, they don't have the same optimization objective; your friend is probably just posting a picture of their coffee or something.

And so you'd have 20 Nat Geo posts, and then one picture that you actually really cared about, you never really scrolled to.

And there's not too many solutions to that problem if you have a guaranteed ordering.

One of them is that you have to unfollow all these accounts that you maybe care about, but care about not as much as the person that posts once a day.

And the other is that you have to permute the feed.

And so we went with that path, we tried it, we tested it out internally, it was very kind of controversial to do.

But I think that you can actually kind of like math this out.

It's like a proof that basically, over time, you're going to have to take control over distribution on the platform, in order to prevent these kind of issues and show people what they actually care about.

So that's why we did it.
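
[Editor's illustration] The crowding-out effect described above can be sketched in a few lines of Python. This is a toy model, not Instagram's actual ranking: the `affinity` score, the `ranked` function, and the sample accounts are all assumptions made up for illustration. It only shows why a strictly chronological feed buries the one post you care about behind a high-volume account, and how any reordering by viewer interest changes that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    minutes_ago: int  # smaller means more recent
    affinity: float   # hypothetical score in [0, 1]: how much the viewer cares


def chronological(posts):
    """Newest first: whoever posts most recently is guaranteed the top slot."""
    return sorted(posts, key=lambda p: p.minutes_ago)


def ranked(posts):
    """Toy ranking: highest affinity first, recency only breaks ties."""
    return sorted(posts, key=lambda p: (-p.affinity, p.minutes_ago))


def position_of(feed, author):
    """1-based position of the first post by `author` in the feed."""
    return next(i for i, p in enumerate(feed, start=1) if p.author == author)


# 20 posts from a high-volume account, all newer than one post from a friend.
posts = [Post("natgeo", minutes_ago=m, affinity=0.3) for m in range(0, 200, 10)]
posts.append(Post("friend", minutes_ago=240, affinity=0.9))

print(position_of(chronological(posts), "friend"))  # 21
print(position_of(ranked(posts), "friend"))         # 1
```

In the chronological feed the friend's post sits behind all 20 high-volume posts; in the toy ranked feed it comes first.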

And it actually showed a lot of value.

I remember the early tests. I won't get into the numbers on them, but they were pretty unambiguous, actually: this was showing you more people that you cared about, it was improving your experience with the platform, and it actually moved creation, which is unusual. It made people create more because they were seeing more content that was accessible to them.

But I also think that these things can go astray over time.

And I won't say like the Instagram algorithm is unequivocally bad or unequivocally good.

But when we started to open up to more unconnected content, and ad pressure was very strong, there's also a natural company incentive to optimize for just blind consumption, because that's how you make money.

So maybe cheaper content, or maybe just like get people to scroll more and more and more and more.

And that also can encourage people to create less, because it's just like a more mindless scrolling mode.

You guys have very concretely committed to doing things to prevent that kind of behavior.

We have a lot of mitigations there in place.

But I think what it really comes down to, for me, is just: what are we trying to do as a platform?

And I think the magic of this technology is that everybody is a creator.

And so we want this feed to be optimized for you to create, to inspire you to create.

And that can be like, sometimes when you think of inspiration, you think of like, oh, it's this beautiful, crazy scene that's so elegant.

When I think about that, I think about like a meme culture or something really funny, or like, oh, that's cool.

I've got a riff on that.

And I think that's a very different brain mode, when you're browsing the feed.

And of course, we have lots of other things in place.

So like, I think it starts with an incentive.

That's our incentive right here is to encourage more creation in the ecosystem.

But there are certainly use cases we want to prevent, we're not going to get them right all the time.

It's very challenging.

It's a very living system.

It's also very hard to write a recommender system when you have no data, and you don't know what to recommend, you don't know how the platform is going to evolve.

But that's like basically how I kind of think about the incentives of feed.

And then Rohan, we have a lot of mitigations in place that I think you've been kind of like thinking about and maybe even more deeply than I have about like, preventing maybe the extreme cases.

And so I don't know if you want to talk a little bit.

Yeah, happy to.

But one thing before you I mean, just one thing to add is that the stated intent of like, optimizing for creation is working really well.

Yep.

It's almost 100% of people who like get past the invite code and all that on the app end up creating on day one.

When they come back, it's like 70% of the time they come back, they're creating.

And 30% of people are actually even posting to the feed.

So not just like generating for themselves are actually like posting into the ecosystem, which is an incredible testament to the model, how fun it is, and to like how what we're optimizing for is actually working pretty, pretty well right now.

But yeah, beyond that, I mean, one of the top-of-mind things is I think we don't want this just to be a mindless scroll. And beyond just optimizing for creation in the ranking algorithm, there are things we can do, like trying to get you out of this sort of flow state of just consumption and push you into creative mode.

I think there's a great article on this called like the curvilinear nature of casinos where they design it so you never have to make any decisions.

It's just like you walk in a circle, there's no windows, all that kind of stuff.

We can be very intentional about not doing that.

And, you know, whether it's an in-feed unit that's like, hey, you just viewed a couple of videos in this domain, why don't you try creating something, or other ways to push you out of that, we actually have things like that in the product.

Yeah, those are just some of the things that come to mind.
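
[Editor's illustration] A minimal sketch of the kind of in-feed unit described above, assuming a hypothetical rule: after a viewer watches `streak` consecutive videos on the same topic, the feed inserts a prompt to create something. The function name, the `streak` threshold, and the topic labels are invented for illustration; the conversation does not say how the real product decides when to show such a unit.

```python
from typing import Iterable, Iterator, Tuple


def with_creation_nudges(
    videos: Iterable[Tuple[str, str]], streak: int = 3
) -> Iterator[Tuple[str, str]]:
    """Yield ("video", video_id) for each (video_id, topic) pair, and insert
    ("nudge", topic) after `streak` consecutive videos on the same topic."""
    run_topic, run_len = None, 0
    for video_id, topic in videos:
        yield ("video", video_id)
        if topic == run_topic:
            run_len += 1
        else:
            run_topic, run_len = topic, 1
        if run_len == streak:
            yield ("nudge", topic)  # e.g. "why don't you try creating something?"
            run_len = 0             # start counting again after nudging


watched = [("v1", "anime"), ("v2", "anime"), ("v3", "anime"), ("v4", "pets")]
feed = list(with_creation_nudges(watched))
print(feed)
```

Here the nudge lands right after the third consecutive anime video and before the viewer moves on to the next item.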

Yeah, really commend you guys for what you've done to, you know, make sure that there's a version of the world where video model as world simulator could have just ended up with us, you know, each retreating into our own computer screens and just becoming addicted and just retreating into ourselves.

And I think the amounts to which you're, you know, prioritizing the human elements and the social elements, I think that the care you've put into that really shows.

I don't think we would have launched a feed of just AI content that didn't have a human feel. I don't think that excited us.

And as soon as we had the product, we had cameo and we had that feeling internally.

We're like, okay, this is actually a little different than.

Yeah, I think it was totally obvious.

Again, it was like a pretty crazy sprint to go through this.

And it wasn't like super obvious to us what would emerge, but I think that the idea, it makes sense in retrospect, but it was a completely not obvious product decision that cameos would be the thing where it's like, of course, you just want to see your friends doing cool things.

So that's like, that makes sense.

But I was never actually that afraid of competitive pressure in that, that crazy product phase.

Because I was like, we sort of had all these non-trivial decisions that are obvious in retrospect, but were not obvious at the time, that were sort of building on top of each other.

It's like, okay, cameos.

Well, there's also a version of cameo where you have a crazy flow that's just for you.

And it's a one player mode cameo and you like go through this onboarding flow and do your stuff.

But we were already seeing these interesting dynamics where it's like, well, I could take my video.

That's crazy.

Like, and then we can have like an argument or like I'm going to have an anime fight.

Doesn't matter.

And I was like, okay, so that's, that's actually the human element.

That's the, that's the magic of this.

It's actually strangely more social than a lot of social networks, even though it's all AI generated content.

So very unintuitive.

Is it a separate, fine-tuned version of Sora 2, or is it a separate model from what's available over the API, or is it the same?

Between the app and product.

So we're currently exposing like the models in the same state across API and the app.

Okay.

Really interesting.

What are you seeing people do on the API side?

And is it different from the types of things people are doing on the, on the consumer app?

The motivation behind even launching an API is just like support of these long tail use cases.

Like we have this vision of enabling, you know, a ChatGPT-scale consumer audience with this tech, but there's tons of very niche things out there.

You can imagine. You know, with Sora 1, we went out and talked to a lot of these studios.

What we heard from them is like, they want to integrate this in this specific part of their stack in this specific way.

And we'd love to support all these long tail use cases, but we don't want to build a thousand different kind of interfaces for this stuff.

So that's the kind of stuff we're excited to see with the API.

So far, it's been kind of a little bit more of the niche company, not trying to build a first-party social app, but one that maybe has some filmmaking audience or people they're supporting. And we've definitely seen some, you know, I think there was some company making...

They were doing something with CAD where they were like using Sora.

Yeah, yeah, yeah.

So there are cool use cases out there.

I think we're still getting a sense of what they are.

Yeah, I think there's a lot that can be done with these things.

I think about gaming all the time, just good background.

You know, AI and gaming is always a very controversial subject, but it's very clear that there's a there's a place and there's a role.

It doesn't have to interrupt the creative process; it can enhance it.

And I'm pretty excited to see some of those use cases emerge.

Do you think the video models are good enough now for people to be able to build video games on top of the API?

Or do you think we're still a rev or two away?

I have my own take on this.

I was gonna say never bet against the ways people can be creative with technology to build like someone will be able to build a game and maybe has built a game already.

Will it feel look and feel like, you know, obviously there's latency with this model.

So you'd have to do all sorts of crazy stuff to get around that.

But like, I think that your mind immediately goes to like kind of the obvious sort of things that you would do in gaming.

And we've seen some of that sort of stuff, certainly in research blogs and that kind of thing.

My mind often goes to like, okay, this is like a creative tool that's a little bit different.

And the types of games that really excite me there, I'll just go off on one: there's a game called Infinite Craft, which is the world's simplest game.

It's a web game where you just take elements, like fire, water, earth (you have like four elements to start), and you just drag them together, and they combine into something new.

And the thing it combines into is, like, it's LLM-based.

So it's like, fire and earth might be a volcano.

And then volcano plus water might be an underwater volcano or Godzilla or something like that.

You always end up in Godzilla for some reason.

But that's a game that like, it's like, oh, it kind of makes sense where it's like, yeah, you don't really need a crafting tree.

The LLM can derive this crafting tree and it's a process of discovery.
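
[Editor's illustration] A rough sketch of the mechanic described above: there is no hand-authored crafting tree, only a model call that names the result of combining two elements, plus a cache so each pair always yields the same result. This is not Infinite Craft's actual implementation. The `CraftingTable` class and the `stub_model` function (a lookup table standing in for a real LLM call) are hypothetical, and only the three starting elements named in the conversation are used.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set


class CraftingTable:
    """Recipes are discovered on demand by asking a model, then cached."""

    def __init__(self, combine_fn: Callable[[str, str], str], starting: Iterable[str]):
        self._combine_fn = combine_fn  # stands in for an LLM call (assumption)
        self._recipes: Dict[FrozenSet[str], str] = {}
        self.discovered: Set[str] = set(starting)
        self.model_calls = 0

    def combine(self, a: str, b: str) -> str:
        key = frozenset((a, b))  # order of the two elements does not matter
        if key not in self._recipes:
            self.model_calls += 1
            self._recipes[key] = self._combine_fn(*sorted((a, b)))
        result = self._recipes[key]
        self.discovered.add(result)
        return result


def stub_model(a: str, b: str) -> str:
    """Hypothetical stand-in for the model; a real game would prompt an LLM here."""
    known = {
        ("earth", "fire"): "volcano",
        ("volcano", "water"): "underwater volcano",
    }
    return known.get((a, b), f"{a}-{b}")


table = CraftingTable(stub_model, starting=["fire", "water", "earth"])
print(table.combine("fire", "earth"))     # volcano
print(table.combine("water", "volcano"))  # underwater volcano
print(table.combine("earth", "fire"))     # volcano (served from the cache)
```

The cache is what turns free-form model output into a stable crafting tree: once a pair has been combined, every later attempt returns the same element.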

And so I think there's a lot of untapped stuff in that space where, again, I like the idea of a process discovery.

In fact, my philosophical philosophical view on LLMs and video models to some extent is that it is a process of discovery.

These are all in the weights.

You're just unlocking it with like a secret code, which is your prompt.

And I love that.

That is very magical.

That was always in gaming.

That was the thing that like excited me the most was discovering something new, especially if it was a true discovery, it wasn't put there by somebody else.

Maybe they just enabled the mechanics around it.

So I think there's a huge opportunity in that space of gaming.

When you think about games and just a different thing and like embrace this technology in a very different way.

It reminds me of how some of the earliest use cases for GPT-3 were the kind of these text games.

So it's different from how you think of a playable video game, but actually a lot of these mechanics are very game-like.

Exactly.

Yeah.

I think there's still constraints and I think that's going to be like the mechanism design.

That's still very human.

Like a lot of the early games with GPT-3, they're kind of like, yeah, it was fun for a minute and then it kind of went off the rails.

You're like, I don't really know what I'm doing anymore.

But again, this is sort of in some ways, sort of feels like a little bit of that where it's got a little bit of gaming DNA inside of it, where it feels very fun and different and exploratory.

So I like things like that.

And I think there's going to be more use cases that we can't even think of.

It's too creative.

What do you guys think on the creative filmmaking side?

Is that an important target market?

Do you want to empower the long tail or do you want to empower the head, so to speak, of the creative market?

It's a really good question.

We've benefited a lot from creatives who are really willing to go all in on even the early technology like DALL·E 1, DALL·E 2, and really help steer us along the path.

And I think it's important that we continue to build things for those folks.

And we are working on some things that are more targeted towards creative power users long term.

At the same time, I do think AI is a very democratizing tool at its best.

And so what's kind of beautiful about the Sora platform in general is whenever someone strikes gold, you see one of these beautiful anime prompts that goes to the very top of the feed for everyone.

Anybody can go and remix that.

Everyone has the power to build on top of that and learn from all of these people who come in with this incredible knowledge about how to really get the most out of these tools.

And so I am really excited just to see the net creativity of humanity just increase as a result of this.

But I think a big part of that is continuing to empower people who are always at the frontier, which are these more pro-oriented creator type folks.

And so we want to keep investing in them as well.

We've nerded out for a while, almost a couple of years now, about that vision of feature film length content.

Yes, you have these amazing cameos and shorter content.

But at some point, the individual creator, it's been something that you've been excited about for a very long time.

When do we get there?

Is there a point where we have a feature film that is created on Sora 2?

And how do we consume it?

Is it in the Sora app?

Is it posted somewhere else online?

Do you go to a movie theater and watch it?

Yeah, it's a great question.

I mean, I think this will happen in stages to some extent.

So like if you guys watched the launch video, I mean, that was made by, like, Daniel Fraden, who's on the Sora team.

And he already with these tools, right, is able to pump out these like incredibly compelling short stories within like days at most.

I mean, he literally made that like all by himself in almost no time.

And he's been continuing to put new ones out there on the OpenAI Twitter since.

Clearly, this is like massively compressing the latency that's associated with like filmmaking.

I think to get to the point where like really anybody can do this, right, like any kid in their home can just like fire up the app or sora.com or something and go and make this.

It's really an economics problem: video is the most compute-intensive modality to work with. It's extremely expensive.

And you know, we're making good progress on the research team, like really continuing to figure out ways to make this affordable for everyone long term, like right now, for example, the Sora app is like totally free.

In the future, there will probably be ways where people can pay money to get more access to the models, just because that's the only way we can really scale this further.

But you know, I think we are not far off from this world where anybody can really like have the tools to make amazing content.

You know, I think there's gonna be like a lot of bad movies that get created by this.

But like likewise, you know, there's probably the next great film director who is just kind of like sitting, you know, in their parents house, like still in high school or something and just like has not had the investment or the tools to be able to like really see their vision come to life.

And we're going to find like, absolutely like amazing things from like giving this technology to like the whole world.

I'm looking forward to the feature film length.

Constantine's Greek Odyssey.

Coming to theaters near you.

We're all in it together.

Different characters.

I play the cyclops.

It's a good one.

I think just to touch on that, one more thing, something I've learned from recommender systems over and over again: the tools getting people more creative is going to be a huge unlock for just, you know, making people more creative in general, because you don't need access to filmmaking equipment, all that sort of stuff.

But we do consistently see that content is also a social phenomenon, in a way, and movies and everything you see out there is kind of a bit of a social phenomenon, in addition to the actual content itself.

And so I think we're going to enter a very interesting world where, you know, there's so many people creating and so much content out there, that even the idea that people are paying attention to it watching it is going to become more and more important.

And I think that's actually going to make the quality of content just kind of elevate, because anybody can create, and actually it's going to be the consumption that's going to be quite limited, which is very different than the world we live in today.

You guys are very thoughtful and intentional about how you treated IP holders.

Can you say more on that?

You know, we've been in close partnership with like a bunch of folks across the industry and like really trying to like both show them kind of this like new technology, right, that is actually like a huge value proposition for rights holders across the board, right.

And like we're hearing so much excitement from the folks we're talking with, like they really see this as being like, you know, a new frontier for again, like, you know, every kid in the world having the ability to like go and like use like some of this beloved IP and like really like bring it into their lives in a way that feels much more personal and custom than what's been possible before.

At the same time, you know, we really want to make sure that we're doing this like in the right way.

So we've been like really trying to take feedback and like really steer our roadmap in a way where we know that, you know, both users are going to have an awesome experience getting to use this IP, but also the rights holders are going to get, you know, properly monetized and rewarded in a way that, you know, everyone wins basically.

So we're right now actively working on trying to scope out the exact details about like how we're going to, you know, for example, make it so if you want to cameo your favorite character from some like beloved film or something, you can do that in a way where you have access to it, but like monetization will flow back to the rights holder, right.

So really trying to figure out this kind of like new economy for creators.

We kind of just have to create this from scratch right now.

There's a lot of deep questions about how to do this the right way.

And, you know, as with everything with this app, we come into it with like an open mind and we hear feedback and we iterate quickly.

You know, we're not sure where this is going to totally converge, but we're working closely with people to figure it out.

Really cool.

What's ahead?

Pets.

Yeah, I think I mean, one, sorry, what?

Is that one of the most demanded features?

Yeah, those demanding.

I will remind us.

We were just talking about curing diseases and world models and now we're to the future.

This is something.

No, it's actually so that's definitely true.

We've committed to that is coming.

But we have, I promise. We actually had Bill's dog when we were playing around with this: Rocket, the goodest boy.

Yeah.

And actually was very, very cool to actually feature a pet.

You can imagine where that goes.

It doesn't have to necessarily be a pet.

It could be anything, a clock or whatever you have.

Well, yeah.

Do you have a special clock?

Actually, it's really compelling.

Yeah, I didn't think it could be so compelling until Thomas showed me this clock.

It's like a sentient clock.

It's like based on like a real clock.

Yeah, I had a clock.

My father, my father was a technology person for a while.

His company, Veritas, gave him a clock for his, like, whatever anniversary.

Anyway, so I have it on my table somewhere and there's this old Simpsons episode where they talk about a walking clock.

And for some reason that's just been an earworm in my head for the last 30 years.

And so I always, it's like, you know, they're telling some joke and it's like, is it a walking clock?

It's a walking clock.

It's like walking clock.

And then it's like, no man, it's my dog.

And so it connected in my brain where I was like, okay, Rocket, walking clock.

And then so I tried it.

Thomas is the floor.

Yeah.

So it connected in my brain, and we've been playing around with this just to see if we can get it to work and whether there's something special there, which is part of the fun of being on the Sora team: you get to play with this emergent, crazy technology.

And like, maybe it does something you wouldn't even have expected.

So I recorded a two second video of my clock and then I gave it some cameo instructions and I said, you're just a walking clock.

You walk clock, you talk like you talk your character.

And then I generated my first video and it was insane.

It was crazy.

It was a walking clock.

And then I had one where it was talking to Bill and Bill was like, I didn't think it would ever land, the pet cameo feature.

And then walking clock's like, here I am, you know, I just landed.

So it's coming.

It's all internal memes.

Talk about emergent IP.

Yeah.

Who needs a spooky mom when you can have a walking clock?

What's the greatest thing?

One thing to add in terms of the future, I think on the feature film question, sometimes something I think about all the time is like what, you know, what will that actually look like?

I think my, I mean, caveat, Bill's the only one who's good at predicting the future here.

But my sense is that the, you know, as we get to longer forms, what our equivalent of a feature film will look and feel very, very different from what a feature film is today.

You know, I don't know exactly what that looks like, but I think on the subject of creators and what's coming in the world, I think a new medium and a new class of creators, new class could include a lot of existing creators and, and support existing sort of mediums and stuff like that.

But I think we're just in the early innings of, of what I imagine will be the next film industry rather than thinking about this being a feature film.

But I think there'll be something new.

There's some anecdote.

I hope this is true because I say it all the time, but apparently when the recording camera like, you know, hit the world, the first thing people did was record plays.

This is like the least interesting thing you could do with a recording camera.

It's like, what's the big idea?

Oh, we, people don't have to travel around acting.

We can just film them and distribute it.

And then someone was like, wait a minute, we can make a film and film in all these different areas.

And I feel like we haven't, we're in like the first inning of so many different sort of things that people will do with this technology, especially as the constraints change with latency and length and all that kind of stuff.

So cool.

And a fun film history nerd fact is one of the original videos, and we should check this as well, but I think the original video was made just down the peninsula to settle a bet on whether, when a horse galloped, all four legs left the ground.

And I could see a world where you have new, that is an example of new scientific discovery.

People didn't actually have an answer to that.

Now that you have a new simulation format, what are we going to be able to discover in that?

It will be crazy.

And I think one, one broader point here is, you know, this app right now feels very familiar in a lot of ways, right?

It's like a social media network at its core.

But fundamentally, like the way that like we really view it internally, right, is with cameo, we've kind of introduced the lowest bandwidth way to give information to Sora about yourself, right, aspects about your appearance, about your voice, etc.

You can imagine over time that like that bandwidth will greatly increase, right?

So the model deeply understands your relationships with other people, it understands, you know, more than just how you look on any given day.

It's, you know, seeing your full like how you've grown up, all of these details about yourself, and will really be able to almost function as like a digital clone, right?

It's like there's really a world where the Sora app almost becomes this like mini alternate reality that's running on your phone, you have versions of yourself that can go off and interact with other people's digital clones, you can do knowledge work, it's not just for entertainment, right?

And it really evolves more into a platform, which is really aligned with kind of where these like world simulation capabilities are headed long term.

I think when that happens, the kind of emergent things we will see are crazy.

And you know, for OpenAI across the board, it's really important that we kind of like iteratively deploy technology in a way where we're not just like dropping bombshells on the world, when there's like some big research breakthrough, we want to co-evolve society with the technology.

And so that's why we really thought it was important to like do this now, and like do it in a way where you know, we've hit this, again, this kind of like GPT-3.5 moment for video, let's make sure the world is kind of aware of what's possible now.

And also, you know, start to get society comfortable and like figuring out the rules of the road for this kind of like longer term vision for where again, there are just copies of yourself running around in Sora and the ether, like just doing tasks and like reporting back in the physical world, because that's where we are headed long term.

So cool.

So you're building the multiverse?

Actually, kind of.

Yeah.

Okay.

Well, can can see me go and find my soulmate somewhere in there?

I mean, anything is possible.

Call for action every one.

It is kind of crazy, though, because now I'm gonna sound totally cuckoo.

But if we're in a computed, you know, environment, you're building the perfect simulator.

That kind of is the way you ultimately understand and break out of the computer environment.

Right.

Like, are we getting closer to the heart of the matrix?

Some very deep existential questions.

Yeah.

What's your guys' P of "we're simulated"?

Like this is rising.

Yeah, me too.

Oh, I'm low.

Yeah.

Oh, man.

But yeah, it's okay.

Really?

Okay.

I'm just like, you know what?

It's gotta be real.

Yeah.

I feel like I'm not like, solid 60%.

I don't know.

Like more likely than not at this point.

I'm there too.

Well, yeah.

Zero.

So we might make a Kalshi on it.

Yeah.

Trivally.

Yeah.

What's the Oracle?

It's worth 10.

Yeah.

What do you think are the theoretical limits to Sora?

Yeah, it's actually a great question.

I thought a little bit about this.

Like, I think there's like a question.

Can you eventually simulate, like, a GPU cluster, right, in Sora or something?

And I assume there are some very well defined limits on like the amount of computation you can run within one of these systems.

Like given the amount of compute you're actually running it on.

I've not like thought deeply enough about this, but I think there are some like, there's some like existential questions there that need to get resolved.

Yeah.

See, this is why his p(sim) is so high.

Fascinating.

Well, we've got a few lightning round questions for the team that we just kind of generated on the fly here.

And take your time.

Jump in whenever you have an answer.

Your favorite cameo on Sora to date and what happened?

That is so tough.

I have a hot one.

Shocker.

Yeah.

Okay.

So there was this TikTok trend of, and I got obsessed with them.

I don't know why, but these Chinese factory tours where they're like, hello, I'm the chili.

This is the chili factory.

They get like one like, and it's me.

And it's like, they're showing their chili factory and they're like, it's the chili factory.

Like this is amazing.

Or like there's an industrial chemical one.

I've lost the name, but there's an industrial chemical factory.

And the first day I had my cameo options open just because I was like, I just want to see what happens.

And the first day late at night, I opened my cameos and I was starting to get tagged in factory tour cameos that were all in Chinese.

And I was like, I'm in the chili factory.

And I was so excited.

I get zero likes.

I liked it.

It was just me, but I was like, I'm the chili factory guy.

Now I'm like doing the ribbon cutting at the chili factory.

Amazing.

That's too deep of a cut.

That's so... congratulations.

Fun fact.

I actually have done Chinese factory tours in real life and they are truly epic.

Yeah.

There's this one just, I saw Mark Cuban and George dancing around, but that was pretty good.

That got me.

But I mean, more back to, like, just scrolling the latest feed and just seeing the wholesome content of people doing things with their friends: that's actually, I think, what brings me the most joy. They're not like super liked, but it's people just getting a lot of, you know, value obviously from just making videos with their friends.

So Sam has so many bangers.

I like the one of him doing like this K-pop dance routine about like GPUs or something.

It's very good.

Actually, I would put it on my Spotify.

It's like we had the full song.

Wow.

It was very good.

It was like generated by Sora.

It's like, like very compelling.

Yeah.

All right.

Well that leads to the next one.

Cause you mentioned Spotify.

What does a fully AI-generated work win first: Oscar, Grammy, or Emmy?

I think the like logical answer is like a short winning an Oscar.

Yeah.

I think that's probably right.

What would we win it for?

Like for like a George George trilogy.

Yeah.

We need new content.

Yeah.

I do think if people stitch things together in an interesting way, yeah, I think you can actually start to make some very compelling storytelling in that.

And I don't think it's like, it doesn't really feel like AI anymore.

Uh, the, the content I'm seeing like that, that was actually something I noticed with Sora as well.

Just like, I wasn't even noticing it was AI.

Um, it was just kind of interesting content.

That's a more interesting question.

Will we know?

Oh yeah.

Maybe it's already happened.

I feel like for Oscars, one of the cool things that'll be unlocked is this long tail of epic stories in history, stories of heroism and struggle and all of these things that have been locked up because of the cost of creating.

You know, as a, as a history enthusiast, I cannot wait for AI to unlock all of those stories.

Have you seen the Bible video app?

No, it's really good.

I'll show it to you after. Like, perfect example.

Yeah.

Or there's this movie, The Last Duel, a few years ago about this really terrible crime that was committed in medieval France that was historically relevant.

And you know, basically says a lot about humanity and it just got picked up because eventually Hollywood picked up this important story about humanity, but how many more are there in human history?

That's going to be really cool.

Um, favorite character from any film or TV show.

I have a really random one.

Go for it.

Uh, you guys see Madagascar? King Julien.

Oh, played by Sacha Baron Cohen. He's a lemur.

He's a lemur cat.

Absolutely.

It's just, it's a banger.

It's his humor meets kid friendly storytelling.

It's just perfect.

I play a lot of video games.

So I mean, your classic answer is going to be like Mario or something like that.

Although I'll do the deeper cut of, we were always joking about PaRappa.

Yeah.

PaRappa the Rapper, the old, old PlayStation game, one of the original rhythm games.

And it's got a great artistic style and it's got great IP of just this little, what is he, a dog?

Yeah.

That's a good pick.

When I was a kid, I played the, like, Pokemon trading card game competitively for a while.

Um, so I was like really in like the Pokemon rabbit hole.

So like, I don't know.

It's not like Pikachu.

Super non-consensus, like a fringe deep cut.

Um, okay.

First world model scientific discovery, most specific possible.

Obviously you're not going to say the discovery.

I suspect it will be something related to like classical physics, like a better theory of turbulence or something.

That would be my guess.

I was guessing there's going to be something like that.

I was like, Navier-Stokes, I don't know.

Yeah.

Some fluid dynamics thing that's maybe hard to understand.

Now there's a lot of like unsolved kind of problems there.

I think sometimes they call it like continuum mechanics where it's like in between.

Um, and we don't have good models of them.

Something that lends itself to simulation, just like the amount of iterations you can do of a simulation, unlocking something, which I don't.

Yeah.

Something in that realm.

The last thing we'll be able to accurately simulate.

I do think there's like a set of physical phenomena for which video data is like a poor choice of representation, right?

So like, for example, is it really efficient to learn about, you know, like high speed particle collisions or something from like video footage?

Maybe.

Um, I really think video is at its best when, you know, the phenomenon that you're trying to learn about is just natively represented, uh, in the physical world.

And so when you need to do like, you know, quantum mechanics or some other discipline where, you know, it's more theoretical, we don't have like video footage beyond...

Yeah.

Things that we've like manually rendered for like educational purposes.

It feels like a weaker medium for understanding those things.

So I suspect those would come last.

I guess it's the things we don't have sensors for.

Right.

Yeah.

Maybe the last things we care to simulate is another way of thinking about the answer.

I don't know.

I mean, people aren't doing much with smell right now.

You know, that's green fields to tell you about that.

Kind of awkward.

We're still trying to figure out how to simulate Thomas with bad hair.

It remains an unsolved problem.

Not even Sora can do it.

Thomas's hair flow, just general.

There was a good, there's a good round of people being bald.

We were all good.

Oh, yeah.

All gens were good.

Actually kind of cool.

That's a use case that people don't really talk about very much, but it's like visualization of when you're bald.

Yeah.

Everybody wants to be bald.

No, it's just like, you just see yourself in some different contexts.

I think that can be quite powerful.

Even like therapeutic in some ways where you just like see yourself in some contexts that you either want or don't want yourself to kind of be in and just see, see yourself.

It's a real use case.

Yeah.

Guys, thank you so much for coming. From space-time tokens to object permanence, world models that will enable scientific discovery, the democratization of creation, all the way to walking clocks, you guys have covered it all.

Thank you so much.

And the future is being created by you.

Thanks, Constantine.

Thanks, Sonya.

Thank you.

Bye.