NVIDIA AI Podcast · 2025-10-21

NVIDIA's Nemotron: Open Models as Full-Stack Co-Design Anchors

Hosts: Noah Kravitz

Guests: Bryan Catanzaro, Jonathan Cohen

Nemotron, open-source AI, open weights, foundation models, full-stack co-design, accelerated computing, datasets and synthetic data, reasoning models, state-space models, low-precision training, 4-bit arithmetic, sovereign AI, enterprise AI deployment

Why it matters

NVIDIA's Nemotron anchors full-stack co-design with ~4x training efficiency

Key claims

Nemotron is NVIDIA's open AI stack of models, datasets, algorithms, and methodologies—not just weights—organized into Nano, Super, and Ultra size classes plus multimodal variants.
NVIDIA trains Nemotron primarily to drive full-stack co-design across chips, networking, software, and model architecture, feeding lessons back into its accelerated-computing platform.
Smarter pre-training datasets have produced a ~4x convergence speedup; reasoning efficiency (e.g., 2,000 vs. 10,000 tokens for equivalent answer quality) delivers ~5x throughput gains.
Openness is framed as both a trust mechanism for enterprises/sovereign deployments and the necessary delivery channel for efficiency research that hardware alone cannot transmit.

Episode summary

Bryan Catanzaro, VP of Applied Deep Learning Research at NVIDIA, and Jonathan Cohen, VP of Applied Research, sit down with host Noah Kravitz to explain how Nemotron functions as far more than an open model release: it is the cornerstone of NVIDIA's full-stack co-design strategy. By training Nemotron itself, NVIDIA learns how to optimize its chips, networking, and software, then feeds those insights back into the platform. The guests emphasize that Nemotron includes not just models (text and multimodal LLMs in Nano, Super, and Ultra sizes) but also datasets, algorithms, and methodologies released openly to the community.

A recurring theme is that intelligence and efficiency are inseparable. Catanzaro highlights that smarter pre-training datasets have yielded a 4x convergence speedup, and that reasoning efficiency—generating high-quality answers in 2,000 tokens instead of 10,000—delivers comparable 5x gains. Cohen frames accelerated computing as fundamentally about end-to-end optimization rather than raw FLOPs, with datasets now treated as first-class components of the platform.

On openness, the guests argue that open weights, datasets, and recipes are the only viable delivery mechanism for research-driven efficiency gains, and that transparency is necessary for enterprise trust and customization (including sovereign AI use cases). Technical breakthroughs discussed include Nemotron Nano V2, a hybrid state-space model that runs 6–20x faster than comparably intelligent peers, and a demonstration that 4-bit floating-point training can produce world-class models with dramatic energy savings. Looking ahead, NVIDIA plans larger Nemotron models, deeper multimodal integration (including audio/speech), and continued reasoning advances—available via Hugging Face and build.nvidia.com.

Nemotron Nano V2, a hybrid state-space model, runs 6–20x faster than comparably intelligent peers on the same hardware.
NVIDIA demonstrated world-class training results using 4-bit floating-point arithmetic on Blackwell, enabled by Transformer Engine co-design—an order-of-magnitude efficiency gain.
Scaling Nemotron internally is compared to Conway's Law: model training is less modular than software, making internal transparency and collaboration a critical operational principle.
Roadmap includes larger frontier models, deeper multimodal integration (audio/speech, vision), stronger reasoning, and continued architectural innovation beyond pure transformers.

Source material

Transcript

[Music] Hello, and welcome to the NVIDIA AI Podcast.

I'm your host, Noah Kravitz.

Right now, the world is watching AI evolve faster than ever before.

And that progress isn't just being fueled by technological breakthroughs in scale.

It's being fueled by human collaboration.

Open source models, open datasets, and shared research are giving developers, enterprises, and governments the building blocks they need to innovate together.

NVIDIA has been part of this movement from the very beginning, contributing open libraries, publishing datasets, and research, and most recently, sharing families of open models.

Which brings us to today's episode.

We're talking about Nemotron, specifically, unlocking the secret of Nemotron.

On the surface, Nemotron may look like just another open model family.

But the real story is how it anchors NVIDIA's strategy for building accelerated infrastructure and driving increased adoption of AI everywhere.

Joining us to unpack this open secret are two of the leaders driving this work forward.

Bryan Catanzaro is Vice President of Applied Deep Learning Research at NVIDIA, and Jonathan Cohen is Vice President of Applied Research at NVIDIA.

Bryan and Jonathan are here today to talk Nemotron.

I can't wait.

Gentlemen, welcome to the AI podcast.

Thank you so much for making the time to join us.

Thank you for having us.

It's great to be here.

So let's start at the top, and I'll direct this one to you, Bryan, to get us going, if that's all right.

What is Nemotron?

And as a follow up, why did NVIDIA decide to build its own family of models when you already work with essentially every major model builder out there?

Nemotron is NVIDIA's open technology for artificial intelligence.

Nemotron includes models that we train.

It also includes data sets that we release, as well as algorithms and methodologies.

And our goal with Nemotron is to support the community in building customizable AI that can be integrated deeply and tightly into the beating heart of every business around the world.

Our second goal with Nemotron is to help NVIDIA design systems for deploying and constructing AI.

There's a lot of questions about how AI works that touch the various design decisions that go into building NVIDIA's software and hardware systems.

And we can answer those questions better because we build Nemotron.

So, you know, ultimately, we're excited to open up Nemotron even further and continue to put it out there for the community.

We love learning from the community.

Nemotron is built in collaboration with the community where we learn a lot from what others are doing in the community.

And then we try to contribute what we can back.

We think that this is a great opportunity for NVIDIA to support the AI industry.

Yeah, so Nemotron is a collection of large language models and it's probably worth saying.

So they're text models and multimodal LLMs and we've kind of settled on like three sizes.

We think of them as weight classes.

So we have smaller models that we call Nano models.

We have medium-sized models we call Super models.

And then we have the largest frontier sized models, which we call Ultra.

So Nemotron collectively refers to everything Bryan said and then this family of models that work well.

And so how does Nemotron fit into NVIDIA's broader AI strategy?

Because from what I understand, it's not just the models, and I say "just" even though the models are huge, but it's kind of a cornerstone for growing the ecosystem.

Yeah.

Well, you know, if you think of NVIDIA as an accelerated computing platform company and you ask the question, well, what does an accelerated computing platform mean in this age of AI?

So it includes chips, it includes networking, it includes software stack, but it also includes the models.

And, you know, when we think about what is a platform today, a platform is all of those components.

And if you're building applications and you care about the quality of the models, but you also care about the performance, like Bryan mentioned, one of the reasons that we train Nemotron models is so we can learn.

We are pushing the limits ourselves so that we can learn and make sure that our platform is the best.

But it also means we can do co-design.

We can cooperatively design the model architecture, the software stack, and the hardware, across all of the hardware components, all together.

And we've been doing that.

And that gives us opportunities to make things more efficient, lower latency, higher throughput, more energy efficient, by improving things across that entire stack up into the model architecture.

And so Nemotron is a really important part of that strategy as an accelerated computing platform company where our success comes from this full-stack co-design and optimization.

Yeah.

One thing I wanted to add to that is that these days there are new things that are part of accelerated computing that maybe people haven't considered.

So, for example, data sets that we use for pre-training and post-training models have a dramatic effect on how quickly the model converges.

In fact, you know, comparing different revisions of our Nemotron pre-training set, we've accelerated pre-training by a factor of 4x just by having a smarter pre-training dataset, which means that you can actually train a much smarter model with the same amount of compute.
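
Editor's note: the arithmetic behind that claim can be sketched as follows. This is only an illustration, not NVIDIA's methodology; the function name and the baseline GPU-hour figure are hypothetical placeholders, and the only number taken from the conversation is the 4x factor.

```python
def compute_to_target(baseline_compute: float, convergence_speedup: float) -> float:
    """Compute needed to reach the same model quality after a convergence speedup."""
    if convergence_speedup <= 0:
        raise ValueError("convergence_speedup must be positive")
    return baseline_compute / convergence_speedup


# Hypothetical baseline budget; only the 4x factor comes from the episode.
baseline_gpu_hours = 1_000_000.0
improved_gpu_hours = compute_to_target(baseline_gpu_hours, 4.0)
print(improved_gpu_hours)  # 250000.0
```

Read the other way around, the same budget spent on the better dataset buys roughly four times as much training progress, which is the "much smarter model with the same amount of compute" point.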

Yeah.

What makes one dataset better, more optimized to help the model converge faster than another?

Well, what we're trying to do with LLMs is build something intelligent that can help us solve problems.

They can answer questions.

They can reason.

It turns out that if you just take all of the texts that humankind or computers have ever produced on the Internet and train an LLM on it, that's kind of where the community started many years ago.

But it turns out that's not the most intelligent way of building AI, because a lot of that text isn't adding very much intelligence.

And so every organization that builds LLMs spends an enormous amount of effort and compute in understanding their dataset, refining it, rephrasing it using synthetic data generation.

And the effort that we put into these datasets has an enormous impact on how quickly the models train and also on the overall strength of the model once it's trained.

And so these days, I believe that the datasets that we release as part of Nemotron are an important part of NVIDIA's accelerated computing efforts, because it's not really possible to think about how fast a system is for training.

If you're training on a dataset that's not very smart, it's going to take an enormous amount of compute to get to the same amount of intelligence as if you were training on a dataset that was much more polished.

And so that's kind of the genius of accelerated computing, is that we try to understand the problem from first principles and we try to optimize the entire stack end to end.

And these days, it seems pretty important that NVIDIA's accelerated computing platform includes Nemotron.

Right.

I can give another example that's interesting.

So reasoning, the way these models reason is they generate thinking tokens, right?

They're asked a question and then the model generates a lot of tokens as it thinks through the answer.

And there's very clear examples where you can generate a lot of tokens and not actually make a lot of progress towards the answer, or you can be more efficient, generate fewer tokens and make more progress.

And again, from the same perspective of accelerated computing, you don't really care, you know, did it generate 10,000 tokens?

Well, you do care.

You care if you can generate the same quality answer in 2,000 tokens instead of 10,000 tokens.

That's a 5x speed up.

And so that's also part of the accelerated computing story.

So all of these are opportunities we have to make things faster.
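
Editor's note: the 5x figure follows directly from the token counts quoted above. The sketch below assumes, as a simplification not stated in the episode, that the cost of generating each token is roughly constant; the function name is hypothetical.

```python
def token_speedup(baseline_tokens: int, efficient_tokens: int) -> float:
    """Speedup from reaching the same answer quality with fewer generated tokens.

    Assumes a roughly constant cost per generated token.
    """
    if baseline_tokens <= 0 or efficient_tokens <= 0:
        raise ValueError("token counts must be positive")
    return baseline_tokens / efficient_tokens


speedup = token_speedup(10_000, 2_000)
print(f"{speedup:.0f}x")  # 5x
```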

Exactly.

Accelerated computing has never just been about how many arithmetic operations per second you can perform.

It's really about what capabilities do you provide.

Yeah.

And I think the key to NVIDIA's historic success is as a company, we've always focused very much and had deep expertise on the actual end applications people care about.

You know, whether it was computer graphics or high performance computing or deep learning or now modern AI, it's really thinking about what's the end goal and how do you build a platform that gets you that goal with the least amount of time you have to wait, you know, lowest latency highs.

You talked some about the openness, the collaboration and co-design that's such an important piece of this.

Open source is a big part of Nemotron.

What does it mean?

And maybe, Bryan, I'll ask you this first, but either of you, what does it mean to call Nemotron one of the most open AI development efforts that we've ever seen?

Well, we really think that it's important for AI to be trusted and widely deployed.

And in order for that to happen, we think it's important that enterprises have the option to understand the data sets and the technologies behind AI and fine tune them for their own problems and then integrate them very tightly into the software and systems that they use to solve problems for their markets.

We think AI is not a one-size-fits-all solution.

And we've seen in the past, you know, many instances of when open platform technologies really allow different industries to differentiate different solutions for the problems they face.

For example, the Internet as an open technology had really different implications for different industries like health care versus retail.

The way that those organizations use the Internet to change the work that they do was quite different.

But the fact that the Internet was an open technology allowed many companies, many industries to think about solving their problems in a new way using the Internet.

And when we think about AI, it seems obvious that enterprises need that ability as well.

You know, the world's most important and valuable data always has the most sensitivity about it.

And so we think it's important to support enterprises as they learn how to deploy AI, that they can do it in a way that respects their work, their privacy, the important ways that they go about problem solving in sort of a unique way for their business.

And so we think that it's really important that there exists an open foundation for organizations around the world to build and deploy AI, and Nemotron is how we're contributing to that.

I can just add one thought to that from the perspective of accelerated computing.

If you think about, you know, we come up with some way to make a chip faster.

How does the world consume the benefits of that acceleration?

Well, in the case of a chip, you buy a chip and you get the benefits.

But what if we come up with a technique that makes models more efficient at thinking or a data set mix that saves you time in training?

How does the rest of the world receive the benefits of that?

Like in what form do you package it?

I think the answer, the only answer, is we have to teach everyone what we did by sharing it through open source, open-weight models, sharing the data sets, explaining how they work, sharing the algorithm.

So I think it's natural that open source is a delivery mechanism for the technology that's going into our platform.

So from a little bit of a hypothetical, I'm an IT leader or a business leader at an organization and I'm hearing what you guys are saying.

And we want to do this.

And we have specialized needs in our industry and we have troves of our data that kind of represent company intelligence and our special way of doing things that has brought success.

And we're ready to embrace the AI age and transform.

We could use Nemotron, and I'm going to walk through this, and correct me when I get this wrong.

We could use Nemotron to take an open source model and we could customize it, train it on our company data and the rest industry data, things to help it understand what we do and the problems we're trying to solve in our industry.

Nemotron could help.

We could add reasoning capabilities and that sort of thing to it.

And then we have a kind of, and I don't want to misuse the term sovereign.

And if you guys want to talk about sovereign AI, but we would then have our own sort of customized, adapted to our business, our industry, the way we do things, our data.

And it's ours because we took an open model, we trained it.

And so we don't have to worry about the sensitive data being out in some commercial model somewhere or what have you because it's our model now.

I think that's one aspect.

Close.

Yeah, that's one.

I mean, there's many aspects.

So for example, if you say, you know, NVIDIA trains a model, a Nemotron model, and it's great.

But since you've disclosed all your training data, I look at your training data and, for whatever reason, we have some policies that mean we can't use this data.

And we can say, that's fine.

Everything you need to reproduce what we did is there.

You can train your own model, excluding that data.

Or you say, well, I like the data, but the mix is wrong.

I don't know, I'm a sovereign project.

And it really needs to be very good at speaking this language and understanding this culture.

And that data wasn't as represented in your training set as I want it to be.

Everything that we did is transparent.

And so you can make these modifications yourself.

I mean, that's one aspect.

Fantastic.

Yeah, right.

NVIDIA has released datasets, recipes, alignment techniques alongside the models.

So along these same lines of building trust and transparency, why is all of that important?

Why is this full level of transparency important for the end users, you know, to be able to customize and deploy safely?

Well, I think ultimately, if you don't know what's in a technology, it's harder to trust it.

And every business has different ways of thinking about the problems they're solving.

They have different problems.

And I think it's important as we get more sophisticated about deploying AI and we integrate it more tightly into business problems around the world for businesses to be able to inspect, you know, how is this AI built?

And, you know, therefore, I can build trust that it's going to help my business solve problems.

You know, the integration is a really important point as well.

So with Nemotron models, there's a really broad spectrum of integration.

You can run it locally on a machine without any Internet.

You could also run it through an API in the cloud and everything in between.

You can deal with your business's sensitive data using the same data management and security protocols that your business already has.

And I think for a lot of applications of AI, that level of customizability and introspection is going to be essential.

I also want to say that I think there's a real big benefit to open technologies in the sense that they tend to develop faster.

So NVIDIA believes that helping AI grow creates opportunity for us.

And we think that one of the best ways of helping AI grow is to contribute in an open way to the community.

I think when you consider a technology that's being developed kind of independently by a few different organizations, but they're not able to share very much about what they're doing, there's obviously going to be a lot of reinvention that has to happen.

And the progress is going to be slower.

And so if we are able as a community to come together to contribute ideas, data models to each other and learn from each other, I think that will progress faster.

And we've seen that over the past couple of years as various organizations have been contributing to the open technologies for AI.

It's really helped the community move forward.

And like, for example, OpenAI just released GPT-OSS.

That was a fantastic thing for the field.

Alibaba has been doing some great work with Qwen models.

Obviously, Meta's family of Llama technologies has been extraordinarily helpful to the field to help the field grow and develop.

And at NVIDIA, we know that when AI grows, it's opportunity for everyone.

It's opportunity for businesses that they can solve new problems.

And it's opportunity for us because we work with every business that's building AI.

Yeah, I mean, a good example of that playing out is our own research groups. If you have some idea for a way to improve a model, we often will just take one of the existing open-weight models, not necessarily Nemotron, whichever gives you the best vehicle for trying out your idea, improve it in some way, and publish a paper, release a result.

Right.

So we are building on all the work from these other organizations that release open-weight models all the time as well.

And that's, you know, this is no news to you guys or probably many listeners of the show.

But that same sentiment has been echoed so many times over I mean, over the past couple of years in particular, I guess we've had from all industries and walks of research and life.

And you know that the more we're collaborating, the faster we move as a whole.

Yeah.

Our guests today are Bryan Catanzaro and Jonathan Cohen.

They're both from NVIDIA.

Bryan is vice president of Applied Deep Learning Research, while Jonathan serves as vice president of Applied Research.

And they're here talking to us about NVIDIA's Nemotron family of open models and open technology.

We've been talking about the importance of open technologies to the community in general and to NVIDIA, the learning that goes into informing really the whole stack, the hardware, the models, the software, the connectivity, networking, everything, and the data sets, as Bryan was talking about, and how it all really comes together to make things advance faster and more efficiently.

Broadly speaking, Nemotron has been a huge effort at NVIDIA with many teams working together.

They still are to bring this to life from advanced research to commercially licensed models and data sets now.

Can you guys talk about the pipeline from research to production models?

What that's like, what it's been like for Nemotron?

Well, it is a huge effort and it takes a lot of people with different talents coming together to build Nemotron.

We've organized the project around basically the different stages of development that a model has to go through, pre-training, post-training, alignment and so forth, as well as different functional areas, like, for example, long context recall or image understanding.

So within each of these areas, we have multiple teams working together, some of which are very researchy, very theoretical, and others are very engineering focused, and then a whole spectrum in between.

I would say it's a great honor to be part of a project where people are coming together to build something like this.

It's also a big challenge.

You know, trying to get so many brilliant minds pointed in the same direction.

I think that's one of the central challenges facing every development effort around the industry these days is how do we work together to build one amazing thing as opposed to building a hundred small things?

Yeah.

And that's really something that's been inspiring to watch come together.

Yeah, if you compare it with a large-scale software effort, you know, there's this famous observation called Conway's Law, which is that the communication patterns observed within a piece of software tend to mirror the organizational structure that built that software.

And training a model is like I mean, Conway's Law is definitely an issue, but it's just a very different endeavor.

It's not like I build a module and you build a module and we have a nice clean interface.

Somehow, all of these things have to get combined together, you know, Bryan's example, image understanding and long context recall.

Somehow, it all gets combined together into a single training recipe and a single data set mix.

And so the modularity is, I think, less than in software engineering.

And so this idea that you can just decompose it and have lots of teams with sort of clean interfaces between them doesn't really work as well.

And so I think there's a real struggle in scaling up an effort like this to a very large team to do something really big.

Is there a new paradigm emerging?

Interesting.

Yeah, I wonder, you know, if over the next five, 10 years, there'll be some new law named after someone, some management principle here.

It's an interesting thing that we've certainly been thinking about, but it does present these challenges.

I think one of the most important principles that we've kind of settled on is you just need a lot of internal openness and transparency.

You have to solicit ideas.

There are a lot of people across the company and outside of the company working on all these problems.

You have to solicit all these ideas and you have to encourage them all to work together.

That's the only way forward.

And so that just takes a very mature culture and good leadership and egoless, you know, operation and everyone being really motivated, at the end of the day, by the work.

I would say also that one of the amazing things about AI is that it's such a general technology that it really changes the way that we do AI.

You know, it used to be, like 20 years ago when I was a grad student, that it was common for people to build state-of-the-art models in computer vision on their own, like one graduate student on their own building a model that was state of the art in some important area of computer vision.

And, you know, that's kind of how we were trained as PhD students is like, go be brilliant on your own.

Well, with modern AI, the best results come from using industrial scale equipment and, you know, general models that can then be taught how to solve important problems.

But that requires working together.

So one of the first things that AI has changed is the development of AI itself and organizations that can figure out how to collaborate and work together succeed.

And, you know, that's one of the reasons also that we really believe in Nemotron as an open project is because we've seen how openness internally has made it possible for us to solve whole classes of new problems with AI.

We believe that as Nemotron and other open efforts come together, bring together more ideas and more force to bear on the development of AI, that the results will be stronger.

Jonathan, NVIDIA has a history of building end-to-end products.

You know, self-driving comes to mind, gaming, of course, SuperPODs, but then disaggregating them for the world to use.

Does Nemotron follow that same pattern in your mind?

And if so, how?

Yeah, I think so.

I think when we talk about that and Jensen talks about this a lot, you know, what we mean is our solution, but the things ultimately that we build are very complicated, integrated systems with many layers and many components.

And on the one hand, we need to build the whole thing ourselves because it doesn't work unless you build the whole thing yourself.

So we need to train a whole model at the end of the day.

You know, it doesn't make sense for us to release, like, I don't know, a way to make a reasoning recipe without actually training a model to do reasoning, you know, like, you know, do these things and put the whole thing together.

But at the same time, I think it's very important that we put all of the components into the ecosystem and allow people to consume the parts that they want and not consume the parts that they don't want.

So this is how our hardware is.

You know, we design data center scale computers at this point, but we don't sell it as a single data center.

We design the whole thing.

We build the whole thing.

Then we chop it up into pieces and we sell it through, you know, normal sales channels.

And people, our customers, are free to take the parts they want, replace, you know, it's truly an ecosystem.

You know, if you don't like the way you don't like our CPU, use a different CPU.

You don't like the storage.

Use a different storage.

You don't like this networking.

Use a different networking.

And we're open and interoperable with all these things.

And it's a tremendous engineering challenge to work that way.

But I think it's why we've been so successful is because it allows us to harness the power of like the entire computing industry because we're not really locking anyone out at all.

We're including everybody.

And so when we think about large language models, I guess we're thinking in the same way.

So we're going to develop techniques and anyone is free to take them.

You know, other companies that train large language models for a living are free to take anything we built.

They probably won't take all of it, but they're free to take anything.

They want to take the software.

That's great.

They want to take some of our data sets.

That's great.

They want to take the software and the data sets and some of the training recipes, but modify them.

That's great.

They want to take the finished models.

So in that sense, I think philosophically, that's absolutely how we think about products, how we think about hardware, how we think about software.

And it's now how we also think about foundation models.

And I think that's one of the things that makes NVIDIA unique as a big tech company is that although we do full stack and integration, we don't dictate to our customers how that technology is going to be deployed or used.

We know that it's not a one size fits all problem or even assembled.

Right.

And so we're happy to support companies of all shapes and sizes in every industry as they develop and deploy AI. NVIDIA has this supportive orientation where we understand that it's not one size fits all.

That actually is the secret to why we are able to collaborate with all of these companies.

And we want to do that with AI technology as well.

Kind of switching gears a little bit, but still talking along technical lines.

Can you share any exciting technical breakthroughs that came about during the Nemotron development process?

And what they might mean going forward specifically in terms of efficiency and deployment, but really take it as broad as you like.

Yeah.

Well, NVIDIA is thinking about AI from an accelerated computing perspective.

And we have a belief that the faster we can make a model, the smarter it's going to be.

And this follows just because clearly if we're able to think quicker, then we can get more thoughts in the same amount of time.

That can help us solve problems.

So we're bringing this perspective of accelerated computing to AI in kind of a unique way.

A couple of things just from the past few months that we've demonstrated that I'm really excited about.

One is we released a model.

We call it Nemotron Nano V2.

It is a hybrid state-space model.

So it's not a pure transformer model, but it uses this other technology for reasoning over sequences called a state-space model that has some pretty big efficiency benefits. You know, on the same hardware compared with other models of the same intelligence, we're about six to 20 times faster.

And, you know, we're pretty excited about, you know, the capabilities of this model, but it's just the beginning.

You know, we have really ambitious plans to continue evolving the architectures behind Nemotron as well as the systems that are used to build and deploy it.

Another thing that we were able to show recently is that we trained a Nemotron model using four-bit floating-point arithmetic, and we were able to get world-class results. That's really exciting, because using only four bits per parameter of the neural network can be dramatically more energy efficient than using other representations.

And we know that the development of AI is going to be constrained by the efficiency with which we can train it and deploy it.

And so showing people new algorithms that are more efficient is going to help push the industry forward.

And, you know, it's not enough to say, hey, I've got this system, it's really fast at low-precision arithmetic, if no one understands how to use it.

So Nemotron is our way of demonstrating to the community, like, hey, you can take advantage of this amazing low-precision hardware to train a world-class model if you follow this algorithm.

It's amazing that four bits is enough.

Like if you just think about how little, how low resolution that is, the fact that that works is pretty incredible.

So maybe can you rephrase for folks who might be listening and myself included who don't fully get the ramifications of what doing four bit arithmetic and these results really mean?

Well, one fun analogy from my childhood comes from video games.

I don't know if you remember the eight bit.

Yes, of course.

And there was the 16 bit Nintendo system and I was like, wow, there's so many more colors with the 16 bit Nintendo.

It's like, wow, look at that smooth gradient, right?

So if you only have eight bits, you can represent 256 numbers.

With 16 bits, you can represent about 65,000 numbers.

With four bits, you can represent 16.

Right.

So it's a very, very small amount of options to pick from.
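The counts in this analogy are just powers of two, and the same arithmetic shows why fewer bits per parameter means less data to store and move. A minimal sketch follows; the 9-billion-parameter model size is a made-up figure for illustration, not a number from the conversation, and the storage estimate ignores any overhead from scaling factors.

```python
def distinct_values(bits: int) -> int:
    """Number of distinct bit patterns an n-bit number can take."""
    return 2 ** bits


def weight_gigabytes(num_params: int, bits: int) -> float:
    """Raw storage for the weights alone, ignoring any scale-factor overhead."""
    return num_params * bits / 8 / 1e9


# Assumed model size, for illustration only (not stated in the episode).
HYPOTHETICAL_PARAMS = 9_000_000_000

for bits in (16, 8, 4):
    print(
        f"{bits:>2} bits: {distinct_values(bits):>6,} values, "
        f"{weight_gigabytes(HYPOTHETICAL_PARAMS, bits):.1f} GB of weights"
    )
```

Going from 16 bits to 4 bits cuts the raw weight storage by a factor of four, at the cost of having only 16 bit patterns per number.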

Like if you're going to draw a picture using four bit numbers, it's actually going to be pretty hard to make it look smooth.

Right.

Of course, what we're doing with our four bit training hardware and software isn't as straightforward as just using exactly one of 16 numbers for every parameter in the neural net.

They actually come in blocks.

The blocks have scaling factors attached to them in hierarchical ways.

And that's all accelerated by software and hardware that we've built in Transformer Engine and in our Blackwell GPU generation.
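To make the idea of blocks with scaling factors concrete, here is a rough sketch in plain Python. It is not NVIDIA's implementation: the 4-bit grid (the standard E2M1 float layout), the block size of 16, and the function names are all assumptions for illustration, and only a single level of scaling is shown, where the conversation describes a hierarchy.

```python
# Magnitudes representable by a 4-bit E2M1 float
# (1 sign bit, 2 exponent bits, 1 mantissa bit). Assumed format.
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_GRID = sorted({sign * m for m in FP4_MAGNITUDES for sign in (-1.0, 1.0)})


def quantize_block(block):
    """Pick one scale for the block, then snap each value to the 4-bit grid."""
    peak = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest grid value (6.0).
    scale = peak / 6.0 if peak > 0 else 1.0
    codes = [min(FP4_GRID, key=lambda g: abs(g - x / scale)) for x in block]
    return scale, codes


def quantize(values, block_size=16):
    """Split values into blocks; each block keeps its own scaling factor."""
    return [
        quantize_block(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


def dequantize(blocks):
    """Rebuild approximate values from per-block scales and 4-bit codes."""
    return [scale * code for scale, codes in blocks for code in codes]


# Two blocks with very different magnitudes (hypothetical example data).
weights = [0.01, -0.02, 0.015, 0.003, 5.0, -3.0, 1.0, 0.2]
blocks = quantize(weights, block_size=4)
restored = dequantize(blocks)
for original, approx in zip(weights, restored):
    print(f"{original:>8.4f} -> {approx:>8.4f}")
```

Because each block carries its own scale, the block of small values and the block of large values each get to use the full 4-bit grid, which is the intuition behind making such a coarse representation flexible enough to be useful.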

And so it's kind of amazing that we're able to take this raw material that's very coarse and rather small and we're able to make it flexible enough to train a world class neural network.

Right.

But on some level, I always like to think of this as like you can have any number you want as long as it's one of these 16 and somehow, you know, it still works.

It is pretty miraculous.

Yeah.

Amazing.

As we wrap up the conversation, but look ahead to the future of Nemotron, what can developers and enterprises expect next?

You've talked about it a little bit, some of the things coming through the pipeline and that you're working on.

But what can devs and enterprises expect from Nemotron and, you know, perhaps more importantly, how can they start to engage with Nemotron right now?

Well, I can just say, you know, you should expect us to train some big models.

We've trained recently some smaller models.

We'll be training some bigger models.

You can expect us to incorporate more multimodal technology.

From NVIDIA, we have some of the world's best, well, I guess the world's best open-weights speech recognition models at this point.

And that technology hasn't really been incorporated into Nemotron, and we're working towards adding audio and these kinds of capabilities.

So I think there's a lot of just really cool technology we're working on, really bringing all of the best technology across NVIDIA and concentrating it in Nemotron.

I think that's, you know, that's something people can look forward to.

I don't know, Bryan, what you would say.

Well, yeah, I would also reinforce how important reasoning is to Nemotron.

It's been a core part of Nemotron development for the last year, and we were super proud that we were able, for example, to take Nemotron reasoning and add it to Meta's Llama family.

We know that there's a lot more work to do to make reasoning even stronger and we're really excited to do that.

Bryan, Jon, this has been a great, really informative conversation.

But just to hear the two of you talk about Nemotron from the inside out, just a treat.

So for folks who are listening and want to get started with Nemotron, the models are available now.

Yeah, so our models are available on Hugging Face.

You can download them.

Perfect.

You can also experience all of them on build.nvidia.com and download them there as well.

Excellent.

We do have a landing page on nvidia.com for Nemotron and we're busy filling it out right now, gathering all of the Nemotron content together in one place.

So I would go there.

Excellent.

And work in progress, I'm sure, as the content like the technology itself evolves and evolves.

Again, Jon, Bryan, both of you, I know, have a tremendous amount on your plate with Nemotron and everything else.

So we appreciate the hour to come on and help shout out from the rooftops, tell the world about all the fantastic work you and your teams have been doing.

Congratulations and all the best going forward.

As you said, not just inside of NVIDIA, but collaborating with the community and working to raise all the boats together.

Thanks for having me.

Thanks, everyone.

[Music]