Quizzing Intel exec Sandra Rivera about generative AI and more



Intel threw a lot of information at us a couple of weeks ago at its Intel Innovation 2023 event in San Jose, California. The company talked a lot about its manufacturing advances, its Meteor Lake chip, and its future schedule for processors. It felt like a heavy download of semiconductor chip information. And it piqued my interest in a variety of ways.

After the talks were done, I had a chance to pick the brain of Sandra Rivera, executive vice president and general manager of the Data Center and AI Group at Intel. She was perhaps the unlucky recipient of my pent-up curiosity about a number of computing topics. Hopefully she didn’t mind.

I felt like we got into some discussions that were broader than one company’s own interests, and that made the conversation more interesting to me. I hope you enjoy it too. There were a lot more things we could have talked about, but sadly for me, and luckily for Rivera, we had to cut it off at 30 minutes. Our topics included generative AI, the metaverse, competition with Nvidia, digital twins, Numenta’s brain-like processing architecture and more.

Here’s an edited transcript of our interview.


Sandra Rivera is executive vice president and general manager of the data center and AI group at Intel.

VentureBeat: I am curious about the metaverse and whether Intel thinks it is going to be a driver of future demand, and whether there’s much focus on things like the open metaverse standards that some folks are talking about, like Pixar’s Universal Scene Description technology, which is a 3D file format for interoperability. Nvidia has been making a big deal about this for years now. I’ve never really heard Intel say much about it, and the same goes for AMD.

Sandra Rivera: Yeah, and you’re probably not going to hear anything from me, because it’s not an area of focus for me in our business. I will say, just generally speaking, that the metaverse and 3D and immersive applications all drive a lot more compute requirements, not just on the client devices but also on the infrastructure side. Anything that is driving more compute, we think, is just part of the narrative of operating in a large and growing TAM [total addressable market], which is good. It’s always better to be operating in a large and growing TAM than in one that is shrinking, where you’re fighting for scraps. Not that you asked me about Meta specifically, since the topic was the metaverse, but even Meta, which was one of the biggest proponents of the metaverse and immersive user experiences, seems to be more tempered about how long that’s going to take. It’s not an if, but a when, and they’re adjusting some of their investments to be more long-term and less of that step-function, exponential growth that maybe –

Mercedes-Benz is building digital twins of its factories with Nvidia Omniverse.

VentureBeat: I think some of the conversation here around digital twins seems to touch on the notion that maybe the enterprise metaverse is really something more practical that’s coming.

Rivera: That’s an excellent point, because even in our own factories we actually use headsets to do a lot of the diagnostics on these extraordinarily expensive semiconductor manufacturing process tools, of which there are literally dozens in the world, not hundreds or thousands. Relatively speaking, few people have deep expertise in troubleshooting and diagnosing them. The training, the sharing of information, the diagnostics around getting those machines to operate at even greater efficiency, whether among the Intel experts alone or with the vendors: I do see that as a very real application, and one we are actually using today. We’re finding a wonderful level of efficiency and productivity because you don’t have to fly these experts around the world. You’re actually able to share a lot of that insight and expertise in real time.

I think that’s a very real application. There are certainly applications in, as you mentioned, media and entertainment. The medical field is another top-of-mind vertical where there should be a lot more opportunity as well. Over the arc of technology transitions and transformations, I do believe it’s going to be a driver of more compute, both in client devices, including PCs but also headsets and other bespoke devices, and on the infrastructure side.

Nvidia Grace Hopper Superchip

VentureBeat: A more general one: how do you think Intel can grab some of that AI mojo back from Nvidia?

Rivera: Yeah. I think there’s a lot of opportunity to be an alternative to the market leader, and a lot of opportunity to educate in terms of our narrative that AI does not equal just large language models and does not equal just GPUs. We are seeing, and I think Pat did talk about it in our last earnings call, that the CPU’s role in an AI workflow is giving us a tailwind with fourth-gen Xeon, particularly because of the integrated AI acceleration through AMX, the Advanced Matrix Extensions that we built into that product. Every AI workflow needs some level of data management, data processing, data filtering and cleaning before you train the model. That’s typically the domain of a CPU, and not just any CPU: the Xeon CPU. Even Nvidia shows fourth-gen Xeon as part of that platform.

We do see a tailwind in the role that the CPU plays in that front-end pre-processing and data management. The other thing we have certainly learned in a lot of the work we’ve done with Hugging Face, as well as other ecosystem partners, is that there is a sweet spot of opportunity in small to medium-sized models, both for training and, of course, for inference. That sweet spot seems to be anything that’s 10 billion parameters and less, and a lot of the popular models we’ve been running, Llama 2, GPT-J, BLOOM, BLOOMZ, are all in that 7 billion parameter range. We’ve shown that Xeon performs quite well from a raw performance perspective, and even better from a price-performance perspective, because the market leader charges so much for its GPUs. Not everything needs a GPU, and the CPU is actually well positioned for, again, some of those small to medium-sized models.

Greg Lavender, CTO of Intel.

Then, certainly, when you get to the larger models, the more complex ones, the multimodality, we are showing up quite well with Gaudi2, and we also have a GPU. Truthfully, Dean, we’re not going to go full frontal, take on the market leader and somehow impact their share by tens of percentage points at a time. When you’re the underdog, and when you have a different value proposition about being open, investing in the ecosystem and contributing to so many open source and open standards projects over many years, when we have a demonstrated track record of investing in ecosystems, lowering barriers to entry and accelerating the rate of innovation by having more market participation, we just believe that open always wins in the long term. We have an appetite from customers that are looking for the best alternative. We have a portfolio of hardware products that address the very broad-ranging set of AI workloads through these heterogeneous architectures. A lot more investment is going to happen in the software to make it easy to get to deployment and to productivity quickly. That is what developers care most about.

The other thing I get asked about quite a bit is the CUDA moat, and how it’s a really tough thing to penetrate. But most AI application development is happening at the framework level and above; 80% of it, actually. To the extent that we can upstream our software extensions to leverage the underlying features we built into our various hardware architectures, the developer just cares whether it’s part of the standard TensorFlow release, the standard PyTorch release, or standard Triton, JAX, OpenXLA or Mojo. They don’t really know or care about oneAPI or CUDA. They just know that the abstracted software layer is easy to use and easy for them to deploy. I do think that’s something that is fast evolving.

Numenta's NuPIC platform.

VentureBeat: There was a story on the Numenta folks just a week and a half ago or so. They went off for 20 years studying the brain, came up with software that is finally hitting the market now, and teamed up with Intel. A couple of interesting things: they said they feel they could speed up AI processing by 10 to 100 times. They were running on the CPU and not the GPU, and they felt the CPU’s flexibility was its advantage, while the GPU’s repetitive processing was really not good for the processing they have in mind, I guess. It’s interesting, then, that you could also dramatically lower costs that way and, as you say, take AI to more places and bring AI everywhere.

Rivera: Yeah. I think this idea that you can do the AI you need on the CPU you have is actually quite compelling. Where we’ve had such a strong market position is certainly, as I described, the pre-processing and data management part of the AI workflow, but it’s also the inference and deployment phase. Two-thirds of that market has traditionally run on CPUs, and mostly Xeon CPUs. When you look at the growth of training versus inference, inference is growing faster, and the fastest-growing segment of the AI market is edge inference. We estimate that’s growing about 40% over the next five years, and again, we’re quite well positioned there with a highly programmable CPU that’s ubiquitous in terms of deployment.

I will go back to say I don’t think it’s one size fits all. The market and technology are moving so quickly, Dean, so we have really all of the architectures: scalar architectures, vector processing architectures, matrix-multiply processing architectures, spatial architectures with FPGAs, and an IPU portfolio. I don’t feel like I am lacking in any way in terms of hardware. It really is the increasing investment we’re making in software and in lowering the barriers to entry. Even the DevCloud is absolutely aligned with that strategy: how do we create a sandbox to let developers try things? Yesterday, if you were in Pat’s keynote, all three of the companies we showed, Render and Scala and – oh, I forget the third one, did their innovation on the DevCloud because, again, it lowers the barrier to entry, creates a sandbox and makes it easy. Then when they deploy, they’ll deploy on-prem, in a hybrid environment, or in any number of different ways, but we think that accelerates innovation. Again, that’s a differentiated strategy that Intel has versus the market leader in GPUs.

Hamid Azimi, corporate vice president and director of substrate technology development at Intel Corporation, holds an Intel assembled glass substrate test chip at Intel's Assembly and Test Technology Development factories in Chandler, Arizona, in July 2023. Intel’s advanced packaging technologies come to life at the company's Assembly and Test Technology Development factories.

VentureBeat: Then the brain-like architectures, do they show more promise? Like, I mean, Numenta’s argument was that the brain operates on very low energy and we don’t have 240-watt things plugged into our heads. It does seem like, yeah, that ought to be the most efficient way to do this, but I don’t know how confident people are that we can duplicate it.

Rivera: Yeah. I think all the things you didn’t think were possible are just becoming possible. Yesterday we had a panel where AI wasn’t really the topic, but of course it became the topic, because it’s the topic everyone wants to talk about. The panel was on what we see in terms of the evolution of AI five years out. I just think that whatever we project, we’re going to be wrong, because we don’t know. Even a year ago, how many people were talking about ChatGPT? Everything changes so quickly and so dynamically, and I think our role is to create the tools and the accessibility to the technology so that we can let the innovators innovate. Accessibility is all about affordability and access to compute in a way that is easily consumed from any number of different providers.

I do think our whole history has been about driving down cost, driving up volume and accessibility, and making an asset easier to deploy. The easier we make it to deploy, the more utilization it gets, and the more creativity and innovation. I go back to the days of virtualization. If we didn’t believe that making an asset more accessible and more economical to use drives more innovation and that spiral of goodness, why would we have done it? The bears were saying, hey, does that mean you’re going to sell half as many CPUs now that you have multithreading and more virtual CPUs? Well, the exact opposite happened. The more affordable and accessible we made it, the more innovation was driven and the more demand was created. We just believe that economics plays a big role. That’s what Moore’s Law has been about, and that’s what Intel’s been about: economics, accessibility and investment in the ecosystem.

On the question of low power: power is a constraint. Cost is a constraint. I do think you’ll see us continue to drive down the power and cost curves while driving up the compute. Take the announcement Pat made yesterday about Sierra Forest: we had 144 cores, and now we’re doubling that to 288 cores. The compute density and power efficiency are getting better over time because they have to; we have to make it more affordable, more economical and more power efficient, since power is really becoming one of the big constraints. That’s probably a little less so in the US, although of course we’re heading in that direction, but you absolutely see it in China and in Europe, and our customers are driving us there.

VentureBeat: I think it’s a very compelling argument to do AI on the PC and promote AI at the edge, but it also feels like a big challenge, in that the PC is not the smartphone, and smartphones are much more ubiquitous. When you think of AI at the edge and Apple doing things like putting its own neural engines in its chips, how does the PC stay relevant in this competitive environment?

Pat Gelsinger shows off a UCIe test chip.

Rivera: We believe the PC will still be a critical productivity tool in the enterprise. I love my smartphone, but I use my laptop. I use both devices. I don’t think it’s one or the other. Again, I’m sure Apple is going to do just fine selling lots and lots of smartphones. We do believe AI is going to be infused into every computing platform. The ones we are focused on are the PC, the edge, and of course everything having to do with cloud infrastructure, and not just hyperscale cloud; every enterprise has cloud deployments, on-prem or in the public cloud. The impact of COVID was multiple devices in the home, which drove an unnatural buying cycle. We’re probably back to more normalized buying cycles, but we don’t actually see the decline of the PC. That’s been talked about for many, many years, but PCs continue to be a productivity tool. I have smartphones and I have PCs. I’m sure you do too.

VentureBeat: Yeah.

Rivera: Yeah, we feel pretty confident that infusing more AI into the PC is just going to be table stakes going forward, but we are leading and we are first, and we are pretty excited about all of the use cases that we’re going to unlock by just putting more of that processing into the platform.

VentureBeat: Then, a gaming question here that leads into more of an AI question too. When the large language models all came out, everybody said, oh, let’s plug these into the characters in our games. These non-player characters could be much smarter to talk to when you have a conversation with them in a game. Then some of the CEOs were telling me the pitches they were getting were like, yeah, we can do a large language model for your blacksmith character or something, but it would probably cost about a dollar a day per user, because the user is sending queries back. That turns out to be $365 a year for a game that might sell for $70.

Intel PowerVia brings power through the backside of a chip.

Rivera: Yeah, the economics don’t work.
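The arithmetic behind this exchange is easy to check. The sketch below is a back-of-envelope calculation using only the figures quoted above (about $1 per user per day for a cloud-hosted LLM character, and a $70 game). It assumes, hypothetically, that a player talks to the character every day of the year; the constant and function names are illustrative and do not come from any real pricing model.

```python
# Back-of-envelope economics of a cloud-hosted LLM game character.
# Figures are the ones quoted in the interview; daily play all year is an
# illustrative assumption, not real usage data.
COST_PER_USER_PER_DAY = 1.00  # USD per user per day, as quoted in the pitch
GAME_PRICE = 70.00            # USD retail price cited for the game
DAYS_PER_YEAR = 365


def yearly_inference_cost(cost_per_day: float, days: int = DAYS_PER_YEAR) -> float:
    """Inference bill for one user who plays every day for `days` days."""
    return cost_per_day * days


def break_even_days(cost_per_day: float, price: float) -> float:
    """Days of play after which inference spending exceeds the game's price."""
    return price / cost_per_day


yearly = yearly_inference_cost(COST_PER_USER_PER_DAY)
ratio = yearly / GAME_PRICE
print(f"Yearly inference cost per user: ${yearly:.0f}")
print(f"That is {ratio:.1f}x the price of the game")
print(f"Inference cost passes the game price after "
      f"{break_even_days(COST_PER_USER_PER_DAY, GAME_PRICE):.0f} days")
```

Under these assumptions the yearly bill comes to about 5.2 times the game’s price, in line with the 4X to 5X range Rivera mentions later, and the per-user cost overtakes the retail price after roughly 70 days of play.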

VentureBeat: Yeah, it doesn’t work. Then they start talking about how they can cut this down, cut the large language model down. For what a blacksmith needs to say, you have a pretty limited universe, but I do wonder, as you’re doing this, at what point does the AI disappear? It becomes a bunch of data to search through as opposed to something that’s –

Rivera: Generative, yeah.

VentureBeat: Yeah. Do you have a sense that somewhere in the magic of these neural networks there is intelligence, and that’s the AI, whereas databases are not smart? I think the parallel to what you were talking about yesterday was this notion that you can gather all of your own data that’s on your PC, your 20 years’ worth of voice calls or whatever.

Rivera: What a nightmare! Right?

VentureBeat: Yeah. You can sort through it and search through it, and that’s the dumb part. Then the AI producing something smart out of that seems to be the payoff.

Rivera: Yeah, I think it’s a very interesting use case. A couple of things to comment on there. One is that there is a lot of algorithmic innovation happening to get the same level of accuracy from a model that is a fraction of the size of the largest models, which take tens of millions of dollars, many months and many megawatts to train, and which will increasingly be the domain of the few. There aren’t that many companies that can afford $100 million, three or four or six months, and literally tens of megawatts to train a model. A lot of what is happening in the industry, and certainly in academia, is this quantization, knowledge distillation and pruning type of effort. You saw that clearly with Llama and Llama 2, where it’s like, well, we can get the same level of accuracy at a fraction of the cost in compute and power. I think we’re going to continue to see that innovation.
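Of the model-shrinking techniques Rivera lists, quantization is the easiest to show in a few lines. The sketch below is a simplified, hypothetical illustration of symmetric per-tensor int8 quantization in plain Python, not how Llama or any particular toolchain does it, followed by the raw weight storage for a 7-billion-parameter model (the size class mentioned earlier in the interview) at 32-, 16- and 8-bit precision. The function names and sample weights are made up for illustration.

```python
# Minimal illustration of symmetric per-tensor int8 weight quantization.
# Names and sample weights are hypothetical; real toolchains add per-channel
# scales, calibration data and quantization-aware training.


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] with a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights: w is approximately scale * q."""
    return [scale * v for v in q]


def footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes), ignoring overhead."""
    return num_params * bits_per_param / 8 / 1e9


weights = [0.42, -1.37, 0.05, 0.91, -0.66, 1.20]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("int8 codes:", q)
print(f"scale={scale:.5f}, max reconstruction error={max_error:.5f}")

# Weight storage for a 7-billion-parameter model at different precisions.
for bits in (32, 16, 8):
    print(f"{bits:>2}-bit weights: {footprint_gb(7e9, bits):.0f} GB")
```

Storing each weight in 8 bits instead of 32 cuts raw weight storage by 4x (28 GB down to 7 GB for 7 billion parameters), at the cost of a per-weight rounding error of at most half the scale. Knowledge distillation and pruning, the other techniques mentioned, are not shown here.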

Numenta can scale CPUs to run lots of LLMs.

The second thing, in terms of the economics and the use cases, is that when you have those foundational models, the frontier models, customers will use them just like a weather model. There are very few developers of weather models, relatively speaking, but many, many users of them. What happens is you take that model and fine-tune it on your contextualized data, and an enterprise dataset is going to be much, much smaller, with your own linguistics and your own terminology. A three-letter acronym at Intel is going to mean something different than a three-letter acronym at your firm or at Citibank. Those datasets are much smaller, and the compute required is much less. Indeed, I think this is where you’ll see it. You gave the example of a video game: it cannot cost 4X or 5X what the game costs. If you’re not doing a large training run, if you’re actually doing fine-tuning and then inference on a much, much smaller dataset, it becomes more affordable, because you have enough compute and enough power to do that more locally, whether in the enterprise or on a client device.

VentureBeat: So whether the AI is still smart enough isn’t necessarily dependent on the amount of data, I suppose.

Rivera: No. If you have, again, a neural processing engine in a PC, or even a CPU, you’re not actually crunching that much data. The dataset is smaller, and therefore the amount of compute required to process that data is just less and well within reach of those devices.
