wccftech.com
Inworld AI had a big presence at GDC 2024, where it demonstrated new tech demos of its AI Character Engine in collaboration with gaming giants like Microsoft, Ubisoft, and NVIDIA. One year later, during the recent GDC 2025, its presence was undoubtedly far more understated, with fewer flashy partnerships to talk about. That doesn't mean there's no development going on behind the scenes, though. During the recent convention in San Francisco, we caught up with Inworld AI CEO Kylan Gibbs to discover what they've been up to lately.

Let's talk about your company's evolution over the past few years.

We've been around for almost four years now. The first thing we started out with was this character engine, which was a server-side application connected to your game engines via an SDK. It was largely meant to abstract away a lot of the complexity of AI, mainly for designers and narrative writers. The biggest learning we had was that people wanted much more control, and they wanted the logic and everything to run locally.

A big focus for us product-wise has been shifting away from that server-side application to take the logic and tools we built our own engine with and turn them into a series of libraries that developers can use directly in the engine to effectively build their own AI engines. That means it's a C++ runtime that can be adapted as needed for other engines.

That has been the transition from character engine to framework. As part of that, we've had a focus on observability and telemetry. One of the challenges is that, with AI, a lot of game developers don't have the transparency they need to understand when something breaks, what went wrong, and, when something is good, what made it work.

That's our portal tool, which allows developers to access the telemetry built into that framework.
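To make the observability point concrete, here is a minimal sketch in Python of the kind of in-process event log that lets a developer see when an AI interaction broke. It is purely illustrative: Inworld's actual framework is a C++ runtime, and the event schema and class names below are invented for this example.

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class AIEvent:
    """One telemetry record for an AI interaction (hypothetical schema)."""
    kind: str                 # e.g. "llm_response", "cloud_fallback", "filter_block"
    latency_ms: float
    metadata: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)

class Telemetry:
    """Collects AI events in-process so a developer can inspect failures
    instead of treating the model as a black box."""
    def __init__(self):
        self.events: list[AIEvent] = []

    def record(self, kind: str, latency_ms: float, **metadata):
        self.events.append(AIEvent(kind, latency_ms, metadata))

    def failures(self) -> list[AIEvent]:
        # Surface anything flagged as broken for later inspection.
        return [e for e in self.events if e.metadata.get("error")]

    def export(self) -> str:
        # Dump everything as JSON for an external dashboard or log store.
        return json.dumps([asdict(e) for e in self.events], indent=2)

telemetry = Telemetry()
telemetry.record("llm_response", 42.0, npc="vendor", tokens=120)
telemetry.record("llm_response", 910.0, npc="guard", error="off-character reply")
print(len(telemetry.failures()))  # → 1
```

The point of the sketch is only the shape of the idea: every AI call leaves a record with enough metadata to reconstruct what happened when a playtest goes wrong.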
The big thing, though, is that we need to bring not just the logic but, ideally, the models locally, which is what every game developer wants, so we've had a huge focus on that as well.

What we've built is a tool that allows us to use our cloud to distill down models that can be used locally. Of course, the challenge there is that a lot of consumer hardware is not ready to run everything locally. What we end up building into a lot of these applications is what we call a hybrid inference model, where the actual model is stored locally, but the system detects whether it can run on the hardware and, if not, falls back to a cloud version. For example, if it lands on a PC with a GeForce RTX 5090, you run it locally. If it lands on a Nintendo Switch, you're going to use the cloud.

The other big focus that we had is what we call controlled evolution. The biggest challenge with AI right now, for games and for consumer apps in general, is that if you launch a game today with a given model and keep that model, in six months it will be outdated because AI is moving too quickly. You need to be able to constantly select from all the third-party or our own models that are available, figure out which one is the best at that given time, and then do a bunch of optimization on it based on your user usage.

We try to work with developers so that they do not have to make a $20 million commitment to a specific cloud or model provider but can use whatever the best model is at any given time and optimize it specifically for their use case, because every model is built for these kinds of huge general-purpose tasks. We need to do one thing super well, and so we do a lot of work there.

Because AAA games and all the largest studios we work with at Inworld obviously have very long development cycles, the biggest launches today are largely mobile and browser-based applications. The AAA ones take a little longer.
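The hybrid inference routing Gibbs describes could be sketched as below. The `Device` fields and the VRAM threshold are hypothetical stand-ins, not Inworld's actual hardware-detection logic; the idea is simply that each call checks whether the local hardware can hold the model and otherwise falls back to a cloud endpoint serving the same distilled model.

```python
from dataclasses import dataclass

@dataclass
class Device:
    """Simplified view of the hardware the game landed on (illustrative)."""
    vram_gb: float
    has_gpu: bool

class HybridRouter:
    """Routes each inference call to the locally stored model when the
    hardware can run it, otherwise to a cloud endpoint."""
    def __init__(self, local_model_vram_gb: float):
        self.required_vram = local_model_vram_gb

    def backend(self, device: Device) -> str:
        if device.has_gpu and device.vram_gb >= self.required_vram:
            return "local"
        return "cloud"

router = HybridRouter(local_model_vram_gb=8.0)
print(router.backend(Device(vram_gb=32.0, has_gpu=True)))  # "local", e.g. an RTX 5090
print(router.backend(Device(vram_gb=4.0, has_gpu=True)))   # "cloud", e.g. a handheld
```

In a real implementation, the check would run once at startup and likely consider more than VRAM (thermal budget, other workloads), but the branch itself is the whole trick.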
The ones that I think are most exciting are Status, for example, which is from a company called Wishroll. It's a game where you roleplay as a character in another universe's Twitter. Crazy idea. But they hit 500K users in 19 days from launch with an average spend of an hour and a half per user per day, which is crazy traffic, and the whole thing is powered by your achievements, the content. It's just mindblowingly creative in terms of what they built.

The other one is Little Umbrella. They have another part of the company called Playroom, which might be familiar. They built Death by AI as their first game and just released another one called The Last Show, which is effectively a Jackbox-style party game powered by AI. Those are super fun because they lean into AI orchestrating multiplayer scenarios in real time.

Another cool one is Streamlabs, where we created a streaming assistant in a collaboration between us, Streamlabs, Logitech, and NVIDIA. The game that we're using for it is Fortnite. In that case, you have this system living alongside the game in real time, seeing what's happening in the game, understanding the game's state, observing the user comments, hearing what the streamers are saying, and being able to take complex actions, like, do I need to overclock the GPU? Do I need to change the camera settings? Do I need to trigger an in-game event? All of those different things can actually happen, and they have to happen with millisecond latency. So, to make it all work performantly, that mix of hybrid and local inference is required.

Speaking of Streamlabs, does it have functionality for a sort of gaming coach, where it can monitor how you're proceeding with the game?

Yes, with Streamlabs, that's basically how it performs. In this case, it's often professional streamers using it, so they really don't need coaching. But if you were a player going into the game, you'd be like, what the heck does this item do, right?
What's the best next thing for me to do? It can do all of that.

The biggest class of use cases that we're seeing, which I call companions and assistants, comes in two varieties: disembodied and embodied companions. Disembodied is your Streamlabs assistant. It's outside the game, able to observe it, but not literally within the game. It's often used for coaching, assistance, questions, and live walkthroughs.

The other is embodied. You would use it for onboarding, which is a huge use case. Instead of having your blocks of text and everything at the start, a character sees what you're doing, gives you suggestions, tells you how to play the game, and gives you comments. It can also be used later on, for example, for things like difficulty assistance. Maybe if you're stuck, it can show and tell you how you're going to do this.

There are other use cases like player emulation, especially when you're doing multiplayer co-op games and MMOs. You jump in, and you're in hour one. You want to get a feel for the game, but you don't want to die, so how do we make it feel like you're playing with other players, maybe even with speech and everything else? Or maybe you and I are playing a co-op game and you drop off, and then I want a character that comes along and makes it feel like I'm still playing with you. There are a lot of different use cases in that companion and assistant space that are super exciting.

Is the monitoring integrated as an SDK within the program itself, or does it have the functionality to read video inputs, for example?

The logic is integrated into the application itself as much as possible. We actually integrate all the model understanding into it. You can embed local visual models that can understand things in real time. Really, the constraint is what hardware you want to run on. We have a demo that runs fully on an NVIDIA GeForce RTX 5090, an AMD Radeon RX 7900 XTX, or a Tenstorrent QuietBox. In that case, you can run it all locally.
In that case, your application is just as old-school as it can be. It just happens to have AI logic and embedded models in there. That's where I think the industry needs to be going. For now, because not everybody has the hardware power, we're still in a situation where some of that needs to fall back to the cloud. But really, the only thing you're ever using in the cloud is a stored model that you have an endpoint to; we try to keep all the logic local because developers need control over it.

For video monitoring, one example I've heard of is NVIDIA showing off their gaming co-pilot assistant being able to monitor a region of the screen. Say you're looking at a mini-map, you're playing a MOBA, and you want to know when something disappears. How difficult would it be for either an end user or a programmer to set up the variables to have it monitor for something like that?

That's a great question. You can think about two things. The ideal for this is to do a full-screen-view visual language model or OCR. With OCR, you're basically taking screen captures. The reason you ideally want a visual language model is that it gives you that spatial awareness, but we see two ways to do it.

For a developer, what you'd probably do is set it up so that it's pointed at specific pixels on the screen and builds its understanding based on those. What we often push people to consider as well is that sometimes you don't need to understand vision at all, because you have the game state. You have code on the backend.

People often miss that, or they're like, we're trying to just understand the visuals, whereas actually, the code in the backend is telling me everything I need to know. It's kind of a dance between what I am actually not able to capture from the game code and what I need the visuals for.

What's the possibility of using Inworld AI for quality assurance testing?

You could use it. To be honest, our focus is primarily on player-facing AI.
As I engage with more studios, my answer is, you should build that yourself. My reason is that I just think the cost of using these large language models is going to be driven down continually. QA is a bit different, but for any kind of content-creation productivity stuff, I think they just need to build it in-house and do it themselves.

For QA specifically, we don't build agents for testing or anything else. You could build that yourself using our tech; it's ultimately a more infrastructural piece of technology. We have some groups trying to create player-emulating bots that they can send into the world and use.

We don't build a specific solution for QA testing, but we've often seen our tech used for prototype testing. In this scenario, you might set up a world with a general cityscape and say, I want to see a hundred different varieties of this where the agents respond in slightly different ways. It helps with rapid prototyping so that you can identify, out of that set of a hundred options, which one is the most fun or engaging.

But in terms of core quality assurance or bug fixing, it's something developers could build. My honest response is, use our tech for that if you want, but it's probably a good area for you to build yourself because it's going to be a core part of your workflows in the future.

The main reason I was asking is that it can monitor the game state, setting up variables where you're looking at, say, unintended interactions.

Right, that's super interesting. As I was mentioning, that telemetry piece we have is super valuable there. Because it's built into the game code, you can set it up so that you're running telemetry against any part of the game. If you want to detect which types of character responses or NPC interactions tend to result in the player completing the mission, you need to know what kind of AI stuff is actually happening there.

So, I guess I would say this. We don't do general QA.
We certainly are really focused on making sure that you can QA the crap out of your AI. For anything that is AI in there, we need to give you all the data, all the metadata, everything you could possibly need so that you can figure out how it actually works.

I think it's essential, because honestly, one of the broken parts of AI today is that it's all a black box. If you're building and iterating on a game and doing playtests, you need to know when it breaks, how it breaks, and how that's all connected. We don't do QA for the broader space of game development, but as people are integrating AI, you need to QA the crap out of that. And that's where the telemetry piece comes in.

Have you noticed any issues that you're resolving with Inworld AI, such as hallucinations?

Not as much anymore, because we have the ability to distill down these models, train them for specific tasks, and run a lot of filters over them. Hallucinations can be controlled as much as you want, and you can also perform data structure validation.
If you're outputting a JSON format, for example, you can constrain it to specific JSON formats, certain lengths, and certain types of words.

Where hallucination comes in, for example, is with that game I mentioned earlier, Status. They take advantage of it to a degree, because they want characters to come up with crazy ideas but still stay in character.

It depends on how you define hallucination. In some cases, breaking outside of IP norms is one form of hallucination. Another form is coming up with completely made-up stuff that doesn't make sense in the game and breaks the data structures. We focus a lot on the former because we work with many IP holders who are super sensitive to it. The other one is a pretty well-solved problem, but both are solvable. One just requires a lot more machine learning depth.

Can you talk about the dynamic crowd tech that you are working on?

Yeah, I love this. One of the big problems I encounter when I engage with studio heads working on any kind of open-world game is that there are two ways people have tried to build better player experiences. One is that they just make worlds bigger and bigger, thinking a bigger world means more playtime. And I'm like, I can't go on horseback for another 20 minutes. The other is graphical fidelity: they try to consistently increase it, thinking that if they have a bigger world with higher fidelity, people will like it.

Dynamic crowds are part of the general solution to that, which is how to make the world feel more alive. Crowds are one of those areas that just haven't really evolved in about 10 years. For example, instead of just random people walking back and forth, as you see in every game, or standing still, they notice each other.
Someone says something, someone walks up, they start having a conversation, they decide to do something, and they go off. As a player, you might not be able to put your finger on what is more immersive in that case, but it just feels more alive.

We do a lot of that kind of stuff in terms of environmental awareness, too. We don't just power characters; we can power any part of the game state. How does the environment adapt to different people? How do you create different parts of quests or event generation? Maybe, if you have just completed a quest, I want to generate an event. For example, OK, I just saved this cat. I'm walking up to an ice-cream vendor. The cat jumps on the ice-cream vendor, they shoo it away, and someone comes over. It's those little parts of the world coming more alive that make it feel more immersive, which is almost like a new form of fidelity that we are pushing now.

Is there a toolset that Inworld AI is building so that developers can integrate a sample of the technology?

That's a great question. As I mentioned, we build these templates with our framework. As soon as we start working with developers, we provide them with a sample to get started and understand how the tech works, and then they build around that. But it's not a black-box component that they plug in. It's basically a chunk of code that they get to go in and change, because how you might want to build crowds is different from how another studio would, not to mention how it interfaces with your assets and the rest of your system. That's why we think in terms of templates rather than components.

Absolutely, especially when you're trying to adapt the same game and engine framework across a broad spectrum of devices, some of which may be intended to be played offline.

Exactly. That's where the local-hybrid story is really important, because most people want to launch their games on multiple devices. How do we create a sense of player parity? We all know there are different graphics, right?
If I play The Witcher 3 on another device, it's a completely different graphical experience. There's also that question of how we give a sense of parity while recognizing the constraints of the different devices.

Do you think Metahumans are a technology worth investing in, or is it petering out?

Honestly, so many people who come to us want to feel innovative, and there's this idea that it needs to be hyper-realistic fidelity. Every time you go to Metahumans, you end up being like, oh man, facial animations are really hard, and everything becomes difficult. Also, players get this uncanny valley effect.

Generally, we've seen people migrate away from that direction towards more stylized characters that engage people. For example, Metaphor: ReFantazio has super high-fidelity characters, but it's not Metahumans. So, I feel like there is certainly a transition in the other direction. There are certain interested parties who want to maximize that fidelity so you can max out your GPU capacity, but I personally have consistently seen stylized characters and worlds play out a little better, and it makes development a lot easier. It also allows you to feel more differentiated to your players because, otherwise, every game feels like it has the same Metahumans in it. I don't necessarily want to say it's petering out, but there's certainly a recognition that it's not the right solution for most players.

Tell me a little bit about what Inworld AI is building with Nanobit for Winked.

In that case, they had this interactive novel type of game. It's a great experience where, every few weeks or months, they release a new pack of episodes.
Those take a while to develop, and what happens is that people get very attached to the characters during those experiences and then go, oh, I'm going to wait for next month to get my favorite character back in the next episode.

In that case, they integrated these characters as a kind of stopgap. Now, every time you finish an episode, you can have a conversation with a character from the world. The work we did there was about how to make these beloved characters that people are really attached to feel the same as the ones in the human-written stories, so that people can continue experiencing them. A lot of it was about achieving that dialogue quality without breaking the bank, and a lot of that was custom model training to fit the specific character persona.

Lastly, for players who want to experience Inworld AI technology firsthand, can you talk about the next commercial release in which they might be able to see that technology in a game?

I can't mention any of the AAA games because they all want to be secret, but there's probably going to be some stuff this summer; hopefully, we will see some very large titles announced. We will also have another large showcase around June, where we'll show off some new case studies and host our own event. So, I would look around the summer for some big stuff to happen.

Thank you for your time.