• Five Ways to Get Better Battery Life From Your Steam Deck

    After the Nintendo Switch, the Steam Deck might be the most impressive gaming handheld of the last decade. It brings Steam games—most of which were initially designed to run on Windows PCs—to a remarkably designed portable device. The only problem? Battery life can be rough in some games. If you're struggling to stay charged, here are some tips to help you out.

    When it comes to your Steam Deck's battery life, you're going to notice a lot of variability, even from one game to another. AAA games that rely on high-end GPUs will typically guzzle power. On laptops or desktops, that's usually not as much of a concern, but on the Steam Deck—when those games run at all—they can burn through the battery quickly.

    So, while we have plenty of tips to get the best battery life, it's important to keep in mind that some games will simply burn through your power no matter what. Fortunately, SteamOS is already pretty power efficient (at least compared to other operating systems), and there are several handy tools to help.

    First, learn what, exactly, is draining your battery

    There are a few common culprits for battery drain in games, and it's helpful to understand them before diving into solutions. What works for one game with minimal performance impact could make another game unplayable. With that in mind, here are a few key things that drain your battery:

    Your hardware settings. The display on your Steam Deck is always a pretty big battery drain, and turning down the brightness can help. Wireless radios like wifi and Bluetooth are always sipping power, even if you're not using them, so you can sometimes turn these off if you don't need them.

    Your refresh rate and FPS. Your Steam Deck has to update the screen dozens of times every second, and for some games that might be far more than necessary. 60 to 90 frames per second might be necessary for a fast-paced game like Doom Eternal, but it's overkill for Stardew Valley.

    Your processor's TDP. Thermal Design Power (or TDP) is a complicated metric, but it serves as a shorthand for how much power your processor is using. On the Steam Deck, you can limit this directly, which is a blunt way of saving battery, but it can sometimes help.

    The most useful tool for diagnosing your biggest battery drains is the Performance Overlay. Press the three-dot menu button while in a game, navigate to the Performance section, and you'll see an option to enable this overlay. There are several levels of detail, ranging from a simple frame rate counter to real-time power consumption and temperature readouts. The Performance tab is also where you'll find several useful features we'll discuss (under Advanced View), so it's good to make friends with this tab.
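    One nice thing about the overlay's real-time power readout is that it makes rough battery math easy. As a minimal illustration (not an official Valve tool), and assuming approximate battery capacities of around 40Wh for the LCD Steam Deck and 50Wh for the OLED model, dividing capacity by the wattage shown in the overlay gives a ballpark estimate of remaining playtime:

        # Back-of-the-envelope playtime estimate from the Performance Overlay's
        # power readout. Battery capacities are assumptions: roughly 40Wh for the
        # LCD Steam Deck and about 50Wh for the OLED model.

        def estimated_playtime_hours(draw_watts: float, battery_wh: float = 40.0,
                                     charge_fraction: float = 1.0) -> float:
            """Rough hours of play left at a steady total power draw."""
            if draw_watts <= 0:
                raise ValueError("power draw must be positive")
            return (battery_wh * charge_fraction) / draw_watts

        # Example: a demanding game drawing ~22W vs. a lighter one sitting at ~9W.
        for watts in (22.0, 9.0):
            hours = estimated_playtime_hours(watts)
            print(f"{watts:>4.0f} W -> ~{hours:.1f} hours on a full 40Wh charge")

    The exact numbers will wander as scenes change, but comparing readings before and after a settings tweak tells you quickly whether the change was worth it.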
    Dive into your game's display settings

    While the Steam Deck has a lot of useful features for managing battery life, you're still going to find some of your best options in your game's settings. Most games have presets to lower graphics settings with one quick toggle—like switching from Ultra to Medium—and some have even more advanced settings.

    This is particularly important to keep in mind if you play Steam games on multiple devices. Some games will try to sync settings between them, which can lead to your game rendering at a higher resolution or frame rate than the Steam Deck is even capable of displaying.

    In general, here are a few settings you should take a look at:

    Resolution: The Steam Deck has a 1280x800 resolution, so unless you're using an external monitor, there's no reason to set your game to a higher resolution. Most games won't let you go higher anyway, but it's worth double-checking. You can also go lower for some games, if you don't need as much detail.

    Frame rate: Many games offer the ability to cap how many frames the game generates, even if your display is capable of showing more. This can have a substantial impact on your battery life, especially for games that need to perform a lot of complex calculations (like graphics-heavy shooters) for every new frame.

    Graphical presets: If your game has a preset slider, try starting on the lowest preset and working your way up to see how the game performs. The Performance Overlay can be a huge help here, to see how much power your system is drawing on different presets. If your eye can't tell the difference, but your battery can, drop the settings.

    You can play around to find the right balance for you, and it will vary greatly by game. In some games you might want more graphical detail but fewer frames per second, while others would benefit from the exact opposite. Try a few options to see what works best.

    Adjust your refresh rate and FPS in tandem with the Frame Limit slider

    As mentioned above, the number of times your game updates the screen per second can be a huge factor in battery drain. This is affected by both the screen's refresh rate (how many times the display physically updates the pixels you see) and your game's frames per second (or FPS, the number of times the GPU generates a new frame per second). To complicate matters further, your refresh rate can have an effect on your input latency, so it's important to strike a delicate balance.

    To simplify this, the Steam Deck has a slider called Frame Limit that can impose a limit on how many frames your game displays and strike that balance for you. It automatically adjusts your refresh rate to be evenly divisible by the FPS limit, avoiding unnecessary (and asynchronous) refreshes, while still maintaining the highest refresh rate possible to reduce input lag.

    It's a workaround that's placed somewhat late in the pipeline, and it's sometimes better to adjust your game's settings directly, but it simplifies a complicated process. If you'd rather adjust your display's refresh rate directly, you can toggle Disable Frame Limit and adjust the refresh rate from 45Hz to 90Hz yourself. Keep in mind, though, you might still need to adjust some game settings to avoid generating frames your display will just throw out.
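    To make the "evenly divisible" idea concrete, here is a minimal sketch (not SteamOS's actual code) of how a frame limiter could choose a refresh rate for a given FPS cap, using the 45Hz to 90Hz range mentioned above. The goal is for every displayed frame to stay on screen for a whole number of refresh cycles:

        # Conceptual model of the Frame Limit behaviour described above: pick the
        # highest refresh rate in the panel's supported range that is an exact
        # multiple of the FPS cap. The 45-90Hz range is an assumption based on
        # the slider range mentioned in the article.

        def best_refresh_rate(fps_cap: int, min_hz: int = 45, max_hz: int = 90) -> int:
            """Highest refresh rate in [min_hz, max_hz] evenly divisible by fps_cap."""
            for hz in range(max_hz, min_hz - 1, -1):
                if hz % fps_cap == 0:
                    return hz
            return min_hz  # fall back to the lowest supported rate

        for cap in (30, 40, 45, 60):
            print(f"FPS cap {cap:>2} -> {best_refresh_rate(cap)}Hz refresh")

    A 40fps cap, for example, lands on 80Hz, so each frame is shown for exactly two refreshes instead of being spread across an uneven cadence.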
    Put a cap on your Thermal Design Power (TDP), if you must

    Tweaking your game's graphics settings can adjust your power consumption with scalpel-like precision. By comparison, the TDP limit is a hammer. But even hammers have their uses. By design, the TDP slider on the Steam Deck puts a hard limit on how much power the CPU/GPU can draw from the battery. You can't get much more direct battery savings than that.

    The problem is that games typically, you know, need power. And even games with really fine-grained settings don't generally ask the user to decide how much electricity to draw. For some, especially graphics-heavy games, putting a hard limit on TDP can cause massive performance drops or even game crashes.

    Less demanding games, though, can benefit from playing with this setting. A useful rule of thumb: if the game you're playing is already struggling to maintain a consistent frame rate, try something else before touching TDP. But for games like Stardew Valley, where you're never really concerned with frame rate, you can experiment with lowering the TDP limit to 10W or even 5W and see how well the game performs.

    Of course, setting a TDP limit only matters if it's below what your game was using in the first place. This is another area where the Performance Overlay comes in handy. You can get a sense of how much power your system is drawing during your games, and use that to gauge how low you want your TDP limit to be.

    Don't forget per-game battery setting profiles

    On top of all these settings, you can also set game-specific profiles to change your battery settings automatically based on the title you're playing. I can't recommend this feature enough, especially if you tend to play games with very different power demands. Few things are more annoying than forgetting you set a low TDP limit for a simple game, then launching a more demanding game that strains against that limit.

    To use this, it's one simple toggle on the Performance tab. Enable "Use per-game profile" and the Steam Deck will automatically create a profile for every game you play. You can disable this toggle to switch back to the default, if you ever decide you prefer one consistent profile.

    Keep in mind the profiles only account for the Steam Deck's own settings, not any game-specific settings. But it's still a handy tool. It can be overwhelming to keep track of all the different buttons and knobs you can fiddle with to get extra battery life, but the Steam Deck manages to balance a ton of customization options with the simplicity of straightforward, user-friendly tools so you can game longer.
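    Conceptually, a per-game profile is just a small bundle of the settings discussed above keyed to a title. As a purely hypothetical sketch of the idea (SteamOS stores its profiles internally; this is not its actual format), you could model it like this:

        # Hypothetical model of per-game battery profiles: each title gets its own
        # TDP limit, FPS cap, and brightness, with a default used for anything else.

        from dataclasses import dataclass

        @dataclass
        class PowerProfile:
            tdp_watts: int | None   # None means no TDP limit
            fps_cap: int | None     # None means uncapped
            brightness: float       # 0.0 to 1.0

        DEFAULT = PowerProfile(tdp_watts=None, fps_cap=None, brightness=0.8)

        PROFILES = {
            "Stardew Valley": PowerProfile(tdp_watts=5, fps_cap=30, brightness=0.6),
            "Doom Eternal": PowerProfile(tdp_watts=None, fps_cap=60, brightness=0.8),
        }

        def profile_for(title: str) -> PowerProfile:
            """Return the per-game profile, falling back to the default."""
            return PROFILES.get(title, DEFAULT)

        print(profile_for("Stardew Valley"))
        print(profile_for("Some Other Game"))  # falls back to DEFAULT

    The point of the feature is exactly this lookup: launch a different game and the Deck swaps in that game's settings instead of whatever you tuned last.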
  • Asus Refreshes ROG Strix, ROG Zephyrus, TUF Gaming Laptops With Nvidia GeForce RTX 5060 Laptop GPU

    Asus has refreshed its ROG Strix, ROG Zephyrus, and TUF Gaming laptops with Nvidia GeForce RTX 5060 GPUs at Computex 2025 in Taipei. The Taiwanese tech giant has brought Nvidia's mid-range graphics to its ROG Strix G16, ROG Strix G18, ROG Zephyrus G16, ROG Zephyrus G14 (2025), TUF Gaming A18, TUF Gaming A16, TUF Gaming F16, and TUF Gaming A14. All models in the lineup run on either an Intel or an AMD CPU. They pack up to 90Wh batteries. The ROG Strix G18 and ROG Strix G16 come with up to an Intel Core Ultra 9 275HX processor or an AMD Ryzen 9 8940HX processor.

    Asus is yet to share detailed pricing of the new ROG Strix G16, G18, ROG Zephyrus G16, G14, and TUF Gaming A18, A16, A14, and F16 laptops with Nvidia GeForce RTX 5060 graphics cards. They are confirmed to be available in Canada in the coming weeks through the Asus store, Amazon, Best Buy, and Canada Computers, among others.

    Asus ROG Strix G18, ROG Strix G16 Specifications

    The Asus ROG Strix G16 and G18 come with Windows 11 Home and in Intel or AMD CPU configurations. Users can configure them with up to an Intel Core Ultra 9 275HX CPU or an AMD Ryzen 9 8940HX processor, paired with the new Nvidia GeForce RTX 5060 laptop GPU with 8GB of GDDR7 memory. They support up to 32GB of DDR5 RAM and up to 1TB of PCIe 4.0 NVMe M.2 SSD storage.

    ROG Strix models offer ROG Nebula displays with up to a 240Hz refresh rate, 3ms response time, 500 nits peak brightness, and 100 percent DCI-P3 coverage. The Asus ROG Strix G18 has an 18-inch 2.5K screen, while the ROG Strix G16 has a 16-inch 2.5K OLED screen. They offer advanced Tri-Fan cooling and a 1080p webcam. Both pack a 90Wh battery.

    Asus ROG Zephyrus G16, ROG Zephyrus G14 (2025) Specifications

    The Asus ROG Zephyrus G16 is equipped with up to an Intel Core Ultra 9 285H CPU or an AMD Ryzen AI 7 HX 350 processor, paired with up to 32GB of LPDDR5X-7467 memory and 1TB of PCIe 4.0 NVMe M.2 SSD storage. The ROG Zephyrus G14, on the other hand, can be equipped with up to an AMD Ryzen AI 9 HX 370 processor, up to 32GB of LPDDR5X 8,000MHz memory, and 1TB of PCIe 4.0 NVMe M.2 SSD storage. Both models ship with the new Nvidia GeForce RTX 5060 laptop GPU and 8GB of GDDR7 graphics memory.

    Asus has packed Nebula OLED displays with 500 nits peak brightness on the ROG Zephyrus G14 and G16. The latter has a 16-inch 2.5K display with a 240Hz refresh rate, while the ROG Zephyrus G14 sports a 14-inch 3K display with a 120Hz refresh rate. They offer Bluetooth 5.4 and Wi-Fi 7 connectivity. You also get a 1080p full-HD IR webcam. The ROG Zephyrus G14 (2025) has a 73Wh battery, while the ROG Zephyrus G16 has a 90Wh battery.

    Asus TUF Gaming A18, TUF Gaming A16, TUF Gaming F16, TUF Gaming A14 Specifications

    The Asus TUF Gaming A18 supports up to an AMD Ryzen 7 260 processor and the Nvidia GeForce RTX 5060 laptop GPU. It is equipped with up to 32GB of DDR5 RAM and 1TB of PCIe Gen 4 storage. It boasts an 18-inch display with a 144Hz refresh rate, a 16:10 aspect ratio, 300 nits of peak brightness, and 100 percent coverage of the DCI-P3 colour gamut. The laptop houses a 90Wh battery.

    The Asus TUF Gaming A16 ships with up to an AMD Ryzen 9 8940HX, while the TUF Gaming F16 can be configured with up to an Intel Core i7-14650HX processor. Both models feature the Nvidia GeForce RTX 5060 laptop GPU. Both the TUF Gaming A16 and TUF Gaming F16 have 16-inch displays with up to 2.5K resolution, a 165Hz refresh rate, and up to 400 nits peak brightness. They pack 32GB of DDR5 RAM and up to 1TB of PCIe 4.0 SSD storage. They offer military-grade durability and second-generation Arc Flow fans for thermal management. They have 90Wh batteries.
    Asus TUF Gaming A14 Specifications

    The Asus TUF Gaming A14 ships with Windows 11 Home and has a 14-inch 2.5K IPS display with a 165Hz refresh rate, 400 nits peak brightness, and a 3ms response time. It also packs the new-generation Nvidia GeForce RTX 5060 laptop GPU paired with up to an AMD Ryzen AI 7 350 processor, up to 16GB of LPDDR5X onboard RAM, and 1TB of M.2 2280 PCIe 4.0 SSD storage.

    For connectivity, the Asus TUF Gaming A14 has Bluetooth 5.3 and Wi-Fi 6E. It is backed by a 73Wh battery.
  • OSU's Open Source Lab Eyes Infrastructure Upgrades and Sustainability After Recent Funding Success

    It's a nonprofit that provides hosting for the Linux Foundation, the Apache Software Foundation, Drupal, Firefox, and 160 other projects — delivering nearly 430 terabytes of information every month. (It's currently hosting Debian, Fedora, and Gentoo Linux.) But hosting only provides about 20% of its income, with the rest coming from individual and corporate donors (including Google and IBM). "Over the past several years, we have been operating at a deficit due to a decline in corporate donations," the Open Source Lab's director announced in late April.

    It's part of the CS/electrical engineering department at Oregon State University, and while the department "has generously filled this gap, recent changes in university funding makes our current funding model no longer sustainable. Unless we secure $250,000 in committed funds, the OSL will shut down later this year."

    But "Thankfully, the call for support worked, paving the way for the OSU Open Source Lab to look ahead, into what the future holds for them," reports the blog It's FOSS.

    "Following our OSL Future post, the community response has been incredible!" posted director Lance Albertson. "Thanks to your amazing support, our team is funded for the next year. This is a huge relief and lets us focus on building a truly self-sustaining OSL."

    To get there, we're tackling two big interconnected goals:

    1. Finding a new, cost-effective physical home for our core infrastructure, ideally with more modern hardware.
    2. Securing multi-year funding commitments to cover all our operations, including potential new infrastructure costs and hardware refreshes.

    Our current data center is over 20 years old and needs to be replaced soon. With Oregon State University evaluating the future of this facility, it's very likely we'll need to relocate in the near future. While migrating to the State of Oregon's data center is one option, it comes with significant new costs. This makes finding free or very low-cost hosting (ideally between Eugene and Portland for ~13-20 racks) a huge opportunity for our long-term sustainability. More power-efficient hardware would also help us shrink our footprint.

    Speaking of hardware, refreshing some of our older gear during a move would be a game-changer. We don't need brand new, but even a few-generations-old refurbished systems would boost performance and efficiency. (Huge thanks to the Yocto Project and Intel for a recent hardware donation that showed just how impactful this is!) The dream? A data center partner donating space and cycled-out hardware.

    Our overall infrastructure strategy is flexible. We're enhancing our OpenStack/Ceph platforms and exploring public cloud credits and other donated compute capacity. But whatever the resource, it needs to fit our goals and come with multi-year commitments for stability. And, a physical space still offers unique value, especially the invaluable hands-on data center experience for our students....

    [O]ur big focus this next year is locking in ongoing support — think annualized pledges, different kinds of regular income, and other recurring help. This is vital, especially with potential new data center costs and hardware needs. Getting this right means we can stop worrying about short-term funding and plan for the future: investing in our tech and people, growing our awesome student programs, and serving the FOSS community. We're looking for partners, big and small, who get why foundational open source infrastructure matters and want to help us build this sustainable future together.

    The It's FOSS blog adds that "With these prerequisites in place, the OSUOSL intends to expand their student program, strengthen their managed services portfolio for open source projects, introduce modern tooling like Kubernetes and Terraform, and encourage more community volunteers to actively contribute."

    Thanks to long-time Slashdot reader I'm just joshin for suggesting the story.

    Read more of this story at Slashdot.
  • 10 incredible new tabletop games for you to play in summer 2025

    10 incredible new tabletop games for you to play in summer 2025

    Lucy Orr

    Published May 16, 2025 2:00pm

    Pokémon TCG is in for a hot summer
    GameCentral looks at the most exciting new summer tabletop releases, including adaptations of Final Fantasy, Assassin’s Creed, and Citizen Sleeper
    The tabletop games industry has become an unlikely victim of Trump’s tariff trade war. Just as it was recovering from Covid supply chain issues, it now finds itself hit with manufacturing problems and an uncertain future. It’s so bad that board game developer CMON has already shut up shop and Stonemaier, famous for the hugely successful Wingspan, is suing the Trump administration. Meanwhile, Cephalofair, developer of fan favourite Gloomhaven, can’t even get their product onto the shelves, as it’s stuck in China.
    While I don’t expect any empty shelves at the UK Games Expo this month, there’s definitely panic in the air. Although one company that doesn’t seem to be too concerned is Games Workshop, who have always manufactured most of their products in the UK – although accessories and terrain for your favourite Warhammer army might become harder to find in the future.
    After everyone got into it during lockdown, the tabletop industry was riding a huge boom, with recent industry projections of the market doubling to around £20 billion by 2030. But US tariffs have left the industry reeling and could see the price for tabletop games around the world rise significantly.
    Despite the doom and gloom there’s plenty of exciting new products already out this year and many more on the way from massive brands such as Pokémon and Disney, as well as new Kickstarters that you can print yourself, thereby completely bypassing any manufacturing issues.
    There’s also never been a better time to support your local board game shop or Dungeons & Dragons club, as their overheads rise, so here’s a selection of games you might find on their shelves right now and those coming soon, that I’m excited to play during the summer break.
    Magic: The Gathering – Final Fantasy
    Square Enix’s iconic Final Fantasy franchise is stepping onto the cardboard battlefield with Magic: The Gathering, and the result is as gloriously nostalgic as it is mechanically exciting. The Universes Beyond initiative brings beloved characters, summons, and settings into Magic’s gameplay, with Cloud Strife leading the charge.
    The Final Fantasy Starter Kit offers two pre-constructed 60 card decks, packed with flavour and function and perfect for newcomers attracted by the sight of a Chocobo or Moogle. Each deck includes five rares, a foil mythic legendary, deck boxes, and digital codes for Magic: The Gathering Arena. But the real draw? The cards themselves. They are stunning.
    Cloud channels Final Fantasy 7’s environmental and emotional themes, with equipment-focused synergies that feel spot-on. Stiltzkin the Moogle is a flavourful support piece for donation strategies and the terrifying Tonberry arrives with Deathtouch and First Strike, which is fitting for a creature that’s haunted players for decades.
    With gorgeous full art treatments by amazing artists such as Takahashi Kazuya and Yoshitaka Amano, underpinned by clever mechanical callbacks, this crossover is more than fan service; it’s a lovingly crafted bridge between two gaming giants that’s bound to fly off the shelves.
    £15.99 on Amazon – releases June 13
    Warhammer 40,000 Kill Team: Typhon
    There’s a possible future where Games Workshop is the final tabletop company left standing, as they dodge tariffs and take down licence infringers like a particularly vicious swarm of Tyranids. You too can act out this future in Kill Team: Typhon, which delivers the chaos of Warhammer 40,000 in a claustrophobic, subterranean brawl between flesh-rending Tyranid Raveners and a desperate Adeptus Mechanicus Battleclade.
    The latest Kill Team box looks stunning and turns up the tension with asymmetric forces: a lean, elite brood of Raveners – deadly melee predators that can tunnel through terrain – versus a jury-rigged Mechanicus strike team, built from repurposed servitors and guided by a technoarchaeologist scouring ancient relics.
    But this isn’t Helldivers 2. The Raveners can be customised into deadly variants like the Tremorscythe and Felltalon, each armed with bio-engineered weapons designed for close-quarters carnage. On the other side, the Mechanicus bring massed, lobotomised firepower: breachers, gunners, medics, and overseers to allow for some tactical coordination.
    Also included are Hormagaunts, the swarming Tyranid shock troops, and new Tyranid-infested terrain – always the standout feature of these Kill Team boxes, in my opinion, and perfect for narrative or larger 40K battles. Typhon embraces Kill Team’s strength: cinematic asymmetry and high stakes.
    Price TBA – releases June

    Scalpers are going to love this
    Pokémon Trading Card Game Scarlet & Violet – Destined Rivals Pokémon Center Elite Trainer Box
    This red and black box is going for gangbusters on eBay, before it’s even supposed to be out. The Scarlet & Violet Destined Rivals set reintroduces the beloved Trainer’s Pokémon mechanic, but now with a twist where players can align with iconic duos like Arven’s Mabosstiff and Ho-Oh ex or Cynthia and Garchomp ex. Or fall in with Team Rocket under Giovanni’s command, fielding heavy hitters like Mewtwo ex.
    It’s a rich throwback to the Gym Heroes era, with cards that spotlight specific trainer and pokémon bonds, each emblazoned with the trainer’s name. The expansion includes 83 cards branded under Team Rocket, 17 new Pokémon ex cards, and a trove of high rarity collectibles: 23 illustration rares, 11 special illustration rares, and six hyper rare gold-etched cards.
    But the pre-launch hasn’t been all Sunflora and Jigglypuffs. Since its full reveal on March 24, pre-orders have sparked a frenzy amongst scalpers, with sellouts and early store hiccups already marring the rollout. Still, between the nostalgia bait and villainous charm, Destined Rivals is shaping up to be one of 2025’s hottest trading card releases.
    RRP £54.99 – releases May 30
    Finspan
    One game I just can’t put back on the shelf at the moment is Finspan; who’d have thought fish could be so much fun? Since Wingspan took flight in 2019, it’s become a modern classic: part art piece, part engine builder, and a benchmark for gateway games. Finspan, the third entry in the series, swaps feathers for fins, inviting players to explore marine ecosystems across oceanic zones in a beautifully illustrated, medium-lightweight game that lasts about 45 minutes.
    Mechanically, Finspan is more accessible than Wingspan, thanks to forgiving resource generation and a gentler deck structure. Strategic depth is still there, whether you chase high value fish, go wide with schools, or balance both. It’s more of a solo puzzle, and less about blocking opponents, which might suit more casual groups. Replayability is strong, and with one to five player support it scales well.
    The art is stunning, and the fish facts make you feel like a would-be marine biologist. But I missed the fun components – there’s no birdhouse dice tower here – so some of that Wingspan magic is absent. Finspan is a fantastic entry point to the series and an accessible and fun addition to the franchise. It’s not as perfect as Wingspan, but it swims confidently in its own current. Could we see whale and crustacean expansions? I hope so.
    RRP £41.99 – available now
    Star Wars Unlimited – Jump To Lightspeed
    While you might have missed the Star Wars Celebration in Japan last month, and be bereft over the end of Andor Season 2, don’t worry – there are plenty of alternatives for Star Wars fandom. Fantasy Flight Games is revving its hyperdrive with Jump To Lightspeed, the fourth set for trading card game Star Wars Unlimited. A dramatic shift from previous ground-focused releases, this set propels players into orbit, with an emphasis on space combat and a host of gameplay refinements.
    Headlining the release are two new Spotlight Decks, each featuring a classic rivalry, such as Han Solo vs. Boba Fett. These 50-card preconstructed decks introduce Pilots, a brand-new card type that changes how space units operate. Pilots can be deployed to enhance ships with improved health and damage dealing abilities, offering fresh tactical depth.
    The set also debuts the Piloting keyword, a hyperspace mechanic, and five special rarity cards per deck, including one new Leader per Spotlight release. It’s a sleek continuation of Unlimited’s mission, with deep strategy wrapped in Star Wars flair.
    Fantasy Flight isn’t just releasing a new set; they’re effectively entering year two of the game with a soft reboot that smartly rebalances and refreshes. For new and returning players, the standalone Spotlight Decks offer a refined on-ramp into the meta, while the stellar art and fan favourite characters make this one of the best sci-fi trading card games around.
    RRP: £34.99 – available now

    An indie tabletop game adapting an indie video game (Jump Over the Age)
    Citizen Sleeper: Spindlejack
    I was gutted I didn’t manage to nab some physical Cycles of the Eye Data-Cloud dice from Lost in Cult, before they sold out, so I was ecstatic to see the shadow drop of Citizen Sleeper: Spindlejack, especially as it’s completely free.
    It’s a lean, solo tabletop role-playing game set in the neon-drenched corridors of the Far Spindle, part of the Citizen Sleeper universe. Released on May 5 (aka Citizen Sleeper Day), it’s a print-and-play experience that trades dense narrative for kinetic delivery runs and tactical movement through a crumbling space station.
    Inspired by Kadet, the courier from Citizen Sleeper 2, Spindlejack casts you as one of the eponymous daredevils: airbike-mounted messengers who dodge cargo haulers and urban decay to deliver sensitive payloads in a haunted, half-dead network. The draw? Not just the cryo or reputation, but the thrill, the competition, and the culture.
    Using your 10 six-sided dice, a pencil, and some printed sheets, you’ll chart courses across randomly generated intersections, upgrade your bike, and edge toward Spindlejack legend status. Designed by Gareth Damian Martin, with stylish, gritty art from Guillaume Singelin, this is a tight, systems-focused dive into a beloved sci-fi setting.
    No campaign scheduling. No group required. Just you, your dice, and the Spindle’s rusted arteries. For fans of Citizen Sleeper or those craving a focused, atmospheric solo experience, Spindlejack is a no-brainer. DIY or DIE.
    Available now

    Disney Lorcana has become a certified hit (Ravensburger)
    Disney Lorcana – Reign Of Jafar Set 8 and Illumineer’s Quest: Palace Heist
    The internet has been on fire with the announcement that forthcoming Lorcana sets will include Darkwing Duck and The Goofy Movie cast, underlining that Disney Lorcana has become something of a juggernaut since its 2023 debut, captivating collectors and competitive players with a blend of nostalgic charm and evolving mechanics – judging scandals aside.
    During the Next Chapter of Lorcana livestream earlier this month, Ravensburger dropped major news. The autumn 2025 set, Fabled, will introduce Lorcana’s first ever set rotation, a sign the game is maturing into a competitive force. To support this shift, Fabled will include reprints from earlier sets, while also debuting two new rarity levels: epic and the ultra-rare Iconic.
    Reign Of Jafar, the game’s eighth set, sees Jafar rise as the new central villain, corrupting Archazia’s Island and bringing a darker twist to the narrative. Familiar faces like Mulan, Stitch, Rapunzel, and Bruno return, alongside new cards and accessories, including updated sleeves and deck boxes featuring classic Enchanted artwork.
    The new Illumineer’s Quest: Palace Heist PvE box expands on the beloved Deep Trouble, letting players face Jafar co-op style. Expect pre-built decks (Amethyst Amber and Ruby Steel), booster boxes, and enough lore-packed cardboard to fuel your summer break.
    £16.99 starter pack – releases May 30
    The Lord Of The Rings: Fate Of The Fellowship
    While Finspan might be missing a dice tower, Fate Of The Fellowship more than makes up for that with a Barad-dûr dice tower. This is a one to five player co-op strategy game that builds on the Pandemic System but adds enough fresh features to feel distinct, deeper, and more precious than ever.
    Players take on the roles of Fellowship members and allies, racing to protect havens from surging shadow troops and helping Frodo sneak past the Nazgûl en route to Mount Doom. Unlike previous Pandemic adaptations, Fate Of The Fellowship leans hard into narrative mechanics. You’ll juggle four resources – stealth, valour, resistance, and friendship – across a sprawling map as you battle despair and shifting objectives. Each player commands two characters, with asymmetric abilities and layered decisions every turn.
    With 24 rotating objectives, a constant threat from the Eye of Sauron, and a cleverly tuned solo mode, designer Matt Leacock has crafted his richest Pandemic variant yet. I’ve seen plenty of tabletop gamers saying this will be their must-play at UK Games Expo.
    RRP £69.99 – releases June 27
    Assassin’s Creed Animus
    Animus brings the Assassin’s Creed universe to the tabletop in a wholly fresh, narrative-driven experience. Up to four players select historical eras, each tied to a legendary assassin like Ezio or Eivor, resulting in distinct, asymmetric playstyles, unique objectives, and specialised mechanics.
    Rather than a miniatures skirmish, this is a competitive, timeline-jumping adventure where players dive into ancestral memories via the titular Animus. Strategic stealth and precision matter: while one player might rush to the end, victory favours those who stay synchronised with their ancestor’s memory by completing tasks efficiently and, of course, stealthily.
    While there’s still not much information about this game at the moment, Animus looks to employ modular and evolving dynamics driven by interactive card play. Players can impact each other’s timelines, which will hopefully keep the experience reactive and organic. With deep lore integration, and Ubisoft’s full support, this could be the most ambitious Assassin’s Creed tabletop title yet.
    Crowdfunding starts summer 2025

    Some like it Hoth (Days of Wonder)
    Star Wars: Battle Of Hoth
    Days of Wonder, the studio behind tabletop classic Ticket To Ride, has unveiled its next major release, Star Wars: Battle Of Hoth. Designed for two to four players, aged 8 and up, this fast-paced board game runs around 30 minutes per session and leans on the accessible, card-driven Commands & Colors system.

    Players will face off as Imperial or Rebel forces across 17 scenario-driven missions, with options to escalate into campaign mode. Leader cards introduce familiar names like Vader, Luke, Leia, and Han to influence the tide of battle.
    Although it should be easy to learn, concerns linger about the scope of the battlefield. A cramped board could reduce tactical play to simple dice duels, something fans of strategic depth may find frustrating. Questions also remain about unit range and movement dynamics. Still, Battle Of Hoth promises cinematic nostalgia and the potential for layered tactics, and all for a very reasonable price.
    RRP: £49.99 – crowdfunding starts summer 2025
    Email gamecentral@metro.co.uk, leave a comment below, follow us on Twitter, and sign up to our newsletter.
    To submit Inbox letters and Reader’s Features more easily, without the need to send an email, just use our Submit Stuff page here.
    For more stories like this, check our Gaming page.

  • Roundtable: Why did customers sail away from VMware?

    Hyper-converged infrastructure pioneer Nutanix is among a number of suppliers that smell blood in the water when it comes to VMware and its customers following the virtualisation giant’s acquisition by Broadcom.
    At Nutanix’s annual .Next bash in Washington DC last week, migration away from VMware and to – it hopes – its own Acropolis hypervisor (AHV) was a constant theme.
    As part of this, it gathered three customers to talk about their experiences of moving from VMware to Nutanix. 
    Of the three, only one migration was directly attributable to Broadcom’s licensing changes, but Broadcom-Amazon Web Services (AWS) relations were key to another.
    We asked them about their journey to Nutanix and away from VMware, as well as the precise pain points that prompted their decisions.
    Here, we talk to:
    Dom Johnston, IT manager for Golding in Brisbane, Australia, which is a heavy civil and mining contracting company that has operated on the east coast of Australia for about 75 years. 
    Kee Yew Wei, associate vice-president for infrastructure and operations at MSIG, which is a Japan-headquartered insurance company that operates internationally. 
    Mike Taylor, hospital ship joint task director for Military Sealift Command and the US Navy, which operates two hospital ships, Mercy (pictured above) and Comfort.
    Dom Johnston: Golding had its infrastructure sitting in VMware on AWS.
    We had a three-year contract with VMware for that platform, which ended in February this year.
    About March of last year, there was a fairly public divorce between VMware and AWS.
    We weren’t sure where that left us. 
    To cut a long story short, with what we saw over the next two to three months from there, we considered that the risk of leaving our infrastructure there beyond the end of that three-year contract was too great for us.
    So we went out to market to look at alternatives.
    And Nutanix has kind of swung in to essentially replace that.
    We use NC2 [Nutanix Cloud Clusters] to run our production workloads in AWS, for our DR [disaster recovery] capability, and that’s essentially to directly replace the functionality that existed within AWS and VMware Cloud Disaster Recovery, which was the DR product that sat alongside that. 
    So essentially, our DR strategy is that if an event occurs, we immediately spin up the DR environment ready to accept a workload.
    In the event that it’s not required, it’s spun back down again, and we’ve lost, you know, a couple of hundred bucks’ worth of compute usage.
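    For readers who want to picture the pattern Johnston describes, here is a minimal sketch of that on-demand DR loop in Python. It is illustrative only: every helper name below is hypothetical, and in practice NC2 clusters are provisioned and torn down through Prism Central, the NC2 console, or their REST APIs rather than functions like these.

# Minimal sketch of the on-demand DR pattern described above.
# All helper names are hypothetical placeholders, not Nutanix or AWS APIs.

def dr_event_detected() -> bool:
    # Placeholder for whatever monitoring decides production is unavailable.
    return False  # assume a quiet day for this dry run

def provision_dr_cluster() -> str:
    # Hypothetical: stand up an NC2 cluster on AWS bare-metal capacity.
    print("Spinning up DR cluster, ready to accept workloads...")
    return "dr-cluster-1"

def failover_required() -> bool:
    # Hypothetical: confirm the outage is real before recovering workloads.
    return False

def fail_over_workloads(cluster_id: str) -> None:
    print(f"Recovering protected VMs onto {cluster_id}")

def tear_down(cluster_id: str) -> None:
    # A false alarm costs a few hundred dollars of compute, not a standing cluster.
    print(f"Spinning {cluster_id} back down")

if __name__ == "__main__":
    if dr_event_detected():
        cluster = provision_dr_cluster()
        if failover_required():
            fail_over_workloads(cluster)
        else:
            tear_down(cluster)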
    Kee Yew Wei: Our journey with Nutanix dates from 2017.
    We were looking for a hyper-converged system to simplify our environment, do away with the traditional three-tier legacy system, and reduce our datacentre footprint.
    Nutanix was the system we picked, but we didn’t have full confidence in Acropolis at that time, because it was quite new compared to VMware.
    After a couple of years using Nutanix, we built confidence, so we have recently migrated all our VMware to Nutanix AHV.
    We completed the full migration last month. 
    All this came about after the acquisition by Broadcom, and we received a quotation with a 300% to 400% increase on our renewal pricing.
    So, then we made the decision to go for Nutanix. 
    We started planning somewhere around Q3 last year and were quite conservative, with completion planned for maybe somewhere in Q2 this year.
    My team migrated 1,000 to 2,000 VMs [virtual machines] from Q4 and completed that at the beginning of April.
    So today, we are a full Nutanix house.
    Mike Taylor: Our story with Nutanix started way back in 2017.
    We’d been Nutanix lookers for a long time. 
    On my ships, we had 1,000 blade servers and EMC tiered storage taking up multiple racks.
    But on the ships, there’s only a finite amount of power they generate, so I needed to find a way to bring everything down into a smaller footprint – but a smarter, smaller footprint, something that would allow me to very elegantly manage and have ease of use that my teams aboard the ships could deal with. 
    So, we did a bake-off with Dell, Cisco and Nutanix, and we implemented Nutanix on Mercy in 2019 and Comfort in 2020.
    Now, we’re looking at generational refreshes of all of our equipment and probably expanding from there and getting some new features, with redundancy and disaster recovery.
    We do have an onboard continuity-of-operations rack, so we have mirrored failover clusters of Nutanix aboard the ships. 
    Now we’re all Nutanix.
    Everything moved over.
    That’s, like, out of 80-something servers, we only had two or three that had hiccups.
    Taylor: I remember standing in my main datacentre on the hospital ships.
    It’s very anticlimactic if you ever get to go; I just have five racks, but two of those five were purely just to run my server infrastructure.
    I remember standing there with one of my peers, and we were looking at it and we said, “Oh, hey, we’re still using SAN directors.” And SAN was going away, they were on their way out. 
    Dell had come out with stuff like FX, and other people were dabbling with hyper-converged, whereas Nutanix had already done it, and they had their own software, which was easy to understand for my engineers.
    So, I’m looking at these racks full of equipment, especially the VNX, which was power hungry.
    So, we said, “There has to be a better way to do this.” Energy was the problem.
    Energy was the driver to finding a solution. 
    We weren’t impacted by the Broadcom event.
    We got in before it.
    I do still run some VMware, so I am impacted by it there.
    The challenge we have incurred in continuing to operate that small part isn’t financial.
    It’s purely that I can’t get to updates.
    I can’t get to download them.
    It’s support aspects of the change that impact us the most, not the financial part of it. 
    If we hadn’t moved to Nutanix, if we were still purely ESXi, the financial part would certainly be a burden, like it is for other military commands. 
    Johnston: After AWS and VMware had their thing, we were notified by VMware that we were no longer able to spin up our on-demand DR cluster.
    They told us that, essentially, we could still use our DR plan if we powered down our production cluster before spinning up a DR cluster.
    We were testing quarterly, but we were no longer able to do that.
    In fact, we shifted to testing monthly because there was so much uncertainty in that space.
    We were left in a situation where, because we couldn’t test, we had zero confidence. 
    Kee Yew Wei: It was all about cost.
    We got a bill with a 300% to 400% increase on our last renewal.
    So, this is one of the key factors that drove us to migrate all our workloads to Nutanix.
    Taylor: The trade-offs are very, very light, if any.
    My people were very seasoned with ESXi, VMware Tools, and the orchestration that VMware had.
    But the learning curve for Nutanix is very short.
    It’s very easy to pick up, but you have to learn it.
    There’s a different way to import an OVA, as opposed to the way you do it within the VMware ecosystem, for example.
    So, the trade-off is really just time to become a master at using the system with regard to functionality. 
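    As an illustration of the kind of difference Taylor means, here is a rough, hedged sketch of the two import workflows in Python. The ovftool call is standard VMware tooling for deploying an OVA to a vCenter or ESXi target; the Nutanix half is an assumption – on AHV the usual route is to add the disk image to the image service first (via Prism Central or acli on a CVM) and then build a VM from it, and the exact acli parameters vary by release, so check your own documentation.

# Rough comparison of "importing an OVA" on the two platforms.
# The ovftool call is real VMware tooling; the acli line is an assumption
# about parameter names and should be checked against your AOS release.
import subprocess

def import_ova_vmware(ova_path: str, target_locator: str) -> None:
    # ovftool deploys the OVA directly to a vCenter/ESXi inventory location,
    # e.g. "vi://root@esxi01.example.com/".
    subprocess.run(["ovftool", ova_path, target_locator], check=True)

def import_disk_image_ahv(image_name: str, source_url: str) -> None:
    # On AHV the image lands in the image service first, then a VM is created
    # from it. Hypothetical acli invocation, run over SSH on a CVM.
    subprocess.run(
        [
            "ssh", "nutanix@cvm.example.com",
            f"acli image.create {image_name} source_url={source_url} container=default",
        ],
        check=True,
    )

if __name__ == "__main__":
    import_ova_vmware("./appliance.ova", "vi://root@esxi01.example.com/")
    import_disk_image_ahv("appliance-disk", "nfs://fileserver/images/appliance.qcow2")

    Either way, as the speakers say, it is the same job in a different shape – the cost is the time it takes to learn where things live, not lost functionality.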
    In fact, I think I have enhanced capability using AHV as my hypervisor.
    When it comes to security, using VMware with the military, we have to submit vulnerability scans constantly.
    That’s just part of our regular drumbeat.
    I still run VMware on classified parts of my network, and it is very challenging to keep it secure and up to date.
    I don’t have that issue with Nutanix.
    Johnston: I second that.
    As far as trade-offs are concerned, or the functionality, it’s really just a question of semantics in relation to the differences between the two platforms.
    The way that Nutanix handles snapshots is different to the way that VMware handles snapshots.
    That was a learning curve for us.
    It’s like going from Windows 10 to Windows 11.
    Things are in a different spot, but it’s the same functionality. 
    You need to prepare your team, get them training, show them what to do.
    I don’t think there’s any loss of functionality.
    In fact, I think there are faster workflows, better availability of tools. 
    Kee Yew Wei: I don’t see trade-offs.
    Maybe 10 years ago, compatibility with other suppliers’ software might have been an issue, like backup solutions such as [Veritas] NetBackup.
    Maybe seven or eight years ago, they did not support Nutanix.
    But that’s not the case today.
    Read more about virtualisation and storage
    University will ‘pull the plug’ to test Nutanix disaster recovery: University of Reading set to save circa £500,000 and deploy Nutanix NC2 hybrid cloud that will allow failover from main datacentre. 
    NHS trust cloud plans hampered by Trump tariff uncertainty: Essex NHS wants to move some capacity to the Nutanix cloud, but can’t be certain prices will hold between product selection and when procurement plans gain approval. 

    Source: https://www.computerweekly.com/feature/Roundtable-Why-did-customers-sail-away-from-VMware
    #roundtable #why #did #customers #sail #away #from #vmware
    Roundtable: Why did customers sail away from VMware?
    Hyper-converged infrastructure pioneer Nutanix is among a number of suppliers that smell blood in the water when it comes to VMware and its customers following the virtualisation giant’s acquisition by Broadcom. At Nutanix’s annual .Next bash in Washington DC last week, migration away from VMware and to – it hopes – its own Acropolis hypervisor (AHV) was a constant theme. As part of this, it gathered three customers to talk about their experiences of moving from VMware to Nutanix.  Of these, only one was directly attributable to Broadcom’s licensing changes, but Broadcom-Amazon Web Services (AWS) relations were key to another. We asked them about their journey to Nutanix and away from VMware, as well as the precise pain points that prompted their decisions. Here, we talk to: Dom Johnston, IT manager for Golding in Brisbane, Australia, which is a heavy civil and mining contracting company that has operated on the east coast of Australia for about 75 years.  Kee Yew Wei, associate vice-president for infrastructure and operations at MSIG, which is a Japan-headquartered insurance company that operates internationally.  Mike Taylor, hospital ship joint task director for Military Sealift Command and the US Navy, which operates two hospital ships, Mercy (pictured above) and Comfort. Dom Johnston: Golding had its infrastructure sitting in VMware on AWS. We had a three-year contract with VMware for that platform, which ended in February this year. About March of last year, there was a fairly public divorce between VMware and AWS. We weren’t sure where that left us.  To cut a long story short, with what we saw over the next two to three months from there, we considered the risk of leaving our infrastructure there beyond the end of that three-year contract was too great for us. Golding had its infrastructure sitting in VMware on AWS. [After the] fairly public divorce between VMware and AWS, we weren’t sure where that left us. With what we saw over the next two to three months, we considered the risk of leaving our infrastructure there beyond the end of that three-year contract was too great. Nutanix has kind of swung in to replace that Dom Johnston, Golding So we went out to market to look at alternatives. And Nutanix has kind of swung in to essentially replace that. We use NC2 [Nutanix Cloud Clusters] to run our production workloads in AWS, for our DR [disaster recovery] capability, and that’s essentially to directly replace the functionality that existed within AWS and VMware Cloud Disaster Recovery, which was the DR product that sat alongside that.  So essentially, our DR strategy is that if an event occurs, we immediately spin up the DR environment ready to accept a workload. In the event that is not required, it’s spun back down again, and we’ve lost, you know, a couple 100 bucks worth of compute usage.  Kee Yew Wei: Our journey with Nutanix is from 2017. We were looking for a hyper-converged system to simplify our environment, to do away with the traditional three-tier legacy system, to simplify our environment, and to reduce our datacentre footprint.  Nutanix is the system, but we didn’t have full confidence in Acropolis at that time, because it was quite new compared to VMware.  After a couple of years using Nutanix, we built confidence, so we have recently migrated all our VMware to Nutanix AHV. We completed the full migration last month.  All this came about after the acquisition by Broadcom, and we received a quotation with a 300% to 400% increase on our renewal pricing. 
So, then we made the decision to go for Nutanix.  We started planning somewhere around Q3 last year and were quite conservative, with completion planned for maybe somewhere in Q2 this year. My team migrated 1,000 to 2,000 VMs [virtual machines] from Q4 and completed that at the beginning of April. So today, we are a full Nutanix house. Mike Taylor: Our story with Nutanix started way back in 2017. We’d been Nutanix lookers for a long time.  On my ships, we had 1,000 blade servers and EMC tiered storage taking up multiple racks. But on the ships, there’s only a finite amount of power they generate, so I needed to find a way to bring everything down into a smaller footprint – but a smarter, smaller footprint, something that would allow me to very elegantly manage and have ease of use that my teams aboard the ships could deal with.  After a couple of years using Nutanix, we built confidence, so we have recently migrated all our VMware to Nutanix AHV Kee Yew Wei, MSIG So, we did a bake-off with Dell, Cisco and Nutanix, and we implemented Nutanix on Mercy in 2019 and Comfort in 2020. Now, we’re looking at generational refreshes of all of our equipment and probably expanding from there and getting some new features, with redundancy and disaster recovery. We do have an onboard continuity-of-operations rack, so we have mirrored failover clusters of Nutanix aboard the ships.  Now we’re all Nutanix. Everything moved over. That’s like, out of 80-something servers, we only had two or three servers that had hiccups.  Taylor: I remember standing in my main datacentre on the hospital ships. It’s very anticlimactic if you ever get to go; I just have five racks, but two of those five were purely just to run my server infrastructure. I remember standing there with one of my peers, and we were looking at it and we said, “Oh, hey, we’re still using SAN directors.” And SAN was going away, they were on their way out.  Dell had come out with stuff like FX, and other people were dabbling with hyper-converged, whereas Nutanix had already done it, and they had their own software, which was easy to understand for my engineers. So, I’m looking at these racks full of equipment, especially the VNX, which was power hungry. So, we said, “There has to be a better way to do this.” Energy was the problem. Energy was the driver to finding a solution.  We weren’t impacted by the Broadcom event. We got in before it. I do still run some VMware, so I am impacted by it there. The challenge we have incurred in continuing to operate that small part isn’t financial. It’s purely that I can’t get to updates. I can’t get to download them. It’s support aspects of the change that impact us the most, not the financial part of it.  If we hadn’t moved to Nutanix, if we were still purely ESXi, the financial part would certainly be a burden, like it is for other military commands.  Johnston: After AWS and VMware had their thing, we were notified by VMware that we were no longer able to spin up our on-demand DR cluster. They told us that, essentially, we could still use our DR plan if we powered down our production cluster before spinning up a DR cluster. We were testing quarterly, but we were no longer able to do that. In fact, we shifted to testing monthly because there was so much uncertainty in that space. We were left in a situation where, because we couldn’t test, we had zero confidence.  Kee Yew Wei: It was all about cost. We got a bill with a 300% to 400% increase on our last renewal. 
So, this is one of the key factors that drove us to migrate all our workloads to Nutanix.

Taylor: The trade-offs are very, very light, if any. My people were very seasoned with ESXi, VMware Tools and the orchestration that VMware had. But the learning curve for Nutanix is very short. It’s very easy to pick up, but you have to learn it. There’s a different way to import an OVA, as opposed to the way you do it within the VMware ecosystem, for example. So, the trade-off is really just time to become a master at using the system with regard to functionality.

In fact, I think I have enhanced capability using AHV as my hypervisor. When it comes to security, using VMware with the military, we have to submit vulnerability scans constantly. That’s just part of our regular drumbeat. I still run VMware on classified parts of my network, and it is very challenging to keep it secure and up to date. I don’t have that issue with Nutanix.

Johnston: I second that. As far as trade-offs are concerned, or the functionality, it’s really just a question of semantics in relation to the differences between the two platforms. The way that Nutanix handles snapshots is different to the way that VMware handles snapshots. That was a learning curve for us. It’s like going from Windows 10 to Windows 11. Things are in a different spot, but it’s the same functionality. You need to prepare your team, get them training, show them what to do. I don’t think there’s any loss of functionality. In fact, I think there are faster workflows, better availability of tools.

Kee Yew Wei: I don’t see trade-offs. Maybe 10 years ago, compatibility with other suppliers’ software might have been an issue, like backup solutions such as [Veritas] NetBackup. Maybe seven or eight years ago, they did not support Nutanix. But that’s not the case today.

Read more about virtualisation and storage

University will ‘pull the plug’ to test Nutanix disaster recovery: University of Reading set to save circa £500,000 and deploy Nutanix NC2 hybrid cloud that will allow failover from main datacentre.

NHS trust cloud plans hampered by Trump tariff uncertainty: Essex NHS wants to move some capacity to the Nutanix cloud, but can’t be certain prices will hold between product selection and when procurement plans gain approval.

Source: https://www.computerweekly.com/feature/Roundtable-Why-did-customers-sail-away-from-VMware
  • Covid buy refreshes & tariff fears drove iPad sales renaissance
    Millions of users replaced their aging early-pandemic iPads in early 2025, fueling a sales rebound amplified by tariff fears and fresh hardware upgrades. The tablet market saw renewed energy in the first quarter as consumers upgraded devices bought during the height of COVID-19 lockdowns. With many of those iPads reaching the end of their typical lifecycle, Apple's growth came from a wave of replacements. A temporary surge in imports, triggered by uncertainty over new U.S. tariffs, added to the momentum.
    Source: appleinsider.com
  • GPU Architecture & Working intuitively explained


    Author(s): Allohvk

    Originally published on Towards AI.

    GPU Origins
    The image displayed on a computer screen is made up of millions of tiny pixels. In the early days, “graphics controllers” were given instructions by the CPU on how to calculate the individual pixel values so that the appropriate image could be displayed. This was fine for conventional displays, but for a really good gaming experience, images need to be rebuilt dozens of times per second. The CPU was not really designed to handle that kind of load.
    The whole process of creating the image can be parallelized big-time simply by (a) dividing the image into smaller blocks, (b) carrying out the computations for each block in parallel & (c) grouping them back together. The results of one block don’t influence the results of the other blocks. The CPU’s multi-threading capabilities were not really conceived for such massive parallelization. Enter the GPU! Sony first used the term GPU in 1994, in its PlayStation consoles. The technology was perfected by NVIDIA, which soon became a leader.
    GPUs have numerous computation cores (far more than a CPU), and gaming programmers could write Shaders — programs that run graphics computations on the GPU in a massively parallelized way to create the screen images in super-fast time. The GPU is inspired by the CPU but was specifically designed to enable massive multi-threaded operations on its numerous computation cores seamlessly. Creating threads, switching between threads etc. is much faster on a GPU. Some smart developers also realized that these parallel processing capabilities could be used for other computationally intensive tasks as well!

    2005: Steinkraus et al. implement a simple 2-layer neural net on a GPU
    2006: Kumar et al. train a CNN model for document processing
    2007: NVIDIA releases Compute Unified Device Architecture (CUDA) — a custom language extending C to exploit data parallelism on GPUs. Now developers have much more granular control over the image rendering.
    2008: A landmark paper by Raina et al. pretty much shows everyone how to train deep layers on a GPU
    2014: NVIDIA releases cuDNN — a dedicated CUDA library for deep learning. Very soon PyTorch, TensorFlow etc. incorporate cuDNN, setting the stage for modern GPU usage for AI!

    A GPU is an ASIC or Application-Specific Integrated Circuit having a processor (hosting numerous computational cores), a memory soldered onto it (we want to avoid going to the CPU RAM for everything), a cooling system (well, they heat up pretty fast) and a BIOS chip (same role as a CPU — to store settings, run startup diagnostics etc). This card is then plugged into the motherboard slot using the PCI Express interface. The terms GPU and graphics card are often used interchangeably. Some GPUs like the one in Apple M3 do not have a dedicated memory but instead use the system RAM itself which is possible due to its unique design. Google has the TPU (Tensor Processing Unit) which is its own ASIC. We discuss the GPU memory, the processing cores, the LLM workflows happening inside them & common topologies for clustering.
    Photo by Thomas Foster on Unsplash
    1. GPU Memory module — The VRAM
    Instead of having the GPU talk to the regular RAM, it made sense to create another RAM physically closer to the GPU die so that data retrieval is faster. So a graphics card has a memory called VRAM — Video Random Access Memory — in addition to the computation engines. VRAM is connected to the computation engine cores via a bus called the memory interface.
    1.1 What is DRAM?
    Let us talk first of RAM technology in general. All memory, whether it is the CPU RAM or the GPU VRAM, is mostly based on DRAM technology, which consists of a capacitor and a transistor. The capacitor’s charge represents the data stored. Due to its very nature, this charge gradually leaks. To prevent data loss, a refresh circuit periodically rewrites the data back, restoring its charge. Hence the name — Dynamic RAM, due to these periodic refreshes.
    Most computers use Synchronous DDR5 DRAMs as their CPU RAM. Synchronous because it utilizes the system clock for better performance. In other words, the action (of retrieving & storing data) is operationally coordinated by an external clock signal. Tying the operations to the clock makes it faster: the processor knows the exact timing & number of cycles in which the data will be available from the RAM on the bus & can plan better. We have gone from DDR1 (1st-gen Double Data Rate Synchronous Dynamic RAM, released in 2000) to DDR5, which is the CPU RAM of choice today.
    1.2 What is SGRAM?
    Let us now talk about the VRAMs in GPUs. The VRAM is a type of SGRAM — Synchronous Graphics RAM. The current generation of VRAM in use is GDDR6. Yes, this is 6th-generation GDDR, the G standing for “Graphics”. While DDR & GDDR share common origins and the first couple of generations were similar, the branches separated after DDR3. So as of 2025, DDR5 rules in CPU RAM and GDDR6 rules for consumer-grade GPU RAM.
    Conceptually DDR and GDDR are similar, but note that DDR is used by CPUs, which need low latency, whereas GDDR is used by GPUs, which are OK to compromise latency for extremely high throughput. Crudely, the former does more frequent, smaller transfers & the latter deals with a much higher volume of data, where some delay is forgiven considering the vast volumes being processed. Even more crudely, the former is a bullet train with 6–8 coaches while the latter is a 3-kilometre-long goods train.
    1.3 GDDR VRAMs explained in detail
    GDDR memory comes as individual chips soldered to the PCB (Printed Circuit Board) very close to the GPU die. The physical proximity improves the speed of data transfer from the VRAM to the GPU processor. A GDDR chip has pins, which can be thought of as individual wires that connect it to the processor. Bus width is literally the number of such connections. GDDR6 has 32 pins spread across 2 channels, with roughly 16 Gbit/s of bandwidth per pin. Bandwidth is the total amount of data being moved, & if you had one single metric at your disposal to take a decision, it would be this. Before we go further, let us try to understand this metric intuitively.
    1.4 Calculating GPU Memory Bandwidth intuitively
    Memory Bandwidth is the max rate at which data can be transferred between the GPU and the VRAM. We discussed that data transmission is synchronized with the clock. The clock cycle is measured in hertz & represents the number of cycles per second. Let us say we have a clock operating at 1000 MHz. This literally means 1 billion clock ticks per second. How long does a tick last? Literally 1/(1 billion) i.e. 1 nano second. Data is sent to and fro every clock cycle. So every nano-second, a bus-full of data is sent from the VRAM to the processor & vice versa.
    How many seats on the bus? Well, we discussed this earlier… This is the memory interface or the bus width… literally the physical count of bits that fit into the bus. A 128-bit bus would ferry 128 bits every nano-second. The D in G’D’DR6 stands for Double. Basically, data is transmitted on both the rising and falling edges of the clock cycle, so 256 bits every nano-second. How many bytes in 1 sec? 256/8 i.e. 32 billion bytes per second or better still 32 GB/s as Giga is the preferred term when measuring data. The capital B denotes bytes whereas the small b denotes bits… a source of confusion.
    A more practical formula is: Bandwidth = Clock × Bus Width × Data Rate, where the Data Rate is the number of data transfers per cycle. GDDR6 is Double Data Rate (as just discussed) and quad pumped, which quadruples the (doubled) speed. So effectively the Data Rate is 8. Sometimes, you may encounter the same information couched in different semantics. E.g., if the frequency of the command clock (CK#) is N, then the write command clock (WK#) is 2N. GDDR6 rates are then QDR (quad data rate) in reference to WK# and ODR (octal data rate) in reference to CK#.
    Some OEMs multiply the clock speed & data rate & call it a clock rate or something. In that case, the bandwidth is simply that number multiplied by the bus width. In general, this raw formula can be used: num_of_transfers per second * num_of_bits per transfer / 8. A “boost clock” mechanism allows the GPU and GDDR memory to operate at even higher speeds than the default clock when conditions allow it; the boost clock metric refers to the max such operating clock speed. A 1750 MHz clock means (see the short sketch after this list):

    1.75 GHz is the frequency of the command clock (CK#).
    The frequency of the write clock (WK#) is 3.5 GHz due to the “D” in GDDR.
    The quad pumping takes it to 3.5 × 4 = 14 Gbit/s moved in 1 second from each pin on the bus.
    We could have bus widths of up to 384 bits! So we get a bandwidth of 14 × 384 gigabits per second.
    Divide by 8 to get 672 GB/s. GDDR6 bandwidth can go up to 1 TB/s. Wow!
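    To make the arithmetic concrete, here is a minimal sketch in Python that reproduces the worked example above. It uses the raw formula from the text (transfers per second × bits per transfer / 8); the 1750 MHz clock and 384-bit bus are the illustrative numbers quoted above, not the spec of any particular card.

```python
def memory_bandwidth_gb_per_s(command_clock_mhz: float, bus_width_bits: int,
                              data_rate: int = 8) -> float:
    """Peak VRAM bandwidth: transfers/s * bits per transfer / 8 -> bytes/s."""
    transfers_per_s = command_clock_mhz * 1e6 * data_rate  # GDDR6: double data rate, quad pumped => 8
    bits_per_s = transfers_per_s * bus_width_bits
    return bits_per_s / 8 / 1e9                             # bits -> bytes -> GB

print(memory_bandwidth_gb_per_s(1750, 384))  # ~672 GB/s, matching the example above
```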

    1.5 What is HBM VRAM in a GPU?
    When reading or writing data, contention is created when the VRAM has occupied memory channels & is busy receiving or delivering other data. This contention creates latency & this affects bandwidth. Increasing the number of memory channels is a great option. A type of memory called HBM (High-Bandwidth Memory) has lower access latency than GDDR6, since it has 8 memory channels versus 2 channels in GDDR6. HBM also has a wider bus.
    HBM has 1024 pins spread across 8 channels of 128 pins each, with roughly 2 Gbit/s of bandwidth per pin. Compare this with (an equivalent) GDDR, which has 32 pins spread across 2 channels with roughly 16 Gbit/s of bandwidth per pin. Notice how HBM keeps the Gbit/s per pin much lower than GDDR. This saves power (which is important, as we shall see). In spite of this, it has higher bandwidth than GDDR6 due to the wider bus & more channels.
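    Plugging the article’s rough per-pin figures into the same kind of arithmetic shows why the wider bus wins even at a lower per-pin rate. This is only a sketch using the approximate numbers quoted above, per HBM stack and per GDDR6 chip; a card then gangs several such devices together (the text notes a GPU typically carries more than one HBM stack) to reach its headline bandwidth.

```python
def device_bandwidth_gb_per_s(num_pins: int, gbit_per_s_per_pin: float) -> float:
    """Total bandwidth of one memory device = pins * per-pin rate, converted to GB/s."""
    return num_pins * gbit_per_s_per_pin / 8

print(device_bandwidth_gb_per_s(1024, 2))   # one HBM stack:  ~256 GB/s
print(device_bandwidth_gb_per_s(32, 16))    # one GDDR6 chip: ~64 GB/s
```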
    As we discussed, a pin is literally a wire connecting the VRAM to the processor. Having 1024 wires running from the processor to the VRAM is not possible on a standard PCB. Therefore, an “interposer” is used as an intermediary to connect the VRAM & the processor. Just like a regular IC, wires (connections) are etched in this silicon “interposer” in the desired quantity. After this, the HBM device(s) & the processor are mounted atop this “interposer”. This slightly twisted workaround is called a 2.5D architecture. Another difference is that while GDDR chips are soldered to the PCB surrounding the GPU die, an HBM structure is a vertical stack of DRAMs, like a high-rise building. The stacked memory dies are linked using microscopic wires with TSVs (Through-Silicon Vias), which are vertical electrical connections giving super-fast connectivity between the DRAMs. There are huge challenges to stacking items vertically, especially around designing heat sinks & managing thermal safety, but somehow HBM manufacturers have made this happen.
    HBM has become a gold standard today for AI data centers. It was introduced to the market by SK Hynix in 2013. Today, we have the 3rd-generation HBM3, and its main client is Nvidia. Due to investments made way back, SK Hynix is leading the pack along with Samsung and a relatively recent entrant named Micron. We hear a lot about chips and TSMC, but HBM is a key technology to watch out for in the coming years. We typically have more than one HBM device inside the GPU package.
    GDDR6 co-exists with HBM3. The markets are complementary. The former addresses PCs & other consumer GPUs, whereas the latter addresses data center GPUs. Ultra-large-scale AI deployments like ChatGPT likely leverage a cluster of NVIDIA GPUs working in tandem. Connecting such GPUs involves the use of NVIDIA NVLink technology, which requires fast GPU memory bandwidth, and that is why HBM is prevalent in such systems. If not for the wide bus width and fast data transfer rates offered by HBM, these kinds of clusters would be very difficult to design.
    Besides the VRAM, GPUs also include high-speed memory caches that are even closer to the GPU’s processing cores. There is a physical limit to the sizes of these caches. An L1 cache is usually in KB and an L2 cache is usually a few MB. Different hardware & software strategies exist to keep the most useful, and most reused data present in caches.
    2. Cooling Mechanisms in a GPU
    Higher clock speeds generally result in increased heat generation, necessitating cooling solutions to maintain optimal operating temperatures. The usual cooling methods are:

    Passive Cooling: These do not have any powered moving components. They take advantage of optimized airflow to take heat away.
    Fans are used to dissipate heat by blowing cool air across the heat sinks, which are metal components designed to absorb & disperse heat
    In water cooling, water is circulated through the GPU surface using pipes & a radiator. The hot liquid running through the pipes is in turn cooled down by the radiator fan.
    Hybrid cooling — which uses a combination of the above

    3. GPU Computation cores — Processors
    Let us now talk about the processors on the GPU. Unlike CPUs, which contain only a few cores, the GPU literally has thousands of cores & specializes in running tasks in parallel across these cores using SIMD (Single Instruction, Multiple Data) units. Let us stick to NVIDIA terminology. There are multiple processing units called Streaming Multiprocessors (SMs) on an NVIDIA GPU. For e.g., an H100 has up to 144 SMs. What is inside an SM? Well, there are mainly 2 types of execution units — CUDA cores & Tensor cores. There is also a small SRAM memory which is shared between all threads running in that SM. More specifically, every SM has a few KB of memory that is partitioned between L1 cache & shared memory usage.
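    If you have a CUDA-capable card handy, you can peek at some of these numbers yourself. A quick sketch, assuming PyTorch with CUDA support is installed; the printed values will of course depend on your GPU.

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name)                              # marketing name of the card
    print(props.multi_processor_count, "SMs")      # number of streaming multiprocessors
    print(round(props.total_memory / 1e9, 1), "GB VRAM")
else:
    print("No CUDA device visible")
```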
    3.1 CUDA core versus Tensor core in a GPU — The difference
    Tensor cores are a pretty recent innovation (from V100 onwards) and are specifically designed for faster matrix multiplication. Let us discuss CUDA cores first. These are the computation engines for regular math operations. Each CUDA core can execute one operation per clock cycle. But their strength lies in parallel processing. Many CUDA cores working together can accelerate computation by executing processes in parallel.
    Tensor Cores are specialized hardware units designed to accelerate “mixed precision” training. The earliest version allowed 4×4 FP16 matrices to be multiplied & added to an FP32 output matrix. By using lower-precision FP16 inputs in the computations, the calculations are vastly accelerated, & by retaining FP32 outputs for the rest of the procedure, accuracy is not compromised too much. Modern tensor cores use even lower-precision formats in DL computations. See this for more details. There may also be specialized units like the transformer engine, designed to accelerate models built with Transformer blocks. A single GPU can be partitioned into multiple fully contained and isolated instances, with their own memory, cache & cores, via MIG or Multi-Instance GPU technology.
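    Here is a tiny sketch of the mixed-precision pattern described above, written in plain PyTorch on the CPU. It only mimics the numerics (FP16 inputs, FP32 accumulation), not the speed of real tensor-core hardware.

```python
import torch

a = torch.randn(4, 4, dtype=torch.float16)   # FP16 input tile
b = torch.randn(4, 4, dtype=torch.float16)   # FP16 input tile
c = torch.zeros(4, 4, dtype=torch.float32)   # FP32 accumulator

# One tensor-core-style step: D = A x B + C, accumulated in FP32
d = a.float() @ b.float() + c
print(d.dtype)   # torch.float32 -- the running result stays in higher precision
```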
    3.2 GPU operations — A FLOP show
    Let us now talk about actual operations. A FLOP (Floating Point Operation) is a single floating-point calculation like an addition. Performance of a GPU is usually measured in TeraFLOP/s. Tera is a trillion, FLOP stands for floating-point operations and the ‘s’ stands for per second.
    Most matrix ops involve a multiply and an add. It makes sense to fuse these ops together to get a Fused Multiply-Add (FMA) op. If we know the FMA speed, we can simply double it to get the FLOP count per clock. To get the peak FLOP/s rate, we multiply this by the clock rate & the number of SMs. Note that we have FP16, FP32, FP64 & Int8 cores with varying speeds. For e.g.:

    Say there are 4 tensor cores in each SM & 114 SMs in an H100
    Say each tensor core delivers 512 FP16 FMA ops per clock. Careful here: read the specs closely to check whether the FMA-ops-per-clock metric is quoted per SM or per individual core. For e.g., the linked A100 spec quotes it per SM, not per core.
    Let the clock speed = 1620 MHz
    So TFLOP/s = 1620 MHz × (2 × 512) × 4 × 114 ≈ 756 TFLOP/s of performance! 756 trillion operations per second. Wow! What would Babbage say to that? (A short sketch of this calculation follows the list.)
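    Here is that peak-throughput arithmetic as a small function. The inputs are the illustrative H100-like figures used above, not official specifications.

```python
def peak_tflops(clock_hz: float, fma_per_clock_per_core: int,
                cores_per_sm: int, num_sms: int) -> float:
    """Peak throughput in TFLOP/s; one FMA counts as 2 FLOPs."""
    flops_per_clock = 2 * fma_per_clock_per_core * cores_per_sm * num_sms
    return clock_hz * flops_per_clock / 1e12

print(round(peak_tflops(1620e6, 512, 4, 114)))   # ~756 TFLOP/s, as in the text
```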

    4. Putting everything together — LLM Operations in a GPU
    Given this immense compute power, we can now make a reasonable guess that LLM inference is memory-IO bound, not compute bound. In other words, it takes more time to load data to the GPU’s compute cores than it does for those cores to perform LLM computations on that data. The processing itself is super-fast & there is enough & more compute power available.

    To start with, the training data needs to be downloaded from a remote source to the CPU memory
    From there, it needs to be transferred to the GPU via the system bus and PCIe bus. The host(CPU)-to-device(GPU) bandwidth is limited by the CPU frequency, PCIe bus, GPU devices & the number of PCIe lanes available.
    Once the data & weights are in the GPU VRAM, they are then ferried across to the SRAM where the processors perform operations on it.
    After the operation the data is moved back to the VRAM & from there it is moved back to the CPU RAM. This is a rather simplistic view. Inside the GPU, the tensors are repeatedly moved back and forth between VRAM & SRAM (the memory allocated to an SM). Can you guess why?

    We saw that SRAM size is in KB, so large matrices are not going to fit in there… which explains why there is a constant movement between the VRAM, which holds all the tensors, and the SRAM, which holds the data on which compute operations are performed. So there is typically a memory-op where tensors are moved from VRAM to SRAM, then a compute-op in SRAM, and a memory-op to move tensors back from SRAM to VRAM. Computations like a matrix multiplication involving 2 large matrices need several such memory + compute ops before the action is completed.
    During the training of GPT-3, the tensor cores on the GPUs used were found to be idle ~50% of the time. So, to extract the best from the infrastructure, data movement needs to be fast enough to ensure the computation cores are kept reasonably occupied. Surely, there is scope for some smart person to come up with shortcuts. Enter Flash attention & other such hacks. But that is a story for another day!
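    A back-of-the-envelope way to see the memory-IO bound claim, under stated assumptions: take the ~756 TFLOP/s figure from the earlier example and assume roughly 2 TB/s of VRAM bandwidth (an illustrative HBM-class number, not a measured spec), then compare load time with compute time for a matrix-vector multiply of the kind LLM decoding does.

```python
PEAK_FLOPS = 756e12      # from the earlier worked example
BANDWIDTH = 2e12         # assumed ~2 TB/s of VRAM bandwidth, illustrative only

def bound_check(flops: float, bytes_moved: float) -> str:
    compute_us = flops / PEAK_FLOPS * 1e6
    memory_us = bytes_moved / BANDWIDTH * 1e6
    verdict = "memory-bound" if memory_us > compute_us else "compute-bound"
    return f"compute ~{compute_us:.2f} us, memory ~{memory_us:.2f} us -> {verdict}"

n = 8192                          # one square FP16 weight matrix times a vector
flops = 2 * n * n                 # a multiply and an add per weight
bytes_moved = 2 * n * n           # each 2-byte weight read from VRAM once
print(bound_check(flops, bytes_moved))   # memory time dominates by a wide margin
```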
    5. Linking GPUs for LLM training — Topologies
    While LLM inferencing is manageable with a ready-made collection of GPUs such as a DGX server (which contains 8 H100s), LLM training needs far more GPUs. Before we discuss how to connect GPUs for larger workloads, it makes sense to see how CPU servers are connected in a datacentre. I am not an expert in this area, so please feel free to point out any incorrect interpretations I may have made from the references I quote.
    5.1 Generic concepts on linking processors
    Each server has a card attached to it called the Network Interface Card (NIC). RDMA technology enables direct memory access to a remote server via the NIC hardware. RoCE (RDMA over Converged Ethernet) protocol uses the RDMA technology & adapts it to Ethernet networks. So now, a server can talk to a remote server over a network. A network switch is a device connecting multiple servers in a network, enabling them to communicate with each other. This is the basic technology. Now let us come to the topology.
    So we assemble all the servers physically in one place and pile them up vertically in neat racks. A very basic topology is to connect each server in a rack to a switch that usually sits on top of the rack, aptly named the ToR switch. The ToR switches of different racks are connected to a spine switch. This topology is a basic implementation of the Clos topology — named after Charles Clos, who invented this scheme to originally arrange telephone nodes in a “leaf-n-spine” arrangement. The leaf switches are nothing but the ToR switches in modern data centers.
    Source: Fig 1–1 from https://www.oreilly.com/library/view/bgp-in-the/9781491983416/ch01.html
    Fat tree is a variant of Clos. Like before, we have servers arranged into racks connecting to Top-of-the-Rack (ToR) switches. ToR switches are connected to aggregation switches to provide connectivity across racks, forming a pod. The pods are interconnected with spine switches, allowing any-to-any communication across servers. Note that there are multiple paths connecting servers, so there is a lot of redundancy built in.
    In a typical App deployment running hundreds of microservices on dozens of servers, it is useful to have such fully connected, high bandwidth networks. You never know who is going to talk to whom so it never hurts to overprovision on bandwidth and connectivity. However, network loads during AI training do not follow these patterns. They are more predictable & this allows us to build optimized, cheaper & less power-hungry networks.
    5.2 Linking GPUs via proprietary technology like NVLink
    We can strap together H100’s by leveraging the proprietary NVLink & NVSwitch technologies. NVLink provides the high-speed connection between individual GPUs, while NVSwitch is a chip that enables multiple GPUs to communicate through NVLink, forming a high-bandwidth network. See this nice article for details.
    NVIDIA’s P100 GPU introduced the NVLink1. At that time there was no NVSwitch chip, and the GPUs were connected in a ring-like configuration, which resulted in a lack of direct point-to-point communication between GPUs. The NVSwitch1 chip was introduced with the V100, followed by the NVSwitch2 chip with the A100 GPU. We are in the third-generation NVSwitch3 which can support a cluster of up to 256 H100 GPUs. Each H100 GPU in such a cluster is connected to the internal NVSwitch3 chip through 18 NVLink4.0 connections. This is how trillion parameter LLMs are inferenced.
    5.3 Linking GPUs via RoCE in a rail-optimized topology
    But as they say, ye dil mange more (the heart asks for more)… Meta reportedly trains its newer models on a cluster of over 100K H100s. Phew! How do they manage to link it all up? The standard NVLink tricks can only scale to a limited number of GPUs. Beyond that, we have to use the network topologies discussed earlier & fall back on technologies like RoCE, which allows data to be directly transferred from one GPU’s memory to another without involving the CPU.
    So you have 8 GPUs in one DGX server. You have several such DGX servers in the data centre. Each GPU is assigned a NIC (yes!) & connected via RDMA to all other GPUs thru’ a variant of Clos network called “rail-optimized network”. The idea here is to set up dedicated connections between groups of GPUs with rail switches. If a GPU wants to communicate with a GPU which is in a different group, then it has to go thru’ the spine switch (which takes a lil more time). To implement this, each GPU in a DGX server is indexed serially. A rail is the set of GPUs with the same index on different servers & these are interconnected with a rail switch via RDMA. These rail switches are subsequently connected to spine switches forming any-to-any GPU network.
    Source: Fig 1 from https://arxiv.org/pdf/2307.12169
    This topology streamlines traffic flow. It is like having dedicated lanes for high speed vehicles instead of generally mixing all traffic together. Rail paths are direct connections between a bunch of GPUs with same index. Spine switches serve as the connecting points for differently-indexed GPUs. For e.g., communication between GPU1 of server 1 and GPU1 of server 2 happens via their dedicated rail switch 1. If GPU1 of server 1 needs to reach GPU5 of another server, it has to go thru’ a spine switch.
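    A toy model of that routing rule may help: same-index GPUs reach each other through their rail switch, while differently-indexed GPUs have to cross a spine switch. The function below is purely illustrative; the names and the 8-GPUs-per-server layout are assumptions, not any vendor's API.

```python
def rail_optimized_route(src_server: int, src_gpu: int,
                         dst_server: int, dst_gpu: int) -> list[str]:
    if src_server == dst_server:
        return ["NVLink/NVSwitch inside the server"]
    if src_gpu == dst_gpu:                                   # same index: one rail hop
        return [f"rail switch {src_gpu}"]
    return [f"rail switch {src_gpu}", "spine switch",        # different index: via spine
            f"rail switch {dst_gpu}"]

print(rail_optimized_route(1, 1, 2, 1))   # ['rail switch 1']
print(rail_optimized_route(1, 1, 2, 5))   # ['rail switch 1', 'spine switch', 'rail switch 5']
```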
    The workloads are designed so as to minimize data transfers across rails (since it has to go thru’ the extra spine switch). The good news is that this can be neatly done for AI training ensuring that most of the traffic stays within the rails, and does not cut across. In fact, there is a recent paper which suggests that you can consider removing costly spine switches altogether as inter-rail communication is minimal. Can you guess how?
    5.4 Linking GPUs via RoCE in a rail-only topology
    Well, we have the super-fast connectivity using NVLink to communicate between a limited set of GPUs (up to 256). So you create these High-Bandwidth (HB) domains which use NVLink for communication. You have several such HB domains. We then have the same indexing system and rail connections to interconnect the HB domains. But there are no spine switches! Can you guess how GPU1 of HB domain 1 can talk to GPU5 of another HB domain? Yes! Transfer data via super-fast NVLink to GPU5 of HB domain 1 first. Then use the dedicated rail of GPU5 to talk to the GPU5 in the other HB domain! This is a rail-only topology, as opposed to the rail-optimized topology!
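    The same toy model, adapted to the rail-only case: a cross-domain message first aligns the GPU index inside its own HB domain over NVLink, then rides that index's rail. Again a sketch for illustration only, with assumed names.

```python
def rail_only_route(src_hb: int, src_gpu: int,
                    dst_hb: int, dst_gpu: int) -> list[str]:
    if src_hb == dst_hb:
        return ["NVLink within the HB domain"]
    hops = []
    if src_gpu != dst_gpu:                                    # align the index first
        hops.append(f"NVLink to GPU {dst_gpu} in HB domain {src_hb}")
    hops.append(f"rail {dst_gpu} across to HB domain {dst_hb}")
    return hops

print(rail_only_route(1, 1, 2, 5))
# ['NVLink to GPU 5 in HB domain 1', 'rail 5 across to HB domain 2']
```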
    Given these topologies, we can now plan the training pipeline to use pipeline parallelism, tensor parallelism &/or data parallelism, but that is a story for another day. See this, this & this for more details. 100K H100s consume a LOT of power. Tech companies are exploring nuclear power options to generate the clean energy needed for long-term sustainability. Else, a 100K GPU cluster may have to be broken down into smaller clusters and connected using optical transceivers across the buildings in a campus.
    This (unplanned) article is a prelude to — Optimizing LLM inference: Key Faultlines & workarounds. To deeply understand how we can optimize LLM operations, we need to understand more about the silicon on which they are executed. Though there are lots of manuals/guides on individual aspects like memory, processors, networking etc., I couldn’t find a concise and reader-friendly thread linking these various aspects together & hence took a shot. This is the 9th article in a 15-part series titled My LLM diaries.

    LLM Quantization — From concepts to implementation
    LoRA & its newer variants explained like never before
    In-Context learning: The greatest magic show in the kingdom of LLMs
    RAG in plain English — Summary of 100+ papers
    HNSW — Story of the world’s most popular Vector search algorithm
    VectorDB origins, Vamana & on-disk Vector search algorithms
    Taming LLMs — A study of few popular techniques
    Understanding LLM Agents: Concepts, Patterns & Frameworks
    Anatomy of a GPU — A peek into the hardware fuelling LLM operations
    Optimizing LLM Inference — Key Faultlines & workarounds
    LLM Serving — Architecture considerations
    LLM evaluation & other odds and ends
    Look Ma, LLMs without Prompt Engineering
    LLMs on the laptop — A peek into the Silicon
    Taking a step back — On model sentience, conscientiousness & other philosophical aspects


    Published via Towards AI



    Source: https://towardsai.net/p/machine-learning/gpu-architecture-working-intuitively-explained
    GPU Architecture & Working intuitively explained Author(s): Allohvk Originally published on Towards AI. GPU Origins The image displayed on a computer screen is made up of millions of tiny pixels. In early days, “graphics controllers” were given instructions by the CPU on how to calculate the individual pixel values so that the appropriate image could be displayed. These were ok for conventional displays but for a really good gaming experience, images need to be built dozens of times per second. The CPU was not really designed to handle these kind of loads. The whole process of creating the image could be parallelized big-time simply by (a) dividing the image into smaller blocks (b) carrying out computations for each block in parallel & (c) grouping them back again. The results of one block don’t influence the results of the other blocks. CPU’s multi-threading capabilities was not really conceived for such massive parallelization. Enter the GPU! Sony first used the term GPU in 1994, in its PlayStation consoles. The technology was perfected by NVIDIA which soon became a leader. GPUs have numerous computation cores (much more than a CPU) and gaming programmers could write Shaders — programs to run graphics computations on the GPU in a massively parallelized way to create the screen images in super-fast time. The GPU is inspired by the CPU but was specifically designed to enable massive multi-threaded operations on its numerous computation cores seamlessly. Creating threads, switching between threads etc is much faster on a GPU. Some smart developers also realized that these parallel processing capabilities could be used for other computationally intensive tasks as well! 2005: Steinkrau implements a simple 2-layer Neural Net on a GPU 2006: Kumar et. al. trains a CNN model for document processing 2007: NVIDIA released Compute Unified Device Architecture (CUDA) — a custom language extending C to exploit data parallelism on GPUs. Now developers had much more granular control over the image rendering. 2008 a landmark paper by Raina et al was released. This paper pretty much showed everyone how to train deep layers on a GPU 2014: NVIDIA released CuDNN — a dedicated CUDA library for Deep Learning. Very soon PyTorch, TensorFlow etc incorporated CuDNN, setting the stage for modern GPU usage for AI! A GPU is an ASIC or Application-Specific Integrated Circuit having a processor (hosting numerous computational cores), a memory soldered onto it (we want to avoid going to the CPU RAM for everything), a cooling system (well, they heat up pretty fast) and a BIOS chip (same role as a CPU — to store settings, run startup diagnostics etc). This card is then plugged into the motherboard slot using the PCI Express interface. The terms GPU and graphics card are often used interchangeably. Some GPUs like the one in Apple M3 do not have a dedicated memory but instead use the system RAM itself which is possible due to its unique design. Google has the TPU (Tensor Processing Unit) which is its own ASIC. We discuss the GPU memory, the processing cores, the LLM workflows happening inside them & common topologies for clustering. Photo by Thomas Foster on Unsplash 1. GPU Memory module — The VRAM Instead of having the GPU talk to the regular RAM, it made sense to create another RAM physically closer to the GPU die so that data retrieval is faster. So a graphics card has a memory called VRAM — Video Random Access Memory in addition to the computation engines . 
VRAM is connected to the computation engine cores via a Bus called the memory interface. 1.1 What is DRAM? Let us talk first of RAM technology in general. All memory whether it is the CPU RAM or the GPU VRAM are mostly based on DRAM technology which consists of a capacitor and a transistor. The capacitor’s charge represents the data stored. Due to its very nature, this charge gradually leaks. To prevent data loss, a refresh circuit periodically rewrites the data back, restoring its charge. Hence the name — Dynamic RAM due to these preiodic refreshes. Most computers use Synchronous DDR5 DRAM’s as their CPU RAMs. Synchronous because it utilizes the system clock for better performance. In other words the action (of retrieving & storing data) is operationally coordinated by an external clock signal. Tying the operations to the clock makes it faster. The processor knows the exact timing & number of cycles in which the data will be available from the RAM to the bus & can plan better. We have DDR1 (1st Gen Double Data Rate Synchronous Dynamic RAM released in 2000) to DDR5 which is the choice of CPU RAM as of today. 1.2 What is SGRAM? Let us now talk about the VRAMs in GPUs. The VRAM is a type of SGRAM — Synchronous Graphics RAM. The current generation of VRAMs being used is GDDR6. Yes, this is 6th generation GDDR, the G standing for “Graphics”. While DDR & GDDR share common origins and early couple of generations were similar, the branches separated after DDR3. So as of 2025, DDR5 rules in CPU RAM and GDDR6 rules for consumer-grade GPU RAMs. Conceptually DDRs and GDDRs are similar but note that DDRs are used by CPUs which need low latency whereas GDDRs are used by GPUs which are OK to compromise latency for extremely high throughput. Crudely, the former has more frequent smaller calculations & the latter deals with much higher volume of data & some delays are forgiven considering the vast volumes of data being processed. Even more crudely, the former is a bullet train with 6–8 coaches while the latter a 3 Kilometre long goods train. 1.3 GDDR VRAMs explained in detail GDDR memory are individual chips soldered to the PCB (Printed Circuit Board) very close to the GPU die. The physical proximity improves the speed of data transfer from the VRAM to the GPU processor. There are pins in a GDDR which can be thought of as individual wires that connect it to the processor. Bus width is literally the number of such connections. GDDR6 has 32 pins spread across 2 channels with roughly 16 Gbits.p.s bandwidth per pin. Bandwidth is total amount of data being moved & if you had one single metric at your disposal to take a decision, it would be this. Before we go further, let us try to understand this metric intuitively. 1.4 Calculating GPU Memory Bandwidth intuitively Memory Bandwidth is the max rate at which data can be transferred between the GPU and the VRAM. We discussed that data transmission is synchronized with the clock. The clock cycle is measured in hertz & represents the number of cycles per second. Let us say we have a clock operating at 1000 MHz. This literally means 1 billion clock ticks per second. How long does a tick last? Literally 1/(1 billion) i.e. 1 nano second. Data is sent to and fro every clock cycle. So every nano-second, a bus-full of data is sent from the VRAM to the processor & vice versa. How many seats on the bus? Well, we discussed this earlier… This is the memory interface or the bus width… literally the physical count of bits that fit into the bus. 
A 128-bit bus would ferry 128 bits every nano-second. The D in G’D’DR6 stands for Double. Basically, data is transmitted on both the rising and falling edges of the clock cycle, so 256 bits every nano-second. How many bytes in 1 sec? 256/8 i.e. 32 billion bytes per second or better still 32 GB/s as Giga is the preferred term when measuring data. The capital B denotes bytes whereas the small b denotes bits… a source of confusion. A more practical formula is: Bandwidth = Clock * Bus Width x Data Rate, where the Data Rate is the number of data transfers per cycle. GDDR6 is Double Data Rate (as just discussed) and Quad pumped, which quadruples the (doubled) speed. So effectively the Data Rate is 8. Sometimes, you may encounter the same information crouched in different semantics. E.g., if frequency of command clock (CK#) is N, then the write command clock (WK#) is 2N. GDDR6 rates then are QDR (quad data rate) in reference to WK# and ODR (Octal Data Rate) in reference to the CK#. Some OEMs multiply the clock speed & data rate & call it a clock rate or something. In that case, the bandwidth is simply that number multiplied by the bus width. In general, this raw formula can be used: num_of_transfers per second * num_of_bits per transfer / 8. “Boost clock” mechanism allows the GPU and GDDR memory to operate at even higher speeds than the default clock when conditions allow it. Boost clock metric refers to the max such operating clock speed. A 1750 MHz clock means: 1.75GHz is the frequency of command clock(CK#). The frequency of the write clock (WK#) is 3.5GHz due to the G”D”DR The Quad pumping takes it to 3.5*4=14 G bits moved in 1 second from each pin on the bus. We could have bus widths of up to 384 bits! So we get a bandwidth of 14*384 Giga bits per second. Divide by 8 to get 672 GB/s. GDDR6 bandwidth can go upto 1 TB/s. Wow! 1.5 What is HBM VRAM in a GPU? When reading or writing data, contention is created when the VRAM has occupied memory channels & is busy receiving or delivering other data. This contention creates latency & this affects bandwidth. Increasing the number of memory channels is a great option. A type of memory called HBM (High-Bandwidth Memory) has lower access latency than GDDR6, since it has 8-memory channels versus 2 channels in GDDR6. HBM also has a wider bus. HBM has 1024 pins spread across 8 channels of 128 pins with roughly 2 Gbits.p.s bandwidth per pin. Compare this with (an equivalent) GDDR which has 32 pins spread across 2 channels with roughly 16 Gbits. p.s bandwidth per pin. Notice how HBM keeps the Gbit/sec per pin much lower than GDDR. This saves power (which is important as we shall see). In spite of this, it has higher bandwidth than GDDR6 due to the wider bus & higher channels. As we discussed, a pin is literally a wire connecting the VRAM to the processor. Having 1024 wires connected from the processor to the VRAM is not possible on a standard PCB. Therefore, an “interposer” is used as an intermediary to connect the VRAM & the processor. Just like a regular IC, wires (connections) are etched in this silicon “interposer” in the desired quantity. After this, the HBM device(s) & the processor are mounted atop this “interposer”. The slightly twisted workaround is called a 2.5D architecture.Another difference is that while GDDR chips are soldered to the PCB surrounding the GPU die, an HBM structure is a vertical stack of DRAMs like a high rise building. 
The stacked memory dies are linked using microscopic wires with TSV (Through-Silicon Vias) which are vertical electrical connections giving super fast connectivity between the DRAMs. There are huge challenges to stacking items vertically especially around designing heat sinks & managing thermal safety but somehow HBM manufacturers have made this happen. HBM has become a gold standard today for AI data centers. It was introduced to the Market by SK Hynix in 2013. Today, we have the 3rd generation HBM3 and their main client is Nvidia. Due to investments made way back, SK Hynix is leading the pack along with Samsung and a relatively recent entrant named Micron. We hear a lot about chips and TSMC but HBM is a key technology to watch out for in the coming years. We typically have more than one HBM devices inside the GPU die. GDDR6 co-exists with HBM3. The markets are complementary. The former addresses PCs & other consumer GPUs whereas the latter addresses data center GPUs. Ultra large scale AI deployments like ChatGPT likely leverage the use of a cluster of NVIDIA GPUs working in tandem. Connecting such GPU’s involves the use of NVIDIA NVLink technology which requires fast GPU memory bandwidth speeds and it’s the reason why HBM is prevalent in such systems. If not for the wide bus width and fast data transfer rates offered by HBM, these kind of clusters would be very difficult to design. Besides the VRAM, GPUs also include high-speed memory caches that are even closer to the GPU’s processing cores. There is a physical limit to the sizes of these caches. An L1 cache is usually in KB and an L2 cache is usually a few MB. Different hardware & software strategies exist to keep the most useful, and most reused data present in caches. 2. Cooling Mechanisms in a GPU Higher clock speeds generally result in increased heat generation necessitating the need for cooling solutions to maintain optimal operating temperatures. Usual cooling methods are: Passive Cooling: These do not have any powered moving components. They take advantage of optimized airflow to take heat away. Fans are used to dissipate heat by blowing cool air across the heat sinks, which are metal components designed to absorb & disperse heat In water cooling, water is circulated through the GPU surface using pipes & a radiator. The hot liquid running through the pipes is in turn cooled down by the radiator fan. Hybrid cooling — which uses a combination of the above 3. GPU Computation cores — Processors Let us now talk about the processors on the GPU. Unlike CPUs which contain only a few cores, the GPU literally has 1000’s of cores & specializes in running tasks in parallel across these cores using SIMD (Single Instruction, Multiple Data) units. Let us stick to NVIDIA terminology. There are multiple processing units called Streaming Multiprocessor (SM) on a NVIDIA GPU. For e.g. an H100 has upto 144 SMs. What is inside an SM? Well there are mainly 2 type of execution units — CUDA cores & Tensor cores. There is also a small memory SRAM which is Shared between all threads running in that SM. More specifically, every SM has a few KB memory that is partitioned between L1 cache & Shared Memory usage. 3.1 CUDA core versus Tensor core in a GPU — The difference Tensor cores are a pretty recent innovation (from V100 onwards) and are specifically designed for faster matrix multiplication. Let us discuss CUDA cores first. These are the computation engines for regular math operations. Each CUDA core can execute one operation per clock cycle. 
But their strength lies in parallel processing. Many CUDA cores working together can accelerate computation by executing processes in parallel. Tensor Cores are specialized hardware units designed to accelerate “mixed precision” training. The earliest version allowed 4×4 FP16 matrices to be multiplied & added to an FP32 output matrix. By using lower-precision FP16 inputs in the computations, the calculations are vastly accelarated & by retaining FP32 outputs for the rest of the procedure, accuracy is not compromised too much. Modern tensor cores use even lower precision formats in DL computations. See this for more details. There may also specialized units like the transformer engine designed to accelerate models built with the Transformer blocks. A single GPU can be partitioned into multiple fully contained and isolated instances, with their own memory, cache & cores via MIG or Multi Instance GPU technology. 3.2 GPU operations — A FLOP show Let us now talk about actual operations. A FLOP (Floating Point Operation) is a single floating-point calculation like an addition. Performance of a GPU is usually measured in TeraFLOP/s. Tera is a trillion, FLOP stands for floating-point operations and the ‘s’ stands for per second. Most matrix ops involve a multiply and an add. It makes sense to fuse these ops together to get an Fused Multiply-Add (FMA) op. If we know the FMA speed, we can simply double it to get the FLOP counts per clock. To get the peak FLOP/s rate, we multiply this by the clock rate & the number of SMs. Note that we have FP16, FP32, FP64 & Int8 cores with varying speeds. For e.g.: Say there are 4 tensor cores in each SM & 114 SMs in an H100 Say each tensor core delivers 512 FP16 FMA ops per clock. Careful here: Read the specs clearly to check whether the FMA ops per clock metric is per SM or per individual core. For e.g., this link of A100 is per coreper SM Let the Clock speed = 1620 MHz So TFLOP/s = 1620 * (2*512) * 4 * 114= 756 TFLOP/s of performance! 756 Trillion operations per second. Wow! What would Babbage say to that? 4. Putting everything together — LLM Operations in a GPU Given this immense compute-power, we can now make a reasonable guess that LLM inference is memory-I0 bound, not compute bound. In other words, it takes more time to load data to the GPU’s compute cores than it does for those cores to perform LLM computations on that data. The processing itself is super-fast & there is enough & more compute power available. To start with, the training data needs to be downloaded from a remote source to the CPU memory From there, it needs to be transferred to the GPU via the system bus and PCIe bus. The host(CPU)-to-device(GPU) bandwidth is limited by the CPU frequency, PCIe bus, GPU devices & the number of PCIe lanes available. Once the data & weights are in the GPU VRAM, they are then ferried across to the SRAM where the processors perform operations on it. After the operation the data is moved back to the VRAM & from there it is moved back to the CPU RAM. This is a rather simplistic view. Inside the GPU, the tensors are repeatedly moved back and forth between VRAM & SRAM (the memory allocated to an SM). Can you guess why? We saw that SRAM size is in KB so large matrices are not going to fit in there … which explains why there is a constant movement between VRAM which holds all the tensors and SRAM which holds the data on which compute operations are performed. 
4. Putting everything together — LLM Operations in a GPU

Given this immense compute power, we can make a reasonable guess that LLM inference is memory-I/O bound, not compute bound: it takes more time to load data into the GPU's compute cores than it does for those cores to perform the LLM computations on that data. The processing itself is super fast, and there is more than enough compute power available.

To start with, the training data needs to be downloaded from a remote source to CPU memory. From there, it is transferred to the GPU via the system bus and the PCIe bus; the host (CPU) to device (GPU) bandwidth is limited by the CPU frequency, the PCIe bus, the GPU devices and the number of PCIe lanes available. Once the data and weights are in the GPU VRAM, they are ferried across to the SRAM, where the processors operate on them. After the operation, the data is moved back to the VRAM, and from there back to the CPU RAM. This is a rather simplistic view. Inside the GPU, tensors are repeatedly moved back and forth between VRAM and SRAM (the memory allocated to an SM). Can you guess why? We saw that the SRAM on an SM is only a few hundred KB, so large matrices are never going to fit in there, which explains the constant movement between the VRAM, which holds all the tensors, and the SRAM, which holds the data currently being operated on.

So there is typically a memory op where tensors are moved from VRAM to SRAM, then a compute op in SRAM, and then another memory op to move the results back from SRAM to VRAM. A computation like the multiplication of two large matrices needs several such memory + compute ops before it completes. During the training of GPT-3, the tensor cores on the GPUs used were found to be idle roughly 50% of the time. So, to extract the best from the infrastructure, data movement needs to be fast enough to keep the computation cores reasonably occupied. Surely there is scope for some smart person to come up with shortcuts. Enter Flash attention and other such hacks. But that is a story for another day!
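To see why the memory-bound claim is plausible, here is a rough back-of-the-envelope sketch in Python for single-request token generation. The model size, HBM bandwidth and peak throughput are assumed, illustrative numbers (a 13B-parameter FP16 model on an H100-class card), not measurements; the point is only the ratio between moving the weights and doing the math on them.

    # Back-of-the-envelope: one decoded token at batch size 1 (all numbers are assumptions)
    params          = 13e9      # 13B-parameter model, hypothetical
    bytes_per_param = 2         # FP16 weights
    hbm_bandwidth   = 3.0e12    # ~3 TB/s HBM, H100-class (assumed)
    peak_flops      = 756e12    # peak FP16 throughput from the earlier example

    weight_bytes = params * bytes_per_param
    memory_time  = weight_bytes / hbm_bandwidth   # every weight is streamed once per token
    compute_time = (2 * params) / peak_flops      # roughly 2 FLOPs per parameter per token

    print(f"Streaming the weights: {memory_time * 1e3:.2f} ms per token")
    print(f"Doing the arithmetic : {compute_time * 1e3:.3f} ms per token")
    print(f"Memory time is ~{memory_time / compute_time:.0f}x the compute time")

Even with generous rounding, the cores spend most of each token waiting on memory, which is exactly the imbalance that batching and tricks like Flash attention try to hide.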
5. Linking GPUs for LLM training — Topologies

While LLM inferencing is manageable with a ready-made collection of GPUs such as a DGX server (which contains 8 H100s), LLM training needs far more GPUs. Before we discuss how to connect GPUs for larger workloads, it makes sense to see how CPU servers are connected in a data centre. I am not an expert in this area, so please feel free to point out any incorrect interpretations I may have made from the references I quote.

5.1 Generic concepts on linking processors

Each server has a card attached to it called the Network Interface Card (NIC). RDMA technology enables direct memory access to a remote server via the NIC hardware. The RoCE (RDMA over Converged Ethernet) protocol takes RDMA and adapts it to Ethernet networks, so a server can talk to a remote server over an ordinary network. A network switch is a device that connects multiple servers in a network, enabling them to communicate with each other. That is the basic technology; now let us come to the topology.

We assemble all the servers physically in one place and stack them vertically in neat racks. A very basic topology is to connect each server in a rack to a switch that usually sits on top of the rack, aptly named the ToR switch. The ToR switches of different racks are then connected to a Spine switch. This is a basic implementation of the Clos topology, named after Charles Clos, who originally devised the scheme to arrange telephone switching nodes. In this "leaf-and-spine" arrangement, the leaf switches are simply the ToR switches of modern data centers. (Source: Fig 1–1 from https://www.oreilly.com/library/view/bgp-in-the/9781491983416/ch01.html)

Fat tree is a variant of Clos. As before, servers are arranged into racks connecting to Top-of-the-Rack (ToR) switches. ToR switches connect to aggregation switches to provide connectivity across racks, forming a pod, and the pods are interconnected with spine switches, allowing any-to-any communication across servers. Note that there are multiple paths connecting servers, so a lot of redundancy is built in. In a typical app deployment running hundreds of microservices on dozens of servers, such fully connected, high-bandwidth networks are useful: you never know who is going to talk to whom, so it never hurts to overprovision on bandwidth and connectivity. However, network loads during AI training do not follow these patterns. They are more predictable, and this allows us to build optimized, cheaper and less power-hungry networks.

5.2 Linking GPUs via proprietary technology like NVLink

We can strap together H100s by leveraging the proprietary NVLink and NVSwitch technologies. NVLink provides the high-speed connection between individual GPUs, while NVSwitch is a chip that lets multiple GPUs communicate through NVLink, forming a high-bandwidth network. See this nice article for details. NVIDIA's P100 GPU introduced NVLink1. At that time there was no NVSwitch chip, and the GPUs were connected in a ring-like configuration, which meant there was no direct point-to-point communication between every pair of GPUs. The NVSwitch1 chip was introduced with the V100, followed by the NVSwitch2 chip with the A100 GPU. We are now in the third generation: NVSwitch3 can support a cluster of up to 256 H100 GPUs, with each H100 in such a cluster connected to the NVSwitch3 chip through 18 NVLink 4.0 connections. This is how trillion-parameter LLMs are inferenced.

5.3 Linking GPUs via RoCE in a rail-optimized topology

But as they say, ye dil maange more (the heart wants more)… Meta reportedly trains its newer models on a cluster of over 100K H100s. Phew! How do they manage to link it all up? The standard NVLink tricks only scale to a limited number of GPUs. Beyond that, we have to use the network topologies discussed earlier and fall back on technologies like RoCE, which allows data to be transferred directly from one GPU's memory to another without involving the CPU.

So you have 8 GPUs in one DGX server and several such DGX servers in the data centre. Each GPU is assigned its own NIC (yes!) and connected via RDMA to all other GPUs through a variant of the Clos network called a "rail-optimized network". The idea is to set up dedicated connections between groups of GPUs using rail switches. If a GPU wants to communicate with a GPU in a different group, it has to go through a spine switch, which takes a little more time. To implement this, each GPU in a DGX server is indexed serially. A rail is the set of GPUs with the same index on different servers, and these are interconnected with a rail switch via RDMA. The rail switches are in turn connected to spine switches, forming an any-to-any GPU network. (Source: Fig 1 from https://arxiv.org/pdf/2307.12169)

This topology streamlines traffic flow. It is like having dedicated lanes for high-speed vehicles instead of mixing all traffic together. Rail paths are direct connections between GPUs with the same index, while spine switches serve as the connecting points for differently-indexed GPUs. For example, communication between GPU1 of server 1 and GPU1 of server 2 happens via their dedicated rail switch 1, but if GPU1 of server 1 needs to reach GPU5 of another server, it has to go through a spine switch. Workloads are designed to minimize data transfers across rails (since those have to take the extra spine-switch hop). The good news is that this can be done quite neatly for AI training, ensuring that most of the traffic stays within the rails and does not cut across. In fact, a recent paper suggests that you can consider removing the costly spine switches altogether, since inter-rail communication is minimal. Can you guess how?
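Before answering that, here is the rail-versus-spine rule of this section as a tiny, purely illustrative Python sketch. The server and GPU numbering is hypothetical; it just classifies which hop a message between two GPUs would take in a rail-optimized cluster.

    # Toy routing rule for a rail-optimized cluster (illustrative only)
    # A GPU is identified by (server_id, gpu_index), with gpu_index 0-7 inside a DGX-style server.

    def route(src_server, src_idx, dst_server, dst_idx):
        if src_server == dst_server:
            return "NVLink/NVSwitch inside the server"
        if src_idx == dst_idx:
            return f"rail switch {src_idx} (same index, different servers)"
        return "spine switch (different index, different servers) - the slower path"

    print(route(1, 1, 2, 1))   # GPU1 of server 1 -> GPU1 of server 2: dedicated rail switch 1
    print(route(1, 1, 2, 5))   # GPU1 of server 1 -> GPU5 of server 2: has to cross the spine
    print(route(1, 1, 1, 5))   # same server: stays on NVLink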
5.4 Linking GPUs via RoCE in a rail-only topology

Well, we already have superfast NVLink connectivity between a limited set of GPUs (up to 256). So you create High Bandwidth (HB) domains which use NVLink for communication internally, and you have several such HB domains. We then keep the same indexing system and rail connections to interconnect the HB domains, but there are no spine switches! Can you guess how GPU1 of HB domain 1 can talk to GPU5 of another HB domain? Yes: transfer the data via superfast NVLink to GPU5 of HB domain 1 first, then use GPU5's dedicated rail to talk to the GPU5 in the other HB domain. This is a rail-only topology, as opposed to the rail-optimized topology.

Given these topologies, we can now plan the training pipeline around pipeline parallelism, tensor parallelism and/or data parallelism, but that is a story for another day. See this, this & this for more details. 100K H100s consume a LOT of power. Tech companies are exploring nuclear power options to generate the clean energy needed for long-term sustenance; otherwise, a 100K-GPU cluster may have to be broken down into smaller clusters connected by optical transceivers across the buildings of a campus.

This (unplanned) article is a prelude to — Optimizing LLM inference: Key Faultlines & workarounds. To deeply understand how we can optimize LLM operations, we need to understand more about the silicon on which they are executed. Though there are lots of manuals and guides on individual aspects like memory, processors and networking, I couldn't find a concise, reader-friendly thread linking these aspects together, and hence took a shot. This is the 9th article in a 15-part series titled My LLM diaries:

LLM Quantization — From concepts to implementation
LoRA & its newer variants explained like never before
In-Context learning: The greatest magic show in the kingdom of LLMs
RAG in plain English — Summary of 100+ papers
HNSW — Story of the world's most popular Vector search algorithm
VectorDB origins, Vamana & on-disk Vector search algorithms
Taming LLMs — A study of few popular techniques
Understanding LLM Agents: Concepts, Patterns & Frameworks
Anatomy of a GPU — A peek into the hardware fuelling LLM operations
Optimizing LLM Inference — Key Faultlines & workarounds
LLM Serving — Architecture considerations
LLM evaluation & other odds and ends
Look Ma, LLMs without Prompt Engineering
LLMs on the laptop — A peek into the Silicon
Taking a step back — On model sentience, conscientiousness & other philosophical aspects

Published via Towards AI. Source: https://towardsai.net/p/machine-learning/gpu-architecture-working-intuitively-explained