• In conflict: Putting Russia’s datacentre market under the microscope

    When Russian troops invaded Ukraine on 24 February 2022, Russia’s datacentre sector was one of the fastest-growing segments of the country’s IT industry, with annual growth rates in the region of 10-12%.
    However, with the conflict resulting in the imposition of Western sanctions against Russia and an outflow of US-based tech companies from the country, including Apple and Microsoft, optimism about the sector’s potential for further growth soon disappeared.
    In early March 2025, it was reported that Google had disconnected from traffic exchange points and datacentres in Russia, leading to concerns about how this could negatively affect the speed of access to some Google services for Russian users.
    Initially, there was hope that domestic technology and datacentre providers might be able to plug the gaps left by the exodus of the US tech giants, but it seems they could not keep up with the hosting demands of Russia’s increasingly digital economy.
    Oleg Kim, director of the hardware systems department at Russian IT company Axoft, says the departure of foreign cloud providers and equipment manufacturers has led to a serious shortage of compute capacity in Russia.
    This is because the situation triggered a sharp initial increase in demand for domestic datacentres, but Russian providers simply did not have time to expand their capacity on the required scale, Kim continues.

    According to the estimates of Key Point, one of Russia’s largest datacentre networks, meeting Russia’s demand for datacentres will require facilities with a total capacity of 30,000 racks to be built each year over the next five years.
    On top of this, it has also become more costly to build datacentres in Russia.
    Estimates suggest that prior to 2022, the cost of a datacentre rack totalled 100,000 rubles ($1,200), but it now exceeds 150,000 rubles.
    And analysts at Forbes Russia expect these figures will continue to grow, due to rising logistics costs and the impact the war is having on the availability of skilled labour in the construction sector.
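    As a rough illustration of what these figures imply, the short sketch below multiplies the article’s numbers together: 30,000 racks a year at a post-2022 cost of more than 150,000 rubles per rack. Treating the per-rack figure as the full cost of delivering a rack is an assumption made purely for illustration; it almost certainly understates total facility build costs.

```python
# Back-of-the-envelope estimate of the annual rack build-out spend implied by the
# figures quoted above. The 30,000-racks-a-year and 150,000-ruble-per-rack numbers
# come from the article; treating the per-rack cost as the whole delivery cost is
# an illustrative assumption, not a claim about real project budgets.

RACKS_PER_YEAR = 30_000          # Key Point estimate of the required annual build-out
COST_PER_RACK_RUB = 150_000      # post-2022 per-rack cost cited in the article
YEARS = 5                        # horizon quoted for the build-out estimate

annual_cost_rub = RACKS_PER_YEAR * COST_PER_RACK_RUB
total_cost_rub = annual_cost_rub * YEARS

print(f"Implied annual spend: {annual_cost_rub / 1e9:.1f}bn rubles")    # 4.5bn rubles
print(f"Implied five-year spend: {total_cost_rub / 1e9:.1f}bn rubles")  # 22.5bn rubles
```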
    The impact of these challenges is being keenly felt by users, with several of the country’s large banks experiencing serious problems in finding suitable locations for their datacentres.
    Sberbank is among the firms affected, with its chairperson, German Gref, speaking out previously about how the bank is in need of a datacentre with at least 200MW of capacity, but would ideally need 300-400MW to address its compute requirements.
    Stanislav Bliznyuk, chairperson of T-Bank, says trying to build even two 50MW datacentres to meet its needs is proving problematic. “Finding locations where such capacity and adequate tariffs are available is a difficult task,” he said.

    Read more about datacentre developments

    North Lincolnshire Council has received a planning permission application for another large-scale datacentre development, in support of its bid to become an AI Growth Zone.
    A proposal to build one of the biggest datacentres in Europe has been submitted to Hertsmere Borough Council, and already has the support of the technology secretary and local councillors.
    The UK government has unveiled its 50-point AI action plan, which commits to building sovereign artificial intelligence capabilities and accelerating AI datacentre developments – but questions remain about the viability of the plans.

    Despite this, T-Bank is establishing its own network of data processing centres – the first of which should open in early 2027, he confirmed in November 2024.
    Kirill Solyev, head of the engineering infrastructure department of the Softline Group of Companies, which specialises in IT, says many large Russian companies are resorting to building their own datacentres – because compute capacity is in such short supply.
    The situation is, however, complicated by the lack of suitable locations for datacentres in the largest cities of Russia – Moscow and St Petersburg. “For example, to build a datacentre with a capacity of 60MW, finding a suitable site can take up to three years,” says Solyev. “In Moscow, according to preliminary estimates, there are about 50MW of free capacity left, which is equivalent to 2-4 large commercial datacentres.
    “The capacity deficit only in the southern part of the Moscow region is predicted at 564MW by 2030, and up to 3.15GW by 2042.”
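    To put Solyev’s figures in context, the sketch below converts the predicted deficits into an approximate number of facilities. The 12.5-25MW per-facility range is inferred from his statement that roughly 50MW equates to two to four large commercial datacentres; it is illustrative arithmetic on the quoted numbers, not an industry benchmark.

```python
# Translate the quoted capacity deficits for the southern Moscow region into an
# approximate count of "large commercial datacentres", using the 50MW = 2-4
# facilities equivalence from Solyev's own remarks. Purely illustrative.

deficits_mw = {"2030": 564, "2042": 3_150}    # predicted deficits, per the article
facility_mw_range = (50 / 4, 50 / 2)          # 12.5MW to 25MW per large facility (inferred)

for year, deficit in deficits_mw.items():
    low = deficit / facility_mw_range[1]      # fewer facilities if each is larger
    high = deficit / facility_mw_range[0]     # more facilities if each is smaller
    print(f"By {year}: roughly {low:.0f}-{high:.0f} large commercial datacentres")
```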
    As a result, datacentre operators and investors are now looking for suitable locations outside of Moscow and St Petersburg, and seeking to co-locate new datacentres in close proximity to renewable energy sources.
    And this will be important as demand for datacentre capacity in Russia is expected to increase, as it is in most of the rest of the world, due to the growing use of artificial intelligence (AI) tools and services.
    The energy-intensive nature of AI workloads will put further pressure on operators that are already struggling to meet the compute capacity demands of their customers.

    Speaking at the recent Ural Forum on cyber security in finance, Alexander Kraynov, director of AI technology development at Yandex, said solving the energy consumption issue of AI datacentres will not be easy.
    “The world is running out of electricity, including for AI, while the same situation is observed in Russia,” he said. “In order to ensure a stable energy supply of a newly built large datacentre, we will need up to one year.”
    According to a recent report in the Russian business paper Vedomosti, as of April 2024, Russian datacentres were consuming about 2.6GW, which is equivalent to about 1% of the installed capacity of the Unified Energy System of Russia.
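    As a quick consistency check on those numbers, if 2.6GW of datacentre load really is about 1% of the grid’s installed capacity, the implied total is in the order of 260GW, which is the only inference the sketch below draws.

```python
# Sanity-check the Vedomosti figure quoted above: datacentre load of ~2.6GW said
# to equal ~1% of the Unified Energy System's installed capacity implies a total
# of roughly 260GW. This only rearranges the article's own numbers.

datacentre_load_gw = 2.6
share_of_installed_capacity = 0.01

implied_total_gw = datacentre_load_gw / share_of_installed_capacity
print(f"Implied installed capacity of the Unified Energy System: ~{implied_total_gw:.0f}GW")
```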
    Accommodating AI workloads will also mean operators will need to purchase additional equipment, including expensive accelerators based on graphics processing units and higher-performing data storage systems.
    The implementation of these plans and the viability of these purchases is likely to be seriously complicated by the current sanctions regime against Russia.
    That said, Russia’s prime minister, Mikhail Mishustin, claims this part of the datacentre supply equation is being partially solved by an uptick in the domestic production of datacentre kit.
    According to Mishustin, more than half of the server equipment and industrial storage and information processing systems needed for datacentres is already being produced in Russia – and these figures will continue to grow.

    The government also plans to provide additional financial support to the industry, as – to date – datacentre construction in Russia has been held back by the relatively long payback periods of such projects, which can reach 10 years in some cases.
    One of the possible support measures on offer could include the subsidisation of at least part of the interest rates on loans to datacentre developers and operators.
    At the same time, though, the government’s actions in other areas have made it harder for operators to build new facilities.
    For example, in March 2025, the Russian government significantly tightened the rules governing new datacentres, introducing design requirements for data processing centres that came into force following approval by the Russian Ministry of Construction.
    According to Nikita Tsaplin, CEO of Russian hosting provider RUVDS, the rules have created additional bureaucracy in the sector, because they treat datacentres as standard construction projects.
    He predicts this could extend the construction cycle of a datacentre from around five years to seven.
    The government’s intervention was intended to prevent the installation of servers in residential settings, such as garages, but it looks set to complicate an already complex situation – prompting questions about whether Russia’s datacentre market will ever reach its full potential.
  • Data embassies and US embargo halt give Saudi AI hope

    Saudi Arabia’s (KSA’s) attempt to turn from one of the least developed data markets in the world into one of the most developed has advanced with measures it and the US have taken to encourage investors to build artificial intelligence (AI) datacentres in the country.
    KSA came closer to finalising plans to treat foreign computer systems as “data embassies”, reassuring firms their customer data would be safely stored in the authoritarian Gulf monarchy. Meanwhile, the US scrapped export controls on its most advanced AI chips, which had threatened to stop KSA from ever realising its plan to become a global leader in AI.
    Those legal preparations bore fruit this week before either was actually enacted, when Nvidia, whose advanced AI chips are the subject of US export controls, said it had done a deal to ship 18,000 of them to the Saudi state-owned Public Investment Fund. The chips were the first stage in a plan to install “several hundred thousand” Nvidia Grace Blackwell AI chips in five years, consuming 500MW of energy.
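    The quoted plan gives only a vague chip count, but it is possible to bracket what the 500MW figure would mean per accelerator. The 200,000-500,000 range used below is an assumption chosen purely to span “several hundred thousand”; the resulting per-chip figures are indicative only.

```python
# Bracket the per-chip power budget implied by the reported plan: "several hundred
# thousand" Grace Blackwell chips drawing 500MW in total. The chip counts below are
# assumed values spanning that vague phrase, not figures from the announcement.

TOTAL_POWER_MW = 500

for chip_count in (200_000, 300_000, 500_000):
    kw_per_chip = TOTAL_POWER_MW * 1_000 / chip_count
    print(f"{chip_count:,} chips -> ~{kw_per_chip:.1f}kW per chip of the quoted 500MW")
```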
    Political analysts and industry insiders said, before KSA’s plans unfurled this week, that its proposed Global AI Hub Law would allow KSA to get banned AI chips that both it and foreign firms would need to build AI systems in the country. The draft law offers to give foreign computer systems embassy status, so their operators would answer only to the laws of their home nations. It would forbid the Saudi state from intruding.
    KSA concluded a public consultation on the law the day after an Investment Summit, at which US president Donald Trump and Saudi crown prince Mohammed bin Salman Al Saud signed a broad economic partnership and presided over $600bn of trade deals, the White House said in a statement. They had done $300bn of deals when the conference opened, and aspired to $1tn, the prince told the conference on Tuesday. The deals encompassed defence, energy, tech and health.
    The audacity of KSA’s ambition was made apparent in February, when Computer Weekly analysis showed that, among 20 of the most notable data markets in Europe, the Middle East and Africa (EMEA), Saudi capital Riyadh had the second-smallest pipeline of operational, planned and unfinished datacentres, above only Athens.
    With 125MW of computing capacity then planned in Riyadh, it was barely 5% of the forecast size of EMEA market leader London, and less than 15% of the size of its rival and neighbour, the United Arab Emirates, according to numbers published by commercial estate agent Cushman & Wakefield. The largest datacentre investment deal apparent among those announced at the Forum was Saudi firm DataVolt’s $20bn investment in the US.
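    Working backwards from those ratios gives a sense of scale for the markets Riyadh is chasing. The sketch below simply inverts the percentages quoted above; the implied London and UAE figures are rough inferences from the article’s comparison, not numbers taken from the Cushman & Wakefield report itself.

```python
# Invert the quoted ratios: Riyadh's 125MW pipeline described as "barely 5%" of
# London's forecast size and under 15% of the UAE's. The results are rough
# lower-bound inferences from the article's own comparison.

riyadh_mw = 125
implied_london_mw = riyadh_mw / 0.05    # ~2,500MW if Riyadh is about 5% of London
implied_uae_mw = riyadh_mw / 0.15       # >~833MW, since Riyadh is under 15% of the UAE

print(f"Implied London forecast: roughly {implied_london_mw:,.0f}MW")
print(f"Implied UAE pipeline: more than {implied_uae_mw:,.0f}MW")
```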

    Read more about US and Saudi agreements

    On Monday, the US scrapped the AI Diffusion Rule, under which former president Joe Biden had blocked exports of powerful AI chips to all but a handful of countries. US AI tsar David Sacks told the conference the rule had stopped US technology proliferating around the world and stifled strategic partners such as KSA, when it was supposed to hinder AI development in only a few countries.
    The US had decided instead to model AI policy on Silicon Valley’s software ecosystems, where firms became dominant by publishing application programming interfaces (APIs) that others could use to build on their technology.
    “They’re able to build these ecosystems without even having any lawyers involved,” said Sacks. “There’s no need for a contract. You just publish an API. In a similar way, the US needs to encourage the world to build on our tech stack.
    “President Trump said ‘the US has to win the AI race’. How do we win the AI race? We have to build the biggest partner ecosystem. We need our friends like the Kingdom of Saudi Arabia, and other strategic partners and allies, to build on our tech.
    “We want our technology to spread,” he said. “We want people to use it. We want to become the standard.”

    KSA’s attempt, meanwhile, to encourage foreign firms to build AI datacentres in the country by allowing their home nations to retain sovereignty over their data was widely commended as a strategic masterstroke.
    “It’s still an immature market, but the opportunity is huge,” said Stephen Beard, a real estate deal-maker for Knight Frank in Dubai, who believes KSA could be a top-seven datacentre market within a decade. His firm estimated US cloud computing firms had recently committed to $9bn of investment there by 2027.
    Knight Frank alone was handling $7bn of datacentre deals for firms attracted by the local market opportunity, in a country with 20% lower power costs than the UK, a large, growing population and a non-democratic government able to digitise rapidly without the inconvenience of parliamentary process. President Trump commended KSA’s ruling family for that in a speech in Riyadh this week.
    “The AI Hub law is optically a fantastic move,” said Beard. “It should go some way to appeasing investors’ concerns. But we are talking about Saudi Arabia. Who decides the law in Saudi Arabia? Any developer looks for a higher return because of the macro risks.”
    But computer firms would invest there mainly to serve KSA, he suggested; the idea of KSA becoming a “super-hub” was flawed.
    Munir Suboh, a lawyer at Taylor Wessing in Riyadh, said the law would give KSA an “unprecedented advantage” over other countries that hesitate to cede sovereignty over foreign facilities, contrasting Saudi Arabia’s attempt to make life easier for foreign investors with Europe’s regulatory preoccupation with imposing safety standards.
    “Traditionally, cross-border data transfers require compliance with multiple data localisation regulations, especially in data-heavy industries,” said Oliver Subhedar, a commercial dispute lawyer with Burlingtons. “KSA is seeking a comparative advantage over other states by regulating datacentres themselves.”

    KSA would slash the cost of risk and compliance for multinationals that ordinarily had to accommodate a host of different regulations around the world, said Jade Masri, managing director of investment advisory R Consultancy in Dubai. That would cut capital costs for investors.
    “Hyperscalers need this law to import data into KSA to run large language models and generate meaningful AI,” said Amrik Sangha, a consultant with Gateley in Dubai.
    But KSA needed to address the question of “grey” fibre optic cables that would carry foreign data transfers “without monitoring”, he said. Grey, or “dark”, cables are private, point-to-point communications lines not reliant on local connections.
    Notwithstanding the unexpected US U-turn, Juliana Rordorf, Middle East director for political consultancy Albright Stonebridge Group, said the law might influence the global debate about data localisation, as well as AI export controls.
    Neighbouring Bahrain has had a data embassy law since 2018, while the UAE, whose datacentre market and planned construction dwarf those of KSA, recently made bilateral data embassy agreements with France and Italy.
    The data embassy concept was pioneered by Estonia, which used it to secure government backup datacentres in Luxembourg when it had nowhere to put them safely at home, and has since been copied by Monaco. Such a law has even been mooted as a way to encourage investors deterred by Europe’s onerous data protection rules.
  • Microsoft’s wartime pact with the EU rings hollow - and could spell trouble for UK IT buyers


    Opinion

    Microsoft’s wartime pact with the EU rings hollow - and could spell trouble for UK IT buyers
    Microsoft has moved to assure its European customers that it will fight any attempt by President Trump to disrupt their ability to access its services, but can UK customers take the company at its word?

    By

    Owen Sayers,
    Secon Solutions

    Published: 20 May 2025

    Microsoft has moved to reassure governments across the European Union (EU) that it will fight any move by President Trump to interrupt services, should relations between Brussels and Washington deteriorate further, causing the United States to make cloud services a gambling chit in a trade war.
    The company promised Europe it would “promptly and vigorously contest such a measure”, yet its overtures to EU customers should leave European governments confused, rather than assured.
    Its tacit recognition that the much-vaunted Microsoft EU Boundary might provide its European customers with little real protection from US interventions – despite taking over two years to implement – cannot be glossed over. 
    The issue in play is not just that Microsoft, as a US-headquartered ‘communication services provider’, is subject to many American laws that make transfers of data to it a complicated process, but that it might no longer be able to guarantee continuity of service if, for example, President Trump should wake up one morning and decide to order it to cease its EU operations.
    This admission, coming on top of well-documented Microsoft global service outages and serial security compromises in recent years, will almost certainly stoke existing concerns.
    Such a possibility might have seemed remote just a few months ago, but Trump’s recent search for effective levers to exert control over other countries as part of his “America First” initiative means that what was previously a low risk, with negligible likelihood but massive impact, is now much more likely and could perhaps even be proximate.
    Microsoft president Brad Smith clearly agrees, as evidenced by his new comments about pre-emptive measures and offsets. 
    Certainly, that is the view many readers will hold after Smith’s somewhat clumsily presented efforts to calm the cloud market that he admits drives 25% of Microsoft’s global revenues, and is clearly important for him to protect.
    Telling foreign leaders that Microsoft is embarking on a crusade of change, as he repeatedly did, flies in the face of his attempts to thwart regulatory interventions regarding the software giant’s restrictive cloud licensing practices and grudging moves to unbundle its software from its operating system in the European Union.
    They might also consider that previous positive approaches and engagements from Microsoft, and even directly from Smith, have not prevented the company from levelling blame at the EU when things go wrong, as it did during the wholly EU-unrelated CrowdStrike-initiated global service outage.
    At best, the historic relationship between Microsoft and European leaders has been spasmodic, and it would be understandable if they take these new assurances with a huge pinch of salt.
    Microsoft now plans to further address these new risks through the creation of a new EU board to manage its expanded datacentre estates in Europe, whilst ignoring that ‘branch-office’ management does not change the nature of its US-centric operations, nor can it prevent the effects of any presidential diktat.
    Time and again Microsoft has attempted to address legitimate consumer and EU-member government concerns through measures that are presented as forward-thinking and positive, but have zero effective benefit when analysed.
    For example, UK FOI disclosures made in June 2024 confirmed the long-held suspicion that Microsoft depends on the ability to process data globally, wherever it chooses, for both its Azure and Microsoft 365 cloud service families, and this – not layers of localised senior execs – is at the root of its problems.
    Because of this global operating model, any EU board managing datacentres will have no practical ability, technically or legally, to protect EU data from a US government choosing to exert its entirely legitimate, if controversial, control over Microsoft and the European data it manages.
    What should cause immediate concern for the UK is that these overtures to the EU do not consider the UK at all – because it lies entirely outside of the Microsoft EU Data Boundary, and doesn’t appear to be included in these new promises either.
    Should companies with a foot in both the UK and Europe decide the protections offered up by Microsoft are indeed effective, they may therefore need to relocate their data and workloads to benefit from them – and history suggests that where the work shifts, so invariably do the key jobs.
    Microsoft, in any event, appears to feel that such efforts in the UK are unnecessary given the level of dependency the UK government already has on the Seattle tech giant, whether that be in the workings of the civil service, the NHS or national infrastructure.
    A view that can only have been solidified by the key positions the government has given over to Microsoft executives to effectively steer the UK’s national technology strategy.
    This is what should be the attention-grabber for Microsoft’s UK customers today: not that Microsoft is making big commitments and high-profile promises to the EU, but that the tech giant no longer feels the need to do the same for its UK operations.
    As a result, UK consumers and companies can expect to suffer, with their choices limited, their data access subject to the whims of foreign powers, and a government too dependent on Microsoft to put up a fight.

    Read more about Microsoft cloud

    Microsoft makes ‘digital commitments’ amid pledge to continue growing its European datacentre footprint, in the face of growing geopolitical uncertainty.
    Microsoft pushes back on analyst claims its changing relationship with OpenAI is forcing it to scale back its datacentre expansion plans in the US and Europe.

    #microsofts #wartime #pact #with #rings
    Microsoft’s wartime pact with the EU rings hollow - and could spell trouble for UK IT buyers
    charles taylor - stock.adobe.com Opinion Microsoft’s wartime pact with the EU rings hollow - and could spell trouble for UK IT buyers Microsoft has moved to assure its European customers that it will fight any attempt by President Trump to disrupt their ability to access its services, but can UK customers take the company at its word? By Owen Sayers, Secon Solutions Published: 20 May 2025 Microsoft has moved to re-assure governments across the European Unionthat it will fight any move by President Trump to interrupt services, should relations between Brussels and Washington deteriorate further, causing the United States to make cloud services a gambling chit in a trade war.  The company promised Europe it would “promptly and vigorously contest such a measure,” yet its overtures to EU customers should leave its governments confused, rather than assured. Its tacit recognition that the much-vaunted Microsoft EU Boundary might provide its European customers with little real protection from US interventions – despite taking over two years to implement – cannot be glossed over.  The issue in play is not just that Microsoft, being a US-headquartered ‘communication services provider’, is subject to many American laws that make transfers of data to them a complicated process. But that they might no longer be able to give guarantees of continuity of service if, for example, President Trump should wake up one morning and decide to order them to cease their EU operations. This admission, coming on top of well documented Microsoft global service outages and serial security compromises in recent years, will almost certainly stoke any fires of concern.  Such a possibility might have seemed remote just a few months ago, but Trump’s recent search for effective levers to exert control over other countries as part of his “America First” initiative means that what was previously a low risk, with negligible likelihood but massive impact, is now much more likely and could perhaps even be proximate. Microsoft president Brad Smith clearly agrees, as evidenced by his new comments about pre-emptive measures and offsets.  Certainly, that is the view many readers will hold after Smith’s somewhat clumsily presented efforts to calm the cloud market that he admits drives 25% of Microsoft’s global revenues, and is clearly important for him to protect. Telling foreign leaders that Microsoft is embarking on a crusade of change, as he repeatedly did, flies in the face of his attempts to thwart regulatory interventions regarding the software giant’s restrictive cloud licensing practices and grudging moves to unbundle its software from its operating system in the European Union. They might also consider that previous positive approaches and engagements from Microsoft, and even directly from Smith, have not prevented them from levelling blame at the EU when things go wrong. As the company did during the wholly EU-unrelated Crowdstrike-initiated global service outage. At best, the historic relationship between Microsoft and European leaders has been spasmodic, and it would be understandable if they take these new assurances with a huge pinch of salt. Microsoft now plans to further address these new risks through the creation of a new EU board to manage its expanded datacentre estates in Europe, whilst ignoring that ‘branch-office’ management does not in fact change the nature of its US-centric operations, and nor can they prevent the effects of any Presidential diktat. 
Time and again Microsoft has attempted to address legitimate consumer and EU-member government concerns through measures that are presented as forward-thinking and positive, but have zero effective benefit when analysed. For example, UK FOI disclosures made in June 2024 confirmed the long-held suspicion that Microsoft is dependent on the ability to process data globally wherever they choose for both their Azure and Microsoft 365 cloud service families, and this – not layers of localised senior execs – is at the root of their problems. Due to their global operating model any EU board managing datacentres will have no practical ability to technically or legally protect EU data from a US Government choosing to exert its entirely legitimate, if controversial, control over them and the European data they manage. What should cause immediate concern for the UK is that these overtures to the EU do not consider the UK at all – because it lies entirely outside of the Microsoft EU Data Boundary, and doesn’t appear to be included in these new promises either. Should companies with a foot in both UK and Europe decide the protections offered up by Microsoft are indeed effective they may therefore need to re-locate their data and workloads to benefit from them, and history suggests that where the work shifts, so invariably do the key jobs. Microsoft, in any event, appears to feel that such efforts in the UK are unnecessary given the level of dependency the UK government already has on the Seattle tech giant, whether that be in the workings of the civil service, the NHS or national infrastructure. A view that can only have been solidified by the key positions the government has given over to Microsoft executives to effectively steer the UK’s national technology strategy. This is what should be the attention-grabber for Microsoft’s UK customers today; not that Microsoft is making big commitments and high-profile promises to the EU, but that the tech giant no longer feels the need to do the same for its UK operations and, as a result, UK consumers and companies can expect to suffer as a result. With their choices limited and their data access subject to the whims of foreign powers, with a government too dependent on Microsoft to put up a fight. about Microsoft cloud Microsoft makes ‘digital commitments’ amid pledge to continue growing its European datacentre footprint, in the face of growing geopolitical uncertainty Microsoft pushes back on analyst claims its changing relationship with OpenAI is forcing it to scale back its datacentre expansion plans in the US and Europe. In The Current Issue: UK critical systems at risk from ‘digital divide’ created by AI threats UK at risk of Russian cyber and physical attacks as Ukraine seeks peace deal Standard Chartered grounds AI ambitions in data governance Download Current Issue Starburst chews into the fruits of agentic – CW Developer Network Calm settles over digital identity market - for now...– Computer Weekly Editors Blog View All Blogs #microsofts #wartime #pact #with #rings
  • Tackling the UK’s cyber threats

    20 May 2025
    Tackling the UK’s cyber threats

    In this week’s Computer Weekly, we report from the National Cyber Security Centre’s annual update on the state of UK security and examine the emerging threats. The chief data officer of Standard Chartered bank discusses preparing for artificial intelligence. We also look at the networking implications of GPU-based AI datacentres. Read the issue now.

    Features in this issue

    UK critical systems at risk from ‘digital divide’ created by AI threats

    by Bill Goodwin

    GCHQ’s National Cyber Security Centre warns that a growing ‘digital divide’ between organisations that can keep pace with AI-enabled threats and those that cannot is set to heighten the UK's overall cyber risk

    UK at risk of Russian cyber and physical attacks as Ukraine seeks peace deal

    by Bill Goodwin

    UK cyber security chief warns of ‘direct connection’ between Russian cyber attacks and physical threats to the UK

    Standard Chartered grounds AI ambitions in data governance

    by Aaron Tan

    The bank’s group chief data officer, Mohammed Rahim, outlines how the bank is modernising its data infrastructure and governance practices to support its AI initiatives

  • Trump visit bolsters Saudi AI

    Saudi Arabia’s artificial intelligence (AI) ambitions were boosted following US president Donald Trump’s recent trip to the Middle East.
    The Trump administration’s import tariffs and AI chip export restrictions appear to have been behind the decision by Saudi Arabian datacentre operator DataVolt to sign a major server deal with US firm Supermicro. The company’s selection of the US firm for its $20bn investment in AI datacentre technology will secure graphics processing unit (GPU) hardware, currently seen as the best way to provide the accelerated processing power AI workloads require.
    “Partnering with Supermicro guarantees us a US-made supply chain for critical GPU systems and positions DataVolt to accelerate our investment plans,” said Rajit Nanda, CEO of DataVolt.
    Supermicro said the collaboration would fast-track delivery of its ultra-dense GPU platforms, storage and rack-based systems for DataVolt’s hyperscale gigawatt-class renewable and net-zero green AI datacentre facilities. 
    The hardware uses liquid cooling, which Supermicro said reduces power costs by up to 40%, allowing datacentres to run more efficiently with lower power usage effectiveness (PUE).
    “We are excited to collaborate with DataVolt to bring our advanced AI systems featuring the latest direct liquid cooling technology (DLC-2), powered by local renewable, sustainable and net-zero green technology,” said Charles Liang, president and CEO of Supermicro.
    Supermicro claims its latest AI infrastructure is able to lower the total cost of ownership by up to 20%.
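    Claims like these are easier to weigh with a rough back-of-the-envelope calculation. The short Python sketch below shows how a lower PUE feeds through to a facility’s annual power bill; the IT load, electricity price and PUE values are illustrative assumptions rather than figures reported by Supermicro or DataVolt.

    # Illustrative only: how power usage effectiveness (PUE) feeds through to a
    # facility's annual energy bill. All figures are assumptions for the example,
    # not numbers reported by Supermicro or DataVolt.

    IT_LOAD_KW = 1_000        # assumed IT load of the hall
    PRICE_PER_KWH = 0.08      # assumed electricity price in USD
    HOURS_PER_YEAR = 8_760

    def annual_power_cost(pue: float) -> float:
        """Total facility power = IT load x PUE; cost = energy x price."""
        facility_kw = IT_LOAD_KW * pue
        return facility_kw * HOURS_PER_YEAR * PRICE_PER_KWH

    air_cooled = annual_power_cost(1.6)      # assumed PUE for an air-cooled hall
    liquid_cooled = annual_power_cost(1.15)  # assumed PUE for a direct-liquid-cooled hall

    saving = 1 - liquid_cooled / air_cooled
    print(f"Air-cooled:    ${air_cooled:,.0f} per year")
    print(f"Liquid-cooled: ${liquid_cooled:,.0f} per year")
    print(f"Saving:        {saving:.0%}")    # roughly 28% with these assumed figures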

    Along with its Saudi AI datacentre build, DataVolt, Google, Oracle, Salesforce, AMD and Uber have said they are committed to investing $80bn in cutting-edge transformative technologies in both the US and Saudi Arabia.
    Oracle said it plans to invest $14bn over the next 10 years to deliver cloud and AI technology to Saudi Arabia. “Thanks to the decisive actions and strong leadership of President Trump and his administration, Oracle is providing the world’s most advanced cloud and AI technology to Saudi Arabia,” said Safra Catz, CEO at Oracle. “Our expanded partnership with the kingdom will create new opportunities for its economy and deliver better health outcomes for its people.”
    As part of Trump’s visit, Humain, a subsidiary of Saudi Arabia’s Public Investment Fund (PIF), also announced a $10bn deal with US chipmaker AMD to develop 500 megawatts of open, scalable, resilient and cost-efficient AI infrastructure that spans Saudi Arabia and the US.
    Through the collaboration, Humain will oversee end-to-end delivery, including hyperscale datacentres, sustainable power systems and global fibre interconnects, while AMD has said it will be providing its AI compute processors and the AMD ROCm open software ecosystem.
    Tareq Amin, CEO of Humain, described the initiative as “an open collaboration to the world’s innovators”. “We are democratising AI at the compute level, ensuring that access to advanced AI is limited only by imagination, not by infrastructure,” he said.

    Read more about AI in Saudi Arabia

    Saudi puts $15bn into AI as experts debate next steps: The kingdom’s Leap 2025 tech show is the backdrop for huge investment, plus debate over the future of artificial intelligence as a productivity tool but which can also potentially undermine human society.
    Saudi Arabia struggling to reach global leadership in deeptech: Petrostate monarchy trying to build surrogate industry made of foreign startups because own ecosystem is too immature.

    A number of US tech firms have made major investments in Saudi this year, driven by the kingdom’s ambitions to boost AI. Data for the Saudi Data and AI Authority from Accenture shows that generative AI (GenAI) has the potential to elevate its gross domestic product (GDP) by approximately $42.3bn by augmenting and automating nearly a third of all jobs.
    In February, Accenture said it was collaborating with Google Cloud to accelerate the adoption of cloud and GenAI capabilities in Saudi Arabia to support the KSA’s local data, operational and software sovereignty needs. As part of this work, Google and Accenture announced a joint centre of excellence for GenAI.
    During the Leap 2025 conference, Salesforce said it would be investing $500m in Saudi, with the focus again on AI. The company announced a regional head office in Riyadh, and as part of this expansion, Salesforce said it was working with Amazon Web Services (AWS) to deliver Hyperforce, its next-generation platform architecture, to the kingdom.
    The work with AWS means Salesforce’s global customers are able to run workloads locally through a distributed public cloud infrastructure that Salesforce said complies with local regulations.
    Commenting on Salesforce’s plans, Abdullah Alswaha, minister of communications and information technology, said: “We look forward to seeing Salesforce expand its presence here and welcome the investment in AI that will drive unprecedented innovation and operational efficiency, supporting the realisation of Saudi Arabia’s Vision 2030 goals.”
    While the White House has linked Trump’s visit to the investments major tech firms are making in Saudi Arabia, the kingdom faces a number of challenges as it fleshes out its AI ambitions.
    According to a March 2025 paper from the Carnegie Endowment for International Peace, Vision 2030 represents a future-orientated vision to make religion less of a focal point of its society, particularly with regards to how Saudi Arabia is perceived internationally. But the paper notes that private-sector employers face the challenge of hiring Saudi citizens to fill high-skilled jobs. “This skills gap will only worsen as the kingdom targets a wide range of highly skilled sectors, including AI startups, green technology, datacentres, e-gaming, fintech and EV production,” it says.
  • How close is quantum computing to commercial reality?

    Quantum computing may still be regarded by many IT leaders as a very niche technology, but broader business use cases may be just a few years away.
    While only a handful of companies have machines with logical qubits today, delegates at the Commercialising Quantum Computing conference in London were told that a machine with 100 logical qubits would offer quantum advantage in material science by 2028.
    This means that, by then, a sufficiently powerful and stable quantum computer should start delivering business value beyond what is possible using high-performance computing.

    Mark Jackson, senior quantum evangelist at Quantinuum, said the company was already using generative quantum artificial intelligence (AI). In a fireside chat at the conference, Jackson spoke about the interaction between quantum computing and AI.
    It is largely acknowledged that a quantum computer is not good at providing a precise answer, such as if applied to big data analysis. But, according to Jackson, it shines when used for machine learning, which can be applied to identify a correct answer. Quantum-enhanced machine learning can process large datasets far quicker than conventional computers, especially when applied to detecting patterns.
    “Quantum computers can detect patterns that would be missed by other conventional computing methods,” said Jackson.
    This ability to detect patterns in massive datasets could revolutionise cyber security. Becky Pickard, managing director of global cyber operations at Barclays, pointed out during a panel discussion that a lot of progress has been made with machine learning and how to apply it on a day-to-day basis: “We’re working with massive volumes of data – 12Tbytes – on a daily basis.”
    She suggested that quantum machine learning could help. From an optimisation perspective, she is keen to see the development of quantum computing applied in a way that reshapes cyber defence. 

    HSBC is one of the organisations that has been working on quantum computing for several years.
    Discussing the return on investment opportunity, and how quantum computing can be used to build more optimised financial models, Phil Intallura, global head of quantum technologies at HSBC, said: “When you break down the opportunities, financial services is one of the biggest beneficiaries.”
    As Intallura points out, banks are always looking for a better financial model: “There’s one thing that catalyses commercial organisations more than anything else, and that’s confidence. If you can show a solution using quantum technology that can get a better output than using a supercomputer, [business decision-makers] will give you much more runway than you need.”
    Another application area is the ability to generate a true random number, which can feed into financial model simulations.
    In March, a team of researchers from JPMorganChase, Quantinuum, Argonne National Laboratory, Oak Ridge National Laboratory, and the University of Texas at Austin published a paper in Nature discussing a technique known as Random Circuit Sampling (RCS).
    RCS is used to perform a certified-randomness-expansion protocol, which outputs more randomness than it takes as input. It is a task that is often used to demonstrate quantum supremacy since it cannot be achieved on a classical computer.
    Speaking of the usefulness of a quantum number generator at HSBC, Intallura said: “Using quantum random numbers as your entropy source to classical simulation does not change any of the underlying model practices in classical models. You’re just injecting a different source of entropy than what we would [normally] use.”
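    Intallura’s point is that the model code itself does not change; only the source of randomness does. The Python sketch below illustrates the idea: a toy Monte Carlo pricing loop accepts any generator exposing Python’s random.Random interface, so a pseudo-random generator can be swapped for one seeded from quantum-sourced bytes. The QuantumSeededRandom wrapper and the hard-coded bytes are illustrative assumptions, not HSBC’s implementation.

    import math
    import random

    def monte_carlo_call_price(rng: random.Random, spot=100.0, strike=105.0,
                               vol=0.2, rate=0.03, years=1.0, paths=50_000) -> float:
        """Toy Monte Carlo estimate of a European call option price.
        The model never cares where its randomness comes from."""
        payoff_total = 0.0
        for _ in range(paths):
            z = rng.gauss(0.0, 1.0)  # one standard normal draw per path
            terminal = spot * math.exp((rate - 0.5 * vol ** 2) * years
                                       + vol * math.sqrt(years) * z)
            payoff_total += max(terminal - strike, 0.0)
        return math.exp(-rate * years) * payoff_total / paths

    class QuantumSeededRandom(random.Random):
        """Hypothetical wrapper: seed the generator from externally supplied
        quantum random bytes (for example, fetched from a QRNG service)."""
        def __init__(self, quantum_bytes: bytes):
            super().__init__(int.from_bytes(quantum_bytes, "big"))

    # Classical entropy source
    print(monte_carlo_call_price(random.Random(42)))

    # Same model, different entropy source (these bytes stand in for QRNG output)
    print(monte_carlo_call_price(QuantumSeededRandom(b"\x8f\x1c\xa9\x04\x5e\x22\x7b\x10")))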

    For Intallura, regulatory pressure and the need to ensure financial transactions are secure is helping to inform quantum computing plans at financial institutions. 
    The US National Institute of Standards and Technology has ratified a number of post-quantum cryptography (PQC) standards. Banks face pressure from regulators to replace RSA-2048 encryption by 2035 and migrate fully to quantum-safe encryption standards to protect banking transactions. But, as Mark Carney, lead of quantum cyber security research at Santander Global, noted, post-quantum cryptography needs both software and hardware acceleration.
    “We want to be able to have PQC at speed in our devices and on our payment cards,” he said. “We want to give our customers the very best cryptography that we possibly can – not just for regulatory purposes, but also because it gives a sense of assurance.”
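    For a sense of what the software side of that migration can look like, the sketch below performs a post-quantum key encapsulation using the open-source liboqs-python bindings. Treat it as an assumption-laden illustration: it presumes liboqs-python is installed, the mechanism name varies by library version, and nothing here reflects how Santander or any other bank has implemented PQC.

    import oqs  # liboqs-python bindings; assumed to be installed separately

    # Mechanism names depend on the liboqs build ("Kyber768" in older releases,
    # "ML-KEM-768" in newer ones); check oqs.get_enabled_kem_mechanisms().
    KEM_ALG = "ML-KEM-768"

    with oqs.KeyEncapsulation(KEM_ALG) as receiver, oqs.KeyEncapsulation(KEM_ALG) as sender:
        # Receiver publishes a public key.
        public_key = receiver.generate_keypair()

        # Sender encapsulates a fresh shared secret against that public key.
        ciphertext, secret_at_sender = sender.encap_secret(public_key)

        # Receiver recovers the same secret with its private key.
        secret_at_receiver = receiver.decap_secret(ciphertext)

        assert secret_at_sender == secret_at_receiver
        # The shared secret would then key a symmetric cipher (such as AES-GCM)
        # for the actual payment or transaction traffic.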
    Among the promises of quantum computing is that it can be applied to solve complex optimisation problems. As and when they become commercially viable, such systems will need to work alongside traditional enterprise IT.
    This is something that Gerard Mullery, interim CEO of Oxford Quantum Circuits, recognised during his presentation at the event. Mullery sees a need for quantum computing to be embedded in enterprise workflows.
     “As AI agents autonomously orchestrate enterprise workflows, quantum compute platforms must be designed to integrate with them,” he added.
    What is clear from the experts who spoke at the Commercialising Quantum Computing conference is that a useful machine is perhaps only a few years away. This will have enough logical qubits to solve real-world problems.
    As such devices evolve, it is likely more organisations will draw on quantum computing for certain combinatorial optimisation problems, which will need to integrate with classical computing in the datacentre. As quantum computing becomes more accessible, there will also be a need to bolster cryptography with PQC.
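    As a rough illustration of that hand-off, the sketch below shows a classical application formulating a tiny optimisation problem as a QUBO and routing it to whichever solver is wired in; the quantum_backend hook is a hypothetical placeholder, and the exhaustive fallback is purely classical.

    from itertools import product
    from typing import Callable, Dict, List, Optional, Tuple

    QUBO = Dict[Tuple[int, int], float]  # (i, j) -> coefficient

    def brute_force_solve(qubo: QUBO, n_vars: int) -> List[int]:
        """Exact classical fallback: enumerate every bitstring (tiny problems only)."""
        def energy(bits: Tuple[int, ...]) -> float:
            return sum(coeff * bits[i] * bits[j] for (i, j), coeff in qubo.items())
        return list(min(product([0, 1], repeat=n_vars), key=energy))

    def solve(qubo: QUBO, n_vars: int,
              quantum_backend: Optional[Callable[[QUBO, int], List[int]]] = None) -> List[int]:
        """Route the subproblem to a quantum service if one is available,
        otherwise stay entirely classical."""
        if quantum_backend is not None:
            return quantum_backend(qubo, n_vars)  # hypothetical external call
        return brute_force_solve(qubo, n_vars)

    # Toy example: two rewarding tasks that conflict if both are selected.
    qubo = {(0, 0): -1.0, (1, 1): -1.5, (0, 1): 3.0}
    print(solve(qubo, 2))  # -> [0, 1]: pick the second task only with these coefficients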

    Read more about quantum developments

    Cisco lays out plans for networking in era of quantum computing: The network equipment provider has opened a new lab and developed a prototype chip as it fleshes out its quantum networking strategy.
    Quantum datacentre deployments: How they are supporting evolving compute projects: Quantum datacentre deployments are emerging worldwide, so what are they and where are the benefits?
  • Roundtable: Why did customers sail away from VMware?

    Hyper-converged infrastructure pioneer Nutanix is among a number of suppliers that smell blood in the water when it comes to VMware and its customers following the virtualisation giant’s acquisition by Broadcom.
    At Nutanix’s annual .Next bash in Washington DC last week, migration away from VMware and to – it hopes – its own Acropolis hypervisor (AHV) was a constant theme.
    As part of this, it gathered three customers to talk about their experiences of moving from VMware to Nutanix. 
    Of these, only one was directly attributable to Broadcom’s licensing changes, but Broadcom-Amazon Web Services (AWS) relations were key to another.
    We asked them about their journey to Nutanix and away from VMware, as well as the precise pain points that prompted their decisions.
    Here, we talk to:
    Dom Johnston, IT manager for Golding in Brisbane, Australia, which is a heavy civil and mining contracting company that has operated on the east coast of Australia for about 75 years. 
    Kee Yew Wei, associate vice-president for infrastructure and operations at MSIG, which is a Japan-headquartered insurance company that operates internationally. 
    Mike Taylor, hospital ship joint task director for Military Sealift Command and the US Navy, which operates two hospital ships, Mercy and Comfort.
    Dom Johnston: Golding had its infrastructure sitting in VMware on AWS.
    We had a three-year contract with VMware for that platform, which ended in February this year.
    About March of last year, there was a fairly public divorce between VMware and AWS.
    We weren’t sure where that left us. 
    To cut a long story short, with what we saw over the next two to three months from there, we considered the risk of leaving our infrastructure there beyond the end of that three-year contract was too great for us.
    So we went out to market to look at alternatives.
    And Nutanix has kind of swung in to essentially replace that.
    We use NC2 [Nutanix Cloud Clusters] to run our production workloads in AWS, for our DR [disaster recovery] capability, and that’s essentially to directly replace the functionality that existed within AWS and VMware Cloud Disaster Recovery, which was the DR product that sat alongside that. 
    So essentially, our DR strategy is that if an event occurs, we immediately spin up the DR environment ready to accept a workload.
    In the event that it is not required, it’s spun back down again, and we’ve lost, you know, a couple of hundred bucks’ worth of compute usage.
    Kee Yew Wei: Our journey with Nutanix is from 2017.
    We were looking for a hyper-converged system to simplify our environment, do away with the traditional three-tier legacy system, and reduce our datacentre footprint.
    Nutanix is the system, but we didn’t have full confidence in Acropolis at that time, because it was quite new compared to VMware. 
    After a couple of years using Nutanix, we built confidence, so we have recently migrated all our VMware to Nutanix AHV.
    We completed the full migration last month. 
    All this came about after the acquisition by Broadcom, and we received a quotation with a 300% to 400% increase on our renewal pricing.
    So, then we made the decision to go for Nutanix. 
    We started planning somewhere around Q3 last year and were quite conservative, with completion planned for maybe somewhere in Q2 this year.
    My team migrated 1,000 to 2,000 VMs [virtual machines] from Q4 and completed that at the beginning of April.
    So today, we are a full Nutanix house.
    Mike Taylor: Our story with Nutanix started way back in 2017.
    We’d been Nutanix lookers for a long time. 
    On my ships, we had 1,000 blade servers and EMC tiered storage taking up multiple racks.
    But on the ships, there’s only a finite amount of power they generate, so I needed to find a way to bring everything down into a smaller footprint – but a smarter, smaller footprint, something that would allow me to very elegantly manage and have ease of use that my teams aboard the ships could deal with. 
    So, we did a bake-off with Dell, Cisco and Nutanix, and we implemented Nutanix on Mercy in 2019 and Comfort in 2020.
    Now, we’re looking at generational refreshes of all of our equipment and probably expanding from there and getting some new features, with redundancy and disaster recovery.
    We do have an onboard continuity-of-operations rack, so we have mirrored failover clusters of Nutanix aboard the ships. 
    Now we’re all Nutanix.
    Everything moved over.
    That’s like, out of 80-something servers, we only had two or three servers that had hiccups. 
    Taylor: I remember standing in my main datacentre on the hospital ships.
    It’s very anticlimactic if you ever get to go; I just have five racks, but two of those five were purely just to run my server infrastructure.
    I remember standing there with one of my peers, and we were looking at it and we said, “Oh, hey, we’re still using SAN directors.” And SAN was going away, they were on their way out. 
    Dell had come out with stuff like FX, and other people were dabbling with hyper-converged, whereas Nutanix had already done it, and they had their own software, which was easy to understand for my engineers.
    So, I’m looking at these racks full of equipment, especially the VNX, which was power hungry.
    So, we said, “There has to be a better way to do this.” Energy was the problem.
    Energy was the driver to finding a solution. 
    We weren’t impacted by the Broadcom event.
    We got in before it.
    I do still run some VMware, so I am impacted by it there.
    The challenge we have incurred in continuing to operate that small part isn’t financial.
    It’s purely that I can’t get to updates.
    I can’t get to download them.
    It’s support aspects of the change that impact us the most, not the financial part of it. 
    If we hadn’t moved to Nutanix, if we were still purely ESXi, the financial part would certainly be a burden, like it is for other military commands. 
    Johnston: After AWS and VMware had their thing, we were notified by VMware that we were no longer able to spin up our on-demand DR cluster.
    They told us that, essentially, we could still use our DR plan if we powered down our production cluster before spinning up a DR cluster.
    We were testing quarterly, but we were no longer able to do that.
    In fact, we shifted to testing monthly because there was so much uncertainty in that space.
    We were left in a situation where, because we couldn’t test, we had zero confidence. 
    Kee Yew Wei: It was all about cost.
    We got a bill with a 300% to 400% increase on our last renewal.
    So, this is one of the key factors that drove us to migrate all our workloads to Nutanix.
    Taylor: The trade-offs are very, very light, if any.
    My people were very seasoned with ESXi VMware Tools and the orchestration that VMware had. 
    But the learning curve for Nutanix is very short.
    It’s very easy to pick up, but you have to learn it.
    There’s a different way to import an OVA, as opposed to the way you do it within the VMware ecosystem, for example.
    So, the trade-off is really just time to become a master at using the system with regard to functionality. 
    In fact, I think I have enhanced capability using AHV as my hypervisor.
    When it comes to security, using VMware with the military, we have to submit vulnerability scans constantly.
    That’s just part of our regular drumbeat.
    I still run VMware on classified parts of my network, and it is very challenging to keep it secure and up to date.
    I don’t have that issue with Nutanix.
    Johnston: I second that.
    As far as trade-offs are concerned, or the functionality, it’s really just a question of semantics in relation to the differences between the two platforms.
    The way that Nutanix handles snapshots is different to the way that VMware handles snapshots.
    That was a learning curve for us.
    It’s like going from Windows 10 to Windows 11.
    Things are in a different spot, but it’s the same functionality. 
    You need to prepare your team, get them training, show them what to do.
    I don’t think there’s any loss of functionality.
    In fact, I think there are faster workflows, better availability of tools. 
    Kee Yew Wei: I don’t see trade-offs.
    Maybe 10 years ago, compatibility with other suppliers’ software might have been an issue, like backup solutions such as [Veritas] NetBackup.
    Maybe seven or eight years ago, they did not support Nutanix.
    But that’s not the case today.
    Read more about virtualisation and storage
    University will ‘pull the plug’ to test Nutanix disaster recovery: University of Reading set to save circa £500,000 and deploy Nutanix NC2 hybrid cloud that will allow failover from main datacentre. 
    NHS trust cloud plans hampered by Trump tariff uncertainty: Essex NHS wants to move some capacity to the Nutanix cloud, but can’t be certain prices will hold between product selection and when procurement plans gain approval. 

    Source: https://www.computerweekly.com/feature/Roundtable-Why-did-customers-sail-away-from-VMware
  • GPU Architecture & Working intuitively explained


    Author(s): Allohvk

    Originally published on Towards AI.

    GPU Origins
    The image displayed on a computer screen is made up of millions of tiny pixels. In the early days, “graphics controllers” were given instructions by the CPU on how to calculate the individual pixel values so that the appropriate image could be displayed. These were fine for conventional displays, but for a really good gaming experience, images need to be rebuilt dozens of times per second. The CPU was not really designed to handle this kind of load.
    The whole process of creating the image can be parallelized heavily, simply by (a) dividing the image into smaller blocks, (b) carrying out the computations for each block in parallel and (c) grouping the results back together. The results of one block don’t influence the results of the other blocks. The CPU’s multi-threading capabilities were never conceived for such massive parallelization. Enter the GPU! Sony first used the term GPU in 1994, in its PlayStation consoles, and NVIDIA, which soon became a leader, went on to perfect the technology.
    GPUs have numerous computation cores (far more than a CPU), and gaming programmers could write shaders — programs that run graphics computations on the GPU in a massively parallelized way to create the screen image in super-fast time. The GPU is inspired by the CPU but was specifically designed to run massive multi-threaded operations on its numerous computation cores seamlessly. Creating threads, switching between threads and so on is much faster on a GPU. Some smart developers also realized that these parallel processing capabilities could be used for other computationally intensive tasks as well!

    2005: Steinkrau implemented a simple two-layer neural net on a GPU
    2006: Kumar et al. trained a CNN model for document processing
    2007: NVIDIA released Compute Unified Device Architecture (CUDA) — a custom language extending C to exploit data parallelism on GPUs. Developers now had much more granular control over the GPU, well beyond image rendering.
    2008: A landmark paper by Raina et al. pretty much showed everyone how to train deep layers on a GPU
    2014: NVIDIA released CuDNN — a dedicated CUDA library for deep learning. Very soon PyTorch, TensorFlow and others incorporated CuDNN, setting the stage for modern GPU usage for AI!

    A GPU is an ASIC, or Application-Specific Integrated Circuit, with a processor (hosting numerous computational cores), memory soldered onto it (we want to avoid going to the CPU RAM for everything), a cooling system (they heat up pretty fast) and a BIOS chip (same role as on a CPU motherboard — to store settings, run startup diagnostics etc). This card is then plugged into a motherboard slot using the PCI Express interface. The terms GPU and graphics card are often used interchangeably. Some GPUs, like the one in the Apple M3, do not have dedicated memory but instead use the system RAM itself, which is possible due to their unified memory design. Google has the TPU (Tensor Processing Unit), which is its own ASIC. We discuss the GPU memory, the processing cores, the LLM workflows happening inside them and common topologies for clustering.
    1. GPU Memory module — The VRAM
    Instead of having the GPU talk to the regular RAM, it made sense to create another RAM physically closer to the GPU die so that data retrieval is faster. So a graphics card has a memory called VRAM (Video Random Access Memory) in addition to the computation engines. VRAM is connected to the computation engine cores via a bus called the memory interface.
    1.1 What is DRAM?
    Let us talk first of RAM technology in general. All memory, whether it is the CPU RAM or the GPU VRAM, is mostly based on DRAM technology, which consists of a capacitor and a transistor. The capacitor’s charge represents the data stored. Due to its very nature, this charge gradually leaks. To prevent data loss, a refresh circuit periodically rewrites the data, restoring its charge. Hence the name Dynamic RAM, owing to these periodic refreshes.
    Most computers use Synchronous DDR5 DRAMs as their CPU RAM. Synchronous because they utilize the system clock for better performance. In other words, the action (of retrieving and storing data) is operationally coordinated by an external clock signal. Tying the operations to the clock makes things faster: the processor knows the exact timing and number of cycles in which the data will be available from the RAM on the bus, and can plan better. We have gone from DDR1 (1st-generation Double Data Rate Synchronous Dynamic RAM, released in 2000) to DDR5, which is the CPU RAM of choice today.
    1.2 What is SGRAM?
    Let us now talk about the VRAM in GPUs. VRAM is a type of SGRAM — Synchronous Graphics RAM. The current generation of VRAM in use is GDDR6. Yes, this is the 6th generation of GDDR, the G standing for “Graphics”. While DDR and GDDR share common origins and the first couple of generations were similar, the branches separated after DDR3. So, as of 2025, DDR5 rules CPU RAM and GDDR6 rules consumer-grade GPU RAM.
    Conceptually, DDR and GDDR are similar, but note that DDR is used by CPUs, which need low latency, whereas GDDR is used by GPUs, which are happy to trade some latency for extremely high throughput. Crudely, the former handles more frequent, smaller calculations, while the latter deals with a much higher volume of data, where some delay is forgiven given the vast volumes being processed. Even more crudely, the former is a bullet train with 6–8 coaches while the latter is a 3-kilometre-long goods train.
    1.3 GDDR VRAMs explained in detail
    GDDR memory consists of individual chips soldered to the PCB (printed circuit board) very close to the GPU die. The physical proximity improves the speed of data transfer from the VRAM to the GPU processor. A GDDR chip has pins, which can be thought of as individual wires connecting it to the processor. Bus width is literally the number of such connections. GDDR6 has 32 pins spread across 2 channels, with roughly 16 Gbit/s of bandwidth per pin. Bandwidth is the total amount of data moved per unit of time, and if you had one single metric at your disposal to make a decision, it would be this. Before we go further, let us try to understand this metric intuitively.
    1.4 Calculating GPU Memory Bandwidth intuitively
    Memory bandwidth is the maximum rate at which data can be transferred between the GPU and the VRAM. We discussed that data transmission is synchronized with the clock. The clock rate is measured in hertz and represents the number of cycles per second. Let us say we have a clock operating at 1000 MHz. This literally means 1 billion clock ticks per second. How long does a tick last? Literally 1/(1 billion) of a second, i.e. 1 nanosecond. Data is sent to and fro every clock cycle. So every nanosecond, a bus-full of data is sent from the VRAM to the processor and vice versa.
    How many seats on the bus? Well, we discussed this earlier… This is the memory interface, or the bus width… literally the physical count of bits that fit on the bus. A 128-bit bus would ferry 128 bits every nanosecond. The D in G’D’DR6 stands for Double: data is transmitted on both the rising and falling edges of the clock cycle, so 256 bits every nanosecond. How many bytes in 1 second? 256/8, i.e. 32 billion bytes per second, or better still 32 GB/s, as Giga is the preferred term when measuring data. The capital B denotes bytes whereas the small b denotes bits… a source of confusion.
    A more practical formula is: Bandwidth = Clock × Bus Width × Data Rate, where the Data Rate is the number of data transfers per cycle. GDDR6 is double data rate (as just discussed) and quad pumped, which quadruples the (doubled) speed, so effectively the Data Rate is 8. Sometimes you may encounter the same information couched in different semantics. For example, if the frequency of the command clock (CK#) is N, then the write clock (WK#) runs at 2N. GDDR6 rates are then QDR (quad data rate) in reference to WK# and ODR (octal data rate) in reference to CK#.
    Some OEMs multiply the clock speed and data rate and quote the result as a single clock rate; in that case, the bandwidth is simply that number multiplied by the bus width. In general, this raw formula can be used: num_of_transfers per second * num_of_bits per transfer / 8. The “boost clock” mechanism allows the GPU and GDDR memory to operate at even higher speeds than the default clock when conditions allow it; the boost clock metric refers to the maximum such operating clock speed. A 1750 MHz clock means:

    1.75 GHz is the frequency of the command clock (CK#).
    The frequency of the write clock (WK#) is 3.5 GHz, thanks to the “D” (double data rate) in GDDR.
    Quad pumping takes it to 3.5 × 4 = 14 Gbit moved per second from each pin on the bus.
    We could have bus widths of up to 384 bits! So we get a bandwidth of 14 × 384 gigabits per second.
    Divide by 8 to get 672 GB/s. GDDR6 bandwidth can go up to 1 TB/s. Wow! (A short calculation sketch follows below.)
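    To make the arithmetic concrete, here is a minimal Python sketch of the raw bandwidth formula, using the illustrative numbers from this section (the function name and structure are just for exposition):

```python
# Minimal sketch of the raw formula: bandwidth = transfers/s * bits per transfer / 8.
# The numbers below are the illustrative ones used in this section.

def memory_bandwidth_gb_s(clock_mhz, data_rate, bus_width_bits):
    """Peak bandwidth in GB/s for a given clock, effective data rate and bus width."""
    transfers_per_second = clock_mhz * 1e6 * data_rate
    bits_per_second = transfers_per_second * bus_width_bits
    return bits_per_second / 8 / 1e9

print(memory_bandwidth_gb_s(1000, 2, 128))   # simple DDR example above: ~32 GB/s
print(memory_bandwidth_gb_s(1750, 8, 384))   # GDDR6 example: ~672 GB/s
```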

    1.5 What is HBM VRAM in a GPU?
    When reading or writing data, contention arises when the VRAM’s memory channels are already occupied, busy receiving or delivering other data. This contention creates latency, and that hurts bandwidth. Increasing the number of memory channels is a great option. A type of memory called HBM (High-Bandwidth Memory) has lower access latency than GDDR6, since it has 8 memory channels versus 2 channels in GDDR6. HBM also has a wider bus.
    HBM has 1024 pins spread across 8 channels of 128 pins each, with roughly 2 Gbit/s of bandwidth per pin. Compare this with (an equivalent) GDDR chip, which has 32 pins spread across 2 channels with roughly 16 Gbit/s of bandwidth per pin. Notice how HBM keeps the Gbit/s per pin much lower than GDDR. This saves power (which is important, as we shall see). In spite of this, it has higher bandwidth than GDDR6 due to the wider bus and higher channel count.
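    A quick sketch shows why the wider, slower bus still wins, using the per-pin figures quoted above (per single device, ignoring that a card typically carries several GDDR chips or HBM stacks):

```python
# Per-device bandwidth from pin count and per-pin rate (figures quoted above).

def device_bandwidth_gb_s(num_pins, gbit_per_pin):
    return num_pins * gbit_per_pin / 8   # Gbit/s summed over pins -> GB/s

print(device_bandwidth_gb_s(32, 16))    # GDDR6 chip: ~64 GB/s from a few fast pins
print(device_bandwidth_gb_s(1024, 2))   # HBM stack: ~256 GB/s from many slower (power-friendlier) pins
```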
    As we discussed, a pin is literally a wire connecting the VRAM to the processor. Having 1024 wires connected from the processor to the VRAM is not possible on a standard PCB. Therefore, an “interposer” is used as an intermediary to connect the VRAM and the processor. Just like in a regular IC, wires (connections) are etched into this silicon “interposer” in the desired quantity. After this, the HBM device(s) and the processor are mounted atop the “interposer”. This slightly twisted workaround is called a 2.5D architecture. Another difference is that while GDDR chips are soldered to the PCB surrounding the GPU die, an HBM structure is a vertical stack of DRAMs, like a high-rise building. The stacked memory dies are linked using microscopic wires with TSVs (Through-Silicon Vias), which are vertical electrical connections giving super-fast connectivity between the DRAMs. There are huge challenges to stacking items vertically, especially around designing heat sinks and managing thermal safety, but somehow HBM manufacturers have made this happen.
    HBM has become a gold standard for AI data centers today. It was introduced to the market by SK Hynix in 2013. Today we have HBM3, and its main customer is Nvidia. Due to investments made way back, SK Hynix is leading the pack, along with Samsung and a relatively recent entrant named Micron. We hear a lot about chips and TSMC, but HBM is a key technology to watch in the coming years. We typically have more than one HBM device inside the GPU package.
    GDDR6 co-exists with HBM3; the markets are complementary. The former addresses PCs and other consumer GPUs, whereas the latter addresses data center GPUs. Ultra-large-scale AI deployments like ChatGPT likely run on clusters of NVIDIA GPUs working in tandem. Connecting such GPUs involves NVIDIA’s NVLink technology, which requires fast GPU memory bandwidth, and that is why HBM is prevalent in such systems. If not for the wide bus and fast data transfer rates offered by HBM, these kinds of clusters would be very difficult to design.
    Besides the VRAM, GPUs also include high-speed memory caches that are even closer to the GPU’s processing cores. There is a physical limit to the sizes of these caches. An L1 cache is usually in KB and an L2 cache is usually a few MB. Different hardware & software strategies exist to keep the most useful, and most reused data present in caches.
    2. Cooling Mechanisms in a GPU
    Higher clock speeds generally result in more heat, necessitating cooling solutions to maintain optimal operating temperatures. The usual cooling methods are:

    Passive cooling: no powered moving components; it relies on optimized airflow to carry heat away.
    Fan cooling: fans dissipate heat by blowing cool air across the heat sinks, which are metal components designed to absorb and disperse heat.
    Water cooling: water is circulated across the GPU surface using pipes and a radiator; the hot liquid running through the pipes is in turn cooled down by the radiator fan.
    Hybrid cooling: a combination of the above.

    3. GPU Computation cores — Processors
    Let us now talk about the processors on the GPU. Unlike CPUs, which contain only a few cores, a GPU literally has thousands of cores and specializes in running tasks in parallel across them using SIMD (Single Instruction, Multiple Data) units. Let us stick to NVIDIA terminology. An NVIDIA GPU contains multiple processing units called Streaming Multiprocessors (SMs); an H100, for example, has up to 144 SMs. What is inside an SM? There are mainly two types of execution units — CUDA cores and Tensor cores. There is also a small SRAM memory that is shared between all threads running in that SM. More specifically, every SM has a couple of hundred KB of memory that is partitioned between L1 cache and Shared Memory usage.
    3.1 CUDA core versus Tensor core in a GPU — The difference
    Tensor cores are a pretty recent innovation (from V100 onwards) and are specifically designed for faster matrix multiplication. Let us discuss CUDA cores first. These are the computation engines for regular math operations. Each CUDA core can execute one operation per clock cycle. But their strength lies in parallel processing. Many CUDA cores working together can accelerate computation by executing processes in parallel.
    Tensor Cores are specialized hardware units designed to accelerate “mixed precision” training. The earliest version allowed 4×4 FP16 matrices to be multiplied and added to an FP32 output matrix. By using lower-precision FP16 inputs for the multiplications, the calculations are vastly accelerated, and by retaining FP32 outputs for the rest of the procedure, accuracy is not compromised too much. Modern tensor cores use even lower-precision formats in DL computations. See this for more details. There may also be specialized units like the transformer engine, designed to accelerate models built with Transformer blocks. A single GPU can be partitioned into multiple fully contained and isolated instances, with their own memory, cache and cores, via MIG (Multi-Instance GPU) technology.
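    To see why FP32 accumulation matters, here is a tiny NumPy illustration of the numeric idea (this is not how tensor cores are programmed; it merely emulates FP16-accumulate versus FP32-accumulate on a long dot product):

```python
import numpy as np

# Emulating the benefit of FP32 accumulation on a long FP16 dot product.
a = np.full(10000, 0.1, dtype=np.float16)
b = np.full(10000, 0.1, dtype=np.float16)

acc_fp16 = np.float16(0.0)
for x, y in zip(a, b):
    acc_fp16 = np.float16(acc_fp16 + x * y)       # FP16 multiply, FP16 accumulate

acc_fp32 = np.float32(0.0)
for x, y in zip(a, b):
    acc_fp32 += np.float32(x) * np.float32(y)     # FP16 inputs, FP32 accumulate (tensor-core style)

print(acc_fp16)   # stalls far below the true value of ~100 once the increments drop under FP16 resolution
print(acc_fp32)   # ~99.95, close to the exact answer
```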
    3.2 GPU operations — A FLOP show
    Let us now talk about actual operations. A FLOP (Floating Point Operation) is a single floating-point calculation, like an addition. The performance of a GPU is usually measured in TeraFLOP/s: Tera is a trillion, FLOP stands for floating-point operations and the ‘s’ stands for per second.
    Most matrix ops involve a multiply and an add. It makes sense to fuse these together into a Fused Multiply-Add (FMA) op. If we know the FMA rate, we can simply double it to get the FLOP count per clock. To get the peak FLOP/s rate, we multiply this by the clock rate and the number of SMs. Note that there are FP16, FP32, FP64 and Int8 cores with varying speeds. For example:

    Say there are 4 tensor cores in each SM and 114 SMs in an H100.
    Say each tensor core delivers 512 FP16 FMA ops per clock. Careful here: read the specs closely to check whether the FMA-ops-per-clock figure is quoted per SM or per individual core (this link for the A100, for example, quotes it per SM).
    Let the clock speed be 1620 MHz.
    So TFLOP/s = 1620 MHz × (2 × 512) × 4 × 114 ≈ 756 TFLOP/s of performance! 756 trillion operations per second. Wow! What would Babbage say to that? (See the short sketch below.)
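    The same estimate as a short Python sketch (the spec numbers are the illustrative ones assumed above; check your GPU’s datasheet for the real figures):

```python
# Reproducing the peak-throughput estimate above, using the assumed, illustrative figures.
clock_hz            = 1620e6   # 1620 MHz
fma_per_tensor_core = 512      # assumed FP16 FMA ops per tensor core per clock
tensor_cores_per_sm = 4
num_sms             = 114

flops_per_clock = 2 * fma_per_tensor_core * tensor_cores_per_sm * num_sms   # 1 FMA = 2 FLOPs
peak_tflops = clock_hz * flops_per_clock / 1e12
print(round(peak_tflops))   # ~756 TFLOP/s
```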

    4. Putting everything together — LLM Operations in a GPU
    Given this immense compute power, we can now make a reasonable guess that LLM inference is memory-IO bound, not compute bound. In other words, it takes more time to load data into the GPU’s compute cores than it does for those cores to perform LLM computations on that data. The processing itself is super-fast, and there is enough and more compute power available.
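    A rough, hypothetical back-of-envelope check makes the point; the model size and bandwidth below are assumptions chosen purely for illustration, not figures from this article:

```python
# Back-of-envelope: time to stream the weights vs time to do the math for one token at batch size 1.
# Hypothetical figures: a 70B-parameter model in FP16 on an H100-class GPU.
params          = 70e9
bytes_per_param = 2        # FP16
hbm_bandwidth   = 3e12     # ~3 TB/s, assumed
peak_flops      = 756e12   # the peak FP16 estimate from the previous section

time_to_load_weights = params * bytes_per_param / hbm_bandwidth   # read every weight once
time_to_compute      = 2 * params / peak_flops                    # ~2 FLOPs per parameter per token
print(time_to_load_weights * 1e3, "ms to load vs", time_to_compute * 1e3, "ms to compute")
```

    Even with generous assumptions, streaming the weights takes a couple of orders of magnitude longer than the arithmetic, which is exactly the memory-IO-bound behaviour described above.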

    To start with, the training data needs to be downloaded from a remote source to the CPU memory
    From there, it needs to be transferred to the GPU via the system bus and the PCIe bus. The host (CPU) to device (GPU) bandwidth is limited by the CPU frequency, the PCIe bus, the GPU devices and the number of PCIe lanes available.
    Once the data and weights are in the GPU VRAM, they are ferried across to the SRAM, where the processors perform operations on them.
    After the operation, the data is moved back to the VRAM, and from there it is moved back to the CPU RAM. This is a rather simplistic view. Inside the GPU, the tensors are repeatedly moved back and forth between VRAM and SRAM (the memory allocated to an SM). Can you guess why?

    We saw that SRAM size is in KB, so large matrices are not going to fit in there… which explains why there is constant movement between the VRAM, which holds all the tensors, and the SRAM, which holds the data on which compute operations are performed. So there is typically a memory-op where tiles of tensors are moved from VRAM to SRAM, then a compute-op in SRAM, and then a memory-op to move the results back from SRAM to VRAM. Computations like a matrix multiplication involving two large matrices need several such memory + compute ops before the action is completed.
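    Here is a schematic Python sketch of that tiling idea (plain NumPy standing in for the hardware; on a real GPU the tiles would sit in SRAM/shared memory and the loops would be CUDA thread blocks):

```python
import numpy as np

# Schematic tiling of a large matrix multiply: operate on small blocks that would fit in SRAM,
# alternating memory-ops (copy a tile in) and compute-ops (multiply-accumulate the tile).
def tiled_matmul(a, b, tile=64):
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # memory-op: pull small tiles out of the big arrays ("VRAM" -> "SRAM")
                a_tile = a[i:i+tile, p:p+tile]
                b_tile = b[p:p+tile, j:j+tile]
                # compute-op: multiply-accumulate the tiles, write the partial result back
                c[i:i+tile, j:j+tile] += a_tile @ b_tile
    return c
```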
    During the training of GPT-3, the tensor cores on the GPUs used were found to be idle ~50% of the time. So, to extract the best from the infrastructure, data movement needs to be fast enough to ensure the computation cores are kept reasonably occupied. Surely, there is scope for some smart person to come up with shortcuts. Enter Flash attention & other such hacks. But that is a story for another day!
    5. Linking GPUs for LLM training — Topologies
    While LLM inferencing is manageable with a ready-made collection of GPUs such as a DGX server (which contains 8 H100s), LLM training needs far more GPUs. Before we discuss how to connect GPUs for larger workloads, it makes sense to see how CPU servers are connected in a datacentre. I am not an expert in this area, so please feel free to point out any incorrect interpretations I may have made from the references I quote.
    5.1 Generic concepts on linking processors
    Each server has a card attached to it called the Network Interface Card (NIC). RDMA technology enables direct memory access to a remote server via the NIC hardware. RoCE (RDMA over Converged Ethernet) protocol uses the RDMA technology & adapts it to Ethernet networks. So now, a server can talk to a remote server over a network. A network switch is a device connecting multiple servers in a network, enabling them to communicate with each other. This is the basic technology. Now let us come to the topology.
    So we assemble all the servers physically in one place and stack them vertically in neat racks. A very basic topology is to connect each server in a rack to a switch that usually sits on top of the rack, aptly named the ToR switch. The ToR switches of different racks are connected to a spine switch. This topology is a basic implementation of the Clos topology, named after Charles Clos, who originally invented this scheme to arrange telephone switching nodes in a “leaf-n-spine” arrangement. The leaf switches are nothing but the ToR switches in modern data centers.
    Source: Fig 1–1 from https://www.oreilly.com/library/view/bgp-in-the/9781491983416/ch01.html
    Fat tree is a variant of Clos. As before, we have servers arranged into racks connecting to Top-of-Rack (ToR) switches. The ToR switches are connected to aggregation switches to provide connectivity across racks, forming a pod. The pods are interconnected with spine switches, allowing any-to-any communication across servers. Note that there are multiple paths connecting servers, so there is a lot of redundancy built in.
    In a typical App deployment running hundreds of microservices on dozens of servers, it is useful to have such fully connected, high bandwidth networks. You never know who is going to talk to whom so it never hurts to overprovision on bandwidth and connectivity. However, network loads during AI training do not follow these patterns. They are more predictable & this allows us to build optimized, cheaper & less power-hungry networks.
    5.2 Linking GPUs via proprietary technology like NVLink
    We can strap together H100s by leveraging the proprietary NVLink and NVSwitch technologies. NVLink provides the high-speed connection between individual GPUs, while NVSwitch is a chip that enables multiple GPUs to communicate through NVLink, forming a high-bandwidth network. See this nice article for details.
    NVIDIA’s P100 GPU introduced NVLink1. At that time there was no NVSwitch chip, and the GPUs were connected in a ring-like configuration, which meant there was no direct point-to-point communication between arbitrary GPUs. The NVSwitch1 chip was introduced with the V100, followed by the NVSwitch2 chip with the A100 GPU. We are now on the third-generation NVSwitch3, which can support a cluster of up to 256 H100 GPUs. Each H100 GPU in such a cluster is connected to the internal NVSwitch3 chip through 18 NVLink4.0 connections. This is how trillion-parameter LLMs are inferenced.
    5.3 Linking GPUs via RoCE in a rail-optimized topology
    But, as they say, the heart always wants more… Meta reportedly trains its newer models on a cluster of over 100K H100s. Phew! How do they manage to link it all up? The standard NVLink tricks can only scale to a limited number of GPUs. Beyond that, we have to use the network topologies discussed earlier and fall back on technologies like RoCE, which allows data to be transferred directly from one GPU’s memory to another without involving the CPU.
    So you have 8 GPUs in one DGX server. You have several such DGX servers in the data centre. Each GPU is assigned a NIC (yes!) & connected via RDMA to all other GPUs thru’ a variant of Clos network called “rail-optimized network”. The idea here is to set up dedicated connections between groups of GPUs with rail switches. If a GPU wants to communicate with a GPU which is in a different group, then it has to go thru’ the spine switch (which takes a lil more time). To implement this, each GPU in a DGX server is indexed serially. A rail is the set of GPUs with the same index on different servers & these are interconnected with a rail switch via RDMA. These rail switches are subsequently connected to spine switches forming any-to-any GPU network.
    Source: Fig 1 from https://arxiv.org/pdf/2307.12169
    This topology streamlines traffic flow. It is like having dedicated lanes for high speed vehicles instead of generally mixing all traffic together. Rail paths are direct connections between a bunch of GPUs with same index. Spine switches serve as the connecting points for differently-indexed GPUs. For e.g., communication between GPU1 of server 1 and GPU1 of server 2 happens via their dedicated rail switch 1. If GPU1 of server 1 needs to reach GPU5 of another server, it has to go thru’ a spine switch.
    The workloads are designed so as to minimize data transfers across rails (since those have to go thru’ the extra spine switch). The good news is that this can be done neatly for AI training, ensuring that most of the traffic stays within the rails and does not cut across. In fact, there is a recent paper which suggests that you can consider removing the costly spine switches altogether, as inter-rail communication is minimal. Can you guess how?
    5.4 Linking GPUs via RoCE in a rail-only topology
    Well, we have superfast connectivity using NVLink to communicate between a limited set of GPUs (up to 256). So you create these High Bandwidth (HB) domains, which use NVLink for communication. You have several such HB domains. We then have the same indexing system and rail connections to interconnect the HB domains. But there are no spine switches! Can you guess how GPU1 of HB domain 1 can talk to GPU5 of another HB domain? Yes! Transfer the data via superfast NVLink to GPU5 of HB domain 1 first. Then use the dedicated rail of GPU5 to talk to the GPU5 in the other HB domain! This is a rail-only topology, as opposed to the rail-optimized topology.
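    A toy sketch of that routing rule, using a hypothetical (hb_domain, gpu_index) addressing scheme purely for illustration (this is not a real fabric API):

```python
# Toy routing decision for the rail-only topology described above.
# A GPU is identified by a (hb_domain, gpu_index) pair; same-index GPUs across domains share a rail.

def rail_only_route(src, dst):
    src_domain, src_idx = src
    dst_domain, dst_idx = dst
    if src_domain == dst_domain:
        return ["NVLink hop inside the HB domain"]
    if src_idx == dst_idx:
        return [f"rail switch {src_idx} (RDMA between domains)"]
    # Different domain AND different index: hop over NVLink to the GPU carrying the
    # destination's index inside the source domain, then ride that GPU's rail. No spine switch.
    return [f"NVLink hop to GPU {dst_idx} in HB domain {src_domain}",
            f"rail switch {dst_idx} (RDMA between domains)"]

print(rail_only_route((1, 1), (2, 5)))   # ['NVLink hop to GPU 5 in HB domain 1', 'rail switch 5 (RDMA between domains)']
```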
    Given these topologies, we can now plan the training pipeline to use pipeline parallelism, tensor parallelism and/or data parallelism, but that is a story for another day. See this, this and this for more details. 100K H100s consume a LOT of power. Tech companies are exploring nuclear power options to generate the clean energy needed for long-term sustainability. Otherwise, a 100K-GPU cluster may have to be broken down into smaller clusters connected using optical transceivers across the buildings of a campus.
    This (unplanned) article is a prelude to Optimizing LLM inference: Key Faultlines & workarounds. To deeply understand how we can optimize LLM operations, we need to understand more about the silicon on which they run. Though there are lots of manuals and guides on individual aspects like memory, processors and networking, I couldn’t find a concise and reader-friendly thread linking these various aspects together, hence this attempt. This is the 9th article in a 15-part series titled My LLM diaries.

    LLM Quantization — From concepts to implementation
    LoRA & its newer variants explained like never before
    In-Context learning: The greatest magic show in the kingdom of LLMs
    RAG in plain English — Summary of 100+ papers
    HNSW — Story of the world’s most popular Vector search algorithm
    VectorDB origins, Vamana & on-disk Vector search algorithms
    Taming LLMs — A study of few popular techniques
    Understanding LLM Agents: Concepts, Patterns & Frameworks
    Anatomy of a GPU — A peek into the hardware fuelling LLM operations
    Optimizing LLM Inference — Key Faultlines & workarounds
    LLM Serving — Architecture considerations
    LLM evaluation & other odds and ends
    Look Ma, LLMs without Prompt Engineering
    LLMs on the laptop — A peek into the Silicon
    Taking a step back — On model sentience, conscientiousness & other philosophical aspects


    Published via Towards AI



    Source: https://towardsai.net/p/machine-learning/gpu-architecture-working-intuitively-explained
    GPU Architecture & Working intuitively explained Author(s): Allohvk Originally published on Towards AI. GPU Origins The image displayed on a computer screen is made up of millions of tiny pixels. In early days, “graphics controllers” were given instructions by the CPU on how to calculate the individual pixel values so that the appropriate image could be displayed. These were ok for conventional displays but for a really good gaming experience, images need to be built dozens of times per second. The CPU was not really designed to handle these kind of loads. The whole process of creating the image could be parallelized big-time simply by (a) dividing the image into smaller blocks (b) carrying out computations for each block in parallel & (c) grouping them back again. The results of one block don’t influence the results of the other blocks. CPU’s multi-threading capabilities was not really conceived for such massive parallelization. Enter the GPU! Sony first used the term GPU in 1994, in its PlayStation consoles. The technology was perfected by NVIDIA which soon became a leader. GPUs have numerous computation cores (much more than a CPU) and gaming programmers could write Shaders — programs to run graphics computations on the GPU in a massively parallelized way to create the screen images in super-fast time. The GPU is inspired by the CPU but was specifically designed to enable massive multi-threaded operations on its numerous computation cores seamlessly. Creating threads, switching between threads etc is much faster on a GPU. Some smart developers also realized that these parallel processing capabilities could be used for other computationally intensive tasks as well! 2005: Steinkrau implements a simple 2-layer Neural Net on a GPU 2006: Kumar et. al. trains a CNN model for document processing 2007: NVIDIA released Compute Unified Device Architecture (CUDA) — a custom language extending C to exploit data parallelism on GPUs. Now developers had much more granular control over the image rendering. 2008 a landmark paper by Raina et al was released. This paper pretty much showed everyone how to train deep layers on a GPU 2014: NVIDIA released CuDNN — a dedicated CUDA library for Deep Learning. Very soon PyTorch, TensorFlow etc incorporated CuDNN, setting the stage for modern GPU usage for AI! A GPU is an ASIC or Application-Specific Integrated Circuit having a processor (hosting numerous computational cores), a memory soldered onto it (we want to avoid going to the CPU RAM for everything), a cooling system (well, they heat up pretty fast) and a BIOS chip (same role as a CPU — to store settings, run startup diagnostics etc). This card is then plugged into the motherboard slot using the PCI Express interface. The terms GPU and graphics card are often used interchangeably. Some GPUs like the one in Apple M3 do not have a dedicated memory but instead use the system RAM itself which is possible due to its unique design. Google has the TPU (Tensor Processing Unit) which is its own ASIC. We discuss the GPU memory, the processing cores, the LLM workflows happening inside them & common topologies for clustering. Photo by Thomas Foster on Unsplash 1. GPU Memory module — The VRAM Instead of having the GPU talk to the regular RAM, it made sense to create another RAM physically closer to the GPU die so that data retrieval is faster. So a graphics card has a memory called VRAM — Video Random Access Memory in addition to the computation engines . 
VRAM is connected to the computation engine cores via a Bus called the memory interface. 1.1 What is DRAM? Let us talk first of RAM technology in general. All memory whether it is the CPU RAM or the GPU VRAM are mostly based on DRAM technology which consists of a capacitor and a transistor. The capacitor’s charge represents the data stored. Due to its very nature, this charge gradually leaks. To prevent data loss, a refresh circuit periodically rewrites the data back, restoring its charge. Hence the name — Dynamic RAM due to these preiodic refreshes. Most computers use Synchronous DDR5 DRAM’s as their CPU RAMs. Synchronous because it utilizes the system clock for better performance. In other words the action (of retrieving & storing data) is operationally coordinated by an external clock signal. Tying the operations to the clock makes it faster. The processor knows the exact timing & number of cycles in which the data will be available from the RAM to the bus & can plan better. We have DDR1 (1st Gen Double Data Rate Synchronous Dynamic RAM released in 2000) to DDR5 which is the choice of CPU RAM as of today. 1.2 What is SGRAM? Let us now talk about the VRAMs in GPUs. The VRAM is a type of SGRAM — Synchronous Graphics RAM. The current generation of VRAMs being used is GDDR6. Yes, this is 6th generation GDDR, the G standing for “Graphics”. While DDR & GDDR share common origins and early couple of generations were similar, the branches separated after DDR3. So as of 2025, DDR5 rules in CPU RAM and GDDR6 rules for consumer-grade GPU RAMs. Conceptually DDRs and GDDRs are similar but note that DDRs are used by CPUs which need low latency whereas GDDRs are used by GPUs which are OK to compromise latency for extremely high throughput. Crudely, the former has more frequent smaller calculations & the latter deals with much higher volume of data & some delays are forgiven considering the vast volumes of data being processed. Even more crudely, the former is a bullet train with 6–8 coaches while the latter a 3 Kilometre long goods train. 1.3 GDDR VRAMs explained in detail GDDR memory are individual chips soldered to the PCB (Printed Circuit Board) very close to the GPU die. The physical proximity improves the speed of data transfer from the VRAM to the GPU processor. There are pins in a GDDR which can be thought of as individual wires that connect it to the processor. Bus width is literally the number of such connections. GDDR6 has 32 pins spread across 2 channels with roughly 16 Gbits.p.s bandwidth per pin. Bandwidth is total amount of data being moved & if you had one single metric at your disposal to take a decision, it would be this. Before we go further, let us try to understand this metric intuitively. 1.4 Calculating GPU Memory Bandwidth intuitively Memory Bandwidth is the max rate at which data can be transferred between the GPU and the VRAM. We discussed that data transmission is synchronized with the clock. The clock cycle is measured in hertz & represents the number of cycles per second. Let us say we have a clock operating at 1000 MHz. This literally means 1 billion clock ticks per second. How long does a tick last? Literally 1/(1 billion) i.e. 1 nano second. Data is sent to and fro every clock cycle. So every nano-second, a bus-full of data is sent from the VRAM to the processor & vice versa. How many seats on the bus? Well, we discussed this earlier… This is the memory interface or the bus width… literally the physical count of bits that fit into the bus. 
A 128-bit bus would ferry 128 bits every nano-second. The D in G’D’DR6 stands for Double. Basically, data is transmitted on both the rising and falling edges of the clock cycle, so 256 bits every nano-second. How many bytes in 1 sec? 256/8 i.e. 32 billion bytes per second or better still 32 GB/s as Giga is the preferred term when measuring data. The capital B denotes bytes whereas the small b denotes bits… a source of confusion. A more practical formula is: Bandwidth = Clock * Bus Width x Data Rate, where the Data Rate is the number of data transfers per cycle. GDDR6 is Double Data Rate (as just discussed) and Quad pumped, which quadruples the (doubled) speed. So effectively the Data Rate is 8. Sometimes, you may encounter the same information crouched in different semantics. E.g., if frequency of command clock (CK#) is N, then the write command clock (WK#) is 2N. GDDR6 rates then are QDR (quad data rate) in reference to WK# and ODR (Octal Data Rate) in reference to the CK#. Some OEMs multiply the clock speed & data rate & call it a clock rate or something. In that case, the bandwidth is simply that number multiplied by the bus width. In general, this raw formula can be used: num_of_transfers per second * num_of_bits per transfer / 8. “Boost clock” mechanism allows the GPU and GDDR memory to operate at even higher speeds than the default clock when conditions allow it. Boost clock metric refers to the max such operating clock speed. A 1750 MHz clock means: 1.75GHz is the frequency of command clock(CK#). The frequency of the write clock (WK#) is 3.5GHz due to the G”D”DR The Quad pumping takes it to 3.5*4=14 G bits moved in 1 second from each pin on the bus. We could have bus widths of up to 384 bits! So we get a bandwidth of 14*384 Giga bits per second. Divide by 8 to get 672 GB/s. GDDR6 bandwidth can go upto 1 TB/s. Wow! 1.5 What is HBM VRAM in a GPU? When reading or writing data, contention is created when the VRAM has occupied memory channels & is busy receiving or delivering other data. This contention creates latency & this affects bandwidth. Increasing the number of memory channels is a great option. A type of memory called HBM (High-Bandwidth Memory) has lower access latency than GDDR6, since it has 8-memory channels versus 2 channels in GDDR6. HBM also has a wider bus. HBM has 1024 pins spread across 8 channels of 128 pins with roughly 2 Gbits.p.s bandwidth per pin. Compare this with (an equivalent) GDDR which has 32 pins spread across 2 channels with roughly 16 Gbits. p.s bandwidth per pin. Notice how HBM keeps the Gbit/sec per pin much lower than GDDR. This saves power (which is important as we shall see). In spite of this, it has higher bandwidth than GDDR6 due to the wider bus & higher channels. As we discussed, a pin is literally a wire connecting the VRAM to the processor. Having 1024 wires connected from the processor to the VRAM is not possible on a standard PCB. Therefore, an “interposer” is used as an intermediary to connect the VRAM & the processor. Just like a regular IC, wires (connections) are etched in this silicon “interposer” in the desired quantity. After this, the HBM device(s) & the processor are mounted atop this “interposer”. The slightly twisted workaround is called a 2.5D architecture.Another difference is that while GDDR chips are soldered to the PCB surrounding the GPU die, an HBM structure is a vertical stack of DRAMs like a high rise building. 
Back to the stacking: the memory dies in an HBM stack are linked using TSVs (Through-Silicon Vias), microscopic vertical electrical connections that provide super-fast connectivity between the DRAM dies. There are huge challenges to stacking dies vertically, especially around designing heat sinks & managing thermal safety, but somehow HBM manufacturers have made this happen. HBM has become the gold standard today for AI data centers. It was introduced to the market by SK Hynix in 2013. Today we have HBM3 (the third major generation), and the main client is Nvidia. Due to investments made way back, SK Hynix is leading the pack, along with Samsung and a relatively recent entrant named Micron. We hear a lot about chips and TSMC, but HBM is a key technology to watch in the coming years. We typically have more than one HBM device sitting next to the GPU die on the same package. GDDR6 co-exists with HBM3; the markets are complementary. The former addresses PCs & other consumer GPUs whereas the latter addresses data center GPUs. Ultra-large-scale AI deployments like ChatGPT likely run on clusters of NVIDIA GPUs working in tandem. Connecting such GPUs involves NVIDIA's NVLink technology, which demands fast GPU memory bandwidth, and that is one reason HBM is prevalent in such systems. If not for the wide bus and fast data-transfer rates offered by HBM, these kinds of clusters would be very difficult to design. Besides the VRAM, GPUs also include high-speed memory caches that are even closer to the GPU's processing cores. There is a physical limit to the sizes of these caches: an L1 cache is usually measured in KB and an L2 cache in a few MB. Different hardware & software strategies exist to keep the most useful, most reused data present in these caches.

2. Cooling Mechanisms in a GPU

Higher clock speeds generally result in increased heat generation, necessitating cooling solutions to maintain optimal operating temperatures. The usual cooling methods are:
Passive cooling: no powered moving components; these rely on optimized airflow & heat sinks to carry heat away.
Fan (air) cooling: fans dissipate heat by blowing cool air across the heat sinks, which are metal components designed to absorb & disperse heat.
Water cooling: water is circulated over the GPU surface using pipes & a radiator; the hot liquid running through the pipes is in turn cooled down by the radiator fan.
Hybrid cooling: a combination of the above.

3. GPU Computation cores — Processors

Let us now talk about the processors on the GPU. Unlike CPUs, which contain only a few cores, a GPU literally has thousands of cores & specializes in running tasks in parallel across these cores using SIMD (Single Instruction, Multiple Data) units. Let us stick to NVIDIA terminology. There are multiple processing units called Streaming Multiprocessors (SMs) on an NVIDIA GPU. For example, an H100 has up to 144 SMs. What is inside an SM? There are mainly 2 types of execution units: CUDA cores & Tensor cores. There is also a small on-chip SRAM that is shared between all threads running in that SM. More specifically, every SM has a small pool of memory (a couple of hundred KB on recent GPUs) that is partitioned between L1 cache & shared-memory usage.

3.1 CUDA core versus Tensor core in a GPU — The difference

Tensor cores are a pretty recent innovation (from the V100 onwards) and are specifically designed for faster matrix multiplication. Let us discuss CUDA cores first. These are the computation engines for regular math operations. Each CUDA core can execute one operation per clock cycle.
But their strength lies in parallel processing: many CUDA cores working together can accelerate computation by executing operations in parallel. Tensor Cores are specialized hardware units designed to accelerate "mixed precision" training. The earliest version allowed 4×4 FP16 matrices to be multiplied & added to an FP32 output matrix. By using lower-precision FP16 inputs in the computations, the calculations are vastly accelerated, & by retaining FP32 outputs for the rest of the procedure, accuracy is not compromised too much. Modern tensor cores use even lower-precision formats for DL computations. See this for more details. There may also be specialized units like the transformer engine, designed to accelerate models built with Transformer blocks. A single GPU can be partitioned into multiple fully contained and isolated instances, each with its own memory, cache & cores, via MIG (Multi-Instance GPU) technology.

3.2 GPU operations — A FLOP show

Let us now talk about actual operations. A FLOP (Floating Point Operation) is a single floating-point calculation like an addition. The performance of a GPU is usually measured in TeraFLOP/s: Tera is a trillion, FLOP stands for floating-point operations and the 's' stands for per second. Most matrix ops involve a multiply and an add, so it makes sense to fuse them together into a Fused Multiply-Add (FMA) op. If we know the FMA rate, we can simply double it to get the FLOP count per clock. To get the peak FLOP/s rate, we multiply this by the number of cores per SM, the number of SMs & the clock rate. Note that there are FP16, FP32, FP64 & INT8 units with varying speeds. For example: say there are 4 tensor cores in each SM & 114 SMs in an H100, and say each tensor core delivers 512 FP16 FMA ops per clock. (Careful here: read the specs closely to check whether the FMA-ops-per-clock figure is quoted per SM or per individual core; the A100 figure in the linked spec, for instance, is quoted per SM.) Let the clock speed be 1620 MHz. Then TFLOP/s = 1620 MHz * (2*512) * 4 * 114 ≈ 756 TFLOP/s of performance! 756 trillion operations per second. Wow! What would Babbage say to that?

4. Putting everything together — LLM Operations in a GPU

Given this immense compute power, we can now make a reasonable guess that LLM inference is memory-IO bound, not compute bound. In other words, it takes more time to load data into the GPU's compute cores than it does for those cores to perform the LLM computations on that data. The processing itself is super-fast & there is more than enough compute power available. To start with, the training data needs to be downloaded from a remote source to the CPU memory. From there, it needs to be transferred to the GPU via the system bus and the PCIe bus. The host (CPU) to device (GPU) bandwidth is limited by the CPU frequency, the PCIe bus, the GPU devices & the number of PCIe lanes available. Once the data & weights are in the GPU VRAM, they are ferried across to the SRAM, where the processing units perform operations on them. After the operation, the data is moved back to the VRAM & from there back to the CPU RAM. This is a rather simplistic view. Inside the GPU, the tensors are repeatedly moved back and forth between VRAM & SRAM (the memory allocated to an SM). Can you guess why? We saw that SRAM per SM is only a few hundred KB, so large matrices are not going to fit there… which explains the constant movement between the VRAM, which holds all the tensors, and the SRAM, which holds the data currently being operated on. So there is typically a memory-op where tensors are moved from VRAM to SRAM, then a compute-op on the data in SRAM, and then a memory-op to move the results back from SRAM to VRAM.
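To make the memory-IO-bound claim a little more concrete, here is a minimal sketch that reuses the peak-throughput arithmetic from section 3.2 and compares compute time against data-movement time for a single token passing through one large layer. The 12288-wide layer and the 3 TB/s HBM-class bandwidth are assumptions picked purely for illustration, not quoted specs.

    # A rough sketch of why LLM inference tends to be memory-bound (illustrative numbers only).
    def peak_tflops(clock_mhz, fma_per_core_per_clock, cores_per_sm, num_sms):
        """Peak tensor throughput in TFLOP/s; one FMA counts as 2 FLOPs."""
        flops_per_clock = 2 * fma_per_core_per_clock * cores_per_sm * num_sms
        return clock_mhz * 1e6 * flops_per_clock / 1e12

    tflops = peak_tflops(1620, 512, 4, 114)   # the section 3.2 example, ~756 TFLOP/s

    d = 12288                                 # assumed hidden size of one large layer
    flops = 2 * d * d                         # one (1 x d) @ (d x d) matmul: a multiply & an add per weight
    weight_bytes = 2 * d * d                  # FP16 weights streamed from VRAM, 2 bytes each
    hbm_tb_s = 3.0                            # assumed HBM-class bandwidth in TB/s, for illustration

    compute_us = flops / (tflops * 1e12) * 1e6
    memory_us = weight_bytes / (hbm_tb_s * 1e12) * 1e6
    print(f"compute ~{compute_us:.2f} us vs weight streaming ~{memory_us:.1f} us")

With these rough numbers, the cores finish the math in well under a microsecond but then wait roughly a hundred microseconds for the weights to arrive. That imbalance is exactly what the VRAM-to-SRAM choreography described above has to work around, and why batching many tokens per weight-load matters so much.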
So there is typically a memory-op where tensors are moved from VRAM to SRAM, then a compute-op SRAM and memory-op to move tensors back from SRAM to VRAM. Computations like a matrix multiplication involving 2 large matrices need several such memory + compute ops before the action is completed. During the training of GPT-3, the tensor cores on the GPUs used were found to be idle ~50% of the time. So, to extract the best from the infrastructure, data movement needs to be fast enough to ensure the computation cores are kept reasonably occupied. Surely, there is scope for some smart person to come up with shortcuts. Enter Flash attention & other such hacks. But that is a story for another day! 5. Linking GPUs for LLM training — Topologies While LLM inferencing is manegable with a readymade collection of GPUs such as a DGX server (contains 8 H100s), LLM training needs far more GPUs. Before we discuss how to connect GPUs for larger workloads, it makes sense to see how CPU servers are connected in a datacentre. I am not an expert in this area, so please feel free to point out any incorrect interpretations I may have made from the references I quote. 5.1 Generic concepts on linking processors Each server has a card attached to it called the Network Interface Card (NIC). RDMA technology enables direct memory access to a remote server via the NIC hardware. RoCE (RDMA over Converged Ethernet) protocol uses the RDMA technology & adapts it to Ethernet networks. So now, a server can talk to a remote server over a network. A network switch is a device connecting multiple servers in a network, enabling them to communicate with each other. This is the basic technology. Now let us come to the topology. So we assemble all the servers physically in one place and pile them up vertically them in neat racks.A very basic topology is to connect each server in a rack to a switch that usually sits on Top of the Rack, aptly named the ToR switch. The ToR switches of different racks are connected to a Spine switch. This topology is a basic implementation of Clos topology — named after Charles Clos who invented this scheme to originally arrange telephone nodes in a “leaf-n-spine” arrangement. The leaf switches are nothing but the ToR switches in modern data centers. Source: Fig 1–1 from https://www.oreilly.com/library/view/bgp-in-the/9781491983416/ch01.html Fat tree is a variant of Clos. Like before, we have servers arranged into racks connecting to Top-of-the-Rack (ToR) switches. ToR switches are connected to the aggregation switches to provide connectivity across racks, forming a pod. The pods are interconnected with spine switches, allowing any-to-any communication across servers. To be noted is the fact that there are multiple paths connecting servers. So there is lot of redundancy built-in. In a typical App deployment running hundreds of microservices on dozens of servers, it is useful to have such fully connected, high bandwidth networks. You never know who is going to talk to whom so it never hurts to overprovision on bandwidth and connectivity. However, network loads during AI training do not follow these patterns. They are more predictable & this allows us to build optimized, cheaper & less power-hungry networks. 5.2 Linking GPUs via proprietary technology like NVLink We can strap together H100’s by leveraging the proprietary NVLink & NVSwitch technologies. 
NVLink provides the high-speed connection between individual GPUs, while NVSwitch is a chip that enables multiple GPUs to communicate through NVLink, forming a high-bandwidth network. See this nice article for details. NVIDIA's P100 GPU introduced NVLink (first generation). At that time there was no NVSwitch chip, and the GPUs were connected in a ring-like configuration, which meant there was no direct point-to-point communication between all GPUs. The NVSwitch1 chip was introduced with the V100, followed by the NVSwitch2 chip with the A100 GPU. We are now at the third generation, NVSwitch3, which can support a cluster of up to 256 H100 GPUs. Each H100 GPU in such a cluster is connected to the internal NVSwitch3 chip through 18 NVLink 4.0 connections. This is how trillion-parameter LLMs are inferenced.

5.3 Linking GPUs via RoCE in a rail-optimized topology

But, as the old slogan goes, "ye dil maange more" (the heart always wants more)… Meta reportedly trains its newer models on a cluster of over 100K H100s. Phew! How do they manage to link it all up? The standard NVLink tricks can only scale to a limited number of GPUs. Beyond that, we have to use the network topologies discussed earlier & fall back on technologies like RoCE, which allows data to be transferred directly from one GPU's memory to another without involving the CPU. So you have 8 GPUs in one DGX server, and several such DGX servers in the data centre. Each GPU is assigned its own NIC (yes!) & connected via RDMA to all other GPUs through a variant of the Clos network called a "rail-optimized network". The idea here is to set up dedicated connections between groups of GPUs using rail switches. If a GPU wants to communicate with a GPU in a different group, it has to go through the spine switch (which takes a little more time). To implement this, each GPU in a DGX server is indexed serially. A rail is the set of GPUs with the same index on different servers, & these are interconnected with a rail switch via RDMA. These rail switches are in turn connected to spine switches, forming an any-to-any GPU network.
(Figure source: Fig 1 from https://arxiv.org/pdf/2307.12169)
This topology streamlines traffic flow. It is like having dedicated lanes for high-speed vehicles instead of mixing all traffic together. Rail paths are direct connections between GPUs with the same index; spine switches serve as the connecting points for differently-indexed GPUs. For example, communication between GPU1 of server 1 and GPU1 of server 2 happens via their dedicated rail switch 1. If GPU1 of server 1 needs to reach GPU5 of another server, it has to go through a spine switch. The workloads are designed to minimize data transfers across rails (since those have to traverse the extra spine switch). The good news is that this can be done quite neatly for AI training, ensuring that most of the traffic stays within the rails and does not cut across. In fact, there is a recent paper which suggests that you can consider removing the costly spine switches altogether, since inter-rail communication is minimal. Can you guess how?

5.4 Linking GPUs via RoCE in a rail-only topology

Well, we have the super-fast NVLink connectivity to communicate within a limited set of GPUs (up to 256). So you create these High Bandwidth (HB) domains which use NVLink for communication, and you have several such HB domains. We then use the same indexing system and rail connections to interconnect the HB domains, but there are no spine switches! Can you guess how GPU1 of HB domain 1 can talk to GPU5 of another HB domain? Yes!
Transfer data via super-fast NVLink to GPU5 of HB domain 1 first, then use GPU5's dedicated rail to talk to the GPU5 in the other HB domain! This is a rail-only topology, as opposed to a rail-optimized topology. Given these topologies, we can now plan the training pipeline to use pipeline parallelism, tensor parallelism &/or data parallelism, but that is a story for another day. See this, this & this for more details. 100K H100s consume a LOT of power. Tech companies are exploring nuclear power options to generate the clean energy needed for long-term sustenance. Otherwise, a 100K-GPU cluster may have to be broken down into smaller clusters connected by optical transceivers across the buildings of a campus.

This (unplanned) article is a prelude to "Optimizing LLM Inference — Key Faultlines & workarounds". To deeply understand how we can optimize LLM operations, we need to understand more about the silicon on which they are executed. Though there are lots of manuals/guides on individual aspects like memory, processors, networking etc., I couldn't find a concise and reader-friendly thread linking these various aspects together, & hence took a shot. This is the 9th article in a 15-part series titled My LLM diaries:
LLM Quantization — From concepts to implementation
LoRA & its newer variants explained like never before
In-Context learning: The greatest magic show in the kingdom of LLMs
RAG in plain English — Summary of 100+ papers
HNSW — Story of the world's most popular Vector search algorithm
VectorDB origins, Vamana & on-disk Vector search algorithms
Taming LLMs — A study of few popular techniques
Understanding LLM Agents: Concepts, Patterns & Frameworks
Anatomy of a GPU — A peek into the hardware fuelling LLM operations
Optimizing LLM Inference — Key Faultlines & workarounds
LLM Serving — Architecture considerations
LLM evaluation & other odds and ends
Look Ma, LLMs without Prompt Engineering
LLMs on the laptop — A peek into the Silicon
Taking a step back — On model sentience, conscientiousness & other philosophical aspects

Published via Towards AI. Source: https://towardsai.net/p/machine-learning/gpu-architecture-working-intuitively-explained