- The system links distant sites so large training workloads can run continuously.
- High-speed fiber keeps GPUs busy by avoiding data-transfer bottlenecks.
- A two-story layout increases chip density while reducing rack-to-rack latency.
Microsoft has introduced its first AI superfactory, linking large AI data centers in Wisconsin and Atlanta via a dedicated fiber-optic network designed for high-speed transmission of training data.
The design places chips close together on two floors to increase density and reduce latency.
It also relies on extensive cabling and liquid-cooling systems to manage the weight and heat generated by large clusters of equipment.
Network designed for large-scale model training
In a blog post, Microsoft said this configuration will support large AI workloads, which differ from the smaller, more isolated tasks common in cloud environments.
“It's about creating a distributed network that can act as a virtual supercomputer to solve the world's biggest problems,” said Alistair Speirs, Microsoft's general manager of Azure Infrastructure.
“The reason we call it an AI superfactory is because it does one complex job on millions of pieces of hardware… it's not just one site training AI models, it's a network of sites supporting that work.”
An AI WAN system transmits information over thousands of miles using dedicated fiber, some newly built and some repurposed from previous acquisitions.
Network protocols and architecture have been tuned to shorten paths and move data with minimal latency.
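For a rough sense of why path length matters: light in optical fiber travels at roughly two-thirds of its speed in vacuum, so even a direct route between the two sites adds measurable round-trip delay. The sketch below uses an assumed straight-line distance of about 1,100 km (~700 miles) between Wisconsin and Atlanta; real fiber routes run longer, so these figures are illustrative, not Microsoft's numbers.

```python
# Back-of-the-envelope fiber propagation latency (illustrative numbers only).
C_VACUUM_KM_S = 299_792       # speed of light in vacuum, km/s
FIBER_FACTOR = 2 / 3          # light in silica fiber travels at ~2/3 c
DISTANCE_KM = 1_100           # assumed straight-line Wisconsin-to-Atlanta distance

one_way_ms = DISTANCE_KM / (C_VACUUM_KM_S * FIBER_FACTOR) * 1_000
round_trip_ms = 2 * one_way_ms

print(f"one-way propagation: {one_way_ms:.1f} ms")    # ~5.5 ms
print(f"round trip:          {round_trip_ms:.1f} ms") # ~11 ms
```

Every extra kilometer of routing adds to that floor, which is why shortening paths and trimming protocol overhead matters at this scale.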
Microsoft says this allows remote locations to collaborate on the same model-training run in near real time, with each site contributing its share of the computation.
The focus is on keeping large numbers of GPUs continuously active so that no unit sits idle waiting for results from elsewhere.
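Keeping GPUs busy across that distance typically means overlapping communication with computation rather than stopping for each exchange to finish. Microsoft has not published its scheduling internals; the snippet below is a generic sketch of the overlap pattern, with Python threads standing in for GPU compute and wide-area transfer (all names and timings are illustrative).

```python
import threading
import time

def compute_step(step: int) -> str:
    """Stand-in for local GPU work on one training step."""
    time.sleep(0.05)  # pretend this is forward/backward compute
    return f"gradients[{step}]"

def cross_site_exchange(payload: str) -> None:
    """Stand-in for shipping gradients to the remote site over the WAN."""
    time.sleep(0.03)  # pretend this is the millisecond-scale wide-area transfer

def train(steps: int) -> None:
    in_flight = None  # the previous step's exchange, still running
    for step in range(steps):
        grads = compute_step(step)   # local compute proceeds...
        if in_flight is not None:
            in_flight.join()         # ...while last step's exchange finishes
        in_flight = threading.Thread(target=cross_site_exchange, args=(grads,))
        in_flight.start()            # kick off this step's exchange
    if in_flight is not None:
        in_flight.join()

train(steps=5)
```

Real systems achieve this with asynchronous collectives and hierarchical reduction rather than threads, but the principle is the same: step N's compute hides step N-1's communication, so the link delay never leaves the GPUs waiting.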
“Leading in AI isn't just about adding more GPUs, it's about building the infrastructure that allows them to work together as one system,” said Scott Guthrie, Microsoft's executive vice president of Cloud+AI.
Microsoft uses the Fairwater layout to support high-bandwidth rack-scale systems, including Nvidia GB200 NVL72 units designed to scale to very large Blackwell GPU clusters.
The company pairs this hardware with liquid-cooling systems that send heated fluid outside the building and return it at lower temperatures.
Microsoft says operational cooling consumes virtually no new water, aside from periodic replacement for water-chemistry control.
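To see why a closed liquid loop can carry that much heat while consuming essentially no water, the basic sizing relation is Q = ṁ · c_p · ΔT: the required flow rate scales with the heat load and the temperature rise allowed across the loop. The heat load and temperature difference below are assumptions for illustration, not figures from Microsoft.

```python
# Rough closed-loop cooling sizing: Q = m_dot * c_p * delta_T (illustrative).
HEAT_LOAD_MW = 50      # assumed IT heat load for one facility, MW
CP_WATER = 4_186       # specific heat of water, J/(kg*K)
DELTA_T_K = 10         # assumed supply/return temperature difference, K

flow_kg_s = HEAT_LOAD_MW * 1e6 / (CP_WATER * DELTA_T_K)
print(f"required coolant flow: {flow_kg_s:,.0f} kg/s")  # ~1,195 kg/s

# The same fluid circulates continuously, so no new water is consumed
# beyond occasional replacement for water-chemistry control.
```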
The Atlanta site follows the Wisconsin layout, giving Microsoft a consistent architecture across regions as new facilities come online.
“To improve the capabilities of AI, you need to have more and more infrastructure to train it,” said Mark Russinovich, CTO, Deputy CISO and Technical Fellow at Microsoft Azure.
“The amount of infrastructure needed now to train these models is not one or two data centers, but multiples of that.”
The company positions these sites as purpose-built for training advanced AI models, citing growth in parameter counts and in the volume of training data as the key factors driving the expansion.
These facilities include exabytes of storage and millions of CPU cores to support the tasks surrounding the primary training workloads.
Microsoft suggests this scale is needed for partners like OpenAI, and for its own AI superintelligence team, to continue developing models.
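One way to see the pressure parameter growth puts on infrastructure is the memory footprint of training state alone. A common rule of thumb is roughly 16 bytes per parameter for mixed-precision training with an Adam-style optimizer (weights, gradients, master copies, and optimizer moments). The model size and per-GPU memory below are assumptions for illustration, not tied to any named system.

```python
# Rough training-state memory estimate for a large model (illustrative).
PARAMS = 1e12            # assumed model size: 1 trillion parameters
BYTES_PER_PARAM = 16     # fp16 weights + grads + fp32 master + Adam moments
GPU_MEMORY_GB = 192      # assumed HBM capacity of one Blackwell-class GPU

total_tb = PARAMS * BYTES_PER_PARAM / 1e12
gpus_to_hold_state = PARAMS * BYTES_PER_PARAM / (GPU_MEMORY_GB * 1e9)

print(f"training state: ~{total_tb:,.0f} TB")           # ~16 TB
print(f"GPUs just to hold it: ~{gpus_to_hold_state:,.0f}")  # ~83
```

Holding the state is only the start: activations, data pipelines, and the need to finish training in weeks rather than decades push real clusters to orders of magnitude more GPUs than this capacity floor suggests.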