- Oracle claims its Zettascale10 system can achieve peak performance of 16 zettaflops
- The design uses approximately 800,000 Nvidia GPUs spread across data centers
- OpenAI's Stargate cluster in Texas runs on the new Oracle infrastructure
Oracle has announced what it calls the largest artificial intelligence supercomputer in the cloud, OCI Zettascale10.
The company claims the system can deliver peak performance of 16 zettaflops across 800,000 Nvidia GPUs.
Broken down, that works out to approximately 20 petaflops per GPU, which is roughly equivalent to the Grace Blackwell GB300 Ultra chip used in high-end desktop AI systems.
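The per-GPU figure follows directly from the two headline numbers. As a quick check, here is a minimal sketch of that arithmetic in Python; the function and constant names are illustrative, and the inputs are Oracle's claimed peak values, not measured results.

```python
# Oracle's claimed figures (theoretical peak, not independently verified).
PEAK_ZETTAFLOPS = 16
GPU_COUNT = 800_000

FLOPS_PER_ZETTAFLOP = 10**21
FLOPS_PER_PETAFLOP = 10**15


def per_gpu_petaflops(peak_zettaflops: int, gpu_count: int) -> float:
    """Divide the claimed system peak evenly across all GPUs."""
    total_flops = peak_zettaflops * FLOPS_PER_ZETTAFLOP
    return total_flops / gpu_count / FLOPS_PER_PETAFLOP


if __name__ == "__main__":
    share = per_gpu_petaflops(PEAK_ZETTAFLOPS, GPU_COUNT)
    print(f"{share:.1f} petaflops per GPU")
```

The even split is itself an assumption: it treats the system total as a simple sum of per-GPU peaks, which says nothing about sustained throughput on real workloads.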
Network design for large-scale AI workloads
Oracle says the platform is the basis for the OpenAI Stargate cluster in Abilene, Texas, built to handle some of the most demanding artificial intelligence workloads currently emerging in research and commercial use.
“RoCE’s highly scalable, custom design maximizes network-wide performance at gigawatt scale while keeping most of the power focused on compute…” said Peter Hoeschele, Vice President of Infrastructure and Industrial Computing at OpenAI.
At the heart of Zettascale10 is the Oracle Acceleron RoCE network, designed to improve the scalability and reliability of artificial intelligence operations on large volumes of data.
This architecture uses NICs as mini-switches that interconnect GPUs across multiple isolated network planes.
The design aims to reduce latency between GPUs and allow jobs to keep running if one network path fails.
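The idea of isolated network planes can be illustrated with a toy model. The sketch below is not Oracle's implementation: the `Plane` class, the `pick_plane` function, the four-plane count and the modulo-based flow placement are all assumptions chosen only to show how traffic could shift to the remaining planes when one fails.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    """A hypothetical isolated network plane."""
    name: str
    healthy: bool = True


def pick_plane(planes: list[Plane], flow_id: int) -> Plane:
    """Place a flow on one of the healthy planes (simple modulo spreading)."""
    healthy = [p for p in planes if p.healthy]
    if not healthy:
        raise RuntimeError("no healthy network plane available")
    return healthy[flow_id % len(healthy)]


# Assumed setup: four planes, eight GPU-to-GPU flows.
planes = [Plane(f"plane-{i}") for i in range(4)]
before = [pick_plane(planes, flow).name for flow in range(8)]

# Simulate one plane failing; flows are re-placed on the survivors.
planes[1].healthy = False
after = [pick_plane(planes, flow).name for flow in range(8)]

if __name__ == "__main__":
    print("before failure:", before)
    print("after failure: ", after)
```

In this model every flow still finds a path after the failure, at the cost of more flows sharing each surviving plane. A real fabric would also have to deal with congestion, rerouting time and in-flight packets, none of which is modelled here.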
“With Nvidia's full-featured AI infrastructure, OCI Zettascale10 provides the computing fabric needed to advance modern AI research and help organizations around the world move from experimentation to industrial AI,” said Ian Buck, Vice President, Hyperscale, Nvidia.
Oracle says this structure can reduce costs by simplifying the layers within the network while maintaining consistent performance across all nodes.
It also introduces Linear Pluggable Optics and Linear Receiver Optics to reduce power consumption and cooling needs without sacrificing throughput.
While Oracle's numbers are impressive, the company has not provided independent verification of its 16 zettaflops claim.
Cloud performance metrics can vary depending on how throughput is calculated, and Oracle's comparisons may be based on theoretical peaks rather than sustained performance.
Given that the claimed system total is roughly equal to the sum of 800,000 top-end GPUs, real-world efficiency may depend heavily on network design and software optimization.
Analysts may wait to see whether the configuration provides performance comparable to leading AI clusters already used by other major cloud providers.
Zettascale10 places Oracle alongside other major players aiming to provide the infrastructure for the best GPUs and AI tools.
The company says customers can train and deploy large models in Oracle's distributed cloud environment, supported by data sovereignty measures.
Oracle also says Zettascale10 provides operational flexibility through independent maintenance at the plane level, allowing upgrades to be completed with less downtime.
“With OCI Zettascale10, we are combining Oracle's Acceleron RoCE OCI network architecture with Nvidia's next-generation AI infrastructure to deliver multi-gigawatt AI capabilities at unprecedented scale,” said Mahesh Thiagarajan, executive vice president of Oracle Cloud Infrastructure.
“Customers will be able to build, train, and deploy their largest AI models in production using less energy, and have the freedom to run in Oracle's distributed cloud with trusted data and AI sovereignty…”
However, observers note that other providers are building their own large-scale GPU clusters and advanced cloud storage systems, which could narrow Oracle's advantage.
The system is due to be deployed next year, and only then will it be clear whether the architecture can meet the demand for scalable, efficient and reliable AI computing.
Via HPCWire