AI Training Benchmarks Push Hardware Limits

Since 2018, the consortium MLCommons has been holding a sort of Olympics for AI training. The competition, called MLPerf, consists of a set of tasks for training specific artificial intelligence models on predefined data sets to a specified accuracy. Essentially, these tasks, called benchmarks, measure how well a hardware and low-level software configuration is tuned to train a particular AI model.
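
To make the idea concrete, here is a minimal sketch of a time-to-target-accuracy measurement, the basic quantity such a benchmark reports: train until the model clears a fixed quality threshold on held-out data, and record the wall-clock time. This is only an illustration of the general idea, not MLPerf's actual measurement harness, and the training and evaluation functions in the usage example are hypothetical stand-ins.

```python
import time
from typing import Callable


def time_to_target(
    train_one_epoch: Callable[[], None],
    evaluate: Callable[[], float],
    target_accuracy: float,
    max_epochs: int = 100,
) -> float:
    """Train until `target_accuracy` is reached on held-out data and
    return the elapsed wall-clock time in seconds.

    Illustrative sketch of a time-to-accuracy measurement; not MLPerf's
    actual harness.
    """
    start = time.perf_counter()
    for _ in range(max_epochs):
        train_one_epoch()                   # one pass over the training data
        accuracy = evaluate()               # accuracy on the validation set
        if accuracy >= target_accuracy:     # stop as soon as the quality bar is met
            return time.perf_counter() - start
    raise RuntimeError("target accuracy not reached within max_epochs")


if __name__ == "__main__":
    # Toy usage with stand-in training/evaluation functions (purely illustrative).
    state = {"accuracy": 0.0}

    def fake_train_one_epoch() -> None:
        state["accuracy"] += 0.1            # pretend each epoch improves accuracy

    def fake_evaluate() -> float:
        return state["accuracy"]

    elapsed = time_to_target(fake_train_one_epoch, fake_evaluate, target_accuracy=0.75)
    print(f"time to target: {elapsed:.4f} s")
```

Because the clock only stops when the quality bar is met, a faster submission must be both quick per training step and efficient at converging, which is why both hardware and low-level software tuning matter.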

Twice a year, companies field their submissions, usually clusters of CPUs and GPUs along with software optimized for them, and compete to see whose system can train the models the fastest.

There is no doubt that cutting-edge AI training hardware has improved significantly since MLPerf's inception. Over the years, Nvidia has released four new generations of GPUs that have since become the industry standard (Nvidia's latest Blackwell GPU has not yet become a standard but is growing in popularity). Companies competing in MLPerf are also using ever-larger clusters of GPUs to tackle the training tasks.

However, the MLPerf criteria have also become stricter. This increased rigor is by design: the tests are trying to keep up with the industry, says David Kanter, head of MLPerf. “These numbers need to be representative,” he says.

Interestingly, the data show that large language models and their predecessors have grown in size faster than the hardware can keep up. As a result, each time a new benchmark is introduced, the fastest training time jumps up. Hardware improvements then gradually bring the training time back down, until the next benchmark resets it, and the cycle repeats.

This article appears in the November 2025 print issue.
