“As these artificial intelligence systems become more powerful, they will become more and more integrated into very important areas,” Leo Gao, a research scientist at OpenAI, told MIT Technology Review in an exclusive preview of the new work. “It’s really important to make sure they’re safe.”
This is still early research. The new model, called a weight-sparse transformer, is much smaller and far less capable than top-tier mass-market models such as the company's GPT-5, Anthropic's Claude, and Google DeepMind's Gemini. Gao says it is at most as capable as GPT-1, a model that OpenAI developed back in 2018 (though he and his colleagues haven't done a direct comparison).
But the goal isn't to compete with the best in class (at least not yet). Instead, by looking at how this experimental model works, OpenAI hopes to learn about the hidden mechanisms inside bigger and better versions of the technology.
It's an interesting study, says Elisenda Grigsby, a mathematician at Boston College who studies how LLMs work and was not involved in the project: “I'm sure the methods it introduces will have a significant impact.”
Lee Sharkey, a research scientist at artificial intelligence startup Goodfire, agrees. “This work aims at the right target and seems well executed,” he says.
Why are models so difficult to understand?
OpenAI's work is part of a hot new area of research known as mechanistic interpretability, which attempts to map the internal mechanisms that models use to perform various tasks.
That's harder than it sounds. LLMs are built on neural networks, which consist of nodes, called neurons, arranged in layers. In most networks, each neuron is connected to every neuron in the adjacent layers. Such a network is known as a dense network.
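To make that concrete, here is a minimal sketch in Python (using numpy, with sizes chosen purely for illustration, not taken from OpenAI's model) of what "dense" means: every neuron in one layer connects to every neuron in the next, so each layer is just a full weight matrix applied to the previous layer's activations.

```python
# Toy dense (fully connected) network: two weight matrices, every entry used.
# Sizes are hypothetical and chosen only to illustrate the structure.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 4, 8, 3          # neurons in each layer
W1 = rng.normal(size=(n_hidden, n_in))   # 8 x 4 = 32 connections, all active
W2 = rng.normal(size=(n_out, n_hidden))  # 3 x 8 = 24 connections, all active

x = rng.normal(size=n_in)                # activations of the input layer
h = np.maximum(0, W1 @ x)                # every hidden neuron sees every input
y = W2 @ h                               # every output neuron sees every hidden neuron

print(f"nonzero weights: {np.count_nonzero(W1) + np.count_nonzero(W2)} of {W1.size + W2.size}")
```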
Dense networks are relatively efficient to train and run, but they spread what they learn across a vast web of connections. As a result, simple concepts or functions can end up split between neurons in different parts of the model. At the same time, a single neuron can also represent several different features, a phenomenon known as superposition (a term borrowed from quantum physics). The upshot is that you can't tie specific parts of the model to specific concepts.
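The toy example below (a simplified sketch, not the experiment described in the article) shows the flavor of superposition: three conceptual "features" are packed into just two neurons by giving each feature its own direction in the two-dimensional activation space. Each neuron then takes part in representing several features, and reading any one feature back out picks up interference from the others, which is why no single neuron maps cleanly to a single concept.

```python
# Toy superposition: 3 hypothetical features stored in a 2-neuron activation space.
import numpy as np

# Assign each feature a direction 120 degrees apart in the 2-D space.
angles = np.deg2rad([0, 120, 240])
feature_dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (3, 2)

# Activate only feature 1: this is the resulting 2-neuron activation vector,
# and note that both neurons fire to represent this one feature.
activation = feature_dirs[1]

# Reading the activations back out, every feature direction responds a little,
# because with 3 directions in 2 dimensions they cannot all be orthogonal.
readout = feature_dirs @ activation
print(np.round(readout, 2))   # feature 1 reads ~1.0, the other two read ~-0.5
```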






