Researchers isolate memorization from reasoning in AI neural networks

Looking ahead, if information removal techniques are developed further, AI companies could one day remove, say, copyrighted content, personal information, or malicious memorized text from a neural network without destroying the model's ability to perform transformative tasks. However, because neural networks store information in distributed ways that are still not fully understood, the researchers caution that their method "cannot guarantee complete removal of sensitive information." These are the first steps in a new direction of AI research.

A journey through the loss landscape

To understand how the Goodfire researchers differentiated memorization from reasoning in these neural networks, it helps to know about an AI concept called the "loss landscape." The loss landscape is a way of visualizing how right or wrong an AI model's predictions are as it adjusts its internal settings (called "weights").

Imagine tuning a complex machine with millions of dials. "Loss" measures the number of errors the machine makes: high loss means many errors, low loss means few. The "landscape" is what you would see if you could map the error rate for every possible combination of dial settings.

During training, AI models essentially "roll downhill" through this landscape (a process called gradient descent), adjusting their weights to find the valleys where they make the fewest mistakes. This process shapes an AI model's outputs, such as its answers to questions.
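To make gradient descent concrete, here is a minimal, hypothetical Python sketch with a single "dial." Nothing here comes from the paper; it is a one-dimensional illustration of the idea, with the loss chosen as a simple bowl.

```python
# A minimal sketch of gradient descent on a one-dimensional "loss landscape."
# Real models adjust millions of weights, not one.

def loss(w):
    # A bowl-shaped landscape whose "valley" (minimum) sits at w = 3.
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of the loss with respect to the weight.
    return 2.0 * (w - 3.0)

w = 0.0             # start somewhere on the landscape
learning_rate = 0.1

for step in range(50):
    w -= learning_rate * gradient(w)  # "roll downhill" toward lower loss

print(f"final weight: {w:.4f}, final loss: {loss(w):.6f}")
# The weight approaches 3.0 and the loss approaches 0.
```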

Figure 1 from the paper "From Memorization to Reasoning in the Loss Curvature Spectrum." Credit: Merullo et al.

The researchers analyzed the "curvature" of the loss landscape of specific AI language models, measuring how sensitive a model's performance is to small changes in its weights. Sharp peaks and troughs represent high curvature (where small changes have large effects), while flat plains represent low curvature (where changes have minimal impact).
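As a toy illustration of what "curvature" means here (this is not the paper's method, just a hypothetical finite-difference probe on a made-up two-weight loss), you can measure how quickly the loss bends along a chosen direction:

```python
import numpy as np

def loss(w):
    # Invented toy loss: one sharp direction, one nearly flat direction.
    return 50.0 * w[0] ** 2 + 0.01 * w[1] ** 2

def curvature(loss_fn, w, direction, eps=1e-4):
    # Second-order finite difference along a unit direction d:
    # [f(w + eps*d) - 2*f(w) + f(w - eps*d)] / eps^2
    d = direction / np.linalg.norm(direction)
    return (loss_fn(w + eps * d) - 2 * loss_fn(w) + loss_fn(w - eps * d)) / eps**2

w = np.zeros(2)
print(curvature(loss, w, np.array([1.0, 0.0])))  # ~100: a sharp peak/trough
print(curvature(loss, w, np.array([0.0, 1.0])))  # ~0.02: a flat plain
```

Small nudges along the first weight change the loss dramatically (high curvature); the same nudges along the second barely register (low curvature).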

Using a technique called K-FAC (Kronecker-Factored Approximate Curvature), they found that individual memorized facts create sharp spikes in this landscape, but because each memorized item spikes in a different direction, the spikes flatten out when averaged together. Meanwhile, the reasoning abilities that many different inputs rely on maintain consistent, moderate curvature, like hills that keep roughly the same shape no matter which direction you approach them from.
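A hedged numerical sketch of that averaging argument (every direction, magnitude, and dimension below is invented for illustration; this is not the authors' K-FAC computation): if each example spikes sharply in its own random direction while a shared "reasoning" direction contributes moderate curvature to all of them, averaging flattens the spikes but preserves the shared hill.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_examples = 100, 100

# A single direction that every example shares ("reasoning").
shared = np.zeros(dim)
shared[0] = 1.0

hessians = []
for _ in range(n_examples):
    spike = rng.standard_normal(dim)
    spike /= np.linalg.norm(spike)
    # Per-example curvature: a large spike (100) along a unique random
    # direction plus a moderate contribution (5) along the shared direction.
    h = 100.0 * np.outer(spike, spike) + 5.0 * np.outer(shared, shared)
    hessians.append(h)

avg = np.mean(hessians, axis=0)

# Along the shared direction, curvature stays moderate (roughly 5 + noise).
print("shared direction:", shared @ avg @ shared)

# Along a typical random direction, the individual spikes of magnitude 100
# wash out to roughly 100 / dim = 1 after averaging.
probe = rng.standard_normal(dim)
probe /= np.linalg.norm(probe)
print("random direction:", probe @ avg @ probe)
```

The spikes are an order of magnitude larger than the shared hill per example, yet after averaging the shared direction ends up several times more curved than any random one, which mirrors the flat-versus-consistent profile the researchers describe.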
