Kolmogorov-Arnold neural networks bring new momentum to AI

Artificial neural networks – algorithms inspired by biological brains – are at the core of modern artificial intelligence and are behind chatbots and image generators. But with their many neurons, they can become black boxes whose inner workings are incomprehensible to users.

Now researchers have developed a fundamentally new method for creating neural networks that in some ways outperforms traditional systems. These new networks are more interpretable and also more accurate, proponents say, even when they are smaller. Their developers say that the way the networks learn concise representations of physical data could help scientists uncover new laws of nature.

“It’s great to see a new architecture on the table.” –Brice Ménard, Johns Hopkins University

For the past decade or more, engineers have optimized neural network designs mostly through trial and error, says Brice Ménard, a physicist at Johns Hopkins University who studies how neural networks work but was not involved in the new work, published on arXiv in April. “It’s great to see a new architecture on the table,” he says, especially one designed from scratch.

Neural networks are usually described in terms of neurons, or nodes, and synapses, the connections between those nodes. In traditional neural networks, called multilayer perceptrons (MLPs), each synapse learns a weight – a number that determines how strong the connection is between the two neurons it links. The neurons are arranged in layers, so that a neuron in one layer receives inputs from the neurons in the previous layer, weighted by the strength of their synaptic connections. Each neuron then applies a simple function, called an activation function, to the sum of its inputs.
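For a concrete picture of that description, here is a minimal sketch of a single MLP layer in plain Python with NumPy. It is only an illustration of the weighted-sum-then-activation pattern described above, not code from the study; the function name and the choice of tanh as the activation are assumptions made for the example.

```python
import numpy as np

def mlp_layer(x, W, b):
    """One multilayer-perceptron layer: every output neuron takes a
    weighted sum of its inputs (using the learned weights W and bias b),
    then applies the same simple activation function (here, tanh)."""
    z = W @ x + b          # weighted sum of inputs for each neuron
    return np.tanh(z)      # fixed activation applied element-wise

# Illustrative use: 3 inputs feeding a layer of 4 neurons.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(4, 3))   # one learned weight per synapse
b = np.zeros(4)
print(mlp_layer(x, W, b))
```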

In traditional neural networks, sometimes called multilayer perceptrons (left), each synapse learns a number called a weight, and each neuron applies a simple function to the sum of its inputs. In the new Kolmogorov-Arnold architecture (right), each synapse learns a function, and the neurons sum the outputs of these functions. Credit: The NSF Institute for Artificial Intelligence and Fundamental Interactions

In the new architecture, synapses play a more complex role. Instead of simply learning how strong the connection between two neurons is, they learn the full nature of that connection – the function that maps input to output. Unlike the activation function used by neurons in the traditional architecture, this function can be more complex – in practice a “spline,” or combination of several functions – and it is different for each synapse. Neurons, on the other hand, become simpler: they just sum the outputs of all their incoming synapses. The new networks are called Kolmogorov-Arnold networks (KANs), after two mathematicians who studied how functions could be combined. The idea is that KANs offer more flexibility in learning to represent data, while using fewer learned parameters.
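By way of contrast, here is an equally minimal sketch of a single Kolmogorov-Arnold layer. It is a conceptual illustration rather than the authors' implementation: for simplicity, each edge's learned function is a cubic polynomial instead of the splines used in the paper, and all names are illustrative.

```python
import numpy as np

def kan_layer(x, coeffs):
    """One Kolmogorov-Arnold layer (conceptual sketch).
    coeffs has shape (n_out, n_in, n_basis): every input-output edge
    carries its own learned 1-D function, here a cubic polynomial in
    place of the splines used in the paper. Each output neuron simply
    sums the values of its incoming edge functions."""
    # Evaluate the basis [1, x, x^2, x^3] for every input: shape (n_in, n_basis).
    basis = np.stack([x**k for k in range(coeffs.shape[-1])], axis=-1)
    # Per-edge function values: shape (n_out, n_in).
    edge_outputs = np.einsum('oib,ib->oi', coeffs, basis)
    # Neurons just sum their incoming edges.
    return edge_outputs.sum(axis=1)

# Illustrative use: 2 inputs, 3 output neurons, 4 coefficients per edge.
rng = np.random.default_rng(0)
x = rng.normal(size=2)
coeffs = rng.normal(size=(3, 2, 4))
print(kan_layer(x, coeffs))
```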

“It’s like alien life that looks at things from a different perspective, but is still somehow understandable to humans.” —Ziming Liu, Massachusetts Institute of Technology

The researchers tested their KANs on relatively simple scientific tasks. In some experiments, they used simple physical laws, such as the combined velocity of two objects moving at relativistic speeds. They used these equations to generate input-output data points; then, for each physical function, they trained a network on some of the data and tested it on the rest. They found that increasing the size of KANs improves their performance faster than increasing the size of MLPs does. When solving partial differential equations, a KAN was 100 times as accurate as an MLP with 100 times as many parameters.
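As an illustration of the data-generation step, the sketch below samples input-output pairs from the relativistic velocity-addition formula, v = (v1 + v2) / (1 + v1·v2) in units of the speed of light, and holds out part of the data for testing. The sampling range and the train/test split are assumptions made for the example, not details taken from the preprint.

```python
import numpy as np

def relativistic_velocity_sum(v1, v2):
    """Combined velocity of two objects, in units of the speed of light."""
    return (v1 + v2) / (1 + v1 * v2)

# Sample random input velocities and compute the exact outputs.
rng = np.random.default_rng(0)
v1, v2 = rng.uniform(-0.9, 0.9, size=(2, 10_000))
X = np.stack([v1, v2], axis=1)
y = relativistic_velocity_sum(v1, v2)

# Hold out part of the data for testing; train on the rest.
split = 8_000
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
```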

In another experiment, they trained networks to predict an attribute of topological knots, called their signature, from other attributes of the knots. An MLP achieved a test accuracy of 78 percent using about 300,000 parameters, while a KAN achieved a test accuracy of 81.6 percent using only about 200 parameters.

Furthermore, the researchers were able to visually represent the KANs and look at the shapes of the activation functions and the importance of each connection. Either manually or automatically, they were able to prune weak connections and replace some activation functions with simpler ones, such as sine or exponential functions. They were then able to summarize the entire KAN in an intuitive one-line function (including all component activation functions) and, in some cases, perfectly reconstruct the physical function that created the dataset.
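The pruning and symbolic-replacement step is automated in the authors' software; the toy sketch below only illustrates the underlying idea, and is not taken from the paper's implementation. It checks whether a learned one-dimensional activation is well approximated by a simple candidate such as a sine or an exponential, using a crude grid search over a scale parameter and least squares for the remaining coefficients.

```python
import numpy as np

def best_symbolic_fit(xs, ys, candidates):
    """Fit ys ≈ a * f(b * xs) + c for each candidate f, using a coarse
    grid search over the inner scale b and least squares for (a, c);
    return the candidate with the smallest mean squared error."""
    best = None
    for name, f in candidates.items():
        for b in np.linspace(0.5, 3.0, 26):
            design = np.stack([f(b * xs), np.ones_like(xs)], axis=1)
            (a, c), *_ = np.linalg.lstsq(design, ys, rcond=None)
            err = np.mean((design @ np.array([a, c]) - ys) ** 2)
            if best is None or err < best[0]:
                best = (err, name, a, b, c)
    return best

# Example: a "learned" activation that is secretly a scaled sine.
xs = np.linspace(-3, 3, 200)
ys = 2.0 * np.sin(1.5 * xs) + 0.1
candidates = {"sin": np.sin, "exp": np.exp, "square": np.square}
print(best_symbolic_fit(xs, ys, candidates))
```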

“We hope it can be a useful tool for everyday scientific research in the future,” says Ziming Liu, a computer scientist at the Massachusetts Institute of Technology and lead author of the study. “We just feed a dataset that we can’t interpret to a KAN, and it can generate some hypotheses for you. You just stare at the brain [the KAN diagram], and you can even perform surgery on it if you want, and then you get some neat function. It’s like alien life that looks at things from a different perspective, but is still somehow understandable to humans.”

Dozens of papers have already cited the KAN preprint. “I found it very exciting from the moment I saw it,” says Alexander Bodner, a computer science student at the University of San Andrés in Argentina. Within a week, he and three fellow students had combined KANs with convolutional neural networks (CNNs), a popular architecture for image processing. They tested their convolutional KANs for their ability to categorize handwritten digits or items of clothing. The best network performed roughly as well as a traditional CNN (99 percent accuracy for both networks on digits, 90 percent for both on clothing), but used about 60 percent fewer parameters. The datasets were simple, but Bodner says other teams with more computing power have begun scaling the networks. Others are combining KANs with transformers, an architecture popular for large language models.

One disadvantage of KANs is that they take longer to train per parameter – partly because they can’t take advantage of GPUs. But they use fewer parameters. Liu notes that even if KANs don’t replace giant CNNs and transformers for image and language processing, training time won’t be an issue at the smaller scales of many physics problems. He’s looking for ways for experts to bring their prior knowledge to KANs – for example, by manually selecting activation functions – and easily extract knowledge from them using a simple interface. One day, he says, KANs could help physicists discover high-temperature superconductors or methods for controlling nuclear fusion.
