From MLP to KAN - 小萌小筑

KAN documentation

MLP

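When I first studied the MLP (Multilayer Perceptron), it felt like a refresher course in advanced calculus. After barely getting through it, I realized that for all its seemingly reasonable design choices, I kept sensing arbitrariness and after-the-fact rationalization. Explanations of the unreasonable are tangled together with justifications that only seem reasonable, and the result is today's "cyber alchemy". Even though deep learning now has the upper hand over traditional machine learning, I can still feel the disdain that the traditionalists once had for the reformers (it all comes back to me, and now I get it!). After all, this MLP, which seems open to endless refinement, is the cornerstone of all of deep learning.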

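Symbolic AI was initially the dominant school of thought, and I have to say it is a great and challenging idea, because I consider it the more rational one. Symbolic AI insisted on separating syntax from semantics, but that did not work out. Today's large language models are clearly different in spirit, yet they solved this problem: they need no distinction between syntax and semantics and consider only text, building probabilistic links between words and all their possible usages and using a simple algorithm to predict the most likely next word. All of this became possible simply because humanity now has the data and large models with hundreds of millions or even trillions of parameters. Context-based statistical models do not actually understand text, so they are prone to producing meaningless results, whereas symbolic AI relies on semantics. That is a clear difference.
-- Joseph Sifakis, Turing Award laureate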

A First Look at KAN

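The greatest significance of KAN (Kolmogorov-Arnold Networks) is that, as the most promising alternative to the MLP, it proposes an entirely new architectural approach. Inspired by the Kolmogorov-Arnold representation theorem, KAN shows notable advantages in the following respects: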

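Improved accuracy and efficiency: compared with a traditional MLP, a KAN can reach the same or better accuracy with fewer parameters, especially in tasks such as data fitting and solving partial differential equations. This means KAN may offer more efficient solutions to complex scientific and mathematical problems.
Learnable activation functions: a core innovation of KAN is placing learnable activation functions on the edges (the weights) rather than on the nodes (as in an MLP). Each scalar weight is replaced by a learnable univariate function parameterized as a spline, which lets the model learn more complex functional relationships and increases its expressive power.
Improved interpretability: the structure of a KAN can be visualized intuitively and is easy for human users to interact with. This helps scientists understand how the model works internally and even take part directly in optimizing the model and in the process of "discovery". By manually adjusting and simplifying a KAN, scientists can guide the model to discover or verify mathematical and physical laws, which encourages collaboration between AI and scientists.
Adaptivity and flexibility: thanks to the intrinsic locality of spline basis functions, KAN supports adaptive design and training, for example multi-level training strategies that improve accuracy and training efficiency. This adaptivity lets KAN better match the needs of different tasks.
Automatic discovery of efficient structures: experimental results show that automatically discovered KAN structures are usually more compact than hand-constructed ones, which suggests that the Kolmogorov-Arnold representation can in some cases compress and represent information more efficiently than expected, although this may also make the model harder to interpret directly.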
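To make the "activation functions on the edges" idea concrete, here is a minimal sketch of a single KAN-style layer in pure Python. It is an illustration under stated assumptions, not the paper's implementation: the names `hat_basis`, `EdgeFunction`, `KANLayer`, and `make_grid` are hypothetical, and piecewise-linear "hat" functions stand in for the B-splines (plus a residual base function) that the paper uses.

```python
import random

def make_grid(lo, hi, n_points):
    """Uniformly spaced grid points on [lo, hi] (hypothetical helper)."""
    step = (hi - lo) / (n_points - 1)
    return [lo + i * step for i in range(n_points)]

def hat_basis(x, grid):
    """Values at x of piecewise-linear 'hat' functions centred on each grid point.

    Assumption: the grid is uniform. Hat functions are order-1 B-splines,
    used here as a simple stand-in for the higher-order splines in the paper.
    """
    h = grid[1] - grid[0]
    return [max(0.0, 1.0 - abs(x - g) / h) for g in grid]

class EdgeFunction:
    """One learnable univariate function phi(x) = sum_k coef[k] * B_k(x)."""
    def __init__(self, grid, rng):
        self.grid = grid
        # The coefficients are the trainable parameters of this edge.
        self.coef = [rng.uniform(-0.1, 0.1) for _ in grid]

    def __call__(self, x):
        return sum(c * b for c, b in zip(self.coef, hat_basis(x, self.grid)))

class KANLayer:
    """n_in -> n_out layer: every edge carries its own EdgeFunction,
    and each output node only sums its incoming edges."""
    def __init__(self, n_in, n_out, grid, seed=0):
        rng = random.Random(seed)
        self.edges = [[EdgeFunction(grid, rng) for _ in range(n_in)]
                      for _ in range(n_out)]

    def __call__(self, xs):
        return [sum(phi(x) for phi, x in zip(row, xs)) for row in self.edges]

    def num_params(self):
        return sum(len(phi.coef) for row in self.edges for phi in row)

grid = make_grid(-1.0, 1.0, 6)
layer = KANLayer(n_in=2, n_out=3, grid=grid)
out = layer([0.25, -0.5])      # three outputs, one per output node
print(out, layer.num_params())
```

In this sketch each of the 2 x 3 edges holds 6 coefficients, whereas the corresponding MLP layer would hold a single scalar weight per edge. A KAN layer therefore costs more parameters per edge, and the claimed parameter savings have to come from using narrower or shallower networks overall.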

A Rough Reading

The Kolmogorov-Arnold Theorem

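If f is a multivariate continuous function on a bounded domain, then f can be written as a finite composition of continuous functions of a single variable and the binary operation of addition. More specifically, a smooth function f can be written in the following form, where every \phi_{q,p} and every \Phi_q is a continuous function of one variable: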

f(X)=f(x_1,...,x_n)=\sum_{q=1}^{2n+1}\Phi_q(\sum_{p=1}^n\phi_{q,p}(x_p))

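In a sense, Kolmogorov and Arnold showed that the only true multivariate function is addition, since every other function can be written using univariate functions and sums.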
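A small worked example may help here. The sketch below evaluates the general form \sum_q \Phi_q(\sum_p \phi_{q,p}(x_p)) for two hand-picked functions; the helper name `ka_sum` and the chosen decompositions are illustrative assumptions, and these examples need far fewer than 2n + 1 outer terms. Multiplication, for instance, reduces to addition and univariate squaring through the identity xy = ((x + y)^2 - (x - y)^2) / 4.

```python
import math

def ka_sum(outers, inners, xs):
    """Evaluate sum_q Phi_q( sum_p phi_{q,p}(x_p) ).

    outers: list of univariate functions Phi_q
    inners: list (one entry per q) of lists of univariate functions phi_{q,p}
    xs:     the input point (x_1, ..., x_n)
    """
    total = 0.0
    for Phi_q, phis_q in zip(outers, inners):
        total += Phi_q(sum(phi(x) for phi, x in zip(phis_q, xs)))
    return total

# Example 1: f(x, y) = exp(sin(pi * x) + y^2) needs a single outer term.
def f_composed(x, y):
    outers = [math.exp]
    inners = [[lambda t: math.sin(math.pi * t), lambda t: t * t]]
    return ka_sum(outers, inners, (x, y))

# Example 2: x * y = (x + y)^2 / 4 - (x - y)^2 / 4 needs two outer terms.
def product_via_addition(x, y):
    outers = [lambda s: 0.25 * s * s, lambda s: -0.25 * s * s]
    inners = [[lambda t: t, lambda t: t],     # inner sum: x + y
              [lambda t: t, lambda t: -t]]    # inner sum: x - y
    return ka_sum(outers, inners, (x, y))

print(f_composed(0.5, 1.0), math.exp(math.sin(math.pi * 0.5) + 1.0))
print(product_via_addition(0.3, -0.7), 0.3 * -0.7)
```

In both cases the only operation that mixes the inputs is addition; everything else is a function of one variable.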

One might naively consider this great news for machine learning: learning a high-dimensional function boils down to learning a polynomial number of 1D functions. However, these 1D functions can be non-smooth and even fractal, so they may not be learnable in practice. Because of this pathological behavior, the Kolmogorov-Arnold representation theorem was basically sentenced to death in machine learning, regarded as theoretically sound but practically useless.
However, we are more optimistic about the usefulness of the Kolmogorov-Arnold theorem for machine learning. First of all, we need not stick to the original Eq. (2.1) which has only two-layer nonlinearities and a small number of terms (2n + 1) in the hidden layer: we will generalize the network to arbitrary widths and depths. Secondly, most functions in science and daily life are often smooth and have sparse compositional structures, potentially facilitating smooth Kolmogorov-Arnold representations. The philosophy here is close to the mindset of physicists, who often care more about typical cases rather than worst cases. After all, our physical world and machine learning tasks must have structures to make physics and machine learning useful or generalizable at all.
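The generalization to arbitrary widths and depths can be written down compactly. The block below follows my reading of the paper's notation and should be taken as a sketch: here \Phi_l denotes an entire layer (a matrix of univariate functions \phi_{l,j,i}), which is a different use of the symbol from the outer functions \Phi_q in the theorem above.

```latex
% Layer l maps n_l inputs to n_{l+1} outputs; every edge (i -> j) has its own 1D function.
x_{l+1,j} = \sum_{i=1}^{n_l} \phi_{l,j,i}\left(x_{l,i}\right), \qquad j = 1, \dots, n_{l+1}

% Composing L such layers gives the full network.
\mathrm{KAN}(\mathbf{x}) = \left(\Phi_{L-1} \circ \Phi_{L-2} \circ \cdots \circ \Phi_{0}\right)(\mathbf{x})
```

Under this notation the original theorem corresponds to a two-layer network of shape [n, 2n + 1, 1], and deeper or wider KANs simply stack more such layers.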

The Rest of the Paper

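The paper then presents a simple implementation and several applications of KAN, along with preliminary evidence on classic datasets, physics, and mathematics that KAN does better than MLP, not only in accuracy but also in interpretability.

In the future, people may well propose "kansformers"!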
KAN as a “language model” for AI + Science. The reason why large language models are so transformative is because they are useful to anyone who can speak natural language. The language of science is functions. KANs are composed of interpretable functions, so when a human user stares at a KAN, it is like communicating with it using the language of functions. This paragraph aims to promote the AI-Scientist-Collaboration paradigm rather than our specific tool KANs. Just like people use different languages to communicate, we expect that in the future KANs will be just one of the languages for AI + Science, although KANs will be one of the very first languages that would enable AI and human to communicate. However, enabled by KANs, the AI-Scientist-Collaboration paradigm has never been this easy and convenient, which leads us to rethink the paradigm of how we want to approach AI + Science: Do we want AI scientists, or do we want AI that helps scientists? The intrinsic difficulty of (fully automated) AI scientists is that it is hard to make human preferences quantitative, which would codify human preferences into AI objectives. In fact, scientists in different fields may feel differently about which functions are simple or interpretable. As a result, it is more desirable for scientists to have an AI that can speak the scientific language (functions) and can conveniently interact with inductive biases of individual scientist(s) to adapt to a specific scientific domain.

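Finally, the authors give guidance on how to choose between KAN and MLP at this stage. Although KAN is currently much slower than MLP, the authors regard this as an engineering problem rather than a fundamental limitation.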

Currently, the biggest bottleneck of KANs lies in its slow training. KANs are usually 10x slower than MLPs, given the same number of parameters. We should be honest that we did not try hard to optimize KANs’ efficiency though, so we deem KANs’ slow training more as an engineering problem to be improved in the future rather than a fundamental limitation. If one wants to train a model fast, one should use MLPs. In other cases, however, KANs should be comparable or better than MLPs, which makes them worth trying. In short, if you care about interpretability and/or accuracy, and slow training is not a major concern, we suggest trying KANs.

Summary

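As the hype around large language models cools and doubts about them grow, I am glad to see symbolic AI return to public view. The science I look forward to is free of ignorance, rigorous and earnest, and beyond reproach.
The biggest problem with this world is that fools are so sure of themselves, while the wise are full of doubt. -- Bertrand Russell
The language of science is functions. Scientists in different fields may view functions differently, and functions should be able to adapt to different fields.
This has fundamentally eased my AI anxiety.
Practice is the sole criterion for testing truth.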
I would like to welcome people to be critical of KANs, but also to be critical of critiques as well. Practice is the only criterion for testing understanding. We don't know many things beforehand until they are really tried and shown to be succeeding or failing. As much as I'm willing to see success mode of KANs, I'm equally curious about failure modes of KANs, to better understand the boundaries. KANs and MLPs cannot replace each other (as far as I can tell); they each have advantages in some settings and limitations in others. I would be intrigued by a theoretical framework that encompasses both and could even suggest new alternatives (physicists love unified theories, sorry :).
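"... must have structures to make physics and machine learning useful or generalizable at all."
That is the precondition! Structure comes first! Belief comes first!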
