Two years ago, I wrote a post about a paper by Wei et al., Emergent Abilities of Large Language Models.
Then last year, I posted about a paper, Are Emergent Abilities of Large Language Models a Mirage?, that criticised the first paper.
There is more to the story. The first paper has now been cited over 3,600 times, and there is now a helpful review of the state of the field.
Emergent Abilities in Large Language Models: A Survey
Leonardo Berti, Flavio Giorgi, Gjergji Kasneci
It begins with a discussion of what emergence is, quoting from Phil Anderson's More is Different article [which emphasised how new properties may appear when a system becomes large] and John Hopfield's Neural networks and physical systems with emergent collective computational abilities, which was the basis of his recent Nobel Prize. Hopfield stated
"Computational properties of use to biological organisms or the construction of computers can emerge as collective properties of systems having a large number of simple equivalent components (or neurons)."
Berti et al. observe, "Fast forward to the LLM era, notice how Hopfield's observations encompass all the computational tasks that LLMs can perform."
They discuss emergent abilities as in-context learning, defined as the "capability to generalise from a few examples to new tasks and concepts on which they have not been directly trained."
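To make this definition concrete, here is a schematic few-shot prompt in the style of the examples in the GPT-3 paper (Brown et al., 2020); the specific prompt below is only illustrative.

```python
# In-context learning via few-shot prompting: the model is never
# fine-tuned on translation, yet the pattern in the prompt alone can
# elicit the behaviour (illustrative, after Brown et al., 2020).
prompt = (
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)
# A sufficiently large model typically completes this with "fromage";
# small models generally fail to pick up the pattern.
```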
Here, I put this review in the broader context of the role of emergence in other areas of science.
Scales.
Simple scales that describe how large an LLM is include the amount of computation, the number of model parameters, and the size of the training dataset. More complicated measures of scale include the number of layers in a deep neural network and the complexity of the training tasks.
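To illustrate the first of these simple scales: a common rule of thumb from the scaling-law literature (Kaplan et al., 2020) estimates training compute as roughly six floating-point operations per parameter per training token. A minimal sketch (the example numbers correspond to a Chinchilla-scale model):

```python
# Back-of-the-envelope training compute via the C ~ 6*N*D rule of
# thumb: ~6 FLOPs per parameter (N) per training token (D).
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

# e.g., 70 billion parameters trained on 1.4 trillion tokens:
print(f"{training_flops(70e9, 1.4e12):.2e}")  # ~5.88e+23 FLOPs
```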
Berti et al. note that the emergence of new computational abilities does not just follow from increases in the simple scales but can be tied to the training process. I note that this subtlety is consistent with experience in biology. Simple scales would be the length of an amino acid chain in a protein or base pairs in a DNA molecule, the number of proteins in a cell or the number of cells in an organism. More subtle scales include the number of protein interactions in a proteome or gene networks in a cell. Deducing what the relevant scales are is non-trivial. Furthermore, as emphasised by Denis Noble and Robert Bishop, context matters, e.g., a protein may only have a specific function if it is located in a specific cell.
Novelty.
When they become sufficiently "large", LLMs have computational abilities that they were not explicitly designed for and that "small" versions do not have.
The emergent abilities range "from advanced reasoning and in-context learning to coding and problem-solving."
The original paper by Wei et al. listed 137 emergent abilities in an Appendix!
Berti et al. give another example.
"Chen et al. [15] introduced a novel framework called AgentVerse, designed to enable and study collaboration among multiple AI agents. Through these interactions, the framework reveals emergent behaviors such as spontaneous cooperation, competition, negotiation, and the development of innovative strategies that were not explicitly programmed."
An alternative to defining novelty by comparing the whole to its parts is to compare properties of the whole to those of a random configuration of the system. The performance of some LLMs is no better than random guessing until a critical threshold (e.g., in model size) is reached, at which point the emergent ability appears.
Discontinuities.
Are there quantitative objective measures that can be used to identify the emergence of a new computational ability? Researchers are struggling to find agreed-upon metrics that show clear discontinuities. That was the essential point of Are Emergent Abilities of Large Language Models a Mirage?
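The essence of that argument can be captured in a few lines. Suppose per-token accuracy improves smoothly with model size (the functional form below is my assumption, chosen purely for illustration). An all-or-nothing exact-match metric over a multi-token answer then appears to jump discontinuously, even though the underlying per-token metric does not:

```python
import numpy as np

# Sketch of the "mirage" argument (after Schaeffer et al., 2023).
N = np.logspace(6, 13, 8)          # model sizes (parameters)
p = np.exp(-(1e9 / N) ** 0.3)      # per-token accuracy: smooth in N (toy form)
k = 10                             # answer length in tokens

exact_match = p ** k               # all-or-nothing metric over k tokens
for n, pt, em in zip(N, p, exact_match):
    print(f"N={n:.0e}  per-token={pt:.3f}  exact-match={em:.3f}")
# The exact-match column stays near zero long after per-token accuracy
# has started rising, then climbs abruptly: an apparent "discontinuity"
# manufactured by the choice of metric.
```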
In condensed matter physics, the emergence of a new state of matter is (usually) associated with symmetry breaking and an order parameter. Figuring out what the relevant broken symmetry and order parameter are often requires brilliant insight and may even lead to a Nobel Prize (Néel, Josephson, Ginzburg, Leggett, ...). A similar argument can be made with respect to the development of the Standard Model of elementary particles and gauge fields. Furthermore, the discontinuities only exist in the thermodynamic limit (i.e., in the limit of an infinite system), and there are many subtleties associated with how the data from finite-size computer simulations should be plotted to show that the system really does exhibit a phase transition.
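The finite-size subtlety can be illustrated with the simplest solvable example, the Curie-Weiss (mean-field Ising) model, where the Boltzmann sum can be evaluated exactly for any N. In this toy sketch (my illustration, not from the survey), the order parameter <|m|> varies smoothly through the critical point betaJ = 1 at small N and only sharpens into a transition as N grows:

```python
from math import lgamma, exp

# Curie-Weiss model: the energy depends only on the total spin M,
# E(M) = -J*M^2/(2N), so the partition sum is over M alone.
def mean_abs_m(N: int, betaJ: float) -> float:
    logw, m_abs = [], []
    for n_up in range(N + 1):
        M = 2 * n_up - N  # total spin for n_up up-spins
        # log of (multiplicity x Boltzmann weight), to avoid overflow
        logw.append(lgamma(N + 1) - lgamma(n_up + 1) - lgamma(N - n_up + 1)
                    + betaJ * M * M / (2 * N))
        m_abs.append(abs(M) / N)
    shift = max(logw)
    w = [exp(x - shift) for x in logw]
    return sum(wi * mi for wi, mi in zip(w, m_abs)) / sum(w)

for N in (16, 64, 256, 1024):
    row = [mean_abs_m(N, b) for b in (0.8, 1.0, 1.2)]
    print(N, [f"{v:.3f}" for v in row])
# <|m|> at betaJ = 0.8 drifts towards zero as N grows, while at
# betaJ = 1.2 it settles near the mean-field value; the crossover
# between the two phases sharpens with increasing N.
```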
Unpredictability.
The observation of new computational abilities in LLMs was unanticipated and surprised many people, including the designers of the specific LLMs involved. This is similar to what happens in condensed matter physics, where new states of matter have mostly been discovered by serendipity.
Some authors seem surprised that it is difficult to predict emergent abilities. "While early scaling laws provided some insight, they often fail to anticipate discontinuous leaps in performance."
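For context, the "early scaling laws" referred to here are smooth power laws for the cross-entropy loss, such as the form Kaplan et al. (2020) reported for model size N (with fitted constants N_c and alpha_N ~ 0.076):

$$ L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N} $$

Being smooth and monotonic in N, a law of this form cannot, by construction, anticipate a discontinuous jump in a downstream ability; at best it predicts when a continuous proxy (the loss) crosses some level.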
Given the largely "black box" nature of LLMs, I don't find the unpredictability surprising. Prediction is hard even for condensed matter systems, which are much better characterised and understood.
Modular structures at the mesoscale.
Modularity is a common characteristic of emergence. In a wide range of systems, from physics to biology to economics, a key step in the development of the theory of a specific emergent phenomenon has been the identification of a mesoscale (intermediate between the micro- and macro-scales) at which modular structures emerge. These modules interact weakly with one another, and the whole system can be understood in these terms. Identification of these structures and the effective theories describing them has usually required brilliant insight. An example is the concept of quasiparticles in quantum many-body physics, pioneered by Landau.
Berti et al. do not mention the importance of this issue. However, they do mention that "functional modules emerge naturally during training" [Refs. 7, 43, 81, 84] and that "specialised circuits activate at certain scaling thresholds" [24].
Modularity may be related to an earlier post, Why do deep learning algorithms work so well? In the training process, a neural network rids noisy input data of extraneous details... There is a connection between the deep learning algorithm known as Geoffrey Hinton's "deep belief net" and renormalisation group methods (which can be key to identifying modularity and effective interactions).
Is emergence good or bad?
Undesirable and dangerous capabilities can emerge. Those observed include deception, manipulation, exploitation, and sycophancy.
These concerns parallel discussions in economics. Libertarians, the Austrian school, and Friedrich Hayek tend to see emergence as producing only socially desirable outcomes, such as the efficiency of free markets [the invisible hand of Adam Smith]. However, emergence also produces bubbles, crashes, and recessions.
Resistance to control.
A holy grail is the design, manipulation, and control of emergent properties. This ambitious goal is promoted in materials science, medicine, engineering, economics, public policy, business management, and social activism. However, it largely remains elusive, arguably due to the complexity and unpredictability of the systems of interest. Emergent properties of LLMs may turn out to offer similar hopes, frustrations, and disappointments. We should try, but have realistic expectations.