Friday, July 25, 2025

Reviewing emergent computational abilities in Large Language Models

Two years ago, I wrote a post about a paper by Wei et al., Emergent Abilities of Large Language Models.

Then last year, I posted about a paper Are Emergent Abilities of Large Language Models a Mirage? that criticised the first paper.

There is more to the story. The first paper has now been cited over 3,600 times. There is a helpful review of the state of the field.

Emergent Abilities in Large Language Models: A Survey

Leonardo Berti, Flavio Giorgi, Gjergji Kasneci

It begins with a discussion of what emergence is, quoting from Phil Anderson's More is Different article [which emphasised how new properties may appear when a system becomes large] and John Hopfield's Neural networks and physical systems with emergent collective computational abilities, which was the basis of his recent Nobel Prize. Hopfield stated

"Computational properties of use to biological organisms or the construction of computers can emerge as collective properties of systems having a large number of simple equivalent components (or neurons)."

Berti et al. observe, "Fast forward to the LLM era, notice how Hopfield's observations encompass all the computational tasks that LLMs can perform."

They discuss emergent abilities as in-context learning, defined as the "capability to generalise from a few examples to new tasks and concepts on which they have not been directly trained."

Here, I put this review in the broader context of the role of emergence in other areas of science.

Scales. 

Simple scales that describe how large an LLM is include the amount of computation, the number of model parameters, and the size of the training dataset. More complicated measures of scale include the number of layers in a deep neural network and the complexity of the training tasks.

Berti et al. note that the emergence of new computational abilities does not just follow from increases in the simple scales but can be tied to the training process. I note that this subtlety is consistent with experience in biology. Simple scales would be the length of an amino acid chain in a protein or base pairs in a DNA molecule, the number of proteins in a cell or the number of cells in an organism. More subtle scales include the number of protein interactions in a proteome or gene networks in a cell. Deducing what the relevant scales are is non-trivial. Furthermore, as emphasised by Denis Noble and Robert Bishop, context matters, e.g., a protein may only have a specific function if it is located in a specific cell.

Novelty. 

When they become sufficiently "large", LLMs have computational abilities that they were not explicitly designed for and that "small" versions do not have. 

The emergent abilities range "from advanced reasoning and in-context learning to coding and problem-solving."

The original paper by Wei et al. listed 137 emergent abilities in an Appendix!

Berti et al. give another example.

"Chen et al. [15] introduced a novel framework called AgentVerse, designed to enable and study collaboration among multiple AI agents. Through these interactions, the framework reveals emergent behaviors such as spontaneous cooperation, competition, negotiation, and the development of innovative strategies that were not explicitly programmed."

An alternative to defining novelty in terms of a comparison of the whole to the parts is to compare properties of the whole to those of a random configuration of the system. The performance of some LLMs is near-random (e.g., random guessing) until a critical threshold is reached (e.g., in size) when the emergent ability appears.

Discontinuities.

Are there quantitative objective measures that can be used to identify the emergence of a new computational ability? Researchers are struggling to find agreed-upon metrics that show clear discontinuities. That was the essential point of Are Emergent Abilities of Large Language Models a Mirage? 
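The dependence on the choice of metric can be illustrated with a toy calculation. The sketch below is not taken from either paper. It assumes a hypothetical model whose per-token accuracy rises smoothly with the logarithm of the parameter count (the logistic curve and its midpoint are made-up numbers) and assumes that tokens are correct independently of one another. Scoring the same model with an all-or-nothing exact-match metric on a ten-token answer then gives a curve that stays near zero and rises abruptly, even though nothing discontinuous happens underneath.

```python
import math


def per_token_accuracy(log10_params: float) -> float:
    """Hypothetical smooth curve: per-token accuracy versus log10(model size).

    The logistic form, the midpoint (1e9 parameters) and the width are
    illustrative assumptions, not fits to any real model.
    """
    return 1.0 / (1.0 + math.exp(-(log10_params - 9.0)))


def exact_match(p: float, answer_length: int) -> float:
    """Probability that every token of the answer is right.

    Assumes each token is correct independently with probability p.
    """
    return p ** answer_length


ANSWER_LENGTH = 10  # assumed length of the target answer, in tokens

for log10_params in range(7, 13):
    p = per_token_accuracy(log10_params)
    em = exact_match(p, ANSWER_LENGTH)
    print(f"1e{log10_params} params: per-token {p:.3f}, exact match {em:.3f}")
```

The smooth metric (per-token accuracy) and the sharp metric (exact match) describe the same hypothetical model, which is why an apparent discontinuity by itself is weak evidence for a qualitative change.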

In condensed matter physics, the emergence of a new state of matter is (usually) associated with symmetry breaking and an order parameter. Figuring out what the relevant broken symmetry and order parameter are often requires brilliant insight and may even lead to a Nobel Prize (Neel, Josephson, Ginzburg, Leggett, ...). A similar argument can be made with respect to the development of the Standard Model of elementary particles and gauge fields. Furthermore, the discontinuities only exist in the thermodynamic limit (i.e., in the limit of an infinite system), and there are many subtleties associated with how the data from finite-size computer simulations should be plotted to show that the system really does exhibit a phase transition.

Unpredictability.

The observation of new computational abilities in LLMs was unanticipated and surprised many people, including the designers of the specific LLMs involved. This is similar to what happens in condensed matter physics, where new states of matter have mostly been discovered by serendipity.

Some authors seem surprised that it is difficult to predict emergent abilities. "While early scaling laws provided some insight, they often fail to anticipate discontinuous leaps in performance."

Given the largely "black box" nature of LLMs, I don't find the unpredictability surprising. Prediction is hard for condensed matter systems, and they are much better characterised and understood.

Modular structures at the mesoscale.

Modularity is a common characteristic of emergence. In a wide range of systems, from physics to biology to economics, a key step in the development of the theory of a specific emergent phenomenon has been the identification of a mesoscale (intermediate between the micro- and macro-scales) at which modular structures emerge. These modules interact weakly with one another, and the whole system can be understood in these terms. Identification of these structures and the effective theories describing them has usually required brilliant insight. An example is the concept of quasiparticles in quantum many-body physics, pioneered by Landau.

Berti et al. do not mention the importance of this issue. However, they do mention that "functional modules emerge naturally during training" [Ref. 7,43,81,84] and that "specialised circuits activate at certain scaling thresholds [24]".

Modularity may be related to an earlier post, Why do deep learning algorithms work so well? In the training process, a neural network rids noisy input data of extraneous details...There is a connection between the deep learning algorithm, known as the "deep belief net" of Geoffrey Hinton, and renormalisation group methods (which can be key to identifying modularity and effective interactions).

Is emergence good or bad?

Undesirable and dangerous capabilities can emerge. Those observed include deception, manipulation, exploitation, and sycophancy.

These concerns parallel discussions in economics. Libertarians, the Austrian school, and Friedrich Hayek tend to see emergence as only producing socially desirable outcomes, such as the efficiency of free markets [the invisible hand of Adam Smith]. However, emergence also produces bubbles, crashes, and recessions.

Resistance to control

A holy grail is the design, manipulation, and control of emergent properties. This ambitious goal is promoted in materials science, medicine, engineering, economics, public policy, business management, and social activism. However, it largely remains elusive, arguably due to the complexity and unpredictability of the systems of interest. Emergent properties of LLMs may turn out to offer similar hopes, frustrations, and disappointments. We should try, but have realistic expectations.

Friday, July 18, 2025

Emergence in Chemistry

It is important to be clear what the system is. Most of chemistry is not really about isolated molecules. A significant amount of chemistry occurs in an environment, often within a solvent. Then the system is the chemicals of interest and the solvent. For example, when it is stated that HCl is an acid, this is not a reference to isolated HCl molecules but a solution of HCl in water, and then the HCl dissociates into H+ and Cl- ions. Chemical properties such as reactivity can change significantly depending on whether a compound is in the solid, liquid, or gas state, or on the properties of the solvent in which it is dissolved.

Scales

The time scales for processes, which range from molecular vibrations to chemical reactions, can vary from femtoseconds to days. Relevant energy scales, corresponding to different effective interactions, can vary from tens of eV (strong covalent bonds) to microwave energies of 0.1 meV (quantum tunnelling in an ammonia maser).

Other scales are the total number of atoms in a compound, which can range from two to millions, the total number of electrons, and the number of different chemical elements in the compound. As the number of atoms and electrons increases, so does the dimensionality of the Hilbert space of the corresponding quantum system.
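How quickly the Hilbert space grows can be made concrete with a simple count. The sketch below assumes a finite basis of spin-orbitals (an assumption, not something specified above): the number of ways to distribute N electrons among K spin-orbitals is the binomial coefficient C(K, N). The choice of four spin-orbitals per electron is arbitrary and only serves to let the basis grow with the size of the molecule.

```python
from math import comb


def n_electron_dimension(n_spin_orbitals: int, n_electrons: int) -> int:
    """Number of N-electron basis states (Slater determinants) in a finite basis.

    Each state corresponds to choosing which spin-orbitals are occupied.
    """
    return comb(n_spin_orbitals, n_electrons)


SPIN_ORBITALS_PER_ELECTRON = 4  # arbitrary illustrative assumption

for n_electrons in (2, 10, 20, 40):
    n_spin_orbitals = SPIN_ORBITALS_PER_ELECTRON * n_electrons
    dim = n_electron_dimension(n_spin_orbitals, n_electrons)
    print(f"{n_electrons:3d} electrons, {n_spin_orbitals:3d} spin-orbitals: "
          f"dimension ~ {dim:.3e}")
```

The count grows roughly exponentially with the number of electrons, which is one reason effective theories with far fewer degrees of freedom are so valuable in chemistry.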

Novelty

All chemical compounds are composed of a discrete number of atoms, usually of different types. For example, acetic acid, denoted CH3COOH, is composed of carbon, oxygen, and hydrogen atoms. The compound usually has chemical and physical properties that the individual atoms do not have.

Chemistry is all about transformation. Reactants combine to produce products, e.g. A + B -> C. C may have chemical or physical properties that A and B did not have.

Chemistry involves concepts that do not appear in physics. Roald Hoffmann argued that concepts such as acidity and basicity, aromaticity, functional groups, and substituent effects have great utility and are lost in a reductionist perspective that tries to define them precisely and mathematicise them.

Diversity

Chemistry is a wonderland of diversity, as it arranges chemical elements in a multitude of different ways that produce a plethora of phenomena. Much of organic chemistry just involves three different elements: carbon, oxygen, and hydrogen.

Molecular structure

Simple molecules (such as water, ammonia, carbon dioxide, methane, benzene) have a unique structure defined by fixed bond lengths and angles. In other words, there is a well-defined geometric structure that gives the locations of the centres of atomic nuclei. This is a classical entity. This emerges from the interactions between the electrons and nuclei of the constituent atoms.

In philosophical discussions of emergence in chemistry, molecular structure has received significant attention. Some claim it provides evidence of strong emergence. The arguments centre around the fact that the molecular structure is a classical entity and concept that is imposed, whereas a logically self-consistent approach would treat both electrons and nuclei quantum mechanically.

The molecular structure of ammonia (NH3) illustrates the issue. It has an umbrella structure which can be inverted. Classically, there are two possible degenerate structures. For an isolated molecule, quantum tunnelling back and forth between the two structures can occur. The ground state is a quantum superposition of two molecular structures. This tunnelling does occur in a dilute gas of ammonia at low temperature, and the associated quantum transition is the basis of the maser, the forerunner of the laser. This example of ammonia was discussed by Anderson at the beginning of his seminal More is Different article to illustrate how symmetry breaking leads to well-defined molecular structures in large molecules. 

Figure is taken from here.
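The tunnelling picture for ammonia can be sketched as a two-level system. The code below assumes a minimal model with basis states "left" and "right" for the two umbrella structures and a tunnelling matrix element t. It takes the splitting to be the 0.1 meV scale quoted earlier for the ammonia maser, not a precise spectroscopic value. The eigenstates are the symmetric and antisymmetric superpositions of the two classical structures, and the splitting 2t sets the transition frequency.

```python
import math

PLANCK_EV_S = 4.135667696e-15  # Planck constant in eV s


def two_level_eigenstates(e0: float, t: float):
    """Eigenpairs of H = [[e0, -t], [-t, e0]] in the (left, right) basis.

    Returns (energy, vector) for the ground and excited states.
    """
    s = 1.0 / math.sqrt(2.0)
    ground = (e0 - t, (s, s))     # symmetric superposition
    excited = (e0 + t, (s, -s))   # antisymmetric superposition
    return ground, excited


def apply_hamiltonian(e0: float, t: float, v):
    """Multiply the 2x2 Hamiltonian by the vector v, to check the eigenpairs."""
    return (e0 * v[0] - t * v[1], -t * v[0] + e0 * v[1])


SPLITTING_EV = 1.0e-4        # 0.1 meV, the energy scale quoted in the text
t = SPLITTING_EV / 2.0       # tunnelling matrix element (splitting = 2t)

(ground_e, ground_v), (excited_e, excited_v) = two_level_eigenstates(0.0, t)
frequency_hz = (excited_e - ground_e) / PLANCK_EV_S
print(f"ground state amplitudes (left, right): {ground_v}")
print(f"transition frequency: {frequency_hz / 1e9:.1f} GHz")
```

In this model neither eigenstate has a definite structure: both have equal weight on "left" and "right", which is the sense in which the ground state is a superposition of two molecular structures.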

Born-Oppenheimer approximation 

Without this concept, much of theoretical chemistry and condensed matter would be incredibly difficult. It is based on the separation of time and energy scales associated with electronic and nuclear motion. It is used to describe and understand the dynamics of nuclei and electronic transitions in solids and molecules. The potential energy surfaces for different electronic states define an effective theory for the nuclei.

Singularity. The Born-Oppenheimer approximation is justified by an asymptotic expansion in powers of (m/M)^(1/4), where m is the mass of an electron and M the mass of an atomic nucleus in the molecule. This has been discussed by Primas and Bishop.

The rotational and vibrational degrees of freedom of molecules also involve a separation of time and energy scales. Consequently, one can derive separate effective Hamiltonians for the vibrational and rotational degrees of freedom.
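To get a feel for the size of the expansion parameter, the sketch below evaluates kappa = (m/M)^(1/4) for a few light nuclei. The standard atomic masses are used in place of bare nuclear masses, which is an approximation that makes no difference at this level of precision. The accompanying estimate that vibrational energies are of order kappa^2 and rotational energies of order kappa^4 relative to electronic energies is the usual textbook scaling, included here as an order-of-magnitude guide only.

```python
ELECTRON_MASS_U = 5.48579909e-4  # electron mass in atomic mass units (u)

# Approximate standard atomic masses in u, used as stand-ins for nuclear masses.
ATOMIC_MASS_U = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def born_oppenheimer_parameter(nuclear_mass_u: float) -> float:
    """Expansion parameter kappa = (m/M)^(1/4) for a nucleus of mass M."""
    return (ELECTRON_MASS_U / nuclear_mass_u) ** 0.25


for symbol, mass_u in ATOMIC_MASS_U.items():
    kappa = born_oppenheimer_parameter(mass_u)
    # Rough textbook scaling relative to electronic energies:
    # vibrational ~ kappa^2, rotational ~ kappa^4.
    print(f"{symbol}: kappa = {kappa:.3f}, "
          f"kappa^2 = {kappa**2:.1e}, kappa^4 = {kappa**4:.1e}")
```

Note that kappa itself is not especially small (of order 0.1), even though the mass ratio m/M is tiny, so the energy scales are well separated without the expansion parameter being negligible.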

Qualitative difference with increase in molecular size

Consider the following series with varying chemical properties: formic acid (CH2O2), acetic acid (C2H4O2), propionic acid (C3H6O2), butyric acid (C4H8O2), and valerianic acid (C5H10O2), whose members involve the successive addition of a CH2 radical. The Marxist Friedrich Engels used these examples as evidence for Hegel’s law: “The law of transformation of quantity into quality and vice versa”.
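The purely quantitative side of this series is easy to tabulate. The sketch below generates the molecular formula CnH2nO2 for n = 1 to 5 and the corresponding molar mass, using approximate standard atomic masses. The function names are hypothetical helpers introduced for illustration.

```python
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # approximate, in g/mol

# Names as given in the text, in order of increasing chain length.
ACID_NAMES = ["formic", "acetic", "propionic", "butyric", "valerianic"]


def acid_composition(n_carbons: int) -> dict:
    """Atom counts for the n-th member of the series, CnH2nO2."""
    return {"C": n_carbons, "H": 2 * n_carbons, "O": 2}


def formula_string(composition: dict) -> str:
    """Write a composition as a formula, omitting a count of 1 (e.g. CH2O2)."""
    return "".join(
        element + (str(count) if count > 1 else "")
        for element, count in composition.items()
    )


def molar_mass(composition: dict) -> float:
    """Sum of atomic masses weighted by the atom counts."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


for n, name in enumerate(ACID_NAMES, start=1):
    comp = acid_composition(n)
    print(f"{name + ' acid':16s} {formula_string(comp):8s} {molar_mass(comp):7.3f} g/mol")
```

Each step adds the same CH2 increment to the formula and the mass. What changes qualitatively along the series is the chemistry, and that is not captured by this bookkeeping.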

In 1961, Platt discussed properties of large molecules that “might not have been anticipated” from properties of their chemical subgroups. Table 1 in Platt’s paper lists “Properties of molecules in the 5- to 50-range that have no counterpart in diatomics and many triatomics.” Table 2 lists “Properties of molecules in the 50- to 500-atom range and up that go beyond the properties of their chemical sub-groups.” The properties listed included internal conversion (i.e., non-radiative decay of excited electronic states), formation of micelles for hydrocarbon chains with more than ten carbons, the helix-coil transition in polymers, chromatographic or molecular sorting properties of polyelectrolytes such as those in ion-exchange resins, and the contractility of long chains.

Platt also discussed the problem of molecular self-replication. Until 1951, it was assumed that a machine could not reproduce itself, and this was the fundamental difference between machines and living systems. However, von Neumann showed that a machine with a sufficient number of parts and a sufficiently long list of instructions can reproduce itself. Platt pointed out that this suggested there is a threshold for autocatalysis: “this threshold marks an essentially discontinuous change in properties, and that fully-complex molecules larger than this size differ from all smaller ones in a property of central importance for biology.” Thus, self-replication is an emergent property. A modification of this idea has been pursued by Stuart Kauffman with regard to the origin of life: when a network of chemical reactions is sufficiently large, it becomes self-replicating.

Thursday, July 10, 2025

What Americans might want to know about getting a job in an Australian university

Universities and scientific research in the USA are facing a dire future. Understandably, some scientists are considering leaving the USA. I have had a few enquiries about Australia. This makes sense, as Australia is a stable English-speaking country with similarities in education, culture, democracy, and economics, at least compared to most other possible destinations. Nevertheless, there are important differences between Australia and the USA to be aware of, particularly when it comes down to how universities function (and dis-function!) and how they hire people.

A few people have asked me for advice. Below are some comparisons. Why should you believe me? I spent eleven years in the US (1983-1994) and visited at least once a year until 2018. On the other hand, there are some reasons to take what I say with a grain of salt. I have never been a faculty member in a US university. I retired four years ago from a faculty position in Australia. I actually haven't sat on a committee for almost ten years :). Hopefully, this post will prompt other readers to weigh in with other perspectives.

There are discussions in Australia about trying to attract senior people from the USA to come here. Whether that will come to anything substantial remains to be seen.

The best place to look for advertised positions is on Seek. 

Postdocs

This is where the news is best. Young people in the USA can apply for regular postdoc positions. Most are attached to specific grants and so involve working on a specific project. 

Ph.D. students

Most of the positions go to Australian citizens who get their own scholarship (fellowship) from the government. These are not tied to a grant or a supervisor (advisor). There are a few positions for international students, but not many. Usually, they go to applicants with a Master's degree and publications.

Ph.D.s are funded for 3 to 3.5 years. There is no required course work. Australian students have done a 4-year undergraduate degree and no Master's. This means tackling highly technical projects in theory is not realistic, except for exceptional students.

Faculty hiring is ad hoc

There is no hiring cycle. Positions tend to be advertised at random times depending on local politics, whims and bureaucracy. Universities and Schools (departments) claim they have strategic plans, but given fluctuations in funding, management, and government policy, positions appear and disappear at random. Typically, the Dean (and their lackeys), not the department, control the selection process, particularly for senior appointments. The emphasis is on metrics. Letters of reference are sometimes not even called for before shortlisting. Some hiring is done purely from online interviews and seminars.

Bias towards insiders 

People already in the Australian system know how to navigate it best. They may also already have a grant from the Australian Research Council, have done some teaching, and have received (positive) student evaluations. They are known quantities to the managers and so a safer bet than outsiders. If you want to get a junior faculty position here (a lectureship), your chances may be better if you first come as a postdoc. However, there are exceptions...

Current funding crunches

Unfortunately, I fear the faculty market may be quite cool for the next few years. Many universities are actually trying to sack (fire) people due to funding shortfalls. These budget crises are due to the aftermath of COVID, mismanagement, and the government trying to reduce international student numbers (due to the politics of a housing and cost-of-living crisis).

Australian Research Council

This is pretty much the sole source of funding in physics and chemistry. This is quite different to the USA, where there were (pre-Trump) numerous funding agencies (NSF, DOE, DOD, ...). The ARC is currently reviewing and redesigning all its programs, so we will have to wait to see how this may impact the prospects of scientific refugees from the USA. (It used to have quite good Fellowship schemes for all career stages that were an excellent avenue for foreigners to come here.) Some of my colleagues recommend following ARC Tracker on social media to be informed about the latest at the ARC.

Thirty years ago, I came back to Australia from the USA. I had a wonderful stint doing science, largely because of generous ARC funding. Unfortunately, the system has declined. But I am sure it is better than being in the USA right now.

There are many more things I could write about. Some have featured in previous rants about metrics and managerialism. Things to be aware of before accepting a job include faculty having little voice or power, student absenteeism, corrupt governance, and the absence of real tenure or sabbaticals.
