
Tuesday, February 24, 2026

Information theoretic measures for emergence and causality

The relationship between emergence and causation is contentious, with a long history. Most discussions are qualitative. Presented with a new system, how does one identify the microscopic and macroscopic scales that may be most useful for understanding and describing the system? Can Judea Pearl’s seminal ideas about causality be implemented practically for understanding emergence?

Broadly speaking, a weakness of discussions of emergence and causality is that it is hard to define these concepts in a rigorous and quantitative manner that makes them amenable to empirical testing, with respect to theoretical models and to experimental data. 

Fortunately, in the past decade, there have been some specific proposals to address this issue, mostly using information theory. A helpful recent review is by Yuan et al. 

“Two primary challenges take precedence in understanding emergence from a causal perspective. The first is establishing a quantitative definition of emergence, whereas the second involves identifying emergent behaviors or phenomena through data analysis.

To address the first challenge, two prominent quantitative theories of emergence have emerged in the past decade. The first is Erik Hoel et al.’s theory of causal emergence [19] whereas the second is Fernando E. Rosas et al.’s theory of emergence based on partial information decomposition [24].

Hoel et al.’s theory of causal emergence specifically addresses complex systems that are modeled using Markov chains. It employs the concept of effective information (EI) to quantify the extent of causal influence within Markov chains and enables comparisons of EI values across different scales [19,25]. Causal emergence is defined by the difference in the EI values between the macro-level and micro-level.”

One perspective on causal emergence is that it occurs when the dynamics of a system at the macro-level is described more efficiently by macro-variables than by the dynamics of variables from the micro-level.
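To make the comparison of EI across scales concrete, here is a minimal Python sketch (my own toy example; the four-state chain and the simple row-averaging coarse-graining are illustrative assumptions, not code from Hoel et al. or Yuan et al.). EI is computed as the mutual information between the current state, intervened on with the uniform (maximum-entropy) distribution, and the next state.

import numpy as np

def effective_information(tpm):
    # EI of a Markov chain: mutual information between the current state
    # (set by intervention to the uniform, maximum-entropy distribution) and the next state.
    tpm = np.asarray(tpm, dtype=float)
    n = tpm.shape[0]
    avg_effect = tpm.mean(axis=0)  # effect distribution averaged over all interventions
    ei = 0.0
    for row in tpm:
        mask = row > 0
        ei += np.sum(row[mask] * np.log2(row[mask] / avg_effect[mask])) / n
    return ei

# Toy micro-level chain: states 0-2 hop among themselves at random; state 3 is a fixed point.
micro = np.array([[1/3, 1/3, 1/3, 0.0],
                  [1/3, 1/3, 1/3, 0.0],
                  [1/3, 1/3, 1/3, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])

# One choice of coarse-graining: lump states {0, 1, 2} into A and {3} into B,
# averaging the transition probabilities within each group.
groups = [[0, 1, 2], [3]]
macro = np.array([[micro[np.ix_(g, h)].mean(axis=0).sum() for h in groups] for g in groups])

print("EI(micro) =", round(effective_information(micro), 3))  # about 0.81 bits
print("EI(macro) =", round(effective_information(macro), 3))  # 1.0 bit

Because the macro-level dynamics is deterministic while the micro-level dynamics is noisy, the coarse-grained chain has the higher EI, which is the signature of causal emergence in this framework.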

Klein et al. used Hoel’s information-theoretic measures of causal emergence to analyse protein interaction networks (interactomes) in over 1800 species, containing more than eight million protein–protein interactions, across different scales. They showed the emergence of ‘macroscales’ that are associated with lower noise and uncertainty. The nodes in the macroscale description of the network are more resilient than those in less coarse-grained descriptions. Greater causal emergence (i.e., a stronger macroscale description) was generally seen in multicellular organisms compared to single-cell organisms. The authors quantified causal emergence in terms of mutual information (between large and small scales) and effective information (a measure of the certainty in the connectivity of a network). Philip Ball (2023) (pages 218-220) gives an account of this work in terms of the emergence of multicellularity in biological evolution. He introduced the term causal spreading (pages 225-7), arguing that over the history of evolution the locus of causation has changed.

Yuan et al. continue

"However, in Hoel’s theory of causal emergence, it is essential to establish a coarse-graining strategy beforehand. Alternatively, the strategy can be derived by maximizing the effective information (EI) [19]. However, this task becomes challenging for large-scale systems due to the computational complexity involved. To address these problems, Rosas et al. introduced a new quantitative definition of causal emergence [24] that does not depend on coarse-graining methods, drawing from partial information decomposition (PID)-related theory. PID is an approach developed by Williams et al., which seeks to decompose the mutual information between a target and source variables into non-overlapping information atoms: unique, redundant, and synergistic information [29]…"

The Figure below is taken from Rosas et al. X_t^j (j = 1, …, n) are microscopic variables that define a Markov chain. V_t is a macroscopic variable that is completely determined by the microscopic variables.

“Diagram of causally emergent relationships. Causally emergent features have predictive power beyond individual components. Downward causation takes place when that predictive power refers to individual elements; causal decoupling when it refers to itself or other high-order features.”

Rosas et al. applied the method to specific systems, including Conway’s Game of Life, Reynolds’ flocking model, and neural activity as measured by electrocorticography. More recently, it was used to describe emergence in computer science, including the identification of modular structures. Calculations were performed for specific examples, including Ehrenfest’s urn model for diffusion, the Ising model with Glauber dynamics, and a Hopfield neural network model for associative memory.

Yuan et al. also state the following:

"The second challenge pertains to the identification of emergence from data. In an effort to address this issue, Rosas et al. derived a numerical method [24]. However, it is important to acknowledge that this method offers only a sufficient condition for emergence and is an approximate approach. Another limitation is that a coarse-grained macro-state variable should be given beforehand to apply this method."

Rosas et al. recently stated

“Empirical applications of this framework to study emergence … including the study of gene regulatory networks [22], the dynamics of the human brain [23], the internal dynamics of reservoir computing [24], and the formation of useful internal representations in machine learning [25].”

Yuan et al. also discuss two significant connections between causal emergence and machine learning. First, machine learning can be used to improve calculations of causal emergence. Second, causal emergence measures can be used to better understand how machine learning works and improve it.

The work described above built on earlier work by Crutchfield, who claimed that the identification of emergence and hierarchies could be made operational, stating that “different scales are delineated by a succession of divergences in statistical complexity at lower levels.” More recently, Rupe and Crutchfield have reported progress towards identifying emergent self-organisation in a system.

Although this work on quantitative measures of emergence based on information theory represents significant progress, there are many open problems. Examples include the extension to non-Markovian systems and the development of computationally feasible methods for large systems. The latter is particularly important in physical systems where spontaneous symmetry breaking occurs, as this only happens in the thermodynamic limit of an infinite system.

There is an unrecognised similarity between the work described above and techniques recently developed to characterise phase transitions in statistical mechanics models such as the Ising model and classical dimer models. Coarse-graining (CG) is optimised by maximising the Real-Space Mutual Information (RSMI) between a spatial block and its distant environment. 

In general, maximising mutual information is notoriously hard but can be done using state-of-the-art machine learning algorithms. Gokmen et al. have developed an algorithm that they claim “can, unsupervised, construct order parameters, locate phase transitions, and identify spatial correlations and symmetries for complex and large-dimensional real-space data.” Furthermore, the optimal CG explicitly identifies the scaling operators associated with the critical point. 
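As a much simpler illustration of the underlying idea (my own one-dimensional toy, not the machine-learning algorithm of Gokmen et al.), one can compare candidate coarse-grainings of a two-spin block in a correlated spin chain by how much mutual information each retains about a distant "environment" spin. Keeping a single spin preserves the long-range correlation; keeping only the bond (parity) variable discards it.

import numpy as np
from collections import Counter

def mutual_information(samples_a, samples_b):
    # Plug-in estimate of I(A;B) in bits from paired samples.
    n = len(samples_a)
    joint = Counter(zip(samples_a, samples_b))
    pa = Counter(samples_a)
    pb = Counter(samples_b)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * np.log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi

# Sample a 1D Ising-like chain: each spin copies its left neighbour with probability p.
rng = np.random.default_rng(0)
p, n_sites, n_samples = 0.9, 6, 200_000
chains = np.empty((n_samples, n_sites), dtype=int)
chains[:, 0] = rng.choice([-1, 1], size=n_samples)
for i in range(1, n_sites):
    copy = rng.random(n_samples) < p
    chains[:, i] = np.where(copy, chains[:, i - 1], -chains[:, i - 1])

env = chains[:, 3]                    # distant environment spin (site 2 acts as a buffer)
keep_first = chains[:, 0]             # coarse-graining 1: keep one spin of the block
parity = chains[:, 0] * chains[:, 1]  # coarse-graining 2: keep only the bond (parity) variable

print("I(single spin; environment) =", round(mutual_information(keep_first, env), 4))  # clearly positive
print("I(parity; environment)      =", round(mutual_information(parity, env), 4))      # essentially zero

Maximising the retained mutual information therefore selects the first coarse-graining, which is the RSMI logic in miniature.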

The classical dimer model provides a stringent test as “the relevant low-energy degrees of freedom are profoundly different from the microscopic building blocks of the theory and change qualitatively throughout the phase diagram.” In other words, the emergent entities (quasiparticles such as vortices associated with the height field, which is described by a sine-Gordon field theory) are different from the dimers.

It is encouraging to see that two different scientific communities have developed similar ideas to address this challenging problem of making discussions about emergence and causality more concrete and quantitative.

Saturday, October 25, 2025

Can AI solve quantum many-body problems?

I find it difficult to wade through all the hype about AI, along with the anecdotes about its failure to reliably answer basic questions.

Gerard Milburn kindly brought to my attention a nice paper that systematically addresses whether AI is useful as an aid (research assistant) for solving basic (but difficult) problems that condensed matter theorists care about.

CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers

The abstract is below.

My only comment is one of perspective. Is the cup half full or half empty? Do we emphasise the failures or the successes?

The optimists among us will claim that the success in solving some of these difficult problems shows the power and potential of AI. It is just a matter of time before LLMs can solve most of these problems, and we will see dramatic increases in research productivity (e.g., a reduction in the time taken to complete a project).

The pessimists and skeptically oriented will claim that the failures highlight the limitations of AI, particularly when training data sets are small. We are still a long way from replacing graduate students with AI bots (or at least using AI to train students in the first year of their PhD).

What do you think? Should this study lead to optimism, pessimism, or just wait and see?

----------

Large language models (LLMs) have shown remarkable progress in coding and math problem-solving, but evaluation on advanced research-level problems in hard sciences remains scarce. To fill this gap, we present CMT-Benchmark, a dataset of 50 problems covering condensed matter theory (CMT) at the level of an expert researcher. Topics span analytical and computational approaches in quantum many-body, and classical statistical mechanics. The dataset was designed and verified by a panel of expert researchers from around the world. We built the dataset through a collaborative environment that challenges the panel to write and refine problems they would want a research assistant to solve, including Hartree-Fock, exact diagonalization, quantum/variational Monte Carlo, density matrix renormalization group (DMRG), quantum/classical statistical mechanics, and model building. We evaluate LLMs by programmatically checking solutions against expert-supplied ground truth. We developed machine-grading, including symbolic handling of non-commuting operators via normal ordering. They generalize across tasks too. Our evaluations show that frontier models struggle with all of the problems in the dataset, highlighting a gap in the physical reasoning skills of current LLMs. Notably, experts identified strategies for creating increasingly difficult problems by interacting with the LLMs and exploiting common failure modes. The best model, GPT5, solves 30% of the problems; average across 17 models (GPT, Gemini, Claude, DeepSeek, Llama) is 11.4±2.1%. Moreover, 18 problems are solved by none of the 17 models, and 26 by at most one. These unsolved problems span Quantum Monte Carlo, Variational Monte Carlo, and DMRG. Answers sometimes violate fundamental symmetries or have unphysical scaling dimensions. We believe this benchmark will guide development toward capable AI research assistants and tutors.

Monday, October 20, 2025

Undergraduates need to learn about the Ising model

A typical undergraduate course on statistical mechanics is arguably misleading because it (unintentionally) fails to tell students several important things (related to one another).

Statistical mechanics is not just about how to calculate thermodynamic properties of a collection of non-interacting particles.

A hundred years ago, many physicists did not believe that statistical mechanics could describe phase transitions. Arguably, this lingering doubt only ended fifty years ago with Wilson's development of renormalisation group theory.

It is about emergence: how microscopic properties are related to macroscopic properties.

Leo Kadanoff commented, "Starting around 1925, a change occurred: With the work of Ising, statistical mechanics began to be used to describe the behaviour of many particles at once."

When I came to UQ 25 years ago, I taught PHYS3020 Statistical Mechanics a couple of times. To my shame, I never discussed the Ising model. There is a nice section on it in the course textbook, An Introduction to Thermal Physics, by Daniel Schroeder. I guess I did not think there was time to "fit it in", and back then I did not appreciate how important the Ising model is. This was a mistake.

Things have changed for the better due to my colleagues Peter Jacobson and Karen Kheruntsyan. They now include one lecture on the model, and students complete a computational assignment in which they write a Monte Carlo code to simulate the model.

This year, I am giving the lecture on the model. Here are my slides and what I will write on the whiteboard or document viewer in the lecture.
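For readers who have not written one, here is a minimal sketch of the kind of Metropolis Monte Carlo code the assignment asks for (a bare-bones Python illustration with J = 1 and k_B = 1; it is not the actual assignment code).

import numpy as np

def metropolis_ising(L=32, T=2.27, n_sweeps=2000, seed=None):
    # Metropolis Monte Carlo for the 2D Ising model on an L x L periodic lattice.
    # Returns the magnetisation per spin after each sweep.
    rng = np.random.default_rng(seed)
    spins = rng.choice([-1, 1], size=(L, L))
    mags = []
    for _ in range(n_sweeps):
        for _ in range(L * L):  # one sweep = L*L attempted single-spin flips
            i, j = rng.integers(0, L, size=2)
            # Sum of the four nearest neighbours with periodic boundary conditions.
            nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                  + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2.0 * spins[i, j] * nn  # energy cost of flipping spin (i, j)
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                spins[i, j] *= -1
        mags.append(spins.mean())
    return np.array(mags)

m = metropolis_ising(L=16, T=1.5, n_sweeps=1000, seed=0)
print("average |m| over the last 100 sweeps:", np.abs(m[-100:]).mean())

Run well below the critical temperature T_c ≈ 2.27, the magnetisation settles near ±1 (sometimes after lingering in long-lived domain configurations); run well above it, the magnetisation fluctuates around zero.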

Wednesday, August 13, 2025

My review article on emergence

I just posted on the arXiv a long review article on emergence

Emergence: from physics to biology, sociology, and computer science

The abstract is below.

I welcome feedback. 

------

Many systems of interest to scientists involve a large number of interacting parts and the whole system can have properties that the individual parts do not. The system is qualitatively different to its parts. More is different. I take this novelty as the defining characteristic of an emergent property. Many other characteristics that have been associated with emergence are reviewed, including universality, order, complexity, unpredictability, irreducibility, diversity, self-organisation, discontinuities, and singularities. However, it has not been established whether these characteristics are necessary or sufficient for novelty. A wide range of examples are given to show how emergent phenomena are ubiquitous across most sub-fields of physics and many areas of biology and social sciences. Emergence is central to many of the biggest scientific and societal challenges today. Emergence can be understood in terms of scales (energy, time, length, complexity) and the associated stratification of reality. At each stratum (level) there is a distinct ontology (properties, phenomena, processes, entities, and effective interactions) and epistemology (theories, concepts, models, and methods). This stratification of reality leads to semi-autonomous scientific disciplines and sub-disciplines. A common challenge is understanding the relationship between emergent properties observed at the macroscopic scale (the whole system) and what is known about the microscopic scale: the components and their interactions. A key and profound insight is to identify a relevant emergent mesoscopic scale (i.e., a scale intermediate between the macro- and micro-scales) at which new entities emerge and interact with one another weakly. In different words, modular structures may emerge at the mesoscale. Key theoretical methods are the development and study of effective theories and toy models. Effective theories describe phenomena at a particular scale and sometimes can be derived from more microscopic descriptions. Toy models involve minimal degrees of freedom, interactions, and parameters. Toy models are amenable to analytical and computational analysis and may reveal the minimal requirements for an emergent property to occur. The Ising model is an emblematic toy model that elucidates not just critical phenomena but also key characteristics of emergence. Many examples are given from condensed matter physics to illustrate the characteristics of emergence. A wide range of areas of physics are discussed, including chaotic dynamical systems, fluid dynamics, nuclear physics, and quantum gravity. The ubiquity of emergence in other fields is illustrated by neural networks, protein folding, and social segregation. An emergent perspective matters for scientific strategy, as it shapes questions, choice of research methodologies, priorities, and allocation of resources. Finally, the elusive goal of the design and control of emergent properties is considered.

Saturday, August 2, 2025

Science job openings in sunny Brisbane, Australia

Bribie Island, just north of Brisbane.

The University of Queensland has just advertised several jobs that may be of interest to readers of this blog, particularly those seeking to flee the USA.

There is a junior faculty position for a theorist working at the interface of condensed matter, quantum chemistry, and quantum computing.

There is also a postdoc to work on the theory of strongly correlated electron systems with my colleagues Ben Powell and Carla Verdi.

There is a postdoc in experimental condensed matter, to work on scanning probe methods, such as STM, with my colleague Peter Jacobson.

Glasshouse Mountains. Just north of Brisbane.

Friday, July 25, 2025

Reviewing emergent computational abilities in Large Language Models

Two years ago, I wrote a post about a paper by Wei et al., Emergent Abilities of Large Language Models

Then last year, I posted about a paper Are Emergent Abilities of Large Language Models a Mirage? that criticised the first paper.

There is more to the story. The first paper has now been cited over 3,600 times. There is a helpful review of the state of the field.

Emergent Abilities in Large Language Models: A Survey

Leonardo Berti, Flavio Giorgi, Gjergji Kasneci

It begins with a discussion of what emergence is, quoting from Phil Anderson's More is Different article [which emphasised how new properties may appear when a system becomes large] and John Hopfield's Neural networks and physical systems with emergent collective computational abilities, which was the basis of his recent Nobel Prize. Hopfield stated

"Computational properties of use to biological organisms or the construction of computers can emerge as collective properties of systems having a large number of simple equivalent components (or neurons)."

Berti et al. observe, "Fast forward to the LLM era, notice how Hopfield's observations encompass all the computational tasks that LLMs can perform."

They discuss emergent abilities such as in-context learning, defined as the "capability to generalise from a few examples to new tasks and concepts on which they have not been directly trained."

Here, I put this review in the broader context of the role of emergence in other areas of science.

Scales. 

Simple scales that describe how large an LLM is include the amount of computation, the number of model parameters, and the size of the training dataset. More complicated measures of scale include the number of layers in a deep neural network and the complexity of the training tasks.

Berti et al. note that the emergence of new computational abilities does not just follow from increases in the simple scales but can be tied to the training process. I note that this subtlety is consistent with experience in biology. Simple scales would be the length of an amino acid chain in a protein or the number of base pairs in a DNA molecule, the number of proteins in a cell, or the number of cells in an organism. More subtle scales include the number of protein interactions in a proteome or gene networks in a cell. Deducing what the relevant scales are is non-trivial. Furthermore, as emphasised by Denis Noble and Robert Bishop, context matters, e.g., a protein may only have a specific function if it is located in a specific cell.

Novelty. 

When they become sufficiently "large", LLMs have computational abilities that they were not explicitly designed for and that "small" versions do not have. 

The emergent abilities range "from advanced reasoning and in-context learning to coding and problem-solving."

The original paper by Wei et al. listed 137 emergent abilities in an Appendix!

Berti et al. give another example.

"Chen et al. [15] introduced a novel framework called AgentVerse, designed to enable and study collaboration among multiple AI agents. Through these interactions, the framework reveals emergent behaviors such as spontaneous cooperation, competition, negotiation, and the development of innovative strategies that were not explicitly programmed."

An alternative to defining novelty in terms of a comparison of the whole to the parts is to compare properties of the whole to those of a random configuration of the system. The performance of some LLMs is near-random (e.g., random guessing) until a critical threshold is reached (e.g., in size) when the emergent ability appears.

Discontinuities.

Are there quantitative objective measures that can be used to identify the emergence of a new computational ability? Researchers are struggling to find agreed-upon metrics that show clear discontinuities. That was the essential point of Are Emergent Abilities of Large Language Models a Mirage? 

In condensed matter physics, the emergence of a new state of matter is (usually) associated with symmetry breaking and an order parameter. Figuring out what the relevant broken symmetry and order parameter are often requires brilliant insight and may even lead to a Nobel Prize (Néel, Josephson, Ginzburg, Leggett, ...). A similar argument can be made with respect to the development of the Standard Model of elementary particles and gauge fields. Furthermore, the discontinuities only exist in the thermodynamic limit (i.e., in the limit of an infinite system), and there are many subtleties associated with how the data from finite-size computer simulations should be plotted to show that the system really does exhibit a phase transition.
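For example, in standard notation the magnetisation per spin of an L x L lattice near a continuous transition is expected to obey the finite-size scaling form m(T, L) ≈ L^(-β/ν) f((T - T_c) L^(1/ν)), so data for different system sizes collapse onto a single scaling function only when T_c and the critical exponents β and ν are chosen correctly; a true discontinuity or divergence never appears at any finite L.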

Unpredictability.

The observation of new computational abilities in LLMs was unanticipated and surprised many people, including the designers of the specific LLMs involved. This is similar to what happens in condensed matter physics, where new states of matter have mostly been discovered by serendipity.

Some authors seem surprised that it is difficult to predict emergent abilities. "While early scaling laws provided some insight, they often fail to anticipate discontinuous leaps in performance."

Given the largely "black box" nature of LLMs, I don't find the unpredictability surprising. Prediction is hard even for condensed matter systems, and they are much better characterised and understood.

Modular structures at the mesoscale.

Modularity is a common characteristic of emergence. In a wide range of systems, from physics to biology to economics, a key step in the development of the theory of a specific emergent phenomenon has been the identification of a mesoscale (intermediate between the micro- and macro-scales) at which modular structures emerge. These modules interact weakly with one another, and the whole system can be understood in these terms. Identification of these structures and the effective theories describing them has usually required brilliant insight. An example is the concept of quasiparticles in quantum many-body physics, pioneered by Landau.

Berti et al. do not mention the importance of this issue. However, they do mention that "functional modules emerge naturally during training" [Ref. 7,43,81,84] and that "specialised circuits activate at certain scaling thresholds [24]".

Modularity may be related to an earlier post, Why do deep learning algorithms work so well? In the training process, a neural network rids noisy input data of extraneous details... There is a connection between Geoffrey Hinton's deep learning algorithm, known as the "deep belief net", and renormalisation group methods (which can be key to identifying modularity and effective interactions).

Is emergence good or bad?

Undesirable and dangerous capabilities can emerge. Those observed include deception, manipulation, exploitation, and sycophancy.

These concerns parallel discussions in economics. Libertarians, the Austrian school, and Friedrich Hayek tend to see emergence as only producing socially desirable outcomes, such as the efficiency of free markets [the invisible hand of Adam Smith]. However, emergence also produces bubbles, crashes, and recessions.

Resistance to control

A holy grail is the design, manipulation, and control of emergent properties. This ambitious goal is promoted in materials science, medicine, engineering, economics, public policy, business management, and social activism. However, it largely remains elusive, arguably due to the complexity and unpredictability of the systems of interest. Emergent properties of LLMs may turn out to offer similar hopes, frustrations, and disappointments. We should try, but have realistic expectations.

Toy models.

This is not discussed in the review. As I have argued before, a key to understanding a specific emergent phenomenon is the development of toy models that illustrate the phenomenon and the possible essential ingredients for it to occur. The following paper may be a step in that direction.

An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem

Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Chris Mingard, Ard A. Louis

In a similar vein, another possibly relevant paper is the review

Statistical Mechanics of Deep Learning

Yasaman Bahri, Jonathan Kadmon, Jeffrey Pennington, Sam S. Schoenholz, Jascha Sohl-Dickstein, and Surya Ganguli

They consider a toy model for the error landscape of a neural network and show that the error function for a deep neural net of depth D corresponds to the energy function of a D-spin spherical spin glass [Section 3.2 of their paper].
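For reference, in standard notation (mine, not necessarily theirs), the spherical p-spin glass with p = D has energy H(s) = - Σ_{i1<...<iD} J_{i1...iD} s_{i1}...s_{iD}, where the couplings J are independent Gaussian random variables and the N "spins" satisfy the spherical constraint Σ_i s_i^2 = N; the many local minima of this energy landscape are what the analogy identifies with the critical points of the deep network's error surface.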

Tuesday, October 22, 2024

Colloquium on 2024 Nobel Prizes


This Friday I am giving a colloquium for the UQ Physics department.

2024 Nobel Prizes in Physics and Chemistry: from biological physics to artificial intelligence and back

The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.” Half of the 2024 Chemistry prize was awarded to Demis Hassabis and John Jumper for “protein structure prediction” using artificial intelligence. I will describe the physics background needed to appreciate the significance of the awardees’ work.

Hopfield proposed a simple theoretical model for how networks of neurons in a brain can store and recall memories. Hopfield drew on his background in and ideas from condensed matter physics, including the theory of spin glasses, the subject of the 2021 Physics Nobel Prize.

Hinton, a computer scientist, generalised Hopfield’s model, using ideas from statistical physics to propose a “Boltzmann machine” that used an artificial neural network to learn to identify patterns in data, by being trained on a finite set of examples. 

For fifty years, scientists have struggled with the following challenge in biochemistry: given the unique sequence of amino acids that makes up a particular protein, can the native structure of the protein be predicted? Hassabis, a computer scientist, and Jumper, a theoretical chemist, used AI methods to solve this problem, highlighting the power of AI in scientific research.

I will briefly consider some issues these awards raise, including the blurring of boundaries between scientific disciplines, tensions between public and corporate interests, research driven by curiosity versus technological advance, and the limits of AI in scientific research.

Here is my current draft of the slides.

Saturday, October 12, 2024

2024 Nobel Prize in Physics

I was happy to see that John Hopfield was awarded the Nobel Prize in Physics for his work on neural networks. The award is based on this paper from 1982

Neural networks and physical systems with emergent collective computational abilities

One thing I find beautiful about the paper is how Hopfield drew on ideas about spin glasses (many competing interactions lead to many ground states and a complex energy landscape).

A central insight is that an efficient way to store the information describing multiple objects (different collective spin states of an Ising model) is in terms of the inter-spin interaction constants (the J_ij's). These are the "weights" that are trained/learned in computer neural nets.
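A minimal sketch of the idea in Python (my own illustration, not Hopfield's original formulation or code): the Hebbian rule builds the J_ij's from the stored patterns, and zero-temperature dynamics then recalls a memory from a corrupted version of it.

import numpy as np

def train_hopfield(patterns):
    # Hebbian learning: the coupling matrix J_ij is built from the stored patterns,
    # playing the role of the inter-spin couplings of an Ising-like model.
    n = patterns.shape[1]
    J = (patterns.T @ patterns) / n
    np.fill_diagonal(J, 0.0)  # no self-coupling
    return J

def recall(J, state, n_sweeps=10, seed=None):
    # Zero-temperature asynchronous dynamics: each spin aligns with its local field.
    rng = np.random.default_rng(seed)
    s = state.copy()
    for _ in range(n_sweeps):
        for i in rng.permutation(len(s)):
            s[i] = 1 if J[i] @ s >= 0 else -1
    return s

rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(3, 100))  # three random 100-spin "memories"
J = train_hopfield(patterns)

# Corrupt 15 spins of the first memory and let the network relax.
probe = patterns[0].copy()
flipped = rng.choice(100, size=15, replace=False)
probe[flipped] *= -1
retrieved = recall(J, probe, seed=1)
print("overlap with the stored memory:", (retrieved @ patterns[0]) / 100)

With three random patterns stored in 100 spins (well below the storage capacity of roughly 0.14 N), the corrupted probe relaxes back to the stored memory and the printed overlap is close to 1.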

It should be noted that Hopfield's motivation was not at all to contribute to computer science. It was to understand a problem in biological physics: what is the physical basis for associative memory? 

I have mixed feelings about Geoffrey Hinton sharing the prize. On the one hand, in his initial work, Hinton used physics ideas (Boltzmann weights) to extend Hopfield's ideas so that they were useful in computer science. Basically, Hopfield considered a spin glass model at zero temperature and Hinton considered it at non-zero temperature. [Note: the temperature is not physical; it is just a parameter in a Boltzmann probability distribution for different states of the neural network.] Hinton certainly deserves lots of prizes, but I am not sure a physics one is appropriate. His work on AI has certainly been helpful for physics research. But so have lots of other advances in computer software and hardware, and those pioneers did not receive a prize.

I feel a bit like I did with Jack Kilby getting a physics prize for his work on integrated circuits. I feel that sometimes the Nobel Committee just wants to remind the world how physics is so relevant to modern technology.

Ten years ago, Hopfield wrote a nice scientific autobiography for the Annual Review of Condensed Matter Physics,

Whatever Happened to Solid State Physics?

After the 2021 Physics Nobel to Parisi, I reflected on the legacy of spin glasses, including the work of Hopfield.

Aside: I once pondered whether a chemist will ever win the Physics prize, given that many condensed matter physicists have won the chemistry prize. Well now, we have had an electronic engineer and a computer scientist winning the Physics prize.

Another aside: I think calling Hinton's network a Boltzmann machine is a scientific misnomer. I should add this to my list of people getting credit for things they did not do. Boltzmann never considered networks, spin glasses, or computer algorithms. Boltzmann was a genius, but I don't think we should be attaching his name to everything that involves a Boltzmann distribution. To me, this is a bit like calling the Metropolis algorithm for Monte Carlo simulations the Boltzmann algorithm.

Thursday, September 26, 2024

The multi-faceted character of emergence (part 2)

In the previous post, I considered five different characteristics that are often associated with emergence and classified them as being associated with ontology (what is real and observable) rather than epistemology (what we believe to be true). 

Below I consider five more characteristics: self-organisation, unpredictability, irreducibility, contextuality and downward causation, and intra-stratum closure.

6. Self-organisation

Self-organisation is not a property of the system but a mechanism that a theorist says causes an emergent property to come into being. Self-organisation is also referred to as spontaneous order. 

In the social sciences self-organisation is sometimes referred to as an endogenous cause, in contrast to an exogenous cause. There is no external force or agent causing the order, in contrast to order that is imposed externally. For example, suppose that in a city there is no government policy about the price of a loaf of sliced wholemeal bread or about how many loaves bakers should produce. It is observed that prices are almost always in the range of $4 to $5 per loaf, and that rarely are there bread shortages. This outcome is a result of the self-organisation of the free market, and economists would say the price range and its stability have an endogenous cause. In contrast, if the government legislated the price range and the production levels, that would be an exogenous cause. Friedrich Hayek emphasised the role of spontaneous order in economics. In biology, Stuart Kauffman equates emergence with spontaneous order and self-organisation.

In physics, the periodicity of the arrangement of atoms in a crystal is a result of self-organisation and has an endogenous cause. In contrast, the periodicity of atoms in an optical lattice is determined by the laser physicist who creates the lattice and so has an exogenous cause.

Self-organisation shows how local interactions can produce global properties. In different words, short-range interactions can lead to long-range order. After decades of debate and study, the Ising model showed that this was possible. Other examples of self-organisation include the flocking of birds and teamwork in ant colonies. There is no director or leader, but the system acts “as if” there is.

7. Unpredictability

Ernst Mayr (This is Biology, p.19) defines emergence as “in a structured system, new properties emerge at higher levels of integration that could not have been predicted from a knowledge of the lower-level components.” Philip Ball also defines emergence in terms of unpredictability (Quanta, 2024).

More broadly, in discussions of emergence, “prediction” is used in three different senses: logical prediction, historical prediction, and dynamical prediction.

Logical prediction (deduction) concerns whether one can predict (calculate) the emergent (novel) property of the whole system solely from a knowledge of all the properties of the parts of the system and their interactions. Logical predictability is one of the most contested characteristics of emergence. Sometimes “predict” is replaced with “difficult to predict”, “extremely difficult to predict”, “impossible to predict”, “almost impossible to predict”, or “possible in principle, but impossible in practice, to predict.” 

As an aside, I note that philosophers distinguish between epistemological emergence and ontological emergence. They are associated with prediction that is "possible in principle, but difficult in practice" and "impossible in principle" respectively.

After an emergent property has been discovered experimentally, it can sometimes be understood in terms of the properties of the system parts. In a sense “pre-diction” then becomes “post-diction.” An example is the BCS theory of superconductivity, which provided a posteriori, rather than a priori, understanding. In different words, development of the theory was guided by a knowledge of the phenomena that had already been observed and characterised experimentally. Thus, a keyword in the statement above about logical prediction is “solely”.

Historical prediction. Most new states of matter discovered by experimentalists were not predicted even though theorists knew the laws that the microscopic components of the system obeyed. Examples include superconductivity (elemental metals, cuprates, iron pnictides, organic charge transfer salts, …), superfluidity in liquid 4He, antiferromagnetism, quasicrystals, and the integer and fractional quantum Hall states.

There are a few exceptions where theorists did predict new states of matter. These include Bose-Einstein condensates (BECs) in dilute atomic gases, topological insulators, the Anderson insulator in disordered metals, the Haldane phase in integer-spin quantum antiferromagnetic chains, and the hexatic phase in two dimensions. It should be noted that the predictions of BECs and topological insulators were significantly helped by the fact that theorists could start with Hamiltonians of non-interacting particles. Furthermore, all of these predictions involved working with effective Hamiltonians. None started with microscopic Hamiltonians for specific materials.

Dynamical unpredictability concerns chaotic dynamical systems, where it arises from sensitivity to initial conditions. I do not see this as an example of emergence as it can occur in systems with only a few degrees of freedom. However, some authors do associate dynamical unpredictability with complexity and emergence.

8. Irreducibility and singularities

An emergent property cannot be reduced to properties of the parts, because if emergence is defined in terms of novelty, the parts do not have the property. 

Emergence is also associated with the problem of theory reduction. Formally, this is the process where a more general theory reduces in a particular mathematical limit to a less general theory. For example, quantum mechanics reduces to classical mechanics in the limit where Planck’s constant goes to zero. Einstein’s theory of special relativity reduces to Newtonian mechanics in the limit where the speeds of massive objects become much less than the speed of light. Theory reduction is a subtle philosophical problem that is arguably poorly understood both by scientists [who oversimplify or trivialise it] and philosophers [who arguably overstate the problems it presents for science producing reliable knowledge]. Subtleties arise because the two different theories usually involve language and concepts that are "incommensurate" with one another. 

Irreducibility is also related to the discontinuities and singularities associated with emergent phenomena. As emphasised independently by Hans Primas and Michael Berry, singularities occur because the mathematics of theory reduction involves singular asymptotic expansions. Primas illustrates this by considering a light wave incident on an object and producing a shadow. The shadow is an emergent property, well described by geometrical optics, but not by the more fundamental theory of Maxwell’s electromagnetism. The two theories are related in the asymptotic limit that the wavelength of light in Maxwell’s theory tends to zero. This example illustrates that theory reduction is compatible with the emergence of novelty. Primas also considers how the Born-Oppenheimer approximation, which is central to solid state theory and quantum chemistry, is associated with a singular asymptotic expansion (in the ratio of the mass of an electron to the mass of the atomic nuclei in the system).

Berry considers several other examples of theory reduction, including going from general to special relativity, from statistical mechanics to thermodynamics, and from viscous (Navier-Stokes) fluid dynamics to inviscid (Euler) fluid dynamics. He has discussed in detail how the caustics that occur in ray optics are an emergent phenomenon and are associated with singular asymptotic expansions in the wave theory.

The philosopher of science Jeremy Butterfield showed rigorously that theory reduction occurred for four specific systems that exhibited emergence, defined by him as a novel and robust property. Thus, novelty is not sufficient for irreducibility.

9. Contextuality and downward causation

Any real system has a context. For example, it has a boundary and an environment, both in time and space. In many cases the properties of the system are completely determined by the parts of the system and their interactions. Previous history and boundaries do not matter. However, in some cases the context may have a significant influence on the state of the system. An example is Rayleigh-Bénard convection cells and turbulent flow, whose existence and nature are determined by the interaction of the fluid with the container boundaries. A biological example concerns what factors determine the structure, properties, and function that a particular protein (linear chain of amino acids) has. It is now known that the DNA sequence that encodes the amino acid sequence is not the only factor, in contradiction to some versions of the Central Dogma of molecular biology. Other factors may be the type of cell that contains the protein and the network of other proteins in which the particular protein is embedded. Context sometimes matters.

Supervenience is the idea that once the micro level is fixed, macro levels are fixed too. The examples above might be interpreted as evidence against supervenience. Supervenience is used to argue against “the possibility for mental causation above and beyond physical causation.” 

Downward causation is sometimes equated with emergence, particularly in debates about the nature of consciousness. In the context of biology, Denis Noble defines downward causation as occurring when higher-level processes cause changes in lower-level properties and processes. He gives examples where physiological effects can switch on and off individual genes or signalling processes in cells, including maternal effects and epigenetics.

10. Intra-stratum closure: informational, causal, and computational

The ideas described below were recently developed by Rosas et al. from a computer science perspective. They defined emergence in terms of universality and discussed its relationship to informational closure, causal closure, and computational closure. Each of these is given a precise technical definition in their paper. Here I give the sense of their definitions. In considering a general system they do not pre-define the micro- and macro-levels of a system but consider how they might be defined so that universality holds, i.e., so that properties at the macro-level are independent of the details of the micro-level (i.e., are universal).

Informational closure means that to predict the dynamics of the system at the macroscale an observer does not need any additional information about the details of the system at the microscale. Equilibrium thermodynamics and fluid dynamics are examples. 

Causal closure means that the system can be controlled at the macroscale without any knowledge of lower-level information. For example, changing the software code that is running on a computer allows one to reliably control the microstate of the hardware of the computer regardless of what is happening with the trajectories of electrons in the computer.

Computational closure is a more technical concept, being defined in terms of “a conceptual device called the ε-(epsilon) machine. This device can exist in some finite set of states and can predict its own future state on the basis of its current one... for an emergent system that is computationally closed, the machines at each level can be constructed by coarse-graining the components on just the level below: They are ‘strongly lumpable’.”

Rosas et al. show that informational closure and causal closure are equivalent and that they are more restrictive than computational closure. It is not clear to me how these closures relate to novelty as a definition of emergence.

In summary, emergence means different things to different people. I have listed ten different characteristics that have been associated with emergent properties. They are not all equivalent and so when discussing emergence it is important to be clear about which characteristic one is using to define emergence.

Tuesday, September 24, 2024

The multi-faceted character of emergence (part 1)

There is more to emergence than novel properties, i.e., where a whole system has a property that the individual components of the system do not have. Here I focus on emergent properties, but in most cases “property” might be replaced with state, phenomenon, or entity. I now discuss ten characteristics often associated with emergence, beyond novelty. Some people include one or more of these characteristics in their definitions of emergence. However, I do not include them in my definition because, as I explain, some of the characteristics are contentious. Some may not be necessary or sufficient for novel system properties.

The first five characteristics discussed below might be classified as objective (i.e., observable properties of the system) and the second five as subjective (i.e., associated with how an investigator thinks about the system). In different words, the first five are mostly concerned with ontology (what is real) and the second five with epistemology (what we know). The first five characteristics concern discontinuities, universality, diversity, mesoscales, and modification of parts. The second five concern self-organisation, unpredictability, irreducibility, downward causation, and closure. 

1. Discontinuities 

Quantitative changes in the system can become qualitative changes in the system. For example, in condensed matter physics spontaneous symmetry breaking only occurs in the thermodynamic limit (i.e., when the number of particles of the system becomes infinite). More is different. Thus, as a quantitative change in the system size occurs, the order parameter becomes non-zero. In a system that undergoes a phase transition at a non-zero temperature, a small change in temperature can lead to the appearance of order and to a new state of matter. For a first-order phase transition, there is a discontinuity in properties such as the entropy and density. These discontinuities define a phase boundary in the pressure-temperature diagram. For continuous phase transitions, the order parameter is a continuous function of temperature, becoming non-zero at the critical temperature. However, the derivative with respect to temperature may be discontinuous, and/or thermodynamic properties such as the specific heat and the susceptibility associated with the order parameter may diverge as the critical temperature is approached.
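In standard notation: just below the critical temperature the order parameter grows continuously as m ∝ (T_c - T)^β, while the susceptibility diverges as χ ∝ |T - T_c|^(-γ); the singular behaviour sits in the derivatives and response functions rather than in the order parameter itself.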

Two different states of a system are said to be adiabatically connected if one can smoothly deform one state into the other and all the properties of the system also change smoothly. The case of the liquid-gas transition illustrates subtle issues about defining emergence. A discontinuity does not imply a qualitative difference (novelty). On the one hand, there is a discontinuity in the density and entropy of the system as the liquid-gas phase boundary is crossed in the pressure-temperature diagram. On the other hand, there is no qualitative difference between a gas and a liquid. There is only a quantitative difference: the density of the gas is less than that of the liquid, albeit sometimes by orders of magnitude. The liquid and gas states can be adiabatically connected. There is a path in the pressure-temperature phase diagram that can be followed to connect the liquid and gas states without any discontinuities in properties.

The ferromagnetic state also raises questions, as illustrated by a debate between Rudolf Peierls and Phil Anderson about whether ferromagnetism exhibits spontaneous symmetry breaking. Anderson argued that it did not because, in contrast to the antiferromagnetic state, a non-zero magnetisation (order parameter) occurs for finite systems and the magnetic order does not change the excitation spectrum, i.e., it does not produce a Goldstone boson. On the other hand, singularities in properties at the Curie temperature (critical temperature for ferromagnetism) only exist in the thermodynamic limit. Also, a small change in the temperature, from just above the Curie temperature to below, can produce a qualitative change, a non-zero magnetisation.

2. Universality

Properties often referred to as emergent are universal in the sense that they are independent of many of the details of the parts of the system. There may be many different systems that can have a particular emergent property. For example, superconductivity is present in metals with a diverse range of crystal structures and chemical compositions.

Robustness is related to universality. If small changes are made to the composition of the system (for example replacing some of the atoms in the system with atoms of different chemical element) the novel property of the system is still present. In elementary superconductors, introducing non-magnetic impurity atoms has no effect on the superconductivity.

Universality is both a blessing and a curse for theory. Universality can make it easier to develop successful theories because it means that many details need not be included in a theory in order for it to successfully describe an emergent phenomenon. This is why effective theories and toy models can work even better than might be expected. Universality can make theories more powerful because they can describe a wider range of systems. For example, properties of elemental superconductors can be described by BCS theory and by Ginzburg-Landau theory, even though the materials are chemically and structurally diverse. The curse of universality for theory is that it illustrates the problems of “under-determination of theory”, “over-fitting of data”, and “sloppy theories” [Sethna et al.]. A theory can agree with experiment even when the parameters used in the theory may be quite different from the actual ones. For example, the observed phase diagram of water can be reproduced, sometimes with impressive quantitative detail, by combining classical statistical mechanics with empirical force fields that treat water molecules as being composed purely of point charges.

Suppose we start with a specific microscopic theory and calculate the macroscopic properties of the system, and they agree with experiment. It would then be tempting to think that we have the correct microscopic theory. However, universality suggests this may not be the case.

For example, consider the case of a gas of weakly interacting atoms or molecules. We can treat the gas particles as classical or quantum. Statistical mechanics gives exactly the same equation of state and specific heat capacity for both microscopic descriptions. The only difference may be the Gibbs paradox [the calculated entropy is not an extensive quantity], which is sensitive to whether or not the particles are treated as identical. Unlike the zeroth, first, and second laws of thermodynamics, the third law does require that the microscopic theory be quantum. Laughlin discusses these issues in terms of “protectorates” that hide “ultimate causes”.
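To spell out the Gibbs paradox in standard notation: treating the N particles of an ideal gas as distinguishable gives S = N k_B [ln(V/λ^3) + 3/2], where λ is the thermal de Broglie wavelength; this is not extensive, since doubling N and V does not double S. Dividing the partition function by N! (treating the particles as identical) gives the Sackur-Tetrode form S = N k_B [ln(V/(N λ^3)) + 5/2], which is extensive.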

In some physical systems, universality can be defined in a rigorous technical sense, making use of the concepts and techniques of the renormalisation group and scaling. These techniques provide a method to perform coarse graining, to derive effective theories and effective interactions, and to define universality classes of systems. There are also questions of how universality is related to the robustness of strata, and the independence of effective theories from the coarse-graining procedure.

3. Diversity

Even when a system is composed of a small number of different components and interactions, the large number of possible stable states with qualitatively different properties that the system can have is amazing. Every snowflake is different. Water is found in 18 distinct solid states. All proteins are composed of linear chains of 20 different amino acids. Yet in the human body there are more than 100,000 different proteins, and all perform specific biochemical functions. We encounter an incredible diversity of human personalities, cultures, and languages. A stunning case of diversity is life on earth. Billions of different plant and animal species are all an expression of different sequences of the four DNA bases: A, G, T, and C.

This diversity is related to the idea that "simple models can describe complex behaviour". One example is Conway’s Game of Life. Another example is how simple Ising models with a few competing interactions can describe a devil's staircase of ground states or the multitude of different atomic orderings found in binary alloys.

Goldenfeld and Kadanoff defined complexity [emergence] as “structure with variations”. Holland (VSI) discusses “perpetual novelty”, giving the example of the game of chess, where a typical game may involve of the order of 10^50 move sequences. “Motifs” are recurring patterns (sequences of moves) in games.

Condensed matter physics illustrates diversity with the many different states of matter that have been discovered. The underlying microscopics is “just” electrons and atomic nuclei interacting according to Coulomb’s law.

The significance of this diversity might be downplayed by saying that it is just a result of combinatorics. But such a claim overlooks the issue of the stability of the diverse states that are observed. In a system composed of many components, each of which can take on a few states, the number of possible states of the whole system grows exponentially with the number of components. For example, for a chain of ten amino acids there are 20^10 ≈ 10^13 different possible linear sequences. But this does not mean that all these sequences will produce a functional protein, i.e., a molecule that will fold rapidly (on the timescale of milliseconds) into a stable tertiary structure and perform a useful biochemical function such as catalysis of a specific chemical reaction or signal transduction.

4. Simple entities at the mesoscale 

A key idea in condensed matter physics is that of quasi-particles. A system of strongly interacting particles may have excitations, seen in experiments such as inelastic neutron scattering and Angle Resolved PhotoElectron Spectroscopy (ARPES), that can be described as weakly interacting quasi-particles. These entities are composite particles, and have properties that are quantitatively different, and sometimes qualitatively different, from the microscopic particles. Sometimes this means that the scale (size) associated with the quasi-particles is intermediate between the micro- and the macro-scales, i.e., it is a mesoscale. The existence of quasi-particles leads naturally to the technique of constructing an effective Hamiltonian [effective theory] for the system where effective interactions describe the interactions between the quasi-particles.

The economist Herbert Simon argued that a characteristic of a complex system is that the system can be understood in terms of nearly decomposable units. Rosas et al. argue that emergence is associated with there being a scale at which the system is “strongly lumpable”. Denis Noble has highlighted how biological systems are modular, i.e., composed of simple interchangeable components.

5. Modification of parts and their relationships

Emergent properties are often associated with the state of the system exhibiting patterns, order, or structure, terms that may be used interchangeably. This reflects that there is a particular relationship (correlation) between the parts which is different to the relationships in a state without the emergent property. This relationship may also be reflected in a generalised rigidity. For example, in a solid applying a force on one surface results in all the atoms in the solid experiencing a force and moving together. The rigidity of the solid defines a particular relationship between the parts of the system.

Properties of the individual parts may also be different. For example, in a crystal single-atom properties such as electronic energy levels change quantitatively compared to their values for isolated atoms. Properties of finite subsystems are also modified, reflecting a change in interactions between the parts. For example, in a molecular crystal the frequencies associated with intramolecular atomic vibrations are different to their values for isolated molecules. However, emergence is a sufficient but not a necessary condition for these modifications. In gas and liquid states, novelty is not present but there are still such changes in the properties of the individual parts.

As stated at the beginning of this section the five characteristics above might be associated with ontology (what is real) and objective properties of the system that an investigator observes and depend less on what an observer thinks about the system. The next five characteristics might be considered to be more subjective, being concerned with epistemology (how we determine what is true). In making this dichotomy I do not want to gloss over the fuzziness of the distinction or of two thousand years of philosophical debates about the relationship between ontology and epistemology, or between reality and theory.

In the next post, I will discuss the remaining five characteristics: self-organisation, unpredictability, irreducibility, contextuality and downward causation, and intra-stratum closure.

Thanks for reading this far!

Monday, July 22, 2024

Clarity about the relationship of emergence, complexity, predictability, and universality

Emergence means different things to different people. Except that practically everyone likes it! Or at least likes using the word. Terms associated with emergence include novelty, unpredictability, universality, stratification, and self-organisation. We need to be clearer about what we mean by each of these terms and how they are related or unrelated. Significant progress is reported in a recent preprint.

Software in the natural world: A computational approach to hierarchical emergence

Fernando E. Rosas, Bernhard C. Geiger, Andrea I Luppi, Anil K. Seth, Daniel Polani, Michael Gastpar, Pedro A.M. Mediano

This preprint is the subject of a nice article in Quanta Magazine.

The New Math of How Large-Scale Order Emerges by Philip Ball

Ball defines emergence in terms of unpredictability. He states: 

"Loosely, the behavior of a complex system might be considered emergent if it can’t be predicted from the properties of the parts alone."

He describes the work of Rosas et al. as follows, 

"A complex system exhibits emergence, according to the new framework, by organizing itself into a hierarchy of levels that each operate independently of the details of the lower levels."

This is defining emergence in terms of universality. Rosas et al. use an analogy with software, which runs independently of the details of the hardware of the computer and does not depend on microscopic details such as electron dynamics.

There are three types of closure associated with emergence: informational, causal, and computational.

Informational closure means that to predict the dynamics of the system at the macroscale one does not need any additional  information from the microscale.

Equilibrium thermodynamics is a nice example. 

Causal closure means that the system can be controlled at the macroscale without any knowledge of lower-level information.

"Interventions we make at the macro level, such as changing the software code by typing on the keyboard, are not made more reliable by trying to alter individual electron trajectories."

"...we can use macroscopic variables like pressure and viscosity to talk about (and control) fluid flow, and knowing the positions and trajectories of individual molecules doesn’t add useful information for those purposes. And we can describe the market economy by considering companies as single entities, ignoring any details about the individuals that constitute them."

Computational closure is a more technical concept. 

"a conceptual device called the ε-(epsilon) machine. This device can exist in some finite set of states and can predict its own future state on the basis of its current one. It’s a bit like an elevator, said Rosas; an input to the machine, like pressing a button, will cause the machine to transition to a different state (floor) in a deterministic way that depends on its past history — namely, its current floor, whether it’s going up or down and which other buttons were pressed already. Of course an elevator has myriad component parts, but you don’t need to think about them. Likewise, an ε-machine is an optimal way to represent how unspecified interactions between component parts “compute” — or, one might say, cause — the machine’s future state."

Aside: epsilon-machines featured significantly in my previous post about What is a complex system? 

"Computational mechanics allows the web of interactions between a complex system’s components to be reduced to the simplest description, called its causal state."

"...for an emergent system that is computationally closed, the machines at each level can be constructed by coarse-graining the components on just the level below: They are, in the researchers’ terminology, “strongly lumpable.”"

In some sense, this may be related to the notion of quasiparticles and effective interactions in many-body physics. 
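To make “strongly lumpable” concrete, here is a minimal sketch (my own toy example, not code from Rosas et al.) that checks whether a proposed coarse-graining of a Markov chain is lumpable in the standard Kemeny–Snell sense: every micro-state within a block must have the same total probability of jumping into each block, so that the macro-level dynamics is itself Markovian.

import numpy as np

# Transition matrix of a 4-state Markov chain (rows sum to 1).
# States {0,1} and {2,3} are designed to form two lumpable blocks.
T = np.array([
    [0.10, 0.20, 0.30, 0.40],
    [0.20, 0.10, 0.50, 0.20],
    [0.35, 0.35, 0.20, 0.10],
    [0.40, 0.30, 0.15, 0.15],
])

blocks = [[0, 1], [2, 3]]   # candidate macro-states (coarse-graining)

def is_lumpable(T, blocks, tol=1e-12):
    """True if every micro-state in a block has the same probability
    of jumping into each block, so the macro-chain is Markovian."""
    for B in blocks:
        probs = np.array([[T[i, C].sum() for C in blocks] for i in B])
        if not np.allclose(probs, probs[0], atol=tol):
            return False
    return True

print(is_lumpable(T, blocks))   # True for this example

If the check passes, the macro-level transition matrix is obtained by summing, for any representative micro-state in a block, its transition probabilities into each block.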

Aside: In 1962, Herbert Simon identified hierarchies as an essential feature of complex systems, both natural and artificial. A key property of a level in the hierarchy is that it is nearly decomposable into smaller units, i.e., it can be viewed as a collection of weakly interacting units. The time required for the evolution of the whole system is significantly decreased due to the hierarchical character. The construction of an artificial complex system, such as a clock, is faster and more reliable if different units are first assembled separately and then the units are brought together into the whole. Simon argues that the reduction in time scales due to modularity is why biological evolution can occur on realistic time scales.  The 1962 article is reprinted in The Sciences of the Artificial.

The paper by Rosas et al. is one of the most important ones I have encountered in the past few years. I am slowly digesting it.

The beauty of the paper is that it is mathematically rigorous. All the concepts are precisely defined and the central results are actual theorems. This replaces the vagueness of most discussions of emergence, including my own.

The paper has helpful figures and considers concrete examples including Ehrenfest's Urn, an Ising model with Glauber dynamics, and a Hopfield neural network model.

I thank Gerard Milburn for bringing the Quanta article to my attention.

Wednesday, April 3, 2024

Is biology better at computing than supercomputers?

Stimulated by discussions about the physics of learning machines with Gerard Milburn, I have been wondering about biomolecular machines, such as the proteins that carry out the transcription and translation of DNA in protein synthesis. These are rather amazing machines.

I found an article which considers a problem that is simpler than learning: computation.

The thermodynamic efficiency of computations made in cells across the range of life

Christopher P. Kempes, David Wolpert, Zachary Cohen and Juan Pérez-Mercader


It considers the computation of translating a random set of 20 amino acids into the specific string for a specific protein. Actual thermodynamic values are compared to a generalised Landauer bound for computation. Below is the punchline (from page 9).

Given that the average protein length is about 325 amino acids for 20 unique amino acids, we have that p_i = p = 1/20^325 = 1.46×10^−423, where there are 20^325 states, such that the initial entropy is S_I = −Σ p_i ln p_i = 325 ln 20 ≈ 974, which gives the free energy change of kT(S_I − 0) = 4.03×10^−18 J, or 1.24×10^−20 J per amino acid. This value provides a minimum for synthesizing a typical protein. 

We can also calculate the biological value from the fact that if four ATP equivalents are required to add one amino acid to the polymer chain, with a standard free energy of 47.7 kJ mol^−1 for ATP to ADP, then the cost is 1.03×10^−16 J, or 3.17×10^−19 J per amino acid.  

This value is about 26 times larger than the generalized Landauer bound.

These results illustrate that translation operates at an astonishingly high efficiency, even though it is still fairly far away from the Landauer bound. To put these results in context, it is interesting to note that the best supercomputers perform a bit operation at approximately 5.27×10^−13 J per bit. In other words, the cost of computation in supercomputers is about eight orders of magnitude worse than the Landauer bound of kT ln 2 ≈ 2.9×10^−21 J for a bit operation, which is about six orders of magnitude less efficient than biological translation when both are compared to the appropriate Landauer bound. Biology is beating our current engineered computational thermodynamic efficiencies by an astonishing degree.
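The arithmetic in the quote is straightforward to reproduce. Here is a minimal sketch (my own, assuming room temperature, T = 300 K; small differences from the paper's numbers are due to rounding):

import math

kB = 1.380649e-23     # Boltzmann constant (J/K)
T = 300.0             # temperature (K)
NA = 6.02214076e23    # Avogadro's number (1/mol)

L = 325               # average protein length (amino acids)
n_aa = 20             # number of distinct amino acids

# Generalised Landauer bound: free energy to select one sequence
# out of 20^325 equally likely ones.
S_I = L * math.log(n_aa)            # initial entropy (nats), ~974
E_min = kB * T * S_I                # ~4.0e-18 J
print(E_min, E_min / L)             # total, and per amino acid (~1.2e-20 J)

# Actual biological cost: ~4 ATP per amino acid added,
# with ~47.7 kJ/mol of free energy for ATP -> ADP.
E_ATP = 47.7e3 / NA
E_bio = 4 * L * E_ATP               # ~1.0e-16 J
print(E_bio, E_bio / L)             # per amino acid ~3.2e-19 J

print(E_bio / E_min)                # ~26 times the Landauer bound

# Landauer bound for a single bit operation, kT ln 2 ~ 2.9e-21 J,
# versus ~5.3e-13 J per bit for a supercomputer.
E_bit = kB * T * math.log(2)
print(5.27e-13 / E_bit)             # ~2e8, i.e. about eight orders of magnitude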

Tuesday, February 27, 2024

Emergence? in large language models (revised edition)

Last year I wrote a post about emergence in AI, specifically on a paper claiming evidence for a "phase transition" in Large Language Models' ability to perform tasks they were not designed for. I found this fascinating.

That paper attracted a lot of attention, even winning an award for the best paper at the conference at which it was presented.

Well, I did not do my homework. Even before my post, another paper called into question the validity of the original paper.

Are Emergent Abilities of Large Language Models a Mirage?

Rylan Schaeffer, Brando Miranda, Sanmi Koyejo

we present an alternative explanation for [the claimed] emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher's choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous predictable changes in model performance.

... we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.
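To see how the choice of metric alone can do this, here is a toy numerical illustration (my own sketch, not code from Schaeffer et al.): suppose the per-token accuracy of a model improves smoothly with scale, but the task is scored by exact match on a 20-token answer, so every token must be correct.

import numpy as np

# "Model scale" spanning six orders of magnitude.
log10_scale = np.linspace(6, 12, 13)

# Per-token accuracy improves smoothly (here, linearly in the log of scale).
per_token_acc = 0.5 + 0.48 * (log10_scale - 6) / 6

# Exact match on a 20-token answer is a highly nonlinear metric:
# all 20 tokens must be correct.
exact_match = per_token_acc ** 20

for s, acc, em in zip(log10_scale, per_token_acc, exact_match):
    print(f"scale 10^{s:.1f}   per-token acc = {acc:.2f}   exact match = {em:.3f}")

The underlying capability improves steadily with scale; only the nonlinear metric makes it look like a sudden, "emergent" transition.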

One of the issues they suggest is responsible for the smooth behaviour is 

 the phenomenon known as neural scaling laws: empirical observations that deep networks exhibit power law scaling in the test loss as a function of training dataset size, number of parameters or compute  

One of the papers they cite on power law scaling is below (from 2017).

Deep Learning Scaling is Predictable, Empirically

Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

The figure below shows the power law scaling between the validation loss and the size of the training data set.

They note that these empirical power laws are yet to be explained.
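To illustrate what such a power law looks like in practice, here is a minimal sketch with synthetic data (illustrative numbers only, not those of Hestness et al.): if the validation loss obeys loss = a * D^(−b) as a function of training set size D, it is a straight line on a log-log plot, and the exponent is recovered by a linear fit in log space.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic "validation loss versus training set size" obeying a power law.
D = np.logspace(4, 9, 11)                                     # training set sizes
a, b = 50.0, 0.35                                             # made-up prefactor and exponent
loss = a * D ** (-b) * np.exp(rng.normal(0, 0.02, D.size))    # with a little noise

# A power law is a straight line in log-log coordinates.
slope, intercept = np.polyfit(np.log(D), np.log(loss), 1)
print("fitted exponent  b =", -slope)             # ~0.35
print("fitted prefactor a =", np.exp(intercept))  # ~50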

I thank Gerard Milburn for ongoing discussions about this topic.

Tuesday, February 6, 2024

Four scientific reasons to be skeptical of AI hype

The hype about AI continues, whether in business or science. Undoubtedly, there is a lot of potential in machine learning, big data, and large language models. But that does not mean that the hype is justified. It is more likely to limit real scientific progress and waste a lot of resources.

My innate scepticism receives concrete support from an article from 2018 that gives four scientific reasons for concern.

Big data: the end of the scientific method? 

Sauro Succi and Peter V. Coveney

The article might be viewed as a response to a bizarre article in 2008 by Chris Anderson, editor-in-chief at Wired, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

‘With enough data, the numbers speak for themselves, correlation replaces causation, and science can advance even without coherent models or unified theories’.

Here are the four scientific reasons for caution about such claims given by Succi and Coveney.

(i) Complex systems are strongly correlated, hence they do not (generally) obey Gaussian statistics.

The law of large numbers and the central limit theorem may not apply, and rare events may dominate the behaviour. For example, consider the power-law decays observed in many complex systems. They are in sharp contrast to the rapid decay of the tails of a Gaussian distribution. The authors state, "when rare events are not so rare, convergence rates can be frustratingly slow even in the face of petabytes of data."
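Point (i) is easy to see in a small numerical experiment (my own sketch): compare how the sample mean converges for Gaussian data and for heavy-tailed, power-law data of comparable magnitude.

import numpy as np

rng = np.random.default_rng(1)
n = 10_000_000

# Gaussian samples: the sample mean converges quickly (central limit theorem).
gauss = rng.normal(loc=1.0, scale=1.0, size=n)

# Pareto samples with tail exponent 1.5: the mean exists (it is 3)
# but the variance is infinite, so the sample mean converges painfully
# slowly and is dominated by rare, very large events.
pareto = rng.pareto(1.5, size=n) + 1.0

for m in (10**3, 10**5, 10**7):
    print(m, gauss[:m].mean(), pareto[:m].mean())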

(ii) No data are big enough for systems with strong sensitivity to data inaccuracies.

Big data and machine learning involve fitting data with a chosen function, typically a model with many parameters, by minimising a "cost function". That fitting involves a minimisation routine which acts on some sort of "landscape". If the landscape is smooth and the minima are well separated, and not separated by barriers that are too large, then the routine may work. However, if the landscape is rough, or the routine gets stuck in some metastable state, there will be problems, such as over-fitting.

(iii) Correlation does not imply causation, the link between the two becoming exponentially fainter at increasing data size.  

(iv) In a finite-capacity world, too much data is just as bad as no data.

In other words, it is all about curve fitting. The more parameters used, the less likely it is that insight will be gained. Here the authors quote the famous aphorism attributed to von Neumann (and recounted by Fermi): "with four parameters I can fit an elephant, and with five I can make him wiggle his trunk."
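The elephant aphorism is really about over-fitting, which a minimal sketch (synthetic data, illustrative only) makes concrete: a handful of noisy points can be fitted perfectly with enough parameters, but the resulting "model" has no predictive value outside the data it was fitted to.

import numpy as np

rng = np.random.default_rng(2)

# Ten noisy data points generated from a simple underlying law, y = 2x + 1.
x = np.linspace(0, 1, 10)
y = 2 * x + 1 + rng.normal(0, 0.1, x.size)

# A 9th-degree polynomial (ten parameters) passes through every point...
# (numpy may warn that this fit is poorly conditioned, which is rather the point)
p_many = np.polyfit(x, y, 9)
# ...while a straight line (two parameters) captures the actual law.
p_line = np.polyfit(x, y, 1)

# Predict at a point outside the range of the data.
x_new = 1.5
print("10-parameter fit:", np.polyval(p_many, x_new))   # typically far from the truth
print(" 2-parameter fit:", np.polyval(p_line, x_new))   # close to 2*1.5 + 1 = 4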

Aside: an endearing part of the article is the inclusion of two choice quotes from C.S. Lewis:
‘Once you have surrendered your brain, you've surrendered your life’ (paraphrased)

‘When man proclaims conquest of power of nature, what it really means is conquest of power of some men over other men’.

I commend the article to you and look forward to hearing your perspective. Is the criticism of AI hype fair? Are these four scientific reasons good grounds for concern? 

Tuesday, January 16, 2024

Wading through AI hype about materials discovery

Discovering new materials with functional properties is hard, very hard. We need all the tools we can get, from serendipity to high-performance computing to chemical intuition. 

At the end of last year, two back-to-back papers appeared in the luxury journal Nature.

Scaling deep learning for materials discovery

All the authors are at Google. They claim that they have discovered more than two million new materials with stable crystal structures using DFT-based methods and AI.

On Doug Natelson's blog there are several insightful comments on the paper about why to be skeptical about AI/DFT based "discovery".

Here are a few of the reasons my immediate response to this paper is one of skepticism.

It is published in Nature. Almost every "ground-breaking" paper I force myself to read is disappointing when you read the fine print.

It concerns a very "hot" topic that is full of hype in both the science and business communities.

It is a long way from discovering a stable crystal to finding that it has interesting and useful properties.

Calculating the correct relative stability of different crystal structures of complex materials can be incredibly difficult.

DFT-based methods fail spectacularly for the low-energy properties of quantum materials, such as cuprate superconductors. But they do get the atomic structure and stability correct, which is the focus of this paper.

There is a big gap between discovering a material that has desirable technological properties and producing one that meets the demanding criteria for commercialisation.

The second paper combines AI-based predictions, similar to the paper above, with robots doing material synthesis and characterisation.

An autonomous laboratory for the accelerated synthesis of novel materials

[we] realized 41 novel compounds from a set of 58 targets including a variety of oxides and phosphates that were identified using large-scale ab initio phase-stability data from the Materials Project and Google DeepMind

These claims have already been undermined by a preprint from the chemistry departments at Princeton and UCL.

Challenges in high-throughput inorganic material prediction and autonomous synthesis

We discuss all 43 synthetic products and point out four common shortfalls in the analysis. These errors unfortunately lead to the conclusion that no new materials have been discovered in that work. We conclude that there are two important points of improvement that require future work from the community: 
(i) automated Rietveld analysis of powder x-ray diffraction data is not yet reliable. Future improvement of such, and the development of a reliable artificial intelligence-based tool for Rietveld fitting, would be very helpful, not only to autonomous materials discovery, but also the community in general.
(ii) We find that disorder in materials is often neglected in predictions. The predicted compounds investigated herein have all their elemental components located on distinct crystallographic positions, but in reality, elements can share crystallographic sites, resulting in higher symmetry space groups and - very often - known alloys or solid solutions. 

Life is messy. Chemistry is messy. DFT-based calculations are messy. AI is messy. 

Given that most discoveries of interesting materials involve serendipity or a lot of trial and error, it is worth trying to do what the authors of these papers are doing. However, the field will only advance in a meaningful way when it is not distracted and diluted by hype, and when authors, editors, and referees demand transparency about the limitations of their work.  


Friday, October 13, 2023

Emergent abilities in AI: large language models

The public release of ChatGPT was a landmark that surprised many people, both in the general public and researchers working in Artificial Intelligence. All of a sudden it seemed Large Language Models had capabilities that some thought were a decade away or even not possible. It is like the field underwent a "phase transition." This idea turns out to be more than just a physics metaphor. It has been made concrete and rigorous in the following paper.

Emergent Abilities of Large Language Models

Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

They use the following definition, "Emergence is when quantitative changes in a system result in qualitative changes in behavior," citing Phil Anderson's classic "More is Different" article. [Even though the article does not contain the word emergence]. 

In this paper, we will consider a focused definition of emergent abilities of large language models: 

An ability is emergent if it is not present in smaller models but is present in larger models.

How does one define the "size" or "scale" of a model? Wei et al. note that "Today’s language models have been scaled primarily along three factors: amount of computation, number of model parameters, and training dataset size."

The essence of the analysis in the paper is summarised as follows.

We first discuss emergent abilities in the prompting paradigm, as popularized by GPT-3 (Brown et al., 2020). In prompting, a pre-trained language model is given a prompt (e.g. a natural language instruction) of a task and completes the response without any further training or gradient updates to its parameters.

An example of a prompt is shown below 

Brown et al. (2020) proposed few-shot prompting, which includes a few input-output examples in the model’s context (input) as a preamble before asking the model to perform the task for an unseen inference-time example. 

 The ability to perform a task via few-shot prompting is emergent when a model has random performance until a certain scale, after which performance increases to well-above random. 

An example is shown in the Figure below. The horizontal axis is the number of training FLOPs for the model, a measure of model scale. The vertical axis measures the accuracy of the model on a task, modular arithmetic, for which the model was not designed but which it was given via two-shot prompting. The red dashed line is the performance of a random model. The purple data are for GPT-3 and the blue for LaMDA. Note how once the model scale reaches about 10^22 training FLOPs there is a rapid onset of ability.
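For concreteness, a two-shot prompt for a modular arithmetic task might look something like the following (an illustrative sketch of the format, not the actual example from the paper):

Q: What is (17 + 24) mod 7?
A: 6
Q: What is (35 + 12) mod 9?
A: 2
Q: What is (23 + 31) mod 8?
A:

The model is scored on whether it completes the final answer correctly (here 54 mod 8 = 6), having seen only the two worked examples in the prompt.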

The Figure below summarises recent results from a range of research groups studying five different language model families. It shows eight different emergent abilities.

Wei et al. point out that "there are currently few compelling explanations for why such abilities emerge the way that they do".

The authors have encountered some common characteristics of emergent properties. They are hard to predict or anticipate before they are observed. They are often universal, i.e., they can occur in a wide range of different systems and are not particularly sensitive to the details of the components. Even after emergent properties are observed, it is still hard to explain why they occur, even when one has a good understanding of the properties of the system at a smaller scale. Superconductivity was observed in 1911 and only explained in 1957 by the BCS theory.

On the positive side, this paper presents hope that computational science and technology are at the point where AI may produce more exciting capabilities. On the negative side, there is also the possibility of significant societal risks such as having unanticipated power to create and disseminate false information, bias, and toxicity.

Aside: One thing I found surprising is that the authors did not reference John Holland, a computer scientist, and his book, Emergence.

I thank Gerard Milburn for bringing the paper to my attention.
