Friday, March 8, 2024

Emergence and the stratification of physics into sub-fields

The concept of emergence is central to understanding the sub-fields of physics and how they are, and are not, related to one another.

The table below shows strata of sub-disciplines of physics. For each stratum there is a range of relevant length, time, and energy scales. There are distinct entities that are composed of the entities from lower strata. These composite entities interact with one another via effective interactions that arise from the interactions present at lower strata and can be described by an effective theory. Each sub-discipline of physics is semi-autonomous: collective phenomena associated with a single stratum can be studied, described, and understood without reference to lower strata.

Table entries are not meant to be exhaustive but to illustrate how emergence is central to understanding sub-fields of physics and how they are related to one another.
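To indicate the kind of entries involved, here is a rough sketch in Python (the specific scales and entries are illustrative only, chosen by me, and are no substitute for the actual table):

import math  # not strictly needed; kept minimal

# Purely illustrative sketch of strata of sub-fields of physics: each stratum has
# characteristic scales, entities built from the entities of the stratum below,
# and an effective theory describing collective behaviour at that stratum.
strata = [
    {"sub_field": "elementary particle physics",
     "length_scale_m": 1e-18, "energy_scale_eV": 1e11,
     "entities": ["quarks", "leptons", "gauge bosons"],
     "effective_theory": "the Standard Model"},
    {"sub_field": "nuclear physics",
     "length_scale_m": 1e-15, "energy_scale_eV": 1e6,
     "entities": ["protons", "neutrons", "nuclei"],
     "effective_theory": "nuclear shell model"},
    {"sub_field": "atomic and molecular physics",
     "length_scale_m": 1e-10, "energy_scale_eV": 1.0,
     "entities": ["atoms", "molecules", "photons"],
     "effective_theory": "non-relativistic quantum mechanics"},
    {"sub_field": "condensed matter physics",
     "length_scale_m": 1e-9, "energy_scale_eV": 1e-3,
     "entities": ["quasiparticles", "phonons", "Cooper pairs"],
     "effective_theory": "Fermi liquid theory, Ginzburg-Landau theory"},
]

for s in strata:
    print(s["sub_field"], "-", ", ".join(s["entities"]))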

What do you think of the table? Is it helpful? Have you seen something like this before?

I welcome suggestions about entries that I could add.

Tuesday, March 5, 2024

An illusion of purpose in emergent phenomena?

 A characteristic of emergent phenomena in a system of many interacting parts is that they exhibit collective behaviour where it looks like the many parts are "dancing to the same tune". But who is playing the music, who chose it, and who conducts the orchestra?

Consider the following examples.

1. A large group of starlings moves together in what appears to be a coherent fashion. Yet, no lead starling is telling all the starlings how and where to move, according to some clever flight plan to avoid a predator. Studies of flocking [murmuration] have shown that each starling just moves according to the motion of a few of its nearest neighbours. Nevertheless, the flock does move in a coherent fashion "as if" there is a lead starling or air traffic controller making sure all the planes stick to their flight plan. [A minimal simulation of such local alignment rules is sketched after these three examples.]

2. You can buy a freshly baked loaf of bread at a local bakery every day. Why? Thousands of economic agents, from farmers to truck drivers to accountants to the baker, make choices and act based on limited local information. Their interactions are largely determined by the mechanism of prices and commercial contracts. In a market economy, there is no director of national bread supplies who co-ordinates the actions of all of these agents. Nevertheless, you can be confident that each morning you will be able to buy the loaf you want. The whole system acts in a co-ordinated manner "as if" it has a purpose: to reliably supply affordable high-quality bread.

3. A slime mould spreads over a surface containing food supplies with spatial locations and sizes similar to that of the cities surrounding Tokyo. After a few hours, the spread of the mould has reorganised so that it is focussed on paths that are similar to the routes of the Tokyo rail network. Moulds have no brain or computer chip but they can solve optimisation problems, such as finding the shortest path through a complex maze. In nature, this problem-solving ability has the advantage that it allows them to efficiently locate sources of food and nutrients. Slime moulds act "as if" they have a brain.
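To make the first example concrete, here is a minimal sketch, in Python, loosely following the Vicsek model of flocking (my own illustration, not a quantitative model of real murmurations). Each "bird" repeatedly adopts the average heading of its nearby neighbours, plus some noise. No bird has any global information, yet the flock ends up moving coherently, as measured by an order parameter that approaches one.

import numpy as np

# Vicsek-style flocking: each agent aligns with the average heading of
# neighbours within radius R, plus noise. A common direction emerges
# without any leader or global information.
rng = np.random.default_rng(0)
N, L, R, speed, eta, steps = 300, 10.0, 1.0, 0.1, 0.3, 200

pos = rng.uniform(0, L, size=(N, 2))         # positions in a periodic box
theta = rng.uniform(-np.pi, np.pi, size=N)   # headings

for _ in range(steps):
    # pairwise separations with periodic boundary conditions
    d = pos[:, None, :] - pos[None, :, :]
    d -= L * np.round(d / L)
    neighbours = (d ** 2).sum(-1) < R ** 2
    # average heading of neighbours (including self), via the mean of unit vectors
    mean_sin = (neighbours * np.sin(theta)).sum(1) / neighbours.sum(1)
    mean_cos = (neighbours * np.cos(theta)).sum(1) / neighbours.sum(1)
    theta = np.arctan2(mean_sin, mean_cos) + eta * rng.uniform(-np.pi, np.pi, N)
    pos = (pos + speed * np.column_stack((np.cos(theta), np.sin(theta)))) % L

# Polar order parameter: ~0 for incoherent motion, -> 1 for coherent motion.
order = np.hypot(np.cos(theta).mean(), np.sin(theta).mean())
print(f"polar order parameter after {steps} steps: {order:.2f}")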

The biologist Michael Levin discusses the issue of intelligence in very small and primitive biological systems in a recent article, Collective Intelligence of Morphogenesis as a Teleonomic Process.

[I first became aware of Levin's work through a podcast episode brought to my attention by Gerard Milburn. The relevant discussion starts around 36 minutes].

The emphasis on "as if" I have taken from Thomas Schelling in the opening chapter of his beautiful book, Micromotives and Macrobehaviour.

He also mentions the example of Fermat's principle in optics: the path light takes as it travels between two spatially separated points is the path for which the travel time is an extremum [usually a minimum]. The light travels "as if" it has the purpose of finding this extremum. 

[Aside: according to Wikipedia, 

"Fermat's principle was initially controversial because it seemed to ascribe knowledge and intent to nature. Not until the 19th century was it understood that nature's ability to test alternative paths is merely a fundamental property of waves."

Similar issues of knowledge/intent/purpose arise when considering the motion of a classical particle moving between two spatial points. It takes the path for which the value of the action [the time integral of the Lagrangian along a path] has an extremal value relative to all possible paths. I suspect that the path integral formulation of quantum theory is required to solve the "as if" problem. Any alternative suggestions?]
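For reference, the two variational principles mentioned in this post are usually written as

\delta \int_A^B n(\mathbf{r}) \, ds = 0   (Fermat's principle for the optical path between points A and B),

S[q] = \int_{t_1}^{t_2} L(q, \dot{q}, t) \, dt, \qquad \delta S = 0   (Hamilton's principle of stationary action).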

Tuesday, February 27, 2024

Emergence? in large language models (revised edition)

Last year I wrote a post about emergence in AI, specifically on a paper claiming evidence for a "phase transition" in Large Language Models' ability to perform tasks they were not designed for. I found this fascinating.

That paper attracted a lot of attention, even winning an award for the best paper at the conference at which it was presented.

Well, I did not do my homework. Even before my post, another paper called into question the validity of the original paper.

Are Emergent Abilities of Large Language Models a Mirage?

Rylan Schaeffer, Brando Miranda, Sanmi Koyejo

we present an alternative explanation for [the claimed] emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher's choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous predictable changes in model performance.

... we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.
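A toy calculation, my own sketch in Python rather than anything from the paper, illustrates the point about metrics. Suppose the per-token accuracy of a model improves smoothly, following a power law in the number of parameters. A task scored by exact match over a ten-token answer (a nonlinear metric) then appears to "emerge" abruptly at large scale, even though nothing discontinuous happens at the level of individual tokens.

import numpy as np

# Smooth power-law improvement in per-token accuracy looks like an abrupt
# "emergent ability" when scored with a nonlinear (exact-match) metric.
params = np.logspace(6, 12, 13)             # model sizes (number of parameters)
per_token_error = (params / 1e6) ** -0.25   # assumed smooth power-law decay of error
per_token_acc = 1.0 - per_token_error

seq_len = 10
exact_match = per_token_acc ** seq_len      # all ten tokens must be correct

for n, acc, em in zip(params, per_token_acc, exact_match):
    print(f"{n:10.0e} params   per-token accuracy {acc:5.3f}   exact match {em:5.3f}")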

One of the issues they suggest is responsible for the smooth behaviour is 

 the phenomenon known as neural scaling laws: empirical observations that deep networks exhibit power law scaling in the test loss as a function of training dataset size, number of parameters or compute  

One of the papers they cite on power law scaling is below (from 2017).

Deep Learning Scaling is Predictable, Empirically

Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

The figure below shows the power law scaling between the validation loss and the size of the training data set.

They note that these empirical power laws are yet to be explained.

I thank Gerard Milburn for ongoing discussions about this topic.


Friday, February 16, 2024

Launching my book in a real physical bookshop

Physical bookstores selling physical books are in decline, sadly. Furthermore, the stores that are left are mostly big chains. Brisbane does have an independent bookstore, Avid Reader, in the West End. It is a vibrant part of the local community and has several author events every week.


My daughter persuaded me to do a book launch for Condensed Matter Physics: A Very Short Introduction (Oxford UP, 2023).

It is at Avid Reader on Monday, February 26, beginning at 6 pm.


Most readers of this blog are not in Brisbane, but if you are, or know people who are, please consider attending or encourage them to.

The event is free but participants need to register, as space is limited.

I will be in conversation about the book with my friend, Dr Christian Heim, an author, composer, and psychiatrist. Like the book, the event is meant for a general audience.


Friday, February 9, 2024

The role of effective theories and toy models in understanding emergent properties

Two approaches to the theoretical description of systems with emergent properties that have been particularly fruitful are effective theories and toy models. Both work in spite of, and even exploit, our limited knowledge of the many details of a system with many interacting components.

Effective theories

An effective theory is valid over a particular range of scales. This exploits the fact that in complex systems there is often a hierarchy of scales (length, energy, time, or number). In physics, examples of effective theories include classical mechanics, general relativity, classical electromagnetism, and thermodynamics. The equations of an effective theory can be written down almost solely from consideration of symmetry and conservation laws. Examples include the Navier-Stokes equations for fluid dynamics and non-linear sigma models in elementary particle physics. Some effective theories can be derived by the “coarse-graining” of theories that are valid at a finer scale. For example, the equations of classical mechanics result from taking the limit of Planck’s constant going to zero in the equations of quantum mechanics. The Ginzburg-Landau theory for superconductivity can be derived from the BCS theory. The parameters in effective theories may be determined from more microscopic theories or from fitting experimental data to the predictions of the theory. For example, transport coefficients such as conductivities can be calculated from a microscopic theory using a Kubo formula.
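As a concrete illustration of writing down an effective theory from symmetry considerations, the Ginzburg-Landau free energy for a superconductor (in zero magnetic field) can be written in terms of a complex order parameter \psi(\mathbf{r}) as

F[\psi] = \int d^3r \left[ \alpha(T)\,|\psi|^2 + \frac{\beta}{2}\,|\psi|^4 + \frac{\hbar^2}{2m^*}\,|\nabla \psi|^2 \right], \qquad \alpha(T) \simeq a\,(T - T_c),

where the parameters a, \beta, and m^* can either be treated as phenomenological and fitted to experiment or, as Gor'kov showed, derived microscopically from BCS theory.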

Effective theories are useful and powerful because of the minimal assumptions and parameters used in their construction. For the theory to be useful it is not necessary to be able to derive the effective theory from a smaller scale theory, or even to have such a smaller scale theory. For example, even though there is no accepted quantum theory of gravity, general relativity can be used to describe phenomena in astrophysics and cosmology and is accepted to be valid on the macroscopic scale. Some physicists and philosophers may consider smaller-scale theories as more fundamental, but that is contested and so I will not use that language. There also are debates about how effective field theories fit into the philosophy of science.

Toy models

In his 2016 Nobel Lecture, Duncan Haldane said, “Looking back, … I am struck by how important the use of stripped down “toy models” has been in discovering new physics.” 

Here I am concerned with a class of theoretical models that includes the Ising, Hubbard, NK, Schelling, and Sherrington-Kirkpatrick models, as well as agent-based models. I refer to them as “toy” models because they aim to be as simple as possible, while still capturing the essential details of a particular emergent phenomenon. At the scale of interest, the model is an approximation, neglecting certain degrees of freedom and interactions. In contrast, at the relevant scale, effective theories are often considered to be exact because they are based on general principles.

Historical experience has shown that there is a strong justification for the proposal and study of toy models. They are concerned with a qualitative, rather than a quantitative, description of experimental data. A toy model is usually introduced to answer basic questions about what is possible. What are the essential ingredients that are sufficient for an emergent phenomenon to occur? Which details matter? For example, the Ising model was introduced in 1920 to see if it was possible for statistical mechanics to describe the sharp phase transition associated with ferromagnetism.

In his book The Model Thinker and online course Model Thinking, Scott Page has enumerated the value of simple models in the social sciences. An earlier argument for their value in biology was put by JBS Haldane in his seminal article about “bean bag” genetics. Simplicity makes toy models more tractable for mathematical analysis and/or computer simulation. The assumptions made in defining the model can be clearly stated. If the model is tractable then the pure logic associated with mathematical analysis leads to reliable conclusions. This contrasts with the qualitative arguments often used in the biological and social sciences to propose explanations. Such arguments can miss the counter-intuitive conclusions associated with emergent phenomena that the rigorous analysis of toy models can reveal. Such models can show what is possible, what simple ingredients are sufficient for a system to exhibit an emergent property, and how a quantitative change can lead to a qualitative change. In other words, which details matter?
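To illustrate how tractable such models are, below is a minimal sketch, in Python, of the Schelling segregation model mentioned above (my own illustration, loosely following Schelling's original model). Agents of two types sit on a grid and move to a vacant site whenever fewer than a given fraction of their neighbours share their type. Even this mild individual preference produces strongly segregated neighbourhoods, the sort of counter-intuitive collective outcome that toy models are good at exposing.

import numpy as np

# Minimal Schelling segregation model: agents of two types (+1, -1) and
# vacancies (0) on an L x L grid. An agent is unhappy if fewer than a
# threshold fraction of its occupied neighbours share its type; unhappy
# agents move to randomly chosen vacant sites.
rng = np.random.default_rng(1)
L, vacancy, threshold, sweeps = 50, 0.1, 0.4, 30

grid = rng.choice([1, -1, 0], size=(L, L),
                  p=[(1 - vacancy) / 2, (1 - vacancy) / 2, vacancy])

def unhappy(grid, i, j):
    """True if the agent at (i, j) has too few like-typed occupied neighbours."""
    neigh = grid[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2].ravel()
    same = np.sum(neigh == grid[i, j]) - 1      # exclude the agent itself
    occupied = np.sum(neigh != 0) - 1
    return occupied > 0 and same / occupied < threshold

for _ in range(sweeps):
    movers = [(i, j) for i in range(L) for j in range(L)
              if grid[i, j] != 0 and unhappy(grid, i, j)]
    empties = list(zip(*np.nonzero(grid == 0)))
    rng.shuffle(movers)
    for (i, j) in movers:
        if not empties:
            break
        ei, ej = empties.pop(rng.integers(len(empties)))
        grid[ei, ej], grid[i, j] = grid[i, j], 0
        empties.append((i, j))

# Segregation measure: average fraction of occupied neighbours of the same type.
fractions = []
for i in range(L):
    for j in range(L):
        if grid[i, j] == 0:
            continue
        neigh = grid[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2].ravel()
        occupied = np.sum(neigh != 0) - 1
        if occupied > 0:
            fractions.append((np.sum(neigh == grid[i, j]) - 1) / occupied)
print(f"average like-neighbour fraction: {np.mean(fractions):.2f} "
      f"(individual threshold was only {threshold})")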

Toy models can guide what experimental data to gather and how to analyse it. Insight can be gained by considering multiple models as that approach can be used to rule out alternative hypotheses. Finally, there is value in the adage, “all models are wrong, but some are useful.”

Due to universality, toy models sometimes work better than expected and can even give a quantitative description of experimental data. An example is the three-dimensional Ising model, which was eventually found to be consistent with data on the liquid-gas transition near the critical point. Although the liquid-gas transition is not magnetic, the analogy was bolstered by the mapping of the Ising model onto the lattice gas model. This success led to a shift in the attitude of physicists towards the Ising model. According to Martin Niss, from 1920 to 1950 it was viewed as irrelevant to magnetism because it did not describe magnetic interactions quantum mechanically. This was replaced with the view that it was a model that could give insights into collective phenomena. From 1950 to 1965, the view that the Ising model was irrelevant to describing critical phenomena because it oversimplified the microscopic interactions gradually faded.

Physicists are particularly good at, and experienced in, the proposal and analysis of toy models. I think this expertise is a niche that they could exploit more in contributing to other fields, from biology to the social sciences. They just need the humility to listen to non-physicists about what the important questions and essential details are.

Tuesday, February 6, 2024

Four scientific reasons to be skeptical of AI hype

The hype about AI continues, whether in business or science. Undoubtedly, there is a lot of potential in machine learning, big data, and large language models. But that does not mean that the hype is justified. It is more likely to limit real scientific progress and waste a lot of resources.

My innate scepticism receives concrete support from an article from 2018 that gives four scientific reasons for concern.

Big data: the end of the scientific method? 

Sauro Succi and Peter V. Coveney

The article might be viewed as a response to a bizarre article in 2008 by Chris Anderson, editor-in-chief at Wired, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

‘With enough data, the numbers speak for themselves, correlation replaces causation, and science can advance even without coherent models or unified theories’.

Here are the four scientific reasons for caution about such claims given by Succi and Coveney.

(i) Complex systems are strongly correlated, hence they do not (generally) obey Gaussian statistics.

The law of large numbers (central limit theorem) may not apply and rare events may dominate behaviour. For example, consider the power-law decays observed in many complex systems. They are in sharp contrast to the rapid exponential decay of the Gaussian distribution. The authors state, "when rare events are not so rare, convergence rates can be frustratingly slow even in the face of petabytes of data." [A small numerical illustration of this point is sketched after the fourth reason below.]

(ii) No data are big enough for systems with strong sensitivity to data inaccuracies.

Big data and machine learning involve fitting data with a chosen function that has many parameters, by minimising a "cost function". That minimisation routine acts on some sort of "landscape". If the landscape is smooth, and the minima are well separated and not separated by barriers that are too high, then the routine may work. However, if the landscape is rough, or the routine gets stuck in some metastable state, there will be problems, such as over-fitting.

(iii) Correlation does not imply causation, the link between the two becoming exponentially fainter at increasing data size.  

(iv) In a finite-capacity world, too much data is just as bad as no data.

In other words, it is all about curve fitting. The more parameters used, the less likely it is that insight will be gained. Here the authors quote the famous aphorism, attributed to von Neumann and Fermi, "with four parameters I can fit an elephant and with five I can make his tail wiggle."
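Here is the small numerical illustration of reason (i) promised above (my own toy example in Python, not taken from the paper): the sample mean of heavy-tailed (Pareto) data converges far more slowly than that of Gaussian data with the same true mean, even with millions of samples.

import numpy as np

# Sample means converge quickly for Gaussian data but painfully slowly for
# heavy-tailed data, where rare events dominate the sum.
rng = np.random.default_rng(42)

alpha = 1.5                        # Pareto tail exponent: mean is finite, variance is infinite
true_mean = alpha / (alpha - 1)    # mean of a Pareto(alpha) distribution with x_min = 1

for n in [10**3, 10**5, 10**7]:
    gauss = rng.normal(loc=true_mean, scale=1.0, size=n)
    pareto = (1.0 - rng.uniform(size=n)) ** (-1.0 / alpha)   # inverse-CDF sampling
    print(f"n = {n:>10,}   Gaussian mean = {gauss.mean():.3f}   "
          f"Pareto mean = {pareto.mean():.3f}   (true mean = {true_mean:.1f})")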

Aside: an endearing part of the article is the inclusion of two choice quotes from C.S. Lewis
‘Once you have surrendered your brain, you've surrendered your life’ (paraphrased)

‘When man proclaims conquest of power of nature, what it really means is conquest of power of some men over other men’.

I commend the article to you and look forward to hearing your perspective. Is the criticism of AI hype fair? Are these four scientific reasons good grounds for concern?

Thursday, January 25, 2024

Emergence and the Ising model

The Ising model is emblematic of “toy models” that have been proposed and studied to understand and describe emergent phenomena. Although originally proposed to describe ferromagnetic phase transitions, variants of it have found application in other areas of physics, and in biology, economics, sociology, neuroscience, complexity theory, …  

Quanta magazine had a nice article marking the model's centenary.

In the general model there is a set of lattice points {i}, each with a "spin" sigma_i = +/-1, and a Hamiltonian

H = - sum_{ij} J_ij sigma_i sigma_j - h sum_i sigma_i

where h is the strength of an external magnetic field and J_ij is the strength of the interaction between the spins on sites i and j. The simplest models are those where the lattice is regular, and the interaction is uniform and only non-zero for nearest-neighbour sites.

The Ising model illustrates many key features of emergent phenomena. Given the relative simplicity of the model, exhaustive studies since its proposal in 1920 have given definitive answers to questions often debated about more complex systems. Below I enumerate some of these insights: novelty, quantitative change leads to qualitative change, spontaneous order, singularities, short-range interactions can produce long-range order, universality, three horizons/scales of interest, self-similarity, inseparable horizons, and simple models can describe complex behaviour.

Most of these properties can be illustrated with the case of the Ising model on a square lattice with only nearest-neighbour interactions (J_ij = J). Above the critical temperature (Tc = 2J/ln(1+sqrt(2)) ≈ 2.27 J), and in the absence of an external magnetic field, the system has no net magnetisation. Below Tc, a net magnetisation occurs. For J > 0 (J < 0) this state is ferromagnetic (antiferromagnetic).
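For readers who want to see this behaviour for themselves, below is a minimal Metropolis Monte Carlo sketch in Python (illustrative only; a serious study needs larger lattices, longer runs, and error analysis). Below Tc the magnetisation per spin stays close to +/-1, while above Tc it fluctuates around zero.

import numpy as np

# Metropolis Monte Carlo for the nearest-neighbour Ising model on an
# L x L square lattice (h = 0, J = 1, periodic boundaries, k_B = 1).
rng = np.random.default_rng(0)
L, J = 32, 1.0

def sweep(spins, T):
    """One Monte Carlo sweep: L*L attempted single-spin flips."""
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        # sum over the four nearest neighbours, with periodic boundary conditions
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * J * spins[i, j] * nn          # energy cost of flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1

for T in [1.5, 2.27, 3.5]:                       # below, near, and above Tc
    spins = np.ones((L, L), dtype=int)           # start from a fully ordered state
    for _ in range(300):                         # equilibration sweeps
        sweep(spins, T)
    ms = []
    for _ in range(200):                         # measurement sweeps
        sweep(spins, T)
        ms.append(abs(spins.mean()))
    print(f"T = {T:.2f} J   <|m|> = {np.mean(ms):.2f}")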

Novelty

The state of the system below Tc is qualitatively different from that at very high temperatures or the state of a set of non-interacting spins. Thus, the non-zero magnetisation is an emergent property, as defined in this post. This state is also associated with spontaneous symmetry breaking and more than one possible equilibrium state, i.e., the magnetisation can be positive or negative.

Quantitative change leads to qualitative change

The qualitative change associated with formation of the magnetic state can occur with a small quantitative change in the value of the ratio T/J, i.e., either by decreasing T or increasing J. Formation of the magnetic state is also associated with the quantitative change of increasing the number of spins from a large finite number to infinity. 

Singularities

For a finite number of spins, all the thermodynamic properties of the system are analytic functions of the temperature and the magnitude of an external field. However, in the thermodynamic limit, these properties become singular at T=Tc and h=0. This is the critical point in the phase diagram of h versus T. Some of the quantities, such as the specific heat capacity and the magnetic susceptibility, become infinite at the critical point. These singularities are characterised by critical exponents, most of which have non-integer values. Consequently, the free energy of the system is not an analytic function of T and h.
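In standard notation, the singular behaviour near the critical point is characterised by

M \sim (T_c - T)^{\beta}, \qquad \chi \sim |T - T_c|^{-\gamma}, \qquad C \sim \ln|T - T_c|,

and for the two-dimensional Ising model the exactly known exponents are beta = 1/8 and gamma = 7/4, with the logarithmic divergence of the specific heat corresponding to alpha = 0.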

Spontaneous order

The magnetic state occurs spontaneously. The system self-organises. There is no external field causing the magnetic state to form. There is long-range order, i.e., the values of spins that are infinitely far apart from one another are correlated.

Short-range interactions can produce long-range order.

Although there is no direct long-range interaction between spins, long-range order can occur. Prior to Onsager’s exact solution of the two-dimensional model, many scientists were not convinced that this was possible.

Universality

The values of the critical exponents are independent of many details of the model, such as the value of J, the lattice constant and spatial anisotropy, and the presence of small interactions beyond nearest neighbours. Many details do not matter. This is why the model can give a quantitative description of experimental data near the critical temperature, even though the model Hamiltonian is a crude description of the interactions in a real material. It can describe not only magnetic transitions but also liquid-gas transitions and transitions in binary alloys and binary liquid mixtures.

Three horizons/scales of interest

There are three important length scales associated with the model. Two are simple: the lattice constant, and the size of the whole lattice. These are the microscopic and macroscopic scales. The third scale is emergent and temperature dependent: the correlation length, i.e., the distance over which spins are correlated with one another. This can also be visualised as the size of magnetisation domains seen in Monte Carlo simulations.

The left, centre, and right panels above show a snapshot of a likely configuration of the system at a temperature less than, equal to, and greater than the critical temperature, Tc, respectively.

Understanding the connection between the microscopic and macroscopic properties of the system requires studying the system at the intermediate scale of the correlation length. This scale also defines emergent entities [magnetic domains] that interact with one another weakly and via an effective interaction.
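Quantitatively, the correlation length xi can be defined through the decay of the spin-spin correlation function away from the critical point,

\langle \sigma_i \sigma_j \rangle - \langle \sigma_i \rangle \langle \sigma_j \rangle \sim e^{-r_{ij}/\xi}, \qquad \xi \sim |T - T_c|^{-\nu},

where r_ij is the distance between sites i and j. For the two-dimensional Ising model, nu = 1, so the correlation length diverges as the critical temperature is approached.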

Self-similarity

At the critical temperature, the correlation length is infinite. Consequently, under a rescaling of the system, as in a renormalisation group transformation, the state of the system does not change. The system is said to be scale-free or self-similar, like a fractal pattern. (Strictly, this is criticality reached by tuning the temperature to Tc, in contrast to self-organised criticality, where a system reaches a critical state without such fine-tuning.)

Inseparable horizons

I now consider how things change when the topology or dimensionality of the lattice changes, or when interactions beyond nearest neighbours are added. This can change the relationships between the parts and the whole; some details of the parts matter. Changing from a two-dimensional square lattice to a linear chain, the ordered state disappears: there is no finite-temperature transition in one dimension. Changing to a triangular lattice with antiferromagnetic nearest-neighbour interactions removes the ordering at finite temperature, and there are an infinite number of ground states at zero temperature. Thus, some microscopic details do matter.

The main point of this example is that to understand a large complex system we have to keep both the parts and the whole in mind. It is not either/or but both/and. Furthermore, there may be an intermediate scale, at which new entities emerge.

Aside: I suspect heated debates about structuralism versus functionalism in the social sciences and the humanities are trying to defend intellectual positions (and fashions) that overlook the inseparable interplay of the microscopic and macroscopic that the Ising model captures.

Simple models can describe complex behaviour

Now consider an Ising model with competing interactions, i.e., the neighbouring spins of a particular spin compete with one another and with an external magnetic field to determine the sign of that spin. This can be illustrated with an Ising model on a hexagonal close-packed (hcp) lattice with nearest-neighbour antiferromagnetic interactions and an external magnetic field. The lattice is frustrated and can be viewed as layers of hexagonal (triangular) lattices, where each layer is displaced relative to the ones above and below it.

This model has been studied by materials scientists as it can describe the many possible phases of binary alloys, AxB1-x, where A and B are different chemical elements (for example, silver and gold) and the Ising spin on site i has the value +1 or -1, corresponding to the presence of atom A or B on that site. The magnetic field corresponds to the difference in the chemical potentials of A and B and is related to their relative concentration.

One study of this model, by authors from materials science departments motivated by this mapping of binary alloys onto the Ising model, found rich phase diagrams including 32 stable ground states with stoichiometries including A, AB, A2B, A3B, A5B, and A4B3. Even for a single stoichiometry, there can be multiple distinct orderings (and crystal structures). Of these structures, six are stabilised by purely nearest-neighbour interactions and eight by the addition of next-nearest-neighbour interactions. The remaining 18 structures require multiplet interactions for their stability.

A second example is the Anisotropic Next-Nearest Neighbour Ising (ANNNI) model, which supports a plethora of ordered states, including a phase diagram with a fractal structure, known as the Devil’s staircase.

These two Ising models illustrate how relatively simple models, containing competing interactions (described by just a few parameters) can describe rich behaviour, particularly a diversity of ground states.