Tuesday, February 27, 2024

Emergence? in large language models (revised edition)

Last year I wrote a post about emergence in AI, specifically on a paper claiming evidence for a "phase transition" in Large Language Models' ability to perform tasks they were not designed for. I found this fascinating.

That paper attracted a lot of attention, even winning an award for the best paper at the conference at which it was presented.

Well, I did not do my homework. Even before my post, another paper called into question the validity of the original paper.

Are Emergent Abilities of Large Language Models a Mirage?

Rylan Schaeffer, Brando Miranda, Sanmi Koyejo

we present an alternative explanation for [the claimed] emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher's choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous predictable changes in model performance.

... we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.
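To see how the choice of metric can manufacture an apparent transition, here is a minimal numerical sketch of their argument (my own illustration with made-up numbers, not taken from the paper). Suppose per-token accuracy improves smoothly with model size, following a power law in the error. A continuous metric (per-token accuracy) then changes gradually, while a discontinuous metric such as exact-match accuracy on a long answer, which is roughly the per-token accuracy raised to the power of the answer length if tokens were independent, appears to switch on abruptly at some scale.

```python
import numpy as np

# Hypothetical smooth improvement of per-token accuracy with model size N,
# following a power law in the error (the numbers are made up for illustration).
N = np.logspace(6, 11, 6)                       # model sizes (parameters)
per_token_error = 0.5 * (N / 1e6) ** (-0.35)    # smooth power-law decay of the error
per_token_acc = 1.0 - per_token_error

# A continuous metric (per-token accuracy) changes smoothly.
# A discontinuous metric -- exact match on a k-token answer, roughly acc**k
# if tokens were independent -- looks like an ability that suddenly "emerges".
k = 30
exact_match = per_token_acc ** k

for n, acc, em in zip(N, per_token_acc, exact_match):
    print(f"N = {n:8.0e}   per-token accuracy = {acc:.3f}   exact match (k={k}) = {em:.3f}")
```

The per-token column creeps up smoothly, while the exact-match column looks like an ability that suddenly appears at large N, even though nothing discontinuous has happened to the underlying model behaviour.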

One of the issues they suggest is responsible for the smooth behaviour is 

 the phenomenon known as neural scaling laws: empirical observations that deep networks exhibit power law scaling in the test loss as a function of training dataset size, number of parameters or compute  

One of the papers they cite on power law scaling is below (from 2017).

Deep Learning Scaling is Predictable, Empirically

Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

The figure below shows the power law scaling between the validation loss and the size of the training data set.

They note that these empirical power laws are yet to be explained.

I thank Gerard Milburn for ongoing discussions about this topic.


Friday, February 16, 2024

Launching my book in a real physical bookshop

Bookstores selling physical books are sadly in decline. Furthermore, the stores that are left are mostly big chains. Brisbane does have an independent bookstore, Avid Reader, in the West End. It is a vibrant part of the local community and hosts several author events every week.


My daughter persuaded me to do a book launch for Condensed Matter Physics: A Very Short Introduction (Oxford UP, 2023).

 

It is at Avid Reader on Monday, February 26, beginning at 6 pm.


Most readers of this blog are not in Brisbane, but if you are, or if you know people who are, please encourage them to consider attending.

The event is free but participants need to register, as space is limited.

 

I will be in conversation about the book with my friend, Dr Christian Heim, an author, composer, and psychiatrist. Like the book, the event is meant for a general audience.


  

Friday, February 9, 2024

The role of effective theories and toy models in understanding emergent properties

Two approaches to the theoretical description of systems with emergent properties that have been fruitful are effective theories and toy models. Both make progress despite our limited knowledge of the many details of a system composed of many interacting components.

Effective theories

An effective theory is valid at a particular range of scales. This exploits the fact that in complex systems there is often a hierarchy of scales (length, energy, time, or number). In physics, examples of effective theories include classical mechanics, general relativity, classical electromagnetism, and thermodynamics. The equations of an effective theory can be written down almost solely from consideration of symmetry and conservation laws. Examples include the Navier-Stokes equations for fluid dynamics and non-linear sigma models in elementary particle physics. Some effective theories can be derived by the “coarse-graining” of theories that are valid at a finer scale. For example, the equations of classical mechanics result from taking the limit of Planck’s constant going to zero in the equations of quantum mechanics. The Ginzburg-Landau theory for superconductivity can be derived from the BCS theory. The parameters in effective theories may be determined from more microscopic theories or from fitting experimental data to the predictions of the theory. For example, transport coefficients such as conductivities can be calculated from a microscopic theory using a Kubo formula.

Effective theories are useful and powerful because of the minimal assumptions and parameters used in their construction. For the theory to be useful it is not necessary to be able to derive the effective theory from a smaller scale theory, or even to have such a smaller scale theory. For example, even though there is no accepted quantum theory of gravity, general relativity can be used to describe phenomena in astrophysics and cosmology and is accepted to be valid on the macroscopic scale. Some physicists and philosophers may consider smaller-scale theories as more fundamental, but that is contested and so I will not use that language. There also are debates about how effective field theories fit into the philosophy of science.

Toy models

In his 2016 Nobel Lecture, Duncan Haldane said, “Looking back, … I am struck by how important the use of stripped down “toy models” has been in discovering new physics.” 

Here I am concerned with a class of theoretical models that includes the Ising, Hubbard, NK, Schelling, and Sherrington-Kirkpatrick models, as well as agent-based models. I refer to them as “toy” models because they aim to be as simple as possible while still capturing the essential details of a particular emergent phenomenon. At the scale of interest, a toy model is an approximation, neglecting certain degrees of freedom and interactions. In contrast, at the relevant scale, effective theories are often considered to be exact because they are based on general principles.

Historical experience has shown that there is a strong justification for proposing and studying toy models. They are concerned with a qualitative, rather than a quantitative, description of experimental data. A toy model is usually introduced to answer basic questions about what is possible. What are the essential ingredients sufficient for an emergent phenomenon to occur? What details do matter? For example, the Ising model was introduced in 1920 to see whether it was possible for statistical mechanics to describe the sharp phase transition associated with ferromagnetism.

In his book The Model Thinker and online course Model Thinking, Scott Page has enumerated the value of simple models in the social sciences. An earlier argument for their value in biology was made by JBS Haldane in his seminal article defending “bean bag” genetics. Simplicity makes toy models more tractable for mathematical analysis and/or computer simulation. The assumptions made in defining the model can be clearly stated. If the model is tractable, then the pure logic of mathematical analysis leads to reliable conclusions. This contrasts with the qualitative arguments often used in the biological and social sciences to propose explanations. Such arguments can miss the counter-intuitive conclusions that emerge from the rigorous analysis of toy models. Toy models can show what is possible, what minimal ingredients are sufficient for a system to exhibit an emergent property, and how a quantitative change can lead to a qualitative change. In other words, what details do matter?

Toy models can guide what experimental data to gather and how to analyse it. Insight can be gained by considering multiple models as that approach can be used to rule out alternative hypotheses. Finally, there is value in the adage, “all models are wrong, but some are useful.”

Due to universality, toy models sometimes work better than expected and can even give a quantitative description of experimental data. An example is the three-dimensional Ising model, which was eventually found to be consistent with data on the liquid-gas transition near the critical point. Although the liquid-gas system is not magnetic, the analogy was bolstered by the mapping of the Ising model onto the lattice gas model. This success led to a shift in the attitude of physicists towards the Ising model. According to Martin Niss, from 1920 to 1950 it was viewed as irrelevant to magnetism because it did not describe magnetic interactions quantum mechanically; this was gradually replaced by the view that the model could give insights into collective phenomena. From 1950 to 1965, the view that the Ising model was irrelevant to critical phenomena because it oversimplified the microscopic interactions also faded.

Physicists are particularly good and experienced at the proposal and analysis of toy models. I think this expertise is a niche that they could exploit more in contributing to other fields, from biology to the social sciences. They just need humility to listen to non-physicists about what the important questions and essential details are.

Tuesday, February 6, 2024

Four scientific reasons to be skeptical of AI hype

The hype about AI continues, whether in business or science. Undoubtedly, there is a lot of potential in machine learning, big data, and large language models. But that does not mean that the hype is justified. It is more likely to limit real scientific progress and waste a lot of resources.

My innate scepticism receives concrete support from an article from 2018 that gives four scientific reasons for concern.

Big data: the end of the scientific method? 

Sauro Succi and Peter V. Coveney

The article might be viewed as a response to a bizarre article in 2008 by Chris Anderson, editor-in-chief at Wired, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

‘With enough data, the numbers speak for themselves, correlation replaces causation, and science can advance even without coherent models or unified theories’.

Here are the four scientific reasons for caution about such claims given by Succi and Coveney.

(i) Complex systems are strongly correlated, hence they do not (generally) obey Gaussian statistics.

The law of large numbers (central limit theorem) may not apply and rare events may dominate behaviour. For example, consider the power law decays observed in many complex systems. They are in sharp contrast to the rapid exponential decay in the Gaussian distribution. The authors state, "when rare events are not so rare, convergence rates can be frustratingly slow even in the face of petabytes of data."
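As a toy illustration of that last point (my own sketch, not from the paper), compare how the sample mean converges for light-tailed (Gaussian) data and for heavy-tailed (Pareto) data with a power-law tail. With a tail exponent between 1 and 2 the mean exists but the variance is infinite, so even a million samples can leave a sizeable error, dominated by a handful of rare events.

```python
import numpy as np

rng = np.random.default_rng(0)
n_max = 10**6

# Light-tailed data: Gaussian with mean 1.
gauss = rng.normal(loc=1.0, scale=1.0, size=n_max)

# Heavy-tailed data: Pareto with tail exponent alpha = 1.5.
# The mean exists (alpha/(alpha-1) = 3) but the variance is infinite,
# so the sample mean converges slowly and is dominated by rare large values.
alpha = 1.5
pareto = rng.pareto(alpha, size=n_max) + 1.0    # classical Pareto on [1, infinity)

for n in (10**3, 10**4, 10**5, 10**6):
    gauss_err = abs(gauss[:n].mean() - 1.0)
    pareto_err = abs(pareto[:n].mean() - 3.0)
    print(f"n = {n:>7}   Gaussian mean error = {gauss_err:.4f}   Pareto mean error = {pareto_err:.4f}")
```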

(ii) No data are big enough for systems with strong sensitivity to data inaccuracies.

Big data and machine learning involve fitting data with a model containing many parameters, by minimising some "cost function." That minimisation routine acts on some sort of "landscape." If the landscape is smooth, with well-separated minima that are not divided by overly large barriers, then the routine may work. However, if the landscape is rough, or the routine gets stuck in some metastable state, there will be problems, such as over-fitting.
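A cartoon of that pitfall (my own sketch, not from the article): plain gradient descent on a smooth landscape finds the global minimum from any starting point, whereas on a rugged landscape the answer depends on where you start.

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.005, steps=5000):
    """Plain gradient descent on a one-dimensional landscape."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Smooth landscape: f(x) = x**2, a single global minimum at x = 0.
smooth_grad = lambda x: 2.0 * x

# "Rugged" landscape: f(x) = x**2 + 2*sin(10*x), many local minima.
rugged_grad = lambda x: 2.0 * x + 20.0 * np.cos(10.0 * x)

for x0 in (-3.0, -1.0, 0.5, 2.5):
    print(f"start at {x0:+.1f}:  smooth landscape -> {gradient_descent(smooth_grad, x0):+.3f}, "
          f"rugged landscape -> {gradient_descent(rugged_grad, x0):+.3f}")
```

On the smooth landscape every run ends near zero; on the rugged one, different starting points end in different local minima.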

(iii) Correlation does not imply causation, the link between the two becoming exponentially fainter at increasing data size.  

(iv) In a finite-capacity world, too much data is just as bad as no data.

In other words, it is all about curve fitting. The more parameters used, the less likely it is that any insight will be gained. Here the authors quote the famous aphorism, attributed to von Neumann and Fermi, "with four parameters I can fit an elephant and with five I can make his tail wiggle."
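In the spirit of von Neumann's elephant, here is a small sketch (mine, not from the article) of how adding parameters improves the fit to the data you already have while degrading predictions for data you do not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ten noisy observations of a simple underlying law, y = 2x + 1.
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + 1.0 + rng.normal(scale=0.2, size=x_train.size)

# Fresh data from the same law, used only for testing predictions.
x_test = np.linspace(0.0, 1.0, 50)
y_test = 2.0 * x_test + 1.0 + rng.normal(scale=0.2, size=x_test.size)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    rmse = lambda x, y: np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))
    print(f"degree {degree} ({degree + 1} parameters):  "
          f"train RMSE = {rmse(x_train, y_train):.3f},  test RMSE = {rmse(x_test, y_test):.3f}")
```

The high-degree fit reproduces the training points almost exactly, but typically predicts new points from the same underlying law worse than the simple fit does.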

Aside: an endearing part of the article is the inclusion of two choice quotes from C.S. Lewis:
‘Once you have surrendered your brain, you've surrendered your life’ (paraphrased)

‘When man proclaims conquest of power of nature, what it really means is conquest of power of some men over other men’.

I commend the article to you and look forward to hearing your perspective. Is the criticism of AI hype fair? Are these four scientific reasons good grounds for concern?

Thursday, January 25, 2024

Emergence and the Ising model

The Ising model is emblematic of “toy models” that have been proposed and studied to understand and describe emergent phenomena. Although originally proposed to describe ferromagnetic phase transitions, variants of it have found application in other areas of physics, and in biology, economics, sociology, neuroscience, complexity theory, …  

Quanta magazine had a nice article marking the model's centenary.

In the general model there is a set of lattice points {i} with a “spin” {sigma_i = +/-1} and a Hamiltonian

H = - sum_{i,j} J_ij sigma_i sigma_j - h sum_i sigma_i

where h is the strength of an external magnetic field and J_ij is the strength of the interaction between the spins on sites i and j. The simplest models are those in which the lattice is regular and the interaction is uniform and non-zero only for nearest-neighbour sites.

The Ising model illustrates many key features of emergent phenomena. Given the relative simplicity of the model, exhaustive studies since its proposal in 1920 have given definitive answers to questions often debated about more complex systems. Below I enumerate some of these insights: novelty, quantitative change leads to qualitative change, spontaneous order, singularities, short-range interactions can produce long-range order, universality, three horizons/scales of interest, self-similarity, inseparable horizons, and simple models can describe complex behaviour.

Most of these properties can be illustrated with the case of the Ising model on a square lattice with only nearest-neighbour interactions (J_ij = J). Above the critical temperature (k_B Tc = 2J/ln(1+√2) ≈ 2.27 J) and in the absence of an external magnetic field, the system has no net magnetisation. Below Tc, an ordered state forms: for J > 0 it is ferromagnetic, with a net magnetisation; for J < 0 it is antiferromagnetic, with a staggered magnetisation.
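For readers who want to see this on a computer, below is a minimal Metropolis Monte Carlo sketch of the square-lattice model (my own illustration; periodic boundary conditions, units with k_B = 1 and J = 1, and the lattice size and number of sweeps are arbitrary choices, not tuned for accuracy).

```python
import numpy as np

rng = np.random.default_rng(42)

def metropolis_sweep(spins, T, J=1.0):
    """One Monte Carlo sweep: L*L attempted single-spin flips at temperature T."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # Sum over the four nearest neighbours, with periodic boundary conditions.
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * J * spins[i, j] * nn       # energy cost of flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1

L = 20
Tc = 2.0 / np.log(1.0 + np.sqrt(2.0))         # exact critical temperature, ~2.27 J/k_B

for T in (1.5, Tc, 3.5):
    spins = np.ones((L, L), dtype=int)        # start from the fully ordered state
    for _ in range(300):                      # equilibration sweeps (illustrative)
        metropolis_sweep(spins, T)
    print(f"T = {T:.2f}   |magnetisation per spin| = {abs(spins.mean()):.2f}")
```

Starting from the ordered state, the magnetisation per spin stays close to one at the lowest temperature, drops to a small value well above Tc, and fluctuates strongly near Tc.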

Novelty

The state of the system below Tc is qualitatively different from the state at very high temperatures or from that of a set of non-interacting spins. Thus, the non-zero magnetisation is an emergent property, as defined in this post. This state is also associated with spontaneous symmetry breaking and more than one possible equilibrium state, i.e., the magnetisation can be positive or negative.

Quantitative change leads to qualitative change

The qualitative change associated with formation of the magnetic state can occur with a small quantitative change in the value of the ratio T/J, i.e., either by decreasing T or increasing J. Formation of the magnetic state is also associated with the quantitative change of increasing the number of spins from a large finite number to infinity. 

Singularities

For a finite number of spins all the thermodynamic properties of the system are an analytic function of the temperature and magnitude of an external field. However, in the thermodynamic limit, these properties become singular at T=Tc and h=0. This is the critical point in the phase diagram of h versus T. Some of the quantities, such as the specific heat capacity and the magnetic susceptibility, become infinite at the critical point. These singularities are characterised by critical exponents, most of which have non-integer values. Consequently, the free energy of the system is not an analytic function of T and h.
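For the two-dimensional square-lattice model, these statements can be made quantitative with standard exact results (Onsager's critical temperature and Yang's spontaneous magnetisation), which also give examples of the non-integer exponents mentioned above:

```latex
% Exact results for the nearest-neighbour Ising model on the square lattice.
% Critical temperature (Onsager, 1944):
\[
  \sinh\!\left(\frac{2J}{k_B T_c}\right) = 1
  \qquad\Longrightarrow\qquad
  k_B T_c = \frac{2J}{\ln\!\left(1+\sqrt{2}\right)} \simeq 2.27\,J .
\]
% Spontaneous magnetisation below T_c (Yang, 1952):
\[
  m(T) = \left[\,1 - \sinh^{-4}\!\left(\frac{2J}{k_B T}\right)\right]^{1/8},
  \qquad T < T_c ,
\]
% giving the order-parameter exponent beta = 1/8. The specific heat diverges
% logarithmically (alpha = 0), the susceptibility with gamma = 7/4, and the
% correlation length with nu = 1.
```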

Spontaneous order

The magnetic state occurs spontaneously. The system self-organises. There is no external field causing the magnetic state to form. There is long-range order, i.e., the values of spins that are infinitely far apart from one another are correlated.

Short-range interactions can produce long-range order

Although there is no direct long-range interaction between spins, long-range order can occur. Prior to Onsager’s exact solution of the two-dimensional model, many scientists were not convinced that this was possible.

Universality

The values of the critical exponents are independent of many details of the model, such as the value of J, the lattice constant and spatial anisotropy, and the presence of small interactions beyond nearest neighbours. Many details do not matter. This is why the model can give a quantitative description of experimental data near the critical temperature, even though the model Hamiltonian is a crude description of the interactions in a real material. It can describe not only magnetic transitions but also the liquid-gas transition and transitions in binary alloys and binary liquid mixtures.

Three horizons/scales of interest

There are three important length scales associated with the model. Two are simple: the lattice constant and the size of the whole lattice. These are the microscopic and macroscopic scales. The third scale is emergent and temperature dependent: the correlation length, i.e., the distance over which spins are correlated with one another. This can also be visualised as the size of the magnetisation domains seen in Monte Carlo simulations.

The left, centre, and right panels above show a snapshot of a likely configuration of the system at a temperature less than, equal to, and greater than the critical temperature, Tc, respectively.

Understanding the connection between the microscopic and macroscopic properties of the system requires studying the system at the intermediate scale of the correlation length. This scale also defines emergent entities [magnetic domains] that interact with one another weakly and via an effective interaction.

Self-similarity

At the critical temperature, the correlation length is infinite. Consequently, if the system is rescaled, as in a renormalisation group transformation, its state does not change. The system is said to be scale-free or self-similar, like a fractal pattern. This scale invariance is the basis of the renormalisation group treatment of critical phenomena.
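One way to see this on a computer is the standard majority-rule block-spin construction (the sketch below is my own, not tied to any particular paper): replace each 3x3 block of spins by the sign of its majority, producing a coarse-grained lattice that is smaller by a factor of three.

```python
import numpy as np

def block_spin(spins, b=3):
    """Majority-rule block-spin (real-space renormalisation) transformation.

    Replaces each b x b block by the sign of the sum of its spins
    (b odd avoids ties). The lattice shrinks by a factor of b.
    """
    L = spins.shape[0] - spins.shape[0] % b          # trim so L is divisible by b
    blocks = spins[:L, :L].reshape(L // b, b, L // b, b)
    return np.sign(blocks.sum(axis=(1, 3))).astype(int)

# Example: coarse-grain a random (infinite-temperature) configuration twice.
rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(81, 81))
once = block_spin(spins)          # 27 x 27
twice = block_spin(once)          # 9 x 9
print(spins.shape, once.shape, twice.shape)
print("magnetisation per spin:", spins.mean(), once.mean(), twice.mean())
```

Applying this repeatedly to equilibrated configurations from the Monte Carlo sketch above shows snapshots drifting towards complete order below Tc and complete disorder above Tc, with the critical point as the scale-invariant case in between.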

Inseparable horizons

I now consider how things change when the topology or dimensionality of the lattice changes, or when interactions beyond nearest neighbours are added. This can change the relationships between the parts and the whole. Some details of the parts matter. On changing from a two-dimensional square lattice to a one-dimensional chain, the ordered state disappears. Changing to a triangular lattice with antiferromagnetic nearest-neighbour interactions removes the ordering at finite temperature, and there is an infinite number of ground states at zero temperature. Thus, some microscopic details do matter.

The main point of this example is that to understand a large complex system we have to keep both the parts and the whole in mind. It is not either/or but both/and. Furthermore, there may be an intermediate scale, at which new entities emerge.

Aside: I suspect that heated debates about structuralism versus functionalism in the social sciences and humanities are attempts to defend intellectual positions (and fashions) that overlook the inseparable interplay of the microscopic and macroscopic that the Ising model captures.

Simple models can describe complex behaviour

Now consider an Ising model with competing interactions, i.e., the neighbouring spins of a particular spin compete with one another, and with an external magnetic field, to determine the sign of that spin. This can be illustrated with an Ising model on a hexagonal close-packed (hcp) lattice with nearest-neighbour antiferromagnetic interactions and an external magnetic field. The lattice is frustrated and can be viewed as layers of triangular lattices, each displaced relative to its neighbouring layers.

This model has been studied by materials scientists because it can describe the many possible phases of binary alloys, AxB1-x, where A and B are different chemical elements (for example, silver and gold) and the Ising spin on site i takes the value +1 or -1, corresponding to the presence of an A or B atom on that site. The magnetic field corresponds to the difference between the chemical potentials of A and B and is related to their relative concentrations.

One study of this model on the hcp lattice in a magnetic field, by authors from materials science departments motivated by this mapping of the binary alloy problem onto an Ising model, found rich phase diagrams including 32 stable ground states, with stoichiometries including A, AB, A2B, A3B, A5B, and A4B3. Even for a single stoichiometry, there can be multiple distinct orderings (and crystal structures). Of these structures, six are stabilized by purely nearest-neighbour interactions and eight by the addition of next-nearest-neighbour interactions. The remaining 18 structures require multiplet interactions for their stability.

A second example is the Anisotropic Next-Nearest Neighbour Ising (ANNNI) model, which supports a plethora of ordered states, including a phase diagram with a fractal structure, known as the Devil’s staircase.

These two Ising models illustrate how relatively simple models, containing competing interactions (described by just a few parameters) can describe rich behaviour, particularly a diversity of ground states.

Friday, January 19, 2024

David Mermin on his life in science: funny, insightful, and significant

 David Mermin has posted a preprint with the modest title, Autobiographical Notes of a Physicist

There are many things I enjoyed and found interesting about his memories. A few of the stories I knew, but most I did not. He reminisces about his interactions with Ken Wilson, John Wilkins, Michael Fisher, Walter Kohn, and of course, Neil Ashcroft.

Mermin is a gifted writer and can be amusing and mischievous. He is quite modest and self-deprecating about his own achievements.

He explains why we should refer to the Hohenberg-Mermin-Wagner theorem, not Mermin-Wagner.

One of his Reference Frame columns in Physics Today stimulated Paul Ginsparg to start the arXiv.

I was struck by how Mermin's career belongs to a different era. The community was smaller and more personal. Doing physics was fun. Time was spent savouring the pleasure of learning new things and explaining them to others. Colleagues were friends rather than competitors. His research was curiosity-driven. This led to Mermin making significant contributions to quantum foundations. And, he only published about two papers per year!

Teaching was valued, enjoyable, and stimulated research. It was also a way to learn a subject, regardless of the level at which it was taught. For eight years, Mermin and Ashcroft spent half their time writing their beautiful textbook!

I look forward to hearing others' reflections.

Tuesday, January 16, 2024

Wading through AI hype about materials discovery

Discovering new materials with functional properties is hard, very hard. We need all the tools we can get, from serendipity to chemical intuition to high-performance computing.

At the end of last year, two back-to-back papers appeared in the luxury journal Nature.

Scaling deep learning for materials discovery

All the authors are at Google. They claim that they have discovered more than two million new materials with stable crystal structures using DFT-based methods and AI.

On Doug Natelson's blog there are several insightful comments on the paper about why to be skeptical about AI/DFT based "discovery".

Here are a few of the reasons my immediate response to this paper is one of skepticism.

It is published in Nature. Almost every "ground-breaking" paper I force myself to read is disappointing when you read the fine print.

It concerns a very "hot" topic that is full of hype in both the science and business communities.

It is a long way from discovering a stable crystal to finding that it has interesting and useful properties.

Calculating the correct relative stability of different crystal structures of complex materials can be incredibly difficult.

DFT-based methods fail spectacularly for the low-energy properties of quantum materials, such as cuprate superconductors. But, they do get the atomic structure and stability correct, which is the focus of this paper.

There is a big gap between discovering a material that has desirable technological properties and developing one that meets the demanding criteria for commercialisation.

The second paper combines AI-based predictions, similar to the paper above, with robots doing material synthesis and characterisation.

An autonomous laboratory for the accelerated synthesis of novel materials

[we] realized 41 novel compounds from a set of 58 targets including a variety of oxides and phosphates that were identified using large-scale ab initio phase-stability data from the Materials Project and Google DeepMind

These claims have already been undermined by a preprint from the chemistry departments at Princeton and UCL.

Challenges in high-throughput inorganic material prediction and autonomous synthesis

We discuss all 43 synthetic products and point out four common shortfalls in the analysis. These errors unfortunately lead to the conclusion that no new materials have been discovered in that work. We conclude that there are two important points of improvement that require future work from the community: 
(i) automated Rietveld analysis of powder x-ray diffraction data is not yet reliable. Future improvement of such, and the development of a reliable artificial intelligence-based tool for Rietveld fitting, would be very helpful, not only to autonomous materials discovery, but also the community in general.
(ii) We find that disorder in materials is often neglected in predictions. The predicted compounds investigated herein have all their elemental components located on distinct crystallographic positions, but in reality, elements can share crystallographic sites, resulting in higher symmetry space groups and - very often - known alloys or solid solutions. 

Life is messy. Chemistry is messy. DFT-based calculations are messy. AI is messy. 

Given that most discoveries of interesting materials involve serendipity or a lot of trial and error, it is worth trying what the authors of these papers are attempting. However, the field will only advance in a meaningful way if it is not distracted and diluted by hype, and if authors, editors, and referees demand transparency about the limitations of their work.