Showing posts with label AI.

Tuesday, February 24, 2026

Information theoretic measures for emergence and causality

The relationship between emergence and causation is contentious, with a long history. Most discussions are qualitative. Presented with a new system, how does one identify the microscopic and macroscopic scales that may be most useful for understanding and describing the system? Can Judea Pearl’s seminal ideas about causality be implemented practically for understanding emergence?

Broadly speaking, a weakness of discussions of emergence and causality is that it is hard to define these concepts in a rigorous and quantitative manner that makes them amenable to empirical testing, with respect to theoretical models and to experimental data. 

Fortunately, in the past decade, there have been some specific proposals to address this issue, mostly using information theory. A helpful recent review is by Yuan et al. 

“Two primary challenges take precedence in understanding emergence from a causal perspective. The first is establishing a quantitative definition of emergence, whereas the second involves identifying emergent behaviors or phenomena through data analysis.

To address the first challenge, two prominent quantitative theories of emergence have emerged in the past decade. The first is Erik Hoel et al.’s theory of causal emergence [19] whereas the second is Fernando E. Rosas et al.’s theory of emergence based on partial information decomposition [24].

Hoel et al.’s theory of causal emergence specifically addresses complex systems that are modeled using Markov chains. It employs the concept of effective information (EI) to quantify the extent of causal influence within Markov chains and enables comparisons of EI values across different scales [19,25]. Causal emergence is defined by the difference in the EI values between the macro-level and micro-level."

One perspective on causal emergence is that it occurs when the dynamics of a system at the macro-level are described more efficiently by macro-variables than by the micro-level variables.
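To make this concrete, here is a minimal sketch (my own, not code from Hoel et al. or Klein et al.) of how effective information can be computed for a Markov chain and compared across a coarse-graining; the four-state chain and the grouping of its states are illustrative assumptions.

```python
import numpy as np

def effective_information(T):
    """Effective information (EI) of a Markov chain, in bits.
    T[i, j] = P(X_{t+1} = j | X_t = i); rows sum to 1.
    EI is the mutual information between X_t and X_{t+1} when X_t is forced
    to the uniform (maximum-entropy) "intervention" distribution."""
    n = T.shape[0]
    p_out = T.mean(axis=0)                       # output distribution under uniform input
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(T > 0, T * np.log2(T / p_out), 0.0)
    return terms.sum() / n                       # average KL(row || average row)

def coarse_grain(T, partition):
    """Macro transition matrix obtained by grouping micro states,
    weighting micro states uniformly within each group."""
    m = len(partition)
    T_macro = np.zeros((m, m))
    for A, group_A in enumerate(partition):
        for B, group_B in enumerate(partition):
            T_macro[A, B] = T[np.ix_(group_A, group_B)].sum() / len(group_A)
    return T_macro

# Toy micro dynamics: states 0-2 hop randomly among themselves, state 3 is fixed.
T_micro = np.array([[1/3, 1/3, 1/3, 0.0],
                    [1/3, 1/3, 1/3, 0.0],
                    [1/3, 1/3, 1/3, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])
partition = [[0, 1, 2], [3]]                     # assumed coarse-graining
ei_micro = effective_information(T_micro)                             # ~0.81 bits
ei_macro = effective_information(coarse_grain(T_micro, partition))    # 1 bit (deterministic macro)
print(ei_macro - ei_micro)                       # positive difference: causal emergence
```

The macro description is deterministic while the micro description is noisy, so EI increases under coarse-graining; in Hoel et al.'s language, that positive difference is the causal emergence.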

Klein et al. used Hoel’s information-theoretic measures of causal emergence to analyse protein interaction networks (interactomes) in over 1800 species, containing more than eight million protein–protein interactions, across different scales. They showed the emergence of ‘macroscales’ that are associated with lower noise and uncertainty. The nodes in the macroscale description of the network are more resilient than those in less coarse-grained descriptions. Greater causal emergence (i.e., a stronger macroscale description) was generally seen in multicellular organisms compared to single-cell organisms. The authors quantified causal emergence in terms of mutual information (between large and small scales) and effective information (a measure of the certainty in the connectivity of a network). Philip Ball (2023, pages 218-220) gives an account of this work in terms of the emergence of multicellularity in biological evolution. Ball introduced the term causal spreading (pages 225-227), arguing that over the history of evolution the locus of causation has changed.

Yuan et al. continue

"However, in Hoel’s theory of causal emergence, it is essential to establish a coarse-graining strategy beforehand. Alternatively, the strategy can be derived by maximizing the effective information (EI) [19]. However, this task becomes challenging for large-scale systems due to the computational complexity involved. To address these problems, Rosas et al. introduced a new quantitative definition of causal emergence [24] that does not depend on coarse-graining methods, drawing from partial information decomposition (PID)-related theory. PID is an approach developed by Williams et al., which seeks to decompose the mutual information between a target and source variables into non-overlapping information atoms: unique, redundant, and synergistic information [29]…"

The Figure below is taken from Rosas et al. X_t^j (j = 1, …, n) are microscopic variables that define a Markov chain. V_t is a macroscopic variable that is completely determined by the microscopic variables.

“Diagram of causally emergent relationships. Causally emergent features have predictive power beyond individual components. Downward causation takes place when that predictive power refers to individual elements; causal decoupling when it refers to itself or other high-order features.”

Rosas et al. applied the method to specific systems, including Conway’s Game of Life, Reynolds’ flocking model, and neural activity as measured by electrocorticography. More recently, it was used to describe emergence in computer science, including the identification of modular structures. Calculations were performed for specific examples, including Ehrenfest’s urn model for diffusion, the Ising model with Glauber dynamics, and a Hopfield neural network model for associative memory.

Yuan et al. also state the following:

"The second challenge pertains to the identification of emergence from data. In an effort to address this issue, Rosas et al. derived a numerical method [24]. However, it is important to acknowledge that this method offers only a sufficient condition for emergence and is an approximate approach. Another limitation is that a coarse-grained macro-state variable should be given beforehand to apply this method."

Rosas et al. recently stated

“Empirical applications of this framework to study emergence … including the study of gene regulatory networks [22], the dynamics of the human brain [23], the internal dynamics of reservoir computing [24], and the formation of useful internal representations in machine learning [25].”

Yuan et al. also discuss two significant connections between causal emergence and machine learning. First, machine learning can be used to improve calculations of causal emergence. Second, causal emergence measures can be used to better understand how machine learning works and improve it.

The work described above built on earlier work by Crutchfield, who claimed that the identification of emergence and hierarchies could be made operational, stating that “different scales are delineated by a succession of divergences in statistical complexity at lower levels.” More recently, Rupe and Crutchfield have reported progress towards identifying emergent self-organisation in a system.

Although this work on quantitative measures of emergence based on information theory represents significant progress, there are many open problems. Examples include the extension to non-Markovian systems and the development of computationally feasible methods for large systems. The latter is particularly important in physical systems where spontaneous symmetry breaking occurs, as this only happens in the thermodynamic limit of an infinite system.

There is an unrecognised similarity between the work described above and techniques recently developed to characterise phase transitions in statistical mechanics models such as the Ising model and classical dimer models. Coarse-graining (CG) is optimised by maximising the Real-Space Mutual Information (RSMI) between a spatial block and its distant environment. 

In general, maximising mutual information is notoriously hard but can be done using state-of-the-art machine learning algorithms. Gokmen et al. have developed an algorithm that they claim “can, unsupervised, construct order parameters, locate phase transitions, and identify spatial correlations and symmetries for complex and large-dimensional real-space data.” Furthermore, the optimal CG explicitly identifies the scaling operators associated with the critical point. 
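As a toy illustration of the RSMI objective (my own sketch, not the neural-network-based algorithm of Gokmen et al.): candidate coarse-grainings of a small block of spins are ranked by a plug-in estimate of their mutual information with a distant "environment" spin, using samples from a simple correlated spin chain generated for the purpose.

```python
import numpy as np
from collections import Counter

def mutual_information(a, b):
    """Plug-in estimate of I(a; b) in bits for discrete 1-D sequences."""
    n = len(a)
    p_ab, p_a, p_b = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((c / n) * np.log2(c * n / (p_a[x] * p_b[y]))
               for (x, y), c in p_ab.items())

# Samples of a 1D chain of +/-1 spins; each spin copies its left neighbour
# with probability 0.9, so correlations decay along the chain.
rng = np.random.default_rng(1)
n_samples, length = 20000, 16
spins = np.empty((n_samples, length), dtype=int)
spins[:, 0] = rng.choice([-1, 1], size=n_samples)
for i in range(1, length):
    flip = rng.random(n_samples) < 0.1
    spins[:, i] = np.where(flip, -spins[:, i - 1], spins[:, i - 1])

block = spins[:, 0:4]      # the visible block to be coarse-grained
env = spins[:, 8]          # a distant environment spin (with a buffer in between)

# Candidate one-bit coarse-grainings of the block; the RSMI prescription keeps
# whichever retains the most information about the distant environment.
candidates = {
    "spin 0": block[:, 0],
    "spin 3 (adjacent to buffer)": block[:, 3],
    "majority rule": np.sign(block.sum(axis=1) + 0.5).astype(int),
    "parity": block.prod(axis=1),
}
for name, coarse in candidates.items():
    print(f"{name:30s} I(coarse; env) = {mutual_information(coarse, env):.3f} bits")
```

In this Markov-chain toy the boundary spin trivially retains the most information; the point of the machinery in Gokmen et al. is that for genuinely two-dimensional, critical systems the optimal filters instead pick out collective variables (the scaling operators), which is what makes the coarse-graining physically informative.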

The classical dimer model provides a stringent test as “the relevant low-energy degrees of freedom are profoundly different from the microscopic building blocks of the theory and change qualitatively throughout the phase diagram.” In other words, the emergent entities (quasiparticles such as vortices associated with the height field, which is described by a sine-Gordon field theory) are different from the dimers.

It is encouraging to see that two different scientific communities have developed similar ideas to address this challenging problem of making discussions about emergence and causality more concrete and quantitative.

Saturday, October 25, 2025

Can AI solve quantum many-body problems?

I find it difficult to wade through all the hype about AI, alongside the anecdotes about its failure to answer basic questions reliably.

Gerard Milburn kindly brought to my attention a nice paper that systematically addresses whether AI is useful as an aid (research assistant) for solving basic (but difficult) problems that condensed matter theorists care about.

CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers

The abstract is below.

My only comment is one of perspective. Is the cup half full or half empty? Do we emphasise the failures or the successes?

The optimists among us will claim that the success in solving even a small number of these difficult problems shows the power and potential of AI. It is just a matter of time before LLMs can solve most of these problems, and we will see dramatic increases in research productivity (e.g., reductions in the time taken to complete a project).

The pessimists and the skeptically inclined will claim that the failures highlight the limitations of AI, particularly when training data sets are small. We are still a long way from replacing graduate students with AI bots (or at least from using AI to train students in the first year of their PhD).

What do you think? Should this study lead to optimism, pessimism, or just wait and see?

----------

Large language models (LLMs) have shown remarkable progress in coding and math problem-solving, but evaluation on advanced research-level problems in hard sciences remains scarce. To fill this gap, we present CMT-Benchmark, a dataset of 50 problems covering condensed matter theory (CMT) at the level of an expert researcher. Topics span analytical and computational approaches in quantum many-body, and classical statistical mechanics. The dataset was designed and verified by a panel of expert researchers from around the world. We built the dataset through a collaborative environment that challenges the panel to write and refine problems they would want a research assistant to solve, including Hartree-Fock, exact diagonalization, quantum/variational Monte Carlo, density matrix renormalization group (DMRG), quantum/classical statistical mechanics, and model building. We evaluate LLMs by programmatically checking solutions against expert-supplied ground truth. We developed machine-grading, including symbolic handling of non-commuting operators via normal ordering. They generalize across tasks too. Our evaluations show that frontier models struggle with all of the problems in the dataset, highlighting a gap in the physical reasoning skills of current LLMs. Notably, experts identified strategies for creating increasingly difficult problems by interacting with the LLMs and exploiting common failure modes. The best model, GPT5, solves 30\% of the problems; average across 17 models (GPT, Gemini, Claude, DeepSeek, Llama) is 11.4±2.1\%. Moreover, 18 problems are solved by none of the 17 models, and 26 by at most one. These unsolved problems span Quantum Monte Carlo, Variational Monte Carlo, and DMRG. Answers sometimes violate fundamental symmetries or have unphysical scaling dimensions. We believe this benchmark will guide development toward capable AI research assistants and tutors.

Wednesday, August 13, 2025

My review article on emergence

I just posted on the arXiv a long review article on emergence

Emergence: from physics to biology, sociology, and computer science

The abstract is below.

I welcome feedback. 

------

Many systems of interest to scientists involve a large number of interacting parts and the whole system can have properties that the individual parts do not. The system is qualitatively different to its parts. More is different. I take this novelty as the defining characteristic of an emergent property. Many other characteristics that have been associated with emergence are reviewed, including universality, order, complexity, unpredictability, irreducibility, diversity, self-organisation, discontinuities, and singularities. However, it has not been established whether these characteristics are necessary or sufficient for novelty. A wide range of examples are given to show how emergent phenomena are ubiquitous across most sub-fields of physics and many areas of biology and social sciences. Emergence is central to many of the biggest scientific and societal challenges today. Emergence can be understood in terms of scales (energy, time, length, complexity) and the associated stratification of reality. At each stratum (level) there is a distinct ontology (properties, phenomena, processes, entities, and effective interactions) and epistemology (theories, concepts, models, and methods). This stratification of reality leads to semi-autonomous scientific disciplines and sub-disciplines. A common challenge is understanding the relationship between emergent properties observed at the macroscopic scale (the whole system) and what is known about the microscopic scale: the components and their interactions. A key and profound insight is to identify a relevant emergent mesoscopic scale (i.e., a scale intermediate between the macro- and micro- scales) at which new entities emerge and interact with one another weakly. In different words, modular structures may emerge at the mesoscale. Key theoretical methods are the development and study of effective theories and toy models. Effective theories describe phenomena at a particular scale and sometimes can be derived from more microscopic descriptions. Toy models involve minimal degrees of freedom, interactions, and parameters. Toy models are amenable to analytical and computational analysis and may reveal the minimal requirements for an emergent property to occur. The Ising model is an emblematic toy model that elucidates not just critical phenomena but also key characteristics of emergence. Many examples are given from condensed matter physics to illustrate the characteristics of emergence. A wide range of areas of physics are discussed, including chaotic dynamical systems, fluid dynamics, nuclear physics, and quantum gravity. The ubiquity of emergence in other fields is illustrated by neural networks, protein folding, and social segregation. An emergent perspective matters for scientific strategy, as it shapes questions, choice of research methodologies, priorities, and allocation of resources. Finally, the elusive goal of the design and control of emergent properties is considered.

Friday, July 25, 2025

Reviewing emergent computational abilities in Large Language Models

Two years ago, I wrote a post about a paper by Wei et al., Emergent Abilities of Large Language Models

Then last year, I posted about a paper Are Emergent Abilities of Large Language Models a Mirage? that criticised the first paper.

There is more to the story. The first paper has now been cited over 3,600 times. There is a helpful review of the state of the field.

Emergent Abilities in Large Language Models: A Survey

Leonardo Berti, Flavio Giorgi, Gjergji Kasneci

It begins with a discussion of what emergence is, quoting from Phil Anderson's More is Different article [which emphasised how new properties may appear when a system becomes large] and John Hopfield's Neural networks and physical systems with emergent collective computational abilities, which was the basis of his recent Nobel Prize. Hopfield stated

"Computational properties of use to biological organisms or the construction of computers can emerge as collective properties of systems having a large number of simple equivalent components (or neurons)."

Berti et al. observe, "Fast forward to the LLM era, notice how Hopfield's observations encompass all the computational tasks that LLMs can perform."

They discuss emergent abilities such as in-context learning, defined as the "capability to generalise from a few examples to new tasks and concepts on which they have not been directly trained."

Here, I put this review in the broader context of the role of emergence in other areas of science.

Scales. 

Simple scales that describe how large an LLM is include the amount of computation, the number of model parameters, and the size of the training dataset. More complicated measures of scale include the number of layers in a deep neural network and the complexity of the training tasks.

Berti et al. note that the emergence of new computational abilities does not just follow from increases in the simple scales but can be tied to the training process. I note that this subtlety is consistent with experience in biology. Simple scales would be the length of an amino acid chain in a protein or the number of base pairs in a DNA molecule, the number of proteins in a cell, or the number of cells in an organism. More subtle scales include the number of protein interactions in a proteome or gene networks in a cell. Deducing what the relevant scales are is non-trivial. Furthermore, as emphasised by Denis Noble and Robert Bishop, context matters, e.g., a protein may only have a specific function if it is located in a specific cell.

Novelty. 

When they become sufficiently "large", LLMs have computational abilities that they were not explicitly designed for and that "small" versions do not have. 

The emergent abilities range "from advanced reasoning and in-context learning to coding and problem-solving."

The original paper by Wei et al. listed 137 emergent abilities in an Appendix!

Berti et al. give another example.

"Chen et al. [15] introduced a novel framework called AgentVerse, designed to enable and study collaboration among multiple AI agents. Through these interactions, the framework reveals emergent behaviors such as spontaneous cooperation, competition, negotiation, and the development of innovative strategies that were not explicitly programmed."

An alternative to defining novelty in terms of a comparison of the whole to the parts is to compare properties of the whole to those of a random configuration of the system. The performance of some LLMs is near-random (e.g., random guessing) until a critical threshold is reached (e.g., in size) when the emergent ability appears.

Discontinuities.

Are there quantitative objective measures that can be used to identify the emergence of a new computational ability? Researchers are struggling to find agreed-upon metrics that show clear discontinuities. That was the essential point of Are Emergent Abilities of Large Language Models a Mirage? 

In condensed matter physics, the emergence of a new state of matter is (usually) associated with symmetry breaking and an order parameter. Figuring out what the relevant broken symmetry and order parameter are often requires brilliant insight and may even lead to a Nobel Prize (Néel, Josephson, Ginzburg, Leggett, ...). A similar argument can be made with respect to the development of the Standard Model of elementary particles and gauge fields. Furthermore, the discontinuities only exist in the thermodynamic limit (i.e., in the limit of an infinite system), and there are many subtleties associated with how the data from finite-size computer simulations should be plotted to show that the system really does exhibit a phase transition.

Unpredictability.

The observation of new computational abilities in LLMs was unanticipated and surprised many people, including the designers of the specific LLMs involved. This is similar to what happens in condensed matter physics, where new states of matter have mostly been discovered by serendipity.

Some authors seem surprised that it is difficult to predict emergent abilities. "While early scaling laws provided some insight, they often fail to anticipate discontinuous leaps in performance."

Given the largely "black box" nature of LLMs, I don't find the unpredictability surprising. Prediction is hard even for condensed matter systems, which are much better characterised and understood.

Modular structures at the mesoscale.

Modularity is a common characteristic of emergence. In a wide range of systems, from physics to biology to economics, a key step in the development of the theory of a specific emergent phenomenon has been the identification of a mesoscale (intermediate between the micro- and macro-scales) at which modular structures emerge. These modules interact weakly with one another, and the whole system can be understood in these terms. Identification of these structures and the effective theories describing them has usually required brilliant insight. An example is the concepts of quasiparticles in quantum many-body physics, pioneered by Landau.

Berti et al. do not mention the importance of this issue. However, they do mention that "functional modules emerge naturally during training" [Ref. 7,43,81,84] and that "specialised circuits activate at certain scaling thresholds [24]".

Modularity may be related to an earlier post, Why do deep learning algorithms work so well? In the training process, a neural network rids noisy input data of extraneous details... There is a connection between Geoffrey Hinton's "deep belief net" deep-learning algorithm and renormalisation group methods (which can be key to identifying modularity and effective interactions).

Is emergence good or bad?

Undesirable and dangerous capabilities can emerge. Those observed include deception, manipulation, exploitation, and sycophancy.

These concerns parallel discussions in economics. Libertarians, the Austrian school, and Friedrich Hayek tend to see emergence as producing only socially desirable outcomes, such as the efficiency of free markets [the invisible hand of Adam Smith]. However, emergence also produces bubbles, crashes, and recessions.

Resistance to control.

A holy grail is the design, manipulation, and control of emergent properties. This ambitious goal is promoted in materials science, medicine, engineering, economics, public policy, business management, and social activism. However, it largely remains elusive, arguably due to the complexity and unpredictability of the systems of interest. Emergent properties of LLMs may turn out to offer similar hopes, frustrations, and disappointments. We should try, but have realistic expectations.

Toy models.

This is not discussed in the review. As I have argued before, a key to understanding a specific emergent phenomenon is the development of toy models that illustrate the phenomenon and the possible essential ingredients for it to occur. The following paper may be a step in that direction.

An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem

Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Chris Mingard, Ard A. Louis

In a similar vein, another possibly relevant paper is the review

Statistical Mechanics of Deep Learning

Yasaman Bahri, Jonathan Kadmon, Jeffrey Pennington, Sam S. Schoenholz, Jascha Sohl-Dickstein, and Surya Ganguli

They consider a toy model for the error landscape of a neural network and show that the error function for a deep neural net of depth D corresponds to the energy function of a D-spin spherical spin glass [Section 3.2 in their paper].
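For readers who want the object itself, here is a minimal sketch (mine, not from the review) of the D-spin (p-spin) spherical spin-glass energy function that the deep-net loss is mapped onto; the small system size and the coupling variance convention are illustrative assumptions.

```python
import numpy as np
from math import factorial
from itertools import combinations

def p_spin_energy(x, couplings):
    """H = - sum over i1<...<iD of J_{i1...iD} x_{i1} ... x_{iD},
    evaluated on a configuration x obeying the spherical constraint sum_i x_i^2 = N."""
    return -sum(J * np.prod(x[list(idx)]) for idx, J in couplings.items())

N, D = 8, 3                                    # D plays the role of the network depth
rng = np.random.default_rng(0)
# Gaussian random couplings with the usual p-spin variance scaling ~ D!/(2 N^(D-1))
scale = np.sqrt(factorial(D) / (2.0 * N ** (D - 1)))
couplings = {idx: rng.normal(scale=scale) for idx in combinations(range(N), D)}

x = rng.normal(size=N)
x *= np.sqrt(N) / np.linalg.norm(x)            # enforce the spherical constraint
print(p_spin_energy(x, couplings))
```

The rough, many-minima landscape of this Hamiltonian is what makes it a useful toy model for the loss surface of a deep network.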

Tuesday, October 22, 2024

Colloquium on 2024 Nobel Prizes


This Friday I am giving a colloquium for the UQ Physics department.

2024 Nobel Prizes in Physics and Chemistry: from biological physics to artificial intelligence and back

The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.” Half of the 2024 Chemistry prize was awarded to Demis Hassabis and John Jumper for “protein structure prediction” using artificial intelligence. I will describe the physics background needed to appreciate the significance of the awardees' work.

Hopfield proposed a simple theoretical model for how networks of neurons in a brain can store and recall memories. Hopfield drew on his background in and ideas from condensed matter physics, including the theory of spin glasses, the subject of the 2021 Physics Nobel Prize.

Hinton, a computer scientist, generalised Hopfield’s model, using ideas from statistical physics to propose a “Boltzmann machine” that used an artificial neural network to learn to identify patterns in data, by being trained on a finite set of examples. 

For fifty years scientists have struggled with the following challenge in biochemistry: given the unique sequence of amino acids that make up a particular protein, can the native structure of the protein be predicted? Hassabis, a computer scientist, and Jumper, a theoretical chemist, used AI methods to solve this problem, highlighting the power of AI in scientific research.

I will briefly consider some issues these awards raise, including the blurring of boundaries between scientific disciplines, tensions between public and corporate interests, research driven by curiosity versus technological advance, and the limits of AI in scientific research.

Here is my current draft of the slides.

Saturday, October 12, 2024

2024 Nobel Prize in Physics

 I was happy to see John Hopfield was awarded the Nobel Prize in Physics for his work on neural networks. The award is based on this paper from 1982

Neural networks and physical systems with emergent collective computational abilities

One thing I find beautiful about the paper is how Hopfield drew on ideas about spin glasses (many competing interactions lead to many ground states and a complex energy landscape).

A central insight is that an efficient way to store the information describing multiple objects (different collective spin states of an Ising model) is in the inter-spin interaction constants (the J_ij's). These are the "weights" that are trained/learned in computer neural nets.
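A minimal sketch of that idea (the textbook Hopfield model with a Hebbian learning rule; the network size, number of stored patterns, and amount of corruption below are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_patterns = 100, 5
patterns = rng.choice([-1, 1], size=(n_patterns, N))

# Hebbian rule: the stored information lives entirely in the couplings J_ij
J = patterns.T @ patterns / N
np.fill_diagonal(J, 0)

# Recall: start from a corrupted copy of pattern 0 and repeatedly align each
# spin with its local field (asynchronous, zero-temperature dynamics).
state = patterns[0].copy()
corrupted = rng.choice(N, size=15, replace=False)
state[corrupted] *= -1
for _ in range(10):
    for i in rng.permutation(N):
        state[i] = 1 if J[i] @ state >= 0 else -1

print("overlap with the stored pattern:", state @ patterns[0] / N)   # close to 1.0
```

The stored patterns sit at minima of the spin-glass-like energy landscape, so the dynamics flow from the corrupted state back to the memory: associative recall.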

It should be noted that Hopfield's motivation was not at all to contribute to computer science. It was to understand a problem in biological physics: what is the physical basis for associative memory? 

I have mixed feelings about Geoffrey Hinton sharing the prize. On the one hand, in his initial work, Hinton used physics ideas (Boltzmann weights) to extend Hopfield's ideas so they were useful in computer science. Basically, Hopfield considered a spin glass model at zero temperature and Hinton considered it at non-zero temperature. [Note: the temperature is not physical; it is just a parameter in a Boltzmann probability distribution over different states of the neural network.] Hinton certainly deserves lots of prizes, but I am not sure a physics one is appropriate. His work on AI has certainly been helpful for physics research. But so have lots of other advances in computer software and hardware, and those pioneers did not receive a prize.

I feel a bit like I did with Jack Kilby getting a physics prize for his work on integrated circuits. I feel that sometimes the Nobel Committee just wants to remind the world how physics is so relevant to modern technology.

Ten years ago Hopfield wrote a nice scientific autobiography for the Annual Review of Condensed Matter Physics,

Whatever Happened to Solid State Physics?

After the 2021 Physics Nobel to Parisi, I reflected on the legacy of spin glasses, including the work of Hopfield.

Aside: I once pondered whether a chemist will ever win the Physics prize, given that many condensed matter physicists have won the chemistry prize. Well now, we have had an electronic engineer and a computer scientist winning the Physics prize.

Another aside: I think calling Hinton's network a Boltzmann machine is a scientific misnomer. I should add this to my list of people getting credit for things they did not do. Boltzmann never considered networks, spin glasses, or computer algorithms. Boltzmann was a genius, but I don't think we should be attaching his name to everything that involves a Boltzmann distribution. To me, this is a bit like calling the Metropolis algorithm for Monte Carlo simulations the Boltzmann algorithm.

Tuesday, February 27, 2024

Emergence? in large language models (revised edition)

Last year I wrote a post about emergence in AI, specifically on a paper claiming evidence for a "phase transition" in Large Language Models' ability to perform tasks they were not designed for. I found this fascinating.

That paper attracted a lot of attention, even winning an award for the best paper at the conference at which it was presented.

Well, I did not do my homework. Even before my post, another paper called into question the validity of the original paper.

Are Emergent Abilities of Large Language Models a Mirage?

Rylan Schaeffer, Brando Miranda, Sanmi Koyejo

we present an alternative explanation for [the claimed] emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher's choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous predictable changes in model performance.

... we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.
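Their central point can be reproduced with a toy calculation (my own illustration, not taken from the paper): if the per-token accuracy is assumed to improve smoothly with scale, an exact-match metric on a ten-token answer appears to jump sharply, while the per-token (linear) metric stays smooth.

```python
import numpy as np

scale = np.logspace(20, 24, 9)                       # stand-in for training FLOPs
# assumed smooth (power-law) improvement of per-token accuracy with scale
per_token_acc = 1.0 - 0.5 * (scale / scale[0]) ** -0.15

answer_length = 10
exact_match = per_token_acc ** answer_length         # nonlinear metric: every token must be right

for s, p, em in zip(scale, per_token_acc, exact_match):
    print(f"scale {s:.1e}:  per-token {p:.3f}   exact-match {em:.3f}")
```

The exact-match column sits near zero over most of the range and then rises rapidly, even though nothing discontinuous happens at the per-token level.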

One of the issues they suggest is responsible for the smooth behaviour is 

 the phenomenon known as neural scaling laws: empirical observations that deep networks exhibit power law scaling in the test loss as a function of training dataset size, number of parameters or compute  

One of the papers they cite on power law scaling is below (from 2017).

Deep Learning Scaling is Predictable, Empirically

Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

The figure below shows the power law scaling between the validation loss and the size of the training data set.

They note that these empirical power laws are yet to be explained.
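To show what fitting such a scaling law involves, here is a minimal sketch that fits L(N) = a N^(-b) + c to synthetic validation-loss data with scipy; the functional form and parameter values are assumptions for illustration, not numbers from Hestness et al.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(N, a, b, c):
    """Power-law scaling of test/validation loss with training-set size."""
    return a * N ** (-b) + c

# Synthetic data standing in for measured validation losses
rng = np.random.default_rng(0)
N = np.logspace(4, 8, 12)
loss = scaling_law(N, a=50.0, b=0.35, c=1.5) * (1 + 0.02 * rng.standard_normal(N.size))

params, _ = curve_fit(scaling_law, N, loss, p0=[10.0, 0.3, 1.0])
print("fitted (a, b, c):", params)   # b is the empirical scaling exponent reported in such studies
```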

I thank Gerard Milburn for ongoing discussions about this topic.

Tuesday, February 6, 2024

Four scientific reasons to be skeptical of AI hype

The hype about AI continues, whether in business or science. Undoubtedly, there is a lot of potential in machine learning, big data, and large language models. But that does not mean that the hype is justified. It is more likely to limit real scientific progress and waste a lot of resources.

My innate scepticism receives concrete support from an article from 2018 that gives four scientific reasons for concern.

Big data: the end of the scientific method? 

Sauro Succi and Peter V. Coveney

The article might be viewed as a response to a bizarre article in 2008 by Chris Anderson, editor-in-chief at Wired, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

‘With enough data, the numbers speak for themselves, correlation replaces causation, and science can advance even without coherent models or unified theories’.

Here are the four scientific reasons for caution about such claims given by Succi and Coveney.

(i) Complex systems are strongly correlated, hence they do not (generally) obey Gaussian statistics.

The law of large numbers (central limit theorem) may not apply and rare events may dominate behaviour. For example, consider the power law decays observed in many complex systems. They are in sharp contrast to the rapid exponential decay in the Gaussian distribution. The authors state, "when rare events are not so rare, convergence rates can be frustratingly slow even in the face of petabytes of data."
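A quick numerical experiment (my own, not from the paper) illustrates the point: the sample mean of Gaussian data settles down rapidly, whereas for heavy-tailed (Pareto) data with the same mean but infinite variance, convergence is frustratingly slow even for very large samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000_000

gauss = rng.normal(loc=3.0, scale=1.0, size=n)
# Pareto with tail exponent 1.5: the mean is 3, but the variance is infinite,
# so the central limit theorem no longer guarantees fast convergence.
pareto = rng.pareto(1.5, size=n) + 1.0

for m in (10**3, 10**5, 10**7):
    print(f"n = {m:>9,}: Gaussian mean {gauss[:m].mean():.3f}, "
          f"heavy-tailed mean {pareto[:m].mean():.3f}  (true mean 3.0)")
```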

(ii) No data are big enough for systems with strong sensitivity to data inaccuracies.

Big data and machine learning involve fitting data to a chosen function, such as a "cost function" with many parameters. That fitting involves a minimisation routine which acts on some sort of "landscape." If the landscape is smooth and the minima are well separated and not divided by overly large barriers, then the routine may work. However, if the landscape is rough or the routine gets stuck in some metastable state, there will be problems, such as over-fitting.

(iii) Correlation does not imply causation, the link between the two becoming exponentially fainter at increasing data size.  

(iv) In a finite-capacity world, too much data is just as bad as no data.

In other words, it is all about curve fitting. The more parameters used, the less likely it is that insight will be gained. Here the authors quote the famous aphorism, attributed to von Neumann and Fermi, "with four parameters I can fit an elephant and with five I can make his tail wiggle."

Aside: an endearing part of the article is the inclusion of two choice quotes from C.S. Lewis:
‘Once you have surrendered your brain, you've surrendered your life’ (paraphrased)

‘When man proclaims conquest of power of nature, what it really means is conquest of power of some men over other men’.

I commend the article to you and look forward to hearing your perspective. Is the criticism of AI hype fair? Are these four scientific reasons good grounds for concern?

Tuesday, January 16, 2024

Wading through AI hype about materials discovery

Discovering new materials with functional properties is hard, very hard. We need all the tools we can get, from serendipity to high-performance computing to chemical intuition.

At the end of last year, two back-to-back papers appeared in the luxury journal Nature.

Scaling deep learning for materials discovery

All the authors are at Google. They claim that they have discovered more than two million new materials with stable crystal structures using DFT-based methods and AI.

On Doug Natelson's blog there are several insightful comments on the paper about why one should be skeptical of AI/DFT-based "discovery".

Here are a few of the reasons my immediate response to this paper is one of skepticism.

It is published in Nature. Almost every "ground-breaking" paper I force myself to read is disappointing when you read the fine print.

It concerns a very "hot" topic that is full of hype in both the science and business communities.

It is a long way from discovering a stable crystal to finding that it has interesting and useful properties.

Calculating the correct relative stability of different crystal structures of complex materials can be incredibly difficult.

DFT-based methods fail spectacularly for the low-energy properties of quantum materials, such as cuprate superconductors. But, they do get the atomic structure and stability correct, which is the focus of this paper.

There is a big gap between discovering a material that has desirable technological properties and one that meets the demanding criteria for commercialisation.

The second paper combines AI-based predictions, similar to the paper above, with robots doing material synthesis and characterisation.

An autonomous laboratory for the accelerated synthesis of novel materials

[we] realized 41 novel compounds from a set of 58 targets including a variety of oxides and phosphates that were identified using large-scale ab initio phase-stability data from the Materials Project and Google DeepMind

These claims have already been undermined by a preprint from the chemistry departments at Princeton and UCL.

Challenges in high-throughput inorganic material prediction and autonomous synthesis

We discuss all 43 synthetic products and point out four common shortfalls in the analysis. These errors unfortunately lead to the conclusion that no new materials have been discovered in that work. We conclude that there are two important points of improvement that require future work from the community: 
(i) automated Rietveld analysis of powder x-ray diffraction data is not yet reliable. Future improvement of such, and the development of a reliable artificial intelligence-based tool for Rietveld fitting, would be very helpful, not only to autonomous materials discovery, but also the community in general.
(ii) We find that disorder in materials is often neglected in predictions. The predicted compounds investigated herein have all their elemental components located on distinct crystallographic positions, but in reality, elements can share crystallographic sites, resulting in higher symmetry space groups and - very often - known alloys or solid solutions. 

Life is messy. Chemistry is messy. DFT-based calculations are messy. AI is messy. 

Given that most discoveries of interesting materials involve serendipity or a lot of trial and error, it is worth trying to do what the authors of these papers are doing. However, the field will only advance in a meaningful way when it is not distracted and diluted by hype, and when authors, editors, and referees demand transparency about the limitations of their work.


Friday, October 13, 2023

Emergent abilities in AI: large language models

The public release of ChatGPT was a landmark that surprised many people, both in the general public and researchers working in Artificial Intelligence. All of a sudden it seemed Large Language Models had capabilities that some thought were a decade away or even not possible. It is like the field underwent a "phase transition." This idea turns out to be more than just a physics metaphor. It has been made concrete and rigorous in the following paper.

Emergent Abilities of Large Language Models

Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

They use the following definition, "Emergence is when quantitative changes in a system result in qualitative changes in behavior," citing Phil Anderson's classic "More is Different" article. [Even though the article does not contain the word emergence]. 

In this paper, we will consider a focused definition of emergent abilities of large language models: 

An ability is emergent if it is not present in smaller models but is present in larger models.

How does one define the "size" or "scale" of a model? Wei et al. note that "Today’s language models have been scaled primarily along three factors: amount of computation, number of model parameters, and training dataset size."

The essence of the analysis in the paper is summarised as follows.

We first discuss emergent abilities in the prompting paradigm, as popularized by GPT-3 (Brown et al., 2020). In prompting, a pre-trained language model is given a prompt (e.g. a natural language instruction) of a task and completes the response without any further training or gradient updates to its parameters.

An example of a prompt is shown below 

Brown et al. (2020) proposed few-shot prompting, which includes a few input-output examples in the model’s context (input) as a preamble before asking the model to perform the task for an unseen inference-time example. 

 The ability to perform a task via few-shot prompting is emergent when a model has random performance until a certain scale, after which performance increases to well-above random. 

An example is shown in the Figure below. The horizontal axis is the number of training FLOPs for the model, a measure of model scale. The vertical axis measures the accuracy of the model at performing a task, Modular Arithmetic, for which the model was not designed but was just given two-shot prompting. The red dashed line is the performance of a random model. The purple data is for GPT-3 and the blue for LaMDA. Note how once the model scale reaches about 10^22 training FLOPs there is a rapid onset of ability.

The Figure below summarises recent results from a range of research groups studying five different language model families. It shows eight different emergent abilities.

Wei et al. point out that "there are currently few compelling explanations for why such abilities emerge the way that they do".

The authors have encountered some common characteristics of emergent properties. They are hard to predict or anticipate before they are observed. They are often universal, i.e., they can occur in a wide range of different systems and are not particularly sensitive to the details of the components. Even after emergent properties are observed, it is still hard to explain why they occur, even when one has a good understanding of the properties of the system at a smaller scale. Superconductivity was observed in 1911 and only explained in 1957 by the BCS theory.

On the positive side, this paper presents hope that computational science and technology are at the point where AI may produce more exciting capabilities. On the negative side, there is also the possibility of significant societal risks such as having unanticipated power to create and disseminate false information, bias, and toxicity.

Aside: One thing I found surprising is that the authors did not reference John Holland, a computer scientist, and his book, Emergence.

I thank Gerard Milburn for bringing the paper to my attention.

Saturday, June 17, 2023

Why do deep learning algorithms work so well?

I am interested in analogies between cognitive science and artificial intelligence. Emergent phenomena occur in both, there has been some fruitful cross-fertilisation of ideas, and the extent of the analogies is relevant to debates on fundamental questions concerning human consciousness.

Given my general ignorance and confusion on some of the basics of neural networks, AI, and deep learning, I am looking for useful and understandable resources.

Related questions are explored in a nice informative article from 2017 in Quanta magazine, New Theory Cracks Open the Black Box of Deep Learning by Natalie Wolchover.

Like a brain, a deep neural network has layers of neurons — artificial ones that are figments of computer memory. When a neuron fires, it sends signals to connected neurons in the layer above. During deep learning, connections in the network are strengthened or weakened as needed to make the system better at sending signals from input data — the pixels of a photo of a dog, for instance — up through the layers to neurons associated with the right high-level concepts, such as “dog.” 

After a deep neural network has “learned” from thousands of sample dog photos, it can identify dogs in new photos as accurately as people can. The magic leap from special cases to general concepts during learning gives deep neural networks their power, just as it underlies human reasoning, creativity and the other faculties collectively termed “intelligence.” 

Experts wonder what it is about deep learning that enables generalization — and to what extent brains apprehend reality in the same way.

The article describes work by Naftali Tishby and collaborators that provides some insight into why deep learning methods work so well. This was first described in purely theoretical terms in a 2000 preprint

The information bottleneck method, Naftali Tishby, Fernando C. Pereira, William Bialek 

The idea is that a network rids noisy input data of extraneous details as if by squeezing the information through a bottleneck, retaining only the features most relevant to general concepts.
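For reference, the objective in Tishby, Pereira and Bialek's paper is to choose a stochastic encoder p(t|x) of the input X that minimises the Lagrangian I(X;T) − β I(T;Y), compressing X while keeping what is relevant to the label Y. Below is a minimal sketch (my own) that evaluates this Lagrangian for a small discrete example; the joint distribution and the encoder are arbitrary illustrative choices.

```python
import numpy as np

def mutual_info(p_joint):
    """Mutual information in bits for a joint distribution given as a 2-D array."""
    px = p_joint.sum(axis=1, keepdims=True)
    py = p_joint.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p_joint > 0, p_joint * np.log2(p_joint / (px * py)), 0.0)
    return terms.sum()

def ib_lagrangian(p_xy, p_t_given_x, beta):
    """Information bottleneck objective I(X;T) - beta * I(T;Y) for a
    discrete joint p(x,y) and encoder p(t|x), using the Markov chain T - X - Y."""
    p_x = p_xy.sum(axis=1)
    p_xt = p_t_given_x * p_x[:, None]        # joint p(x, t)
    p_ty = p_t_given_x.T @ p_xy              # joint p(t, y)
    return mutual_info(p_xt) - beta * mutual_info(p_ty)

# Tiny example: X takes 4 values, Y is a noisy function of X, and the encoder
# merges the 4 inputs into 2 bottleneck states.
p_xy = np.array([[0.20, 0.05],
                 [0.20, 0.05],
                 [0.05, 0.20],
                 [0.05, 0.20]])
encoder = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)   # deterministic merge
print(ib_lagrangian(p_xy, encoder, beta=4.0))
```

Minimising this objective over encoders, for a range of β, traces out the information-bottleneck bound that is referred to later in the article.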

Tishby was stimulated in new directions in 2014 after reading a surprising paper by the physicists David Schwab and Pankaj Mehta

 An exact mapping between the Variational Renormalization Group and Deep Learning 

[They] discovered that a deep-learning algorithm invented by Geoffrey Hinton called the “deep belief net” works, in a particular case, exactly like renormalization [group methods in statistical physics]... When they applied the deep belief net to a model of a magnet at its “critical point,” where the system is fractal, or self-similar at every scale, they found that the network automatically used the renormalization-like procedure to discover the model’s state.

Although this connection was a valuable new insight, the specific case of a scale-free system is not relevant to many deep learning situations.

Tishby and Ravid Shwartz-Ziv discovered that 

Over the course of training, common patterns in the training data become reflected in the strengths of the connections, and the network becomes expert at correctly labeling the data, such as by recognizing a dog, a word, or a 1.

...layer by layer, the networks converged to the information bottleneck theoretical bound: a theoretical limit derived in Tishby, Pereira and Bialek’s original paper that represents the absolute best the system can do at extracting relevant information. At the bound, the network has compressed the input as much as possible without sacrificing the ability to accurately predict its label...

...deep learning proceeds in two phases: a short “fitting” phase, during which the network learns to label its training data, and a much longer “compression” phase, during which it becomes good at generalization, as measured by its performance at labeling new test data.

What these new discoveries teach us about the relationship between learning in humans and in machines is contentious and explored briefly in the article. Although neural nets were inspired by the structure of the human brain, the connection with the neural nets used today is tenuous.

The mystery of how brains sift signals from our senses and elevate them to the level of our conscious awareness drove much of the early interest in deep neural networks among AI pioneers, who hoped to reverse-engineer the brain’s learning rules. AI practitioners have since largely abandoned that path in the mad dash for technological progress, instead slapping on bells and whistles that boost performance with little regard for biological plausibility.

Monday, July 26, 2021

Sage wisdom on computational materials science

Roald Hoffmann and Jean-Paul Malrieu are two of my favourite living theoretical chemists. Both greatly value the role of concepts and intellectual clarity in theory. Hoffmann has featured in 22 posts on this blog.

They recently published a wonderful trilogy in  Angewandte Chemie.

Simulation vs. Understanding: A Tension, in Quantum Chemistry and Beyond. 

Part A. Stage Setting

Part B. The March of Simulation, for Better or Worse

Part C. Toward Consilience

I add this trilogy to my list of 5 papers every computational chemistry student should read, suggested by me a decade ago. [Malrieu is author of one of those and Hoffmann co-author of another.]

Although the trilogy addresses and uses specific examples from computational quantum chemistry, it is just as relevant to anyone interested in computational materials science. Actually, I hope that anyone interested in materials science would read and digest it, as it gives a sober and balanced perspective on the relationship between theory, simulation, and understanding.

The articles are timely, as they address hype about how AI techniques will "revolutionise" materials theory.

The articles are beautifully written and engage with broader themes such as philosophy of science, culture, art, and politics.

Finally, I just love this photo of the two authors, both in their eighties. The photo reflects some of the joy they find in science, so beautifully expressed in these articles.

I thank Ben Powell for bringing the papers to my attention.
