Tuesday, February 6, 2024

Four scientific reasons to be skeptical of AI hype

The hype about AI continues, whether in business or science. Undoubtedly, there is a lot of potential in machine learning, big data, and large language models. But that does not mean the hype is justified. If anything, the hype is more likely to limit real scientific progress and waste a lot of resources.

My innate scepticism receives concrete support from a 2018 article that gives four scientific reasons for concern.

Big data: the end of the scientific method? 

Sauro Succi and Peter V. Coveney

The article might be viewed as a response to a bizarre 2008 article by Chris Anderson, then editor-in-chief of Wired, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete". The claim at issue:

‘With enough data, the numbers speak for themselves, correlation replaces causation, and science can advance even without coherent models or unified theories’.

Here are the four scientific reasons for caution about such claims given by Succi and Coveney.

(i) Complex systems are strongly correlated, hence they do not (generally) obey Gaussian statistics.

The central limit theorem (and with it the law of large numbers) may not apply, and rare events may dominate behaviour. For example, consider the power-law decays observed in many complex systems. They are in sharp contrast to the rapid exponential decay of the tails of the Gaussian distribution. The authors state, "when rare events are not so rare, convergence rates can be frustratingly slow even in the face of petabytes of data."
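To make the contrast concrete, here is a little sketch of my own (not from the paper) comparing the tail of a standard Gaussian with a power-law tail. An event ten standard deviations out is essentially impossible under Gaussian statistics, but quite ordinary under a heavy-tailed law, which is why sample averages of heavy-tailed quantities converge so slowly.

```python
import math

def gaussian_tail(x):
    """P(X > x) for a standard Gaussian: decays like exp(-x^2/2)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def power_law_tail(x, alpha=1.5):
    """P(X > x) for a Pareto(alpha) variable with x_min = 1: decays like x^(-alpha)."""
    return x ** (-alpha)

# At x = 10 the Gaussian tail is of order 1e-23, while the power-law tail
# is of order 0.03: the "rare" event is not rare at all.
ratio = power_law_tail(10.0) / gaussian_tail(10.0)
print(f"Gaussian tail at x=10:  {gaussian_tail(10.0):.2e}")
print(f"power-law tail at x=10: {power_law_tail(10.0):.2e}")
print(f"ratio:                  {ratio:.2e}")
```

The exponent 1.5 is just an illustrative choice (a Pareto law with infinite variance), but any power law gives the same qualitative picture.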

(ii) No data are big enough for systems with strong sensitivity to data inaccuracies.

Big data and machine learning involve fitting data with a model containing many parameters, by minimising a "cost function". The minimisation routine explores some sort of "landscape." If the landscape is smooth, with well-separated minima that are not divided by overly large barriers, the routine may work. However, if the landscape is rough, or the routine gets stuck in some metastable state, there will be problems, such as over-fitting.
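Here is a minimal sketch of my own (not from the paper) of a routine getting stuck: plain gradient descent on a tilted double-well "landscape" ends up in whichever basin it starts in, and from the wrong starting point it settles into a metastable local minimum rather than the global one.

```python
# Tilted double well: f(x) = (x^2 - 1)^2 + 0.3*x, with a global minimum
# near x = -1.03 and a metastable local minimum near x = +0.96.

def f(x):
    return (x * x - 1.0) ** 2 + 0.3 * x

def grad(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def descend(x, lr=0.05, steps=500):
    """Plain gradient descent; converges to a minimum of the basin it starts in."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_good = descend(-0.5)   # starts in the global basin
x_stuck = descend(+0.5)  # starts in the metastable basin and stays there
print(f"from -0.5: x = {x_good:.3f}, f = {f(x_good):.3f}")
print(f"from +0.5: x = {x_stuck:.3f}, f = {f(x_stuck):.3f}")
```

In one dimension the problem is obvious to the eye; in the million-dimensional landscapes of real machine-learning models, there is no such overview.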

(iii) Correlation does not imply causation, the link between the two becoming exponentially fainter at increasing data size.  
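As an illustration of chance correlations (a sketch of my own, not from the paper): with many variables and few observations, strong correlations appear by pure chance, and mining ever-larger collections of variables makes such mirages more, not less, likely.

```python
import random
from itertools import combinations

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

random.seed(0)
# 200 completely independent "variables", 30 observations each.
series = [[random.gauss(0, 1) for _ in range(30)] for _ in range(200)]

# Among the ~20,000 possible pairs, the strongest correlation found
# by chance alone is substantial, despite there being no causal link.
best = max(abs(corr(a, b)) for a, b in combinations(series, 2))
print(f"largest spurious correlation: {best:.2f}")
```

None of these variables has anything to do with any other, yet a data-mining exercise would confidently report a "significant" relationship.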

(iv) In a finite-capacity world, too much data is just as bad as no data.

In other words, it is all about curve fitting. The more parameters a fit uses, the less likely it is to yield insight. Here the authors quote the famous aphorism that Fermi attributed to von Neumann, "with four parameters I can fit an elephant and with five I can make his tail wiggle."
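The aphorism can be made precise with a sketch of my own (not from the paper): a polynomial with n coefficients passes exactly through any n data points, so "training error" is zero by construction, yet between the points the fit can be wildly wrong. This is the classic Runge phenomenon.

```python
# 11 points sampled from Runge's function f(x) = 1/(1 + 25 x^2) are
# fit perfectly by a degree-10 polynomial (11 parameters), but the
# interpolant oscillates badly near the edges of the interval.

def runge(x):
    return 1.0 / (1.0 + 25.0 * x * x)

def lagrange(xs, ys, x):
    """Evaluate the unique degree-(n-1) interpolating polynomial at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

xs = [-1.0 + 0.2 * k for k in range(11)]  # 11 equispaced points on [-1, 1]
ys = [runge(x) for x in xs]

# Zero error at every fitted point...
max_node_error = max(abs(lagrange(xs, ys, x) - runge(x)) for x in xs)
# ...but large error between them.
grid = [-1.0 + 0.01 * k for k in range(201)]
max_grid_error = max(abs(lagrange(xs, ys, x) - runge(x)) for x in grid)
print(f"error at the 11 fitted points: {max_node_error:.1e}")
print(f"worst error between points:    {max_grid_error:.2f}")
```

Eleven parameters "explain" the data perfectly and predict nothing; that is curve fitting without insight.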

Aside: an endearing part of the article is the inclusion of two choice quotes from C.S. Lewis:
‘Once you have surrendered your brain, you've surrendered your life’ (paraphrased)

‘When man proclaims conquest of power of nature, what it really means is conquest of power of some men over other men’.

I commend the article to you and look forward to hearing your perspective. Is the criticism of AI hype fair? Are these four scientific reasons good grounds for concern?

