The hype about AI continues, whether in business or science. Undoubtedly, there is a lot of potential in machine learning, big data, and large language models. But that does not mean that the hype is justified. It is more likely to limit real scientific progress and waste a lot of resources.
My innate scepticism receives concrete support from a 2018 article that gives four scientific reasons for concern.
Big data: the end of the scientific method?
Sauro Succi and Peter V. Coveney
The article might be viewed as a response to a bizarre 2008 article by Chris Anderson, editor-in-chief at Wired, titled "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete". Its claim was:
‘With enough data, the numbers speak for themselves, correlation replaces causation, and science can advance even without coherent models or unified theories’.
Here are the four scientific reasons for caution about such claims given by Succi and Coveney.
(i) Complex systems are strongly correlated, hence they do not (generally) obey Gaussian statistics.
The law of large numbers and the central limit theorem may not apply, or may converge only very slowly, and rare events may dominate behaviour. For example, consider the power-law decays observed in many complex systems. They are in sharp contrast to the tails of the Gaussian distribution, which fall off faster than exponentially. The authors state, "when rare events are not so rare, convergence rates can be frustratingly slow even in the face of petabytes of data."
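To make the point about slow convergence concrete, here is a minimal sketch of my own (it is not taken from the article). It compares how quickly the running sample mean approaches the true mean for Gaussian samples and for samples from a power-law (Pareto) distribution. The tail exponent ALPHA = 1.5, the sample size, and the helper name running_mean_error are all illustrative assumptions; with this exponent the mean is finite but the variance is infinite.

```python
import random
from itertools import accumulate


def running_mean_error(samples, true_mean):
    """Absolute error of the running sample mean after each new sample."""
    return [
        abs(total / n - true_mean)
        for n, total in enumerate(accumulate(samples), start=1)
    ]


random.seed(0)   # fixed seed so the run is repeatable
N = 200_000      # illustrative sample size
ALPHA = 1.5      # illustrative tail exponent: finite mean, infinite variance

# Gaussian samples with mean 1 and unit variance.
gaussian = [random.gauss(1.0, 1.0) for _ in range(N)]

# Pareto samples with minimum value 1; the exact mean is ALPHA / (ALPHA - 1)
# whenever ALPHA > 1.
heavy = [random.paretovariate(ALPHA) for _ in range(N)]
heavy_mean = ALPHA / (ALPHA - 1.0)

gauss_err = running_mean_error(gaussian, 1.0)
heavy_err = running_mean_error(heavy, heavy_mean)

for n in (100, 10_000, N):
    print(
        f"n={n:>7}: Gaussian error={gauss_err[n - 1]:.4f}, "
        f"power-law error={heavy_err[n - 1]:.4f}"
    )
```

For the Gaussian case the error typically shrinks like one over the square root of the number of samples. For the power-law case it typically shrinks much more slowly, and occasional huge samples can push the running mean away from the true value again. Individual runs will vary with the seed.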
(ii) No data are big enough for systems with strong sensitivity to data inaccuracies.
Big data and machine learning involve fitting a chosen function with many parameters to data by minimising a "cost function". That minimisation routine acts on some sort of "landscape." If the landscape is smooth and the minima are well separated by barriers that are not too high, then the routine may work. However, if the landscape is rough or the routine gets stuck in some metastable state, there will be problems, such as over-fitting.
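A toy example may help here. The sketch below is my own illustration, not the authors': it runs plain gradient descent on a one-dimensional landscape f(x) = x^2 + roughness * sin(3x)^2, whose global minimum is at x = 0 for any non-negative roughness. The function names, the starting point, the learning rate, and the roughness value of 10 are all assumptions chosen for illustration.

```python
import math


def make_landscape(roughness):
    """Return f and its gradient for f(x) = x^2 + roughness * sin(3x)^2.

    Both terms are non-negative and vanish at x = 0, so the global
    minimum is f(0) = 0 for any roughness >= 0.
    """
    def f(x):
        return x ** 2 + roughness * math.sin(3 * x) ** 2

    def grad(x):
        # d/dx [roughness * sin(3x)^2] = 3 * roughness * sin(6x)
        return 2 * x + 3 * roughness * math.sin(6 * x)

    return f, grad


def gradient_descent(grad, x0, lr=0.005, steps=2000):
    """Plain fixed-step gradient descent starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


f_smooth, grad_smooth = make_landscape(roughness=0.0)
f_rough, grad_rough = make_landscape(roughness=10.0)

x_smooth = gradient_descent(grad_smooth, x0=3.0)
x_rough = gradient_descent(grad_rough, x0=3.0)

print(f"smooth landscape: x = {x_smooth:.4f}, f = {f_smooth(x_smooth):.4f}")
print(f"rough landscape:  x = {x_rough:.4f}, f = {f_rough(x_rough):.4f}")
```

On the smooth landscape the routine reaches the global minimum at x = 0. On the rough one, starting from the same point, it settles into a nearby local minimum close to x = 3.1 and never gets near zero. Real cost functions live in far higher dimensions, so this only conveys the flavour of the problem.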
(iii) Correlation does not imply causation, the link between the two becoming exponentially fainter at increasing data size.
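One way to see why more data can make spurious correlations more of a problem, not less, is the following sketch. It is my own illustration under stated assumptions and is not the argument given in the article. It generates series of independent random numbers, which by construction have no causal link at all, and reports the strongest pairwise correlation found as more series are included. The helper names, the 30 observations per series, and the counts 5, 50 and 200 are all illustrative choices.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious(series):
    """Largest absolute correlation over all pairs of series."""
    return max(abs(pearson(x, y)) for x, y in combinations(series, 2))


random.seed(1)   # fixed seed so the run is repeatable
N_OBS = 30       # observations per series (illustrative)

# 200 mutually independent series: any correlation between them is spurious.
series = [[random.gauss(0.0, 1.0) for _ in range(N_OBS)] for _ in range(200)]

# Search for the strongest correlation among the first k series.
results = {k: strongest_spurious(series[:k]) for k in (5, 50, 200)}

for k, r in results.items():
    print(f"{k:>3} independent series: strongest |correlation| = {r:.2f}")
```

Because each larger set contains the smaller ones, the strongest correlation found can only stay the same or grow as more series are added. The more variables one trawls through, the more impressive the meaningless correlations look.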
(iv) In a finite-capacity world, too much data is just as bad as no data.
‘Once you have surrendered your brain, you've surrendered your life’ (paraphrased).
‘When man proclaims conquest of power of nature, what it really means is conquest of power of some men over other men’.
I commend the article to you and look forward to hearing your perspective. Is the criticism of AI hype fair? Are these four scientific reasons good grounds for concern?