Why do deep learning algorithms work so well?

I am interested in analogues between cognitive science and artificial intelligence. Emergent phenomena occur in both, there has been some fruitful cross-fertilisation of ideas, and the extent of the analogues is relevant to debates on fundamental questions concerning human consciousness.

Given my general ignorance and confusion on some of the basics of neural networks, AI, and deep learning, I am looking for useful and understandable resources.

Related questions are explored in a nice, informative 2017 article in Quanta Magazine, New Theory Cracks Open the Black Box of Deep Learning, by Natalie Wolchover.

Like a brain, a deep neural network has layers of neurons — artificial ones that are figments of computer memory. When a neuron fires, it sends signals to connected neurons in the layer above. During deep learning, connections in the network are strengthened or weakened as needed to make the system better at sending signals from input data — the pixels of a photo of a dog, for instance — up through the layers to neurons associated with the right high-level concepts, such as “dog.” 

After a deep neural network has “learned” from thousands of sample dog photos, it can identify dogs in new photos as accurately as people can. The magic leap from special cases to general concepts during learning gives deep neural networks their power, just as it underlies human reasoning, creativity and the other faculties collectively termed “intelligence.” 

Experts wonder what it is about deep learning that enables generalization — and to what extent brains apprehend reality in the same way.
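
To make the quoted description concrete, here is a minimal sketch (my own illustration, not from the article; the toy data, hidden-layer size, and learning rate are arbitrary choices) of a tiny feedforward network whose connection weights are strengthened or weakened by gradient descent until inputs are mapped to the right labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 two-dimensional points, labelled 1 if they fall inside the unit circle.
X = rng.normal(size=(200, 2))
y = (np.sum(X**2, axis=1) < 1.0).astype(float).reshape(-1, 1)

# One hidden layer of 16 artificial "neurons"; the weights are the adjustable connections.
W1 = rng.normal(scale=0.5, size=(2, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(2000):
    # Forward pass: signals flow from the input layer up through the hidden layer.
    h = np.tanh(X @ W1 + b1)        # hidden-layer activations
    p = sigmoid(h @ W2 + b2)        # predicted probability of the label

    # Backward pass: cross-entropy gradients say how each connection should change.
    grad_out = (p - y) / len(X)
    grad_W2 = h.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_h = (grad_out @ W2.T) * (1 - h**2)
    grad_W1 = X.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)

    # Connections are "strengthened or weakened as needed".
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

print("approximate training accuracy:", np.mean((p > 0.5) == y))
```

After a couple of thousand steps this toy network labels its training points well above chance; deep learning scales the same idea to many layers and millions of connections.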

The article describes work by Naftali Tishby and collaborators that provides some insight into why deep learning methods work so well. This was first described in purely theoretical terms in a 2000 preprint:

The information bottleneck method, Naftali Tishby, Fernando C. Pereira, and William Bialek, arXiv:physics/0004057

The idea is that a network rids noisy input data of extraneous details as if by squeezing the information through a bottleneck, retaining only the features most relevant to general concepts.
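
In the notation of that paper, X is the noisy input, Y is the relevant (label) variable, and T is the compressed representation. The method asks for a stochastic encoding p(t|x) that minimises the functional

\[
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
\]

where I( ; ) denotes mutual information and the multiplier β sets how much predictive information about Y is worth per bit of compression of X.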

Tishby was stimulated in new directions in 2014 after reading a surprising paper by the physicists David Schwab and Pankaj Mehta,

An exact mapping between the Variational Renormalization Group and Deep Learning

[They] discovered that a deep-learning algorithm invented by Geoffrey Hinton called the “deep belief net” works, in a particular case, exactly like renormalization [group methods in statistical physics... When they] applied the deep belief net to a model of a magnet at its “critical point,” where the system is fractal, or self-similar at every scale, they found that the network automatically used the renormalization-like procedure to discover the model’s state.

Although this connection was a valuable new insight, the specific case of a scale-free system is not relevant to many deep learning situations.

Tishby and Ravid Shwartz-Ziv discovered that 

Over the course of training, common patterns in the training data become reflected in the strengths of the connections, and the network becomes expert at correctly labeling the data, such as by recognizing a dog, a word, or a 1.

...layer by layer, the networks converged to the information bottleneck theoretical bound: a theoretical limit derived in Tishby, Pereira and Bialek’s original paper that represents the absolute best the system can do at extracting relevant information. At the bound, the network has compressed the input as much as possible without sacrificing the ability to accurately predict its label...

...deep learning proceeds in two phases: a short “fitting” phase, during which the network learns to label its training data, and a much longer “compression” phase, during which it becomes good at generalization, as measured by its performance at labeling new test data.
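
Those two phases were identified by tracking, epoch by epoch, how much mutual information each layer's activations retain about the input and about the label. Below is a rough sketch of the kind of binned, plug-in estimate involved (a simplified illustration of the general idea, not Shwartz-Ziv and Tishby's actual code; the bin count and the hashing of binned activation patterns are my own choices).

```python
import numpy as np

def discretize(activations, n_bins=30):
    """Bin continuous activations and hash each row into a single discrete symbol."""
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
    binned = np.digitize(activations, edges)
    return np.array([hash(row.tobytes()) for row in binned])

def mutual_information(a, b):
    """Plug-in estimate (in bits) of I(A;B) from two arrays of discrete symbols."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    np.add.at(joint, (a_idx, b_idx), 1.0)   # joint counts
    joint /= joint.sum()                     # joint probabilities
    marg_a = joint.sum(axis=1, keepdims=True)
    marg_b = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (marg_a * marg_b)[nz])))

# During training one records, for every epoch and every layer T:
#   i_xt = mutual_information(discretize(X), discretize(T))  # info the layer keeps about the input
#   i_ty = mutual_information(discretize(T), y_labels)       # info the layer keeps about the label
# and plots the trajectory of (i_xt, i_ty) for each layer in the "information plane".
```

In that information plane, I(T;Y) rises quickly during the fitting phase, while the later, slower decrease of I(X;T) is the compression phase.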

What these new discoveries teach us about the relationship between learning in humans and in machines is contentious, and is explored briefly in the article. Although neural nets were inspired by the structure of the human brain, the connection between the brain and the neural nets used today is tenuous.

The mystery of how brains sift signals from our senses and elevate them to the level of our conscious awareness drove much of the early interest in deep neural networks among AI pioneers, who hoped to reverse-engineer the brain’s learning rules. AI practitioners have since largely abandoned that path in the mad dash for technological progress, instead slapping on bells and whistles that boost performance with little regard for biological plausibility.

Comments

  1. It's from a different angle, but I've found the proposed connections to statistical physics very interesting; see e.g. Lin, Tegmark and Rolnick's "Why does deep and cheap learning work so well?", arXiv:1608.08225, J. Stat. Phys. 168, 1223-1247 (2017)
