Thursday, December 14, 2017

Statistical properties of networks

Today I am giving a cake meeting talk about something a bit different. Over the past year or so I have tried to learn something about "complexity theory", including networks. Here is some of what I have learnt and found interesting. The most useful (i.e. accessible) article I found was a 2008 Physics Today article, The physics of networks by Mark Newman.

The degree of a node, denoted k, is equal to the number of edges connected to that node. A useful quantity to describe real-world networks is the probability distribution P(k); i.e. if you pick a random node it gives the probability that the node has degree k.

Analysis of data from a wide range of networks, from the internet to protein interaction networks, finds that this distribution has a power-law form,

This holds over almost four orders of magnitude.
This is known as a scale-free network, and the exponent is typically between 2 and 3.
This power law is significant for several reasons.
First, it is in contrast to a random network for which P(k) would be a Poisson distribution, which decreases exponentially with large k, with a scale defined by the mean value of k.
Second, if the exponent is less than three, then the variance of k diverges in the limit of an infinite size network, reflecting large fluctuations in the degree.
Third, the "fat tail" means there is a significant probability of "hubs", i.e. nodes that are connected to a large number of other nodes. This reflects the significant spatial heterogeneity of real-world networks. This property has a significant effect on others properties of the network, as I discuss below.

An important outstanding question is do real-world networks self-organise in some sense to lead to the scale-free properties?

A question we do know the answer to is, what happens to the connectivity of the network when some of the nodes are removed?
It depends crucially on the network's degree distribution P(k).
Consider two different node removal strategies.
a. Remove nodes at random.
b. Deliberately target high degree nodes for removal.
It turns out that a random network is equally resilient to both "attacks".
In contrast, a scale-free network is resilient to a. but particularly susceptible to b.

Suppose you want to stop the spread of a disease on a social network. What is the best "vaccination" strategy?  For a scale-free network that will be to target the highest degree nodes in the hope of producing "herd immunity".

It's a small world!
This involves the popular notion of "six degrees of separation". If you take two random people on earth then on average one has to just go six steps in "friend of a friend of a friend ..." to connect them. Many find this surprising but, this arises because your social network increases exponentially with the number of steps you take and something like (30)^6 gives the population of the planet.
Newman states that a more surprising result is that people are good at finding the short paths, and Kleinberg showed that the effect only works if the social network has a special form.

How does one identify "communities" in networks?
A quantitative method is discussed in this paper and applied to several social, linguistic, and biological networks.

A topic which is both intellectually fascinating and of great practical significance concerns

This is what epidemiology is all about. However, until recently, almost all mathematical models assumed spatial homogeneity, i.e. that the probability of any individual being infected was equally likely. In reality, it depends on how many other individuals they interact with, i.e. the structure of the social network.

The crucial parameter to understand whether an epidemic will occur turns out to not be the mean degree but the mean squared degree. Newman argues
Consider a hub in a social network: a person having, say, a hundred contacts. If that person gets sick with a certain disease, then he has a hundred times as many opportunities to pass the disease on to others as a person with only one contact. However, the person with a hundred contacts is also more likely to get sick in the first place because he has a hundred people to catch the disease from. Thus such a person is both a hundred times more likely to get the disease and a hundred times more likely to pass it on, and hence 10 000 times more effective at spreading the disease than is the person with only one contact.
I found the following Rev. Mod. Phys. from 2015 helpful.
Epidemic processes in complex networks 
Romualdo Pastor-Satorras, Claudio Castellano, Piet Van Mieghem, Alessandro Vespignani

They consider different models for the spread of disease. A key parameter is the infection rate lambda which is the ratio of the transition rates for infection and recovery. This is the SIR [susceptible-infectious-recover] model proposed in 1927 by Kermack and McKendrick [cited almost 5000 times!]. This was one of the first mathematical models for epidemiology.

Behaviour is much richer (and more realistic) if one considers models on a complex network. Then one can observe "phase transitions" and critical behaviour. In the figure below rho is the fraction of infected individuals.

In a 2001 PRL [cited more than 4000 times!] it was shown using a "degree-based mean-field theory" that the critical value for lambda is given by
In a scale-free network the second moment diverges and so there is no epidemic threshold, i.e. for an infinitely small infection rate and epidemic can occur.

The review is helpful but I would have liked more discussion of real data about practical (e.g. public policy) implications. This field has significant potential because due to internet and mobile phone usage a lot more data is being produced about social networks.


  1. Ross, it's been a while since I last commented - I've had a job (and associated life) change.

    I wanted to thank you for this post; it's interesting (very much so), educational, and relevant - connecting scientific thinking to the real world.
    I enjoyed it!

    1. Thanks for the encouragement.
      I wish you the best in your new job. I hope it is for the better. From a selfish point of view, I hope you will still have time and opportunity for your stimulating comments.

    2. Yes, although the reason was not positive (my job in science asking me to compromise on my own ethical standards), the outcome makes my and my families life better - as far as I can see now.

      I intend to still enjoy your posts, so I expect to still be provoked every now and then. Keep up the honest assessments of physics and society!