One of the biggest questions in biology is: what is the relationship between genotypes and phenotypes? In other words, how does a specific gene (DNA sequence) encode the information that allows a specific biological structure with a unique function to emerge?

Like big questions in many fields, this is a question about emergence.

In biology, this mapping from genotype to phenotype occurs at many levels, from protein structure to human personality. One example is how viral RNA encodes the structure of a SARS-CoV-2 virion.

A fascinating thing about biological structures is that many have a certain amount of symmetry. The human body has reflection symmetry and many virions have icosahedral symmetry. What is the origin of this tendency toward symmetry? Could evolution produce it?

Scientists will sometimes make statements such as the following about evolution.

Symmetric structures preferentially arise not just due to natural selection but also because they require less specific information to encode and are therefore much more likely to appear as phenotypic variation through random mutations.

How do we know this is true? Can such a statement be falsified? Or at least, can we produce concrete models or biological systems that are consistent with this statement?

There is a fascinating paper in PNAS that addresses the questions above.

**Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution** Iain G. Johnston, Kamaludin Dingle, Sam F. Greenbury, Chico Q. Camargo, Jonathan P. K. Doye, Sebastian E. Ahnert, and Ard A. Louis

Here are a few highlights from the article. First, how the authors make the notions of information content and algorithms precise.

Genetic mutations are random in the sense that they occur independently of the phenotypic variation they produce. This does not, however, mean that the probability *P*(*p*) that a Genotype-Phenotype [GP] map produces a phenotype *p* upon random sampling of genotypes will be anything like a uniformly random distribution.

Instead, ... arguments based on the coding theorem of algorithmic information theory (AIT) (7) predict that the *P*(*p*) of many GP maps should be highly biased toward phenotypes with low Kolmogorov complexity *K*(*p*) (8).

High symmetry can, in turn, be linked to low *K*(*p*) (6, 9–11). An intuitive explanation for this algorithmic bias toward symmetry proceeds in two steps:

1) Symmetric phenotypes typically need less information to encode algorithmically, due to repetition of subunits. This higher compressibility reduces constraints on genotypes, implying that more genotypes will map to simpler, more symmetric phenotypes than to more complex asymmetric ones (2, 3).

2) Upon random mutations these symmetric phenotypes are much more likely to arise as potential variation (12, 13), so that a **strong bias toward symmetry may emerge even without natural selection for symmetry.**
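The counting argument in step 1 can be made concrete with a deliberately contrived toy map. It is not one of the models in the paper; the names `develop` and `phenotype_counts` and the 10-bit genotype layout are assumptions made up for this illustration. The first two bits choose the length of a subunit (1, 2, 4, or 8), the next bits spell out the subunit, and any remaining bits are ignored. The subunit is then repeated to fill an 8-character phenotype.

```python
from collections import Counter
from itertools import product

PHENOTYPE_LENGTH = 8
UNIT_LENGTHS = (1, 2, 4, 8)  # selected by the first two genotype bits


def develop(genotype):
    """Map a 10-bit genotype to an 8-character phenotype (toy rule, not from the paper)."""
    header, body = genotype[:2], genotype[2:]
    unit_length = UNIT_LENGTHS[2 * header[0] + header[1]]
    unit = body[:unit_length]  # only these bits are read; the rest are neutral
    repeats = PHENOTYPE_LENGTH // unit_length
    return "".join(str(bit) for bit in unit * repeats)


def phenotype_counts():
    """Count how many of the 2**10 genotypes produce each phenotype."""
    return Counter(develop(g) for g in product((0, 1), repeat=10))


counts = phenotype_counts()
total = sum(counts.values())
for phenotype in ("00000000", "01010101", "00010001", "00010111"):
    print(phenotype, counts[phenotype], counts[phenotype] / total)
```

Under this toy rule, 209 of the 1,024 genotypes give the uniform phenotype `00000000`, 81 give `01010101`, 17 give `00010001`, and exactly 1 gives the aperiodic `00010111`. The bias is built in by construction, so this only illustrates the bookkeeping behind the argument; it is not evidence for the biological claim.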

The authors consider several concrete models and biological systems that illustrate this bias toward symmetry. The first involves the structure of protein complexes, as given in the Protein Data Bank (PDB).

(A) Protein complexes self-assemble from individual units.

(B) Frequency of 6-mer protein complex topologies found in the PDB versus **the number of interface types, a measure of complexity** $\tilde{K}(p)$. Symmetry groups are in standard Schoenflies notation: C6, D3, C3, C2, and C1. There is a strong preference for low-complexity/high-symmetry structures.

(C) Histograms of scaled frequencies of symmetries for 6-mer topologies found in the PDB (dark red) versus the frequencies by symmetry of the morphospace of all possible 6-mers illustrate that **symmetric structures are hugely overrepresented in the PDB database.**

Note the logarithmic scale for the probabilities (frequencies), which span four orders of magnitude. The authors claim that "many genotype–phenotype maps are **exponentially biased toward** phenotypes with low descriptional complexity."

This intuition that simpler outputs are more likely to appear upon random inputs into a computer programming language can be precisely quantified in the field of AIT (7), where the Kolmogorov complexity *K*(*p*) of a string *p* is formally defined as a shortest program that generates *p* on a suitably chosen universal Turing machine (UTM).
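Kolmogorov complexity itself is uncomputable, so in practice one works with computable approximations. As a rough sketch of the idea, the length of a compressed string gives a crude upper-bound proxy. The use of `zlib` here is my own assumption for illustration and is not the complexity measure used in the paper; the helper name `compressed_size` is likewise hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Bytes after zlib compression: a crude, computable stand-in for K."""
    return len(zlib.compress(data, 9))


rng = random.Random(0)  # fixed seed so the run is repeatable
repetitive = b"AB" * 500  # 1000 bytes built from one repeated subunit
irregular = bytes(rng.randrange(256) for _ in range(1000))  # 1000 pseudo-random bytes

print("repetitive:", compressed_size(repetitive))
print("irregular: ", compressed_size(irregular))
```

The repetitive string compresses to a small fraction of its 1000 bytes, while the pseudo-random one barely compresses at all. That gap is the sense in which repetition of subunits means "less information to encode".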

From AIT the authors obtain a bound (Eq. 1, below) that exhibits an exponential decay of probability with complexity, similar to that seen in their graphs, such as the one shown below for a model gene regulatory network described by 60 ordinary differential equations (ODEs). The red dashed line is the bound below.

$$P(p) \leq 2^{-a\tilde{K}(p) - b} \tag{1}$$

Scaled frequency vs. complexity $\tilde{K}(p)$ for the budding yeast ODE cell cycle model (30). Phenotypes are grouped by complexity of the time output of the key CLB2/SIC1 complex concentration. Higher frequency means a larger fraction of parameters generate this time curve. The red circle denotes the **wild-type phenotype, which is one of the simplest and most likely** phenotypes to appear. The dashed line shows a possible upper bound from Eq. 1. There is a clear bias toward low-complexity outputs.
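Taking base-2 logarithms of Eq. 1 makes the shape of the dashed line explicit. This is only a restatement of the bound, assuming that $a > 0$ and $b$ are constants that do not depend on the phenotype $p$:

$$\log_2 P(p) \leq -a\,\tilde{K}(p) - b$$

On a plot with a logarithmic frequency axis the bound is therefore a straight line (with slope $-a$ when the axis is measured in powers of 2): each additional unit of complexity lowers the maximum allowed probability by a factor of $2^{a}$.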

One minor comment is that I was surprised that the authors did not reference the classic 1956 paper by Crick and Watson. They introduced the concept of "genetic economy". Prior to any knowledge of the actual structure of virions, they predicted that virions would have icosahedral symmetry because that reduced the cost of the genome coding for the structure of the virion.

Hence, it would be interesting to explore the relationship between the PNAS paper and the work of Reidun Twarock & Antoni Luque.