Genotype-phenotype mapping and the genes as blueprint metaphor

One of my occasional series here at Footnotes to Plato is devoted to accessible (or so I hope) versions of some of my technical papers. The idea is to provide a taste of academic scholarship in philosophy of science, but in a form that can be read by more than a few dozen colleagues who specialize in the same exact area. So far, I’ve covered all suitable entries from 2013 to 2017. I don’t plan to go back to the very beginning of my career (for a good chunk of which I was publishing as an experimental biologist anyway), but only to 2010, the year after I moved to philosophy full time. By my reckoning, that means four papers left: on genotype-environment interactions and the “genes as blueprints” metaphor (2010), on the machine-information metaphor in biological research and education (2011), on the metaphor of adaptive landscapes (2012), and on paradigm shifts in evolutionary biology (also 2012). Yeah, that was my “metaphors” period. This entry is dedicated to the first paper mentioned, and over the next few months I’ll cover the remaining three.

The underlying question being considered here is: what is the relationship between genotypes and phenotypes? This question has marked the evolution of evolutionary theory ever since the rediscovery of Mendel’s work at the beginning of the twentieth century, which immediately generated an apparent conflict with the Darwinian view of gradual evolution. Famously, the answer proposed by the architects of the Modern Synthesis (i.e., the “standard model” in evolutionary biology from the 1940s on) is that genes determine phenotypes, as in the oft-cited metaphors of a “genetic blueprint” or a “genetic program.” This sort of answer bypasses the process of development, which is treated as an incidental black box with no direct causal relevance to the evolutionary process. Given this conceptual framework, it is no wonder that developmental biology was famously left out of the Modern Synthesis, and that it has (partially) re-emerged only recently within the so-called “evo-devo” approach.

In the paper, published in the Philosophical Transactions of the Royal Society (2010), I re-examine the question of the relationship between genotype and phenotype by going back to Alberch’s concept of a genotype-phenotype (G>P) “map” and examining what recent research tells us about actual G>P maps. It turns out that computational and empirical studies of three classes of systems (RNA folding, protein function and software development, though in this post I will talk only about the first and last cases) have yielded important generalizations about the problem, as well as novel insight into the evolutionary process more broadly. One of the consequences of these lines of research is that the blueprint metaphor is untenable and in fact positively misleading, and should be replaced by the concept of developmental encoding.

Back in 1991, Alberch introduced a metaphor differing from the standard blueprint view of genetic encoding, talking about a G>P “mapping function,” defined by a given parameter space and at least potentially amenable to mathematical description.

Alberch derived four general conclusions from his conceptualization of the G>P map: (i) the map is (much) more complex than a one-to-one relation between genotype and phenotype, which means that the same phenotype may be obtained from different combinations of genetic informational resources; (ii) the area in parameter space where a particular phenotype exists gives an indication of how stable that phenotype is likely to be; (iii) the parameter space is marked by “transformational boundaries,” i.e. areas where a small change in one or more developmental parameters will cause the transition from one phenotypic state to another; and (iv) the phenotypic stability of a given population will depend on which area of the parameter space it occupies, and in particular whether it is close to a transformational boundary or not.

Alberch’s famous example of a phenotypic transition amenable to description in terms of his parameter space and mapping function was the evolution of the number of digits in amphibians. In particular, he showed that salamanders tend to lose their fifth toe every time digit reduction evolves, while anurans tend to lose their first digit. The difference between the two groups can be recreated experimentally by administration of an inhibitor of cell division. Alberch interpreted this result as telling us that anurans and salamanders find themselves in different areas of the parameter space, and in particular that they are located near different transformational boundaries. As a result, every time the transition happens within one of the two groups it occurs by the same developmental means, but when the two groups are compared the transitions happen by different developmental routes.

A salamander of the genus Ambystoma, a classic model system for evo-devo studies

A good way to tackle the G>P mapping problem is to start simple, and the simplest place to start is the growing literature on RNA folding. RNA folding is relatively well understood at a chemical-physical level, with increasingly sophisticated computer models capable of predicting the three-dimensional folding of a linear sequence of nucleotides based on thermodynamic considerations. Moreover, it is relatively straightforward to verify such predictions experimentally for a subset of simulated folding patterns, and researchers can even carry out competition experiments among RNA molecules for a given catalytic function.

As far as the G>P problem in particular is concerned, the step from genotype to phenotype is in this case as short as is possible in any biological system, and indeed probably reflects to some extent the ancestral situation in the RNA world hypothesized within the context of the origin of life problem. RNA folding is therefore both an extremely suitable system to begin examining G>P mapping and one that may yield important clues to how mapping functions historically got started and became more complex and indirect. A crucial advantage of RNA folding studies of G>P mapping is that the fitness function of the molecules is not assumed arbitrarily to follow a particular statistical distribution, but can be studied empirically. In other words, the connections between genotype and phenotype on one hand and between phenotype and fitness on the other hand are explicit, relatively simple and biologically meaningful.

Several important generalizations have emerged from studies of RNA folding, generalizations that are crucial to our understanding of phenotypic evolution beyond the relatively simple framework offered by the Modern Synthesis. Consider, for instance, the study of mutational networks, i.e. of the structure of the genotypic landscape in terms of one-mutation steps surrounding a given focal genotype. The idea goes back to Kauffman & Levin’s work on genotypic landscapes, back in 1987. The problem to be tackled is how evolution explores phenotypic landscapes by moving across a corresponding genotypic landscape in a non-saltatory manner, as standard Darwinian theory requires. The solution requires an understanding of the connection between the genotypic and phenotypic landscapes, and in the case of RNA folding one can actually computationally explore the totality of both landscapes for a given short sequence length, or statistically sample the properties of landscapes defined by longer sequences.

For instance, the 30-nucleotide long binary RNA molecules (i.e., those built from only two types of nucleotide) comprise about one billion unique sequences, a bewildering genotypic space. This space, however, corresponds to only 220,000 unique folding shapes in the Guanine/Uracil nucleotide landscape and a mere 1,000 shapes in the Adenine/Uracil landscape, the two situations that have been extensively studied. This is a spectacular example of what biologists call “degeneracy” (i.e., redundancy of sequence coding), which in turn is a fundamental concept underlying the neutral theory of molecular evolution, according to which most (but, crucially, not all) mutations are selectively neutral at the molecular level.
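The arithmetic behind these figures is easy to check. The following is only a back-of-the-envelope sketch: the shape counts are the ones quoted above, and the helper name `mean_sequences_per_shape` is made up for illustration. It computes the size of the binary sequence space and the average number of sequences that fold into each shape.

```python
# Size of the sequence space for 30-nucleotide "binary" RNA molecules,
# i.e. molecules built from only two kinds of nucleotide.
SEQ_LENGTH = 30
ALPHABET_SIZE = 2

n_sequences = ALPHABET_SIZE ** SEQ_LENGTH  # 2**30, roughly one billion

# Shape counts as quoted in the text (not recomputed here).
shape_counts = {
    "Guanine/Uracil": 220_000,
    "Adenine/Uracil": 1_000,
}


def mean_sequences_per_shape(n_seq, n_shapes):
    """Average degeneracy: how many sequences map, on average, to one shape."""
    return n_seq / n_shapes


print(f"sequences: {n_sequences:,}")
for landscape, n_shapes in shape_counts.items():
    avg = mean_sequences_per_shape(n_sequences, n_shapes)
    print(f"{landscape}: about {avg:,.0f} sequences per shape on average")
```

The average hides the skew described in the next paragraph: in the actual landscapes a few shapes are far more common than this mean suggests, and most are much rarer.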

Genotypes on these landscapes are connected by mutational networks whose properties can then be explored. An interesting result is that the distribution of phenotypes on RNA mutational networks follows regular patterns, characterized by a few abundant RNA shapes and a large number of rare ones. The structure of the landscape is such that evolution can explore most or all of the common structures by one-step mutations that preserve structure while moving the population on a neutral path of constant fitness, until it bumps into a novel phenotype with higher fitness. Interestingly, most genotypes turn out to be located within a few mutational steps from most of the common phenotypes in the landscape, making it predictable that such phenotypes will in fact be found by natural selection in a relatively short period of time. However, the connectivity on the landscape is always asymmetrical, so which particular phenotypes will be reached more easily from a given starting genotype is a matter of historical contingency.
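To make the idea of a neutral network concrete, here is a minimal sketch on a deliberately artificial G>P map. It is not an RNA folding model: the genotype is a 10-bit string and the "phenotype" is simply the number of runs of consecutive 1s, both of which are assumptions chosen only because the whole landscape can then be enumerated. The names `phenotype`, `neighbors` and `neutral_network` are hypothetical.

```python
from collections import Counter, deque
from itertools import product

LENGTH = 10  # toy genotype length, small enough to enumerate everything


def phenotype(genotype):
    """Toy G>P map (an assumption): count the maximal runs of 1s."""
    runs, previous = 0, 0
    for bit in genotype:
        if bit == 1 and previous == 0:
            runs += 1
        previous = bit
    return runs


def neighbors(genotype):
    """Yield every genotype exactly one point mutation away."""
    for i, bit in enumerate(genotype):
        yield genotype[:i] + (1 - bit,) + genotype[i + 1:]


def neutral_network(start):
    """Breadth-first search over one-step mutations that keep the phenotype."""
    target = phenotype(start)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for candidate in neighbors(current):
            if candidate not in seen and phenotype(candidate) == target:
                seen.add(candidate)
                queue.append(candidate)
    return seen


# How many genotypes map to each phenotype across the whole landscape.
abundance = Counter(phenotype(g) for g in product((0, 1), repeat=LENGTH))

start = (1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
network = neutral_network(start)

# Farthest point (in mutations) reachable without ever changing phenotype.
farthest = max(sum(a != b for a, b in zip(start, g)) for g in network)

# Novel phenotypes found one mutation away from somewhere on the network.
reachable = {phenotype(n) for g in network for n in neighbors(g)}
reachable.discard(phenotype(start))

print("genotypes per phenotype:", dict(sorted(abundance.items())))
print("size of the neutral network:", len(network))
print("largest neutral distance from start:", farthest)
print("phenotypes adjacent to the network:", sorted(reachable))
```

Even in this toy, a population can drift neutrally from the starting genotype to one that differs at every position, and different novel phenotypes become accessible at different points along the way. The abundance distribution here is much tamer than the one reported for RNA, so the sketch illustrates the mechanism rather than the statistics.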

Research on the general properties of RNA folding evolution has shown that the G>P function is such that small movements in genotypic space do not necessarily correspond to small movements in phenotypic space, a rather flagrant contradiction of one of the standard assumptions of the Modern Synthesis. In particular, if we consider a genotype G with a given phenotype P, it is likely that G is connected to a one-step neighbor associated with a phenotype which is not structurally similar to P. This brings us to a rather surprising general behavior that emerges from studies of RNA folding (as well as of protein function, micro-organisms and simulated systems, as discussed in the rest of the full paper), a true “punctuated equilibrium” pattern of evolution that does not result from the usual suspects in terms of underlying causes.

Punctuated equilibrium, of course, was one of the early challenges to the Modern Synthesis brought about by palaeontologists Eldredge & Gould back in 1972. The standard explanation for the fossil record pattern of stasis punctuated by occasional rapid shifts in phenotype is that of stabilizing selection. Simulations of RNA folding evolution display the same general pattern that one sees in the fossil record, obviously at a much smaller temporal scale. The mechanism, however, has nothing to do with “stabilizing selection” (a rather vague concept in itself, really simply a way to describe a statistical pattern of constant mean and reduced variance). Rather, the punctuated evolution results from the fact that the population divides itself into smaller chunks, each of which explores a portion of the largely neutral genotypic landscape. From time to time, a population encounters a new phenotypic optimum and “jumps” on it quickly. Stasis, in this context, is then not the result of selection for a constant phenotype, but rather of the largely neutral structure of the landscape, which allows populations to wander around until they find a new functional phenotype and jump into a nearby neutral network, only to resume their evolutionary wanderings.

RNA-like systems can also be a model for the evolution of ecological communities, thereby beginning to forge a direct link, still surprisingly lacking, between ecology and evolutionary biology. For instance, Takeuchi & Hogeweg, in 2008, showed that a population of replicators originally made of just one genotype evolves into a complex system characterized by four functionally distinct groups of genotypes, which the authors call “species.” Interestingly, the model also evolved “parasites” which not only were able to coexist with catalytic molecules, but in turn were themselves catalysts for the evolution of further complexity in the system. While Takeuchi & Hogeweg’s definition of species in this context may appear artificial, the groups of genotypes they identified are in fact both ecologically functionally distinct and genealogically related to each other, and a functional-genealogical concept is certainly one of the viable contenders as a definition of biological species.

The examples drawn from research on RNA folding (as well as those not discussed here, on protein sequence space) help bring to the forefront a major limitation of the Modern Synthesis: the almost utter disregard for developmental biology.

Notoriously, that field was essentially left out of the synthesis of the 1940s that gave us the current structure of evolutionary theory. Part of the reason for this is that it has never been conceptually clear what exactly the role of development in evolution is. Mayr, a founding father of the Modern Synthesis, famously made a distinction, harking back to Aristotle, between proximate and ultimate causes in biology, with the genetic bases of phenotypes counting as proximate causes and the evolutionary processes that brought those phenotypes about considered as ultimate causes (see this post for an in-depth discussion). Even if one accepts Mayr’s framework, however, it is not clear whether development should be considered a proximate or an ultimate cause.

The onset of evo-devo and calls for an Extended Synthesis in biology (see this previous post) have reopened that question. The answer is emerging from research on the structure of G>P maps, and in particular from a parallel literature in computational science that attempts to exploit the characteristics of biological development to produce a new generation of “evolvable hardware.” The picture that is forming out of these efforts is that development is a necessary link between proximate and ultimate causality, and that in a sense the G>P map is whatever specific type of “developmental encoding” (as opposed to the classic genetic encoding) a given species of organism uses to produce environmentally apt phenotypes. Developmental encoding refers to situations where information encodes not a detailed description of the full system (as in the blueprint metaphor), but rather the local steps necessary to build the system through a developmental process.

Several authors have pointed out the limitations of both direct genetic encoding of “information” and of the blueprint metaphor that results from it. Ciliberti and collaborators, in a 2007 paper, have for instance referred to human-engineered systems as being characterized by “brittleness,” i.e. the unfortunate property that if one component ceases functioning properly, there is a high probability that the whole system will unravel. This is most clearly not what happens with biological organisms, which means that the oft-made analogy (ironically, by both some biologists and proposers of intelligent design creationism) between living organisms and “machines” or “programs” is profoundly misleading. Along similar lines, Stanley, also in 2007, reiterated that the amount of direct genetic information present in, say, the human genome (now estimated to be around 30,000 protein-coding genes) is orders of magnitude below what would be necessary to actually specify the spatial location, functionality and connectivity among the trillions of cells that make up a human brain. The answer must be in the local deployment of information that is possible through developmental processes, where the “instructions” can be used in a way that is sensitive (and therefore capable of adjusting) to both the internal and external environments.

According to Hartmann and colleagues (in another 2007 paper), artificial development is increasingly being used to solve computational problems outside of biology by direct analogy with biological systems. The results indicate that replacing direct genetic encoding with indirect developmental encoding dramatically reduces the search space for evolutionary algorithms. Moreover, the resulting systems are less complex and yet more robust (“fault-tolerant” in engineering jargon) than those obtained by evolving standard genetic algorithms. Another way to put the point is that direct genetic encoding is limited by the fact that the length of the genetic string grows proportionally to the complexity of the phenotype, thereby quickly encountering severe limitations in search space. With developmental encoding, instead, the evolving system can take advantage of a small number of genetic instructions mapping to a large number of phenotypic outcomes, because those outcomes are determined by the (local) interactions among parts of the system and by interactions of the system with the environment.
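The scaling argument can be illustrated with a toy contrast between the two kinds of encoding. This is only a sketch under stated assumptions, not the method used by Hartmann and colleagues: the "developmental" genome is an 8-entry local update rule (a one-dimensional cellular automaton, chosen here purely for convenience), and the function names `develop` and `direct_genome_length` are hypothetical.

```python
def develop(rule_bits, width, steps):
    """Grow a phenotype from a single seed cell using only local rules.

    rule_bits is the whole "developmental genome": 8 bits, where entry i is
    the next state of a cell whose (left, centre, right) neighbourhood,
    read as a binary number, equals i. Edges wrap around.
    """
    row = [0] * width
    row[width // 2] = 1  # the single seed cell
    for _ in range(steps):
        row = [
            rule_bits[
                (row[(i - 1) % width] << 2) | (row[i] << 1) | row[(i + 1) % width]
            ]
            for i in range(width)
        ]
    return tuple(row)


def direct_genome_length(phenotype):
    """Direct ("blueprint") encoding needs one genome entry per cell."""
    return len(phenotype)


# Example rule (an arbitrary choice): next state = left XOR right.
RULE = tuple((90 >> i) & 1 for i in range(8))

for width in (16, 64, 256):
    phenotype = develop(RULE, width, steps=width // 4)
    print(
        f"phenotype of {width} cells: "
        f"direct genome = {direct_genome_length(phenotype)} entries, "
        f"developmental genome = {len(RULE)} entries"
    )
```

The direct genome grows in step with the phenotype, while the developmental genome stays at eight entries however large the phenotype becomes, because the outcome is produced by repeated local interactions rather than described cell by cell. The toy omits the interaction with an external environment that the text also mentions.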

Simulations comparing the evolution of standard genetic systems of information encoding with systems based on developmental encoding clearly show that genetic systems reach a maximum level of fitness for low levels of complexity; at higher levels of complexity developmental encoding “scales” much better, with developmental systems being capable of achieving high fitness more quickly and efficiently. Moreover, developmental encoding leads to the artificial evolution of systems that are both significantly more robust to internal disruptions and significantly more flexible in response to external environmental conditions than standard genetic systems. This is an interesting situation whereby a research area parallel to evolutionary biology, computational science, draws inspiration from the actual structure of biological systems and ends up providing a theoretical underpinning for why, in fact, those biological systems are structured the way they are.

In conclusion, the conceptual and mathematical foundations of evolutionary theory are evolving from a simple beginning as bean-bag genetics, Mayr’s derogatory term for population genetics theory, to a sophisticated patchwork that draws from population genetics, quantitative genetics, bioinformatics and computational science. Medawar & Medawar, in 1983, famously said that “genetics proposes, epigenetics disposes,” where epigenetics here means the whole of developmental processes. It was a way to highlight that evolutionary theory needs a good conceptual understanding of development, and not just of genetics. As I have argued in the paper referenced here, such a broadened theoretical framework cannot come from population genetics alone, but benefits both from computational research on simple biological examples of G>P maps, such as those underlying RNA folding and protein function, and from consideration of broader issues such as the properties of large neutral networks in genotypic space (see full paper) and of developmental versus genetic-encoding systems.



62 replies

  1. It’s relatively easy to code/navigate around disruptions, blockages, etc. The real problem is when the presumed goal is lost, or fractures in some way. That’s when you get a Tower of Babel, political breakdown, loss of momentum, etc.


  2. Though to amend my prior point, in biology, that would be where speciation starts.

