Evolution of Protocell-embedded Molecular Computation

Lead partner: ALife Group, Dublin City University


Molecular Classifier Systems (MCS)

We now report on a selection of simulation models of informazyme systems, and of protocells incorporating them. This investigation first tests the above theoretical analysis and then progresses to the demonstration of various protocell level evolutionary phenomena, ultimately including the directed evolution of (basic) protocell “computation” capability. These models are all based on a simplified Artificial Chemistry loosely inspired by Holland’s Learning Classifier Systems (Holland and Reitman 1977; Holland 2006), which we call the Molecular Classifier System (MCS) family of models. In these models the informazymes are represented by variable length binary strings; i.e., polymers whose monomers are of just two distinct kinds, denoted by the symbols 0 and 1. Conceptually, each informazyme species has an informational structure (primary structure, or monomer sequence) and a separate enzymatic function (“folded” or secondary structure, or “shape”), as inspired by the role of ribozymes in the RNA world hypothesis (Joyce 1991). The fundamental molecular dynamic in all the MCS models is then the collision between two molecules. The two molecules are selected uniformly at random from the available molecules: i.e., the molecular level system is well stirred (within a single reactor or protocell). One molecule of the pair is randomly assigned the enzymatic role (the folded conformation), and the other is assigned the substrate role (the unfolded conformation). If the enzyme binds to, or recognises, the substrate, then the reaction may proceed, in accordance with the specific enzymatic function; otherwise the collision is elastic. We shall explore a number of binding or recognition rules, with different effects. In some cases we report on experiments with a single, fixed size, flow reactor, i.e., where there is only single-level, molecular “selection” or “evolution”.
In other cases, we consider the molecules to be contained in protocells which can grow and divide, subject to some overall resource constraint on the protocell population size. In these cases we have the potential for selection and evolution at the protocell level also, with hierarchical coupling between these two levels of selection. A common enzymatic function, present in all the MCS models to be discussed here, is the ability to make an error-prone bit-wise copy of the primary, informational, structure of the bound, substrate, informazyme molecule (here regarded as a template to be replicated); that is, a replicase function. Replication errors will include point mutations, insertions and deletions. The models to be presented will differ in the recognition or binding mechanism; and in the repertoire of additional enzymatic functions other than replication.

MCS-0: Minimal Recognition

MCS-0 is a minimal implementation, used to validate the core premise of using an informazyme sub-system as the hereditary mechanism of a protocell; and to demonstrate the coupling between selection at molecular and protocell levels.
Enzymatic Map
In MCS-0, the “trivial” folding of an identity map is used; that is, the secondary structure is identical to the primary structure. The recognition rule is simple substring mapping: i.e., binding occurs if the enzyme sequence is a substring of the substrate sequence. The only supported enzymatic function is replication. Note that every string is considered to be a substring of itself.
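The MCS-0 recognition rule and a single collision can be sketched as follows. This is a minimal illustration rather than the simulator used for the reported runs: the function names are hypothetical, and replication error is omitted so that a successful reaction returns an exact copy of the substrate.

```python
import random

def binds_mcs0(enzyme: str, substrate: str) -> bool:
    """MCS-0 recognition: the enzyme binds if its sequence is a substring
    of the substrate sequence (every string is a substring of itself)."""
    return enzyme in substrate

def collide_mcs0(a: str, b: str, rng: random.Random):
    """One collision between molecules a and b.

    Roles (enzyme, substrate) are assigned at random. Returns a copy of the
    substrate if the enzyme binds, or None for an elastic collision.
    Assumption: error-free copying (no mutation) in this sketch.
    """
    enzyme, substrate = (a, b) if rng.random() < 0.5 else (b, a)
    return substrate if binds_mcs0(enzyme, substrate) else None

if __name__ == "__main__":
    rng = random.Random(0)
    # "101" is a substring of "0101", so only the longer species can be
    # replicated in a cross-species collision (it acts as the parasite).
    print([collide_mcs0("101", "0101", rng) for _ in range(5)])
```

Because binding is asymmetric, a cross-species collision here either replicates the longer molecule or is elastic, which is the mechanism behind the facultative parasitism discussed next.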
Molecular Level Dynamics
In MCS-0, all species are self-replicases. Of the pairwise interaction classes, only classes 9 (independent self-replicases) and 6 (facultative parasite) are possible. Any species will be a facultative parasite of any species which is a shorter substring of it. This model is therefore sufficient to demonstrate:
  • Self-maintenance of a dominant self-replicase with an accompanying quasi-species of (class 9) mutants.
  • Molecular displacement events, where a class 6 parasite takes over from an incumbent dominant species.
  • A monotonic evolutionary trend toward dominance by longer molecules, since parasites are always longer than their hosts.
Given a fixed per-bit mutation rate (v), the per-molecule mutation rate in MCS is given by V=1 - (1-v)^l, where l is the molecule length. Thus, as the length of the dominant species increases, so also will the per-molecule mutation rate. Accordingly, the mutational load will increase, the steady state concentration of the dominant will fall (\bar {x} \simeq 1-V), and the overall replication rate will fall as the square of this (R \simeq \bar {x}^2 \simeq (1-V)^2). See figure 15 for an example of how these quantities vary with increasing l.

Figure 15: Variation of V, \bar {x} and R with sequence length l [v = 0.05]
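The three quantities plotted in figure 15 follow directly from the expressions above. The short sketch below evaluates them for a few lengths; the function names are illustrative, and \bar {x} and R use the approximate forms quoted in the text (1-V and (1-V)^2), not an exact solution.

```python
def per_molecule_mutation_rate(v: float, l: int) -> float:
    """V = 1 - (1 - v)^l for per-bit mutation rate v and sequence length l."""
    return 1.0 - (1.0 - v) ** l

def steady_state_concentration(v: float, l: int) -> float:
    """Approximate steady state concentration of the dominant: x_bar ~ 1 - V."""
    return 1.0 - per_molecule_mutation_rate(v, l)

def total_replication_rate(v: float, l: int) -> float:
    """Approximate total replication rate: R ~ x_bar^2 ~ (1 - V)^2."""
    return steady_state_concentration(v, l) ** 2

if __name__ == "__main__":
    v = 0.05  # per-bit mutation rate used for figure 15
    for l in (10, 16, 22):
        print(l,
              round(per_molecule_mutation_rate(v, l), 3),
              round(steady_state_concentration(v, l), 3),
              round(total_replication_rate(v, l), 3))
```

For v = 0.05 this gives V of about 0.40 at l = 10, and a steady state concentration of about 0.32 with R of about 0.10 at l = 22, matching the values quoted for the evolutionary run reported below.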

While this follows directly from the theoretical analysis, it is a somewhat counter-intuitive result. In a natural sense, we can regard the total replication rate as a measure of the intrinsic molecular “fitness” of a molecular species in this system; therefore, in this system, we actually predict an evolutionary trend, based on perfectly “Darwinian” selection events, toward monotonically decreasing intrinsic fitness. This arises because the outcome of each selection event is completely determined by the “ecological interaction” (host-parasite) between the two species, which is a much stronger selective effect than the associated minor decrement in intrinsic fitness. However, even though these decrements are individually small, they are (in this system) monotonic, and directly give rise to a long term evolutionary collapse.
Test: Validation of Molecular Level Dynamics
We present two initial validation tests. Both use a single, fixed size, flow reactor (M=10^3). Replication error (molecular mutation) is disabled. In the first test, the reactor is initialised with equal concentrations of two molecular species, neither of which is a substring of the other. This is a simple case of pure class 9 (independent replicators) dynamics. The prediction is for an initial period of statistical fluctuation; but as soon as one species gains a statistically significant higher concentration the “survival of the common” dynamic will take effect and that species will quasi-deterministically take over the reactor, selectively displacing the other. Of course, which species will “win” will vary randomly, with equal probability, between runs. The approximate predicted dynamic is as in figure 14. Figure 16 shows one example run confirming this behaviour. This demonstrates the basic correctness of the implementation and of the approximate class 9 analysis.

Figure 16: MCS-0: Survival of the Common [M = 10^3]
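A test of this kind can be sketched in a few lines. The sketch assumes a particular flow mechanism that the text does not spell out: each newly created molecule overwrites a molecule chosen uniformly at random, which keeps the reactor at fixed size M. The function name and the two example sequences are hypothetical.

```python
import random
from collections import Counter

def run_reactor(molecules, steps, rng):
    """Run `steps` collisions in a fixed size, well stirred reactor (MCS-0 rules).

    Assumptions of this sketch: error-free replication, and an outflow in which
    each product displaces one uniformly chosen molecule.
    """
    molecules = list(molecules)          # work on a copy
    m = len(molecules)
    for _ in range(steps):
        i, j = rng.sample(range(m), 2)   # two distinct molecules; order = roles
        enzyme, substrate = molecules[i], molecules[j]
        if enzyme in substrate:          # MCS-0 substring recognition
            molecules[rng.randrange(m)] = substrate
    return molecules

if __name__ == "__main__":
    rng = random.Random(42)
    # Neither sequence is a substring of the other: pure class 9 dynamics.
    start = ["0011"] * 500 + ["1100"] * 500
    print(Counter(run_reactor(start, 200_000, rng)))
```

With two independent self-replicases, only same-species collisions lead to replication, so the more common species replicates disproportionately often; this is the “survival of the common” effect.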

The second validation test is essentially identical except testing the class 6 dynamic of invasion by a facultative parasite. In this case the reactor is initialised with the “host” species, x_1, at a concentration of 0.95, and the parasite species, x_2 at a concentration of 0.05. The host sequence is a one bit shorter substring of the parasite sequence. The approximate predicted dynamic is as in figure 8. Figure 17 shows one example run confirming this behaviour. Again, this demonstrates the basic correctness of the implementation and of the approximate class 6 analysis.

Figure 17: MCS-0: Displacement by Facultative Parasite [M = 10^3, x_1(0)=0.95]

Test: Molecular Level Evolutionary Dynamics
The test again uses a single, fixed size, flow reactor (M=10^3); but now replication error is enabled (v=0.05). The reactor is initialised with a single “dominant” species, 10 bits long, at concentration 0.60 (being the expected steady state concentration with a per-molecule mutation rate V=1-(1-v)^l=0.40). The balance of the molecules are generated as mutants of this dominant (in accordance with what would be the steady state mutant distribution). The predicted evolutionary behaviour, as discussed above, is a series of successive molecular displacement events, each corresponding to the random generation of a facultative parasite of the currently dominant species (host). By definition for this system, each such parasite must be at least one bit longer than the previous dominant. Accordingly, the per-molecule mutation rate, and corresponding mutational load, will increase with each displacement event; and the (quasi-)steady state concentration of each dominant (until it is itself displaced) will become progressively lower. Figure 18 shows one example run confirming this behaviour. Each separate plot corresponds to the concentration of a species which, at some point in the run, became the principal species (i.e., had the highest concentration of all species instantaneously present). It is seen that there is a sequence of displacement events; in each case, the new dominant species has a sequence which is a proper superstring of that of the previous dominant. Over macro-evolutionary time, the sequence length of the dominant species increases monotonically; and the “intrinsic fitness” (as indicated by the “steady state” concentration achieved by each successive dominant) steadily falls. When this run was (arbitrarily) terminated, the dominant species had sequence length 22, with a steady state concentration \simeq 0.32 and therefore a total replication rate of only \simeq 0.10. 
Even though an individual species can successfully self-maintain, under large mutational load, for an extended period, it is always eventually displaced by a longer parasite. As predicted, the long term evolutionary trajectory, via perfectly “Darwinian” short-term selection events, is a collapse to an essentially inert state.

Figure 18: MCS-0: Evolutionary “Collapse” [M = 10^3]

Protocell Level Evolutionary Dynamics
The MCS-0 chemistry is now extended with a protocell level dynamics. Instead of a fixed size reactor, molecules are now contained in variable sized protocells. The number of molecules is allowed to grow within a protocell. However, at a fixed threshold size (denoted S_{\mbox {max}}) the protocell becomes unstable and will divide or fission, with independent assortment of the contained molecules between the daughter protocells. Separately, the total size of the protocell population has a fixed maximum: whenever a protocell divides, a randomly chosen protocell from the total protocell population is removed (“killed”). Protocell reproduction, combined with a protocell population size limit, creates the conditions for selection and evolution at the protocell level—provided that there are heritable protocell traits with a systematic effect on protocell fitness. The basic heritable trait at the protocell level is the dominant molecular species: given independent assortment into the daughter protocells, the dominant status will normally be preserved, unchanged. In particular, it will be preserved against invasion by class 9 mutations due to the “survival of the common” effect at the molecular selection level; and, even more strongly, against invasion by class 6 mutants which are hosts relative to the dominant species (i.e., of which the dominant is a superstring). However, where a class 6 mutant arises which is a parasite of the currently dominant species then, in that protocell lineage, this parasite will quasi-deterministically grow to displace the incumbent, thus becoming a new dominant; which will then be heritable in turn. Such a molecular level takeover therefore represents a single mutation at the protocell level. It results in a protocell lineage with a different heritable trait; which may potentially be the target of selection at the protocell level. 
Whether there is selection, or merely drift, then depends on whether this trait has an effect on protocell level “fitness”. Protocell “death” is, by design, neutral with respect to all protocell traits (the protocell to be killed to maintain the fixed population size is chosen uniformly at random across the protocell population). Therefore the only potential fitness difference between protocells relates to birth rate. As protocell birth events are triggered solely by the number of (informazyme) molecules reaching a fixed threshold (S_{\mbox {max}}), which is independent of the molecular species and identical for all protocells, the only basis for systematic variation in birth rate is variation in molecular replication rate. As we have seen (figure 15), this does vary with the sequence length of the dominant molecular species, because of variation in the replication error rate and thus in the mutational load. We accordingly make two specific predictions about MCS-0 with protocell level structure:
  • Let the protocell population be initialised with equal numbers of protocells from two distinct strains. The “strain” of a protocell is here identified by the sequence of its dominant molecular species. Assume the two strains correspond to different sequence lengths. Then it is predicted that the strain characterised by the shorter sequence length will quasi-deterministically displace the other. This is a Darwinian selection event at the protocell level.
  • Given a protocell population consisting of a single strain, there will be protocell level mutational events on an on-going basis. These will correspond to the occurrence of class 6 molecular level takeovers in a founder protocell for each such lineage. However, because such strains are necessarily characterised by longer sequence length (as that is the only situation that allows a molecular level takeover) they will always be of lower fitness at the protocell level, and will be rapidly eliminated again. Protocells characterised by shorter sequences could invade if they ever arose, but molecular level selection prevents them from arising; protocells characterised by longer sequences can arise through the molecular level dynamics, but are then selected against at the protocell level. That is, selection at the protocell level acts to precisely oppose selection at the molecular level, and the result is an evolutionary “stalemate”. The protocell level population will remain dominated indefinitely by whichever initial strain is characterised by the shorter sequence length.
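The protocell level mechanics described above (fission at S_{\mbox {max}} with independent assortment, and removal of a random protocell to hold the population size) can be sketched as follows. The names are hypothetical, and one detail is an assumption of this sketch: the protocell to be removed is drawn from the whole population after the division, so it may be one of the new daughters.

```python
import random

def fission(cell, rng):
    """Independent assortment: each molecule goes to either daughter
    with probability 1/2."""
    first, second = [], []
    for molecule in cell:
        (first if rng.random() < 0.5 else second).append(molecule)
    return first, second

def divide_if_unstable(population, index, s_max, max_cells, rng):
    """Divide protocell `index` if it holds at least s_max molecules.

    One daughter replaces the parent, the other is appended. If the population
    then exceeds max_cells, a uniformly chosen protocell is removed ("killed").
    Returns True if a division took place.
    """
    cell = population[index]
    if len(cell) < s_max:
        return False
    first, second = fission(cell, rng)
    population[index] = first
    population.append(second)
    if len(population) > max_cells:
        population.pop(rng.randrange(len(population)))  # trait-neutral death
    return True
```

Since death is uniform over protocells, any systematic fitness difference in such a scheme can only enter through how quickly a protocell reaches the fission threshold.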
Test: Protocell Level Evolutionary Dynamics
Figure 19 shows a test run of MCS-0 with the full protocell level structure (protocell fission threshold is S_{\mbox {max}} = 1.5\times 10^3 and the protocell population size is set at a maximum of 50). The population is initialised with equal numbers (25) of protocells of each of two strains, one dominated by a sequence of length 10, the other dominated by a sequence of length 11. It is seen that, as predicted, the strain associated with the shorter sequence length quasi-deterministically displaces the other. Subsequently, two episodes of protocell level mutation can be observed, where lineages of cells are founded which are dominated by molecular species with sequences which are proper superstrings of that of the dominant protocell strain. This is possible at the molecular level because these molecular species are facultative parasites relative to the incumbent dominant species. However, although these protocell level mutations do occur, and found new lineages, these are again selectively displaced by the original strain. The result is the predicted evolutionary stalemate.

Figure 19: MCS-0: Evolutionary “Stalemate” [Fission threshold S_{\mbox {max}}=1.5\times 10^3; protocell population size 50.]

Discussion
The core phenomenon of the single reactor MCS-0 system is the demonstration of a macro-evolutionary epoch, where the length of the dominant sequence grows steadily. This means that the per-molecule mutation rate is steadily increasing; or, equivalently, the replication “fidelity” is getting steadily lower. That is, this macro-evolutionary trajectory actually results in a progressive and systematic deterioration in intrinsic fitness of the dominating species. This is in marked contrast to the naive “hill climbing” interpretation of evolution; and illustrates how evolutionary processes may be much more a matter of ecological interaction, or game playing, than any kind of optimisation. In particular, we note that this behaviour is completely at variance with the “replicator determinism” scenario which, for example, Dawkins (1976) has characterised with the simplistic slogan “fidelity, fecundity, longevity”. Conversely, once a protocell organisation is superimposed on the “naked replicator” informazyme chemistry, the evolutionary collapse implicit in the unstructured molecular level is immediately halted. MCS-0 is, of course, a highly simplified “toy” system. Nonetheless, it serves as a useful baseline implementation of informazyme dynamics, enclosed within protocells. It has allowed basic validation of core elements of the ODE analysis; and has successfully demonstrated the operation of two hierarchically distinct but interacting levels of selection. It is thus a well-characterised foundation for elaboration into more sophisticated models.

MCS-1: Extended Recognition

MCS-1 extends MCS-0 such that the full range of pairwise interaction dynamics become possible; and, further, parasites are not necessarily longer than their hosts (they can be the same length, or shorter). The latter should be sufficient to break the evolutionary “stalemate” described previously.
Enzymatic Map
In MCS-1 the folding is implemented as a mapping from strings (sequences) on the primary alphabet of {0, 1} (as used in MCS-0) to strings on a secondary “enzymatic” alphabet of {L, H}. The mapping operates left-to-right on sequential bit-pairs (dibits) as shown in table 1.

Primary (dibit) Secondary
00 L
01 L
10 H
11 H

Table 1: MCS-1: Mapping of primary structure dibits to secondary structure (enzymatic) symbols.

If the primary string has an odd number of bits the final trailing bit is ignored (has no function). As with MCS-0, the only enzymatic function is replication. The entire secondary structure sequence is used for recognition, according to the following rules:
  • L in the enzyme matches (binds) 0 in the substrate.
  • H in the enzyme matches (binds) 1 in the substrate.
  • Recognition occurs provided that the complete enzyme secondary string binds sequentially anywhere in the substrate sequence.
This is effectively substring binding, as in MCS-0, except now based on the secondary sequence of the enzyme recognising the primary sequence of the substrate. In MCS-0, an enzyme of any given length:
  • cannot bind any substrates shorter than itself;
  • binds exactly one substrate of the same length (namely another instance of itself);
  • and binds just those substrates longer than itself of which it is a proper substring (thereby being parasitised by them).
In MCS-1, as the secondary string is always only half as long as the corresponding primary structure, recognition is generally (and deliberately) less specific, and the possible interactions and relationships are significantly more diverse. Thus, an MCS-1 enzyme of any given (primary) length will bind to many different substrates of the same length—but not necessarily including other instances of itself; so, while in MCS-0 all enzymes were self-replicases, in MCS-1 only some are. Similarly, in MCS-0 if two species have a host-parasite relationship the parasite is always longer than the host; whereas in MCS-1 the parasite can be the same length or shorter (because its secondary structure, which defines the length of the required recognition region, is only half as long). More generally, while in MCS-0 only class 6 and 9 pairwise reaction dynamics could be realised, in MCS-1 all ten pairwise reaction classes can be realised.
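The folding of table 1 and the recognition rules above can be sketched as follows. The function names are illustrative. One case is not specified in the text and is treated here as an assumption: an enzyme with fewer than two bits has an empty secondary structure, and this sketch has it bind nothing.

```python
# Table 1: primary dibit -> secondary (enzymatic) symbol
FOLD = {"00": "L", "01": "L", "10": "H", "11": "H"}
# Recognition: L binds 0 and H binds 1 in the substrate primary structure
MATCH = {"L": "0", "H": "1"}

def fold_mcs1(primary: str) -> str:
    """Map successive dibits left to right; a trailing odd bit is ignored."""
    return "".join(FOLD[primary[i:i + 2]] for i in range(0, len(primary) - 1, 2))

def binds_mcs1(enzyme_primary: str, substrate_primary: str) -> bool:
    """True if the complete secondary string of the enzyme binds sequentially
    anywhere in the substrate primary sequence."""
    pattern = "".join(MATCH[s] for s in fold_mcs1(enzyme_primary))
    # Assumption: an empty secondary structure recognises nothing.
    return bool(pattern) and pattern in substrate_primary

if __name__ == "__main__":
    print(fold_mcs1("10110"))            # trailing bit ignored
    print(binds_mcs1("1100", "1100"))    # a self-replicase
    print(binds_mcs1("1010", "1010"))    # not a self-replicase
    print(binds_mcs1("1010", "011"))     # binds a shorter substrate
```

The last call illustrates why parasites need not be longer than their hosts in MCS-1: the required recognition region is only half the length of the enzyme's primary structure.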
Molecular Level Dynamics: Validation
Brute force search was used to identify pairs of informazyme species realising each of the possible pairwise interaction classes (neglecting the trivial case of the completely inert, class 0, pairing). For each such pair, experimental runs were carried out similar to those described for the MCS-0 validation; i.e., a fixed size flow reactor, initialised with an appropriate mix of the two species, and with replication error (molecular mutation) disabled. In all cases, the experimental runs matched the predicted dynamic behaviour qualitatively and quantitatively (modulo the statistical fluctuation due to the finite size of the reactor, M=10^3).
Molecular Level Evolution
A simplistic prediction of the MCS-1 molecular evolution behaviour (i.e., in a single, fixed size, reactor, with replication error enabled) might be that it should be very similar to MCS-0. That is, if seeded with a self-replicase species, this species should self-maintain at a concentration \simeq 1-V (V being the per-molecule mutation rate), until a class 6 facultative parasite species arises; the latter would then displace the host, and become dominant in turn. However, in contrast to MCS-0, as parasites are not constrained to be longer than hosts, these displacement events should no longer result in monotonically increasing length of the dominant species. Rather, while some displacements would, indeed, result in a length increase, at least some displacements should also be observed in which the length remains the same or decreases. However, MCS-1, as defined, cannot in fact yield this behaviour. To illustrate this, figure 20 shows the concentration of a seed self-replicase species in ten experimental runs. With V=0.1, the reactor is initialised with a self-replicase at its notional steady state level of 0.9. However, instead of self-maintaining at this concentration for a period of time (until a class 6 parasite might arise), the concentration of the seed species rapidly collapses in every run. Nor is this due to “early” occurrence of class 6 parasites; examination of the species present in the reactor after the collapse of the seed species shows a highly diverse mix of species with no clearly identifiable “dominant” (i.e., no identifiable parasite of the original seed species).

Figure 20: MCS-1: Collapse of seed self-replicase [M=10^3, l=10, v=0.01, V=0.10, ten runs]

The explanation for this behaviour is clear from figure 21, which shows, for one of these runs, an analysis of all molecules present in the reactor by pairwise interaction class relative to the seed species. Clearly, the collapse in concentration of the seed species is initially due to diversification into a quasi-species of class 1 mutants (complete mutualists); and subsequently, to generation of class 4 obligate parasites.

Figure 21: MCS-1: Collapse of seed self-replicase—analysis by pairwise class relative to the seed.

Of course, this behaviour was actually already anticipated in the discussion of class 1 and class 4 pairwise systems, under mutational conditions; so, in effect, this experiment can be considered as a validation of that original analysis. What this demonstrates is that, despite its rather minimal complexity, MCS-1 is already “rich” enough in side reactions that coherent replicator dynamics (quasi-stable self-maintenance) or (Darwinian) molecular evolution is not possible in general, at least not in well-stirred reactors. In particular, there is very high (possibly unbounded) diversification across MCS-1 (class 1) quasi-species, due to the combination of relatively unspecific recognition and the fact that there are only very weak intrinsic fitness differences (itself a consequence of the more general MCS design, where recognition is all or nothing, and all “successful” replications then occur at the same rate). At the very least, this diversification makes it difficult to analyse the underlying dynamics of a reactor over time. But in any case, even if quasi-species diversification is considered as being, in some sense, “benign” (i.e., there is still “collective self-maintenance”), growth in obligate parasites (class 4) is not; it inevitably poisons ongoing replication activity. The problem of obligate parasitism in this kind of limited-specificity replicase system is well known; and can, in principle, be controlled through spatial structure (McCaskill et al. 2001). In our context therefore, depending on the detailed mutation rates and distribution, it may be feasible to control obligate parasitism through the protocell containment mechanism. In any case, for the specific purposes of the current study, which is intended to investigate evolution of protocell embedded molecular computation, we will take a somewhat more focussed approach.
We will retain the enhanced recognition mechanism of MCS-1, but introduce the ability to selectively disable reactions (replications) “associated” with particular pairwise interaction classes (specifically, class 1 and class 4). This is implemented by adding a test to every collision. If the two molecules are of distinct species, then the pairwise interaction class of the two species is checked against the allowed pairwise classes. If the interaction class is not allowed, then the collision is made elastic. Note that the effect of this particular mechanism on the dynamics is not simply that the “disabled” pairwise interaction classes cannot be instantiated; rather, species pairs which would instantiate a particular interaction class may (or may not) be transformed to instantiate a potentially different class as shown in table 2.

Original Class Transformed Class
0 0 (no change)
1 9
2 2 (no change)
3 0
4 2
5 2
6 9
7 2
8 0
9 9 (no change)

Table 2: Parameterised Transformation of Pairwise Interaction Classes
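Table 2 can be read as a lookup applied only to those classes that are not in the allowed set. The sketch below encodes that reading; the names `DISABLED_TO` and `effective_class` are hypothetical, and the sketch works at the level of class labels only (it does not classify species pairs).

```python
# Table 2: class that a species pair instantiates once its original
# pairwise interaction class has been disabled.
DISABLED_TO = {0: 0, 1: 9, 2: 2, 3: 0, 4: 2, 5: 2, 6: 9, 7: 2, 8: 0, 9: 9}

def effective_class(original: int, allowed: set) -> int:
    """Return the class actually instantiated, given the set of allowed classes."""
    return original if original in allowed else DISABLED_TO[original]

if __name__ == "__main__":
    # Disable only classes 1 and 4.
    allowed = set(range(10)) - {1, 4}
    print({c: effective_class(c, allowed) for c in range(10)})
```

Classes 0, 2 and 9 map to themselves, so removing them from the allowed set has no effect in this scheme.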

(Note, in particular, that the dynamics associated with the pairwise interaction classes 0, 2 and 9 cannot be disabled by this mechanism, because it can only affect replications arising from cross-catalysis, and these classes do not involve such cross-replicase activity in the first place.) Transforming class 1 and class 4 pairwise interaction dynamics using this mechanism may then be predicted to recover the relatively tractable informazyme behaviour of MCS-0, with its possibility of protocell level heritability based on a single dominant informazyme species in any given protocell; but still with the more flexible mutational space (at the protocell trait level) afforded by the richer, limited-specificity, recognition model of MCS-1. Figure 22 shows one example run to test this. The test uses a single, fixed size, flow reactor (M=10^3), with replication error enabled (v=0.01). The reactor is initialised with a single “dominant” species, 10 bits long, at concentration 0.90 (being the expected steady state concentration with a per-molecule mutation rate V=1-(1-v)^l=0.10). The balance of the molecules are generated as mutants of this dominant (in accordance with what would be the steady state mutant distribution). Reactions corresponding to class 1 and class 4 pairwise interaction classes are disabled (i.e., these pairwise interaction classes are transformed to class 9 and 2 respectively, per table 2). Each separate plot corresponds to the concentration of a species which, at some point in the run, became the principal species (i.e., had the highest concentration of all species instantaneously present). It is seen that there is a series of displacement events; and, in many cases, the new dominant species is, indeed, class 6 relative to the previous dominant. There are specific displacement events involving longer, shorter, and same-length species. These events are all in accordance with the predicted behaviour. 
However, in addition to these events we also still observe some anomalous events. In particular, at c. t = 2000 in this run, the then dominant self-replicase species concentration collapses, but is not replaced by a single new, dominant species. Instead there is an extended interval of high diversity before a different self-replicase emerges to dominate again. There are also anomalous events in which a dominant self-replicase species is apparently displaced by a single competitor, but the latter is not class 6 relative to it. These behaviours are interpreted as indicating a breakdown of the simplifying assumption that the dynamics of an MCS system with a diversity of species (n>2) can be approximated by a superposition of the separate dynamics of the various pairwise (n=2) systems that are simultaneously being instantiated. Presumably these anomalous events reflect more complicated ecological relationships (for example, collective parasitism by co-operating sets of species etc.). There may also be a more-or-less complex dependence on the (dynamic) pattern of mutant generation.

Figure 22: MCS-1: Molecular Evolution [M = 10^3, v=0.01]

This more complex molecular evolutionary behaviour is clearly of some interest in its own right. In particular, figure 23 shows the total replication rate for the same experimental run. It is seen that, for this particular case, the replication rate also drops significantly during the extended anomalous period. This suggests that protocell level selection might displace protocells which are affected by such a collapse of self-replicase activity (assuming that it is heritable); however, substantial further investigation would be necessary to characterise this effect in detail. Accordingly, for our immediate purposes here, we will instead take the simplifying step of disabling all pairwise interaction dynamics other than:
  • Class 0 (both species inert)
  • Class 2 (one species self-replicase, one inert)
  • Class 6 (facultative parasitism/displacement)
  • Class 9 (independent self-replicases)
With this configuration (which is, in effect, intermediate in the complexity of its informazyme dynamics between that of MCS-0 and MCS-1), we can now proceed to the elaboration of MCS to support protocell-embedded molecular level “computation”.

Figure 23: MCS-1: Molecular Evolution, Total Replication Rate

MCS-2: Evolution of Protocell-embedded Molecular Computation

Molecular Computation Model
As noted, MCS-2 retains the basic enzyme-substrate recognition mechanism of MCS-1 (albeit with a parameterised ability to selectively enable or disable certain subsets of the possible pairwise interactions). The enzymatic function also always includes replication, as in MCS-0 and MCS-1; however, depending on the specific secondary enzyme structure or folding, there can be additional functionality. This additional functionality is designed to support computation at the molecular level; that is, where the available enzymatically mediated informazyme transformations have a direct interpretation in terms of computation.16 For this particular study, we deliberately restrict the possible computational functionality to be very minimal. This is to allow a focus on the interaction between the computation and the replication function at the molecular level, and the coupling between the molecular computation and the protocell “phenotype” level (where externally applied artificial selection and directed evolution is assumed to be possible). We seek to demonstrate an initial proof of principle of the feasibility of artificial selection and evolution of protocell level functionality that can only be realised by specific computational capability at the molecular level. If this can be done then it provides a basis for potential investigation of more complex molecular computation, embedded in protocells, in the future. Conversely, if it cannot be done, for any reason, even for such “minimal” kinds of computation, then this might indicate a significant potential barrier to the long term application of protocell-embedded molecular computation in realising general purpose information and communication technologies. Accordingly, we identify a “minimal” computational function at the molecular level as counting. This is implemented in the MCS-2 system via two new features of all informazyme molecules:
  • A fixed length prefix (leftmost) fragment of the primary structure becomes a “variable region”. This effectively allows every molecule to record a “counter state”. In the current implementation, the length of this region is fixed by a system-wide parameter, and will thus be the same for all informazymes. It will typically be set to just 2 bits (monomers) for the specific experiments to be reported, allowing counting modulo any base up to 4. This variable region has no enzymatic function; i.e., it is skipped when extracting the secondary, enzymatic, structure of an informazyme molecule. Further, this region is excluded from the recognition of a substrate; that is, on considering whether a given enzyme can bind to a given substrate, the variable region of the substrate sequence is ignored. Molecules which differ only in the counter state region will be considered as members of the same species.
  • In the secondary, enzymatic, structure of any informazyme, there is also a fixed length prefix (leftmost) fragment with special functionality. The length is set by the same system-wide parameter as for the length of the counter state (though now it refers to symbols in the secondary structure, not the primary structure). This sets the “counting function” of each informazyme (when it is folded, i.e., acting as an enzyme).
The reaction behaviour in MCS-2 is then as follows:
  • If the enzyme secondary structure binds to the substrate primary structure (ignoring the counter state region of the latter) then the reaction can proceed; otherwise the collision is elastic.
  • If the two molecules are of different species, the pairwise interaction class of the two species is checked against the allowed classes. If the interaction class is not allowed, then the collision is elastic (i.e., reaction class is transformed per table 2).
  • The substrate is replicated in the usual way, by copying the primary structure (including the counter state), and subject to potential error/mutation.
  • The counter state of the newly created molecule is modified in a manner specified by the counting function region of the enzyme. Specifically, the counter state is incremented, modulo a counting base derived from the counting function region. For a counter region width of w, the counting base encoded by the counting function region can be any element of the set \{1 \; .. \; 2^w\}.[17] The net effect is that all informazyme molecular species still function as replicases, but they also potentially function as incrementers for a counter state encoded in the substrate, where the counting will roll over back to zero at a value (counting base) which is dictated by the particular enzyme, in the range \{1 \; .. \; 2^w\}. Note that, as a degenerate case, a species with counting base 1 does not increment at all, but forces the counter state to zero (as any counter state modulo 1 is zero).
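The counting step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the MCS-2 implementation: the function names are hypothetical, the counting function region is assumed to be supplied as a string of '0'/'1' digits (standing for the L/H symbols of the secondary structure), and the leftmost digit is assumed to be the most significant.

```python
def decode_counting_base(function_bits: str, w: int) -> int:
    """Interpret a w-symbol counting function region as a counting base.

    Assumption: the region is a string of '0'/'1' digits, most significant
    digit first. The value 0 is treated as the special case 2**w, so the
    result always lies in the range 1 .. 2**w.
    """
    if w < 1 or len(function_bits) != w or set(function_bits) - {"0", "1"}:
        raise ValueError("expected exactly w binary digits, with w >= 1")
    value = int(function_bits, 2)
    return value if value != 0 else 2 ** w


def next_counter_state(counter_state: int, counting_base: int) -> int:
    """Increment the copied counter state modulo the enzyme's counting base."""
    return (counter_state + 1) % counting_base


if __name__ == "__main__":
    w = 2
    for bits in ("00", "01", "10", "11"):
        base = decode_counting_base(bits, w)
        # Show how each possible counter state is transformed by this enzyme.
        print(bits, base, [next_counter_state(s, base) for s in range(2 ** w)])
```

With a counting base of 1 every counter state maps to zero, which is the degenerate case noted above.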
Consider now the core behaviour of a self-replicase species in MCS-2:
  • We assume that the allowed pairwise interaction dynamics are classes 0, 2, 6 and 9 (per the discussion under MCS-1 molecular level evolution). Pending the emergence of a class 6 facultative parasite, a dominant self-replicase should be able to maintain its concentration, subject to the usual mutational load, i.e., at a concentration \simeq 1-V.
  • Across the molecular population of this species, there will generally be variation in the sequence of the counter state variable region. This is because, on each replication, the sequence of this region will be actively modified.
  • However, this variation will be different both in dynamic and in distribution from conventional “quasi-species” variation (driven simply by mutation). First, it will be limited in extent by the counting base encoded by the counting function region. This is a value in the range \{1 \; .. \; 2^w\}, which we will denote as z_i for any particular species i. Correspondingly then, the possible values of the counter state region will be limited to the set \{ 0 \; .. \; (z_i-1) \}. The distribution across these possible values will be statistically uniform (just from symmetry). Moreover, as this distribution is actively “forced” (the incrementing is performed with probability 1 on every replication), relaxation to it will be comparatively rapid. This is in contrast to the quasi-species case where the relaxation to the stable distribution is driven by the (comparatively) much lower rate of spontaneous mutation during replication.
In MCS-2, therefore, we have the possibility of self-replicase species giving rise to stable, and characteristically different, distributions of their variable regions, as a result of (stochastic, distributed) computational activity at the molecular level. Similarly to the original molecular evolution scenario of MCS-0, as class 6 mutations occur the dominant molecular species may be displaced by a different species; which may have a different counting function region (counting base) and therefore a different distribution of sequence variation in the counter state region.
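The claim that the counter states of a self-replicase relax to a uniform distribution over \{0 .. (z_i-1)\} can be checked with a toy simulation. The sketch below is an assumption-laden simplification rather than the MCS-2 code: it models a single species in a fixed-size, well-stirred reactor, omits mutation and recognition entirely, and keeps the population constant by overwriting a randomly chosen molecule with each new copy.

```python
import random
from collections import Counter


def simulate_counter_states(z: int, n_molecules: int = 1000,
                            n_replications: int = 50_000,
                            seed: int = 0) -> Counter:
    """Toy reactor holding one self-replicase species with counting base z.

    Each step copies a random template, increments the copy's counter
    modulo z, and overwrites a random molecule with the copy so that the
    population size stays fixed. Returns the final counter state counts.
    """
    rng = random.Random(seed)
    states = [0] * n_molecules  # every molecule starts in counter state 0
    for _ in range(n_replications):
        template = states[rng.randrange(n_molecules)]
        offspring = (template + 1) % z
        states[rng.randrange(n_molecules)] = offspring
    return Counter(states)


if __name__ == "__main__":
    for z in (1, 2, 3, 4):
        print(z, sorted(simulate_counter_states(z).items()))
```

In this simplified setting the counts for each z settle close to n_molecules/z per state within a few population turnovers, which illustrates the "forced" character of the distribution.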
Coupling to Protocell Morphology
In order to use selection at the protocell level to evolve molecular level computation, there must be some coupling between the two. We therefore introduce a mechanism whereby the primitive molecular computing mechanism described above can affect “phenotypic” traits at the protocell level, which can subsequently be the subject of artificial selection and evolution. In MCS-0 and MCS-1, protocell division was triggered simply by the number of informazyme molecules reaching a fixed, system-specified, threshold (S_{\mbox {max}}). In MCS-2, by contrast, we introduce a less rigid (and somewhat more biologically motivated) mechanism. We define an additional family of molecules, separate from the informazymes, designated as “membrane molecules”. Protocell division is now controlled, not by the number of informazyme molecules, but by the number of membrane molecules. Other than that, the protocell division mechanism is as before: it is triggered when the number of membrane molecules reaches a fixed, system-specified threshold (denoted B_{\mbox {max}}); and is implemented by splitting the parent protocell into two daughter protocells, with independent assortment of the parental molecules (both informazymes and membrane molecules) between these two.
Membrane molecules are generated via a side reaction or side effect of the core replication/counting reaction between informazymes. This is made dependent on the variable region (counter state) of the newly replicated substrate/template in each successful replication: if this counter state has the value zero, then a membrane molecule is produced; otherwise no membrane molecule is produced. The effect is that, depending on the counting function of the dominant self-replicase in the protocell, membrane molecules can be generated at varying rates: on every replication, or on every second replication (on average), and so on.
Of the self-replications of a species i with counting base z_i, a fraction 1/z_i will, on average, also generate a membrane molecule. As a result, different protocell strains (dominated by different informazymes) can have different characteristic sizes (measured by the total number of molecules, informazyme plus membrane) over their life-cycle. Thus, the counting function enzymatic region of the dominant informazyme is coupled to the phenotypic trait of (average) protocell size. Given that the standard MCS mechanism of class 6 facultative parasitism supports “mutation” at the protocell level (via takeover, in a particular protocell lineage, by a new dominant informazyme species), in a population of protocells there should then be heritable variability in the phenotypic trait of (average) size; and that will permit artificial selection for protocell size as a mechanism for evolution of protocell-embedded computation.
One final technical modification is made to the MCS protocell selection architecture here. In MCS-0 and MCS-1 all protocell lineages have the same (average) size. As a result, regulating the protocell population to a fixed size has essentially the same effect as regulating the molecular (informazyme) population to a fixed size. In MCS-2, where there is heritable variation in protocell size, limiting the protocell population and limiting the informazyme population result in somewhat different protocell selection dynamics. This does not affect the outcome of any particular selectional event at the protocell level (as long as protocell “death” rate is independent of any heritable protocell traits); but it does affect various technical details (such as realtime execution duration, normalisation of the simulation timescale, and graphical presentation of the evolutionary behaviour). Accordingly, we will stipulate that the specific results to be presented here will use the mechanism of limiting the molecular (informazyme) population to a fixed size.
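The membrane side reaction and the division rule can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: the function names and data layout are hypothetical, the counter state tested for zero is taken to be that of the newly created copy (after incrementing), and "independent assortment" is modelled as an independent fair coin toss per molecule.

```python
import random
from typing import List, Tuple


def replicate_once(template_counter: int, counting_base: int) -> Tuple[int, bool]:
    """Return the offspring counter state and whether a membrane molecule is made.

    Assumption: a membrane molecule is produced exactly when the counter
    state of the new copy is zero.
    """
    offspring = (template_counter + 1) % counting_base
    return offspring, offspring == 0


def divide(informazymes: List[int], n_membrane: int,
           rng: random.Random) -> Tuple[Tuple[List[int], int], Tuple[List[int], int]]:
    """Split a parent protocell into two daughters by independent assortment.

    Each informazyme and each membrane molecule goes to either daughter
    with probability 1/2, independently of all the others.
    """
    left: List[int] = []
    right: List[int] = []
    for molecule in informazymes:
        (left if rng.random() < 0.5 else right).append(molecule)
    membrane_left = sum(rng.random() < 0.5 for _ in range(n_membrane))
    return (left, membrane_left), (right, n_membrane - membrane_left)
```

A caller would invoke `divide` once the membrane count of a protocell reaches the threshold B_max; with a counting base of 1, `replicate_once` reports a membrane molecule on every replication.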
This is implemented as follows:
  • Whenever a new informazyme molecule is produced (in any protocell), the total number of informazymes (across all protocells) is compared to the specified threshold (M_{\mbox {\scriptsize max}}).
  • If the threshold is exceeded, a protocell is chosen uniformly at random across the whole protocell population and removed (“killed”).
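A minimal sketch of this regulation step is given below. The data layout (a list of protocells, each a list of informazymes) and the function name are assumptions made for illustration only.

```python
import random
from typing import List


def regulate_population(protocells: List[list], m_max: int,
                        rng: random.Random) -> None:
    """Apply the informazyme cap after a new informazyme has been produced.

    If the total informazyme count across all protocells exceeds m_max,
    one protocell is chosen uniformly at random and removed in place.
    """
    total = sum(len(cell) for cell in protocells)
    if total > m_max and protocells:
        victim = rng.randrange(len(protocells))
        protocells.pop(victim)
```

Because the victim is drawn uniformly over protocells, and not over molecules, the death rate per protocell is independent of protocell size, in line with the requirement that death be independent of heritable traits.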
Artificial Selection
To demonstrate the directed evolution of protocell-embedded molecular computation, we impose an artificial selection pressure, based on a relevant protocell-level trait. The mechanism here is to modulate the overall reaction rate in each protocell according to this applied selection pressure. This can be thought of as analogous to, say, modulating an energy flux to real (proto-)cells according to the chosen traits. An arbitrary function is externally provided which maps the relevant protocell trait(s) onto a value g \in [0, 1], the “molecular activity factor”. This is then interpreted as a probability of reaction at the molecular level. That is, an additional qualification step is added at the start of the processing of each molecular collision, as follows:
  • The instantaneous value(s) of relevant protocell trait(s) are evaluated for the containing protocell.
  • The applicable value of g is calculated.
  • A Bernoulli random variable with parameter g is sampled (biased coin toss).
  • If this has the value 0 the collision is treated as elastic; otherwise it proceeds as normal.
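The qualification step above amounts to a Bernoulli gate in front of the normal collision processing. A minimal sketch, with hypothetical names and with the activity function passed in as an arbitrary callable, might look like this:

```python
import random
from typing import Callable


def collision_is_active(trait_value: float,
                        activity: Callable[[float], float],
                        rng: random.Random) -> bool:
    """Decide whether a collision proceeds to normal processing.

    `activity` maps the protocell trait (here a single number) to the
    molecular activity factor g in [0, 1]. Returns True with probability g;
    False means the collision is treated as elastic.
    """
    g = activity(trait_value)
    if not 0.0 <= g <= 1.0:
        raise ValueError("activity factor must lie in [0, 1]")
    return rng.random() < g
```

Since `rng.random()` returns values in [0, 1), a factor of g = 0 blocks every collision and g = 1 lets every collision proceed.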
As protocell reproduction ultimately relies on molecular replication (via production of membrane molecules), this allows protocell birth rate to be modulated in accordance with the applied molecular activity factor. As protocell death rates are made equal for all protocells (independently of any heritable traits), birth rates effectively represent the relative fitnesses of protocell lines with different inherited traits, and should drive protocell-level selection. For the experiments to be described here, the objective is to selectively evolve protocells dominated by informazymes with a specific counting function (i.e., counting base). As explained in the discussion of the coupling to protocell morphology, the dominant counting function is coupled to the characteristic protocell size (measured by number of molecules) via the rate of membrane molecule production. Therefore, the function g will be a map from an appropriate measure of (instantaneous) protocell size. Note that the variation in number of membrane molecules over the protocell life-cycle is identical for all protocell strains, ranging from B_{\mbox {\scriptsize max}}/2 to B_{\mbox {\scriptsize max}} just before fissioning. The number of informazyme molecules also varies over the protocell life-cycle, but this range depends on the dominant counting base. Specifically, let the dominant counting base be z_i. Then, neglecting replications of molecules other than those of dominant species i, the generation of B_{\mbox {\scriptsize max}}/2 membrane molecules will require z_i (B_{\mbox {\scriptsize max}}/2) informazyme replications. It follows that the number of informazymes will vary (in the “steady state” protocell life cycle) between z_i (B_{\mbox {\scriptsize max}}/2) and z_i B_{\mbox {\scriptsize max}}.
In particular, taking B_{\mbox {\scriptsize max}}=100 and w=2 (counter state/counting function width, so z_i\in \{1..4\}), Table 3 shows this life cycle range of S (number of informazymes) for the four possible values of z_i. These values will be used to structure the applied activity factor (g(S)) appropriately.

z_i    S_{\mbox {\scriptsize min}}    S_{\mbox {\scriptsize max}}
1      50                             100
2      100                            200
3      150                            300
4      200                            400

Table 3: “Steady state” life cycle ranges of informazyme count (S) for different values of dominant counting base z_i [B_{\mbox {\scriptsize max}}=100,\;w=2]
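The entries of Table 3 follow directly from S_{\mbox {\scriptsize min}} = z_i (B_{\mbox {\scriptsize max}}/2) and S_{\mbox {\scriptsize max}} = z_i B_{\mbox {\scriptsize max}}. As a quick check, the short sketch below (function name hypothetical) regenerates them for B_{\mbox {\scriptsize max}}=100 and w=2.

```python
from typing import Tuple


def life_cycle_range(z: int, b_max: int) -> Tuple[int, int]:
    """Steady state informazyme range (S_min, S_max) for counting base z.

    Neglects replications of molecules other than the dominant species,
    as in the text; b_max is assumed to be even.
    """
    return z * (b_max // 2), z * b_max


if __name__ == "__main__":
    w = 2
    for z in range(1, 2 ** w + 1):
        print(z, *life_cycle_range(z, 100))
```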

Figure 24 shows the example activity factor used in the series of MCS-2 experiments described in the following sections. However, note that, in general, nothing critical should hinge on the detailed structure of this function, as long as protocell strains with different life cycle size ranges are assigned significantly and consistently different birth rates on average.

Figure 24: Molecular activity factor (g(S)) to select counting to base z_i = 3.

Given a particular definition of g(S), and neglecting mutational load, the expected gestation period can be calculated as follows:
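\[ \frac{dS}{dt} = S\,g(S) \quad\Rightarrow\quad dt = \frac{dS}{S\,g(S)} \quad\Rightarrow\quad T = \int_{S_{\mbox {\scriptsize min}}}^{S_{\mbox {\scriptsize max}}} \frac{dS}{S\,g(S)} \]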
The protocell birth rate will be simply 1/T. The average cell size over its life-cycle will be given by:
\[ \bar{S} = \frac{1}{T} \int_{0}^{T} S \, dt \]
Given that the total number of informazyme molecules is subject to a fixed maximum (M_{\mbox {\scriptsize max}}), the protocell population size will vary depending on the average size of the protocells present. If the population is dominated by a single counting base, the expected population size will be given by M_{\mbox {\scriptsize max}}/\bar {S}. Applying these derivations to the example activity factor from figure 24, and with M_{\mbox {\scriptsize max}}=1.5\times 10^4, we can calculate (using numerical methods as required) a variety of nominal protocell parameters as shown in table 4. Actual values will vary, due to mutational load, depending on the exact length of the particular dominant molecular species and the consequent replication error rate. Expected gestation times would be increased by the factor 1/(1-V) (V being the per-molecule error/mutation rate); and expected birth rates would be correspondingly reduced.

z_i    Gestation Time (T)    Birth Rate (1/T)    Relative Fitness    Mean Size (\bar{S})    Mean Population
4      3.282                 0.305               0.211               334                    45
1      1.733                 0.577               0.400               72                     208
2      0.867                 1.154               0.800               139                    107
3      0.693                 1.442               1.000               216                    69

Table 4: Nominal parameters for protocells of the four possible distinct counting base values, ordered by relative fitness (neglecting mutational load).
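The gestation time and mean size integrals can be evaluated numerically for any positive activity function. The sketch below uses a simple midpoint rule and rewrites the mean size as \bar{S} = (1/T) \int dS / g(S), which follows from dt = dS/(S g(S)). The function name is hypothetical, and the constant activity factor used in the example is an assumption chosen for illustration; it is not the function of figure 24.

```python
import math
from typing import Callable, Tuple


def gestation_and_mean_size(g: Callable[[float], float],
                            s_min: float, s_max: float,
                            steps: int = 100_000) -> Tuple[float, float]:
    """Midpoint-rule estimates of gestation time T and mean size S_bar.

    T     = integral of dS / (S g(S)) from s_min to s_max
    S_bar = (1/T) * integral of dS / g(S) over the same range
    Requires g(S) > 0 everywhere on [s_min, s_max].
    """
    h = (s_max - s_min) / steps
    t_total = 0.0
    size_integral = 0.0
    for k in range(steps):
        s = s_min + (k + 0.5) * h
        gs = g(s)
        t_total += h / (s * gs)
        size_integral += h / gs  # S dt = dS / g(S)
    return t_total, size_integral / t_total


if __name__ == "__main__":
    # Illustrative assumption: g(S) = 1 across the z_i = 3 life cycle range.
    t, s_bar = gestation_and_mean_size(lambda s: 1.0, 150.0, 300.0)
    print(round(t, 3), round(s_bar), round(1.5e4 / s_bar))
```

For a constant activity factor the integrals have the closed forms T = \ln 2 / g and \bar{S} = S_{\mbox {\scriptsize min}} / \ln 2. With g = 1 on the range 150 to 300 this gives T \simeq 0.693 and \bar{S} \simeq 216, which agrees with the z_i = 3 row of Table 4 to the precision shown there.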

Test: Directed Evolution of 2-bit Counting
We now present a sample of the evolutionary dynamics of the MCS-2 system, as described above. In each case the population is initialised with protocells all having the same dominant molecular species. This species is chosen to be a 20-bit long self-replicase, with counting base z_i=4; that is, having the minimum protocell fitness according to the imposed activity factor (figure 24). Figure 25 illustrates a run in which the “optimum” molecular counting base (z_i = 3) is successfully evolved. The graph shows the numbers of protocells present in the population, classified by the counting base of the principal molecule in each protocell. Note that, because the expected protocell sizes vary for different counting base values, the corresponding maximum populations also vary, as summarised in table 4. Accordingly, the counting base of the dominant protocells in the population can be conveniently inferred from the quasi-steady-state population size during each evolutionary phase in figure 25. In this run, the optimum was reached through two steps:
  • At t \simeq 300 there emerged a protocell strain with z_i=1.
  • At t \simeq 650 there emerged a protocell strain with the optimal counting base, z_i=3.
  • Thereafter, the population remained dominated by protocells with this optimal counting base.
In other successful runs, the exact trajectory followed may be different, just according to the protocell mutations which happen to arise. In particular, it is possible to jump from the initial, minimum, protocell fitness, to the optimal fitness level in just one transition; or it may sometimes require three fitness transitions in total.

Figure 25: Successful evolution of protocell-embedded 2-bit molecular counting. Numbers of protocells, classified by counting base (z_i) of principal molecular species in each protocell.

While the evolutionary behaviour is relatively simple when viewed at this level, it is useful to examine the more detailed behaviour at the level of specific protocell lineages. Figure 26 shows the same experimental run as before, but now identifying each distinct protocell lineage (classified by its principal molecular species). This demonstrates a somewhat more complicated sequence of events:
  • During the phase up to t \simeq 300, the initial protocell lineage is displaced by another with the same counting base at t \simeq 50. The principal molecular species in this new lineage is, as expected, a class 6 facultative parasite of the principal molecular species of the initial lineage; however, it is also one bit shorter. As analysed in the context of the MCS-0 system, this does imply a reduced mutational load, and consequently slightly higher fitness. This explains the relatively slow, but essentially monotonic, displacement.
  • At t\simeq 300, the first lineage with higher fitness emerged, and took over the population.
  • During the phase from t \simeq 300 to t \simeq 650, there is significant diversification of the protocell population, with a variety of lineages present at various times. However, these all have the same counting base (z_i=1) and thus similar fitness. This period of diversification might therefore be reasonably interpreted as largely random drift between these almost equal fitness lineages that are mutationally close to each other.
  • At t \simeq 650 there emerged a protocell lineage with the optimal counting base, z_i=3. This one lineage continued to dominate the population exclusively for the rest of the run.


Figure 26: Successful evolution of protocell-embedded 2-bit molecular counting. Numbers of protocells, classified by principal molecular species in each protocell.

However, the evolutionary dynamics in this system can be very variable. Thus, consider the run shown in figure 27 (classified by counting base). This shows just a single increase in protocell fitness (t \simeq 830), with no further improvement being found within the time allowed for the run (t_{\mbox {\scriptsize max}}=2000). The molecular species level graph of the same run, figure 28, shows significant additional complexity in the system behaviour. Note that during the period from t \simeq 830 to the end of the run, there is ongoing mutation at the protocell species level, with many new species arising (and, indeed, one of these eventually drifting to fixation). Also note that, while protocell diversity is generally quite low, there is one relatively short period, from t \simeq 200 to t \simeq 400, where it is clearly much higher. These phenomena turn out to be quite common across large numbers of evolutionary runs and reflect distinctive features of the informazyme inheritance mechanism, and the interaction between molecular and protocell levels of selection. In particular:
  • Protocell level mutation is based on molecular level facultative parasitism (class 6 pairwise molecular dynamics). This is an asymmetric relationship, by definition. It follows that direct “back mutation” is never possible at the protocell level.
  • Certain molecular species have one or more class 6 parasites within a single bit mutation. The probability of such a parasite arising in a single replication is v/4 (assuming equal distribution over the four possible mutations at each bit position). The probability that this will occur at least once during a complete protocell reproduction is then given by 1-(1-v/4)^{S_{\mbox {\tiny min}}}. For the experimental parameters in use here, this varies between 0.11 and 0.39. While stochastic effects mean that generation of a single instance of such a parasite will not always result in a protocell level mutation, the net rate of protocell level mutation will be of this order. Note that this cannot be offset by (direct) back mutation, as already explained. Thus, in the absence of significant fitness differences, a protocell strain based on such a molecular species will necessarily be unstable and will progressively be displaced by these 1-bit mutant strains.
  • This process can be iterative: as long as a protocell strain is characterised by a molecular species that has one or more class 6 parasites within a single bit mutation, such a strain will be unstable against those mutations.[18] There may thus be a cascading diversification of protocell species. We conjecture that this is precisely the high diversification phenomenon of figure 28, between t \simeq 200 and t \simeq 400.
  • Conversely, certain molecular species may have no immediately nearby class 6 mutants. These are effectively the end points of the cascades described above. They have much lower spontaneous mutation rates at the protocell level (requiring at least a 2-bit mutation in a single replication) and can thus be successfully stabilised by the imposed selection pressure (via the activity factor g(S)). However, there is then no guarantee that any such 2-bit class 6 mutations will be available at all, or that any which are available would yield a higher protocell level fitness. This can then give rise to extended evolutionary stagnation. This is the general phenomenon illustrated by the period from t \simeq 830 to the end of the run of figure 28. In this case, although protocell level mutations do continue to occur, none have a higher protocell level fitness.
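The per-reproduction probability quoted in the second point above, 1-(1-v/4)^{S_{\mbox {\tiny min}}}, is easy to tabulate. In the sketch below the function name is hypothetical and the value of v passed in the example is purely an illustrative assumption, since the experimental value is not restated in this section.

```python
def parasite_probability(v: float, s_min: int) -> float:
    """Probability that a specific 1-bit class 6 parasite arises at least once.

    v/4 is taken as the per-replication probability of the relevant
    mutation, and s_min as the number of replications in one complete
    protocell reproduction, following the expression in the text.
    """
    return 1.0 - (1.0 - v / 4.0) ** s_min


if __name__ == "__main__":
    v = 0.01  # illustrative assumption only
    for s_min in (50, 100, 150, 200):
        print(s_min, parasite_probability(v, s_min))
```

The probability grows with S_{\mbox {\tiny min}}, so strains with a larger counting base (and hence larger protocells) are more exposed to this kind of protocell level mutation per generation.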
While many other MCS-2 runs have been generated and analysed, across a variety of system parameters, the essential evolutionary features which have been identified are already illustrated by the above two sample runs. In summary, this framework, and these experiments, have demonstrated the basic feasibility of evolving protocell-embedded molecular computation capability. This is achieved by coupling a noisy, low reliability “informational chemistry” (in the form of informazymes), with both information storage and processing capability, to some tangible protocell level trait which can act as a target for imposed selection (such as characteristic protocell size in the current example). However, these experiments also show that the protocell-level mutational dynamics generated by such an architecture may have a number of distinct features, not typically assumed in evolutionary models. These include “one-way” mutational pathways (no back mutation), and high variability in mutational rate which is itself heritable. These features strongly influence the evolutionary trajectories that are available. It does not automatically follow that the outcome of an evolutionary experiment in such a system will be wholly, or even primarily, determined by the externally applied selection pressure; the “internal” constraints of the protocell level mutational network will also have a very strong influence.

Figure 27: Stalled evolution of protocell-embedded 2-bit molecular counting. Numbers of protocells, classified by counting base (z_i) of principal molecular species in each protocell.



Figure 28: Stalled evolution of protocell-embedded 2-bit molecular counting. Numbers of protocells, classified by principal molecular species in each protocell. Note very high diversity phase between t \simeq 200 and t \simeq 400.

Notes

[11] Throughout, the informazyme sequences or binary strings will be assumed to have a distinguished directionality, with identifiable “left” and “right” ends, somewhat analogously to the 5' and 3' ends of single stranded nucleic acid molecules.

[12] The core results in this section have been presented in (McMullin et al. 2007b,a).

[13] The approximation R \simeq \bar {x}^2 neglects replication by species other than the current “dominant”. Assuming equal distribution across n class 9 mutants, this error term is approximately:

\[ R_e = \sum^{n} \left( \frac{1-\bar{x}}{n} \right)^2 = \frac{(1-\bar{x})^2}{n} = \frac{V^2}{n} \]
Calculation of n is non-trivial; but in MCS-0, every same sequence length mutant is class-9, so a reasonable minimum estimate is the number of these within a Hamming distance of 1, i.e., l. The corresponding upper bound for R_e is also plotted in figure 15; where it is seen that even up to l\simeq 25, where \bar {x} has fallen to only 0.27 and the replication rate of x to only \simeq 0.077, the total replication rate for all mutants will still be much smaller, at \simeq 0.020 or less. A species can still therefore effectively “dominate” the reactor dynamics in this system even when its absolute concentration is substantially smaller than the aggregate mutant population.

[14] A preliminary version of material in this section is presented in (Kelly et al. 2008).

[15] In fact, the chronology of the investigation was opposite to this: the self-replicase collapse phenomena of figure 20 were observed first, and only subsequently understood through more careful consideration of the dynamics of class 1 and class 4 systems, under continuous mutation conditions.

[16] In principle, a more general possibility might be for molecular computations to be carried out by a different molecular family (i.e., other than informazymes) which is somehow coupled to the informazyme chemistry (in order to be heritable). However, given that informazymes, by definition, already have the core computational functionality of being “information” carriers, and already directly realise protocell-level heritability, we choose to focus, in this investigation, on adding computational capabilities to their existing functional repertoire rather than introducing a completely separate molecular family for this purpose.

[17] In essence the secondary string associated with the counting function is interpreted as a binary number in the “natural” way (L denoting digit 0 and H denoting digit 1) but with the special case that the number 0 is interpreted as 2^w.

[18] In principle, such a cascade could even be cyclical, though no such case has been experimentally confirmed in the MCS-2 system.

© 2004-2008 All rights reserved by PACE Consortium.