Lead partner: ALife Group, Dublin City
University
Molecular Classifier Systems (MCS)
We now
report on a selection of simulation models of informazyme
systems, and protocells incorporating them. This investigation
firstly tests the above theoretical analysis and then progresses
to the demonstration of various protocell level evolutionary
phenomena, ultimately including the directed evolution of
(basic) protocell “computation” capability. These
models are all based on a simplified Artificial Chemistry
loosely inspired by Holland’s Learning Classifier
Systems (Holland and Reitman, 1977; Holland, 2006), which we call the
Molecular Classifier System (MCS) family of models. In these
models the family of informazymes is represented by variable
length binary strings; i.e., polymers whose primary sequence
consists of monomers of just two distinct kinds, denoted by the
symbols 0 and 1.
Conceptually, each informazyme species has an
informational structure (primary structure, or monomer sequence)
and a separate enzymatic function (“folded” or
secondary structure, or “shape”), as inspired by the
role of ribozymes in the RNA world hypothesis (Joyce, 1991).
The fundamental
molecular dynamic in all the MCS models is then the collision
between two molecules. These are taken to be selected under a
uniform random distribution from the available molecules: i.e.,
the molecular level system is well stirred (within a single
reactor or protocell). One molecule of the pair is randomly
assigned the enzymatic role (the folded conformation), and the
other is assigned the substrate role (the unfolded
conformation). If the enzyme binds to, or recognises, the
substrate, then the reaction may proceed, in accordance with the
specific enzymatic function; otherwise the collision is elastic.
We shall explore a number of binding or recognition rules, with
different effects. In some cases we report on experiments with a
single, fixed size, flow reactor, i.e., where there is only
single-level, molecular “selection” or
“evolution”. In other cases, we consider the
molecules to be contained in protocells which can grow and
divide, subject to some overall resource constraint on the
protocell population size. In these cases we have the potential
for selection and evolution at the protocell level also, with
hierarchical coupling between these two levels of selection. A
common enzymatic function, present in all the MCS models to be
discussed here, is the ability to make an error-prone bit-wise
copy of the primary, informational, structure of the bound,
substrate, informazyme molecule (here regarded as a template to
be replicated); that is, a replicase function. Replication
errors will include point mutations, insertions and deletions.
The models to be presented will differ in the recognition or
binding mechanism; and in the repertoire of additional enzymatic
functions other than replication.
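The collision dynamic described above can be illustrated with a minimal sketch. This is not the implementation used for the reported experiments: the function names, the `p_point` and `p_indel` error parameters, and the rule that each product overwrites a uniformly chosen molecule (to hold the reactor at fixed size) are all assumptions made for illustration.

```python
import random

def replicate_with_errors(template, p_point=0.0, p_indel=0.0, rng=random):
    """Error-prone bit-wise copy of a binary string.

    p_point and p_indel are hypothetical per-bit rates for point mutations
    and for insertions/deletions respectively.
    """
    out = []
    for bit in template:
        r = rng.random()
        if r < p_indel / 2:
            continue                      # deletion: skip this bit
        if r < p_indel:
            out.append(rng.choice("01"))  # insertion before the copied bit
        if rng.random() < p_point:
            bit = "1" if bit == "0" else "0"  # point mutation
        out.append(bit)
    return "".join(out)

def collide(reactor, binds, replicate, rng=random):
    """One collision in a well-stirred, fixed-size reactor (modified in place).

    binds(enzyme, substrate) is the recognition rule of the model in use.
    Returns True if a reaction occurred, False if the collision was elastic.
    """
    i, j = rng.sample(range(len(reactor)), 2)  # two distinct molecules
    if rng.random() < 0.5:                     # random role assignment
        i, j = j, i
    enzyme, substrate = reactor[i], reactor[j]
    if not binds(enzyme, substrate):
        return False                           # elastic collision
    product = replicate(substrate)
    # Assumption: the product displaces a uniformly chosen molecule,
    # which keeps the total number of molecules constant.
    reactor[rng.randrange(len(reactor))] = product
    return True
```

Passing the recognition rule as a parameter lets the same step function serve the different binding mechanisms explored in the MCS models.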
MCS-0: Minimal Recognition
MCS-0 is a minimal implementation, used
to validate the core premise of using an informazyme sub-system
as the hereditary mechanism of a protocell; and to demonstrate
the coupling between selection at molecular and protocell
levels.
Enzymatic Map
In MCS-0, the
“trivial” folding of an identity map is used; that
is, the secondary structure is identical to the primary
structure. The recognition rule is simple substring matching:
i.e., binding occurs if the enzyme sequence is a substring of
the substrate sequence. The only supported enzymatic function is
replication. Note that every string is considered to be a
substring of itself.
Molecular Level Dynamics
In MCS-0, all
species are self-replicases. Of the
pairwise interaction classes,
only classes 9 (independent self-replicases) and 6 (facultative
parasite) are possible. Any species will be a facultative
parasite of any species which is a shorter substring of it. This
model is therefore sufficient to demonstrate:
- Self-maintenance of a dominant self-replicase with an
accompanying quasi-species of (class 9) mutants.
- Molecular displacement events, where a class 6 parasite
takes over from an incumbent dominant species.
- As parasites are always longer than hosts, there will be a
monotonic evolutionary trend toward dominance by longer
molecules.
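As a minimal sketch of the MCS-0 recognition rule and the resulting pairwise relationships (the function names are ours, chosen for illustration), binding reduces to a substring test, and any pair of distinct species is either class 6 or class 9:

```python
def binds_mcs0(enzyme: str, substrate: str) -> bool:
    """MCS-0 recognition: the enzyme binds if it is a substring of the substrate.

    Folding is the identity map, so the enzyme's primary sequence is used directly.
    """
    return enzyme in substrate

def classify_pair_mcs0(a: str, b: str):
    """Return (interaction_class, parasite) for two distinct MCS-0 species.

    Class 6: one species is a proper substring of the other; the longer
    species is replicated by the shorter one and so acts as its
    facultative parasite. Class 9: neither is a substring of the other.
    """
    if a == b:
        raise ValueError("pairwise classes are defined for distinct species")
    if binds_mcs0(a, b):
        return 6, b   # a replicates b, so b parasitises a
    if binds_mcs0(b, a):
        return 6, a
    return 9, None
```

Since every string is a substring of itself, `binds_mcs0(s, s)` is always true, which is why all MCS-0 species are self-replicases.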
Given a fixed per-bit mutation rate (
), the per-molecule
mutation rate in MCS is given by
, where
is the molecule
length. Thus, as the length of the dominant species increases,
so also will the per-molecule mutation rate. Accordingly, the
mutational load will increase, the steady state concentration of
the dominant will fall (
), and the overall replication rate
will fall as the square of this (
). See figure
15 for an example of how these quantities vary
with increasing
.
While this follows directly from the theoretical analysis, it is
a somewhat counter-intuitive result. In a natural sense, we can
regard the total replication rate as a measure of the intrinsic
molecular “fitness” of a molecular species in this
system; therefore, in this system, we actually predict an
evolutionary trend, based on perfectly “Darwinian”
selection events, toward monotonically decreasing intrinsic
fitness. This arises because the outcome of each selection event
is completely determined by the “ecological
interaction” (host-parasite) between the two species,
which is a much stronger selective effect than the associated
minor decrement in intrinsic fitness. However, even though these
decrements are individually small, they are (in this system)
monotonic, and directly give rise to a long term evolutionary
collapse.
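The qualitative trend can be sketched numerically. The expressions below are assumptions made for illustration, not the formulas of the analysis itself: we take the per-molecule mutation rate to be the probability that at least one bit is miscopied, the steady state concentration of the dominant to be the complement of that rate, and the total replication rate to be the square of the concentration (following the statement above that it falls as the square).

```python
def per_molecule_mutation_rate(p_bit: float, length: int) -> float:
    """Assumed form: probability that at least one of `length` bits is miscopied."""
    return 1.0 - (1.0 - p_bit) ** length

def dominant_steady_state(p_bit: float, length: int) -> float:
    """Assumed form: steady state concentration falls as mutational load rises."""
    return 1.0 - per_molecule_mutation_rate(p_bit, length)

def total_replication_rate(p_bit: float, length: int) -> float:
    """Replication needs two dominant molecules to collide, hence the square."""
    return dominant_steady_state(p_bit, length) ** 2

if __name__ == "__main__":
    p = 0.01  # hypothetical per-bit mutation rate
    for n in (10, 14, 18, 22):
        print(n, round(dominant_steady_state(p, n), 3),
              round(total_replication_rate(p, n), 3))
```

Under these assumptions both quantities decrease monotonically with sequence length, which is the trend the argument relies on.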
Test: Validation of Molecular Level Dynamics
We present two initial validation tests. These both use a single,
fixed size, flow reactor (
). Replication error (molecular mutation) is
disabled. In the first test, the reactor is initialised with
equal concentrations of two molecular species, neither of which
is a substring of the other. This is a simple case of pure class
9 (independent replicators) dynamics. The prediction is for an
initial period of statistical fluctuation; but as soon as one
species gains a statistically significant higher concentration
the “survival of the common” dynamic will take
effect and that species will quasi-deterministically take over
the reactor, selectively displacing the other. Of course, which
species will “win” will vary randomly, with
equal probability, between runs. The approximate predicted
dynamic is as in figure
14. Figure
16 shows one example run confirming this
behaviour. This demonstrates the basic correctness of the
implementation and of the
approximate class 9 analysis.
The second validation test is essentially identical except
testing the class 6 dynamic of invasion by a facultative
parasite. In this case the reactor is initialised with the
“host” species,
, at a concentration of
, and the
parasite species,
at a concentration of
. The host sequence is
a one bit shorter substring of the parasite sequence. The
approximate predicted dynamic is as in figure
8. Figure
17 shows one example run confirming this
behaviour. Again, this demonstrates the basic correctness of the
implementation and of the
approximate class 6 analysis.
Test: Molecular Level Evolutionary Dynamics
The test again uses a
single, fixed size, flow reactor (
); but now
replication error is enabled (
). The reactor is
initialised with a single “dominant” species, 10
bits long, at concentration
(being the expected
steady state concentration with a per-molecule mutation rate
).
The balance of the molecules are generated as mutants of this
dominant (in accordance with what would be the steady state
mutant distribution). The predicted evolutionary behaviour, as
discussed above, is a series of successive molecular
displacement events, each corresponding to the random generation
of a facultative parasite of the currently dominant species
(host). By definition for this system, each such parasite must
be at least one bit longer than the previous dominant.
Accordingly, the per-molecule mutation rate, and corresponding
mutational load, will increase with each displacement event; and
the (quasi-)steady state concentration of each dominant (until
it is itself displaced) will become progressively lower.
Figure
18 shows one example run
confirming this behaviour. Each separate plot corresponds to the
concentration of a species which, at some point in the run,
became the principal species (i.e., had the highest
concentration of all species instantaneously present). It is
seen that there is a sequence of displacement events; in each
case, the new dominant species has a sequence which is a proper
superstring of that of the previous dominant. Over
macro-evolutionary time, the sequence length of the dominant
species increases monotonically; and the “intrinsic
fitness” (as indicated by the “steady state”
concentration achieved by each successive dominant) steadily
falls. When this run was (arbitrarily) terminated, the dominant
species had sequence length 22, with a steady state
concentration
and therefore a total replication rate of only
. Even
though an individual species can successfully self-maintain,
under large mutational load, for an extended period, it is
always eventually displaced by a longer parasite. As predicted,
the long term evolutionary trajectory, via perfectly
“Darwinian” short-term selection events, is a
collapse to an essentially inert state.
Protocell Level Evolutionary Dynamics
The MCS-0 chemistry is
now extended with a protocell level dynamics. Instead of a fixed
size reactor, molecules are now contained in variable sized
protocells. The number of molecules is allowed to grow within a
protocell. However, at a fixed threshold size (denoted
) the
protocell becomes unstable and will divide or fission, with
independent assortment of the contained molecules between the
daughter protocells. Separately, the total size of the protocell
population has a fixed maximum: whenever a protocell divides, a
randomly chosen protocell from the total protocell population is
removed (“killed”). Protocell reproduction, combined
with a protocell population size limit, creates the conditions
for selection and evolution at the protocell
level—provided that there are heritable protocell traits
with a systematic effect on protocell fitness. The basic
heritable trait at the protocell level is the dominant molecular
species: given independent assortment into the daughter
protocells, the dominant status will normally be preserved,
unchanged. In particular, it will be preserved against invasion
by class 9 mutations due to the “survival of the
common” effect at the molecular selection level; and, even
more strongly, against invasion by class 6 mutants which are
hosts relative to the dominant species (i.e., of which the
dominant is a superstring). However, where a class 6 mutant
arises which is a parasite of the currently dominant species
then, in that protocell lineage, this parasite will
quasi-deterministically grow to displace the incumbent, thus
becoming a new dominant; which will then be heritable in turn.
Such a molecular level takeover therefore represents a single
mutation at the protocell level. It results in a protocell
lineage with a different heritable trait; which may potentially
be the target of selection at the protocell level. Whether there
is selection, or merely drift, then depends on whether this
trait has an effect on protocell level “fitness”.
Protocell “death” is, by design, neutral with
respect to all protocell traits (the protocell to be killed to
maintain the fixed population size is chosen uniformly at random
across the protocell population). Therefore the only potential
fitness difference between protocells relates to birth rate. As
protocell birth events are triggered solely by the number of
(informazyme) molecules reaching a fixed threshold (
), which is
independent of the molecular species and identical for all
protocells, the only basis for systematic variation in birth
rate is variation in molecular replication rate. As we have seen
(figure
15), this does vary with
the sequence length of the dominant molecular species, because
of variation in the replication error rate and thus in the
mutational load. We accordingly make two specific predictions
about MCS-0 with protocell level structure:
- Let the protocell population be initialised with equal
numbers of protocells from two distinct strains. The
“strain” of a protocell is here identified by the
sequence of its dominant molecular species. Assume the two
strains correspond to different sequence lengths. Then it is
predicted that the strain characterised by the shorter sequence
length will quasi-deterministically displace the other. This is
a Darwinian selection event at the protocell level.
- Given a protocell population consisting of a single strain,
there will be protocell level mutational events on an on-going
basis. These will correspond to the occurrence of class 6
molecular level takeovers in a founder protocell for each such
lineage. However, because such strains necessarily are
characterised by longer sequence length (as that is the only
situation that allows a molecular level takeover) they will
always be of lower fitness at the protocell level, and be
rapidly eliminated again. Protocells characterised by shorter
sequences could invade if they ever arose, but molecular level
selection prevents them from arising; whereas protocells
characterised by longer sequences can arise through the molecular
level dynamics, but are then selected against at the protocell
level. That is, selection at the protocell level
acts to precisely oppose selection at the molecular level, and
the result is an evolutionary “stalemate”. The
protocell level population will remain dominated indefinitely by
whichever initial strain is characterised with the shorter
sequence length.
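The protocell division rule described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used for the experiments: each molecule is assigned to either daughter with probability 1/2, the first daughter takes the parent's slot, and the second daughter takes the slot of a protocell chosen uniformly at random from the whole population (which may be the first daughter). The names `divide` and `fission_step` are ours.

```python
import random

def divide(protocell, rng=random):
    """Independent assortment of molecules between two daughter protocells."""
    first, second = [], []
    for molecule in protocell:
        (first if rng.random() < 0.5 else second).append(molecule)
    return first, second

def fission_step(population, index, threshold, rng=random):
    """Divide population[index] if it has reached the fission threshold.

    The population size stays fixed: one uniformly chosen protocell is
    removed ("killed") and its slot is taken by the second daughter.
    Returns True if a division occurred.
    """
    parent = population[index]
    if len(parent) < threshold:
        return False
    first, second = divide(parent, rng)
    population[index] = first
    victim = rng.randrange(len(population))  # neutral with respect to all traits
    population[victim] = second
    return True
```

Because the victim is chosen without reference to protocell contents, any systematic fitness difference in this sketch can only come from how quickly a protocell reaches the threshold.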
Test: Protocell Level Evolutionary Dynamics
Figure 19 shows a
test run of MCS-0 with the full protocell level structure
(protocell fission threshold is
and the protocell population
size is set at a maximum of
). The population is
initialised with equal numbers (
) of protocells of each
of two strains, one dominated by a sequence of length
, the other dominated by
a sequence of length
. It is seen that, as predicted, the strain associated with
the shorter sequence length quasi-deterministically displaces
the other. Subsequently, two episodes of protocell level
mutation can be observed, where lineages of cells are founded
which are dominated by molecular species with sequences which
are proper superstrings of that of the dominant protocell
strain. This is possible at the molecular level because these
molecular species are facultative parasites relative to the
incumbent dominant species. However, although these protocell
level mutations do occur, and found new lineages, these are
again selectively displaced by the original strain. The result
is the predicted evolutionary stalemate.
Discussion
The core phenomenon of the single
reactor MCS-0 system is the demonstration of a
macro-evolutionary epoch, where the length of the dominant
sequence grows steadily. This means that the per-molecule
mutation rate is steadily increasing; or, equivalently, the
replication “fidelity” is getting steadily lower.
That is, this macro-evolutionary trajectory actually results in
a progressive and systematic deterioration in intrinsic fitness
of the dominating species. This is in marked contrast to the
naive “hill climbing” interpretation of evolution;
and illustrates how evolutionary processes may be much more a
matter of ecological interaction, or game playing, than any kind
of optimisation. In particular, we note that this behaviour is
completely at variance with the “replicator
determinism” scenario which, for example,
Dawkins (1976) has characterised with
the simplistic slogan “fidelity, fecundity,
longevity”. Conversely, once a protocell organisation is
superimposed on the “naked replicator” informazyme
chemistry, the evolutionary collapse implicit in the
unstructured molecular level is immediately halted. MCS-0 is, of
course, a highly simplified “toy” system.
Nonetheless, it serves as a useful baseline implementation of
informazyme dynamics, enclosed within protocells. It has allowed
basic validation of core elements of the
ODE analysis; and has
successfully demonstrated the operation of two hierarchically
distinct but interacting levels of selection. It is thus a
well-characterised foundation for elaboration into more
sophisticated models.
MCS-1: Extended Recognition
MCS-1 extends MCS-0 such that the full range of pairwise
interaction dynamics becomes possible; and, further, parasites
are not necessarily longer than their hosts (they can be the
same length, or shorter). The latter should be sufficient to
break the evolutionary “stalemate” described previously.
Enzymatic Map
In MCS-1 the folding is
implemented as a mapping from strings (sequences) on the primary
alphabet of {0, 1} (as used in MCS-0) to strings on a secondary
“enzymatic” alphabet of {L, H}. The mapping operates
left-to-right on sequential bit-pairs (dibits) as shown in
table 1.
Primary (dibit) | Secondary
00 | L
01 | L
10 | H
11 | H
Table 1: MCS-1: Mapping of primary structure dibits to secondary structure (enzymatic) symbols.
If the primary string has an odd number of bits the final
trailing bit is ignored (has no function). As with MCS-0, the
only enzymatic function is replication. The entire secondary
structure sequence is used for recognition, according to the
following rules:
- L in the enzyme matches (binds) 0 in the substrate.
- H in the enzyme matches (binds) 1 in the substrate.
- Recognition occurs provided that the complete enzyme
secondary string binds sequentially anywhere in the substrate
sequence.
This is effectively substring binding, as in MCS-0, except
now based on the secondary sequence of the enzyme recognising
the primary sequence of the substrate. In MCS-0, an enzyme of
any given length:
- cannot bind any substrates shorter than itself;
- binds exactly one substrate of the same length (namely
another instance of itself);
- and binds just those substrates longer than itself of which
it is a proper substring (thereby being parasitised by
them).
In MCS-1, as the secondary string is always only half as
long as the corresponding primary structure, recognition is
generally (and deliberately) less specific, and the possible
interactions and relationships are significantly more diverse.
Thus, an MCS-1 enzyme of any given (primary) length will bind to
many different substrates of the same length—but not
necessarily including other instances of itself; so, while in
MCS-0 all enzymes were self-replicases, in MCS-1 only some are.
Similarly, in MCS-0 if two species have a host-parasite
relationship the parasite is always longer than the host;
whereas in MCS-1 the parasite can be the same length or shorter
(because its secondary structure, which defines the length of
the required recognition region, is only half as long). More
generally, while in MCS-0 only class 6 and 9 pairwise reaction
dynamics could be realised, in MCS-1 all ten pairwise reaction
classes can be realised.
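The MCS-1 folding and recognition rules can be sketched directly from table 1 and the matching rules above. The function names are ours, and the treatment of an enzyme with an empty secondary structure (primary length below two bits) as non-binding is an assumption, since that case is not specified.

```python
# Table 1: dibit -> secondary (enzymatic) symbol
FOLD = {"00": "L", "01": "L", "10": "H", "11": "H"}
# Matching rules: secondary symbol -> substrate bit it binds
MATCH = {"L": "0", "H": "1"}

def fold_mcs1(primary: str) -> str:
    """Fold a primary bit string into its secondary structure.

    Dibits are read left to right; a trailing odd bit is ignored.
    """
    usable = len(primary) - (len(primary) % 2)
    return "".join(FOLD[primary[i:i + 2]] for i in range(0, usable, 2))

def binds_mcs1(enzyme_primary: str, substrate_primary: str) -> bool:
    """True if the enzyme's complete secondary string binds contiguously
    somewhere in the substrate's primary sequence."""
    pattern = "".join(MATCH[s] for s in fold_mcs1(enzyme_primary))
    if not pattern:
        return False  # assumption: an empty secondary structure binds nothing
    return pattern in substrate_primary
```

For example, `1100` folds to `HL` and recognises `10`, which occurs in `1100` itself, so it is a self-replicase; `0101` folds to `LL` and recognises `00`, which does not occur in `0101`, so it is not. The pair `1100` and `0110` illustrates a same-length host-parasite relationship: `1100` binds `0110`, but `0110` (recognising `01`) does not bind `1100`.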
Molecular Level Dynamics: Validation
Brute
force search was used to identify pairs of informazyme species
realising each of the possible
pairwise interaction classes
(neglecting the trivial case of the completely inert, class 0,
pairing). For each such pair, experimental runs were carried out
similar to those described for the
MCS-0
validation; i.e., a fixed size flow reactor, initialised
with an appropriate mix of the two species, and with replication
error (molecular mutation) disabled. In all cases, the
experimental runs matched the predicted dynamic behaviour
qualitatively and quantitatively (modulo the statistical
fluctuation due to the finite size of the reactor,
).
Molecular Level Evolution
A simplistic prediction of the MCS-1
molecular evolution behaviour (i.e., in a single, fixed size,
reactor, with replication error enabled) might be that it should
be very similar to MCS-0. That is, if seeded with a
self-replicase species, this species should self-maintain at a
concentration
(
being the per-molecule mutation rate), until a class 6
facultative parasite species arises; the latter would then
displace the host, and become dominant in turn. However, in
contrast to MCS-0, as parasites are not constrained to be longer
than hosts, these displacement events should no longer result in
monotonically increasing length of the dominant species. Rather,
while some displacements would, indeed, result in a length
increase, at least some displacements should also be observed in
which the length remains the same or decreases. However, MCS-1,
as defined, cannot in fact yield this behaviour. To illustrate
this, figure
20 shows the
concentration of a seed self-replicase species in ten
experimental runs. With
, the reactor is initialised with a self-replicase
at its notional steady state level of
. However, instead of
self-maintaining at this concentration for a period of time
(until a class 6 parasite might arise), the concentration of the
seed species rapidly collapses in every run. Nor is this due to
“early” occurrence of class 6 parasites; examination
of the species present in the reactor after the collapse of the
seed species shows a highly diverse mix of species with no
clearly identifiable “dominant” (i.e., no
identifiable parasite of the original seed species).
The explanation for this behaviour is clear from
figure
21, which shows, for one
of these runs, an analysis of all molecules present in the
reactor by pairwise interaction class relative to the seed
species. Clearly, the collapse in concentration of the seed
species is initially due to diversification into a quasi-species
of class 1 mutants (complete mutualists); and subsequently, to
generation of class 4 obligate parasites.
Of course, this behaviour was actually already anticipated in
the discussion of class 1 and class 4
pairwise systems, under mutational conditions; so, in effect,
this experiment can be considered as a validation of that
original analysis.
What this demonstrates is that, despite its
rather minimal complexity, MCS-1 is already “rich”
enough in side reactions that neither coherent replicator
dynamics (quasi-stable self-maintenance) nor (Darwinian)
molecular evolution is possible in general, at least not in
well-stirred reactors. In particular, there is very high
(possibly unbounded) diversification across MCS-1 (class-1)
quasi-species, due to the combination of relatively unspecific
recognition and the fact that there are only very weak intrinsic
fitness differences (itself a consequence of the more general
MCS design, where recognition is all or nothing, and all
“successful” replications then occur at the same
rate). At the very least, this diversification makes it
difficult to analyse the underlying dynamics of a reactor over
time. But in any case, even if quasi-species diversification is
considered as being, in some sense, “benign” (i.e.,
there is still “collective self-maintenance”),
growth in obligate parasites (class 4) is not; and inevitably
poisons ongoing replication activity. The problem of obligate
parasitism in this kind of limited-specificity replicase system
is well known; and can, in principle, be controlled through
spatial structure (McCaskill et al., 2001). In our context
therefore, depending on the detailed mutation rates and
distribution, it may be feasible to control obligate parasitism
through the protocell containment mechanism. In any case, for
the specific purposes of the current study, which is intended to
investigate evolution of protocell embedded molecular
computation, we will take a somewhat more focussed approach. We
will retain the enhanced recognition mechanism of MCS-1, but
introduce the ability to selectively disable reactions
(replications) “associated” with particular pairwise
interaction classes (specifically, class 1 and
class 4). This is implemented by adding a test to every
collision. If the two molecules are of distinct species, then
the
pairwise interaction
class of the two species is checked against the allowed
pairwise classes. If the interaction class is not allowed, then
the collision is made elastic. Note that the effect of this
particular mechanism on the dynamics is not simply that the
“disabled” pairwise interaction classes cannot be
instantiated; rather, species pairs which would instantiate a
particular interaction class may (or may not) be transformed to
instantiate a potentially different class as shown in
table
2.
Original Class | Transformed Class
0 | 0 (no change)
1 | 9
2 | 2 (no change)
3 | 0
4 | 2
5 | 2
6 | 9
7 | 2
8 | 0
9 | 9 (no change)
Table 2: Parameterised Transformation of Pairwise Interaction Classes
(Note, in particular, that the dynamics associated with
the pairwise interaction classes 0, 2 and 9
cannot be disabled by this mechanism, because it can only affect
replications arising from cross-catalysis, and these classes do
not involve such cross-replicase activity in the first place.)
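The class test applied at each collision, together with table 2, can be sketched as below. We read table 2 as giving the class a species pair effectively instantiates once its cross-species reactions are made elastic; that reading, the function names, and the `classify` callable (standing in for whatever procedure determines the pairwise interaction class of two species) are assumptions for illustration.

```python
# Table 2: original pairwise class -> class realised when disabled
TRANSFORMED = {0: 0, 1: 9, 2: 2, 3: 0, 4: 2, 5: 2, 6: 9, 7: 2, 8: 0, 9: 9}

def effective_class(original: int, allowed: set) -> int:
    """Class effectively instantiated by a species pair under the filter."""
    return original if original in allowed else TRANSFORMED[original]

def reaction_permitted(enzyme_species, substrate_species, classify, allowed) -> bool:
    """Class test added to every collision.

    Only pairs of distinct species are checked; self-replication within a
    species is never blocked by this test. classify(a, b) is a hypothetical
    callable returning the pairwise interaction class (0-9) of two species.
    A True result means only that the class test does not force an elastic
    collision; the enzyme must still bind the substrate.
    """
    if enzyme_species == substrate_species:
        return True
    return classify(enzyme_species, substrate_species) in allowed
```

With `allowed = {0, 2, 6, 9}`, for instance, a class 1 pair behaves as class 9 and a class 4 pair as class 2, while a class 6 pair is left unchanged.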
Transforming class 1 and class 4 pairwise interaction
dynamics using this mechanism may then be predicted to recover
the relatively tractable informazyme behaviour of MCS-0, with
its possibility of protocell level heritability based on a
single dominant informazyme species in any given protocell; but
still with the more flexible mutational space (at the protocell
trait level) afforded by the richer, limited-specificity,
recognition model of MCS-1. Figure
22 shows one example run to test this. The
test uses a single, fixed size, flow reactor (
), with replication
error enabled (
). The reactor is initialised with a single
“dominant” species, 10 bits long, at concentration
(being the
expected steady state concentration with a per-molecule mutation
rate
). The balance of the molecules are generated
as mutants of this dominant (in accordance with what would be
the steady state mutant distribution). Reactions corresponding
to class 1 and class 4 pairwise interaction classes
are disabled (i.e., these pairwise interaction classes are
transformed to class 9 and 2 respectively, per
table
2). Each separate plot
corresponds to the concentration of a species which, at some
point in the run, became the principal species (i.e., had the
highest concentration of all species instantaneously present).
It is seen that there is a series of displacement events; and,
in many cases, the new dominant species is, indeed, class 6
relative to the previous dominant. There are specific
displacement events involving longer, shorter, and same-length
species. These events are all in accordance with the predicted
behaviour. However, in addition to these events we also still
observe some anomalous events. In particular, at
c.
in this run, the then dominant self-replicase species
concentration collapses, but is not replaced by a single new,
dominant species. Instead there is an extended interval of high
diversity before a different self-replicase emerges to dominate
again. There are also anomalous events in which a dominant
self-replicase species is apparently displaced by a single
competitor, but the latter is not class 6 relative to it.
These behaviours are interpreted as indicating a breakdown of
the simplifying assumption that the dynamics of an MCS system
with a diversity of species (
) can
be approximated by a superposition of the separate dynamics of
the various pairwise (
) systems that are simultaneously being instantiated.
Presumably these anomalous events reflect more complicated
ecological relationships (for example, collective parasitism by
co-operating sets of species etc.). There may also be a
more-or-less complex dependence on the (dynamic) pattern of
mutant generation.
This more complex molecular evolutionary behaviour is clearly of
some interest in its own right. In particular,
figure
23 shows the total
replication rate for the same experimental run. It is seen that,
for this particular case, the replication rate also drops
significantly during the extended anomalous period. This
suggests that protocell level selection might displace
protocells which are affected by such a collapse of
self-replicase activity (assuming that it is heritable);
however, substantial further investigation would be necessary to
characterise this effect in detail. Accordingly, for our
immediate purposes here, we will instead take the simplifying
step of disabling all pairwise interaction dynamics other than:
- Class 0 (both species inert)
- Class 2 (one species self-replicase, one inert)
- Class 6 (facultative parasitism/displacement)
- Class 9 (independent self-replicases)
With this configuration (which is, in effect, intermediate
in the complexity of its informazyme dynamics between that of
MCS-0 and MCS-1), we can now proceed to the elaboration of MCS
to support protocell-embedded molecular level
“computation”.
MCS-2: Evolution of Protocell-embedded Molecular Computation
Molecular Computation Model
As noted, MCS-2
retains the basic enzyme-substrate recognition mechanism of
MCS-1 (albeit with a parameterised ability to selectively enable
or disable certain subsets of the possible pairwise
interactions). The enzymatic function also always includes
replication, as in MCS-0 and MCS-1; however, depending on the
specific secondary enzyme structure or folding, there can be
additional functionality. This additional functionality is
designed to support computation at the molecular level; that is,
where the available enzymatically mediated informazyme
transformations have a direct interpretation in terms of
computation.
For this particular study, we deliberately
restrict the possible computational functionality to be very
minimal. This is to allow a focus on the interaction between the
computation and the replication function at the molecular level,
and the coupling between the molecular computation and the
protocell “phenotype” level (where externally
applied artificial selection and directed evolution are assumed
to be possible). We seek to demonstrate an initial proof of
principle of the feasibility of artificial selection and
evolution of protocell level functionality that can only be
realised by specific computational capability at the molecular
level. If this can be done then it provides a basis for
potential investigation of more complex molecular computation,
embedded in protocells, in the future. Conversely, if it cannot
be done, for any reason, even for such “minimal”
kinds of computation, then this might indicate a significant
potential barrier to the long term application of
protocell-embedded molecular computation in realising general
purpose information and communication technologies. Accordingly,
we adopt counting as a “minimal” computational function at
the molecular level. This is implemented in the
MCS-2 system via two new features of all informazyme molecules:
- A fixed length prefix (leftmost) fragment of the primary
structure becomes a “variable region”. This
effectively allows every molecule to record a “counter
state”. In the current implementation, the length of this
region is fixed by a system-wide parameter, and will thus be the
same for all informazymes. It will typically be set as just
bits (monomers)
for the specific experiments to be reported, allowing counting
up to modulo .
This variable region has no enzymatic function; i.e., it is
skipped when extracting the secondary, enzymatic, structure of
an informazyme molecule. Further, this region is excluded from
the recognition of a substrate; that is, on considering whether
a given enzyme can bind to a given substrate, the variable
region of the substrate sequence is ignored. Molecules which
differ only in the counter state region will be considered as
members of the same species.
- In the secondary, enzymatic, structure of any informazyme,
there is also a fixed length prefix (leftmost) fragment with
special functionality. The length is set by the same system wide
parameter as for the length of the counter state (though now it
refers to symbols in the secondary structure, not the primary
structure). This sets the “counting function” of
each informazyme (when it is folded, i.e., acting as an
enzyme).
The reaction behaviour in MCS-2 is then as follows:
- If the enzyme secondary structure binds to the substrate
primary structure (ignoring the counter state region of the
latter) then the reaction can proceed; otherwise the collision
is elastic.
- If the two molecules are of different species, the pairwise interaction class of
the two species is checked against the allowed classes. If the
interaction class is not allowed, then the collision is elastic
(i.e., reaction class is transformed per table 2).
- The substrate is replicated in the usual way, by copying the
primary structure (including the counter state), and subject to
potential error/mutation.
- The counter state of the newly created molecule is modified
in a manner specified by the counting function region of the
enzyme. Specifically, the counter state is incremented, modulo a
counting base derived from the counting function region. For a
counter region width of $n$ bits, the counting base encoded by the counting function
region can be any element of the set $\{1, 2, \ldots, 2^n\}$. The net effect is that all informazyme
molecular species still function as replicases, but they also
potentially function as incrementers for a counter state encoded
in the substrate, where the counting will roll over back to zero
at a value (counting base) which is dictated by the particular
enzyme, in the range $1$ to $2^n$. Note that, as a degenerate case, a
species with counting base $1$ does not increment at
all, but forces the counter state to zero (as any counter state
modulo $1$ is
zero).
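The counting step described above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the mapping from the counting function region to a counting base (binary value plus one) is an assumed encoding, chosen simply because it yields bases from 1 up to 2^n for an n-bit region.

```python
def counting_base(function_bits: str) -> int:
    """Counting base encoded by an enzyme's counting function region.

    Assumed encoding (not taken from the source): the binary value of the
    region plus one, so an n-bit region gives a base in 1..2**n.
    """
    return int(function_bits, 2) + 1


def incremented_counter(counter_bits: str, base: int) -> str:
    """Counter state written into the newly created copy of the substrate.

    The template's counter is incremented and reduced modulo the enzyme's
    counting base; the result is re-encoded at the same fixed width.
    """
    value = (int(counter_bits, 2) + 1) % base
    return format(value, f"0{len(counter_bits)}b")


# With a 2-bit counter, a base-4 enzyme cycles 00 -> 01 -> 10 -> 11 -> 00,
# while a base-1 enzyme always forces the counter to zero.
print(incremented_counter("11", 4), incremented_counter("10", 1))
```

Note that the modulo is applied after the increment, so a substrate whose counter already exceeds the enzyme's base is still mapped back into the valid range.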
Consider now the core behaviour of a self-replicase species
in MCS-2:
- We assume that the allowed pairwise interaction dynamics are
classes 0, 2, 6 and 9 (per the discussion under MCS-1 molecular level evolution). Pending the
emergence of a class 6 facultative parasite, a dominant
self-replicase should be able to maintain its concentration,
subject to the usual mutational load.
- Across the molecular population of this species, there will
generally be variation in the sequence of the counter state
variable region. This is because, on each replication, the
sequence of this region will be actively modified.
- However, this variation will be different both in dynamic
and in distribution from conventional
“quasi-species” variation (driven simply by
mutation). First, it will be limited in extent by the counting
base encoded by the counting function region. This is a value in
the range $1$ to $2^n$ (for a counter width of $n$ bits), which we will denote as $b_s$ for any particular
species $s$.
Correspondingly then, the possible values of the counter state
region will be limited to the set $\{0, 1, \ldots, b_s - 1\}$.
The distribution across these possible values will be
statistically uniform (just from symmetry). Moreover, as this
distribution is actively “forced” (the incrementing
is performed with probability $1$ on every replication),
relaxation to it will be comparatively rapid. This is in
contrast to the quasi-species case where the relaxation to the
stable distribution is driven by the (comparatively) much lower
rate of spontaneous mutation during replication.
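The claimed rapid relaxation to a uniform counter distribution can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions: a single self-replicase species, a fixed-size well-stirred reactor, no mutation, and every collision resulting in replication. The function name and parameter values are hypothetical.

```python
import random
from collections import Counter


def simulate_counter_states(base, population=1000, steps=20000, seed=0):
    """Fraction of molecules in each counter state after `steps` replications."""
    rng = random.Random(seed)
    counters = [0] * population  # every molecule starts with counter state 0
    for _ in range(steps):
        template = rng.randrange(population)
        # The copy carries the template's counter, incremented modulo the base.
        new_state = (counters[template] + 1) % base
        # Fixed-size reactor: the new molecule displaces a random molecule.
        counters[rng.randrange(population)] = new_state
    counts = Counter(counters)
    return [counts[k] / population for k in range(base)]


fractions = simulate_counter_states(base=4)
print(fractions)  # each entry should be close to 1/4
```

Because the increment happens on every replication, the initial all-zero state is forgotten within a few population turnovers, rather than on the much longer timescale set by spontaneous mutation.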
In MCS-2, therefore, we have the possibility of
self-replicase species giving rise to stable, and
characteristically different, distributions of their variable
regions, as a result of (stochastic, distributed) computational
activity at the molecular level. Similarly to the original
molecular evolution scenario of MCS-0, as class 6 mutations
occur the dominant molecular species may be displaced by a
different species; which may have a different counting function
region (counting base) and therefore a different distribution of
sequence variation in the counter state region.
Coupling to Protocell Morphology
In order to use
selection at the protocell level to evolve molecular level
computation, there must be some coupling between the two. We
therefore introduce a mechanism whereby the primitive molecular
computing mechanism described above can affect
“phenotypic” traits at the protocell level; which
can subsequently be the subject of artificial selection and
evolution. In MCS-0 and MCS-1, protocell division was triggered
simply by the number of informazyme molecules reaching a fixed,
system-specified, threshold. In MCS-2,
by contrast, we introduce a less rigid (and somewhat more
biologically motivated) mechanism. We define an additional
family of molecules, separate from the informazymes, designated
as “membrane molecules”. Protocell division is now
controlled, not by the number of informazyme molecules, but by
the number of membrane molecules. Other than that, the protocell
division mechanism is as before: it is triggered when the number
of membrane molecules reaches a fixed, system-specified
threshold; and is implemented by splitting the parent
protocell into two daughter protocells, with independent
assortment of the parental molecules (both informazymes and
membrane molecules) between these two. Membrane molecules are
generated via a side reaction or side effect of the core
replication/counting reaction between informazymes. This is made
dependent on the variable region (counter state) of the newly
replicated substrate/template in each successful replication: if
this counter state has the value zero, then a membrane molecule
is produced; otherwise no membrane molecule is produced. The
effect is that, depending on the counting function of the
dominant self-replicase in the protocell, membrane molecules can
be generated at varying rates—on every replication, or
every second replication (on average) etc. For $N$ self-replications
of a species $s$ with counting base $b_s$, $N/b_s$
of these, on average, will also generate a membrane molecule. As
a result, different protocell strains (dominated by different
informazymes) can have different characteristic sizes (measured
by the total number of molecules, informazyme plus membrane)
over their life-cycle. Thus, the counting function enzymatic
region of the dominant informazyme is coupled to the phenotypic
trait of (average) protocell size. Given that the standard MCS
mechanism of class 6 facultative parasitism supports
“mutation” at the protocell level (via takeover, in
a particular protocell lineage, by a new dominant informazyme
species), in a population of protocells there should then be
heritable variability in the phenotypic trait of (average) size;
and that will permit artificial selection for protocell size as
a mechanism for evolution of protocell-embedded computation. One
final technical modification is made to the MCS protocell
selection architecture here. In MCS-0 and MCS-1 all protocell
lineages have the same (average) size. As a result, regulating
the protocell population to a fixed size has essentially the
same effect as regulating the molecular (informazyme) population
to a fixed size. In MCS-2, where there is heritable variation in
protocell size, limiting the protocell population and limiting
the informazyme population result in somewhat different
protocell selection dynamics. This does not affect the outcome
of any particular selectional event at the protocell level (as
long as protocell “death” rate is independent of any
heritable protocell traits); but does affect various technical
details (such as realtime execution duration, normalisation of
the simulation timescale, and graphical presentation of the
evolutionary behaviour). Accordingly, we will stipulate that the
specific results to be presented here will use the mechanism of
limiting the molecular (informazyme) population to a fixed size.
This is implemented as follows:
- Whenever a new informazyme molecule is produced (in any
protocell), the total number of informazymes (across all
protocells) is compared to the specified threshold.
- If the threshold is exceeded, a protocell is chosen
uniformly at random across the whole protocell population and
removed (“killed”).
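The coupling between counter state, membrane production, division and population limiting can be sketched as below. This is only an illustration: the `Protocell` class, the function names, and the representation of each informazyme by its counter state alone are all assumptions made for brevity, not the data structures of the actual system.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Protocell:
    informazymes: list = field(default_factory=list)  # counter states only
    membrane: int = 0                                  # membrane molecule count


def record_replication(cell, new_counter):
    """Add a newly replicated molecule; a zero counter also yields membrane."""
    cell.informazymes.append(new_counter)
    if new_counter == 0:
        cell.membrane += 1


def divide(cell, rng):
    """Split a parent into two daughters with independent assortment."""
    left, right = Protocell(), Protocell()
    for counter in cell.informazymes:
        rng.choice((left, right)).informazymes.append(counter)
    for _ in range(cell.membrane):
        rng.choice((left, right)).membrane += 1
    return left, right


def enforce_cap(cells, cap, rng):
    """If total informazymes exceed the cap, remove one random protocell."""
    total = sum(len(c.informazymes) for c in cells)
    if total > cap and cells:
        cells.pop(rng.randrange(len(cells)))
```

In a full simulation, `divide` would be called when `cell.membrane` reaches the division threshold, and `enforce_cap` after every new informazyme is produced.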
Artificial Selection
To demonstrate the
directed evolution of protocell-embedded molecular computation,
we impose an artificial selection pressure, based on a relevant
protocell-level trait. The mechanism here is to modulate the
overall reaction rate in each protocell according to this
applied selection pressure. This can be thought of as analogous
to, say, modulating an energy flux to real (proto-)cells
according to the chosen traits. An arbitrary function is
externally provided which maps the relevant protocell trait(s)
onto a value, the “molecular activity factor”.
This is then interpreted as a probability of reaction at the
molecular level. That is, an additional qualification step is
added at the start of the processing of each molecular
collision, as follows:
- The instantaneous value(s) of relevant protocell trait(s)
are evaluated for the containing protocell.
- The applicable value of the molecular activity factor is calculated.
- A Bernoulli random variable with this value as its parameter is sampled (biased coin
toss).
- If this has the value $0$ the collision is treated as elastic; otherwise it
proceeds as normal.
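The qualification step above amounts to a biased coin toss before each collision is processed. A minimal sketch, in which the function name, the `trait_value` argument and the example activity function are all hypothetical:

```python
import random


def collision_proceeds(trait_value, activity_factor, rng):
    """Return True if the collision should be processed as normal.

    `activity_factor` is any externally supplied function mapping the
    protocell trait to a probability in [0, 1]; the collision is treated
    as elastic when the Bernoulli sample comes out as 0.
    """
    a = activity_factor(trait_value)
    return rng.random() < a  # True with probability a


# Illustrative use with an arbitrary, made-up activity function of size.
rng = random.Random(0)
example_factor = lambda size: 0.4 if size < 100 else 1.0
print(collision_proceeds(72, example_factor, rng))
```

Since `rng.random()` returns a value in [0, 1), a factor of 1 always allows the reaction and a factor of 0 never does.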
As protocell reproduction ultimately relies on molecular
replication (via production of membrane molecules), this allows
protocell birth rate to be modulated in accordance with the
applied molecular activity factor. As protocell death rates are
made equal for all protocells (independently of any heritable
traits), birth rates effectively represent the relative
fitnesses of protocell lines with different inherited traits,
and should drive protocell-level selection. For the experiments
to be described here, the objective is to selectively evolve
protocells dominated by informazymes with specific counting
function (i.e., counting base). As explained in the
discussion of the coupling to protocell
morphology, the dominant counting function is coupled to the
characteristic protocell size (measured by number of molecules)
via the rate of membrane molecule production. Therefore, the
activity factor function will be
a map from an appropriate measure of (instantaneous) protocell
size. Note that the variation in number of membrane molecules
over the protocell life-cycle is identical for all protocell
strains, ranging from half the division threshold (just after
fissioning) to the full threshold
just before fissioning. The number of informazyme molecules also
varies over the protocell life-cycle, but this range depends on
the dominant counting base. Specifically, let the dominant
counting base be $b$ and the membrane division threshold be $M$.
Then, neglecting replications of molecules other than
those of the dominant species, the generation of $M/2$
membrane molecules will require $bM/2$
informazyme replications.
It follows that the number of informazymes will vary (in the
“steady state” protocell life cycle) between $bM/2$
and $bM$. In particular, taking a membrane division threshold of $M = 100$
and a width of $2$ bits (counter state/counting
function width, so that $b$ ranges from $1$ to $4$), Table
3 shows
this life cycle range of the number of informazymes for the four possible values
of $b$. These
values will be used to structure the applied activity factor
appropriately.
| Dominant counting base | Minimum informazyme count | Maximum informazyme count |
|---|---|---|
| 1 | 50 | 100 |
| 2 | 100 | 200 |
| 3 | 150 | 300 |
| 4 | 200 | 400 |

Table 3: “Steady state” life cycle ranges
of informazyme count for different values of dominant counting base.
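The ranges in Table 3 can be reproduced with a short calculation. This sketch assumes a membrane division threshold of 100 (a value inferred from the tabulated ranges) and that, in the steady state, a protocell starts its life cycle with half the threshold number of membrane molecules; the function name is hypothetical.

```python
def informazyme_range(base, membrane_threshold=100):
    """Steady-state (minimum, maximum) informazyme count over a life cycle.

    One membrane molecule is produced per `base` replications on average,
    so informazyme numbers track membrane numbers scaled by `base`.
    """
    half = membrane_threshold // 2
    return base * half, base * membrane_threshold


table3 = {b: informazyme_range(b) for b in range(1, 5)}
for b, (low, high) in table3.items():
    print(b, low, high)
```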
Figure
24 shows the example
activity factor used in the series of MCS-2 experiments
described in the following sections. However, note that, in
general, nothing critical should hinge on the detailed structure
of this function, as long as protocell strains with different
life cycle size ranges are assigned significantly and
consistently different birth rates on average.
Given a particular definition of the activity factor, and neglecting
mutational load, the expected gestation period (the time for a
protocell to grow from its minimum to its maximum life cycle size)
can be calculated. The protocell birth rate will be simply the
reciprocal of this gestation period. The average cell size
is the time average of the informazyme count over the life-cycle.
Given that the total number of
informazyme molecules is subject to a fixed maximum, the protocell population size
will vary depending on the average size of the protocells
present. If the population is dominated by a single counting
base, the expected population size will be given by
this maximum divided by the average cell size. Applying these
derivations to the example activity factor from
figure
24, and with the informazyme population limit used in these
experiments, we can calculate
(using numerical methods as required) a variety of nominal
protocell parameters as shown in table
4. Actual values will vary, due to mutational
load, depending on the exact length of the particular dominant
molecular species and the consequent replication error rate.
Expected gestation times would be increased by a factor
determined by the per-molecule
error/mutation rate; and expected birth rates would be
correspondingly reduced.
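One way to make this calculation concrete is the following sketch, which is not necessarily the formulation used for the reported values. It assumes deterministic, continuous growth of the informazyme count $N$ at rate $a(N)\,N$, where $a$ denotes the activity factor and time is normalised so that an unmodulated molecule replicates at unit rate. The symbols $a$, $N_{\min}$, $N_{\max}$, $T$ and $\bar{N}$ are introduced here purely for illustration.

```latex
\frac{dN}{dt} = a(N)\,N
\quad\Longrightarrow\quad
T = \int_{N_{\min}}^{N_{\max}} \frac{dN}{a(N)\,N},
\qquad
\bar{N} = \frac{1}{T}\int_{0}^{T} N(t)\,dt
        = \frac{1}{T}\int_{N_{\min}}^{N_{\max}} \frac{dN}{a(N)}
```

For a constant activity factor and $N_{\max} = 2N_{\min}$, these reduce to $T = \ln 2 / a$ and $\bar{N} = N_{\min}/\ln 2$. With $a = 1$ and $N_{\min} = 150$ this gives $T \approx 0.693$ and $\bar{N} \approx 216$, which agrees with the tabulated values for counting base 3.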
| Counting base | Gestation Time | Birth Rate | Relative Fitness | Mean Size | Mean Population |
|---|---|---|---|---|---|
| 4 | 3.282 | 0.305 | 0.211 | 334 | 45 |
| 1 | 1.733 | 0.577 | 0.400 | 72 | 208 |
| 2 | 0.867 | 1.154 | 0.800 | 139 | 107 |
| 3 | 0.693 | 1.442 | 1.000 | 216 | 69 |

Table 4: Nominal parameters for protocells of the
four possible distinct counting base values, ordered by relative
fitness (neglecting mutational load).
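The tabulated values appear consistent with the birth rate being the reciprocal of the gestation time, and the relative fitness being the birth rate normalised by the largest birth rate. The following sketch checks this relationship using only the numbers in Table 4; the function name and the rounding tolerance are illustrative choices.

```python
# counting base: (gestation time, birth rate, relative fitness), from Table 4
table4 = {
    4: (3.282, 0.305, 0.211),
    1: (1.733, 0.577, 0.400),
    2: (0.867, 1.154, 0.800),
    3: (0.693, 1.442, 1.000),
}


def derived(gestation_times):
    """Birth rate (1 / gestation time) and fitness relative to the best."""
    birth = {b: 1.0 / t for b, t in gestation_times.items()}
    best = max(birth.values())
    return {b: (rate, rate / best) for b, rate in birth.items()}


check = derived({b: row[0] for b, row in table4.items()})
for b, (rate, fitness) in check.items():
    print(b, round(rate, 3), round(fitness, 3))
```

Small differences in the last digit are expected, since the tabulated gestation times are themselves rounded.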
Test: Directed Evolution of 2-bit Counting
We
now present a sample of the evolutionary dynamics of the MCS-2
system, as described above. In each case the population is
initialised with protocells all having the same dominant
molecular species. This species is chosen to be a 20-bit long
self-replicase, with counting base $4$; that is, having the
minimum protocell fitness according to the imposed activity
factor (figure
24).
Figure
25 illustrates a run in
which the “optimum” molecular counting base
($3$) is
successfully evolved. The graph shows the numbers of protocells
present in the population, classified by the counting base of
the principal molecule in each protocell. Note that, because the
expected protocell sizes vary for different counting base
values, the corresponding maximum populations also vary, as
summarised in table
4.
Accordingly, the counting base of the dominant protocells in the
population can be conveniently inferred from the
quasi-steady-state population size during each evolutionary
phase in figure
25. In this run,
the optimum was reached through two steps:
- At there emerged a protocell strain with .
- At there emerged a protocell strain with the
optimal counting base, .
- Thereafter, the population remained dominated by protocells
with this optimal counting base.
In other successful runs, the exact trajectory followed may
be different, just according to the protocell mutations which
happen to arise. In particular, it is possible to jump from the
initial, minimum, protocell fitness, to the optimal fitness
level in just one transition; or it may sometimes require three
fitness transitions in total.
While the evolutionary behaviour is relatively simple when
viewed at this level, it is useful to examine the more detailed
behaviour at the level of specific protocell lineages.
Figure
26 shows the same
experimental run as before, but now identifying each distinct
protocell lineage (classified by its principal molecular
species). This demonstrates a somewhat more complicated sequence
of events:
- During the phase up to , the initial
protocell lineage is displaced by another with the same counting
base at . The principal molecular species in this new
lineage is, as expected, a class 6 facultative parasite of the
principal molecular species of the initial lineage; however, it
is also one bit shorter. As analysed in the context of the MCS-0
system, this does imply a reduced mutational load, and
consequently slightly higher fitness. This explains the
relatively slow, but essentially monotonic, displacement.
- At , the first lineage with higher fitness emerged,
and took over the population.
- During the phase from to , there is significant diversification of the
protocell population, with a variety of lineages present at
various times. However, these all have the same counting base
() and thus
similar fitness. This period of diversification might therefore
be reasonably interpreted as largely random drift between these
almost equal fitness lineages that are mutationally close to
each other.
- At there emerged a protocell lineage with the
optimal counting base, . This one lineage continued to dominate the
population exclusively for the rest of the run.
However, the evolutionary dynamics in this system can be very
variable. Thus, consider the run shown in figure
27 (classified by counting base). This shows
just a single increase in protocell fitness (
), with no
further improvement being found within the time allowed for the
run (
). The molecular species
level graph of the same run, figure
28, shows significant additional complexity in
the system behaviour. Note that during the period from
to the end of
the run, there is continuing ongoing mutation at the protocell
species level, with many new species arising (and, indeed, one
of these eventually drifting to fixation). Also note that, while
protocell diversity is generally quite low, there is one
relatively short period, from
to
, where it is
clearly much higher. These phenomena turn out to be quite common
across large numbers of evolutionary runs and reflect
distinctive features of the informazyme inheritance mechanism,
and the interaction between molecular and protocell levels of
selection. In particular:
- Protocell level mutation is based on molecular level
facultative parasitism (class 6 pairwise molecular dynamics).
This is an asymmetric relationship, by definition. It follows
that direct “back mutation” is never possible at the
protocell level.
- Certain molecular species have one or more class 6 parasites
within a single bit mutation. The probability of such a parasite
arising in a single replication is (assuming equal
distribution over the four possible mutations at each bit
position). The probability that this will occur at least once
during a complete protocell reproduction is then given by
. For the experimental
parameters in use here, this varies between 0.11 and 0.39. While
stochastic effects mean that generation of a single instance of
such a parasite will not always result in a protocell level
mutation, the net rate of protocell level mutation will be of
this order. Note that this cannot be offset by (direct) back
mutation, as already explained. Thus, in the absence of
significant fitness differences, a protocell strain based on
such a molecular species will necessarily be unstable and will
progressively be displaced by these mutant 1-bit strains.
- This process can be iterative: as long as a protocell strain
is characterised by a molecular species that has one or more
class 6 parasites within a single bit mutation, such a strain
will be unstable against those mutations. There may thus be a cascading diversification
of protocell species. We conjecture that this is precisely the
high diversification phenomenon of figure 28, from and .
- Conversely, certain molecular species may have no
immediately nearby class 6 mutants. These are effectively the
end points of the cascades described above. These have much
lower spontaneous mutation rates, at the protocell level
(requiring at least a 2-bit mutation in a single replication)
and can thus be successfully stabilised by the imposed selection
pressure (via the activity factor). However, there is
then no guarantee that any such 2-bit class 6 mutations will be
available at all, or that any which are available would yield a
higher protocell level
fitness. This can then give rise to extended evolutionary
stagnation. This is the general phenomenon illustrated by the
period from to the end of the run of figure 28. In this case, although protocell level
mutations do continue to occur, none have a higher protocell
level fitness.
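The "at least once per protocell reproduction" probability discussed in the list above can be sketched with the standard expression for repeated independent trials. This assumes that replications are independent and share the same per-replication probability; `p_single` and `replications` are hypothetical parameters, and the example values below are arbitrary rather than the experimental ones.

```python
def prob_at_least_once(p_single, replications):
    """Probability that an event of per-trial probability `p_single`
    occurs at least once in `replications` independent trials."""
    return 1.0 - (1.0 - p_single) ** replications


# Arbitrary illustrative values: the chance grows with the number of
# replications needed to complete one protocell reproduction.
for n in (50, 100, 150, 200):
    print(n, round(prob_at_least_once(0.002, n), 3))
```

This makes the dependence on counting base explicit: strains that need more informazyme replications per division are exposed to more opportunities for a parasite to arise.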
While many other MCS-2 runs have been generated and
analysed, across a variety of system parameters, the essential
evolutionary features which have been identified are already
illustrated by the above two sample runs. In summary, this
framework, and these experiments, have demonstrated the basic
feasibility of evolving protocell embedded molecular computation
capability based on coupling a noisy, low reliability
“informational chemistry” (in the form of
informazymes), with both information storage and processing
capability, to some tangible protocell level trait which can act
as a target for imposed selection (such as characteristic
protocell size in the current example). However, these
experiments also show that the protocell-level mutational
dynamics generated by such an architecture may have a number of
distinct features, not typically assumed in evolutionary models.
These include “one-way” mutational pathways (no back
mutation), and high variability in mutational rate which is
itself heritable. These features strongly influence the
evolutionary trajectories that are available: it does not
automatically follow that the outcome of an evolutionary
experiment in such a system will be wholly, or even primarily,
determined by the externally applied selection pressure; the
“internal” constraints of the protocell level
mutational network will also have a very strong influence.