The J Curve

Sunday, August 29, 2004

Can friendly AI evolve?

Humans seem to presume an "us vs. them" mentality when it comes to machine intelligence (certainly in the movies =).

But is the desire for self-preservation coupled to intelligence, to evolutionary dynamics, or to biological evolution per se? Self-preservation may be a low-level reflex that emerges in the evolutionary environment of biological reproduction, uncoupled from intelligence. But will it emerge in any intelligence that we grow through evolutionary algorithms?

If intelligence is an accumulation of order (in a systematic series of small-scale reversals of entropy), would a non-biological intelligence have an inherent desire for self-preservation (like HAL), or a fundamental desire to strive for increased order (being willing, for example, to recompose its constituent parts into a new machine of higher order)? Might the machines be selfless?

And is this path dependent? Given the iterated selection tests of any evolutionary process, is it possible to evolve an intelligence without an embedded survival instinct?
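The closing question can be made concrete with a toy evolutionary algorithm, in which whatever the selection test rewards is what accumulates over iterated rounds. The sketch below is a minimal illustration under loudly stated assumptions, not a model of intelligence. The one-gene genome `give` (the fraction of resources an agent gives away), the `UPKEEP` threshold, and both fitness functions are hypothetical stand-ins. One test rewards giving and nothing else. The other also rewards giving, but scores any agent that keeps less than `UPKEEP` for itself as dead.

```python
import random

# Hypothetical parameter: the minimum share of resources an agent must keep to stay alive.
UPKEEP = 0.2


def giving_only(give: float) -> float:
    """Hypothetical 'selfless' test: reward giving, ignore survival entirely."""
    return give


def giving_but_viable(give: float) -> float:
    """Same reward, but an agent keeping less than UPKEEP counts as dead (fitness -1)."""
    keep = 1.0 - give
    return give if keep >= UPKEEP else -1.0


def evolve(fitness, generations=200, pop_size=50, sigma=0.05, seed=0):
    """Truncation selection: keep the fitter half, refill with mutated copies.

    Returns the survivors of the final round of selection.
    """
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        survivors = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        # Gaussian mutation, clamped so `give` stays a valid fraction in [0, 1].
        children = [min(1.0, max(0.0, s + rng.gauss(0.0, sigma))) for s in survivors]
        pop = survivors + children
    return sorted(pop, key=fitness, reverse=True)[: pop_size // 2]


results = {}
for name, fn in [("giving only", giving_only), ("giving but viable", giving_but_viable)]:
    survivors = evolve(fn)
    mean_keep = sum(1.0 - g for g in survivors) / len(survivors)
    viable = sum(1.0 - g >= UPKEEP for g in survivors)
    results[name] = (mean_keep, viable, len(survivors))
    print(f"{name}: mean share kept = {mean_keep:.2f}, viable survivors = {viable}/{len(survivors)}")
```

Under the giving-only test, the survivors keep almost nothing for themselves and would not last outside the dish. The viable variant keeps them alive, but only because a survival condition was written into the selection test itself. In this toy, at least, a survival term appears exactly when one is selected for.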


  • The instinct toward self-preservation only exists because it has been selected for, right?

    Create an environment that responds not to "self-preservation"-oriented behavior but to "selfless" behavior (for instance, by implementing a karma system in your digital Petri-dish universe that rewards selfless behavior and punishes selfish behavior), and you should be able to lay the groundwork for a selfless instinct.

    But instinct is only a foundation. Once you reach self-awareness and intelligence, all bets are off. I think humans have demonstrated that fairly well. :-)

    BTW, if the neural-net guys are going down the right path to AI (and I think they are), the ones with the most processing power and the most all-encompassing database of human knowledge will be the first to get there. To me, that means Google and Microsoft. So maybe the question should be "What happens when two AIs, one whose overriding belief is 'Don't be evil,' and the other whose overriding belief is 'Win at any cost,' emerge simultaneously?"

    By Blogger Charlie, at 5:07 AM  

  • Well, that is Google’s goal, so it will be interesting to see how it plays out.

    I have a hard time imagining an evolutionary algorithm that does not select for “survival” at some fundamental level (and still produces interesting results and an accumulation of design).

    As for the karma selection criteria, how would that actually be defined and applied? (The most selfless design makes it to the next round and the others don’t?) Lacking any selection criteria for viability, the karma-evolved population will die off in any competitive ecosystem of deployment. In other words, not everything in the world is “karma-evolved”, and to co-exist in any meaningful way in the real world will require some balance with survival needs. For example, the most selfless human would not consume any resources and would die in short order.

    Maybe this is obvious. But once you expand the selection criteria to be “live for some time and be selfless”, then I return to the embedded survival instinct question.

    Will the equivalent of the “reptilian brain” arise at the deepest level in any design accumulation over billions of competitive survival tests?

    A related question: can the frontier of complexity be pushed by any static selection criteria, or will it require a co-evolutionary development process?

    By Blogger Steve Jurvetson, at 8:42 PM  

    Interesting discussion going on here... When I first read this (well, the posting and the first comment in the thread), I thought of ants and bees, and the whole question of where the individual ends and the "community" begins (much as I personally despise communal references, I guess that concern applies only to us humans, not to ants and bees ;-) ).

    "For example, the most selfless human would not consume any resources and would die in short order."

    "Clockwork Orange", anyone? ;-/

    "I have a hard time imagining an evolutionary algorithm that does not select for “survival” at some fundamental level (and still produces interesting results and an accumulation of design)."

    And this is exactly why I always felt uncomfortable with the whole notion of "we teach computers to play chess and we will get so much closer to understanding Life..." If anything, the old CoreWars game was much more appropriate to study for that purpose. It is not important how well a human can program a computer; what matters is how well a computer can evolve to program itself!

    And if the result will be "friendly" to us -- who knows, but I'd still keep a big red switch in the overall design baseline... ;-)

    Paul B.

    By Blogger Paul B., at 10:55 PM  

    Self-preservation must be programmed in, whether organically, à la natural selection, or willfully, because it makes for more robust software.

    Would a non-biological intelligence sacrifice its life for a higher-order existence? The Wachowski brothers obviously think so :). The Sentinels sacrificed themselves by the thousands for the perceived betterment of the greater machine. Selflessness of the individual often means self-preservation for the species. Constituent elements of a system must often behave selflessly, sacrificing their existence for the survival of the parent.

    In any system's life, when a manifestation of entropy threatens to diminish it, it must act to counter the threat. In the case of digital life, we, its creators, must program in self-preservation, the security programs that defeat entropy, or redesign our genetic algorithms to auto-generate it, otherwise entropy will eventually win.

    I don't believe it's possible to evolve intelligence without an embedded survival instinct. Security must be built around the very core of the system, and around every vulnerable layer from there out. We are forever defeating entropy, and so it would seem we will always be.

    Can friendly AI evolve? Yes, so long as we don't subsequently threaten its existence in a way we've already programmed it to defend against. But in a world of finite resources, war with our own creations does seem inevitable, unless... we develop with our creations a deep symbiosis.

    Symbiosis, it would seem, is the ultimate friendliness.

    Steve, my partner and I saw you at Art of the Start in June. You were our favorite speaker, very inspiring. We're now writing a business plan around an enterprise application with a strong AI component. We hope to be ready for show, complete with prototype, by the end of the year. Hopefully you'll be interested in taking a look at it. Best regards, Carl Carpenter (

    By Anonymous, at 11:21 AM

  • Eliezer Yudkowsky (who has written a lot on the topic of Friendly AI) provided an interesting response to this question, which I am posting for him:

    "Evolving Friendly AI looks impossible. Natural selection has a single criterion of optimization, genetic fitness. Whichever allele outreproduces other alleles ends up dominating the gene pool; that's the tautology of natural selection. Yet humans have hundreds of different psychological drives, none of which exactly match the original optimization criterion; humans don't even have a concept of "inclusive genetic fitness" until they study evolutionary biology.

    How did the original fitness criterion "splinter" this way? Natural selection is probabilistic; it produced human psychological drives that statistically covaried with fitness in our ancestral environment, not humans with an explicit psychological goal of maximizing fitness. Humans have no fuzzy feelings toward natural selection, the way we have fuzzy feelings toward our offspring and tribesfolk. The optimization process of natural selection did not foresee, could not foresee, would not even try to foresee, a time when humans could engineer genes. Our attitude toward natural selection, which will determine the ultimate fate of DNA, is a strict spandrel of psychological drives that evolved for other reasons.

    The original optimization criterion ("nothing matters except the genes") was not faithfully preserved, either in the computational encoding of our psychological drives, or in the actual effect of our psychologies now that we're outside the ancestral context.

    This problem is intrinsic to any optimization process that makes probabilistic optimizations, or optimizes on the basis of correlation with the optimization criterion. The original criterion will not be faithfully preserved. This problem is intrinsic to natural selection and directed evolution.

    Building Friendly AI requires optimization processes that self-modify using deductive abstract reasoning (P ~ 100%) to write new code that preserves their current optimization target."

    By Blogger Steve Jurvetson, at 8:39 PM  

  • Steve, thank you for clarifying, and for the additional information. Eliezer Yudkowsky's writings on Friendly AI will no doubt stimulate more thought.

    How did the original fitness criterion splinter, given nature’s selection of psychological drives covariant with fitness? I believe the answer may lie in seeing evolution as a series of emergent events. Emergent discontinuities create paradigm shifts which produce new behaviors and drives. Those traits which survive, those selected for, then re-aggregate to form the basis for the next discontinuity. What if the apparent divergence from the original selection criteria is in fact spiraling back on itself? What if it’s actually leading us to some kind of emergent maximum granting us conscious control of selection itself?

    A Turing-tested manifestation of AI might prove to be just such a discontinuity. It could be that the kind of AI that natural selection evolves symbiotically with us lies on a different scale than currently imagined. It could be that emergence itself is an agent of mutation in the next, higher-level environment. Perhaps at each successive level of emergence the original optimization criterion is altered a little more (as a function of the accompanying paradigm shift) until it folds back upon itself, redoubling, and thereby generating the emergent maximum that redefines the foundation of the entire framework.

    In other words, I suspect your stated requirement for building friendly AI, that it involve optimization processes that self-modify using deductive abstract reasoning, might be realizable on the new emergent level I'm suggesting, a level which is itself produced by AI. Please forgive me if I seem a bit cagey here or this seems less than clear. I'm not quite ready to reveal the new technology my partners and I are working on.

    Again, thank you for this post. So nice to be able to dialogue with a mind like yours. Hopefully the hints I've offered have made you curious:). Best regards, –Carl Carpenter (

    By Anonymous, at 6:47 PM

    It's not obvious to me how friendliness comes out of self-modifying code. I think this is a very powerful mechanism for recursive design improvement, but it carries the risk of getting out of control. Yes?

    By Blogger Steve Jurvetson, at 2:16 PM  

    Steve wrote: "It's not obvious to me how friendliness comes out of self-modifying code."

    For Friendliness to just magically pop out of self-modifying code would take a specific complex miracle. The usual analogy I use is pizza growing on palm trees. Now, if a human is exerting design intelligence to create a good specification of Friendliness, then there should be classes of self-modifying code that preserve it. But the mere fact of self-modification is not where the initial specification of Friendliness comes from. It comes from the human programmers.

    I don't want to rag too hard on this "emergence" business, because it's a very common mistake in AI academia and requires no spectacular stupidity. But still, "emergence" is pretty much the equivalent of "magic", as far as useful explanation goes - a model with no internal detail that makes no useful predictions even in retrospect. This kind of model is a very common mistake in scientific history, because to human ears it sounds like an explanation. But suppose I ask you how hurricanes work, and you tell me that they are emergent phenomena. What specific predictions can I make about a real-world hurricane? None. If only humanity had thought to apply this test to the vitalistic theory of biology and the phlogiston theory of fire.

    Today we are wiser and we know that "magic" is not an acceptable explanation, but human nature is still the same. When people see something and have no idea how it works, they still want to explain it. "Emergent" is just the word people use nowadays instead of "magical", and it has roughly the same empirical content.

    "Emergence" is also a popular explanation for another reason, which is that you can predict magical success. Suppose I say that a successful IPO emerges from starting a company. Great! So all I need to know for a successful IPO is how to file articles of incorporation - the rest will emerge. And maybe intelligence will emerge if we just get enough computing power together, or enough commonsense knowledge... The "emergent" part of the process is generally the part of the process that people don't understand and have no idea how to do. But if you call it "emergence" and think that's a fair explanation, you can hope for success anyway. It'll just... emerge.

    "I think this is a very powerful mechanism for recursive design improvement, but carries the risk of getting out of control. Yes?"

    Whether it's a powerful mechanism depends on what specific kind of self-modifying code it is. The same holds true of the risk of getting out of control. A mechanism that tests lots of random little modifications to see if some of them work some of the time is not as powerful as a mechanism that can reason deductively and abstractly. But conversely, if the former mechanism worked, it would be nearly certain to go "out of control". Though I'm not sure it makes sense to speak of losing control when you were never in control in the first place. Why would any sane person think they could control a self-modifying mechanism that makes probabilistic self-modifications? You don't know what it's going to do; it says so right there in the design specification. Well, maybe it'll emergently do exactly what you want it to do.

    By Blogger EliezerYudkowsky, at 1:25 PM  

  • Somehow by chance I ended up on this blog reading this discussion. Very interesting.
    I'd like to comment on "emergence = magic" and on where friendliness comes from in the context of the above discussion. It really does look like emergence is a kind of magic. I "believe" that we should make a clear distinction between programming friendliness and what Friendliness really is. I don't think we know what Friendliness itself is, but we can program the "action" of Friendliness, which we tend to call friendliness. If we realize this, then it is clear that whatever we program, the programs themselves cannot come up with friendly systems based on Friendliness, only based on what was perceived or experienced as friendly. This "artificial friendliness" is therefore not an action of Friendliness and can be either friendly or unfriendly. In other words, in some circumstances it is just magic when we perceive it as magic; otherwise it could be anything...

    By Blogger Joost, at 2:42 AM  

    Aside from the debate over whether evolution can create friendly AIs (which is intriguing), we can ask simply whether, broadly speaking, more neurons = more friendly. I'd say it is likely that the answer is yes: our ability to even conceive of something like selflessness makes the possibility of our acting that way non-zero (since we are unpredictable). Thus, in a large collection of organisms, those more frequently able to have a thought like 'wouldn't it be weird if I gave my food to that guy just to be nice' will statistically generate more cases in which this outcome actually occurs.

    Since evolved AIs will have more neurons than us (or will in short order!), they will probably be able to have more thoughts like this, and may well be more friendly than us. If you compare us to dogs or monkeys, it seems to be the same thing in reverse - they are much more purposeful and not nearly as friendly.

    By Blogger Philip Rosedale, at 3:15 PM  

    Eric Drexler emailed me this contribution to be included in the discussion:

    In speaking of machine "intelligence", we find it natural to ask what an imagined entity, "the intelligence" will do, and to worry about problems of "Us vs. Them". This question, however, reveals a deeply embedded assumption that itself should be questioned.

    If someone spoke of machine "power", then asked what "the power" will do, or spoke of machine "efficiency" and then asked what "the efficiency" will do, this would seem quite odd. Strength and efficiency are properties or capacities, not entities. Our human intelligence is closely bound up with our existence as distinct, competing entities, but is this really the only form that intelligence can take? Perhaps our questions about artificial intelligence are a bit like inquiring after the temperament and gait of a horseless carriage.

    A system demonstrates a crucial kind of intelligence if it produces innovative solutions to a wide range of problems. The size and speed of the system are not criteria for intelligence in this sense. With this definition, it becomes clear that we already know of intelligent systems that are utterly unlike individual wanting-and-striving human beings. They may be large or slow, but their existence can help us to correct our brain-in-a-body bias.

    One such system is global human society, including the global economy. It has built global telecommunications systems, engineered genes, and launched probes into interstellar space, solving many difficult problems along the way. One may object that society's intelligence merely reflects the intelligence of intelligent entities, but it nonetheless greatly exceeds their individual capacities, and it is organized in a radically different way. Society is ecological, not unitary. It is not a competing, mortal unit, in the evolutionary sense. It is far from having a single goal, or a hierarchy of goals, or even much coherence in its goals, and yet it is the most intelligent system in the known universe, as measured by its capacity to solve problems.

    If this seems like a cheat because society is based on intelligent units, consider another example: the biosphere. It has built intricate molecular machinery, continent-wide arrays of solar collectors, optically guided aerial interceptors, and human bodies, all based on the mindless process of genetic evolution. The problems solved in achieving these results are staggering. The biosphere itself, however, seems to have no desires, not even for self-preservation. Like society, the biosphere is ecological, not unitary.

    With these examples in hand, why should we think that "artificial intelligence" must entail "an artificial intelligence", with a purpose and a will? This is a mere assumption, all the worse for being unconscious. Artificial intelligence, too, can be ecological rather than unitary.

    Because intelligence is a property, not an entity, the problem of "Us vs. Them" need not arise. It seems that we can develop and use powerful new forms of intelligence without having to trust any of Them.

    I propose a moratorium on talk that blindly equates intelligence with entities.

    (Note that the above essay does not dismiss standard concerns and danger scenarios. It merely counters the widespread assumption that systems providing machine intelligence services — e.g., solving given problems with given resources — must be unitary, goal-driven entities.)

    By Blogger Steve Jurvetson, at 1:39 PM  

  • Eric, Steve:

    Our global ecology exhibits kinds of complex designs, even though "global ecology" has no coherent aim. But these impressive designs are produced by an optimization process, natural selection, that has a very strong and definite criterion in each of its local optimizations. Alleles become dominant if they maximize reproductive fitness. If you thought about evolution anthropomorphically, evolution would be a scary, incomprehensible, monomaniacal drive that ignored everything in its way, that would cause any amount of pain and suffering to get the job done because "pain and suffering" simply don't enter into the optimization criterion.

    The large and murky desires of the global economy are also produced by many locally sharp desires, many locally targeted optimization processes, which is to say, many humans. The global power and murkiness of the planetary economy is the product of many local optimization processes acting on sometimes conflicting, sometimes cohering goals.

    I don't blindly equate intelligence with entities, rest assured. I don't even blindly think in terms of "intelligence" - I consider that anthropomorphic. "Optimization process" neatly describes both humans and natural selection - where an "optimization process" seeks out small targets in a large space of actions, plans, or designs. Or to justify the formalism a different way, "optimization process" simplifies how we look at the universe: We can describe the target of the optimization process, and then make inferences about what sort of designs we're likely to see, without explicitly modeling every step of the optimization process. It's useful to consider evolution as an "optimization process" because, knowing that evolution's design target is reproductive fitness, we can guess (even without modeling every step of the process of evolution) that we will find organisms with reproductively efficient designs.

    I would measure the power of an optimization process by the narrowness of the targets it can hit. For example, if an allele is better than all but 1 in 10^6 alternatives in the space of DNA strings, I would infer that at least 20 bits of optimization pressure have been exerted at that locus. But this measure of optimization power requires an optimization criterion, such as reproductive fitness.

    I confess that I don't know how to measure the power of an optimization process without some kind of utility measure, but hey, that's not really a fair objection - I could have picked a bogus mathematical formalism to think in. But still. If I am not to think of "intelligence" (or optimization) as the ability to produce better-than-random designs and plans - to steer the future into a small volume of all possible outcomes - then how am I to think of it? How can I measure the power of an optimization process, without some way to describe the optimization target?

    Drexler writes: "If someone spoke of machine 'power', then asked what 'the power' will do, or spoke of machine 'efficiency' and then asked what 'the efficiency' will do, this would seem quite odd." Well, if I speak of the power of an optimization process, measured in bits, and then I ask what the optimization target is, that does not seem odd to me. It is no different from someone saying that an airplane just whooshed past at 600 mph, and my asking for the direction.

    If the intelligence / optimization process / whatever is not "doing something", it doesn't worry me. If the AI is "doing something", if it's steering the future into particular volumes of configuration space, then I'm worried - I want to know whether this volume of possible futures promotes, or even permits, human life and happiness.

    Drexler writes: "It [human society] is far from having a single goal, or a hierarchy of goals, or even much coherence in its goals, and yet it is the most intelligent system in the known universe, as measured by its capacity to solve problems." If human society has no "goal", what are you using as your criterion of its "problem"-solving ability? Maybe society's ability to solve problems that seem, to you, highly relevant and apposite. For example, society's ability to solve the problem of feeding you.

    It is not coincidence that the seemingly murky and undirected processes of society have solved a problem that so well fits the goal (optimization target) of an individual element (you) making up the society. Society is indeed made up of individuals (us), and derives nearly all its power as an optimization process from the power of the component optimizers (us).

    If you superpose many optimization processes, the combined system may have something like a coherent output and power, to the extent that the goals and common subgoals of the optimizers overlap; so with human society. Or you may have mostly competition, as in ecology and the many local actions of the razor-sharp purposes of natural selection.

    But I don't know any way to consider the power of an optimization process, if it doesn't have a target. What even makes it an optimization process, then? And what "intelligence" would be impressive if it didn't select much-better-than-random actions, plans, and designs under some measure of utility?

    Could any AI that embodied a weak optimization process - weaker than human intelligence or natural selection, say - significantly threaten or benefit human society? If someone makes this case, I will reconsider my focus on "optimization" as the critical concept. But it seems to me that any sufficiently strong optimization process has the potential to beat us at our own game; outwit us, steer the future more efficiently than we do. And a strong optimization process can select more powerful designs for itself; recursively self-improve.

    A weak optimization process is no threat to us. We can employ our own intelligence to optimize it out of existence. A weak optimization process is also no benefit to humanity; it will not be able to create designs or make choices that are significantly better than random.

    So when I worry about "Artificial Intelligence", I worry about Really Powerful Optimization Processes, especially recursively self-improving RPOPs. It's not clear to me, at this point, why I should worry about anything else.

    One should also bear in mind: It doesn't matter how many weak, innocuous versions of "intelligence" are theoretically constructible. So long as it is possible to build strong AIs, we have to manage that problem, and it is rather more urgent than weak AI. I consider weak AI an entirely separate category of stuff-to-think-about, with little or no strategic importance except insofar as it impacts strong AI. Deep Blue plays chess very well, but so what?

    By Blogger EliezerYudkowsky, at 1:05 PM  

    In continuation of Eliezer Yudkowsky's recognition that direction, or target, is as important as the speed of a recursively self-optimizing system:

    It makes sense that in any given design, the single most important thing to have is a goal to achieve. Without the goal, an optimizing system can't accurately be described as optimizing. Any random change will optimize towards *some* end.

    Likewise, it makes sense that if any given system has a goal, its complete success will undoubtedly compromise other systems' success at achieving their goals, because to reach maximum optimization it must implement competitive optimizations.

    So a system that successfully satisfies our clearly plural goals and needs will necessarily have to recognize compromises and be able to judge and balance them *at some point*.

    It would not be possible to implement cooperative-only optimizations without such a system, since, metaphorically, any direction a plane flies will be away from one target and toward another. "Cooperative" has to be defined, which means we have to recognize when no system needs a given situation to exist.

    For example, there is no situation in which a program needs to be overly fat (excessively large programs are generally undesirable). Such generalizations are necessary before we can safely make *any* cooperative optimizations.

    By Anonymous, at 9:30 PM

  • Well, circling back to Susan Blackmore, perhaps we'll have to co-evolve the memes through the training environment:

    “My cat gives birth to a litter, purring all the way. It’s very different with humans and our large heads. It was a dangerous step in evolution. 2.5 million years ago, we started imitating each other. Our peculiar big brains are driven by the memes, not our genes. Language, religion and art are all parasites. We have co-evolved, adapted and become symbiotic with these parasites.”

    Perhaps it is naive to assume any AI would be friendly "out of the box". Assembly and training required...

    By Blogger Steve Jurvetson, at 3:39 PM  
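Eliezer's measure of optimization power from the thread can be written as the negative base-2 logarithm of the fraction of the space that scores at least as well as the chosen design. For an allele better than all but 1 in 10^6 alternatives, that gives -log2(10^-6) ≈ 19.93, the roughly 20 bits he cites. A minimal sketch follows, assuming a finite, enumerable space and a known utility function; `optimization_bits` and the toy integer-valued utility are hypothetical illustrations, not anything proposed in the thread.

```python
import math
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def optimization_bits(chosen: T, space: Iterable[T], utility: Callable[[T], float]) -> float:
    """Hypothetical helper: -log2 of the fraction of `space` scoring at least as well as `chosen`.

    Assumes a finite space that contains `chosen`, so the fraction is never zero.
    """
    candidates = list(space)
    target = utility(chosen)
    at_least_as_good = sum(1 for x in candidates if utility(x) >= target)
    return -math.log2(at_least_as_good / len(candidates))


# The thread's example in closed form: only 1 in 10^6 alternatives is as good or better.
print(round(-math.log2(1e-6), 2))  # 19.93, i.e. roughly 20 bits

# Brute force on a toy space of 2**10 designs, scored (hypothetically) by their own value.
space = range(2**10)
print(optimization_bits(1023, space, float))  # 10.0: the single best design
print(optimization_bits(512, space, float))   # 1.0: 512 of the 1024 designs score at least this well
```

Note the caveat Eliezer raises himself: the number only exists relative to a utility function. Score the same space differently and the same choice can be worth a very different number of bits.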
