# Can computers provide an evolutionary environment...



## Vertigo (May 24, 2011)

... and what might the consequences be?

Here's a thought...

Modern computer viruses (viri?) almost invariably attempt to spread themselves (compare with breeding). To evade the anti-virus programs (their predators), they are also frequently self-mutating (compare with DNA mutation). So the viruses that effect successful mutations, keeping them hidden from the predators, will be more successful at breeding, and the new viruses they spin out should retain that edge. It all sounds very like the essentials of a survival-of-the-fittest environment, where their world is the internet and the computers connected to it.

Possible consequence: maybe our first artificial intelligence might evolve from an escaped computer virus!


----------



## Lenny (May 24, 2011)

From what I gather, modern viruses don't yet mutate themselves to stay ahead of the game. Rather, their all-powerful developer changes parts of them to fool the virus scanners, or gives them mechanisms to randomise their actions as much as possible to stay out of the net (probably the best example, and by far my favourite, is the Conficker virus and its various iterations: http://www.sffchronicles.co.uk/forum/527543-an-amazing-look-at-the-conficker-worm.html).

Move into the research sector and genetic algorithms, though, and you'll find many examples of evolution, particularly of robots. A recent example is a collection of robots that were programmed to gather food. Obviously those with the better "genes" (that is, the ones that gathered food) had their code used in the evolution stage for the next round, whilst those that didn't were killed off. When the ability to share was added, and the scale of sharing was used to help determine which mutations went through to the next round, the researchers found that large groups of sharing robots began to develop: Evolving robots spontaneously emerge altruistic behaviour.

Possibly my favourite example of genetic algorithms was an experiment at the Magna Science Centre in Rotherham nearly ten years ago: You make my heart beep.

It consisted of two sets of robots - the "prey", which had solar panels and gathered energy from bright light sources, and the "predators", which replenished their energy by sucking the life out of the prey.

Eventually, they found that the slower predators began hunting in packs to catch the faster prey, and even started camping out around bright light sources.

Probably the most fantastic part was that one robot, called Gaak, whom I think was prey, escaped from his pen, and made it outside. A motorist nearly flattened the poor thing, which was going around in circles, not quite sure what to do with all the light.

---

I have a huge interest in algorithms (though more designing them rather than analysing them), and genetic algorithms are something that I've wanted to try out for a few years. My final exam of my degree is in two days, so I should be able to start playing soon (I'm thinking of starting simple and writing a genetic algorithm to try and optimally solve the Travelling Salesman problem)!

---

For those who aren't familiar, genetic algorithms are designed to mimic real-world evolution and involve a population, which is mutated to form new populations: http://en.wikipedia.org/wiki/Genetic_algorithm

Each population is made up of strings; the first population is randomly generated. "Evolution" occurs in rounds (generations). In each round, the fitness of each member of the population is evaluated with a "fitness function". Those that score best are put into a new list, and those that don't are deleted. The members of the new list are then modified, usually by splitting them in half and merging two separate halves, which may then be further mutated at random. This becomes the new population.

The new population then goes through the next generation, and the process starts again.
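Those rounds can be sketched in a few lines of Python. Everything here is illustrative (a toy target string as the fitness goal, arbitrary population size and mutation rate), just to show selection, half-and-half crossover and random mutation working together:

```python
import random

TARGET = "HELLO WORLD"  # toy goal: evolve strings towards this
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def fitness(s):
    """Count positions matching the target (higher is fitter)."""
    return sum(a == b for a, b in zip(s, TARGET))

def crossover(a, b):
    """Split two parents in half and merge the halves."""
    mid = len(a) // 2
    return a[:mid] + b[mid:]

def mutate(s, rate=0.05):
    """Randomly replace each character with probability `rate`."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in s)

def evolve(pop_size=100, generations=200):
    pop = ["".join(random.choice(ALPHABET) for _ in TARGET)
           for _ in range(pop_size)]
    for _ in range(generations):
        # keep the fittest half of the population, delete the rest
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]
        # refill by crossing random pairs of survivors and mutating
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

Run it a few times and `best` creeps towards the target; how quickly depends entirely on the mutation rate and population size.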

---

To answer the questions, then:

*Can computers provide an evolutionary environment?*

Most definitely! It might be confined to the lab at the moment, but programming is such an incredibly powerful thing. Just imagine what could be done with a truly genetic virus: each virus has its own string from the population. Those that get caught don't check in after x amount of time. Those that don't get caught upload their code to the cloud to create a new wave.

*What might the consequences be?*

In research, more or less anything can be done - another use of genetic algorithms is the design of acoustically optimal buildings. The consequences of that are wonderful! However, if someone malicious tries something evolutionary, we could face a pandemic.


----------



## Vertigo (May 24, 2011)

Fascinating, Lenny. I've never looked much into genetic algorithms (lots of inclination but no time).

It certainly suggests that if we leave these things long enough and give them a sufficiently complex environment, who knows what they might develop into. I thought it particularly interesting that the slower "predators" evolved pack hunting in such a relatively short time span.



> Those that get caught don't check in after x amount of time


I sort of prefer the idea that when two such "entities" meet they could spawn a new one with a combination of each of their codes rather than uploading their code after a set period of time. Otherwise you run into the problem of how long that time should be.


----------



## Lenny (May 24, 2011)

If we pretend we're not discussing how to create a virus using genetic algorithms... 

First off, let's define our new population as the Cartesian product of two sets, x and y, whose elements are the halves of the members of the previous population.

So for population member 2, "ABCDEF": x_2 = "ABC" and y_2 = "DEF". Set x is the set of elements x_i which are the first halves of the previous population's strings, and set y is the set of elements y_j which are the second halves.

Set z, the new population, is the Cartesian product x × y = {(x_1, y_1), (x_1, y_2), ..., (x_2, y_1), (x_2, y_2), ..., (x_i, y_j)}.
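That Cartesian-product crossover is a one-liner with `itertools` — a sketch, with a made-up two-member population:

```python
from itertools import product

def cartesian_crossover(population):
    """New population = Cartesian product of first halves (set x)
    and second halves (set y) of the old population's strings."""
    xs = [s[:len(s) // 2] for s in population]  # set x: first halves
    ys = [s[len(s) // 2:] for s in population]  # set y: second halves
    return [x + y for x, y in product(xs, ys)]

children = cartesian_crossover(["ABCDEF", "GHIJKL"])
# x = {"ABC", "GHI"}, y = {"DEF", "JKL"}
# children: ["ABCDEF", "ABCJKL", "GHIDEF", "GHIJKL"]
```

One thing to note about this scheme: a population of n members produces n² children (and reproduces each parent), so you'd want the fitness function to cull aggressively before the next round.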

Oh how I am wishing for a LaTeX BBCode right now. 

---

Regarding the time frame, I suppose the limiting factor is the time taken by the big security companies to develop a fix for the virus or work out how to identify it. If we take that out of the equation for now, I'd say a good time frame would be about a week. Let's suppose that a secure user has a number of different virus and malware scanners, some of which run in the background all the time, and some of which are scheduled to run at least once during the week. If we have 1,000 of these users, then we can safely assume that, despite overlaps, our malicious little creation will be on the radar of a whole variety of scanners. Some will pick up the virus, and others won't. At the end of the week, we hopefully have a number left, which we then have contact the mothership (to avoid detection by the firewall, we'd need to piggyback on some already-allowed process). The mothership evolves the population, updates all the viruses out there and sends out more to the uninfected machines.

So, with the worst case scenario of a well-protected user, we're laughing. But what about an unprotected user? Seeing as the virus will avoid detection completely, and have an enjoyable week hanging around with its new group of undetected buddies on the same machine, we could end up diluting the population with genes that haven't been tested.

We could send out multiple copies of each population element, and evolve only the members of the population that return more than 95% of their messages. But even then, we may have the bad luck that every copy of one population element infects unprotected machines.

Alternatively, the message to the mothership could also contain a list of anti-virus and anti-malware programs installed on the machine, and only those that survived a certain number of scans get evolved.

We could even put the two together - the mothership evolves the population members with the highest percentage of copies that survived more than _n_ scanners.

---

If two entities meet and create a new population between them and upload to the mothership, how would you determine the fitness of the parents, and subsequently the child?

So, say the mothership looks at this new child and evaluates the parents. If the same parameters as above were used (both parents must have survived to create a child), then we'd get the same results by a different method.

If the parameters are relaxed, and a child is fit if at least one parent is fit, we may dilute the population.

---

As with all heuristic algorithms, so much depends on the fitness function - a bad fitness function could leave you with a less than optimal population even after a million iterations, but a very strict fitness function might leave you with nothing after only a few hundred iterations.

---

When we're doing genetic algorithms for something like TSP, there's no such time frame, obviously. As soon as a new population is formed, it's put through the fitness function and the most fit are saved for mutation.
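For TSP, there is a natural fitness function: the inverse of the total tour length (shorter tour, fitter member). A minimal sketch, with made-up city coordinates chosen so the optimal tour is obvious:

```python
import math

# hypothetical cities at the corners of a 3x4 rectangle, so the
# optimal tour is the perimeter: 3 + 4 + 3 + 4 = 14
CITIES = [(0, 0), (3, 0), (3, 4), (0, 4)]

def tour_length(tour):
    """Total Euclidean length of a closed tour over city indices."""
    return sum(math.dist(CITIES[tour[i]], CITIES[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def fitness(tour):
    """Shorter tours are fitter, so invert the length."""
    return 1.0 / tour_length(tour)
```

The one wrinkle for a TSP genetic algorithm is crossover: naively splicing two tours together can duplicate or drop cities, so the merge step has to repair the child back into a valid permutation.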


----------



## Vertigo (May 24, 2011)

Yeah, I can see where you're coming from; however, I wouldn't be too worried about "untested parents": surely that should only slow the process down some. Ultimately the weak algorithms will still get weeded out by the anti-virus programs.

Incidentally, from a malicious perspective, I suspect in the long run they would eventually become non-malicious (assuming all aspects of their behaviour are subject to mutation), as the quieter and less noticeable they are, the more likely they are to escape detection. If they do damage then they will immediately be noticed and code developed to destroy them. In this scenario I can picture them ending up quietly "living" in your computer, attempting to be as unnoticeable as possible. Possibly even distributing different parts of themselves across multiple computers to minimise their "noise".



> Oh how I am wishing for a LaTeX BBCode right now


That reminds me of an incident in my very earliest days of programming. We were developing navigation software for the offshore oil industry and did a lot of Kalman filtering (is that the right name? It was so long ago), which, in the case of submersibles, involved a lot of 3-dimensional matrix maths. We did this on HP machines using HP "scientific BASIC", which had matrix maths built in. Then we started moving things to PCs when they first appeared (that dates me), but they were so much slower that we decided to move to a compiled language and chose C. However, we were so naive back then that when we got our first shiny new C compiler, one of the first things we asked was "so where are the matrix maths routines then?... aaaaah". We brought in a maths student who was taking a year's industrial experience to write some for us.

Incidentally, you couldn't recommend any good reading on the topic, could you, Lenny? Preferably biased towards C/C++, as they are my preferred languages.


----------



## Lenny (May 25, 2011)

Weak algorithms would be weeded out on the protected computers, yes, but on the unprotected ones you could do anything and not get caught. Whilst an unprotected machine is the best case for maliciousness, it is the worst case for experimenting with undetectable viruses.

I hadn't given thought to them possibly tending towards niceness. It does make sense, though. I wonder if it could be given as a reason why we don't seem to have viruses based on genetic algorithms?

Thinking about it more, though, if the code itself is just a receiver, and doesn't carry a malicious payload, you could use genetic algorithms to somewhat mimic conficker - infect millions of machines with undetectable code that, when the time comes, receives instructions, or a malicious payload, and wreaks havoc.

Terrifying!

I can't think of any good reads off the top of my head, but I'll have a look through my old lectures after my exam tomorrow and see what my lecturer referenced. We were taught to implement them using Java, so I can't promise anything in C/C++, I'm afraid.


----------



## Vertigo (May 25, 2011)

Well I can read Java code almost as easily as C++ so that's not a problem.

I do agree that unprotected computers would tend to mess up the evolution, but surely it is at least starting to be standard practice for people to protect their computers (or am I just being naive).

I'm not sure that genetic algorithms would be good for sending later instructions unless the code to receive the instructions is excluded from the mutations but if that were the case then it would surely make it much more vulnerable.

Oh, and good luck with tomorrow's exam. Shouldn't you be revising rather than posting here? Or do you figure if you don't know it now then it's too late?


----------



## Vertigo (May 25, 2011)

**deleted**

Arrgh, sorry: my post didn't appear at first, so I posted again.


----------



## Lenny (May 25, 2011)

It's never too late to revise! Just under 24 hours to go, so there's more than enough time (even with a few breaks), and it's on the Advanced Theory of Computation - specifically computational complexity and advanced algorithms (alas, no genetic ones, though), which I know my way around.

---

I quite fancy trying to write a simulation of this problem. Something simple to begin with (say each population is a string of stats, and each virus scanner detects a virus if certain stats are above a specified limit), which can be grown into something more real.


----------



## Vertigo (May 25, 2011)

I would love to find the time to do the same; who knows, maybe...

If you manage to put something together I would love to see it!

Remarkably little interest from any others on this... oh well, I guess us programming geeks can chat away on our own.


----------



## Metryq (May 25, 2011)

Vertigo said:


> Remarkably little interest from any others on this... oh well I guess us programming geeks can chat away on our own



It's not a lack of interest. Maybe the rest of us are just happy to sit here quietly and learn—_NO! Don't sit over there!_


----------



## Vertigo (May 25, 2011)

hehe


----------



## mithril (Jul 11, 2011)

Not sure whether it's possible or not, but the Rifters trilogy by Peter Watts is one instance in fiction that I've read, and it uses the concept pretty well too. Though I didn't really understand it all that well, it sounded plausible enough for the stories.

You might want to look that up to see how the concept is tackled in fiction.


----------



## Dozmonic (Jul 11, 2011)

Is it good reading regarding AI itself that you're after, Vertigo? Books such as http://www.markwatson.com/opencontent/JavaAI3rd.pdf cover what my Open University course generally covered on AI.

A general search on genetic algorithms on the net will show you the concepts of those. You're looking at breeding algorithms and scoring them or making them compete against each other to provide a breeding pool for the next generation, then cross breeding the algorithms, introducing genetic mutation to allow variation and so forth. It's one of the more interesting aspects of AI (the basic search routines are a fundamental part of it, but boring as anything!).


----------



## Vertigo (Jul 11, 2011)

That looks interesting, Dozmonic. I shall have to take a browse if ever I can find the time.


----------

