sparkle
August 29, 2017

Was Darwin a Great Computer Scientist?

How evolution taught us the “genetic algorithm”.
evolutionman

In this article, I am going to explain the concept of genetic algorithm. First, I am going to present its origin and its goal. Then I am going to show you how to implement a genetic algorithm with a short python tutorial.

So, the question is:

How to create a good Artificial Intelligence?

The naive solution is to create an “empirical algorithm” which is a set of rules: “if you meet this conditions, act like that”. I could imagine that with enough rules like this we could reproduce natural intelligence. But the amount of work is colossal and the final solution will never be able to best its creator. Isn’t it depressing to spend a lot of time creating something while knowing it can’t be perfect?

To avoid that, a new idea emerged. What if instead of creating a direct solution we recreate evolution. We could create the first prehistoric fishes, put them in conditions suitable to evolution and let them evolve freely toward man-kind or even further. This idea is called “genetic algorithm” and we are going to build one in the next part. So first let us refresh our memories and try to understand the natural selection theorized by Darwin.

This theory is simple: if a population want to thrive, it must improve by itself constantly, it’s the survival of the fittest. The best element of the population should inspire the offspring, but the other individuals must not be forgotten in order to maintain some diversity and be able to adapt in case of a variation of the natural environment.

How genetic algorithms work, from one generation to the other
How genetic algorithms work, from one generation to the other

Genetic algorithms are especially efficient with optimization problems.

Example: the Knapsack problem

The backpack optimization is a classical algorithm problem. You have two things: a backpack with a size (the weight it can hold) and a set of boxes with different weights and different values. The goal is to fill the bag to make it as valuable as possible without exceeding the maximum weight. It is a famous mathematical problem since 1972. The genetic algorithm is well suited to solve that because it’s an optimization problem with a lot of possible solutions.

Other algorithms aren’t efficient at all, see the Wikipedia page if you want to understand why.

The Knapsack problem can be solved efficiently with a genetic algorithm
The Knapsack problem can be solved efficiently with a genetic algorithm

In order to test by ourselves how a genetic algorithm works, we are going to use it to solve a simple problem: how could I crack my colleague’s password?

The algorithm is implemented on Python 3. You can download it here for Windows, or install it using brew install python3 , sudo apt-get install python3 or sudo yum install python3 . I advise you to run this code inside a Jupyter notebook.

Choosing a fitness function

The evaluation function is the first step to create a genetic algorithm. It’s the function that estimates the success of our specimen: it will allow us to divide the population between the ugly duckling and the beautiful swans. The goal of this separation is that, later, the successful specimen will have more “chance” to get picked to form the next generation. As simple as it appears, don’t be fooled, it’s the step of a genetic algorithm with the more intelligence related to the problem.

What is our goal? Crack a password. Thus the goal of our function is to transform the binary result “fail or success” in a continuous mark from 0 (can’t fail more) to 100 (perfection).

The simplest solution here is:

fitness score = (number of char correct) / (total number of char)

That way, an individual with a bigger fitness result is a specimen genetically closer to success than the others. Thus the fitness function for our genetic algorithm will accurately sort the population.

The fitness function of our genetic algorithm

Creating our individuals

So now we know how to evaluate our individuals; but how do we define them? This part is really tricky: the goal is to know what are the unalterable characteristics and what is variable.

The comparison with genetics is here really helpful. Indeed, the DNA is composed of genes, and each of those genes comes through different alleles (different versions of this gene). Genetic algorithms retain this concept of population’s DNA.

In our case, our individuals are going to be words (obviously of equal length with the password). Each letter is a gene and the value of the letter is the allele. In the word “banana”: ‘b’ is the allele of the first letter.

What is the point of this creation?

  • We know that each of our individuals is keeping the good shape (a word with the correct size)

  • Our population can cover every possibility (every word possible with this size).

Out genetic algorithm can then explore all possible combinations.

Creating our first population

Now, we know what are the characteristics of our individuals and how we can evaluate their performance. We can now start the “evolution” step of our genetic algorithm.

The main idea to keep in mind when we create the first population is that we must not point the population towards a solution that seems good. We must make the population as wide as possible and make it cover as many possibilities as possible. The perfect first population of a genetic algorithm should cover every existing allele.

So in our case, we are just going to create words only composed of random letters.

The evolution step of our genetic algorithm

From one generation to the next

Given a generation, in order to create the next one, we have 2 things to do. First we select a specific part of our current generation. Then the genetic algorithm combines those breeders in order to create the next batch.

Breeders selection

They are lots of way to do this but you must keep in mind two ideas: the goals are to select the best solutions of the previous generation and not to completely put aside the others. The hazard is: if you select only the good solutions at the beginning of the genetic algorithm you are going to converge really quickly towards a local minimum and not towards the best solution possible.

My solution to do that is to select on the one hand the N better specimen (in my code, N = best_sample) and on the other hand to select M random individuals without distinction of fitness (M = lucky_few).

The breeders selection in our genetic algorithm

Breeding

We can keep on the biologic analogy for the breeding in our genetic algorithm. The goal of sexual reproduction is to mix the DNA of 2 individuals, so let’s do the same thing here. We have two individuals: “Tom” and “Jerry”, their DNA is defined by their alleles (the value of each letter). Thus in order to mix their DNA, we just have to mix their letters. There are lots of ways to do this so I picked the simplest solution: for each letter of the child, take randomly the letter of “Tom” or “Jerry”.

NB: Obviously, the couple “Tom" and “Jerry” isn’t going to produce only one child. You have to fix the number of children per couple in order to keep a stable population in your genetic algorithm. The number of individuals in the generation 0 equals the number of individuals in the next generation.

The breeding step of our genetic algorithm

Mutation

This last step of our genetic algorithm is the natural mutation of an individual. After the breeding, each individual must have a small probability to see their DNA change a little bit. The goal of this operation is to prevent the algorithm to be blocked in a local minimum.

In order to choose the mutation rate for our genetic algorithm, I followed this article counsels. I studied the influence of mutation more precisely here.

The mutation step of our genetic algorithm

Now you have all the tools to build your own genetic algorithm. Feel free to modify my own implementation. For each step, a lot of solutions are possible, take the fittest to solve your problem.

Here is my git repository with the whole code, here is the gist version.

Wanna go further?

Here is a list of other supports to train, understand and even compete on the field of AI and genetic algorithm.

Web application

BoxCar is an online example of a genetic algorithm. The goal is to create the most efficient two . wheels vehicles. Check the result here

Mobile Application

With this application, you are more solicited than in the previous. You have to create a “creature” with joints, bones, and muscles. Then, the genetic algorithm tries to optimize the moves of your creature in order to execute a task: jump, run, climb stairs and others.

Check the app here.

Do it yourself

Finally, last but not least: Coding Game. It’s a platform linking lots of dev around a common passion “AI”. Every month there is a one-week long contest where you must create the best artificial intelligence possible. The winner is nearly always using the genetic algorithm. So if you feel ready to go to the big leagues, take a deep breath and jump in here.

Thanks to Flavian Hautbois and Tristan Roussel. 

convolutional layer and convolution kernel

About Convolutional Layer and Convolution Kernel

A story of Convnet in machine learning from the perspective of kernel sizes.

rboy

Basics in R Programming

You are about to begin a project on R? Before you watch any tutorial, read these basic standards.

jupyter graph

Why Jupyter Is Not My Ideal Notebook

From notebook prototyping to production the right way.