It happened again. Last week, as I was explaining my job to someone, they interrupted me and said, "So you're building Skynet." I felt I had to show them this meme, which I thought described my situation pretty well.
Artificial General Intelligence and pragmatic thinking
Needless to say, superhuman AI is nowhere near happening. Nonetheless, I think the public is fascinated by the idea of super-intelligent computers taking over the world. This fascination has a name: the myth of the singularity. The singularity refers to the point in time when an artificial intelligence would enter a process of exponential self-improvement: a piece of software so intelligent that it could improve itself faster and faster. From then on, technical progress would become the exclusive doing of AIs, with unforeseeable repercussions for the fate of the human species.
The singularity is linked to the concept of Artificial General Intelligence. An Artificial General Intelligence can be defined as an AI that can perform any task that a human can perform. I find this concept far more interesting than the singularity, because its definition is at least somewhat concrete. As a result, you have criteria to decide whether an algorithm is an Artificial General Intelligence or not. I, a human, can design pragmatic and innovative solutions to increase the value of your data. Current AI software can't. Therefore, we haven't reached Artificial General Intelligence.
Even more useful: if we are able to identify the features of human intelligence, then we can know what is missing in our algorithms. And we can improve them.
Let's do that.
How can we characterize human intelligence?
We defined an Artificial General Intelligence (AGI) as an AI that can at least match the capabilities of human intelligence. If we want to go further, it would help to have an idea of what characterizes human intelligence.
We have two options here: either we focus on the nature of human intelligence, or we focus on its characterization. Its nature is where it comes from. Its characterization is how we can recognize it.
Every field of study has its own theories about the nature of human intelligence, thousands of them: psychology, biology, genetics, sociology, cognitive science, mathematics, theology... I know close to nothing about any of them. The good news is that we only need to focus on the characterization of human intelligence.
If we want to get closer to Artificial General Intelligence, our best shot is not to try to reproduce the human brain. The definition of AGI is functional: an AI that can do anything that humans can do. So, what can human intelligence do?
Of course, we can't draw an exhaustive list here. But there are a lot of features we can think of:
- abstract reasoning,
- learning from past experience,
- composition of elements,
- adaptability to new environments,
- problem solving,
- you can go on and on in the comments if you wish.
I promised you 3 reasons why we are far from achieving Artificial General Intelligence. So I'm going to arbitrarily choose three features of human intelligence that our algorithms do not possess at this point:
- Out-of-distribution generalization
- Compositionality
- Conscious reasoning
To be fair, it is not that arbitrary. We are going to focus on these 3 characteristics of human intelligence because we have ideas about how to achieve them. Isn't this exciting?
Artificial General Intelligence, feature by feature
According to the Structural Cognitive Modifiability theory, intelligence is "the unique propensity of human beings to change or modify the structure of their cognitive functioning to adapt to the changing demands of a life situation". We humans are undeniably very good at adapting to major changes. At a very young age, both our bodies and our environment change very fast, and yet babies adapt to these changes and keep on learning.
But in the current state of machine learning, there is no way an AI can adapt to such radical changes. I think I have the perfect example to show you exactly where we stand.
The example of ObjectNet
A few months ago, a team at MIT released ObjectNet. It is intended as a test set for object recognition algorithms, and it consists entirely of pictures of objects seen from weird angles or in unusual environments.
A human would have no trouble recognizing any of these objects, and therefore neither would an Artificial General Intelligence. However, when tested on this dataset, the accuracy of state-of-the-art algorithms drops by 40-45% compared to their performance on the usual ImageNet test set. Even though these algorithms have been trained on thousands of images of hammers or oven gloves, they fail to recognize them when they appear in a previously unseen environment.
The reason is that state-of-the-art machine learning algorithms are bad at generalizing outside the distribution they have been trained on. What they are good at is interpolating within this distribution. In other words, if you show them an image that is quite similar to what they have already seen, one that is likely under the picture of the world they have built from their training images, they will handle it well. But right now, AIs have a very weak capacity for imagination, so their picture of the world is limited to the examples they have been shown.
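To make the interpolation/extrapolation gap concrete, here is a deliberately tiny sketch. It is not ObjectNet and not image classification: as a hypothetical stand-in, a straight line is fitted by least squares to points of y = x² sampled only on [0, 1], then evaluated inside and far outside that range. The function names and the choice of y = x² are illustrative assumptions.

```python
# Toy stand-in for "in-distribution vs out-of-distribution" (hypothetical setup):
# the true relation is y = x**2, but the model is a straight line trained on x in [0, 1].

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, in pure Python."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def true_fn(x):
    return x ** 2

train_x = [i / 20 for i in range(21)]  # 0.00, 0.05, ..., 1.00: the training distribution
train_y = [true_fn(x) for x in train_x]
a, b = fit_line(train_x, train_y)

def error(x):
    """Absolute gap between the model's prediction and the true value."""
    return abs((a * x + b) - true_fn(x))

in_dist = error(0.5)   # inside the training range
out_dist = error(5.0)  # far outside the training range
print(f"error at x=0.5: {in_dist:.3f}")
print(f"error at x=5.0: {out_dist:.3f}")
```

Inside the training range the error stays below 0.1; at x = 5 it exceeds 20. The model never "learned" the curve, only a shape that happens to fit the region it was shown.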
Meta-learning and compositionality
But why are we humans good at this kind of generalization? What do state-of-the-art algorithms lack to reach Artificial General Intelligence? I have two answers to this question. They are of course not exhaustive, but to me they provide satisfying avenues for improvement.
The first answer is meta-learning. Meta-learning can be defined as learning to learn. We say that an agent (human or AI) is learning when its performance at a specific task improves with experience on this task. In comparison, an agent is learning to learn when its performance at each new task improves with experience and with the number of tasks it has seen. The goal of meta-learning is therefore to develop algorithms that can adapt rapidly and efficiently to new tasks. As a result, meta-learning algorithms are usually better at generalizing outside their training distribution, because they have not been trained to specialize in one task: they have been trained to adapt to new, previously unseen data. Humans are the champions of meta-learning, because
- they are trained all their life on an incredibly wide variety of tasks;
- they benefit from the experience of their ancestors. Natural selection is evolution's training strategy. We, like all other species, inherit genes shaped by all our ancestors, who lived in an unimaginably diverse set of situations and environments. This is the most impressive example of meta-learning I can think of.
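"Learning to learn" can be made concrete with a very small example. The sketch below uses Reptile, one simple first-order meta-learning algorithm, swapped in here purely for illustration; the task family (fitting y = a·x + b with random a and b), the learning rates and all function names are hypothetical choices. The inner loop adapts to one task with a few gradient steps; the outer loop moves a shared starting point toward wherever each adaptation ended up.

```python
import random

def sample_task(rng):
    """Hypothetical task family: fit y = a*x + b, with a and b drawn per task."""
    return rng.uniform(2.0, 4.0), rng.uniform(-1.0, 1.0)

def task_data(task, xs):
    a, b = task
    return [(x, a * x + b) for x in xs]

def mse(params, data):
    w, c = params
    return sum((w * x + c - y) ** 2 for x, y in data) / len(data)

def sgd_adapt(params, data, steps, lr):
    """Inner loop: a few plain gradient-descent steps on a single task."""
    w, c = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + c - y) * x for x, y in data) / n
        grad_c = sum(2 * (w * x + c - y) for x, y in data) / n
        w, c = w - lr * grad_w, c - lr * grad_c
    return w, c

def reptile(meta_iters=2000, inner_steps=5, inner_lr=0.1, meta_lr=0.1, seed=0):
    """Outer loop (Reptile-style): nudge the shared initialization toward
    the weights obtained after adapting to each sampled task."""
    rng = random.Random(seed)
    xs = [i / 4 for i in range(-4, 5)]  # nine points in [-1, 1]
    w, c = 0.0, 0.0
    for _ in range(meta_iters):
        data = task_data(sample_task(rng), xs)
        w_t, c_t = sgd_adapt((w, c), data, inner_steps, inner_lr)
        w += meta_lr * (w_t - w)
        c += meta_lr * (c_t - c)
    return w, c

meta_init = reptile()

# A fresh task from the same family, with only three examples to learn from.
support = task_data((3.5, 0.5), [-1.0, 0.0, 1.0])
losses = {}
for name, init in [("zero init", (0.0, 0.0)), ("meta-learned init", meta_init)]:
    adapted = sgd_adapt(init, support, steps=1, lr=0.1)
    losses[name] = mse(adapted, support)
    print(f"{name}: loss after one step = {losses[name]:.3f}")
```

After a single gradient step, the meta-learned starting point fits the new task much better than starting from zero: experience across many tasks has made learning the next one faster, which is the definition given above.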
The second explanation I can offer for why humans are so much better than machine learning algorithms at generalizing to unseen situations is compositionality. And it deserves a whole chapter.
The meaning of compositionality might not be clear at first glance. Chrome even keeps insisting that it's not a word. So let's start with a definition: compositionality is learning, from a finite set of combinations, about a much larger set of combinations. Let's take a look at this example of typical social network garbage.
It is a very good example of compositionality. From a finite set of combinations of three elements (here apples, bananas and coconuts), you should be able to infer the value of any new combination of these elements.
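Here is a sketch of how a puzzle of this kind can be solved mechanically. The three equations below are made up for illustration (they are not the ones from the picture): each one gives how many apples, bananas and coconuts appear and the resulting total. Solving this small linear system recovers the value of each element, and from there the value of any combination follows.

```python
from fractions import Fraction

# Hypothetical puzzle. Columns: apples, bananas, coconuts | total
equations = [
    ([3, 0, 0], 30),  # apple + apple + apple = 30
    ([1, 2, 0], 18),  # apple + banana + banana = 18
    ([0, 1, 1], 6),   # banana + coconut = 6
]

def solve(equations):
    """Gauss-Jordan elimination with exact fractions; assumes a unique solution."""
    n = len(equations)
    rows = [[Fraction(v) for v in coeffs] + [Fraction(total)] for coeffs, total in equations]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_val = rows[col][col]
        rows[col] = [v / pivot_val for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

values = solve(equations)  # value of one apple, one banana, one coconut

def value_of(counts):
    """Value of any combination, including ones that never appear in the puzzle."""
    return sum(c * v for c, v in zip(counts, values))

print(values)
print(value_of([2, 5, 7]))  # 2 apples, 5 bananas, 7 coconuts
```

Three observed combinations are enough to price infinitely many unobserved ones, because what was learned is the value of the parts, not of the wholes.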
Compositionality is, among other things, closely linked to the philosophy of language. The principle of compositionality states that the meaning of an expression is determined by the elements composing it and the way they are combined. "People love apples" has one meaning. "Apples love people" has another. Same elements, combined differently.
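A toy version of this principle fits in a few lines. The lexicon and the "subject verb object" rule below are hypothetical simplifications, not a real semantic theory: each word contributes an atomic meaning, and the sentence structure decides which slot it fills.

```python
# Hypothetical toy lexicon: each word maps to an atomic meaning.
LEXICON = {"people": "PEOPLE", "apples": "APPLES", "love": "LOVE"}

def meaning(sentence):
    """Compose the meaning of a 'Subject Verb Object' sentence from its parts.

    The combination rule is the structure itself: the first word fills the
    agent slot and the last word fills the patient slot.
    """
    subject, verb, obj = sentence.lower().split()
    return {"predicate": LEXICON[verb], "agent": LEXICON[subject], "patient": LEXICON[obj]}

m1 = meaning("People love apples")
m2 = meaning("Apples love people")
print(m1)
print(m2)
print(m1 == m2)  # same parts, different combination, different meaning
```

The same two functions also assign a meaning to sentences never seen before, such as "Apples love apples", which is exactly the payoff of compositionality.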
More generally, we compose elements all the time, both to invent new concepts and objects and to understand them. In their 2015 paper Human-level concept learning through probabilistic program induction, Brenden Lake and his team chose the example of means of transportation.
Through compositionality, we are able to easily imagine new objects. To sum up, we can use what we know about a set of objects to learn about the concepts that compose them, and therefore we can extrapolate to new objects which had zero probability under the distribution of the training dataset.
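The contrast between "modeling whole objects" and "modeling their parts" can be sketched with counting. In this hypothetical example, each vehicle is described by two parts (propulsion, support). A model over whole objects gives probability 0 to any unseen combination; a model over parts, here assuming for simplicity that parts combine independently, gives the same unseen combination a non-zero probability.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical training set: each vehicle is described by (propulsion, support).
training = [
    ("engine", "wheels"),  # car
    ("pedals", "wheels"),  # bicycle
    ("engine", "wings"),   # plane
    ("sail", "hull"),      # sailboat
    ("engine", "hull"),    # motorboat
]
n = len(training)
whole_counts = Counter(training)
propulsion_counts = Counter(p for p, _ in training)
support_counts = Counter(s for _, s in training)

def p_whole(obj):
    """Model over whole objects: anything unseen gets probability 0."""
    return Fraction(whole_counts[obj], n)

def p_parts(obj):
    """Model over parts, assuming (for simplicity) that parts combine independently."""
    propulsion, support = obj
    return Fraction(propulsion_counts[propulsion], n) * Fraction(support_counts[support], n)

novel = ("pedals", "wings")  # never seen in training
print(p_whole(novel), p_parts(novel))
```

The independence assumption is crude, but it shows the mechanism: by learning about parts, the model makes room for objects that had zero probability under the distribution of whole training examples.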
Math interlude: the zero probability
What does it mean to have zero probability under the distribution of the training dataset? In the graph above, the training dataset is the set of all examples represented with a green point. Using this set of examples, most machine learning algorithms model a probability distribution (here with a Gaussian model). This represents what the algorithm thinks is most likely to happen. Some cases, typically because they are close to the cases that actually occurred in the training set, have high probability under the training distribution, even though we have never seen them actually happen. Machine learning algorithms are very good at handling those cases.
Other cases will have zero (or vanishingly small) probability under the training distribution. This doesn't mean that they will never happen. It just means that they are not part of the algorithm's picture of the world, as built from its training dataset. The algorithm will be very bad at handling those cases.
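A graph like the one above can be reproduced numerically with Python's standard library. The eight training values below are made up for illustration. A Gaussian is fitted to them, then its density is evaluated at two points that never occurred: one close to the data and one far away. Strictly speaking, a Gaussian density is never exactly zero, but far from the data it becomes so small that it is zero for all practical purposes.

```python
from statistics import NormalDist

# Hypothetical 1-D training set (the "green points").
train = [1.8, 2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 1.7]
model = NormalDist.from_samples(train)  # fits the mean and the sample standard deviation

near_unseen = model.pdf(2.15)  # never observed, but close to the data
far_unseen = model.pdf(6.0)    # never observed, far from the data

print(f"mean={model.mean:.2f}, stdev={model.stdev:.3f}")
print(f"density at 2.15: {near_unseen:.3f}")
print(f"density at 6.0:  {far_unseen:.3e}")
```

The point at 2.15 gets a density above 1, higher than many points actually present in the training set, while the point at 6.0 gets a density on the order of 1e-57.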
Using compositionality, however, we've seen that we can generate these cases by recombining the elements that make up the cases we have already seen. This is a tremendous opportunity to broaden the perspective of machine learning algorithms. To this day, I believe this is one of our best avenues for improvement towards achieving, maybe, one day in the far future, Artificial General Intelligence.
Conscious reasoning
Consciousness is a very big word. Like all very big words, it has many complicated definitions. Some address the nature of consciousness, others its function, each from a different perspective. We won't try to address the whole concept of consciousness. We will focus on conscious reasoning.
I call it conscious reasoning when we think in an active way. For instance, when you think about your breathing and breathe consciously, you alternately focus on inhaling and exhaling. It is different when you don't think about it. It's hard to imagine that when you don't focus on your breathing (or even when you're asleep), your body handles it as a simple alternation between inhaling and exhaling. The process is more likely handled as a combination of many biological phenomena (many organs contracting and relaxing, transfer of oxygen from air to blood and of carbon dioxide from blood to air...).
This is what is specific to conscious reasoning: it handles reality through very high-level concepts. Typically, these concepts can fit in words or sentences. The best example of this I have heard comes from Yoshua Bengio's talk at NeurIPS 2019, which incidentally inspired this article.
When you drive your car to work and back home, every day the same commute, it becomes automatic. You follow a path you know perfectly and never think about it. However, when you drive to a friend's home far, far away, in a town you have never visited, the way you drive is completely different. You are more focused. You actively think about every turn and read every sign.
This ability to manipulate high-level concepts is another thing that state-of-the-art machine learning algorithms lack. Fortunately, there is still hope.
Global Workspace Theory
In cognitive science, the Global Workspace Theory suggests that there is an information bottleneck: at each instant, only a very small fraction of all perceived information gets through this bottleneck and is broadcast to the whole brain. The idea of a continuous flow of information has been widely challenged by the community. However, there is an interesting takeaway for us here: the high-level concepts we manipulate during conscious reasoning are built on low-level, high-dimensional information, namely all the perceptions that entered the bottleneck.
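The bottleneck idea can be caricatured in code. This is a toy illustration, not a model of the Global Workspace Theory: the perceptions and their salience scores are made up, and "salience" is simply a number. Only the few most salient items get through, and every downstream module receives the same small content.

```python
import heapq

def global_workspace(perceptions, capacity=2):
    """Toy bottleneck: only the `capacity` most salient perceptions get through."""
    winners = heapq.nlargest(capacity, perceptions, key=lambda item: item[1])
    return [name for name, _ in winners]

def broadcast(contents, modules):
    """Every module receives the same (small) workspace content."""
    return {module: list(contents) for module in modules}

# Hypothetical perceptions while driving, with made-up salience scores.
perceptions = [
    ("radio song", 0.2),
    ("road sign: exit 12", 0.9),
    ("engine hum", 0.1),
    ("car ahead braking", 0.95),
    ("seat texture", 0.05),
]
workspace = global_workspace(perceptions)
result = broadcast(workspace, ["planning", "memory", "motor control"])
print(result)
```

Five perceptions go in, two come out, and those two are what every module can work with: a crude picture of many low-level signals being reduced to a few items available for conscious manipulation.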
This is an inspiration for an emerging branch of machine learning: attention mechanisms. They were first introduced by Dzmitry Bahdanau and researchers from the University of Montréal in 2015. Since then, attention mechanisms have enabled huge progress in neural machine translation and natural language processing, along with other technical improvements. For instance, by creating direct connections between distant elements of a sequence, they help mitigate the vanishing gradient problem, a recurring issue in deep neural networks.
The logic behind attention mechanisms is simple: ease the computation by focusing on only a few input elements at a time. Sound familiar? If we keep working on attention mechanisms, we could get closer to humans' ability to link thousands of low-level perceptions to a small number of consciously manipulable high-level concepts.
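Here is a minimal sketch of soft attention in plain Python. Note a swap: Bahdanau et al. scored input elements with a small additive neural network, whereas this sketch uses scaled dot-product scoring (the variant later popularized by Transformers) because it fits in a few lines. The keys, values and query are hypothetical 2-D vectors. Each input element gets a weight, and the weights concentrate on the elements whose key best matches the query.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention over a list of input elements."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)  # one weight per input element, summing to 1
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Hypothetical 2-D encodings of three input elements.
keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [4.0, 0.0]  # "what am I looking for right now?"

weights, context = attention(query, keys, values)
print([round(w, 4) for w in weights])
print([round(c, 4) for c in context])
```

Almost all of the weight goes to the first element, so the output is essentially that element's value: out of everything available, the model has focused on one thing.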
OK then, but how far are we from Artificial General Intelligence?
I let it slip earlier: very far. We must acknowledge that on all three topics, even though we can hope for huge progress in the near future, we are still very far from human performance. We must also remember that these are three of the most promising avenues for improvement, but solving them will not be enough to achieve AGI.
Artificial General Intelligence is an exciting buzzword, because it is either a huge promise or a frightening threat. Like any other buzzword, it must be handled with caution. I must admit that in this article I used it as an excuse to draw your attention to conscious reasoning, compositionality and out-of-distribution generalization, because unlike the singularity or AGI, they represent practical ways to improve machine learning algorithms and actually boost the performance of artificial intelligence.
I hope you enjoyed reading this piece. I also hope that, as you reach the last paragraphs, you are eager to learn more about how human intelligence can help us improve our algorithms. If so, I suggest you take the time to watch this talk by Yoshua Bengio, which inspired this article. If you speak French, you can also watch this video where I explain these concepts to the team at Sicara.