Content-based image retrieval (CBIR) systems find images similar to a query image within an image dataset. The best-known CBIR system is the search-by-image feature of Google Search. This article uses the Keras deep learning framework to perform image retrieval on the MNIST dataset.
Our CBIR system will be based on a convolutional denoising autoencoder, which belongs to a class of unsupervised deep learning algorithms.
To explain what content based image retrieval (CBIR) is, I am going to quote this research paper.
There are two [image retrieval] frameworks: text-based and content-based. The text-based approach can be tracked back to 1970s. In such systems, the images are manually annotated by text descriptors, which are then used by a database management system to perform image retrieval. There are two disadvantages with this approach. The first is that a considerable level of human labour is required for manual annotation. The second is the annotation inaccuracy due to the subjectivity of human perception. To overcome the above disadvantages in text-based retrieval system, content-based image retrieval (CBIR) was introduced in the early 1980s. In CBIR, images are indexed by their visual content, such as color, texture, shapes.
Basically, we first extract features from an image database and store them. Then we compute the features associated with a query image. Finally, we retrieve the images with the closest features.
The key point in content-based image retrieval is the feature extraction. The features correspond to the way we represent an image at a high level. How do we describe the colours in an image? Its texture? The shapes in it? The features we extract should also allow efficient retrieval of the images. This is especially true if we have a big image database.
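The three steps above can be sketched in a few lines of NumPy. This is only an illustration: `extract_features` is a hypothetical placeholder that simply flattens the pixels, whereas the rest of this article uses an autoencoder for that step, and the database is random toy data.

```python
import numpy as np

def extract_features(images):
    """Hypothetical feature extractor: flattens each image into a vector."""
    return images.reshape(len(images), -1).astype("float32")

def retrieve(query_image, database_features, n_results=3):
    """Return the indices of the n_results database images closest to the query."""
    query_features = extract_features(query_image[np.newaxis])[0]
    # Euclidean distance between the query and every stored feature vector.
    distances = np.linalg.norm(database_features - query_features, axis=1)
    return np.argsort(distances)[:n_results]

# Step 1: extract and store the features of the image database (random toy data).
rng = np.random.default_rng(0)
database = rng.random((100, 28, 28))
database_features = extract_features(database)

# Steps 2 and 3: compute the query features and retrieve the closest images.
query = database[42] + 0.01 * rng.random((28, 28))
closest = retrieve(query, database_features)
print(closest)
```

Since the query is a slightly perturbed copy of database image 42, that index should come back first. On a large database, the exhaustive distance computation is the part that would need a more efficient index.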
There are many ways to extract these features.
Another possibility is to use deep learning algorithms. In this research paper the authors demonstrate that convolutional neural networks (CNN) trained for classification purposes can be used to extract a ‘neural code’ for images. These neural codes are the features used to describe images. The paper also shows that this approach performs as well as state-of-the-art approaches on many datasets. The problem with this approach is that we first need labelled data to train the neural network, and the labelling task can be costly and time consuming. Another way to generate these ‘neural codes’ for our image retrieval task is to use an unsupervised deep learning algorithm. This is where the denoising autoencoder comes in.
A denoising autoencoder is a feed-forward neural network that learns to denoise images. By doing so, the neural network learns interesting features from the images used to train it. It can then be used to extract features from images similar to those in the training set.
If you are not familiar with autoencoders, I highly recommend first browsing these three sources:
We use the convolutional denoising autoencoder algorithm provided in the Keras tutorial.
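For reference, here is a sketch of that model, reconstructed from the convolutional denoising autoencoder in the Keras tutorial ("Building Autoencoders in Keras"). It assumes the TensorFlow-bundled Keras imports and already gives the encoder layer a name. The training call is left as a comment because it needs the noisy MNIST arrays, and the exact code used for this article may differ.

```python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

# Encoder: two convolution + pooling stages, 28x28x1 -> 7x7x32.
input_img = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)

# Decoder: mirror of the encoder, 7x7x32 -> 28x28x1.
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

# Training maps noisy images to their clean versions, for example:
# autoencoder.fit(x_train_noisy, x_train, epochs=20, batch_size=128,
#                 shuffle=True, validation_data=(x_test_noisy, x_test))
```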
For a general explanation of the above lines of code, please refer to the Keras tutorial.
Notice that there are small differences compared to the tutorial. The first difference is this line:
encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)
We give the encoder layer a name so that we can access it later.
We also saved the learned model by adding:
This will enable us to load it later in order to test it.
Finally, we reduced the number of epochs from 100 to 20 in order to save time :).
Let’s use our trained model to denoise an input test image.
First we regenerate the noisy data and load the previously trained autoencoder.
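The noise step can be sketched as below, following the recipe from the Keras tutorial: Gaussian noise scaled by a `noise_factor` of 0.5 is added and the result is clipped back to [0, 1]. To keep the snippet self-contained, a random array stands in for the MNIST test images, so it loads neither the real data nor the saved model. The `add_noise` helper and its `seed` argument are names introduced here for illustration.

```python
import numpy as np

def add_noise(images, noise_factor=0.5, seed=0):
    """Add Gaussian noise to images scaled to [0, 1] and clip the result."""
    rng = np.random.default_rng(seed)
    noisy = images + noise_factor * rng.normal(loc=0.0, scale=1.0, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

# Stand-in for the MNIST test set: 10 images of shape 28x28x1 with values in [0, 1].
x_test = np.random.default_rng(1).random((10, 28, 28, 1)).astype("float32")
x_test_noisy = add_noise(x_test)

print(x_test_noisy.shape, x_test_noisy.min(), x_test_noisy.max())
```

Fixing the seed means that regenerating the noisy data produces the same noisy images on every run.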
Then we call the following function, which denoises the first noisy test image and plots it:
The result is
Our image database is the MNIST training dataset.
Our goal is to provide a query image and find the closest MNIST images.
First, we compute the features of the training dataset and the query images:
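One possible way to do this, assuming the autoencoder contains a layer named 'encoder', is to build a second model that stops at that layer and to flatten its output. In the sketch below, a tiny untrained network and random arrays stand in for the saved autoencoder and the MNIST images, so only the mechanics are shown. `compute_features` is a helper name introduced here.

```python
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

# Untrained stand-in for the trained autoencoder (normally loaded from disk).
input_img = Input(shape=(28, 28, 1))
x = Conv2D(8, (3, 3), activation='relu', padding='same')(input_img)
encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)
x = UpSampling2D((2, 2))(encoded)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)

# The encoder shares the autoencoder's weights and outputs the named layer.
encoder = Model(inputs=autoencoder.input,
                outputs=autoencoder.get_layer('encoder').output)

def compute_features(images):
    """Encode images and flatten each code into a single feature vector."""
    codes = encoder.predict(images, verbose=0)
    return codes.reshape(len(codes), -1)

# Random stand-ins for the database (training set) and the query images.
rng = np.random.default_rng(0)
x_train = rng.random((20, 28, 28, 1)).astype('float32')
x_query = rng.random((3, 28, 28, 1)).astype('float32')

train_features = compute_features(x_train)
query_features = compute_features(x_query)
print(train_features.shape, query_features.shape)
```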
Before scoring our model we need to understand the scoring function we will use.
To assess the model, we use the scikit-learn function label_ranking_average_precision_score. This function takes two arrays as input: first, an array of zeros and ones; second, an array of relevance scores.
In our case, we compute the relevance score from the distance between the features of the query image and those of the database images. The lower the distance, the higher the relevance score should be.
We construct the first array following this rule: for each image in the database, if the image has the same label as the query image, we append a ‘1’ to the array. Otherwise we append a ‘0’.
This scoring function returns a maximum score of 1 if the closest images have the same label as the query image. If there are images with a different label that are closer to the query image, the score decreases.
To get a feel for what it does, let’s compute the value of this scoring function on some examples.
Suppose we have a query image with label ‘7’ and four images in our database with the following labels: ‘7’, ‘7’, ‘1’, ‘0’. The first two images of the database are relevant to the query image, and the last two are not. The first array that we pass to the scoring function should therefore be [1, 1, 0, 0]. For each image in our database we then compute a relevance score.
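As an illustration, the sketch below scores three made-up sets of relevance scores for this four-image database. The score values themselves are invented for the example; only their ordering matters to the scoring function.

```python
import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

# Database labels '7', '7', '1', '0' against a query with label '7'.
y_true = np.array([[1, 1, 0, 0]])

# Both relevant images get the highest relevance scores.
perfect = np.array([[0.9, 0.8, 0.3, 0.1]])
# The image labelled '1' is ranked above the second '7'.
one_mistake = np.array([[0.9, 0.2, 0.8, 0.1]])
# Both irrelevant images are ranked above the relevant ones.
worst = np.array([[0.1, 0.2, 0.9, 0.8]])

score_perfect = label_ranking_average_precision_score(y_true, perfect)
score_one_mistake = label_ranking_average_precision_score(y_true, one_mistake)
score_worst = label_ranking_average_precision_score(y_true, worst)

print(score_perfect)      # 1.0
print(score_one_mistake)  # about 0.83
print(score_worst)        # about 0.42
```

In the second case the relevant images sit at ranks 1 and 3, which gives (1/1 + 2/3) / 2 ≈ 0.83. In the third case they sit at ranks 3 and 4, which gives (1/3 + 2/4) / 2 ≈ 0.42.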
For each query image feature, we compute the Euclidean distance to the features of the training dataset images. The smaller the distance, the higher the relevance score should be. Then we apply the scoring function label_ranking_average_precision_score to our results.
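A minimal sketch of this step for a single query is shown below. The features and labels are tiny made-up arrays, `score_query` is a helper name introduced here, and turning distances into relevance scores by negating them is one simple choice assumed for illustration; the original code may use a different transformation.

```python
import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

def score_query(query_feature, query_label, db_features, db_labels):
    """Score the retrieval of one query against the whole database."""
    # Euclidean distance from the query to every database feature vector.
    distances = np.linalg.norm(db_features - query_feature, axis=1)
    # Smaller distance means higher relevance, so negate the distances.
    relevance = -distances
    # 1 where the database image shares the query label, 0 otherwise.
    y_true = (db_labels == query_label).astype(int)
    return label_ranking_average_precision_score(
        y_true[np.newaxis], relevance[np.newaxis])

# Made-up 2-D features: the two '7's are close to the query, the others are far.
db_features = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [8.0, 0.0]])
db_labels = np.array([7, 7, 1, 0])
query_feature = np.array([1.1, 1.0])

print(score_query(query_feature, 7, db_features, db_labels))  # 1.0
```

With real data, the same function would be called once per query image and the resulting scores averaged.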
The y axis corresponds to the score computed with the label ranking average precision scoring function. The x axis corresponds to n, the number of first results assessed.
To better understand this graph, here is an example. Suppose we have a database of 3 images with labels 7, 7, 1, and that the input image has label 7. Suppose our algorithm returns the results in the following order: 7, 1, 7. First we score only the first returned image: the scoring function returns 1. Then we assess the first two returned images [7, 1]: the scoring function returns 1. Then we assess the first three results [7, 1, 7]: the score decreases and is now equal to 0.83, and so on.
Overall, the more retrieved images we assess, the lower the score.
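This example can be reproduced with the sketch below, which keeps only the n first results before scoring. The relevance scores are arbitrary decreasing values standing in for the sorted results.

```python
import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

query_label = 7
# Labels of the retrieved images, already sorted from closest to farthest.
retrieved_labels = np.array([7, 1, 7])
# Arbitrary decreasing relevance scores matching that order.
relevance = np.array([3.0, 2.0, 1.0])

scores = []
for n in range(1, len(retrieved_labels) + 1):
    # Keep only the n first results.
    y_true = (retrieved_labels[:n] == query_label).astype(int)
    score = label_ranking_average_precision_score(
        y_true[np.newaxis], relevance[np.newaxis, :n])
    scores.append(score)

print(scores)  # 1.0, 1.0, then about 0.83
```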
You can find the full code here.
We tested a deep learning image retrieval algorithm on a basic dataset. Our convolutional denoising autoencoder is effective when considering only the first retrieved images. However, we tested it on images that are very similar to one another: we didn’t have to deal with color, scaling and rotation issues.
To learn more about autoencoders for CBIR, you can read this research paper from Alex Krizhevsky and Geoffrey Hinton.