Tired of photoshopping your images to extract their foreground? This post will explain what the GrabCut algorithm is and how to use it for automatic image segmentation with a hands-on OpenCV tutorial!
On my latest project, the first step of the algorithm we designed was seemingly simple: extract the main contour of an object on a white background. In this post, we’ll see how to solve this problem using GrabCut, a smart segmentation algorithm.
Go to the last section for a tutorial with open source code to try it out!
For the purpose of this post, let’s imagine that our object of interest is a spoon. We want to know everything about this spoon, from its length to the shape of each part. In order to do so, the first step is to extract its main contour.
Contours extracted with “findContours” are as good as the initial image binarization. In this example, we did not manage to obtain such a binarization. Indeed, pixels from the spoon were converted to 0s, notably around the flash zone. As a result, the main extracted contour is discontinuous. Moreover, the numerous subcontours present on the final image are noise that prevents us from retrieving the main shape of the spoon.
At this point, we took a step back. What are the initial problem characteristics?
one main object of interest on a united background
the object of interest is roughly always at the same position on the photo
not a lot of data, so training a machine learning algorithm is not really an option
Stated this way, our initial contours extraction problem can be solved with image segmentation. More specifically, foreground extraction is sufficient in this case. This is exactly what the GrabCut algorithm does.
In this section, we’ll dive into GrabCut theory. If you want to code right away, go to the next section!
Initially, GrabCut is an interactive foreground extraction algorithm which purpose is to minimize user interaction. Carsten Rother, Vladimir Kolmogorov and Andrew Blake designed it in 2004.
First, segmenting an image means attributing a label to each of its pixel. In the case of GrabCut and other foreground extraction algorithms, each pixel is either labeled as being in the foreground or the background of the image.
To obtain the segmentation, GrabCut takes advantage of the graph-like structure of an image. Each pixel has several links:
one “n-link” to each of its 4 direct neighbors
two “t-link” to the source and sink nodes of the graph, representing respectively the image foreground and background. If you feel lost don’t worry, I’ll go into the details in the next paragraph.
After graph construction, the image segmentation task consists of finding the cut of minimal cost that separates foreground and background. Let’s see how the graph is built!
To build the graph, you need to perform these two steps:
Define the weights attributed to each type of link
Define the associated cost function. Image segmentation is then equivalent to minimizing this cost function.
To obtain a relevant segmentation, weights attributed to the two types of links need to reflect what a good segmentation is.
“n-links” represent local information about a pixel and its direct surroundings.
The intuition behind it is that you want similar pixels to be grouped together: it should be more expensive to cut a link between two pixels close in color than between two different pixels
“t-links” represent global information about color distribution in the foreground and the background of the image. A t-link weight shows how well a pixel fits the background/foreground model. Gaussian Mixture Models (GMMs) model them in GrabCut.
I won’t go into GMMs details but if you want an introduction, this great article by brilliant.org is a good place to start.
Once the weights are defined, the cost function or energy function E is their sum over the graph:
GrabCut iteration loop: 1. Associate weights to graph links using existing labeling 2. Forget current labeling but keep weights 3. Use MinCut algorithm to segment the image.
Once the cost function is defined, solving the image segmentation problem is equivalent to solving a MinCut problem. This is a classic graph theory problem that a MinCut algorithm can solve.
Finally, GrabCut is an iterative algorithm. The initial output segmentation is used to re-estimate graph weights and then refine the image segmentation again, resulting in the GrabCut iterative loop.
To estimate graph weights for the first time, GrabCut needs some type of labels initialization.
In the original GrabCut version, the algorithm is interactive. The user provides a partial segmentation initialization by drawing a rectangle containing the object of interest. If needed, he can also apply touch-ups at the end of the segmentation to improve it.
In our use-case, we want the algorithm to be fully automatic. To do so, we take advantage of previous knowledge about the images we want to segment. For instance, we know roughly where the object is on the image. This is how we drew the input rectangle.
We’ll see in the code section how to use other input knowledge to improve the segmentation.
GrabCut goes a bit further than the hard segmentation between foreground and background. Once the iterative loop finishes, labels around the segmentation border are refined and classified into four groups: sure background, probable background, probable foreground, and sure foreground.
Then, GrabCut is finished and you’re good to go with the image segmentation!
Original GrabCut: user intervention is minimal but exists
Automatic GrabCut version: initial pixels labels are provided automatically
If you want to know more about the algorithm, you can read the original paper for free!
Lucky for us, OpenCV offers an out-of-the-box implementation of GrabCut. Let’s see how we can use it to solve the original image segmentation problem!
For more details, the whole notebook is available on Github.
First, let’s start with an input similar to the user input provided in the original GrabCut version: a bounding rectangle containing the object of interest.
Run the following gist on the image of your choice to see the result:
Well, that is not exactly the expected result! Here, GrabCut did not have enough initial information to perform well and failed to retrieve an accurate mask for the spoon.
As mentioned earlier, it is possible to provide GrabCut with additional information about the object of interest. This way, we help the initial modeling of the foreground and the background of the image. To do it technically, change the OpenCV mode when calling GrabCut.
One way to provide more information is to start by binarizing the original image and labeling as foreground all output black pixels, that we know are part of the object of interest.
Generally speaking, the more information you can embed in the initial mask, the better to help GrabCut segment your image. This can be done with a multitude of techniques, ranging from image binarization to previous knowledge about image color distribution for instance. You can check out this post for inspiration!
To conclude, if you face a situation with few data and not much touch time to segment an image, GrabCut is a great candidate! The table below sums up its main strong points and disadvantages:
GrabCut for image segmentation summary
If these disadvantages are unacceptable in your use-case, other image segmentation alternatives exist, like deep learning approaches such as Mask-RCNN for instance.
I hope this post was useful and I’d love to have your feedback!
You can also checkout OpenCV official GrabCut tutorial for more examples of applying GrabCut to a real-world image.
Are you looking for Image Recognition Experts? Don't hesitate to contact us!
OCR in the Wild: SOTA in Text Detection and Recognition
Review of 3 papers in Scene Text Detection and Recognition introducing a detection, a recognition and an end-to-end approach
End-to-end Object Detection with Template Matching using Python
Object detection using template matching - no data needed!
5 Ways Computer Vision Is Transforming Retail Industry
Découvrez pourquoi 40 % des retailers prévoient d’utiliser la reconnaissance d’image d‘ici les deux prochaines années.