November 20, 2020

GrabCut for Automatic Image Segmentation [OpenCV Tutorial]

Tired of photoshopping your images to extract their foreground? This post will explain what the GrabCut algorithm is and how to use it for automatic image segmentation with a hands-on OpenCV tutorial!

Automatic GrabCut on Baby Groot

On my latest project, the first step of the algorithm we designed was seemingly simple: extract the main contour of an object on a white background. In this post, we’ll see how to solve this problem using GrabCut, a smart segmentation algorithm.

Go to the last section for a tutorial with open source code to try it out!

Why we used GrabCut for image segmentation

For the purpose of this post, let’s imagine that our object of interest is a spoon. We want to know everything about this spoon, from its length to the shape of each part. In order to do so, the first step is to extract its main contour.

Our first idea was to use OpenCV’s “findContours”. As its name suggests, this method extracts contours from a binarized image. As you can see below, it did not go as planned!

Spoon contours extraction with OpenCV findContours method. The main spoon contour is discontinuous.

Contours extracted with “findContours” are only as good as the initial image binarization. In this example, we did not manage to obtain a clean one: some pixels from the spoon were converted to 0s, notably around the flash zone. As a result, the main extracted contour is discontinuous. Moreover, the numerous subcontours present on the final image are noise that prevents us from retrieving the main shape of the spoon.

At this point, we took a step back. What are the initial problem characteristics?

  • one main object of interest on a uniform background

  • the object of interest is roughly always at the same position on the photo

  • not a lot of data, so training a machine learning algorithm is not really an option

Stated this way, our initial contour extraction problem can be solved with image segmentation. More specifically, foreground extraction is sufficient in this case. This is exactly what the GrabCut algorithm does.

A little bit of theory: how does GrabCut work?

In this section, we’ll dive into GrabCut theory. If you want to code right away, go to the next section!

GrabCut was originally designed as an interactive foreground extraction algorithm whose purpose is to minimize user interaction. Carsten Rother, Vladimir Kolmogorov and Andrew Blake introduced it in 2004.

Problem formulation: what does image segmentation mean?

First, segmenting an image means attributing a label to each of its pixels. In the case of GrabCut and other foreground extraction algorithms, each pixel is labeled as being either in the foreground or in the background of the image.

Problem formulation: what it means to segment an image. Every pixel is either labeled as foreground (green pixels) or background (blue pixels)

Graph modeling

To obtain the segmentation, GrabCut takes advantage of the graph-like structure of an image. Each pixel has several links:

  • one “n-link” to each of its 4 direct neighbors

  • two “t-links” to the source and sink nodes of the graph, representing respectively the image foreground and background. If you feel lost don’t worry, I’ll go into the details in the next paragraph.

After graph construction, the image segmentation task consists of finding the cut of minimal cost that separates foreground and background. Let’s see how the graph is built!

How to model the original image as a graph to solve the image segmentation problem

GrabCut graph definition

To build the graph, you need to perform these two steps:

  1. Define the weights attributed to each type of link

  2. Define the associated cost function. Image segmentation is then equivalent to minimizing this cost function.

To obtain a relevant segmentation, weights attributed to the two types of links need to reflect what a good segmentation is.

  • “n-links” represent local information about a pixel and its direct surroundings.

The smoothness term takes into account n-link weights between neighboring pixels separated by the segmentation (different alpha values). The more similar the pixels are, the higher the cost. Alpha represents the pixel labels and z the pixel intensities.

The intuition behind it is that you want similar pixels to be grouped together: it should be more expensive to cut a link between two pixels close in color than between two very different pixels.

  • “t-links” represent global information about color distribution in the foreground and the background of the image. A t-link weight shows how well a pixel fits the background/foreground model. In GrabCut, these models are Gaussian Mixture Models (GMMs).

GrabCut GMMs Data Term
The data term takes into account foreground and background color modeling. Alpha represents the pixel labels, k and theta the GMM component assignments and parameters, and z the pixel intensities. The details for D are available in the paper.

I won’t go into the details of GMMs, but if you want an introduction, this great article is a good place to start.

Once the weights are defined, the cost function or energy function E is their sum over the graph:

E(α, k, θ, z) = U(α, k, θ, z) + V(α, z)
Where E is the graph energy function. Alpha represents the pixel labels, k and theta the GMM component assignments and parameters, and z the pixel intensities. U is the data term and V the smoothness term, representing respectively global and local information.
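Going back to the n-links for a moment, their weighting can be sketched in a few lines of NumPy. In the GrabCut paper, the weight between two neighbors m and n is γ·exp(−β‖z_m − z_n‖²), where β is the inverse of twice the average squared color difference and γ is set to 50. The function name below is made up, and estimating β from horizontal neighbors only is a simplification for this example.

```python
import numpy as np


def horizontal_n_link_weights(image, gamma=50.0):
    """Weights of the n-links between each pixel and its right-hand neighbor.

    weight = gamma * exp(-beta * ||z_m - z_n||^2), as in the GrabCut paper.
    """
    z = image.astype(np.float64)
    # Squared color distance between horizontally adjacent pixels.
    sq_dist = np.sum((z[:, 1:] - z[:, :-1]) ** 2, axis=-1)
    # beta normalizes by the average contrast; estimating it from horizontal
    # pairs only is a simplification made for this sketch.
    mean_sq_dist = sq_dist.mean()
    beta = 1.0 / (2.0 * mean_sq_dist) if mean_sq_dist > 0 else 0.0
    return gamma * np.exp(-beta * sq_dist)


# One row of three pixels: two similar dark pixels, then a bright one.
row = np.array([[[10, 10, 10], [12, 12, 12], [200, 200, 200]]], dtype=np.uint8)
weights = horizontal_n_link_weights(row)
print(weights)  # the first link (similar pixels) is the more expensive to cut
```

The link between the two dark pixels gets a weight close to γ, while the link across the strong color change is much cheaper, which is where a good cut is expected to pass.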

Graph theory to solve image segmentation

GrabCut iterative loop

GrabCut iteration loop: 1. Associate weights to graph links using existing labeling 2. Forget current labeling but keep weights 3. Use MinCut algorithm to segment the image.

Once the cost function is defined, solving the image segmentation problem is equivalent to solving a MinCut problem. This is a classic graph theory problem that a MinCut algorithm can solve.
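To see what this means in practice, here is a toy sketch on a 1x4 “image”. It uses Edmonds-Karp, a simple textbook max-flow algorithm, and is not meant to mirror what OpenCV does internally. All capacities are made-up numbers: cutting the link between “S” and a pixel sends that pixel to the background, cutting the link between a pixel and “T” sends it to the foreground, and each cut link costs its capacity.

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Return (cut cost, nodes on the source side) using Edmonds-Karp max-flow."""
    # Residual graph: every edge gets a reverse edge, initially with capacity 0.
    residual = {}
    for (u, v), cap in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + cap
        residual[v].setdefault(u, 0)

    def reachable_from_source():
        """Breadth-first search along edges that still have spare capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            # No augmenting path left: the reachable nodes form the source side.
            return flow, set(parent)
        # Walk back from the sink to find the bottleneck, then push flow.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck


# "S" is the foreground terminal, "T" the background terminal (toy values).
capacity = {
    ("S", 0): 9, (0, "T"): 1,   # pixel 0 clearly looks like foreground
    ("S", 1): 4, (1, "T"): 5,   # pixel 1 is ambiguous, slightly background-like
    ("S", 2): 1, (2, "T"): 8,   # pixel 2 looks like background
    ("S", 3): 1, (3, "T"): 9,   # pixel 3 looks like background
    # n-links (both directions): strong between similar neighbors, weak otherwise
    (0, 1): 6, (1, 0): 6,
    (1, 2): 1, (2, 1): 1,
    (2, 3): 6, (3, 2): 6,
}

cost, source_side = min_cut(capacity, "S", "T")
foreground = sorted(p for p in source_side if p != "S")
print(cost, foreground)  # 9 [0, 1]
```

Note how pixel 1 ends up in the foreground even though its t-links slightly favor the background: its strong n-link with pixel 0 makes separating the two more expensive than keeping them together.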

Finally, GrabCut is an iterative algorithm. The initial output segmentation is used to re-estimate graph weights and then refine the image segmentation again, resulting in the GrabCut iterative loop.

Initialize the labels

To estimate graph weights for the first time, GrabCut needs some kind of label initialization.

In the original GrabCut version, the algorithm is interactive. The user provides a partial segmentation initialization by drawing a rectangle containing the object of interest. If needed, they can also apply touch-ups at the end of the segmentation to improve it.

GrabCut initial input
Initial user input for GrabCut: the user draws a rectangle containing the entire object of interest.

In our use-case, we want the algorithm to be fully automatic. To do so, we take advantage of previous knowledge about the images we want to segment. For instance, we know roughly where the object is on the image. This is how we drew the input rectangle.
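As an illustration, here is a minimal sketch of how such a rectangle could be derived from the image size alone. The helper central_rect and the 10% margin are assumptions for this example, not the values used on the project.

```python
def central_rect(image_shape, margin_ratio=0.1):
    """Return an (x, y, width, height) rectangle leaving a margin on every side."""
    height, width = image_shape[:2]
    x = round(width * margin_ratio)
    y = round(height * margin_ratio)
    return x, y, width - 2 * x, height - 2 * y


# Shape of a 100 x 200 color image, as returned by image.shape.
rect = central_rect((100, 200, 3), margin_ratio=0.1)
print(rect)  # (20, 10, 160, 80)
```

The (x, y, width, height) ordering matches what OpenCV expects for the rectangle passed to grabCut. The margin matters: GrabCut needs some pixels outside the rectangle to learn what the background looks like.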

We’ll see in the code section how to use other input knowledge to improve the segmentation.

Refine the segmentation

GrabCut goes a bit further than the hard segmentation between foreground and background. Once the iterative loop finishes, labels around the segmentation border are refined and classified into four groups: sure background, probable background, probable foreground, and sure foreground.

GrabCut segmentation refinement

Then, GrabCut is finished and you’re good to go with the image segmentation!

To sum it up: GrabCut in a nutshell

original GrabCut for image segmentation

Original GrabCut: user intervention is minimal but exists

Automatic GrabCut

Automatic GrabCut version: initial pixels labels are provided automatically

If you want to know more about the algorithm, you can read the original paper for free!

Let’s code: hands-on OpenCV GrabCut tutorial!

Lucky for us, OpenCV offers an out-of-the-box implementation of GrabCut. Let’s see how we can use it to solve the original image segmentation problem!

For more details, the whole notebook is available on GitHub.

GrabCut initialized with a bounding box

First, let’s start with an input similar to the user input provided in the original GrabCut version: a bounding rectangle containing the object of interest.

Run the following gist on the image of your choice to see the result:

Spoon image segmentation with GrabCut boundary rectangle initialization

Well, that is not exactly the expected result! Here, GrabCut did not have enough initial information to perform well and failed to retrieve an accurate mask for the spoon.

GrabCut with a combined initialization

As mentioned earlier, it is possible to provide GrabCut with additional information about the object of interest. This way, we help the initial modeling of the foreground and the background of the image. Technically, this means changing the mode argument passed to OpenCV’s grabCut function, for instance from cv2.GC_INIT_WITH_RECT to cv2.GC_INIT_WITH_MASK.

One way to provide more information is to start by binarizing the original image and labeling all black output pixels, which we know are part of the object of interest, as foreground.

Spoon image segmentation with GrabCut combined initialization

Generally speaking, the more information you can embed in the initial mask, the better GrabCut will segment your image. This can be done with a multitude of techniques, ranging from image binarization to prior knowledge about the image color distribution. You can check out this post for inspiration!

To conclude, if you face a situation with little data and not much time to segment an image, GrabCut is a great candidate! The table below sums up its main strengths and disadvantages:

GrabCut for image segmentation summary

If these disadvantages are unacceptable in your use-case, other image segmentation approaches exist, such as deep learning models like Mask R-CNN.

I hope this post was useful and I’d love to have your feedback!

You can also check out the official OpenCV GrabCut tutorial for more examples of applying GrabCut to a real-world image.
