The integral image, or Summed Area Table, was first introduced to us in 1984. However, it was only properly introduced to the world of Computer Vision in 2001 by Viola and Jones: Viola-Jones Object Detection Framework.
The integral image is used as a quick and effective way of calculating the sum of values (pixel values) in a given image - or a rectangular subset of a grid (the given image).
It can also, or is mainly, used for calculating the average intensity within a given image. If one wants to use the integral image, it is normally a wise idea to make sure the image is in grayscale first.
Let us start off with the basics. When creating an integral image, we need to create a summed area table (SAT). In this table, if we go to any point (x,y) then at this table entry we will come across a value. This value itself is quite interesting, as it is the sum of all the pixel values above, to the left and of course including the original pixel value of (x,y) itself:
What's really good about the summed area table, is that we are actually able to construct it with only one pass over of the given image. There will be more on complexity later, but, in order for this fact to become true, all we have to do is accept that the value in the SAT at (x,y) is simply calculated by:
s(x,y) = i(x,y) + s(x-1,y) + s(x,y-1) - s(x-1,y-1)
That is, we get the original pixel value i(x,y) from the given image, and then we add the values directly above this pixel, and directly left to this pixel from the SAT at s(x-1, y) and above at s(x, y-1). Finally, we subtract the value directly top-left of i(x,y) from the SAT – that is, s(x-1, y-1).
Here's a better example. Take the following image with pixel values, and its corresponding SAT:
Currently, we only have one value filled in for the SAT: we have filled in s(x-1,y-1) = 5. Why is this? Well, taking the equation from above, we simply substitute in values (assume for now that s(x-1,y-1) is s(x,y)):
- i(x,y) = 5
- s(x-1,y) = 0
- s(x, y-1) = 0
- s(x-1, y-1) = 0
Here, we have values of 0 for s(x',y'). This is because they were all out-of-bounds of the SAT, and are thus defaulted to a value of 0. Therefore, we have:
- s(x,y) = i(x,y) + s(x-1,y) + s(x,y-1) - s(x-1,y-1)
- s(x,y) = 5 + 0 + 0 - 0
- s(x,y) = 5
Now, I'm just going to substitute in the values for s(x,y-1) and s(x-1,y). It is up to you to check them, and see if they’re are correct – as well as to see if you can use the equation correctly:
This is what those values represent. taking the s(x-1,y) entry of the SAT, we have a value of 8. This value represents the sum of the pixels to the left, above and including itself. We have the image pixel itself with value 3, and from the SAT the above cell with value 5 (the left and top-left cells are 0 for being out-of-bounds). So 3+0+5-0=8, which is the sum of the pixels left, above and including itself.
Finally, I'll show you quickly the calculation for s(x,y) using all 4 values. Below is the completed SAT:
Firstly, substitute in the corresponding values.
- i(x,y) = 6
- s(x-1,y) = 8
- s(x, y-1) = 7
- s(x-1, y-1) = 5
Sticking these values into the equation we get: s(x,y) = 6 + 8 + 7 – 5 = 16. The question is, is this correct? Well yes, it is!
Remembering that 16 is the sum of all pixels top, left and itself, we add up the 4 pixel values in the actual given image: 5 + 2 + 3 + 6 = 16!
In order to do this we only need to use 4 values from the SAT, that is, 4 array references into the summed area table. With these 4 values, we then add or subtract them for the correct value of the sum of the pixels within that region. To do this, we use this equation:
i(x',y') = s(A) + s(D) - s(B) - s(C)
Now for an example. Let's say we wish to calculate the area contained by the green square:
Remember, the value 16 is the total sum of all the image's pixels. But we want the value of the image pixel of just that green square. As you can already see I've labelled the A, B, C and D. This is what each of them are:
Firstly, we have s(A) which encompasses the following cells:
So s(A) will have a value of 5. Next is the region for s(B):
Here, s(B) has value 7 because this is the value of the sum of pixels up to that point. s(C) looks fairly similar:
and will have value 8. As for s(D), this is the sum of all the pixels up to that point:
Therefore we have values:
- A = 5
- B = 7
- C = 8
- D = 16
Knowing this, we can now substitute them into the equation:
- i(x',y') = s(A) + s(D) - s(B) - s(C)
- i(x',y') = 5 + 16 - 7 - 8
- i(x',y') = 6
Therefore, we are left with the value 6. Think it’s wrong? Think again! Go back to the original image pixel values. Now look at the corresponding pixel value. What’s that, a 6!
Computing the sub-regions can also be done in constant time.