Modelling of the background (“uninteresting parts of the scene”), and of the foreground, play important roles in the tasks of visual detection and tracking of objects. This paper presents an effective and adaptive background modelling method for detectin
Other examples include Tracey [17], which models foreground and background by codebook vectors; [16], which quantizes and compresses the background samples at each pixel into codebooks; and [24], where the “cooccurrence” of image variations at neighboring image blocks is employed for modelling a dynamic background.
The pixel-level (i.e., ignoring spatial relationships between neighbors) Mixture of Gaussians (MOG) background model [8, 26] is popular and effective for modelling backgrounds with multi-modal distributions. Augmented with a simple method for updating the mixture parameters, MOG can adapt to changes in the background (such as gradual changes in lighting). However, MOG still has limitations. For example, in the training stage, MOG usually employs a K-means algorithm to initialize the parameters, which is slow and may be inaccurate. When the background involves many modes, modelling it with a small number of Gaussians per pixel is not efficient. It is also hard to set the value of the learning rate.
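To make the role of the learning rate concrete, the following is a minimal sketch of one online update of a per-pixel mixture for a single grayscale intensity. It is not the exact procedure of [8, 26]: the function name `mog_update`, the parameters `alpha`, `match_sigmas` and `init_var`, and the use of one rate `alpha` for the weights, mean and variance alike are all simplifying assumptions made for illustration.

```python
import math


def mog_update(weights, means, variances, x,
               alpha=0.05, match_sigmas=2.5, init_var=225.0):
    """Update a 1-D per-pixel Gaussian mixture in place with intensity x.

    Hypothetical, simplified sketch: a single learning rate `alpha` drives
    every update. Returns True if x matched an existing component.
    """
    # Find the first component whose mean lies within match_sigmas
    # standard deviations of the new sample.
    k_match = None
    for k in range(len(weights)):
        if abs(x - means[k]) <= match_sigmas * math.sqrt(variances[k]):
            k_match = k
            break

    if k_match is None:
        # No match: replace the least-weighted component with a wide
        # Gaussian centred on the new sample.
        k_low = min(range(len(weights)), key=lambda k: weights[k])
        means[k_low] = float(x)
        variances[k_low] = init_var
        weights[k_low] = alpha
    else:
        # Match: decay all weights, reinforce the matched component,
        # and pull its mean and variance towards the sample.
        for k in range(len(weights)):
            weights[k] = (1.0 - alpha) * weights[k] + (alpha if k == k_match else 0.0)
        d = x - means[k_match]
        means[k_match] += alpha * d
        variances[k_match] = (1.0 - alpha) * variances[k_match] + alpha * d * d

    # Renormalise so that the weights sum to one.
    total = sum(weights)
    for k in range(len(weights)):
        weights[k] /= total
    return k_match is not None
```

In this sketch, a small `alpha` makes the model slow to absorb a genuine background change, whereas a large `alpha` lets a slowly moving foreground object be absorbed into the background quickly, which is one way to see why the rate is hard to set.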
Many variants of the MOG background model have been proposed [3, 13, 26]. Elgammal et al. [3] replaced the MOG Probability Density Function (PDF) with a kernel-based density estimate and showed that it was effective in handling situations where the background contains small motions, such as those of tree branches and bushes. Because computing the kernel density estimate at each pixel is very costly, several pre-calculated lookup tables are used to reduce the computational burden. Moreover, because the kernel bandwidth is estimated from the median absolute deviation over samples of consecutive intensity values at the pixel, the bandwidth estimate may be inaccurate if the distribution of the background samples is multi-modal.
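The bandwidth issue can be illustrated with a small sketch of a Gaussian kernel density estimate for one pixel. The function names, the `min_sigma` floor and the scaling constant are assumptions made for illustration: the constant 0.68·√2 follows from assuming that consecutive differences are Gaussian with variance 2σ², and the details of [3] (including its lookup tables) are not reproduced here.

```python
import math
import statistics


def kde_bandwidth(samples, min_sigma=0.5):
    """Estimate a kernel bandwidth from consecutive sample differences.

    Assumption: consecutive differences are N(0, 2*sigma^2), so their median
    absolute value m satisfies m ~= 0.68 * sqrt(2) * sigma.
    `min_sigma` is a hypothetical floor that avoids a zero bandwidth.
    """
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    m = statistics.median(diffs)
    return max(m / (0.68 * math.sqrt(2.0)), min_sigma)


def kde_density(x, samples, sigma):
    """Gaussian kernel density estimate at intensity x."""
    norm = 1.0 / (len(samples) * sigma * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / sigma) ** 2) for s in samples)


# Two bimodal histories with the same two modes (about 50 and about 200).
rare_switch = [50, 51, 49, 50, 200, 201, 199, 200]   # one jump between modes
fast_switch = [50, 200, 50, 200, 50, 200, 50, 200]   # a jump at every frame

sigma_rare = kde_bandwidth(rare_switch)   # small: most differences are tiny
sigma_fast = kde_bandwidth(fast_switch)   # large: every difference is a jump
```

With the rapidly switching history, the estimated bandwidth is comparable to the distance between the two modes, so the density estimate is smeared across intensities that were never observed. This is the kind of inaccuracy that a multi-modal background can cause.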
Instead of explicitly choosing a background PDF model, we propose to use the simpler notion of "consensus" to classify a pixel value as foreground or background. We believe such and 6