This paper presents an overview of Intelligent Video work currently under development at the GE Global Research Center and other research institutes.
The image formation process is discussed in terms of illumination, methods for automatic camera calibration, and lessons learned from machine vision. A variety of approaches for person detection are presented. Crowd segmentation methods that enable the tracking of individuals through dense environments, such as retail and mass transit sites, are discussed. It is shown how signature generation based on gross appearance can be used to reacquire targets as they leave and enter disjoint fields of view. Camera calibration information is used to further constrain the detection of people and to synthesize a top view, which fuses all camera views into a composite representation. It is shown how site-wide tracking can be performed in this unified framework. Human faces are an important feature, both as a biometric identifier and as a means of determining the focus of attention via head pose estimation. It is shown how automatic pan-tilt-zoom control, active shape/appearance models, and super-resolution methods can be used to improve face capture and analysis. Additional features that can be used for inferring intent are also discussed. These include body-part motion cues and physiological phenomena such as thermal images of the face. (Publisher abstract provided)