Ein ergonomisches Dialogsystem zur Steuerung von technischen Systemen in Wohnbereichen mittels Gestenerkennung
Winter semester 1995/96 & summer semester 1996
1. Marcus Breig
2. Dietmar Dechow
3. Matthias Fellenberg
4. Reinhard Jaeschke
5. Michael Jedamzik
6. Thomas Kuleßa
7. Dirk Pukropski
8. Dirk Rief
9. Dirk Rosemann
10. Axel Thümmler

ARGUS is watching you!

Controlling appliances in home or office environments by hand gestures is a step towards a more intuitive and natural human-computer interface. A vision system watches the user and reacts to his gestures. The user can select brown or white goods simply by pointing at them (deictic gesture). The system's feedback confirms the device to which the user is pointing. The device selection is completed by a special gesture (selection gesture) following the deictic gesture (or, in the future, by voice input). Depending on the type of device, its internal state is switched (on/off for lamps, coffee makers etc.) or a device-specific dialogue between the user and the device starts. Hence, the user is able to remote-control his TV and VCR or his compact disc player, or simply to switch his lamp or coffee maker on or off. This is presented in the short video, which was produced by the WDR television company of Dortmund (thanks to them for the publishing permission and their good job!).

ARGUS (9.45 MB QuickTime video) The video is available in four formats (9.45 MB .mov, 7.45 MB .mov [24 bit], 12.1 MB .avi, 16.55 MB .flc). It first shows the prototype infra-red transmitter, which was connected to the computer and controlled the video and audio devices. Before the system can be used, the cameras must be calibrated. After calibration, a typical gesture sequence for controlling the VCR is shown. The play gesture, for example, is the "victory" gesture. On the computer's screen, the top windows show the views of the left and right cameras. The brighter regions are the hand and the head of the user. The six smaller windows beneath show the color histograms of the head, the left hand and the right hand for both cameras. Then a dialogue is shown: the user selects a device by pointing. While the user is pointing, the computer feeds back (as sound and on the screen) which device it recognizes the user to be pointing at. The selection is executed by lowering the thumb (selection gesture). The CD player reacts. In the next scene the user controls the VCR by gestures and switches on the light. In the camera-view windows the head and hands and their bounding boxes can be seen. The device-specific dialogues, the feedback etc. are configurable through a device editor available in the prototype.

ARGUS needs at least two eyes

The ARGUS vision system currently uses two cameras and may be extended with more. Most stereo vision systems place their cameras close together, because they only deal with 3D measurement. In arbitrary environments (home and office environments), however, the cameras should generally be installed far apart from each other in order to capture as much of the environment as possible, to get perspective-independent views and additionally to do 3D measurements. The best results are achieved if the two stereo cameras and the observed object form a right-angled triangle. The next figure shows a typical scenario in which ARGUS was tested. ARGUS is able to distinguish the tuner, tape deck, CD player and amplifier by pointing, although they stand rather close together (the relative angular resolution is 1.15°).
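With such a wide-baseline setup, the 3D position of a tracked object can be recovered by intersecting the two viewing rays. The source does not state which triangulation ARGUS actually uses; the following is a minimal midpoint-triangulation sketch, assuming the camera centres and viewing directions are known from calibration:

```python
import numpy as np

def triangulate(c1, d1, c2, d2):
    """Midpoint of the shortest segment between two viewing rays.

    c1, c2: camera centres; d1, d2: viewing directions (need not be unit).
    Solves for ray parameters t1, t2 minimising |(c1 + t1*d1) - (c2 + t2*d2)|.
    """
    c1, d1 = np.asarray(c1, float), np.asarray(d1, float)
    c2, d2 = np.asarray(c2, float), np.asarray(d2, float)
    b = c2 - c1
    # Normal equations of the least-squares problem in (t1, t2)
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    t1, t2 = np.linalg.solve(A, np.array([d1 @ b, d2 @ b]))
    p1, p2 = c1 + t1 * d1, c2 + t2 * d2
    return (p1 + p2) / 2.0
```

With exactly intersecting rays the midpoint equals the intersection; with noisy, skew rays it returns the point halfway between them, which is why two rays (rather than one) are needed for 3D measurement.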

ARGUS scenario

Afraid of the ARGUS eyes?

No need to be! ARGUS' eyes follow you inconspicuously. For this, ARGUS uses multi-object tracking, i.e. it tracks the hands and the head. To save computing power, ARGUS does not perform complete body tracking, so the rest of your body may do whatever it wants. For multi-object tracking, the cameras must be steered so that they capture all these objects. The user is allowed to behave naturally, and the hands and the head may appear and disappear, for example when the user puts a hand into a pocket. Therefore, motion detection is needed to re-detect objects once they reappear. Further, tracking (Kalman filter) is applied to all objects, not only for complexity reasons but also for better segmentation results. Motion identification must determine which of the objects detected and tracked in the two stereo camera views are the same; this identification also supports the 3D matching and calculations of the stereo vision.
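The per-object tracking can be illustrated with a minimal constant-velocity Kalman filter over image coordinates. The state model and the noise parameters below are illustrative assumptions, not ARGUS' actual filter:

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter for one tracked image point.

    State x = (px, py, vx, vy); measurement z = (px, py).
    q, r (process/measurement noise) are illustrative values.
    """
    def __init__(self, q=1.0, r=4.0):
        self.x = np.zeros(4)            # state estimate
        self.P = np.eye(4) * 100.0      # covariance: very uncertain start
        self.F = np.eye(4)              # transition: p += v (dt = 1 frame)
        self.F[0, 2] = self.F[1, 3] = 1.0
        self.H = np.eye(2, 4)           # we observe the position only
        self.Q = np.eye(4) * q
        self.R = np.eye(2) * r

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]               # predicted position for this frame

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.x      # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)        # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

The prediction step is what lets the segmentation search only a small window around the expected position (better results at lower cost), and it bridges the frames in which a hand briefly disappears.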

ARGUS System Architecture:

System Architecture

ARGUS consists of a dialogue control and four separate recognition systems:

  • The Object Recognition and Tracking system detects motion and classifies it through primitive motion features. Thus objects can be pre-classified as hand, head or other objects. This pre-classification allows gesture recognition to be run only for those objects classified as hands and hence saves computing power. Furthermore, this information is needed for matching the objects in stereo vision. In other words, the computer continuously finds motion in a room, tracks the moving objects and can tell which object in the left camera view corresponds to which object in the right camera view. Tracking also includes steering the cameras so that they follow the user.
  • For every object classified as a hand, gesture recognition is applied. For this, Segmentation is called, which performs colour recognition to find skin colour in the frames.
  • The shape features are calculated and classified by the static gesture recognition of ZYKLOP. ZYKLOP is based on fast multi-classification and runs in real time.
  • Furthermore, the dynamic gesture recognition of ZYKLOP reacts robustly to specified gesture sequences and is able to perform trajectory recognition.
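The skin-colour segmentation step can be sketched roughly as follows. ARGUS works with colour histograms, whereas this minimal example uses a fixed threshold box in normalised rg-chromaticity; all threshold values are illustrative assumptions, not the system's actual classifier:

```python
import numpy as np

def skin_mask(frame):
    """Rough skin-colour segmentation of an RGB frame (H, W, 3), uint8.

    Normalised rg-chromaticity discounts overall brightness, so the
    test is fairly robust to illumination changes. The threshold box
    below is an illustrative assumption.
    """
    rgb = frame.astype(float)
    s = rgb.sum(axis=2) + 1e-6          # avoid division by zero on black pixels
    r = rgb[..., 0] / s
    g = rgb[..., 1] / s
    return (r > 0.35) & (r < 0.55) & (g > 0.25) & (g < 0.37)
```

A histogram-based classifier, as used here, replaces the fixed box with per-user colour statistics (the histograms shown in the six small windows of the video), which adapts to skin tone and lighting.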

Finally, the Dialogue Control combines the recognition results from all objects detected in the camera views. Since it knows the corresponding objects from Object Recognition, it can determine 3D positions and pointing directions. If the user points towards a device, Dialogue Control determines the pointing direction and the related device and selects it. If the device is simple, like a lamp or coffee maker, it is switched on or off. If it is a complex device, a device-specific dialogue starts, allowing the user, for example, to control the volume or balance of the amplifier.
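The pointing-based selection can be sketched as follows. Approximating the pointing ray as head-to-hand, as well as the device table itself, are assumptions for illustration; ARGUS may use a different body model:

```python
import numpy as np

def select_device(head, hand, devices):
    """Pick the device the user points at.

    head, hand: 3D positions from stereo vision.
    devices: dict mapping device names to 3D positions (illustrative).
    Returns the device whose direction from the head makes the smallest
    angle with the pointing ray, together with that angle in degrees.
    """
    head, hand = np.asarray(head, float), np.asarray(hand, float)
    ray = hand - head
    ray /= np.linalg.norm(ray)
    best, best_angle = None, None
    for name, pos in devices.items():
        to_dev = np.asarray(pos, float) - head
        to_dev /= np.linalg.norm(to_dev)
        angle = np.degrees(np.arccos(np.clip(ray @ to_dev, -1.0, 1.0)))
        if best_angle is None or angle < best_angle:
            best, best_angle = name, angle
    return best, best_angle
```

The angle could also drive the feedback: announce the best candidate while the user points, and commit it only when the selection gesture arrives; the 1.15° angular resolution quoted above is what separates neighbouring devices such as the tuner and the amplifier.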

Main features of ARGUS

ARGUS is a remote-control system applying gesture recognition that relies on the skin-coloured body parts of the user. For this, Sven Schröter has developed a colour-based object recognition. Particular aspects are hand and head tracking using computer-steerable pan/tilt video cameras with focus control. ARGUS is prepared for:

Future Work:

ARGUS is going to be used for presentations in lecture halls. For this, the knowledge about and the equipment of the multimedia systems will be used.


Further links about ARGUS are here.

ARGUS has been developed in a one-year project by 10 graduate students: Marcus Breig, Dietmar Dechow, Matthias Fellenberg, Reinhard Jaeschke, Michael Jedamzik, Thomas Kuleßa, Dirk Pukropski, Dirk Rief, Dirk Rosemann and Axel Thümmler, together with my colleague Michael Wittner. Thanks to Prof. Dr. H. Müller, who supported this project.

Most of the ARGUS prototype has been redesigned and reimplemented in an object-oriented design by Marcus Breig, Bernd Deimel, Christian Esken and Sven Schröter. Camera control was combined with motion detection and tracking by Marcus Breig. Christian Esken developed an intelligent motion identification, which makes it possible to match objects between the stereo views, reduces computation time and identifies the moving objects. Bernd Deimel developed several algorithms to separate the hand from the forearm.

Contact persons:

Helge Baier,   Tel. +49-231-755 6328, E-Mail: baier@ls7.cs.uni-dortmund.de
Markus Kohler, Tel. +49-231-755 6125, E-Mail: Markus.Kohler@uni-dortmund.de
Sven Schröter,   Tel. +49-231-755 6328, E-Mail: schroete@ls7.cs.uni-dortmund.de