CONCLUSIONS - Face Analysis Techniques for Human-Computer Interaction

Automatic face analysis is a field that uses knowledge from many other fields such as pattern recognition, machine learning, signal processing, psychology, and neurology. Knowledge of how mammals and humans see has brought ideas to computational algorithms. For example, Gabor wavelets and neural networks, which are widely used in automatic face analysis, have natural origins. They are based on ideas that work well in humans and other mammals. On the other hand, many of the algorithms originate from technical fields because automatic face analysis is ultimately about signal processing, learning, and pattern recognition. One could say that automatic face analysis is about fitting the algorithms of nature and of computers together.

Designing face analysis techniques and algorithms means weighing reliability against speed, because one is often gained at the expense of the other. For example, algorithms that create a 3D model of the face are usually more reliable than those that use 2D models but also require more processing power, which makes them slower. In practice, the 3D algorithms are currently too slow to be used in most applications in the HCI field. This is likely to change as computers become faster.

In this dissertation the focus was on frontal 2D face analysis. Work on face analysis and on gender classification was reviewed. Most of these studies have introduced new algorithms that aim to improve classification reliability. However, studies where algorithms are compared thoroughly with various face datasets and with automatic face detection and alignment are rare. In this dissertation various gender classification algorithms were studied and compared. Some of the gender classification algorithms were novel. The algorithms were also combined with face detection and alignment. The two face detectors used in the experiments were the novel blob face detector that was introduced in Chapter 3 and the cascade detector introduced by Viola and Jones (2001).

Tools are needed to improve the existing techniques and to experiment with novel ones. Various tools were implemented and used to carry out the experiments described in this dissertation. In addition, a parallel training algorithm for discrete Adaboost was created because Adaboost training with a large dataset is very time-consuming.
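This chapter does not say how the parallel training algorithm divides the work. A common approach, shown here only as a hedged sketch and not as the algorithm created in the dissertation, is to parallelize the weak learner search inside each boosting round: every feature is scanned independently for its best threshold stump, and the stump with the lowest weighted error is kept. All names (`best_stump_for_feature`, `train_adaboost`, `workers`) are hypothetical. A thread pool is used only to show the structure. For real speed-ups with pure Python code a process pool would be the more likely choice.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def best_stump_for_feature(args):
    """Scan one feature column for the threshold stump with lowest weighted error."""
    j, column, labels, weights = args
    best = (float("inf"), j, 0.0, 1)  # (error, feature index, threshold, polarity)
    for thr in sorted(set(column)):
        for polarity in (1, -1):
            err = sum(
                w for x, y, w in zip(column, labels, weights)
                if (polarity if x >= thr else -polarity) != y
            )
            if err < best[0]:
                best = (err, j, thr, polarity)
    return best


def stump_predict(stump, sample):
    """Predict +1 or -1 for one sample with a single stump."""
    _, j, thr, polarity = stump
    return polarity if sample[j] >= thr else -polarity


def train_adaboost(samples, labels, rounds, workers=4):
    """Discrete AdaBoost with stumps; labels must be +1 or -1."""
    n = len(samples)
    weights = [1.0 / n] * n
    columns = [[s[j] for s in samples] for j in range(len(samples[0]))]
    ensemble = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            # The per-feature searches are independent, so they can run in parallel.
            tasks = [(j, col, labels, weights) for j, col in enumerate(columns)]
            stump = min(pool.map(best_stump_for_feature, tasks))
            # Clamp the error so the logarithm below stays finite.
            err = min(max(stump[0], 1e-10), 1 - 1e-10)
            alpha = 0.5 * math.log((1 - err) / err)
            ensemble.append((alpha, stump))
            if stump[0] == 0:
                break  # a perfect stump leaves nothing to reweight
            # Increase the weight of misclassified samples, then renormalize.
            weights = [
                w * math.exp(-alpha * y * stump_predict(stump, s))
                for w, s, y in zip(weights, samples, labels)
            ]
            total = sum(weights)
            weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, sample):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, sample) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

The sample weights are read but never modified during the search, which is what makes the per-feature split safe. The only sequential step in each round is the weight update.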

The blob face detector proved to be a fast and reliable detector for frontal faces when color images are available and the skin color model is good enough. However, the locations of the faces detected were not very accurate in the experiments. Better gender classification accuracies were achieved when the cascaded face detector was used because faces were located more accurately with it.

Experiments in which no alignment, manual alignment, or one of several automatic alignment methods was applied between face detection and gender classification suggested that automatic alignment would increase classification accuracy if it worked properly. This is supported by the fact that manual alignment increased classification accuracy compared with the no-alignment condition. However, the automatic alignment methods that were used in the experiments decreased classification accuracy. To improve these methods, one could select a larger set of face images to create the face model, vary the parameters of the methods, or select a completely different alignment method.

When faces are detected automatically there may be horizontal and vertical inaccuracies in the face location, the face may be rotated, or the area that is detected as a face may be larger or smaller than the actual face. An experiment in which these factors were varied was carried out. The Adaboost gender classifier with Haar-like features proved to be more robust to rotations than the other classifiers.
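The kinds of misalignment listed above (translation, scale, and in-plane rotation) can be generated systematically from a correctly located face region. The following is a minimal sketch under stated assumptions: the `FaceBox` representation, the function names, and the one-factor-at-a-time design are illustrative and are not taken from the experimental setup of the dissertation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceBox:
    """Square face region: centre, side length, in-plane rotation in degrees."""
    cx: float
    cy: float
    side: float
    angle_deg: float = 0.0


def perturb(box, dx_frac=0.0, dy_frac=0.0, scale=1.0, rot_deg=0.0):
    """Return a misaligned copy of box.

    dx_frac, dy_frac: shift as a fraction of the side length.
    scale: factor on the side length (above 1 means the region is too large).
    rot_deg: additional in-plane rotation in degrees.
    """
    return FaceBox(
        cx=box.cx + dx_frac * box.side,
        cy=box.cy + dy_frac * box.side,
        side=box.side * scale,
        angle_deg=box.angle_deg + rot_deg,
    )


def corners(box):
    """Corners of the rotated box, starting from the top-left of the unrotated box."""
    h = box.side / 2.0
    a = math.radians(box.angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [
        (box.cx + ox * c - oy * s, box.cy + ox * s + oy * c)
        for ox, oy in ((-h, -h), (h, -h), (h, h), (-h, h))
    ]


def one_factor_variants(box, shifts, scales, rotations):
    """Vary one misalignment factor at a time, keeping the others at their defaults."""
    variants = []
    for d in shifts:
        variants.append(perturb(box, dx_frac=d))
        variants.append(perturb(box, dy_frac=d))
    for s in scales:
        variants.append(perturb(box, scale=s))
    for r in rotations:
        variants.append(perturb(box, rot_deg=r))
    return variants
```

Each variant would then be cropped from the image and passed to the classifier under test, so that accuracy can be reported separately for each factor.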

Besides alignment, various other issues were considered. An SVM with pixel-based input proved to be the most reliable classifier when the input data was high quality, which is in line with many other studies (BenAbdelkader and Griffin, 2005; Castrillón-Santana et al., 2003; Moghaddam and Yang, 2002; Yang et al., 2006b). Nevertheless, the experiments indicate that the features used for the gender classification may be more important than the machine learning method. Face image size used for the gender classification was not an important factor for classification accuracy.

The best gender classification rates that were achieved with frontal faces were slightly over 90% for high quality images. However, only images of web camera quality can be assumed in many applications and with these images the highest classification rates were slightly over 80%.

Inputs to the classifiers were histogram-equalized image pixels, Haar-like features, or local binary pattern (LBP) features. Pixels were used as input to the neural network and to the SVM, Haar-like features were used as input to the Adaboost classifiers, and LBP features were used as input to the SVM classifier. The results of the experiments indicated that features are more important to classification accuracy than the classifier. For example, in the results described in Section 4.4 classifiers were compared using manually aligned face images with or without hair. The classifiers that used input other than pixels benefitted from the inclusion of hair in the face images, while the classifiers that used pixel-based input benefitted from the exclusion of hair. This could have been studied further by pairing each feature type with each classifier, but that was left for future work.
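To make the LBP input concrete, the sketch below computes the basic 3x3 LBP code for each interior pixel and collects the codes into a 256-bin histogram. It assumes the basic operator on a grayscale image stored as a list of rows. The LBP variant, neighbourhood, and any block division used in the experiments are not stated in this chapter, and the function names are hypothetical.

```python
# Neighbour offsets (row, column), clockwise from the top-left neighbour.
# The position in this tuple is the bit position in the resulting code.
OFFSETS = ((-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1))


def lbp_code(img, r, c):
    """Basic 3x3 LBP code of pixel (r, c): one bit per neighbour >= centre."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

Because each code depends only on whether a neighbour is at least as bright as the centre, the histogram is unchanged by any monotonic change in overall brightness.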

Furthermore, it is possible that higher gender classification accuracies would have been achieved if, for example, the face images had been filtered with Gabor wavelets or if independent component analysis (ICA) had been used. Good results have been achieved with Gabor wavelets in face recognition (Shen and Bai, 2006). However, image pixels are readily available to be used as input after histogram equalization and no further calculations are needed, and values for Haar-like features and for LBP features are fast to calculate.
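The speed of Haar-like features comes from the integral image used by Viola and Jones (2001): once it has been built, the sum over any rectangle takes four lookups regardless of the rectangle's size. The sketch below shows this for a single two-rectangle feature. The function names and the choice of a left-minus-right feature are illustrative assumptions, not the feature set used in the experiments.

```python
def integral_image(img):
    """Return ii with ii[r][c] = sum of img over rows < r and columns < c.

    The extra zero row and column remove the need for boundary checks.
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_horizontal(ii, top, left, height, width):
    """Two-rectangle feature: sum of the left half minus sum of the right half."""
    if width % 2 != 0:
        raise ValueError("width must be even so both halves have equal area")
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum
```

Building the integral image is a single pass over the pixels, so its cost is shared by every feature evaluated on the same image.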

One could also consider classifying gender on the basis of biometric features. For example, distances between detected facial features such as eyes, nose, and mouth could be used for classifying gender because there are differences in face shape and in facial feature locations between genders. The problem with these features is that besides gender, facial expressions, ethnicity, and age also affect face shape and the relations between facial features.
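As a sketch of what such distance-based features might look like, the function below turns four landmark positions into distances normalized by the distance between the eyes. The landmark names, the choice of distances, and the normalization are assumptions made for illustration. No such classifier was evaluated in this dissertation.

```python
import math


def distance_features(landmarks):
    """Scale-normalized distances between facial landmarks.

    landmarks maps 'left_eye', 'right_eye', 'nose', and 'mouth' to (x, y)
    pixel coordinates. All distances are divided by the inter-ocular
    distance so that the features do not depend on image scale.
    """
    le, re = landmarks["left_eye"], landmarks["right_eye"]
    nose, mouth = landmarks["nose"], landmarks["mouth"]
    iod = math.dist(le, re)
    if iod == 0:
        raise ValueError("eye positions coincide; cannot normalize")
    eye_mid = ((le[0] + re[0]) / 2.0, (le[1] + re[1]) / 2.0)
    return {
        "eye_mid_to_nose": math.dist(eye_mid, nose) / iod,
        "eye_mid_to_mouth": math.dist(eye_mid, mouth) / iod,
        "nose_to_mouth": math.dist(nose, mouth) / iod,
    }
```

The normalization removes the effect of image scale only. It does not remove the effects of expression, ethnicity, or age noted above, so features like these would still vary for reasons unrelated to gender.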

In many HCI applications it is important for the user to receive real-time feedback. Therefore the face analysis methods that are used in the applications should be fast. The face detection methods and method combinations used in the experiments meet this requirement. As computers become faster, methods and features requiring a lot of processing power can be used and better face detection and classification accuracies can be achieved. However, mobile devices have much less processing power than PCs and increasing numbers of applications are being developed for them. As many mobile devices have cameras, face analysis can be used in these applications, but, again, face analysis methods have to be fast enough. Ultimately, it is likely that there will always be applications where the speed of the methods used is important.

The results in Chapters 3, 4, and 5 are interesting from the HCI perspective. The novel face detection method presented in Chapter 3 is fast enough to be used in applications that run on a standard PC. When face detection is used in perceptual applications the faces detected may have variations in rotation, translation, and scale, and it is useful to know how these misalignments affect classification accuracies. The results in Chapter 4 address this issue. The face detectors and gender classifiers used in the experiments are usable with frontal faces, but for arbitrarily rotated faces, which are common in real applications, the methods cannot be used as such. One could use, for example, the rotation invariant multi-view face detector proposed by Huang et al. (2007) to detect arbitrarily rotated faces and train separate gender classifiers for each rotation case. The face analysis process should be fully automatic in many applications and all the results in Chapter 5 are for combined systems that are fully automatic.

The classification accuracy achieved with web camera images is interesting because such images are likely to be used in many home applications. Web cameras are inexpensive and many people have them. Gender classification accuracy was measured for web camera images in some of the experiments in Chapter 5.

Face analysis techniques have a wide field of application from computer games to learning applications. Existing applications were considered and ideas for future applications were presented. Recently face detection has been included in many digital cameras and search engines. This indicates that face analysis techniques are maturing and in the near future more applications will most probably be seen. However, there is still a lot of research to be done to create methods that work well in all kinds of conditions: indoors, outdoors, with partially occluded faces, profiles, and so on. There is also a lot of research to be done to find out how to make the best use of the face analysis techniques in the broad range of applications in the HCI field.
