Face Recognition Across Viewpoint


Max-Planck-Institut für biologische Kybernetik, Arbeitsgruppe Bülthoff, Spemannstraße 38 • 72076 Tübingen • Germany

Technical Report No. 21

September 26, 1995


Alice J. O'Toole, Heinrich H. Bülthoff & Candice L. Walker

Abstract

In two experiments we examined the ability of human observers to recognize faces from novel viewpoints. Previous work has indicated that there are marked declines in recognition performance when observers learn a particular view of a face and are asked to recognize the face from a novel viewpoint. We replicate these findings and extend them in several ways. First, we replicate the well-known 3/4 view advantage for recognition and extend it to show that this advantage is stronger than would be expected simply due to the 3/4 view being the center of the learned views. In the second experiment, we found little evidence for advantageous transfer to a symmetric view of the other side of the face; in all cases, observers were much better at recognizing a face from the side learned. Third, we extended past results to explore the consistency of face recognizability for individual faces across different views and view transfer conditions. We found only a modest relationship between the recognizability of individual faces in the different view conditions. These data give insight into the organization of memory for faces and its stability across changes in viewpoint.

Many thanks are due to Niko Troje and Isabelle Bülthoff for the stimulus creation and processing. Alice O'Toole gratefully acknowledges support by the Alexander von Humboldt Stiftung and the hospitality of the Max-Planck-Institut für biologische Kybernetik. Please direct all correspondence to A. J. O'Toole, School of Human Development, GR4.1, The University of Texas at Dallas, Richardson, TX 75083-0688, TEL: (214) 883-2486, Email: otoole@utdallas.edu

This document is available as /pub/mpi-memos/TR-21.ps.Z via anonymous ftp from ftp.mpik-tueb.mpg.de or from the World Wide Web, http://www.mpik-tueb.mpg.de/projects/TechReport/list.html.


1 Introduction

People are generally able to recognize familiar faces accurately from any of a variety of viewpoints. In the present study we address a number of issues related to this ability. Before proceeding, it is necessary to clarify the primary task used in this study and to distinguish it from other related, but different tasks. By face recognition, we mean the classification of a face as "known" or "unknown". We distinguish this from the task of face identification, which usually involves naming a face. This is a natural distinction in human memory. Often, we are absolutely certain that we recognize a face, but cannot identify it in any other way, e.g., retrieve a name or context of knowing.

Relating this to the object recognition literature, it is important to note that the term "object recognition" has been used to refer to tasks that draw on two quite different kinds of information about objects. "Object recognition" has frequently been used to refer to a basic level categorization task involving the classification of a particular object as an instance of a class of objects (e.g., "This object is a chair.")¹. In this case, the task is to produce the correct category name (e.g., Bartram, 1974; Biederman & Gerhardstein, 1994, Exps. 1 & 2) when presented with some exemplar object. This is a useful psychological task, because in many cases the problem is to recognize an object for some general functional purpose (e.g., in searching for a pen, we are generally interested only in knowing that something is a pen, not in whether or not we "know" the pen or have used it before). For faces, this basic level categorization task is simply the determination that a particular object is a face, and is obviously only the first step.

Other studies have used the term object recognition to refer to tasks that involve explicitly distinguishing among, recognizing, or matching individual instances of objects within a particular category (cf., Tarr, 1989; block configurations; Bülthoff & Edelman, 1992; Edelman & Bülthoff, 1992; paper-clip and amoeboid-like objects, respectively; Biederman & Gerhardstein, 1995; Exps. 3, 4 & 5; nonsense-objects², single volumes, and charm-bracelets). This is also a useful psychological task, because in many cases the problem is to recognize a particular object as "known" or "unknown", e.g., "that is my suitcase on the conveyor belt." Both kinds of object recognition studies have addressed the question of "recognition" across viewpoint change, because a major part of both tasks is to recognize an object from whatever view of it you happen to have. An important practical difference, however, is that in the former case, you need to recognize the object (i.e., pen) among other dissimilar objects, but in the latter case, you need to recognize the object (i.e., your suitcase) among other quite similar objects (i.e., other suitcases on the airport conveyor belt). Additionally, successful completion of the former task requires an ability to extract the information that the particular object has in common with the object category (i.e., what this object has in common with all/most pens). In the latter case, one must extract information that makes the particular exemplar unique among other objects of that category (i.e., what makes your suitcase different from everybody else's suitcase).

Faces make a useful compare-and-contrast stimulus to the familiar and unfamiliar objects that have been used in previous work for two interconnected reasons. First, while both faces and objects need to be recognized in the above two ways (as the class of objects they represent and as individuals), the relative balance of the two tasks for faces and generic objects is different. For objects, when the latter type of individual exemplar recognition is required, it is usually limited to keeping track of a small number of individual exemplars. Thus, you usually need only to distinguish/remember a few individual suitcases from among all of the other suitcases in the world. For faces, we must be able to distinguish among and remember hundreds, if not thousands, of people by their faces.

Second, given the number of faces we know individually, and the extensive experience we have in getting to know new faces, learning to recognize new faces is something most of us can do with very little difficulty/training. It is also interesting to note that for other-race faces we often lack expertise equivalent to that which we have with faces of our own race. In this case, learning to recognize new faces is apparently much more difficult, even though there is good reason to think that faces of all races are equally distinguishable (see Shapiro and Penrod, 1986 for a review).

In the present study, we examined the ability of observers to recognize faces across viewpoint changes. Additionally, we looked at the consistency of the recognizability of individual faces in different learn-test transfer conditions. We think this latter approach has the potential to shed light on the relationship among representations of individual faces that have been created from different single views. At present, this is work in progress, and so we provide primarily an overview and glimpse of the interesting results to emerge thus far.

¹ Biederman & Gerhardstein (1994) prefer the term "entry-level" categorization to basic level.
² It is not clear that this is reasonably thought of as a class of objects, though they are nameless and have the appearance of wooden toys.

2 Experiment 1

Observers. Ninety volunteers, roughly half male and half female, between the ages of 18 and approximately 45 years old, were observers in the experiment. All were recruited from the University of Texas at Dallas (UTD) staff and student population.

Stimuli. Seventy-two volunteers between the ages of 18 and approximately 40 years old from Tübingen, Germany volunteered to have their heads scanned by a Cyberware™ laser scanner. These scans produced a three-dimensional surface model of each head and a texture map, which is a standard RGB image of the head that maps point-for-point onto the head model. Most of the hair was digitally removed, leaving only a trace of the hairline in most scans. The face stimuli used here were made by taking right and left full, three-quarter, and profile views of the laser scans.

Apparatus. All experimental events were controlled by a Macintosh Power PC 6100 programmed using PsyScope.

Procedure. The experiment was a standard yes/no face recognition study varying the learn pose (full, three-quarter, profile) and the test pose (full, three-quarter, profile) and measuring recognition performance as d' for discriminating learned and novel faces in each of the nine transfer conditions. Observers were assigned randomly to one of the three learning conditions and were instructed that they would view a series of faces from a particular view and would be required to recognize the faces subsequently, possibly from a different view. Thirty-six faces were presented, one at a time, for 8 seconds each. Observers then viewed all 72 faces and responded "old" or "new" using labeled keys on the computer keyboard. Since we were interested in getting measures of the recognizability of individual faces in all nine conditions, an elaborate counterbalancing scheme was implemented to assure that d's for individual faces in each condition could be based on equal numbers of presentations of the face as old and new.

2.1 Results & Discussion

Pose Condition Measures. The hit and false alarm rates for each observer in each condition were assessed, and d's were computed from these rates. These d' data were submitted to a two-factor (3x3) analysis of variance (ANOVA), with learn pose (full face, three-quarter, and profile) a between-subjects factor and test pose (full face, three-quarter, and profile) a within-subjects factor. A main effect of learn pose was found, F(2, 86) = 12.36, p < .0001. A smaller, though significant, effect of test pose was also found, F(2, 172) = 3.35, p < .03. Finally, there was a highly significant interaction between learn and test pose, F(4, 172) = 23.61, p < .0001. The pattern of interaction means is displayed in Figure 1.

Figure 1: Recognition accuracy data as a function of learn and test pose in Experiment 1. [Plot not reproduced. Title: "Recognition Across Pose Change"; y-axis: d'; x-axis: Full Test, 3/4 Test, Profile Test; one curve each for Full Learn, 3/4 Learn, and Profile Learn.]

Several points are worth noting from the data. First, the strong interaction between learn and test pose is an indication that there is a cost in recognizing faces from a novel view. Second, as found in previous work (cf., e.g., Bruce & Valentine, 1987, for a review), the three-quarter view was the best view to learn. Part of this might be due to it being the central view of the three tested. However, this would not seem to account for the entire effect, since the recognition of three-quarter face views in the no view-change condition (d' = 1.55) was markedly better than the analogous no view-change conditions for profile (d' = 1.10) and full face (d' = 1.16) views. Finally, for the most part, performance on learn-test conditions and their inverses (e.g., learning full-face → testing three-quarter and learning three-quarter face → testing full) was comparable, with one exception. Learning three-quarter face → testing profile resulted in much better performance than its inverse. One interpretation of this finding can be made as follows. It is possible that the profile view, while not a bad view for recognition (as evidenced by its d' in the no-transfer case), is not a good view to transfer from.

Stimulus Measures. The recognizability of each face in each pose transfer condition was assessed by collapsing data across observers and computing a d' for each face in each transfer condition. Two points are interesting. First, while statistically significant, the inter-correlations of face recognizability among the no-transfer conditions were quite modest, explaining only 17 percent of the variance in the very best case (see Table 1). A similar situation is seen for the transfer conditions (see Table 2). Thus, the difficulty of recognizing individual faces in different views and view transfer conditions would seem to be only minimally related.

Table 1: Correlations between the recognizability of faces in the no-transfer conditions.

No-transfer Condition
View combinations                                                                     Correlation Coefficients    Explained variance
Learned Full & Tested Full correlated with Learned 3/4 & Tested 3/4                   .35 (p < .01)               r² = .12
Learned Full & Tested Full correlated with Learned Profile & Tested Profile           .41 (p < .01)               r² = .17
Learned Profile & Tested Profile correlated with Learned 3/4 & Tested 3/4             .34 (p < .01)               r² = .115

Table 2: Correlations between the recognizability of faces in transfer conditions when full-face was learned (top), 3/4-face was learned (center), profile-face was learned (bottom).

Transfer-Condition: full-face learned
View combinations                                                                     Correlation Coefficients    Explained variance
Learned Full & Tested Full correlated with Learned Full & Tested 3/4                  .25 (p < .05)               r² = .06
Learned Full & Tested Full correlated with Learned Full & Tested Profile              .18 (ns)                    r² = .03

Transfer-Condition: 3/4-face learned
View combinations                                                                     Correlation Coefficients    Explained variance
Learned 3/4 & Tested 3/4 correlated with Learned 3/4 & Tested Full                    .36 (p < .01)               r² = .15
Learned 3/4 & Tested 3/4 correlated with Learned 3/4 & Tested Profile                 .26 (p < .05)               r² = .07

Transfer-Condition: profile-face learned
View combinations                                                                     Correlation Coefficients    Explained variance
Learned Profile & Tested Profile correlated with Learned Profile & Tested Full        .18 (ns)                    r² = .03
Learned Profile & Tested Profile correlated with Learned Profile & Tested 3/4         .37 (p < .01)               r² = .14

3 Experiment 2

The purpose of this experiment was to look at the role of view symmetry in recognition transfer. The design of this experiment was similar to Experiment 1 with only the following changes. Observers learned only three-quarter and profile face views from one side of the face and were tested for recognition using full, three-quarter, and profile face views of the other side of the face.

Observers. Thirty-six volunteers, roughly half male and half female, between the ages of 18 and approximately 45 years old, were tested.
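Recognition accuracy throughout this report is expressed as d', computed from hit and false alarm rates. As a minimal sketch of that computation, the Python function below assumes the standard equal-variance signal detection formula d' = z(H) - z(F). The function name, the example counts, and the log-linear correction for rates of exactly 0 or 1 are illustrative assumptions; the report does not state how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return d' = z(hit rate) - z(false alarm rate) for a yes/no task.

    Assumption (not from the report): a log-linear correction adds 0.5 to
    each cell so that rates of exactly 0 or 1 do not give infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts of "old"/"new" responses, for illustration only.
example = d_prime(hits=27, misses=9, false_alarms=9, correct_rejections=27)
print(round(example, 2))
```

The same function applies whether the counts are tallied per observer (as in the pose condition measures) or per face after collapsing across observers (as in the stimulus measures); only the way the four counts are accumulated differs.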

3.1 Results & Discussion

Again the d' data were submitted to an ANOVA. No main effect of learn pose, F(2, 34) = 3.01, p > .05, or test pose was found, F(2, 68) < 1. There was a significant interaction between learn and test pose, F(2, 68) = 4.33, p < .02. The pattern of interaction means is displayed in Figure 2. As can be seen, especially by contrast to Figure 1, there is a substantial cost associated with recognizing the symmetric view of the learned view.

One possible alternative explanation of these results was that the observers in this experiment were in some way less motivated or less good at the task than the observers in the first experiment. This alternative can be eliminated by comparing the learn-3/4 → test full-face and the learn profile → test full-face conditions in Experiments 1 and 2. These two transfer conditions were duplicated exactly in the two studies. We included these two duplicate conditions, even though the full-face view has no symmetric equivalent, because we wished to keep the difficulty of the task and number of trials constant in both experiments. In any case, the duplicated conditions serve here also as controls on the comparability of the two data sets. The means for the Experiment 1 conditions are indicated with arrows on the Figure 2 graph. As can be seen, they are quite close, indicating that the observers in this second group were comparable to the observers in Experiment 1.

Figure 2: Recognition accuracy data as a function of learn and test pose in Experiment 2. In all cases (except the full-face test cases), the learned pose was on one side of the face and the test pose was on the other side. Arrows indicate the means for the two duplicate control conditions in Exp. 1, indicating that the large accuracy cost with symmetric views cannot be attributed to less motivated, or accurate observers in this second study. [Plot not reproduced. Title: "Recognition Across Symmetric Pose Change Comparison"; y-axis: d'; x-axis: Full Test, 3/4 Test, Profile Test; curves for Exp. 2 3/4 Learn and Exp. 2 Profile Learn, with markers for Exp. 1 - 3/4 Learn and Exp. 1 - Profile Learn.]

This result is interesting to compare to data by Troje and Bülthoff (1995), who found evidence for reasonably good transfer to a symmetric view using these stimuli without texture maps (i.e., pure shaded busts of the heads) in conjunction with a perceptual matching task for successively presented faces varying in view. They also found a trend, though not significant, for a similar symmetric transfer with identical stimuli to those used here, again in a perceptual matching task. The contrast may indicate an important difference in the strategies available to observers when they are forced to remember a large number of faces when views are varied, versus when they are required only to discriminate pairs of individual exemplar faces. A second very important factor might be the quality of the match in symmetric views between texture-based and image-covered face views, which these data would indicate favors better head symmetry for the three-dimensional structure information than for the texture maps.

4 Summary

These data replicate the findings of past studies and also extend that work to begin to look at the relationship between the recognizability of individual faces in different transfer conditions. We think this latter approach has the potential to shed light on how related representations of individual faces can be when they are created from only a single view. An early indication of the utility of this approach was the finding that relationships between the recognizability of faces in the different conditions explained only a small fraction of the variance on the task. This suggests that the quality and/or accessibility of the information used by observers to recognize the faces (i.e., what makes them unique among the set of faces) changes markedly with view. Thus, faces unusual from one view may be quite typical from another, indicating that the view, rather than the surface it suggests, may dominate in the face code.

An additional surprising result was the difficulty observers had in recognizing faces across symmetric view changes. While from an information point of view, it is clear that nearly any computational model would be successful at a recognition task under this transformation, human observers seem to be much less successful. The problem of symmetric transfer may be added to a list of other equally easy problems for computational models of faces that people seem to fail at. Examples include the well-known difficulties people have in recognizing faces in the photographic negative and in recognizing upside-down faces. These simple failures may be informative about the nature of the human face code.
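The modest relationships reported in Tables 1 and 2 are easier to appreciate once the correlation coefficients are converted to explained variance, which is simply the square of the correlation. The following Python sketch shows a Pearson correlation between two sets of per-face d' scores and the corresponding r². The function names and the sample scores are hypothetical, and the use of Pearson's r is an assumption, since the report does not name the correlation statistic.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def explained_variance(r):
    """Proportion of variance shared by two measures with correlation r."""
    return r * r


# Hypothetical per-face d' scores in two view conditions (illustration only).
full_to_full = [1.2, 0.8, 1.9, 1.1, 0.4, 1.5]
profile_to_profile = [0.9, 1.3, 1.4, 0.6, 0.7, 1.6]
r = pearson_r(full_to_full, profile_to_profile)
print(round(r, 2), round(explained_variance(r), 2))

# The largest correlation in Table 1, r = .41, corresponds to about 17 percent.
print(round(explained_variance(0.41), 2))
```

Squaring makes clear why even the statistically significant correlations leave most of the face-to-face variability in recognizability unexplained by performance in another view condition.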


References

[1] Bartram, D. J. (1974). The role of visual and semantic codes in object naming. Cognitive Psychology, 6, 325-356.
[2] Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-147.
[3] Biederman, I., & Gerhardstein, P. C. (1993). Recognizing depth-rotated objects: Evidence and conditions for three-dimensional viewpoint invariance. Journal of Experimental Psychology: Human Perception and Performance, 19(6), 1162-1182.
[4] Bülthoff, H. H., & Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proc Natl Acad Sci USA, 89, 60-64.
[5] Edelman, S., & Bülthoff, H. H. (1992). Orientation dependence in the recognition of familiar and novel views of three-dimensional objects. Vision Research, 32(12), 2385-2400.
[6] Edelman, S., Bülthoff, H. H., & Weinshall, D. (1989). Stimulus familiarity determines recognition strategy for novel 3D objects. MIT AI Lab. Tech Report 1138, Cambridge, MA.
[7] Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99(3), 480-517.
[8] Shapiro, P. N., & Penrod, S. D. (1986). Meta-analysis of face identification studies. Psychological Bulletin, 100, 139-156.
[9] Tarr, M. J. (1989). Orientation dependence in three-dimensional object recognition. Unpublished doctoral dissertation. Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA.
[10] Tarr, M. J., & Pinker, S. (1991). Orientation-dependent mechanisms in shape recognition: further issues. Psychological Science, 2(32), 207-209.
[11] Tarr, M. J., & Bülthoff, H. H. (1995). Is human object recognition better described by geon-structural-descriptions or by multiple views? Journal of Experimental Psychology: Human Perception and Performance.
[12] Troje, N., & Bülthoff, H. H. (in press). Vision Research.
