Susanne Kaiser & Thomas Wehrle
Université de Genève, FPSE, Section de Psychologie
9, route de Drize, 1227 Genève-Carouge
KEY WORDS: automatic coding of facial behavior, computer-games, cognition, emotion, autonomous agents, neural networks, simulation.
Besides these communicative functions, nonverbal behavior, and especially facial behavior, has expressive functions, indicating how the person feels in a given situation. Facial expressions can provide information about the current affective state of a person, his or her more enduring mood, and cognitive states such as concentration or boredom. Facial behavior may even provide information about temperament and personality, such as shyness or hostility [EKM 78].
Studying the functions of facial expressions is of interest not only in psychology and other social sciences but also in computer science and artificial intelligence. Takeuchi and Nagao [TAK 92], for instance, developed a computer interface to which synthetic facial displays had been added. They use these facial displays as a new modality that should make the interaction more efficient while lessening the cognitive load. According to these authors, providing computer systems with synthetic facial expressions is of central interest for current and future research. In addition, Bichsel and Pentland [BIC 93] point out that automatic interpretation of gestures and facial expressions would also improve man-machine interaction. Up to now, the information flow from man to machine has been restricted to moving a mouse, pressing buttons, and typing character sequences on a keyboard. They therefore developed a system that classifies head movements as "yes" (nodding head) or "no" (shaking head).
As psychologists, we can use human-computer interactions and interactive computer games to study the dynamics of emotional episodes in a more interactive manner than is usually done in classical experimental settings. One traditional approach to studying emotions is to ask people to remember as vividly as possible a situation in which they experienced a certain emotion [MCH 82]. Another approach tries to evoke emotions with the aid of selected video sequences that are judged to be emotionally arousing [CRA 86; HES 92]. Although most researchers in this domain admit that emotions are dynamic and often interactive processes, embedded in a physical and situational context, these dynamic properties of emotion episodes are neither included nor studied in these experimental settings. There is no interaction, since the subject either watches a film as an observer or has to remember an emotional situation. In retrospect, the subject has to give a single global label for this emotion. Obviously, in real interactions we are neither observers nor do we have to label our emotional experiences: in real interactions we are involved, and we have to act and react.
In a computer game setting we have an experimental situation that is interactive. The player is involved and is not merely an observer. We can continuously register the game context and the manipulations of the subject. Additionally, the recording of the subject's facial expressions can serve as an indicator of his or her emotional involvement. In this way the computer game provides a relatively small but complete context for the interpretation of the internal emotional and cognitive regulatory processes. This is an important point, because what is felt by the subject and what a certain facial expression means are often very context specific. Our computer game setting can be compared to Toda's idea of a microworld, an idea that he formulated with his Fungus Eater game as early as the sixties [TOD 82].
In the following we will focus on three aspects of our work on emotion. In the first part we present a tool that allows facial expressions to be coded automatically, the Facial Expression Analysis Tool (FEAT). In the second part we describe the theoretical framework of the microworld paradigm. In the third part we discuss how this approach can be used as a test-bed for emotion models, where an autonomous agent is implemented in a microworld similar to the one used in the computer game. We will also show how such an autonomous agent approach might help to solve some problems of modeling behavior in psychology as well as to address some pertinent problems in traditional AI.
There have been some attempts to develop an automated method of facial expression measurement, but our method currently represents the only applicable system for automated coding of facial behavior, and it uses FACS as its classification scheme.
In recent years some progress has been made on the problem of face detection and recognition [e.g. BIC 93]. Although the problem of face recognition seems to be quite similar to the problem of facial expression recognition, there are some important differences. First, in order to identify a face we have to look for the individual specificities of a given face. Second, a face recognition system should be insensitive to facial expressions. In contrast, a system that automatically measures facial expressions should disregard individual specificities, be independent of differing physiognomies, and look for what is common in, e.g., all "smiles".
The performance of FEAT is comparable to that of a human expert, and the combination of a fuzzy rule base and an artificial neural network has proven to be appropriate for dealing with the multiplicity and complexity of facial behavior. The system has some advantages compared to more classical symbolic as well as purely connectionist approaches. We tested those more traditional approaches as possible alternatives, but both led to some specific problems.
The other alternative was to use traditional connectionist systems, like a back propagation approach [MCC 86]. Again, this was not feasible, given the large training sets that would have been necessary for a good classification of the many Action Units and combinations. A critical problem was also the poor ability to generalize over several individuals.
We were in fact able to use these nets to classify the facial behavior of a certain person, but for that we first had to produce a person-dependent training set by hand, which negated the advantage of automation. In the probabilistic approach presented here, which uses a language for expressing fuzzy rules and an artificial neural network, we managed to make the classification mechanism independent of individual physiognomies [KAI 92]. We exploit our own expertise to define the initial structure of the net. The weights of the connections between the nodes are not set randomly at the beginning, as they are in back-propagation nets that have to learn everything from scratch. Before characterizing the network architecture in more detail, we will first describe the concrete procedure of the coding process.
The concrete coding procedure comprises three main phases. The first phase includes the digitization of the video sequence together with the pattern recognition and the preparation of the data for the neural network. The second phase covers the classification of this information with an artificial neural network. In a third phase the results can be presented with more or less detail in different ways, depending on the focus of interest.

Figure 1. Graphical user interface of the Facial Expression Analysis Tool.
Compared to the first version of the automated coding method [see KAI 92], FEAT handles all these phases within a convenient graphical user interface (see figure 1). The different steps of the coding procedure, as well as the video control and the video screen can be visualized and processed in the same environment. In the following we will give a short description of the three main phases.

Figure 2. User interface with integrated digitized video image.
{ Action Unit 1 }
AU1 := AU1_Int[50] and AU1_W[100].
AU1_Int := AUR1_Int and AUL1_Int.
AUR1_Int := P3_Dist.
AUL1_Int := P4_Dist.
AUR1_W := P3_S2[80] or P3_S9[20].
AUL1_W := P4_S2[80] or P4_S3[20].
AUR1 := AUR1_Int[50] and AUR1_W[100].
AUL1 := AUL1_Int[50] and AUL1_W[100].
Table 1. Rule for Action Unit 1
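To make the reading of such a rule concrete, the following minimal Python sketch parses the right-side rules of Table 1 and evaluates them for a single frame. The semantics used here are assumptions, since they are not specified in this section: the bracketed numbers are read as percentage weights, `and` is taken as a weighted mean, `or` as a weighted maximum, and all input values are assumed to lie between 0 and 1. The function names `parse_rules` and `evaluate` and the input values are hypothetical and are not part of FEAT.

```python
import re

# Right-side rules for Action Unit 1, copied from Table 1.
RULES_TEXT = """
AUR1_Int := P3_Dist.
AUR1_W := P3_S2[80] or P3_S9[20].
AUR1 := AUR1_Int[50] and AUR1_W[100].
"""

TERM = re.compile(r"(\w+)(?:\[(\d+)\])?")


def parse_rules(text):
    """Return {head: (operator, [(term, weight), ...])}."""
    rules = {}
    for line in text.strip().splitlines():
        # Accept both ':=' and ': =' as the definition sign.
        head, body = re.split(r":\s*=", line.strip().rstrip("."), maxsplit=1)
        body = body.strip()
        op = "and" if " and " in body else "or" if " or " in body else "id"
        terms = []
        for part in re.split(r"\s+(?:and|or)\s+", body):
            name, weight = TERM.fullmatch(part.strip()).groups()
            # ASSUMPTION: a missing weight counts as 100 percent.
            terms.append((name, int(weight) / 100 if weight else 1.0))
        rules[head.strip()] = (op, terms)
    return rules


def evaluate(name, rules, inputs):
    """Recursively compute the activation of a node for one frame."""
    if name in inputs:
        return inputs[name]
    op, terms = rules[name]
    values = [(evaluate(term, rules, inputs), w) for term, w in terms]
    if op == "or":
        # ASSUMPTION: 'or' is a weighted maximum.
        return max(v * w for v, w in values)
    # ASSUMPTION: 'and' (and a single term) is a weighted mean.
    return sum(v * w for v, w in values) / sum(w for _, w in values)


rules = parse_rules(RULES_TEXT)
frame = {"P3_Dist": 0.6, "P3_S2": 0.9, "P3_S9": 0.1}  # hypothetical inputs
print(round(evaluate("AUR1", rules, frame), 2))  # 0.68
```

Other operator definitions (for example minimum and maximum without weights) would fit the same structure; only the two marked lines in `evaluate` would change.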

Figure 3. A concrete network structure, automatically derived from a fuzzy rule base.
The resulting network is a special kind of multilayered feed-forward net [RUM 86]. Figure 3 shows an example of such a network. The nodes describing the distortion of the dot pattern for one still picture form the input layer. The nodes of the output layer represent Action Units and Action Unit combinations. During simulation the pattern information of a continuous video sequence is fed into the net with a resolution of 25 frames per second.
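How a layered structure can follow from a rule base may be illustrated with a short Python sketch. It is not the FEAT implementation: the dictionary `RULES` is a hypothetical, already parsed version of the right-side rules of Table 1, the bracketed rule weights are assumed to serve as initial connection weights, and the names `build_network` and `frame_time` are introduced only for this illustration.

```python
# Hypothetical parsed rule base: head -> list of (source node, initial weight).
RULES = {
    "AUR1_Int": [("P3_Dist", 1.0)],
    "AUR1_W": [("P3_S2", 0.8), ("P3_S9", 0.2)],
    "AUR1": [("AUR1_Int", 0.5), ("AUR1_W", 1.0)],
}


def build_network(rules):
    """Derive nodes and weighted edges of a feed-forward net from rules."""
    edges = [(src, head, w) for head, terms in rules.items() for src, w in terms]
    sources = {src for src, _, _ in edges}
    heads = set(rules)
    return {
        "inputs": sorted(sources - heads),   # never defined by a rule
        "hidden": sorted(heads & sources),   # defined and used by other rules
        "outputs": sorted(heads - sources),  # defined but not used elsewhere
        "edges": edges,                      # (from, to, initial weight)
    }


def frame_time(frame_index, fps=25):
    """Time in seconds of a frame at the stated 25 frames per second."""
    return frame_index / fps


net = build_network(RULES)
print(net["inputs"], net["hidden"], net["outputs"])
```

In this reading the expert knowledge fixes both the topology and the starting weights, so the net does not begin from random connections.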
A special learning algorithm makes it possible to improve the scoring capability of the net. This is achieved by giving the net examples of correct scorings. What the network learns is transparent to the user: the modified knowledge can be retrieved from the net in terms of new rules. Thus, controlled improvement and extension of the knowledge base is possible. This is an important advantage of this algorithm and this network architecture compared, e.g., with learning algorithms like back-propagation. The network simulator is integrated in the FEAT system. A full network browser and network editor with a graphical interface are also available. For experimental purposes, they make it possible to inspect and change the topology of a network, the inputs, activations, and outputs of its nodes, and the strength of the connections between these nodes. Nodes can be addressed or traversed by using the cursor keys or the mouse. A full description of this tool would go beyond the scope of this paper, but its functionality is comparable to that of commercial network tools.

Figure 4. One frame out of a sequence of facial behavior.
In order to examine all the details of a snapshot of a sequence, we can use a single-step view, frame by frame (see figure 4). Along the x-axis we find the repertoire of Action Units that are included in the knowledge base of the net. The height of the bars represents the activation of output nodes (Action Units). Activation is a measure of the probability that a certain Action Unit is present. Activation depends upon whether the dots relevant to the Action Unit (i.e. mentioned in the rules) have moved in the proper direction and to what extent they have moved. These two components are separated in the left-hand part of the bar. This allows us to get more information on intensity.
In the right-hand part of the bar, vector information for the right and left sides of the face is presented separately. This shows whether or not an expression was produced symmetrically. The automated method thus directly measures two relevant aspects of the temporal organization of a facial expression: intensity and symmetry. This represents an important advantage of the automated method, as the coding of these aspects in slow-motion film by experts is difficult and very time-consuming.
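The two movement components and the left-right comparison described above can be sketched in Python as follows. The formulas are illustrative assumptions and are not taken from FEAT: direction is scored as the clipped cosine between the observed and the expected movement, extent as the distance relative to a hypothetical maximum, and symmetry as one minus the relative difference between the two sides. The names `movement_components` and `symmetry_index` are likewise hypothetical.

```python
import math


def movement_components(dx, dy, expected_angle_deg, max_distance):
    """Return (direction, extent) of a dot displacement, each in [0, 1]."""
    distance = math.hypot(dx, dy)
    if distance == 0:
        return 0.0, 0.0  # the dot did not move
    ex = math.cos(math.radians(expected_angle_deg))
    ey = math.sin(math.radians(expected_angle_deg))
    # ASSUMPTION: cosine similarity, with opposite movement clipped to 0.
    direction = max(0.0, (dx * ex + dy * ey) / distance)
    # ASSUMPTION: extent saturates at a reference distance.
    extent = min(1.0, distance / max_distance)
    return direction, extent


def symmetry_index(left, right):
    """1.0 for equal activation on both sides, 0.0 for a one-sided action."""
    strongest = max(left, right)
    if strongest == 0:
        return None  # no activation on either side, symmetry is undefined
    return 1.0 - abs(left - right) / strongest


print(movement_components(0, 5, expected_angle_deg=90, max_distance=10))
print(symmetry_index(0.8, 0.4))
```

Keeping direction and extent as separate values mirrors the separation shown in the left-hand part of the bar, while the symmetry value summarizes the comparison shown in the right-hand part.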
Another representation form for the results visualizes the intensity information and the dynamics of facial expression. Figure 5 shows the distribution of Action Units over a period of 12 seconds. Similar to the tracing of an electroencephalograph, the intensity of Action Units can be seen in the horizontal width of the bars. We can see the onset and offset of an Action Unit as well as the duration of the apex. This gives a direct impression of the dynamics of facial behavior.

Figure 5. The dynamics of a facial sequence.
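Onset, offset, and apex duration can in principle be read from the frame-wise activation of a single Action Unit. The Python sketch below shows one simple way to do this, under stated assumptions: the series contains a single episode, an Action Unit counts as present above a hypothetical threshold `present`, and the apex comprises all frames within a hypothetical fraction `apex_fraction` of the peak. Only the frame rate of 25 frames per second is taken from the text.

```python
def episode_timing(activations, fps=25, present=0.2, apex_fraction=0.9):
    """Return onset, offset, and apex duration (in seconds) of one episode."""
    active = [i for i, a in enumerate(activations) if a >= present]
    if not active:
        return None  # the Action Unit never reached the presence threshold
    onset, offset = active[0], active[-1]
    peak = max(activations[onset:offset + 1])
    apex_frames = [
        i for i in range(onset, offset + 1)
        if activations[i] >= apex_fraction * peak
    ]
    return {
        "onset_s": onset / fps,          # start of the first active frame
        "offset_s": (offset + 1) / fps,  # end of the last active frame
        "apex_duration_s": len(apex_frames) / fps,
    }


# Hypothetical activation values for ten consecutive frames.
series = [0.0, 0.1, 0.3, 0.6, 1.0, 1.0, 0.95, 0.5, 0.25, 0.1]
print(episode_timing(series))
```

A sequence with several episodes of the same Action Unit would first have to be split into runs of consecutive active frames.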
In summary, our method currently represents the only applicable system for automated coding of facial behavior. Besides the desired information about the FACS scorings this method gives us more precise information on estimated probability, intensity, and symmetry. The performance of our system is comparable to a human expert [KAI 92]. The overall agreement between automated scorings and human expert scorings that we found in a reliability study was .73, which exceeds the FACS Final Test criterion of .70 [EFR 78].
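This section does not state which index was used in the reliability study. As an illustration only, the sketch below computes the agreement ratio commonly used with FACS, namely twice the number of Action Units on which two codings agree, divided by the total number of Action Units scored in both codings. The function name `agreement_ratio` and the example scorings are hypothetical.

```python
def agreement_ratio(coding_a, coding_b):
    """Agreement between two sets of Action Unit scores for one event."""
    a, b = set(coding_a), set(coding_b)
    if not a and not b:
        return 1.0  # both codings agree that nothing was shown
    return 2 * len(a & b) / (len(a) + len(b))


expert = {1, 2, 5, 12}      # hypothetical expert scoring
automated = {1, 2, 12, 15}  # hypothetical automated scoring
print(agreement_ratio(expert, automated))  # 0.75
```

In this invented example three of the four Action Units match on each side, giving a ratio of .75.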
The method has been developed explicitly to analyze facial behavior in human-computer interactions. In the following we will discuss why we use interactive computer games as an experimental setting in emotion psychology.
The advantages of using a self-programmed computer game are not only that all situations and interventions can be registered, but also that the scenarios of such a game can be designed in a theory-driven way and that the game parameters can be varied systematically.
Theories about emotion-antecedent appraisal processes lend themselves to this purpose [e.g. FRI 87; OAT 87b; ORT 88; REI 90; ROS 91; SCH 84; SMI 85; SOL 76; WEI 86]. These so-called cognitive emotion theories, or appraisal theories of emotion, make concrete predictions concerning which situational parameters are important as antecedents of emotions. There is a considerable degree of convergence between the different appraisal theories, especially with respect to the central dimensions postulated in the different approaches.
Appraisal dimensions that are mentioned by most of the theories include novelty, pleasantness, desirability, controllability, power, expectedness, suddenness, and intentionality. Within a game these dimensions can be changed systematically. The dimension of general control, for instance, can be manipulated so that in one situation the event occurs by chance, i.e. control is low, and in another situation the player can avoid risky areas or behavior, i.e. control is high. In a low power situation, the player is in a weak position and has only limited resources to deal with the event. In a high power situation the player is in a strong position and has ample resources to deal with the event. Despite the substantial convergence in the field of appraisal theories, authors do differ with respect to the exact number and definition of appraisal dimensions. The question of how many and which appraisal criteria are minimally needed to explain emotion differentiation is one of the central issues in research on emotion-antecedent appraisal. Authors also differ with respect to their predictions about which pattern of appraisal results is likely to produce a particular emotion.
These differences between appraisal theories are often rather subtle and cannot be tested with post hoc evaluations. By designing such game situations we can study these differences through systematic variation of the game parameters. The empirically found relations between appraisal manipulations and emotions can be used to test the theoretical assumptions and to decide between opposing hypotheses. For example, there are different hypotheses concerning the necessary and sufficient antecedents of anger. One hypothesis says that anger is always the result of the appraisal of an event as goal-obstructive and due to someone else's blameworthy intent. Another hypothesis says that intentionality is not a necessary antecedent of anger. To test these opposing hypotheses within the computer game, we can use two classes of objects, namely passive and active objects, both of which can hinder the player. In the case of an active object, i.e. another agent, the behavior can be presented in such a way that the agent is clearly responsible and intends to act in this way. In the case of a passive object, neither a clear assignment of responsibility nor an attribution of intentionality can be made.
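The systematic variation of game parameters described above amounts to a factorial design. A minimal Python sketch of such a design is given below. The factor names, the levels, and the function `build_conditions` are hypothetical labels chosen for this illustration; they are not parameters of the actual game.

```python
from itertools import product

# Hypothetical appraisal manipulations, following the examples in the text.
FACTORS = {
    "control": ("low", "high"),                    # chance event vs. avoidable
    "power": ("low", "high"),                      # limited vs. ample resources
    "obstacle": ("passive_object", "active_agent"),  # no intent vs. clear intent
}


def build_conditions(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


conditions = build_conditions(FACTORS)
print(len(conditions))  # 8

# Conditions contrasting the two anger hypotheses differ only in the obstacle.
anger_contrast = [
    c for c in conditions if c["control"] == "low" and c["power"] == "high"
]
print(anger_contrast)
```

If anger is observed with the passive object as well as with the active agent, this would speak against intentionality being a necessary antecedent of anger.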
These examples of concrete questions that can be studied within this computer game interaction belong to a hypothesis-testing research approach. Besides these more or less classical research objectives, our long-term goal is to develop an integrative and ecological model for studying the interactive and intrapsychic aspects of situated behavior in real or artificial environments. Starting from the theoretically postulated appraisal dimensions and the occurring behavior patterns (e.g. facial behavior, game strategies), a theoretically and empirically based repertory of emotions, with the corresponding behaviors and situational contexts, could be developed.
In psychology, computer simulation is a well-known but only rarely used research instrument. Nevertheless, current streams in emotion psychology that aim to integrate emotion research into an interdisciplinary Cognitive and Affective Science place particular emphasis on computer models [e.g. BOW 82; FRI 87; OAT 87b; ORT 88; PFE 85; SLO 81]. In these models, emotions are implemented as black boxes, i.e. one wants to have a program that behaves in the way predicted by the theoretical assumptions, but the underlying processes are not explicitly modeled. However, the goal is not only to predict the behavior, but also to describe the postulated underlying mechanisms that produce the respective behavior. A mechanism for the organization of behavior, for instance, should explain not only what behavior occurs but also how this can be achieved. With this approach we hope to explore the usefulness of psychological constructs like emotion for autonomous agent design. We also hope to be able to make emotion theories more concrete and more readily testable. The pseudo-empirical results of the simulations can be used to generate new hypotheses and to further develop the theoretical model. In order to validate the implemented model empirically, one can compare the simulated behavior with the empirical data found in the computer game interaction.
"You are a remote control operator of the robot miner nicknamed 'Fungus Eater', sent to a planet called Taros to collect uranium ore, which uses wild fungi growing on the surface of the planet as the main energy source for its biochemical engine. The uranium ore and fungi are distributed over the land of Taros, and little is known about the mode of their distribution. As the operator you can control every activity of the Fungus Eater, including the sensitivity of the fungus- and uranium-detection devices. All the sensory information the robot obtains will be transmitted here and displayed on this console so that you will feel as if you are the Fungus Eater itself." [TOD 82, p. 95].
As an extension of this early Solitary Fungus Eater, Toda developed a model of a Social Emotional Fungus Eater. For Toda the Fungus Eater clearly becomes emotional when the necessary mechanisms are introduced for functioning in a wild, changing, and not always predictable environment, for which the human emotional system was obviously originally designed. These additional mechanisms, called urges (fear, anxiety, anger, etc.), are activated once a situation has been identified as being relevant to some vital concern. Although Toda never implemented the Fungus Eater, he clearly had and has this goal in mind.
A microworld approach seems to be of special interest as a test-bed for emotion models, especially if we are able to implement such a model in a computer simulation. But as Pfeifer [PFE 88] mentioned in a survey of AI models of emotions, what is missing and urgently required are pertinent tools. Wehrle [WEH 94] has developed such a tool, the Autonomous Agent Modeling Environment (AAME), which is used for the modeling of the described emotional problem solver. In a first implementation of a Social Fungus Eater this tool proved able to generate the postulated behavior, and the experiments led to new ideas of how the model could be improved or extended with respect to the appraisal theories mentioned above.
Autonomous Agent research tries to overcome some fundamental problems of traditional symbolic AI, which we can only mention within the scope of this article: the symbol grounding problem [HAR 90], the frame problem [PYL 87], the frame-of-reference problem [CLA 89], and the situatedness and embodiedness of intelligent systems that deal with a constantly changing environment [SUC 87] (for a survey see [VER 93]). For us, this more biologically inspired AI is also a more psychologically plausible AI. Other psychologists, e.g. Oatley [OAT 87a], also note the link between psychology and autonomous agent research, especially with respect to emotion research:
"It seems certain that, as we understand more about cognition, we will need to explore autonomous systems with limited resources that nevertheless cope successfully with multiple goals, uncertainty about environment, and co-ordination with other agents. In mammals, these cognitive design problems seem to have solved, at least in part, by the processes underlying emotions." [OAT 87a, p. 211].
There already exists a considerable variety of models and architectures for Autonomous Agents [e.g. BRO 86; MAE 90; VER 92]. The emphasis in these architectures is on a more direct coupling of perception to action, distributedness and decentralization, dynamic interaction with the environment, and intrinsic mechanisms to cope with resource limitation and incomplete knowledge (for an overview see [MEY 90] and [VAR 92]).