INTERNATIONAL JOURNAL OF HUMAN–COMPUTER INTERACTION, 12(3&4), 441–459 Copyright © 2000, Lawrence Erlbaum Associates, Inc.
Capturing Design Space From a User Perspective: The Repertory Grid Technique Revisited

Marc Hassenzahl
Usability Engineering, User Interface Design GmbH

Rainer Wessler
Department of Psychology, University of Osnabrück
The design of an artifact (e.g., software system, household appliance) requires a multitude of decisions. In the course of narrowing down the design process, “good ideas” have to be separated from “bad ideas.” To accomplish this, user perceptions and evaluations are of great value. The individual way people perceive and evaluate a set of prototypes designed in parallel may shed light on their general needs and concerns. The Repertory Grid Technique (RGT) is a method of elucidating the so-called personal constructs (e.g., friendly–hostile, bad–good, playful–expert-like) people employ when confronted with other individuals, events, or artifacts. We assume that the personal constructs (and the underlying topics) generated as a reaction to a set of artifacts mark the artifacts’ design space from a user’s perspective and that this information may be helpful in separating valuable ideas from the not so valuable. This article explores the practical value of the RGT in gathering design-relevant information about the design space of early artifact prototypes designed in parallel. Ways of treating the information gathered, its quality and general advantages, and limitations of the RGT are presented and discussed. In general, the RGT proved to be a valuable tool in exploring a set of artifacts’ design space from a user’s perspective.
We thank the MediaPlant project group, especially Stefan Hofmann, Alard Weisscher, Jochen Klein, and Tobias Komischke, for designing and implementing the prototypes used in this study. Thanks also to Florian Sarodnick for his support. Requests for reprints should be sent to Marc Hassenzahl, User Interface Design GmbH, Dompfaffweg 10, 81827 Munich, Germany. E-mail: marc.hassenzahl@uidesign.de

1. INTRODUCTION

To design an artifact (e.g., software system, household appliance) is a constant problem-solving and decision-making process. In the course of this process, the number of possible alternatives is narrowed down until a final design is reached. A
multitude of decisions have to be made, revolving around the general purpose of the artifact, its “context of use” (Bevan & Macleod, 1994) and connected trade-offs and arguments (see Moran & Carroll, 1994, for an overview). Taking all design-driving information together, this “bundle” can be thought of as an artifact’s design space, thereby implying something that can be charted and explored. Several frameworks and methods have been proposed to capture design space. Design Space Analysis (MacLean, Young, Bellotti, & Moran, 1991), for example, provides a means for designers to make the options they have and the decisions they make explicit by an analytical effort—an “act of reflection” (Carroll & Moran, 1991, p. 199). Carroll and Rosson (1991) took a slightly different approach in the framework of Claims Analysis. They proposed extracting psychological claims from an artifact, which represent testable assumptions (i.e., empirical hypotheses) about the artifact’s design rationale. What these approaches have in common is the primarily analytical perspective on design space. A more empirical way to explore design space, especially appropriate for novel artifacts, is parallel design (Nielsen, 1993) with a subsequent evaluation phase. In parallel design, several designers are asked to work out design solutions for an artifact with a certain purpose. Each designer has to work on her or his own, to ensure maximum heterogeneity of the single solutions. The basic assumption is that by combining the valuable ideas embodied in the single solutions, a new superior solution can emerge. Whether an idea is valuable or not must be confirmed by subsequent evaluation. This evaluation yields the crucial information to guide further design. A wealth of user-based usability evaluation techniques is available, such as questionnaires (e.g., “IsoMetrics”; Gediga, Hamborg, & Düntsch, 1999), interviews and/or usability testing methods (e.g., “Thinking Aloud”; Jørgensen, 1989; also known as “Verbal Protocol Analysis”; Ericsson & Simon, 1984; or “Cooperative Evaluation”; Wright & Monk, 1991). All these methods can be considered as varying in the amount of predetermined structure that they impose on the data acquisition and analysis process. The major advantages of prestructured approaches (e.g., questionnaires) are their robustness, in the sense of reliability and objectivity, and their efficiency. One major drawback is their insensitivity to topics, thoughts, and feelings—in short, information—that do not fit into the predetermined structure. This is especially problematic if there is a general lack of knowledge about the topic to be researched. For example, recently “fun” is considered as an important software or product requirement (Draper, 1999; Hassenzahl, Platz, Burmester, & Lehner, 2000). However, without an a priori notion of fun as an important aspect of software acceptance and an idea how to define it, prestructured methods will inevitably fail. They simply lack openness to new, yet unconsidered topics. Another important drawback of prestructured approaches is their tendency to produce data that is of low practical use in a design process. Carroll (1997), for example, argued that “formal experiments [a very structured approach] are fine for determining which of two designs is better on a set of a priori dimensions, but they are neither flexible nor rich enough to guide a process of continual redesign” (p. 504). 
Unstructured methods (e.g., open interviews) in general have the required openness and the potential to produce design-relevant data, but this advantage is again
accompanied by major drawbacks. First, a lack of predefined structure requires a lot more effort to be put into the actual analysis of the data obtained. Often, hours of interviews have to be transcribed, coded, and integrated—it is a “complex, labor-intensive and uncertain business” (Banister, Burman, Parker, Taylor, & Tindall, 1994, p. 49). The same holds true for the qualitative analysis of video protocols from usability testing sessions. Second, serious issues of objectivity and reliability arise, which touches on one of the core issues in the more or less philosophical argument between protagonists of a quantitative-oriented versus qualitative-oriented research tradition (see Buur & Bagger, 1999; Hassenzahl, 1999). To summarize, the user-based evaluation of artifacts in a parallel design situation requires an efficient but open method that produces data rich and concrete enough to guide design. None of the traditional methods seems to satisfy all those requirements at once. The obvious problems with popular user-based evaluation methods lead us to consider the Repertory Grid Technique (RGT; Kelly, 1955) as a possible candidate method for capturing design space from a user’s perspective. The RGT makes it possible to understand an individual’s personal (i.e., idiosyncratic) construction of her or his environment (e.g., artifacts, other persons). It avoids some of the problems just discussed. Despite this, as a method for comparing or evaluating different artifacts, it is somewhat out of fashion. With its high point around the 1980s (with a whole issue of the International Journal of Man–Machine Studies devoted to the topic; Shaw, 1980), the RGT remains popular as a knowledge acquisition tool (e.g., Gaines & Shaw, 1997) and the results proved helpful for various purposes, such as structuring hypertexts (Dillon & McKnight, 1990). The objective of this article is to present the RGT as a method of capturing design space (i.e., design-relevant information) from a user’s perspective. First, RGT and the rationale for using it in the context of artifact design are described. Second, the RGT is applied to a set of simple prototypes designed in parallel. We present examples of how different types of information can be extracted from the data, thereby proposing a possible procedure for treating the obtained data. This procedure comprises three steps—charting the design space, exploring and understanding the design space, and abstracting. We investigate whether it is possible to abstract from the idiosyncratic perspectives to identify underlying topics relevant for the artifact to be designed. The major advantage of the latter may be a possible stimulation of theory development (Carroll, Singley, & Rosson, 1992). Furthermore, we attempt to assess the quality and usefulness of the obtained data. Third, the advantages of RGT, as well as the limitations, are discussed.
2. USING RGT TO BRING DESIGN SPACE TO LIFE

The RGT (Kelly, 1955) originally stems from the psychological study of personality (see Banister et al., 1994; Fransella & Bannister, 1977, for an overview). Kelly assumed that the meaning we attach to events or objects defines our subjective reality, and thereby the way we interact with our environment. The idiosyncratic views of individuals, that is, the different ways of seeing, and the differences from other individuals
define unique personalities. It is stated that our view of the objects (persons, events) we interact with is made up of a collection of similarity–difference dimensions, referred to as personal constructs. For example, if we perceive two cars as being different, we may come up with the personal construct fancy–conservative to differentiate them. On one hand, this personal construct tells us something about the person who uses it, namely his or her perceptions and concerns. On the other hand, it also reveals information about the cars, that is, their attributes. From a design perspective, we are interested in differences between artifacts (i.e., the cars in our example) rather than differences in the individual; thus, we intend to focus on what the personal constructs of a group of individuals might tell us about the artifacts they interact with. The differences between artifacts, manifest in the personal constructs a group of individuals comes up with, are the design-relevant information that should bring design space to life.

The RGT is a method of extracting personal constructs in a systematic way. In a first step, an individual is presented with a randomly drawn triad from a group of artifacts that populate design space. He or she is asked in what way two of the three are similar to each other and different from the third. This induces a search and the production of an appropriate personal construct that accounts for a perceived difference. The personal construct found is named (e.g., playful–serious, two-dimensional–three-dimensional, ugly–attractive) and the whole process is repeated until no further novel constructs arise. The result is a kind of semantic differential solely based on the idiosyncratic view of the individual. In a second step, the individual is asked to rate all artifacts on her or his personal constructs. The result is an individual-based description of the artifacts based on differences amongst them. (A minimal code sketch of these two steps is given at the end of this section.)

The RGT may have several advantages. First, it is a structured approach, but nevertheless open to the idiosyncratic views of each individual. It captures the way individuals construct the design space populated by artifacts. Second, it is more efficient than comparable unstructured approaches. To focus on the personal constructs as data denotes a significant reduction in the amount of data to be analyzed (hopefully without severe reduction in meaningful content). This is especially important in the context of parallel design, where a large number of design alternatives is desirable. Third, personal constructs may have the potential to be design-relevant data. The whole approach is likely to generate different views on the artifacts, embodying various individual needs and concerns in relation to the artifact. Fourth, the basic method lends itself to application with almost any set of artifacts. These (envisioned) advantages form the rationale for using the RGT in a parallel design situation. In the remainder of the article, an application of the RGT is presented and the results are discussed.
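To make these two steps concrete, the following is a minimal sketch, in Python, of how the elicitation and rating could be driven in software. It is not part of the original study; the prompts, the stopping rule (three consecutive rounds without a new construct), and the use of the prototype names introduced later in this article are illustrative assumptions. Only the 1-to-5 rating scale mirrors the procedure reported below.

```python
import random

def elicit_constructs(artifacts, max_failed_rounds=3):
    """Step 1: present random triads until no new constructs emerge.
    The stopping rule (three fruitless rounds in a row) is an assumption."""
    constructs, failed = [], 0
    while failed < max_failed_rounds:
        triad = random.sample(artifacts, 3)
        print(f"In what way are two of {triad} similar to each other and different from the third?")
        answer = input("Construct as 'inclusive pole - exclusive pole' (empty if none): ").strip()
        if answer and answer not in constructs:
            constructs.append(answer)
            failed = 0
        else:
            failed += 1
    return constructs

def rate_artifacts(artifacts, constructs):
    """Step 2: rate every artifact on every elicited construct,
    from 1 (inclusive pole) to 5 (exclusive pole)."""
    grid = {construct: {} for construct in constructs}
    for construct in constructs:
        for artifact in artifacts:
            grid[construct][artifact] = int(input(f"Rate {artifact} on '{construct}' (1-5): "))
    return grid

if __name__ == "__main__":
    prototypes = ["windows-like", "blue", "comic", "game-like", "cube", "real", "animated"]
    grid = rate_artifacts(prototypes, elicit_constructs(prototypes))
    print(grid)
```

The resulting grid (constructs by artifacts, with 1-to-5 ratings) is the data structure that all later analyses in this article operate on.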
3. AN APPLICATION OF THE RGT IN A PARALLEL DESIGN SITUATION

3.1. Method

Participants. A total of 11 individuals (6 women, 5 men) participated in the study. They were mainly recruited among Siemens employees; most of them had
responded to a public announcement in the canteen. Their job background was heterogeneous and covered nontechnical backgrounds (e.g., sports student, designer) as well as technical backgrounds (e.g., software developer, network administrator). The sample’s mean age was 34 years (Min = 22, Max = 54). Computer expertise was assessed by a five-item questionnaire and varied from moderate (3 participants) to high (8 participants).
Artifacts. In a parallel design session, we asked students of visual, industrial, and ergonomic design to design and implement seven different artifacts (i.e., prototypes). These prototypes were to support the same simple, yet realistic work-related task: to switch off a pump in an assumed industrial plant control room. This required at least the following steps: selecting the pump, switching it off, and confirming the action (i.e., a safety check). The shutting down of the pump required some time. It was left to the students whether this process was visualized or not. The whole parallel design session was part of a larger project concerned with designing innovative control room interfaces. The student designers were given no restrictions about prototype form and interaction style in advance. The students were encouraged to work out solutions according to what they found appropriate or interesting. Color Plates 17 and 18, Figure 1 show the prototypes. Although each prototype allowed the user to accomplish the same task, the prototypes varied strongly in design and interaction style. Multiple design dimensions were varied (e.g., colors, metaphors). Six out of the seven prototypes had animated parts. Because the predominant design principle of parallel design is heterogeneity, flaws in the visual and ergonomic design were not corrected. A former study (Hassenzahl et al., 2000) using the same prototypes showed that they varied considerably in appealingness, perceived ergonomic quality (i.e., task-related quality aspects), and hedonic quality (i.e., non-task-related quality aspects). From these results, it can be tentatively concluded that the design principle of heterogeneity was met.
Additional measures. In addition to the personal constructs, appealingness rankings of the prototypes were obtained from each participant. The overall rank order of the prototypes was based on the sum of each prototype’s individual ranks.
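As a small illustration of this aggregation, the sketch below sums each prototype's individual ranks and orders the prototypes by the resulting sums; the participant codes and rankings shown are invented, not the study's data.

```python
# participant -> prototype -> appealingness rank (1 = most appealing); invented data
individual_ranks = {
    "p01": {"blue": 1, "windows-like": 2, "comic": 6},
    "p02": {"blue": 2, "windows-like": 1, "comic": 7},
    "p03": {"blue": 1, "windows-like": 3, "comic": 5},
}

rank_sums = {}
for ranks in individual_ranks.values():
    for prototype, rank in ranks.items():
        rank_sums[prototype] = rank_sums.get(prototype, 0) + rank

overall_order = sorted(rank_sums, key=rank_sums.get)  # lowest rank sum = most appealing overall
print(overall_order, rank_sums)
```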
Procedure. Each participant was led into the laboratory separately. After a short introduction, the participant was seated in front of a 30-in. CRT that showed small pictures of the seven prototypes in a random order. The whole procedure consisted of three parts—introduction, extraction, and assessment. 1. In the introduction part, the participant was instructed to familiarize himself or herself with the prototypes. Each prototype embodies the task of switching off a pump. To accomplish this, the participant had to select the running pump with the
mouse and was then asked whether he or she really wanted the pump to be switched off. After a confirmation and a safety check, the pump was switched off. Once the participant was convinced that the pump was coming to a halt, she or he was to inform the experimenter. The interaction per prototype lasted approximately 2 min. After getting familiar with all seven prototypes, the participant was asked to rank order the prototypes according to their appealingness. This rank ordering was followed by a short break. 2. In the extraction part of the procedure, three of the seven prototypes (i.e., a prototype triad) were randomly chosen and displayed on the screen. The participant was asked to find a dimension (i.e., personal construct) on which two of the three prototypes were similar (i.e., inclusive construct-pole) but differed from the third (i.e., exclusive construct-pole). He or she was then required to label both poles in a way that expresses the intended dimension as briefly and clearly as possible. After labeling a difference dimension, a new triad was presented. This part of the procedure was repeated until the participant was unable to state a construct he or she had not mentioned before. 3. In the assessment part, the participant was asked to evaluate each prototype on her or his personal constructs by using a scale ranging from 1 (inclusive construct-pole) to 5 (exclusive construct-pole). Demographics and computer expertise were assessed at the end of each session. The whole session took about 1 hr and 15 min.
3.2. Results and Discussion

The following sections not only present our findings; their sequence can also be viewed as an example of the stepwise exploration of RGT data. Three steps were suggested:

• Charting the design space: The first step is to visualize the relations among the prototypes; that is, to create a “map” of design space.
• Exploring and understanding design space: Based on the map, relations between single prototypes (i.e., pairs of prototypes) can be further explored. This exploration may yield detailed design-relevant information.
• Abstraction: Underlying topics made visible: An abstraction from the results obtained can promote a deeper understanding of the underlying topics. This may prove helpful for solving future design problems. Moreover, it may stimulate the development of theories of the design of artifacts.
Charting the Design Space

The RGT yielded 170 personal constructs, with a median of 15 constructs per participant (Min = 9, Max = 29). Before we consider the obtained constructs in detail, we attempt to visualize the relations among the prototypes, to create a map of the design space. To accomplish this, we calculated Euclidean distances between
the prototypes based on differences in the assessment of each prototype on the personal constructs (see Procedure section). The resulting distance matrix was then submitted to a Pathfinder Network Analysis (Knowledge Network Organizing Tool [KNOT], 1992; Schvaneveldt, 1990; Wandmacher, 1993). The Pathfinder algorithms seek to determine a two-dimensional representation of a distance matrix in space, with nodes representing objects and links representing relations (i.e., similarity) between objects. Figure 2 shows the map of design space for the seven prototypes; a code sketch of this computation is given after the figure caption. The figure shows that the Windows-like prototype is a central node, connecting all other prototypes. Actually, this reflects in part the way the prototypes were designed. The designer who produced the Windows-like prototype was most knowledgeable about the domain. The other designers referred to him as an important source of information during the design. In a way, his design became a blueprint for the other designs. One may argue that a strong recommendation of parallel design, namely to have the designers work separately from each other (Nielsen, 1993), was not fully taken into account. Conversely, the Windows-like prototype’s central role may simply be a product of the fact that it represents the task to be accomplished by the participants most purely. Regardless of which interpretation holds true, it is astonishing that the central role of the Windows-like prototype, which was more or less implicit, was perceived by the participants, that is, it is evident in the data.
FIGURE 2 “Map” of design space (derived from a Pathfinder Network Analysis with r = 4, q = n – 1). Nodes represent the prototypes. Links show the relations between prototypes.
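A minimal sketch of the two computational steps just described follows: Euclidean distances between prototypes derived from construct ratings, and a Pathfinder network PFNET(r, q = n − 1) pruned from the resulting distance matrix. The r and q values follow the Figure 2 caption; the ratings shown, the NumPy implementation, and the restriction to a single participant's grid (rather than all 11) are illustrative assumptions, not the authors' original analysis.

```python
import numpy as np

def euclidean_distances(ratings):
    """ratings: constructs x prototypes array of 1-5 ratings.
    Returns a prototypes x prototypes matrix of Euclidean distances."""
    diff = ratings[:, :, None] - ratings[:, None, :]    # constructs x prototypes x prototypes
    return np.sqrt((diff ** 2).sum(axis=0))

def pathfinder_links(dist, r=4.0):
    """PFNET(r, q = n - 1): a link between two prototypes is kept only if no
    indirect path has a smaller weight under the Minkowski r-metric."""
    n = dist.shape[0]
    w = dist.astype(float).copy()
    for k in range(n):              # Floyd-Warshall-style relaxation of path weights
        for i in range(n):
            for j in range(n):
                via_k = (w[i, k] ** r + w[k, j] ** r) ** (1.0 / r)
                if via_k < w[i, j]:
                    w[i, j] = via_k
    keep = (dist <= w + 1e-9) & ~np.eye(n, dtype=bool)  # keep links whose direct distance is already minimal
    return keep

# Invented ratings: 4 constructs x 3 prototypes, rated 1-5 by one participant.
ratings = np.array([[1, 2, 5],
                    [2, 1, 4],
                    [1, 3, 5],
                    [2, 2, 4]])
links = pathfinder_links(euclidean_distances(ratings))
print(links.astype(int))
```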
From the central Windows-like prototype, three different branches extend. Branch 1 consists of the prototypes blue and comic. Blue adapts the general layout from Windows-like, but presents it in a more visually designed way. A dialog flag extruding from the pump symbol replaced the dialog box. Moreover, the general color scheme was changed from Windows-gray to a dark and intensive blue. Comic still draws on the general layout, but introduces a surprising dialog element (i.e., the comic figure holding up a dialog sign) and a fun-related design. Taken together, Branch 1 may represent a transition from a basically technology-oriented, well-known design to a more appealing, surprise- and fun-related design. Branch 2 consists of the prototypes game-like, cube, and real. Again, game-like is the prototype that draws on the general layout of the Windows-like prototype. It differs by introducing dimensionality. The representation is changed from two-dimensional to isometric, similar to the way a certain genre of computer games presents itself (e.g., StarCraft, 1998; Weisscher, 1999). This dimensionality is further supported by the use of three-dimensional rendering. Cube sacrifices the general overview provided by Windows-like and game-like. It presents itself in a close-up view. Although the representation of the pump is still an abstraction of a real pump, it looks more graspable and real than in the game-like prototype. The shiny, metal-like surface and the solid, animated, three-dimensional dialog cube deepen this impression. Real takes this impression a little further by introducing a zoom from an overview of the plant layout to a close-up of the pump. The actual zoom is presented as a camera flight through space. The pump is modeled after a real pump (but not necessarily one used in an industrial context), with knobs to switch it on and off, a flap hiding an action confirmation, and a round indicator for its status. Furthermore, the pump vibrates visually to show that it is running. To summarize, Branch 2 may represent the transition from an abstract, two-dimensional design to a more reality-based, three-dimensional design. Branch 3 consists of the "animated" prototype. This prototype's design is quite close to the Windows-like. It mainly differs in the fact that the pump icon itself is animated (it turns when it is running) and that the actual process of shutting down the pump is represented by signals running down the line connecting the dialog box with the pump icon (see Color Plate 18, Figure 1). In short, Branch 3 may represent a transition from the still to the strongly animated.

The map of design space is helpful for getting a first idea of how the participants perceive the prototypes. It visualizes similarities and dissimilarities (i.e., relations) between prototypes apparent in the data. Nonetheless, it is descriptive in nature. In other words, it will not help to distinguish good from bad ideas. To overcome this limitation, we might combine the map of design space with the overall appealingness ranking. In the map (see Figure 2), the circles attached to the prototype nodes show the appealingness rank for each prototype (based on the sum of the individual ranks). Low numbers indicate a higher degree of appealingness and high numbers a lower degree of appealingness. Apparently, the blue and Windows-like prototypes are the most appealing, whereas the real, comic, and animated prototypes are the least appealing. Noticeably, the end nodes of the three branches (comic, real, animated) are consistently perceived as the least appealing. A conclusion from this result could be that fun-oriented design (Branch 1), three-dimensional reality-based design (Branch 2), and animations (Branch 3) simply do not appeal to the participants (at least in a technology-oriented domain). From our perspective, this conclusion is oversimplified.
It seems more likely that it is not the design elements per se (e.g., fun-oriented design,
reality-based design) that are unappealing, but their extremity. For example, looking at Branch 1 the minor changes in design from the quite common Windows-like to the more unusual blue prototype are positively received. This is reflected by the higher appealingness rank of the blue prototype. Blue may introduce some novelty and good design solutions, which add value to the prototype. However, by going a step further to the comic prototype, appealingness drops dramatically. Indeed, the comic prototype presents itself in a fairly extreme, fun-oriented way, being quite different from the central Windows-like prototype—perhaps too different. This interpretation is consistent with Dreyfuss’s (1955; cited in Carroll, 1997) recommendation of introducing new functions through familiar “survival forms.” This idea rests on the observation that people are often bewildered by unnecessary novelty. In fact, the interpretation of the appealingness ranks in combination with the map of design space is difficult and close to speculation. The simple knowledge of differences (relations, respectively) between prototypes and their appealingness does not seem sufficient to understand those differences. An understanding of the nature of these differences is necessary to substantially guide design. To be more precise, concrete, design-relevant information is needed to understand the design space of a set of artifacts. In the next section, we describe how the map of design space we just created can guide the extraction of such design-relevant information.
Exploring and Understanding Design Space: Extracting Design-Relevant Information

By interpreting the map of design space based on what we as designers (or authors) know about or how we perceive the inhabitants of this space (i.e., the prototypes), we neglect the qualitative value of the personal constructs obtained. A possible way of substantiating our interpretation is to look at the actual personal constructs that differentiate two prototypes. Basically, the differences between each pair of prototypes can be further explored, resulting in a large number of comparisons to be made. At this point, the map becomes valuable for guiding the process of comparing. If, for example, we seek to substantiate our interpretation of Branch 1 as representing the transition from the technology-oriented to the fun-oriented, we may look at the actual personal constructs that differentiate between the Windows-like and the comic prototype. A construct clearly differentiates if one prototype is characterized by one pole of the construct, whereas the opposite pole characterizes the other prototype (a simple selection rule of this kind is sketched below). Table 1 shows the personal constructs that differentiate between the comic and the Windows-like prototype. The constructs are grouped by topic; topic labels appear as group headings.
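The following is a minimal sketch of that selection rule; the grid entries and the two-point "pole band" cutoff are invented for the example, because the article does not specify the exact criterion used.

```python
def differentiating_constructs(grid, proto_a, proto_b, pole_band=2):
    """Return constructs whose 1-5 ratings place the two prototypes
    on opposite poles (ratings within `pole_band` of either end)."""
    hits = []
    for construct, ratings in grid.items():
        a, b = ratings[proto_a], ratings[proto_b]
        a_low, a_high = a <= pole_band, a >= 6 - pole_band
        b_low, b_high = b <= pole_band, b >= 6 - pole_band
        if (a_low and b_high) or (a_high and b_low):
            hits.append(construct)
    return hits

# Invented example grid for one participant (1 = inclusive pole, 5 = exclusive pole).
grid = {
    "playful-serious": {"comic": 1, "windows-like": 5},
    "text not readable-text readable": {"comic": 1, "windows-like": 4},
    "colorful-plain": {"comic": 3, "windows-like": 3},
}
print(differentiating_constructs(grid, "comic", "windows-like"))
# -> ['playful-serious', 'text not readable-text readable']
```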
Table 1: Personal Constructs That Differentiate Between the “Comic” and the “Windows-Like” Prototype (pole characterizing “Comic” listed first, pole characterizing “Windows-Like” second)

Not serious (fun-oriented)–Serious (technology-oriented)
1. Does not take the problem seriously–Takes the problem seriously
2. Had been fun–Serious (good for work)
3. Non-expert-like–Technically appropriate
4. Frivolity–Points at something technical
5. Not serious–More serious
6. Playful–Expert-like
7. All show, no substance–Technology-oriented
8. Inappropriately funny–Serious

Not competent–Competent
9. Process of switching off appears incompetent–Process of switching off appears competent
10. Appears incapable–Is very capable

Novel–Usual
11. Figure is a novel interaction element–Usual interaction
12. Has its own character–Windows-interface monotony
13. Unusual presentation–Usual computer-like presentation

No impairment of process visibility–Impairment of process visibility
14. Visibility is not impaired–Dialog box blocks important aspects of process
15. Overview remains–Dialog box covers overview
16. No impairment of visibility–Dialog box blocks overview

Low readability–High readability
17. Text is not readable–Text is readable
18. Font size is too small–Font size is appropriate

Low “mouseability”–High “mouseability”
19. “Yes” and “no” selections hard to hit with mouse–“Yes” and “no” selections easy to hit with mouse
20. Buttons hard to hit with mouse–Buttons easy to hit with mouse

Other
21. Color of pump is neutral–Color helps to identify pump
22. Status of pump visible–Status of pump not visible
23. Not suitable for frequent use–Suitable for frequent use

Note. The constructs are grouped by topic; topic labels appear as group headings. All construct examples were originally in German.
It is striking that a good part of the personal constructs in Table 1 indeed address the technology-oriented versus fun-oriented difference between the two prototypes. However, it becomes apparent that this difference is not only perceived but also evaluated. The participants express concerns about the appropriateness of such an obviously fun-related design in an industrial context. Because some of the personal constructs are evaluative in tone, it is possible to separate good ideas from bad ones. The design-relevant information is that the comic prototype succeeds in inducing a sense of playfulness, but that the apparent lack of seriousness is considered inappropriate for the intended context of use. The blunt playfulness seems to contradict the need for an expert-like and competent-looking prototype (Table 1, Constructs 1–10). Furthermore, the comic prototype’s novelty is a topic (Constructs 11–13). Again, the related personal constructs vary in tone: One construct emphasizes positive feelings derived from the comic prototype’s “own character” (Construct 12), whereas the “unusual presentation” construct connotes concerns about the obvious novelty (Construct 13).
Besides the adequacy concerns (that is, the comic prototype is perceived as more playful and novel than the Windows-like, but it is doubtful whether its design fits into the intended context of use), more usability-related aspects are expressed. For example, the fact that the dialog box in the Windows-like prototype blocks the view of the process was negatively received (Constructs 14–16). The participants voiced a need to sustain the view onto the whole process while operating the system. This was much better solved by the comic prototype, which presents the figure with the “dialog sign” in the empty space between interface elements (see Color Plate 17, Figure 1). Moreover, the small size of the font on the comic’s dialog sign was perceived negatively (see Color Plate 17, Figure 1). It impaired the readability (Constructs 17–18) and the mouseability (Constructs 19–20) of the prototype. Some additional constructs addressed the neglect of colors in the comic prototype (Construct 21), the lack of feedback about the pump status in the Windows-like prototype (Construct 22), and the general judgment that the comic prototype is not suitable for frequent use (Construct 23). Altogether, the detailed analysis added substantial, design-relevant information to the preliminary interpretation of the differences between the prototypes based on the map of design space and the appealingness ranking. The most revealing proved to be the personal constructs that are evaluative in tone. Those constructs help to understand how design elements (e.g., color, metaphors, dimensionality) are perceived and received by participants.

For the sake of brevity, we refrain from presenting further detailed analyses of differences between pairs of prototypes. However, to provide a rough idea of the fruitfulness of the information obtained in the form of personal constructs, we performed the following additional analysis. We, along with an additional rater, independently rated each construct as belonging to one of the following categories: Type A, “descriptive”; Type B, “evaluative, useful for artifact selection”; and Type C, “evaluative, useful for artifact redesign without the need for further analysis.” Type A constructs point to certain differences (e.g., two-dimensional vs. three-dimensional) to which individuals are receptive. They can be used to verify whether the design elements (e.g., dimensionality) used are actually perceived by the participants. Nevertheless, it remains unclear which pole of the construct is considered good or bad, respectively. This certainly limits their use. Type B constructs point to relevant issues without referring to concrete measures to be taken to resolve these issues. For example, the construct “interesting versus boring” points to an important difference between prototypes. It is obvious that to be boring is not considered a positive attribute of a prototype (i.e., to be boring is bad). Such a construct is evaluative in tone and can be used for selecting a good artifact from the set of studied artifacts. Type B constructs tend to be too abstract, but still point to important issues. However, the question as to which design elements let one prototype appear boring and the other interesting cannot be answered by the personal construct alone. Again, this limits its use for guiding design, because the design elements responsible for making a prototype, for example, interesting (i.e., a good idea) cannot be identified.
Type C constructs point to relevant issues, with a clear reference to the relevant design elements. For example, the personal construct “font too small versus font size appropriate”
clearly indicates the participant’s expectations and the associated design element (i.e., the font size in some prototypes). These constructs are evaluative in tone. They are concrete and useful for guiding design, but they tend to be very detailed. From a design perspective, it would be desirable to have a small number of the purely descriptive Type A constructs, a medium number of the more abstract Type B constructs, and a large number of the more detailed Type C constructs. Interrater agreement (Fleiss, 1971) of the initial categorization performed independently by the three raters was satisfactory (κ = 0.64, µ = 11.14; p < .01, two-tailed). For the final classification, disagreement was resolved either by using the category that at least two raters agreed on or by negotiation. Table 2 shows the construct types concerning design relevancy, and the number and percentage of constructs belonging to each category. A large proportion (44%) of the personal constructs obtained are Type A constructs. Unfortunately, these constructs are of limited use when it comes to practical design. This result may reflect the fact that it is much easier to come up with descriptive constructs than with evaluative ones. Even so, in further applications of the RGT, measures should be taken to reduce the number of purely descriptive personal constructs. The proportions of Type B and C constructs are as desired, with a large number of the detailed Type C constructs and a medium number of Type B constructs. It is encouraging that the number of Type C constructs almost equals the number of the much easier to produce Type A constructs. Despite the large proportion of purely descriptive constructs (Type A), the design relevancy of the information captured by the RGT is encouraging. However, it should be noted that the analysis presented herein has a limitation. It neglects the fact that some personal constructs may address the same issue or even the same design element (e.g., text is not readable vs. text is readable, or font size too small vs. font size is appropriate); thus, the actual amount of information may be overestimated. To our mind, a more complete analysis of the actual quality of the information obtained should be postponed until a more extensive pool of personal constructs, obtained with different sets of artifacts, is available. For now, we tentatively conclude that an RGT will produce at least some design-relevant information.

Table 2: Number and Percentage of Personal Constructs Belonging to Different Types Concerning Their Design Relevancy

Type A: Descriptive (design relevancy: low) – 75 constructs (44%)
Type B: Evaluative; useful for artifact selection (design relevancy: medium) – 34 constructs (20%)
Type C: Evaluative; useful for artifact redesign without the need for further analysis (design relevancy: high) – 61 constructs (36%)
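For readers unfamiliar with the agreement statistic used here and in the next section, the following is a minimal sketch of Fleiss' (1971) kappa for multiple raters; the toy data are invented, and the kappa, test statistic, and p values reported in this article are not reproduced by it.

```python
from collections import Counter

def fleiss_kappa(assignments):
    """assignments: one inner list per construct, containing the category label
    given by each rater (same number of raters for every construct)."""
    n_items = len(assignments)
    n_raters = len(assignments[0])
    categories = sorted({label for item in assignments for label in item})

    # n_ij: how many of the raters put construct i into category j
    counts = [[Counter(item)[c] for c in categories] for item in assignments]

    # Mean observed agreement across constructs
    p_bar = sum(
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the marginal category proportions
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)

# Invented toy data: 6 constructs, each assigned to type A, B, or C by 3 raters.
example = [["A", "A", "A"], ["A", "B", "A"], ["C", "C", "C"],
           ["B", "B", "C"], ["C", "C", "C"], ["A", "A", "B"]]
print(round(fleiss_kappa(example), 2))
```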
Abstraction: Underlying Topics Made Visible

It is quite common to evaluate artifacts (or a set of artifacts) to improve their design. Often neglected is the additional step of abstracting and generalizing from the data obtained (Carroll et al., 1992). Such abstractions could stimulate the development of theories for the design of artifacts and could prove helpful for solving future design problems. From our perspective, it is an important asset of an (evaluation) method to support abstraction.

A first inspection of the personal constructs by the experimenter (i.e., Wessler) revealed obvious similarities within and between participants. Based on these similarities, he defined and described construct classes. We, with one additional rater, then categorized each construct as belonging to one of the construct classes. Ambiguous constructs were removed. Interrater agreement (Fleiss, 1971) of the categorization was satisfactory (κ = 0.68, µ = 15.49; p < .01, two-tailed). Based on difficulties encountered during classification, the initial set of construct classes was slightly reformulated. We attempted to resolve disagreements about a construct’s class membership among the raters. In cases where a disagreement could not be resolved, the construct was removed from further analysis. Altogether, 153 constructs remained in the analysis. Table 3 shows the identified construct classes, some examples, and the number of constructs summarized by each class.
Design principles. The construct class “design principles” refers to differences in metaphors (e.g., reality, anthropomorphism, desktop), visual design methods (e.g., animation, color), and interaction design methods (buttons, dialog boxes) used by the designers. It illustrates the participants’ receptiveness to differences in the way the designers solved the design problem set by the given task. The summarized constructs are more or less descriptive in nature.
Quality of interaction. The construct class “quality of interaction” refers to differences in the prototype’s controllability, simplicity, and efficiency. It shows the participants’ ability to express and put into focus usability problems occurring during interaction. The summarized constructs are more or less evaluative in nature.
Quality of presentation. The construct class “quality of presentation” refers to differences in the prototype’s self-descriptiveness (i.e., ability to communicate its functions, the way it is operated, and its current status) and clarity (e.g., readability, unambiguousness). It shows the participants’ ability to infer differences in the usability of prototypes from the way information is presented to them. The summarized constructs are more or less evaluative in nature.
Hedonic quality. The construct class “hedonic quality” refers to differences in the prototype’s hedonic quality (Hassenzahl et al., 2000), that is, the non-task-related qualities modernity, novelty, and ability to stimulate. It illustrates the participants’ responsiveness to differences beyond mere usability and utility. The summarized constructs are more or less evaluative in nature.
Table 3: Construct Classes, Examples, and Number of Constructs

1. Design principles (61 constructs, 40%). Examples: Two-dimensional–Three-dimensional; Detail view–Total view; Graspable–Abstract.
2. Quality of interaction (16 constructs, 10%). Examples: Trial and error–Unambiguous control; Dialog element inefficient–Dialog element efficient; Demanding interaction–Straightforward interaction.
3. Quality of presentation (49 constructs, 32%). Examples: Presentation confusing–Presentation clear; Too much information–Appropriate amount of information; Structure remains vague–Structure becomes apparent.
4. Hedonic quality (4 constructs, 3%). Examples: Boring–Interesting; Has its own character–Windows-interface monotony; Novel interaction element–Conventional interaction element.
5. Adequacy concerns (23 constructs, 15%). Examples: Inappropriately funny–Serious; Unnecessary animation–No unnecessary animation; Appears incapable–Is very capable.

Note. See text for further descriptions of the construct classes. All construct examples were originally in German.
Adequacy concerns. The construct class “adequacy concerns” refers to user concerns about the prototype’s adequacy for the intended context of use (proficiency vs. playfulness) and the adequacy of employed design principles in general (animation, light effects). It illustrates the participants’ beliefs about the extent to which the prototype is suitable for the task. The summarized constructs are evaluative in nature.

If we look at the percentage of constructs devoted to the different topics, a large share concerns perceived differences in the design principles and elements employed by the designers (40%). This demonstrates the participants’ receptiveness to variations in design. Differences in quality of presentation (32%) form the second strongest group. Together with quality of interaction (10%), these constructs represent perceived differences in the usability (i.e., task-oriented quality aspects) of the prototypes. Because the prototypes used in this study did not allow for extensive interaction (mean interaction time was about 2 min), the stronger focus on presentational differences can be easily explained. Hedonic quality, that is, non-task-oriented quality aspects (3%), did not receive much attention from the participants. This is astonishing, given the fact that more quantitatively oriented analyses of the same set of prototypes showed that perceptions of ergonomic (i.e., usability) and hedonic quality contributed almost equally to the appealingness of the prototypes (Hassenzahl et al., 2000). Why was there only such a small number of personal constructs devoted to the topic of hedonic quality? There are at least two possible explanations. First, if we take a closer look at the adequacy concerns (15%), it is striking that a great many constructs revolve
around concerns as to whether design principles and elements employed to induce hedonic quality are appropriate in the intended context of use. In other words, the topic adequacy concerns also addresses hedonic quality, but the major issue is the adequacy of the way hedonic quality is induced rather than a general appreciation of hedonic quality. Second, it might simply be much harder to be aware of differences in hedonic quality, that is, to produce hedonic-oriented personal constructs. For example, becoming aware of the fact that a certain color scheme violates one’s taste requires a lot more reflection than becoming aware that a certain text is not readable. Most likely, both explanations hold true to some extent.

To summarize, laypersons confronted with a prototype representing a task from an unknown and technology-oriented domain are especially receptive to differences in usability, with a clear need for good usability. They are also receptive to differences in hedonic quality, but they express strong concerns about the adequacy of the way this quality aspect was induced. Based on these results, further studies can be planned (e.g., the same set of prototypes presented to domain experts instead). By cumulating the personal constructs obtained under different conditions (e.g., sets of artifacts, user groups) and adding an abstraction step, domain-specific and general models of design space from a user’s perspective could be built up. These models would certainly help designers to develop their designs in a way appreciated by the potential users. From our perspective, abstraction and generalization of the results obtained seem possible and fruitful. It certainly stimulates the adoption of new perspectives on the artifacts to be designed and their context of use.

In sum, the RGT combined with the analysis methods proposed proved to be helpful in charting, exploring, and understanding the design space of the given set of artifacts. It lent itself to both (a) the generation of concrete, design-relevant information and (b) the abstraction from this information of material to stimulate further analysis and theory development. Future studies should explore the utility of the RGT by varying the types of artifacts. By comparing the results from these, it may be possible to determine whether the personal constructs found are of a general nature or heavily dependent on the actual artifacts studied.
4. ADVANTAGES, LIMITATIONS, AND FURTHER IMPROVEMENTS OF THE RGT

In the following sections, advantages, limitations, and improvements of the RGT for exploring design space are presented and discussed.
4.1. Advantages

The most important advantages of the RGT are (a) its ability to gather design-relevant information, (b) its ability to illuminate important topics without the need to
have a preconception of these, (c) its relative efficiency, and (d) the wide variety of types of analyses that can be applied to the gathered data.

As already emphasized, a method that aims at guiding design has to provide information of high practical value. In this study, 36% of the constructs obtained were rated as useful for artifact redesign without the need for further analysis. Another 20% of the constructs contribute relevant information to the design process, although additional analysis is necessary. The amount of high-quality information gathered becomes even more remarkable if it is taken into account that the RGT, as an adaptive method, is likely to draw the designer’s attention to topics yet unconsidered. Information such as the concerns dealing with the prototypes’ adequacy is especially valuable in early phases of the design process. There are good reasons to doubt whether other methods would have been able to make the adequacy concerns that clear.

Interestingly, only a small number of personal constructs directly refer to hedonic quality. We argued that both hedonic quality constructs and adequacy concerns constructs are in fact tied to the same artifact properties. However, what was intended by the designers to induce hedonic quality was ill-received by the users. Many people seem to share the belief that, for professional use, a certain degree of seriousness is indispensable. This points to another strength of the RGT. Because of its openness, important user attitudes, beliefs, and needs will surface without the necessity to have a preconception of these.

Compared to other nondirective methods (e.g., open interviews), the RGT is an economic method. Especially in a parallel design situation with many prototypes to be assessed, the RGT remains efficient. Besides the relatively small amount of time needed for carrying out an RGT study, the analysis of the data gathered also seemed to require less time than the analysis of data gathered with other nondirective methods. In cases where experimenter–participant interaction is unnecessary or even undesirable, the RGT can also be run in fully computerized versions, which makes it even more efficient.

The most interesting feature of the RGT is the wide variety of different types of analyses that can be applied to the gathered personal constructs. It provides data that lend themselves to the identification of general needs, beliefs, and attitudes, and of specific interartifact differences, as well as interperson or intergroup differences.
4.2. Limitations

Apart from its advantages, several limitations of the RGT became obvious during its application: (a) the requirement of a set of at least four artifacts, (b) the insensitivity to good or bad attributes shared by all artifacts in the set, (c) the lack of support for the actual phrasing and labeling of constructs, (d) the high number of merely descriptive constructs, and (e) problems with determining relations among constructs.

Although highly recommended by Nielsen (1993), extensive parallel design hardly ever seems to be practiced, due to small budgets and lack of time (at least in our experience). Beyond its application in a parallel design phase, the
RGT could be used to compare a prototype with similar products already available on the market. For example, employing the RGT to compare the design approaches of competing sites on the World Wide Web may be interesting, because the experimenter will find various alternative products freely at his or her disposal. Especially in Web site design, not only well-known qualities (e.g., usability) but also new topics (e.g., hedonic qualities, joy of use, fun) often have to be dealt with. Such new topics can be identified and explored by means of the RGT.

Another problem of the RGT touches one of its core assumptions, namely the equation of a personal construct with a similarity–dissimilarity dimension triggered by variation between artifacts. Attributes of artifacts that may be important but do not vary in the set of artifacts at hand will never appear as a personal construct. Thus, good or bad attributes shared by all artifacts in the set will go unnoticed. For example, the underlying topic (construct class) adequacy concerns would have been unlikely to emerge without having the comic prototype in the set. This also emphasizes the importance of maximizing the heterogeneity of the artifacts designed in parallel. Even artifacts that embody extreme design approaches are desirable. The more differences exist and the more extreme they are, the less likely it becomes that relevant topics will go unnoticed.

Even though the RGT appears to have a structured and defined process, we would like to stress the fact that the phrasing and labeling of the personal constructs requires verbal skills from both the experimenter and the participant. The actual process of phrasing remains unstructured and can best be described as “mining” for the core meaning of the participant’s initial statement. This obviously calls for more supportive and elaborated techniques.

The application of the RGT presented previously still produced a certain number (44%) of mainly descriptive personal constructs that are of little value for guiding artifact design. For example, the construct two-dimensional versus three-dimensional leaves open whether the participant actually prefers a three-dimensional or a two-dimensional interface. Measures should be taken to reduce the number of merely descriptive constructs.

Finally, the construct pool gathered by applying the RGT is not structured in any way. Thus, relations between personal constructs remain unclear. For example, the data gathered in this study showed that the comic prototype was found to be highly playful and not suitable for professional use. Although it may appear obvious, one cannot readily conclude that the comic prototype is not suitable for professional use because it is so playful. A good deal of reasoning and interpretation is required to state relations between constructs. This surely decreases the RGT’s objectivity.
4.3. Improvements and Future Work

As already discussed, constructs that are evaluative in nature (e.g., font size appropriate vs. text not readable) are more valuable than constructs that are descriptive in nature (e.g., two-dimensional vs. three-dimensional). Unfortunately, participants are likely to come up with descriptive constructs as well as evaluative constructs. To solve this problem, one might simply instruct the participants to refrain from stating
descriptive constructs. Another possibility is a short follow-up interview after finishing the RGT. In this interview, experimenter and participant briefly review each construct to determine which construct pole the participant considers desirable. This procedure would convert descriptive constructs into evaluative constructs, given that the participant is able to state a clear preference. The follow-up interview could also help the experimenter to establish construct priorities. Taking into account that budget and time are limited in almost every design project, the prioritization of constructs seems inevitable (see Hassenzahl, 2000). To establish construct priorities, the experimenter could simply ask the participant to rank the constructs in terms of personal relevance or importance.

One of the most impressive features of the RGT is its sensitivity to individual perceptions, needs, beliefs, and attitudes. Further work should develop a method that is as sensitive but applicable with only one artifact throughout the whole design process, a situation much more likely to occur in an industrial setting (e.g., Hassenzahl & Burmester, 2000). Furthermore, such a method should be able to capture relations between single constructs, for example as a hierarchical or network structure.
5. GENERAL CONCLUSION

In this article, we attempted to bring the RGT into focus as a method of exploring design space in a parallel design situation. It is not a new method and obviously has its limitations, but it nevertheless seems worth a try if user perceptions, needs, beliefs, and attitudes are to be taken into account by the design. Beyond questions about the mere practicality of the RGT, the underlying personal construct approach addresses a more fundamental issue, namely the informational value of the idiosyncratic (i.e., personal) perspectives of users on the artifacts we design. Human–computer interaction research still tends to generalize quickly across users. The best design alternative is generally the one most people agree on. This view, rooted in the quantitative research tradition, neglects the informational value of contradictions and inconsistency in idiosyncratic perspectives. To our mind, much can be learned from understanding the naive theories people have about the artifacts they live with.
REFERENCES

Banister, P., Burman, E., Parker, I., Taylor, M., & Tindall, C. (1994). Qualitative methods in psychology. Philadelphia: Open University Press.
Bevan, N., & Macleod, M. (1994). Usability measurement in context. Behaviour & Information Technology, 13, 132–145.
Buur, J., & Bagger, K. (1999). Replacing usability testing with user dialogue. Communications of the ACM, 42(5), 63–66.
Carroll, J. M. (1997). Human–computer interaction: Psychology as a science of design. International Journal of Human–Computer Studies, 46, 501–522.
Carroll, J. M., & Moran, T. P. (1991). Introduction to the special issue on design rationale. Human–Computer Interaction, 6, 197–200.
Carroll, J. M., & Rosson, M. B. (1991). Deliberated evolution: Stalking the view matcher in design space. Human–Computer Interaction, 6, 281–318.
Carroll, J. M., Singley, M. K., & Rosson, M. B. (1992). Integrating theory development with design evaluation. Behaviour & Information Technology, 11, 247–255.
Dillon, A., & McKnight, C. (1990). Towards a classification of text types: A repertory grid approach. International Journal of Man–Machine Studies, 33, 623–636.
Draper, S. W. (1999). Analysing fun as a candidate software requirement. Personal Technology, 3(1), 1–6.
Dreyfuss, H. (1955). Designing for people. New York: Simon & Schuster.
Ericsson, K. A., & Simon, H. A. (1984). Verbal reports as data. Cambridge, MA: MIT Press.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378–382.
Fransella, F., & Bannister, D. (1977). A manual for repertory grid technique. London: Academic.
Gaines, B. R., & Shaw, M. L. G. (1997). Knowledge acquisition, modelling and inference through the World Wide Web. International Journal of Human–Computer Studies, 46, 729–759.
Gediga, G., Hamborg, K.-C., & Düntsch, I. (1999). The IsoMetrics usability inventory: An operationalization of ISO 9241-10 supporting summative and formative evaluation of software systems. Behaviour & Information Technology, 18, 151–164.
Hassenzahl, M. (1999). Usability engineers as clinicians. Common Ground, 9(3), 12–14.
Hassenzahl, M. (2000). Prioritising usability problems: Data-driven and judgement-driven severity estimates. Behaviour & Information Technology, 19, 29–42.
Hassenzahl, M., & Burmester, M. (2000). Zur Diagnose von Nutzungsproblemen: Praktikable Ansätze aus der qualitativen Forschungspraxis [The diagnosis of usability problems: Practical qualitative approaches]. In K.-P. Timpe, H.-P. Willumeit, & H. Kolrep (Eds.), Bewertung von Mensch-Maschine-Systemen 3: Berliner Werkstatt Mensch-Maschine-Systeme (pp. 171–184). Düsseldorf, Germany: VDI Verlag.
Hassenzahl, M., Platz, A., Burmester, M., & Lehner, K. (2000). Hedonic and ergonomic quality aspects determine a software's appeal. In Proceedings of the ACM CHI 2000 conference on human factors in computing (pp. 201–208). New York: ACM.
Jørgensen, A. H. (1989). Using the thinking-aloud method in system development. In G. Salvendy & M. J. Smith (Eds.), Designing and using human–computer interfaces and knowledge based systems (pp. 743–750). Amsterdam: Elsevier.
Kelly, G. A. (1955). The psychology of personal constructs (Vol. 1 & 2). New York: Norton.
KNOT: Knowledge Network Organizing Tool [computer software]. (1992). Las Cruces, NM: Interlink.
MacLean, A., Young, R. M., Bellotti, V. M. E., & Moran, T. P. (1991). Questions, options, criteria: Elements of design space analysis. Human–Computer Interaction, 6, 201–250.
Moran, T. P., & Carroll, J. M. (1994). Design rationale: Concepts, techniques, and use. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Nielsen, J. (1993). Usability engineering. San Diego, CA: Academic.
Schvaneveldt, R. W. (1990). Pathfinder associative networks: Studies in knowledge organization. Norwood, NJ: Ablex.
Shaw, M. L. G. (1980). Advances in personal construct technology [Editorial]. International Journal of Man–Machine Studies, 13, 1–2.
StarCraft [computer software]. (1998). Irvine, CA: Blizzard.
Wandmacher, J. (1993). Software-Ergonomie [Software-ergonomics]. Berlin: de Gruyter.
Weisscher, A. (1999). Innovative user interfaces for power distribution systems. Unpublished manuscript, Technical University Delft, Faculty of Industrial Design.
Wright, P. C., & Monk, A. F. (1991). A cost-effective evaluation method for use by designers. International Journal of Man–Machine Studies, 35, 891–912.
FIGURE 1 (Color Plates 17 and 18). Artifacts (i.e., prototypes) used in the study.
a1: Overview of the industrial plant. The green disc at the left represents the pump. a2: Detail of the windows-like dialog box. a3: Detail of the action confirmation.
b1: Close-up of the pump (no overview exists). b2: Pump with a “dialog cube”. b3: Details of the “dialog cube” turning.
c1: Overview. The green disc at the left represents the pump. c2: Detail of the “dialog flag” extruding from the pump symbol. c3: Detail of the action confirmation.
d1: Overview. The disc with arrow at the left represents the pump. d2: Detail of a figure holding up a “dialog sign”. d3: Detail of the action confirmation.
e1: Overview of the industrial plant in a 3D style. Clicking on the pump initiates zoom-in. e2: Detail of the pump with an on/off knob. e3: Detail of the action confirmation.
f1: Overview. The green rotating disc at the left represents the pump. f2: Detail of the transparent dialog box. It is visually connected to the pump by a line.
g1: Overview in an isometric, rendered style. The golden shape represents the pump. The pump is selected and in on-status. g2: Detail of action confirmation. g3: Pump in off-status.