Measuring the "Importance" of Attributes

This report describes the results of a study conducted by Market Facts, Inc., in which various methods of collecting attribute importance data were compared. The methods included 4-point and 6-point rating scales, pairwise comparisons among attributes, and a "checklist" format, where respondents indicated only which attributes were most important, second most important, and third most important. Except for the checklist format, the rank order of attribute means was virtually identical for the various methods.


Introduction

It is a common practice in product image studies to ask respondents to indicate the importance of various characteristics in evaluating products or in deciding which brand to buy. Typically, this information is collected using either a checklist, where respondents simply "check" attributes they feel are important, or rating scales where the end points of the scale denote "not important" and "extremely important" (or something similar).

Often, the primary purpose of collecting importance data is simply to rank the attributes from most to least important or to identify a subset of "important" ones. These analyses usually involve comparisons among the attribute means. One factor that may affect the rank order of attribute means or the relative differences between them is the type of scale used to collect the data. In the study described here, some commonly used methods of collecting importance data were compared in order to determine whether the relative importance of the attributes was consistent across the methods.


Outline of the Study

Approximately 4,000 members of the Market Facts Consumer Mail Panel were sent questionnaires listing eleven benefits of hand and body lotions and asked to evaluate the importance of each benefit. Respondents were divided into five groups, which differed only in the type of scale used to evaluate the benefits.

One group was asked to rate the importance of each benefit on a 4-point scale where the scale points were labeled "not important" (1), "somewhat important" (2), "very important" (3) and "extremely important" (4). Two groups were given 6-point scales. For one group, the scale points were labeled "not important" (1), "only slightly important" (2), "somewhat important" (3), "very important" (4), "extremely important" (5) and "absolutely essential" (6). For the other group, only the endpoints of the scale were labeled: "not important" (1) and "extremely important" (6).

Another group was instructed to simply "check" the three most important benefits. For purposes of analysis, a value of 3 was assigned to the benefit indicated as most important, 2 to the second most important benefit, 1 to the third most important benefit, and zero to all other benefits.
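The coding rule for this group can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name score_pick3 and the shortened benefit list are hypothetical and are not taken from the study materials.

```python
def score_pick3(ranked_top3, all_benefits):
    """Code one respondent's "Pick 3" answers as 3, 2, 1, and 0 for the rest.

    ranked_top3: the three benefits chosen, most important first.
    all_benefits: every benefit listed on the questionnaire.
    """
    if len(ranked_top3) != 3 or len(set(ranked_top3)) != 3:
        raise ValueError("exactly three distinct benefits are required")
    # Every benefit starts at zero; only the three chosen ones are overwritten.
    scores = {benefit: 0 for benefit in all_benefits}
    for value, benefit in zip((3, 2, 1), ranked_top3):
        if benefit not in scores:
            raise ValueError(f"unknown benefit: {benefit}")
        scores[benefit] = value
    return scores


# Hypothetical respondent, using four of the eleven benefits for brevity.
benefits = ["Moisturizes skin", "Softens skin", "Not greasy", "Economical to use"]
example = score_pick3(["Moisturizes skin", "Not greasy", "Softens skin"], benefits)
print(example)
```

Note that every benefit the respondent did not pick receives the same score of zero, whatever the respondent actually thinks of it.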

The final group was given all 55 possible pairs of benefits and asked to indicate the relative importance of the members of each pair using an unlabeled 5-point scale. For example, one pair appeared as follows:

               1     2     3     4     5
Not greasy    [ ]   [ ]   [ ]   [ ]   [ ]    Softens skin

Respondents were instructed to check the middle box (3) if the benefits were equally important, a box toward the left if the benefit on the left was more important, or a box toward the right if the benefit on the right was more important. Each pair yields two scores: one for the benefit on the left and one for the benefit on the right. If a respondent checked the middle box, both benefits received a "score" of 3. If the rightmost box in the above example was checked, "not greasy" would be assigned a 1 and "softens skin" a 5. If the box under "2" was checked, "not greasy" would be scored as 4 and "softens skin" as 2. The "final" score for a benefit for a respondent was calculated by computing the mean for that benefit across all pairs in which it appeared. Since respondents provide comparative judgements of benefit importance, the interpretation of the final benefit scores (and means) is different from that of ordinary rating scales: one cannot tell how important the benefits are to a respondent (or across all respondents) in an "absolute" sense.
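The scoring rule just described amounts to giving the right-hand benefit the number of the box checked and the left-hand benefit 6 minus that number. A minimal Python sketch follows. The function names and the three sample answers are hypothetical, chosen only to illustrate the arithmetic.

```python
from collections import defaultdict


def pair_scores(box):
    """Return (left_score, right_score) for the box checked (1-5, left to right)."""
    if box not in (1, 2, 3, 4, 5):
        raise ValueError("box must be an integer from 1 to 5")
    return 6 - box, box


def final_scores(responses):
    """Average each benefit's scores over every pair in which it appeared.

    responses: iterable of (left_benefit, right_benefit, box) tuples
    for a single respondent.
    """
    collected = defaultdict(list)
    for left, right, box in responses:
        left_score, right_score = pair_scores(box)
        collected[left].append(left_score)
        collected[right].append(right_score)
    return {benefit: sum(vals) / len(vals) for benefit, vals in collected.items()}


# Hypothetical answers to three pairs (three benefits only, for brevity).
answers = [
    ("Not greasy", "Softens skin", 5),
    ("Not greasy", "Protects skin", 3),
    ("Softens skin", "Protects skin", 2),
]
result = final_scores(answers)
print(result)
```

In the full design each of the eleven benefits appears in ten pairs, so each final score would be a mean of ten values between 1 and 5.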

The pairwise approach was included in this study because of its theoretical appeal. This approach forces respondents to make explicit comparisons or trade-offs among the benefits. (This is similar to the task of ranking the benefits, where such comparisons are also made, but not explicitly.) Considerably more information is requested of respondents (55 responses in this case, versus 11 for rating scales), in an attempt to obtain greater discrimination among the attributes. (In general, K(K-1)/2 pairs must be evaluated to score or order K attributes, so this approach is impractical for more than about a dozen attributes.)
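To show how quickly the number of pairs grows, the short Python sketch below evaluates K(K-1)/2 for a few values of K and checks the formula against an explicit enumeration for the eleven benefits used here. The function name count_pairs is hypothetical.

```python
from itertools import combinations


def count_pairs(k):
    """Number of distinct pairs among k attributes: k(k-1)/2."""
    return k * (k - 1) // 2


for k in (5, 11, 12, 20):
    print(k, count_pairs(k))

# Cross-check the formula against explicit enumeration for the 11 benefits.
enumerated = len(list(combinations(range(11), 2)))
assert count_pairs(11) == enumerated == 55
```

Twelve attributes already require 66 judgements from each respondent, and twenty would require 190.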


Results

The original means for each benefit are listed in Table 1. These means are not directly comparable because the scales upon which they are based are different. Therefore, for purposes of comparison the means were rescaled so that for each method the least important benefit had a value of zero and the most important benefit a value of 100*. This rescaling preserves the rank order of the benefits within a method, as well as the relative magnitudes of the differences among the benefits. The rescaled means are listed in Table 2.


*The formula used in this rescaling is 100(X - min)/(max - min), where "X" is a benefit mean for a particular method, and "min" and "max" are the smallest and largest benefit means for that method.
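The rescaling can be sketched in Python as follows, using the 4-point column of Table 1 as input. The function name rescale is hypothetical. Because the published means are rounded to two decimals, values recomputed this way can differ slightly from those in Table 2, which were presumably computed from unrounded means.

```python
def rescale(means):
    """Map benefit means onto 0-100 using 100 * (x - min) / (max - min)."""
    lo, hi = min(means.values()), max(means.values())
    if hi == lo:
        raise ValueError("all means are equal; rescaling is undefined")
    return {name: 100 * (x - lo) / (hi - lo) for name, x in means.items()}


# 4-point labeled column of Table 1 (published values, rounded to two decimals).
four_point = {
    "Will not irritate skin": 3.71,
    "Moisturizes skin": 3.57,
    "Softens skin": 3.47,
    "Protects skin": 3.47,
    "Not greasy": 3.46,
    "Effective for a long time": 3.32,
    "Absorbs into skin quickly": 3.18,
    "Economical to use": 3.02,
    "Remains effective after washing": 2.77,
    "Recommended by dermatologists": 2.16,
    "Has a feminine fragrance": 2.10,
}

rescaled = rescale(four_point)
for name, value in rescaled.items():
    print(f"{name:32s}{value:6.1f}")
```

For example, "Moisturizes skin" comes out at about 91.3 from the rounded means, against 91.1 in Table 2.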


Table 1
Original Means

                                    4-point    6-point    6-point
Benefit                             labeled    labeled  unlabeled      Pairs     Pick 3
Will not irritate skin                 3.71       5.47       5.76       3.44       0.48
Moisturizes skin                       3.57       5.16       5.65       3.42       1.43
Softens skin                           3.47       5.01       5.56       3.37       0.75
Protects skin                          3.47       4.94       5.52       3.43       0.58
Not greasy                             3.46       4.94       5.45       3.19       0.79
Effective for a long time              3.32       4.64       5.43       3.08       0.59
Absorbs into skin quickly              3.18       4.48       5.33       3.22       0.98
Economical to use                      3.02       4.22       4.89       2.74       0.18
Remains effective after washing        2.77       3.85       4.62       2.70       0.18
Recommended by dermatologists          2.16       2.96       3.68       2.35       0.05
Has a feminine fragrance               2.10       2.87       3.67       2.08       0.09


Table 2
Rescaled Means

                                    4-point    6-point    6-point
Benefit                             labeled    labeled  unlabeled      Pairs     Pick 3
Will not irritate skin                100.0      100.0      100.0      100.0       31.7
Moisturizes skin                       91.1       88.2       94.4       98.5      100.0
Softens skin                           85.3       82.5       90.2       94.9       50.6
Protects skin                          85.2       79.6       88.5       99.6       38.7
Not greasy                             84.7       79.7       85.2       81.4       54.1
Effective for a long time              75.6       68.0       83.9       73.4       39.4
Absorbs into skin quickly              67.2       61.9       79.3       84.0       67.6
Economical to use                      57.0       51.8       58.2       49.1        9.4
Remains effective after washing        41.7       37.6       45.3       45.8        9.6
Recommended by dermatologists           3.3        3.7        0.2       20.0        0.0
Has a feminine fragrance                0.0        0.0        0.0        0.0        3.0


These tables reveal considerable consistency among the methods with respect to the rank order of the benefits. The lowest 4 benefits are the same for all five methods. Also, the results for the 4-point and 6-point scales are virtually identical and very similar to those for the pairwise comparison approach. Although the original means for the unlabeled 6-point scale are consistently higher than those for the labeled version, the rank order is the same. The "Pick 3" method, however, resulted in a substantially different ordering of the top 7 benefits. In particular, the benefit "will not irritate skin", highest in all other methods, ranks seventh in the "Pick 3" method.
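These observations can be checked directly against the published means. The Python sketch below orders the benefits within each method using the rounded values from Table 1. The names METHODS, TABLE_1 and ordering are hypothetical, and ties in the rounded means are left in Table 1 order, which is an assumption of this sketch rather than part of the study.

```python
METHODS = ["4-point labeled", "6-point labeled", "6-point unlabeled", "Pairs", "Pick 3"]

# Original means from Table 1, in the column order given by METHODS.
TABLE_1 = {
    "Will not irritate skin": [3.71, 5.47, 5.76, 3.44, 0.48],
    "Moisturizes skin": [3.57, 5.16, 5.65, 3.42, 1.43],
    "Softens skin": [3.47, 5.01, 5.56, 3.37, 0.75],
    "Protects skin": [3.47, 4.94, 5.52, 3.43, 0.58],
    "Not greasy": [3.46, 4.94, 5.45, 3.19, 0.79],
    "Effective for a long time": [3.32, 4.64, 5.43, 3.08, 0.59],
    "Absorbs into skin quickly": [3.18, 4.48, 5.33, 3.22, 0.98],
    "Economical to use": [3.02, 4.22, 4.89, 2.74, 0.18],
    "Remains effective after washing": [2.77, 3.85, 4.62, 2.70, 0.18],
    "Recommended by dermatologists": [2.16, 2.96, 3.68, 2.35, 0.05],
    "Has a feminine fragrance": [2.10, 2.87, 3.67, 2.08, 0.09],
}


def ordering(method_index):
    """Benefits sorted from highest to lowest mean for one method.

    Ties in the rounded means keep their Table 1 order (sorted() is stable).
    """
    return sorted(TABLE_1, key=lambda b: TABLE_1[b][method_index], reverse=True)


orderings = {m: ordering(i) for i, m in enumerate(METHODS)}

# Rank of "Will not irritate skin" and the set of four lowest benefits per method.
irritate_rank = {m: o.index("Will not irritate skin") + 1 for m, o in orderings.items()}
bottom_four = {m: set(o[-4:]) for m, o in orderings.items()}

for method in METHODS:
    print(f"{method:18s} rank of 'Will not irritate skin': {irritate_rank[method]}")
print("Same bottom four in every method:",
      all(s == bottom_four["4-point labeled"] for s in bottom_four.values()))
```

With the Table 1 values, "Will not irritate skin" ranks first in the four rating and pairwise columns and seventh under "Pick 3", and the four lowest benefits form the same set in all five columns.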


Discussion and Recommendations


Checklists

There are several viable explanations (other than "sampling fluctuation") for the discrepancy between the results of the "Pick 3" method and the other methods. Respondents in the "Pick 3" method could provide responses to only 3 of the 11 benefits, whereas there were no analogous restrictions for the other methods. Further, assigning values of "3", "2" and "1" (or any other 3 distinct values) to the first, second and third most important benefits, and "0" to all others, makes two unverifiable assumptions: (1) the top 3 attributes are not equally important to the respondent and (2) no other attributes are important (or they are equally unimportant). As a result, this method yields sensible data only for those respondents for whom these assumptions are true. (The data for the "Pick 3" method were also analyzed after coding the answers as simply "checked" (1) or "not checked" (0), but the results were essentially unchanged.) Basically, the problem is that the data for a respondent constitute a partial ranking, although computation of means treats the data as if they represent a complete set of rankings or ratings.

An alternative to such "restricted checklists" is to allow respondents to check all attributes they feel are important, with no restrictions. This is tantamount to using a 2-point rating scale (important or not important). However, respondents are then limited in expressing the "degree" of importance of the attributes. A further problem, common to all checklists, is that unless a "don't know" or "no response" option is explicitly provided for each attribute, it is impossible to distinguish a nonresponse from a response of "not important".


Rating scales and pairwise comparisons

Despite the extra effort required of the respondents and the different nature of the scale, the pairwise comparison approach yielded results very similar to those obtained from the 4-point and 6-point rating scales. Also, the results of this study, as well as prior experience, suggest that when "ordinary" rating scales are used, the number of scale points and whether the intermediate scale points are labeled have little influence on either the rank order of attribute means or the relative distances between the means. Although this issue was not addressed in this study, too many scale points may lead to errors in judgement on the part of the respondent (i.e., "measurement error"). For example, it is easier for a respondent to rate an attribute considered to be relatively important as a "4" on a 5-point scale than to decide whether it should be assigned a "7" or "8" (or perhaps even a "6" or "9") on a 10-point scale.

In summary, when the primary objective of collecting importance data is to order the attributes or, more generally, to examine differences among attribute means, rating scales can produce virtually the same results as the more complex pairwise comparison approach, with much less effort on the respondent's part. If checklists are used, the respondent's task is even easier, but information about "degrees" of importance is sacrificed. Also, the results should be interpreted cautiously if a nonresponse cannot be distinguished from a response of "not important" or if respondents are restricted in the number of attributes they may check.