An Analysis of Importance Ratings
Introduction
Importance ratings, in one form or another, have become ubiquitous in marketing research. Yet this approach to assessing the weight or value given to various product characteristics and benefits during the purchase decision-making process has often been criticized. Many view the resulting measurements as lacking discrimination, both between the benefits evaluated and among respondents (e.g., "too many people said too many items are extremely important"). Further, the resulting ratings are often perceived as not reflecting the true nature of product benefits or motivations, hence the call for "derived importance" to get at what consumers truly think and feel.
However, if adequately analyzed, information obtained from importance ratings can provide a most effective basis for marketing strategy. These ratings do supply a critical understanding of consumer needs, especially in revealing that not everyone may want or need the same things from a product. So, importance ratings are an effective basis for segmentation. Coupled with this use is the notion that total sample averages may be misleading, falling in a sparsely inhabited middle ground of attitudes between segments for whom a benefit may be either very important or inconsequential. Further, this segmentation invariably leads to a hierarchy of benefits, ordered from "price of entry" for all consumers, to those crucial for attracting particular segments or consumer niches, to the unimportant. Lastly, when importance ratings are merged with product perceptions, it becomes a simple task to note the motivations for purchase or the product directions to be taken to enhance overall opinion.
Validation
As mentioned above, the straightforward use of importance ratings as a means of assessing which product benefits are influential during the purchase decision-making process has declined through the years. Instead, more indirect or "derived" measures have gained in prominence, if not in utility. Before discussing how importance ratings should be analyzed, it is best first to defend them, showing that these ratings do indeed reflect some degree of impact on the purchase decision. Specifically, the product benefits considered most important, using the average importance rating as a criterion, are also those for which the relationship between brand usage and brand ratings is strongest.
Consider data from an individual respondent and relate brand usage to brand ratings. Usage may be defined as regular usage during some time period, recorded as a "yes" or "no" response for each of several brands in the category of interest. Or, more generally, usage may be defined in terms of future purchase likelihood, again recorded as a "yes" or "no" response. Brand ratings would then be obtained for those same brands. The brands may be evaluated using, say, a 5-point scale graduated from "poor" to "excellent", or with something as simple as a checklist on which the respondent notes whether the brand possesses each attribute listed.
As an example, consider seven analgesic brands. Future purchase intent was recorded along with a rating of whether each brand was considered effective for headache pain. The data from one respondent are displayed below. Positive purchase intent was coded as "1" and no intent as "0". For expedience, a checklist format was used for the ratings, with "1" given to brands perceived as having this characteristic and "0" otherwise.
| Brand | Purchase Intent | Effective for Headache Pain |
|---|---|---|
| A | 1 | 0 |
| B | 0 | 0 |
| C | 0 | 1 |
| D | 1 | 1 |
| E | 1 | 1 |
| F | 0 | 0 |
| G | 0 | 0 |
These data are summarized in the table below.

| | Effective: Yes | Effective: No |
|---|---|---|
| Purchase Intent: Yes | 2 (D, E) | 1 (A) |
| Purchase Intent: No | 1 (C) | 3 (B, F, G) |

The table suggests some relationship between purchase intent and effectiveness. Further, a statistical summary measure may be used to quantify the relationship with one number. For example, the Jaccard coefficient (described in Research on Research Paper number 40) for these data is .5: the proportion of brands with both positive purchase intent and a rating of effective (D and E) among those brands with either (A, C, D and E). In a relative sense, effectiveness for headache pain may be taken as an important product attribute, more important than attributes with smaller coefficients, suggesting that purchase intent is related to the pain relief supplied by the brand. So, the Jaccard coefficient may be considered a useful measure, showing the strength of the fundamental relationship between usage, as measured by future purchase intent, and perceptions across brands. Important product attributes, features or benefits, those that motivate the purchase decision, must show the strongest relationship to some measure of usage.
This form of analysis can be extended. First, the analysis is performed for each respondent, and an average coefficient value is calculated across respondents. This average is obtained for each product benefit, and the benefit averages can then be related to average importance ratings.
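The coefficient and its average across respondents can be sketched in Python. This is a minimal illustration, not the procedure used in the study: the function names are hypothetical, and skipping respondents for whom no brand is coded 1 on either measure (where the coefficient is 0/0 and undefined) is an assumption, since the paper does not say how such cases were handled.

```python
from statistics import mean


def jaccard(usage, rating):
    """Jaccard coefficient between two 0/1 codings of the same brands.

    Brands coded 1 on both measures, divided by brands coded 1 on either.
    Returns None when no brand is coded 1 on either measure (0/0).
    """
    both = sum(1 for u, r in zip(usage, rating) if u and r)
    either = sum(1 for u, r in zip(usage, rating) if u or r)
    return both / either if either else None


def average_jaccard(respondents):
    """Average one benefit's coefficient over (usage, rating) pairs.

    Hypothetical helper: respondents with an undefined coefficient are skipped
    (an assumption, not a rule stated in the paper).
    """
    values = [jaccard(u, r) for u, r in respondents]
    values = [v for v in values if v is not None]
    return mean(values) if values else None


# The single respondent above, brands A through G.
intent = [1, 0, 0, 1, 1, 0, 0]
effective = [0, 0, 1, 1, 1, 0, 0]
print(jaccard(intent, effective))  # 0.5

# Two further, made-up respondents: one perfect match, one with nothing coded.
panel = [
    (intent, effective),
    ([1, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0]),
    ([0] * 7, [0] * 7),
]
print(average_jaccard(panel))  # 0.75
```

The averages in Table 1 appear to be expressed on a 0-100 scale, i.e., coefficients like the 0.5 above multiplied by 100.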
Consider data from a study in which roughly 600 respondents were asked a series of questions pertaining to the analgesic category. The sample, obtained from the Market Facts Consumer Mail Panel, was considered representative of category users. Evaluations of thirty product benefits formed the core of the questionnaire. Typical importance ratings for these benefits were obtained using a 10-point scale. The benefits also served as the basis for rating each of seven major brands, using a checklist format. Further, regular brand usage over the past six months was recorded and used with the brand checklist information in estimating the Jaccard coefficients. As such, these values measure the relationship between actual usage and brand characterization. Table 1 displays the average importance ratings and the average Jaccard values (multiplied by 100).
Table 1
| BENEFIT | AVERAGE IMPORTANCE RATINGS | AVERAGE JACCARD COEFFICIENTS |
|---|---|---|
| Effective on headache pain | 9.20 | 42.7 |
| Works best | 9.06 | 44.1 |
| Works fast | 9.04 | 45.8 |
| A strong pain reliever | 8.96 | 44.9 |
| Long lasting relief | 8.90 | 39.1 |
| Would relieve worst symptoms | 8.71 | 37.9 |
| Doesn't upset my stomach | 8.26 | 48.2 |
| Good value | 8.09 | 37.3 |
| Brand I trust | 8.07 | 51.4 |
| Relieves muscle soreness | 8.05 | 40.8 |
| Safer to use than most | 7.96 | 29.7 |
| Relieves cold/flu aches/pains | 7.88 | 35.2 |
| Brand I prefer | 7.76 | 52.1 |
| Package provides needed info | 7.75 | 39.5 |
| Specific ingredients I want | 7.24 | 43.2 |
| Easy to swallow | 7.14 | 44.3 |
| Reduces fever better | 6.94 | 27.3 |
| Tamper-proof packaging | 6.56 | 34.5 |
| Easy-to-open packaging | 6.23 | 25.3 |
| Doctors recommend | 6.04 | 28.5 |
| Costs more but is worth it | 5.67 | 24.3 |
| Effective on arthritis pain | 5.33 | 25.4 |
| Recommended by pharmacists | 5.19 | 22.3 |
| Comes in caplet form | 4.98 | 34.6 |
| Helps you sleep | 4.82 | 14.9 |
| Advanced/latest breakthrough | 4.75 | 15.8 |
| Effective on menstrual pain | 4.15 | 22.3 |
| Comes in gelatin covered form | 3.90 | 23.0 |
| Brand hospitals use most | 3.62 | 23.2 |
| Was prescription-only | 3.57 | 11.1 |
Perhaps the simplest way to assess the relationship between these two sets of numbers is to plot the average importance rating against the average Jaccard coefficient for each benefit. Such a plot shows a reasonably strong relationship: benefits most highly related to usage also tend to be those with the highest average importance ratings. The strength of the relationship lends credence to the notion that importance ratings do indeed offer valid insights into consumer needs. Importance does not necessarily have to be derived indirectly.
Two benefits, both brand related ("brand I trust" and "brand I prefer"), are inconsistent with this pattern. Both have average Jaccard coefficients that are larger than expected given their average importance ratings. This issue is addressed below, where it is noted that both serve as a basis for respondent segmentation. The inconsistency arises because the total sample average importance is a poor summary when subgroups of respondents with different needs or wants are present. Note that the average Jaccard coefficient, relating usage to brand perceptions, hinted at the underlying utility of these items.
Some Data and Scaling Issues
A first step in dealing with importance ratings is to admit that the scaling often used can be insensitive: a large proportion of respondents will, almost regardless of product category, find benefits to be important. Three suggestions are offered here. The first is the use of concern scales, as described in Research on Research Paper number 53. Expressing benefits in an objective sense and asking respondents to rate how concerned they are with each product characteristic when considering purchase has proven effective in limiting the number of top box responses. (While this form of scaling goes by a different name, it should still be considered a form of importance scaling, given the comparability of measurement goals. Also, the analyses proposed below are equally effective if concern scaling is used.)
A simple change of scale label may also help: replacing the top box label of "extremely important" with "absolutely essential", along with instructions to limit the number of top box responses, can be effective. The number of scale points, by contrast, although a topic of keen discussion, does not really seem to be an issue. For the types of analyses proposed in this paper, the difference between a 5-point scale and a 10-point scale matters little.
Lastly, consider an anchoring approach, which mixes the virtues of ranking and rating. Respondents are asked to read through the entire list of statements and identify a few benefits considered most important, to which the highest scale value is assigned (a "5" if a 5-point importance scale is used, for example). The same process is followed for identifying a few truly unimportant items, which are assigned a "1". How many items count as "a few" depends on the length of the list to be evaluated; typically, the top and bottom three to five items will suffice. The remaining items are then rated somewhere between the second highest and the second lowest scale points. As such, the upper and lower bounds of importance are established for each respondent, who is then encouraged to use the entire scale. Note that this approach works only if respondents can be exposed to the entire list of items first, suggesting that mail panel or mall-intercept data collection methods may work best.
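When anchored ratings are keyed in, a simple edit check can flag respondents who did not follow the instructions. The sketch below is hypothetical: the function name and its defaults (a 1-to-5 scale, three to five anchors at each end) are assumptions taken from the description above, not a documented procedure.

```python
def check_anchored(ratings, low=1, high=5, min_anchor=3, max_anchor=5):
    """List problems with one respondent's anchored importance ratings.

    `ratings` maps item -> integer rating on a low..high scale. Hypothetical
    rule: between min_anchor and max_anchor items at each end of the scale,
    and every rating inside the scale.
    """
    problems = []
    values = list(ratings.values())
    for label, anchor in (("top", high), ("bottom", low)):
        count = values.count(anchor)
        if not min_anchor <= count <= max_anchor:
            problems.append(
                f"{count} items at the {label} anchor ({anchor}); "
                f"expected {min_anchor}-{max_anchor}"
            )
    out_of_range = [item for item, v in ratings.items() if not low <= v <= high]
    if out_of_range:
        problems.append(f"ratings outside {low}-{high}: {out_of_range}")
    return problems


# A made-up respondent rating ten benefits on a 5-point scale.
ok = {f"benefit {i}": v for i, v in enumerate([5, 5, 5, 4, 3, 3, 2, 1, 1, 1], 1)}
print(check_anchored(ok))  # []

# Everything marked "5": too many top anchors and no bottom anchors.
too_generous = {f"benefit {i}": 5 for i in range(1, 11)}
print(check_anchored(too_generous))
```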
An additional point to consider, although unrelated to data or scaling issues, is that importance ratings may deal well only with benefits that consumers already associate with the category of products being studied. In particular, importance ratings tend to be poor predictors of whether some new or untried benefit can be successfully exploited in that category. Respondents may accurately judge the utility of a lubricant in a shave gel, yet may not fully appreciate the benefit of vitamins added to the gel to enhance the vigor of the shaver.
Reporting Total Sample Results
Once data are collected, there is a natural tendency to report some summary of the importance ratings. Total sample averages or top box percentages are often used to order benefits from most to least important. Such an analysis assumes that the sample from which these data were collected is homogeneous, that all respondents within that sample would essentially (aside from the usual between-respondent or sampling variability) order the items the same way: what was important to one person was important to all. This may be true for a few benefits, those that can be considered "price of entry", the product characteristics that any brand in the category must possess to be considered for purchase. It may also be true for benefits of no particular utility. Yet benefits of middling importance, as judged by a total sample summary, may not share this consistency. The gross, total sample summary statistics for these benefits may hide the fact that some smaller subgroup of respondents finds them to be of great importance, perhaps rated on a par with the "price of entry" benefits, while the remaining respondents consider them to be of much less importance. Such subgroups (and there may be several, each with its own special needs) must then be identified and segregated to fully appreciate the information contained in the importance ratings. A cluster analysis based on the importance ratings can be very useful in revealing such subgroups, providing an extremely beneficial segmentation.
An Introduction to the Analysis
Before proceeding further, note that the underlying analysis logic follows the tenets of analysis of variance. An examination of the total sample importance averages, say, is akin to an assessment of the main effect for benefits: statistically estimating the extent to which these averages differ among themselves and establishing a total sample ranking of benefits, from most to least important. Considering the consistency of respondent ratings across benefits brings up two additional estimates obtained from an analysis of variance. The first, and generally least useful, is the respondent main effect, a measure of the extent to which respondent averages (each a mean across the benefits for one respondent) differ. These averages may be more a function of response bias or scale usage than of anything meaningfully related to the benefits. So, this effect is typically considered a nuisance, statistically removed and discarded.
The second estimate, or rather set of estimates, relates to the notion that different people rate the benefits differently. In its simplest sense, different people order the benefits differently. Adding a little complexity, some benefits may be much more (or less) important and rated as such by specific groups of respondents. So, not only will the ordering change (where perhaps this change is rather gentle and not readily noticeable beyond statistical error), but the magnitude of ratings may change as well, amplifying the differences among respondents. This is the statistical notion of interaction, that the specific needs of the respondent interact with each specific benefit to yield a response or rating which may not be well predicted from sample averages. It is this interaction information which serves as the basis for the clustering process.
A Simple Example for Assessing Interaction
This can best be understood through an example with a simple display. The small table below displays ratings of four benefits, on a 6-point importance scale, obtained from five respondents.
Table 2 (columns are respondents)
| Benefit | 1st | 2nd | 3rd | 4th | 5th | Mean |
|---|---|---|---|---|---|---|
| A | 4 | 3 | 6 | 5 | 4 | 4.40 |
| B | 5 | 4 | 4 | 3 | 6 | 4.40 |
| C | 5 | 4 | 3 | 3 | 4 | 3.80 |
| D | 4 | 2 | 3 | 3 | 5 | 3.40 |
| Mean | 4.50 | 3.25 | 4.00 | 3.50 | 4.75 | |
| Total Table Average | | | | | | 4.00 |
As mentioned above, the key information for segmentation purposes contained in this table is the degree to which respondent and benefit interact, that is, the extent to which the row and column (total sample) averages fail to predict the specific rating a respondent gives to a benefit. These averages are provided for completeness. The quickest way to get at this interaction information is to subtract the row and column averages from each entry and add back the total table average, as shown below. For clustering purposes, no useful information is lost by this subtraction. Recall the opinion given above that average respondent ratings reflect response bias and can be safely removed. The benefit averages may be useful as a summary but serve only to cloud the interaction information. Stripping them off makes subsequent interpretation simpler and the associated graphics easier to read.
Table 3 (columns are respondents)
| Benefit | 1st | 2nd | 3rd | 4th | 5th |
|---|---|---|---|---|---|
| A | -.90 | -.65 | 1.60 | 1.10 | -1.15 |
| B | .10 | .35 | -.40 | -.90 | .85 |
| C | .70 | .95 | -.80 | -.30 | -.55 |
| D | .10 | -.65 | -.40 | .10 | .85 |
Each of the residuals in Table 3 (residual in the sense of being left over after the subtraction process) was obtained from the corresponding values in Table 2 by subtracting the row and column averages and then adding back the total table (grand) average. Using the 1st respondent's rating of Benefit A as an example:
-.90 = 4 - 4.4 - 4.5 + 4
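The subtraction can be reproduced for the whole of Table 2 with a short Python sketch, which also splits the total sum of squares into the benefit, respondent and interaction pieces discussed in the analysis-of-variance introduction. Variable names are illustrative; the data are Table 2's.

```python
from statistics import mean

# Table 2: rows are Benefits A-D, columns are the 1st through 5th respondents.
ratings = [
    [4, 3, 6, 5, 4],  # A
    [5, 4, 4, 3, 6],  # B
    [5, 4, 3, 3, 4],  # C
    [4, 2, 3, 3, 5],  # D
]

row_means = [mean(row) for row in ratings]        # benefit averages
col_means = [mean(col) for col in zip(*ratings)]  # respondent averages
grand = mean(v for row in ratings for v in row)   # total table average

# Residual = rating - row mean - column mean + grand mean (Table 3).
residuals = [
    [v - rm - cm + grand for v, cm in zip(row, col_means)]
    for row, rm in zip(ratings, row_means)
]

# Two-way split of the total sum of squares (one observation per cell).
ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
ss_benefit = len(col_means) * sum((rm - grand) ** 2 for rm in row_means)
ss_respondent = len(row_means) * sum((cm - grand) ** 2 for cm in col_means)
ss_interaction = sum(e ** 2 for row in residuals for e in row)

print("respondent means:", col_means)
for name, row in zip("ABCD", residuals):
    print(name, [round(e, 2) for e in row])
print(round(ss_total, 2), round(ss_benefit, 2),
      round(ss_respondent, 2), round(ss_interaction, 2))
```

The total sum of squares of 22 splits into 3.6 for benefits, 6.5 for respondents and 11.9 for interaction, so more than half of the variability in this small table is pattern rather than main effect. The 5th respondent's average works out to 4.75, the value implied by that respondent's Table 3 residuals.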
A thorough visual examination of Table 3 does yield some insights. For example, the pattern of residuals for the 1st and 2nd respondents is fairly similar, as is that for the 3rd and 4th respondents. Yet the pattern for the 1st and 2nd respondents is quite different from that for the 3rd and 4th respondents. Clearly, the magnitude of the residuals that remain indicates that the row and column summaries were insufficient for predicting ratings and suggests the presence of interaction. However, it is difficult to determine whether there is anything systematic in the numbers that remain or whether it is just noise. A simple plot of the residuals for each benefit, with one profile per respondent, helps considerably.
Such a display, called an interaction plot, shows respondent profiles. The plot confirms the summary obtained from visual inspection of the table. There appear to be two major (and perhaps one minor) patterns, one for the 3rd and 4th respondents and one for the remaining three. (The 5th respondent deviates a little from this second pattern.) The visual examination of the patterns reveals clusters: groups of respondents who rate the benefits essentially the same way, yet differently from the respondents in other clusters.
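The visual comparison of profiles can be mimicked by measuring how far apart two respondents' residual profiles are. The sketch below uses Euclidean distance between the Table 3 profiles; the choice of distance is an assumption standing in for the eye, not the paper's stated method.

```python
from math import dist

# Residual profiles from Table 3: each respondent's residuals on Benefits A-D.
profiles = {
    "1st": [-0.90, 0.10, 0.70, 0.10],
    "2nd": [-0.65, 0.35, 0.95, -0.65],
    "3rd": [1.60, -0.40, -0.80, -0.40],
    "4th": [1.10, -0.90, -0.30, 0.10],
    "5th": [-1.15, 0.85, -0.55, 0.85],
}


def most_similar(name):
    """The other respondent whose residual profile is closest (Euclidean)."""
    others = [o for o in profiles if o != name]
    return min(others, key=lambda o: dist(profiles[name], profiles[o]))


for name in profiles:
    print(name, "->", most_similar(name))
```

The pairs match the visual reading: 1st with 2nd and 3rd with 4th, while the 5th respondent is closest to the 1st but at a larger distance (about 1.66, against 0.87 and 1.00 for the two pairs).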
Next, consider the usefulness of the total sample benefit averages. Referring back to Table 2, Benefits A and B were rated equally from a total sample perspective. Yet, the plot suggests those summaries are misleading. For example, the total sample average for Benefit A is 4.4, a poor summarization considering the two disparate groups supplying ratings of it. So, Benefit A may serve well as the basis for respondent segmentation. This benefit is of great importance to one group (the average rating from 3rd and 4th respondents is 5.5) and of middling importance (with an average rating of 3.67) to the other. Conversely, the residuals plotted for Benefit B are not as variable as those of Benefit A, nor is the pattern suggestive of any underlying segments other than to note that respondents who had large negative residuals on Benefit A tended to have large positive residuals for B, and vice versa.
A Large Example: Introduction and Statistical Preface
The preceding small example served as a visual illustration of the clustering process, revealing segments of people with similar patterns of response. It also showed that total sample averages may be misleading if useful segments exist within the data. With that in mind, the large data set introduced above is used again. Recall that the data set had roughly 600 respondents, each of whom rated the importance of thirty benefits pertinent to the analgesic category. A 10-point importance rating scale was used. No special importance scaling was used, just a 10-point scale with labels ranging from "extremely" to "not at all" important. As will be shown, in spite of the relatively unsophisticated scaling, the analysis was useful. (With large data sets such as this, using interaction plots to visually identify and form clusters becomes impractical, to say the least. The plot serves an illustrative purpose but is not, in fact, used in creating clusters. Instead, a clustering approach that mimics the visual process is used; it is documented in the appendix.)
Table 4 below shows the average ratings for the total sample (listed under "GRAND MEANS") and for each of four segments or clusters extracted from the data. Cluster sizes are shown as well. Finally, the column headed "F-RATIO" lists numbers that summarize the differences among the cluster averages. The larger the ratio (of differences among the averages relative to a measure of error or within-cluster variability, not supplied here), the greater the differences among the means. F-ratios can be no smaller than 0 and have no upper bound. For interpretive purposes, a value greater than 40 is suggestive of useful differences among the cluster groups. The value of 40 is roughly the mean F-ratio in the table. (Note that these F-ratios are not easily assessed from a statistical significance testing point of view, regardless of their historic association with analysis of variance. The clusters from any such analysis were created to be as different as possible and, as such, the assumptions typically invoked in the usual analysis of variance situation do not hold. These F-ratios are distributed in accordance with non-central Chi-squared distributions. The non-centrality parameters are unknown and, given the exploratory nature of a cluster analysis, are not worth estimating.) Four cluster groups are used for expedience. No defense, statistical or otherwise, is offered for this choice, although the groups are statistically stable and reliable. Further, some simple graphics, such as bar charts of the extent to which cluster group means differ from the total sample averages, serve well to summarize and help interpret the cluster results. For this paper, the table will suffice to show points of interest.
Table 4
| BENEFIT | GRAND MEANS | GROUP 1 | GROUP 2 | GROUP 3 | GROUP 4 | F-RATIO |
|---|---|---|---|---|---|---|
| NUMBER OF OBSERVATIONS | | 173 | 139 | 195 | 124 | |
| Effective on headache pain | 9.20 | 9.55 | 9.14 | 8.83 | 9.37 | 7.02 |
| Works best | 9.06 | 9.11 | 9.51 | 8.88 | 8.77 | 7.17 |
| Works fast | 9.04 | 8.95 | 9.34 | 8.90 | 9.05 | 2.77 |
| A strong pain reliever | 8.96 | 8.95 | 9.22 | 8.79 | 8.94 | 1.89 |
| Long lasting relief | 8.90 | 8.79 | 9.39 | 8.92 | 8.48 | 7.05 |
| Would relieve worst symptoms | 8.71 | 8.59 | 9.09 | 8.63 | 8.58 | 2.34 |
| Doesn't upset my stomach | 8.26 | 7.95 | 8.31 | 8.69 | 7.98 | 3.93 |
| Good value | 8.09 | 7.94 | 7.86 | 7.95 | 8.91 | 7.17 |
| Brand I trust | 8.07 | 8.60 | 8.94 | 8.34 | 5.92 | 48.97 |
| Relieves muscle soreness | 8.05 | 7.62 | 8.42 | 8.30 | 7.85 | 4.33 |
| Safer to use than most | 7.96 | 7.70 | 8.05 | 8.47 | 7.44 | 6.36 |
| Relieves cold/flu aches/pains | 7.88 | 7.55 | 7.96 | 8.19 | 7.77 | 2.49 |
| Brand I prefer | 7.76 | 8.52 | 9.01 | 8.07 | 4.81 | 93.50 |
| Package provides needed info | 7.75 | 7.60 | 7.41 | 8.28 | 7.52 | 3.95 |
| Specific ingredients I want | 7.24 | 7.07 | 6.89 | 7.83 | 6.95 | 4.22 |
| Easy to swallow | 7.14 | 7.08 | 6.84 | 7.74 | 6.64 | 4.60 |
| Reduces fever better | 6.94 | 6.49 | 6.96 | 7.66 | 6.41 | 7.95 |
| Tamper-proof packaging | 6.56 | 5.79 | 6.29 | 7.61 | 6.30 | 9.50 |
| Easy-to-open packaging | 6.23 | 5.12 | 7.85 | 7.49 | 4.01 | 67.99 |
| Doctors recommend | 6.04 | 6.51 | 5.71 | 7.50 | 3.48 | 56.85 |
| Costs more but is worth it | 5.67 | 5.54 | 5.48 | 6.81 | 4.27 | 23.08 |
| Effective on arthritis pain | 5.33 | 2.02 | 7.13 | 8.02 | 3.71 | 186.71 |
| Recommended by pharmacists | 5.19 | 5.79 | 3.72 | 6.97 | 3.17 | 71.35 |
| Comes in caplet form | 4.98 | 5.18 | 3.61 | 6.81 | 3.35 | 49.20 |
| Helps you sleep | 4.82 | 3.84 | 3.98 | 6.69 | 4.48 | 37.27 |
| Advanced/latest breakthrough | 4.75 | 5.10 | 3.24 | 6.51 | 3.20 | 61.48 |
| Effective on menstrual pain | 4.15 | 4.76 | 2.05 | 3.92 | 6.03 | 36.11 |
| Comes in gelatin covered form | 3.90 | 4.29 | 2.00 | 5.77 | 2.55 | 75.62 |
| Brand hospitals use most | 3.62 | 4.03 | 1.68 | 5.89 | 1.65 | 131.46 |
| Was prescription-only | 3.57 | 3.43 | 2.17 | 5.50 | 2.31 | 71.02 |
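The paper does not give the F-ratio formula, so the sketch below uses the standard one-way analysis-of-variance F (between-cluster mean square over within-cluster mean square) on made-up data. The Table 4 ratios themselves cannot be recomputed here, because the within-cluster variability is not supplied. The sketch also checks one thing that can be verified from the table: the grand mean is the size-weighted average of the four cluster means.

```python
from statistics import mean


def f_ratio(groups):
    """One-way analysis-of-variance F-ratio for one item across clusters.

    `groups` holds each cluster's ratings of the item. Returns the
    between-cluster mean square divided by the within-cluster mean square.
    """
    values = [v for g in groups for v in g]
    grand = mean(values)
    means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(f_ratio([[1, 2, 3], [4, 5, 6]]))  # made-up data: 13.5

# Table 4, "Brand I prefer": cluster sizes and cluster means.
sizes = [173, 139, 195, 124]
brand_i_prefer = [8.52, 9.01, 8.07, 4.81]
weighted = sum(s * m for s, m in zip(sizes, brand_i_prefer)) / sum(sizes)
print(round(weighted, 2))  # 7.76, matching the GRAND MEANS column
```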
A Next Step
The clustering was utilized to help develop an ordering of benefits. The segments created were used primarily to assess the stability of that ordering and to suggest different orderings based on the needs of specific segments. These segments may be of added utility when coupled with ratings of brand or product performance. The idea here is to create a perceptual map (see Research on Research Paper number 29 for a description of the biplot, a form of map) for each segment. The map displays the perceptual positions of brands or products, along with information on which characteristics best differentiate among them and correlate most strongly with some measure of overall acceptance. Each map would be created within a specific segment which has well-defined needs. As such, each map displays brand or product strengths and weaknesses, with characteristic-based information on how to take advantage of these for a specific group of respondents whose needs are known. The result is far more informative than total sample analyses attempting to relate importance to brand or product performance (e.g., gap or quadrant analysis).
Appendix: Clustering Algorithm
The algorithm discussed below is recommended for use with responses to attitude statements. The objective of the clustering algorithm is to group respondents who have reasonably similar patterns of response across the statements. The mathematics are identical to those used in testing for the presence of interaction in an analysis of variance with a two-factor design and one observation per cell. From a clustering perspective, the two factors in the "design" are respondents and the items to be rated; these factors form the rows and columns of the table or matrix input to the analysis. "One observation per cell" indicates that each respondent supplied a single rating for each item. As discussed above, patterns of response and interaction are synonymous.
The logic of the clustering algorithm used involves three steps:
- Decomposing ratings given by respondents on several items;
- Isolating that portion of information which pertains to interpretable and stable patterns of responses across those items;
- Classifying respondents based on similarity of patterns.
A fourth step, assessing stability of cluster results, is addressed last.
Consider a matrix of ratings whose rows are respondents and whose columns are items. The first step in understanding the clustering algorithm is noting that each rating can be decomposed into five pieces corresponding to a linear model, such as is encountered in analysis of variance applications:
- A grand mean, the mean of all ratings of all respondents across all items;
- A row effect, the difference between a respondent's average rating, across items, and the grand mean;
- A column effect, the difference between an item mean and the grand mean;
- The interaction between rows and columns (respondents and items);
- Error.
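In the usual notation for a two-factor design with one observation per cell (a standard formulation; the symbols are introduced here, not taken from the paper), the rating of item j by respondent i, the estimates of the first three pieces, and the residual the algorithm works with are:

```latex
\begin{aligned}
y_{ij} &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ij} \\
\hat{\mu} &= \bar{y}_{\cdot\cdot}, \quad
\hat{\alpha}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}, \quad
\hat{\beta}_j = \bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot} \\
r_{ij} &= y_{ij} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_j
        = y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot}
\end{aligned}
```

With only one rating per cell, the residual estimates interaction and error together. This is the same formula applied to Table 2 earlier, where benefits rather than respondents formed the rows.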
The second step is to isolate the interaction, which, again, takes the form of patterns of ratings across items. The analysis of variance parallel is followed quite closely. Removing the row and column means (subtracting off the row and column means and adding back the grand mean, yielding what is referred to as double-centered data) eliminates the grand mean and the row and column effects from the data. Interaction and error remain. Interaction, or pattern, is systematic variability and can be teased out using principal components analysis. The principal components analysis is performed on a sum of squares and cross products (or covariance) matrix calculated from the double-centered data. If enough pattern-to-pattern variability exists in the data, then the first few principal components should be relatively large. Error, which is non-systematic variability, is relegated to the smaller subsequent components and, ultimately, ignored. In essence, the principal components analysis serves as a "vacuum cleaner", accumulating the systematic variability that becomes the basis for clustering.
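The double-centering and component extraction can be sketched in plain Python. Power iteration with deflation is used here only to keep the example free of dependencies; it is a stand-in for whatever eigen-solver the original software used, and a real analysis would call a linear-algebra library. The data are Table 2 transposed so that respondents are rows, as in this appendix. A covariance matrix would give the same components, with eigenvalues divided by n - 1.

```python
from statistics import mean


def double_center(matrix):
    """Subtract row and column means, then add back the grand mean."""
    row_means = [mean(row) for row in matrix]
    col_means = [mean(col) for col in zip(*matrix)]
    grand = mean(row_means)  # rows have equal length, so this is the grand mean
    return [[v - rm - cm + grand for v, cm in zip(row, col_means)]
            for row, rm in zip(matrix, row_means)]


def sscp(matrix):
    """Item-by-item sums of squares and cross products (X transpose X)."""
    cols = list(zip(*matrix))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def principal_components(a, n_components, iters=1000):
    """Leading eigenpairs of a symmetric matrix by power iteration + deflation."""
    a = [row[:] for row in a]
    n = len(a)
    pairs = []
    for _ in range(n_components):
        v = [1.0 / (i + 1) for i in range(n)]  # arbitrary starting vector
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove this component before extracting the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


# Table 2 transposed: rows are respondents, columns are Benefits A-D.
ratings = [
    [4, 5, 5, 4],
    [3, 4, 4, 2],
    [6, 4, 3, 3],
    [5, 3, 3, 3],
    [4, 6, 4, 5],
]
x = double_center(ratings)
components = principal_components(sscp(x), n_components=3)
for lam, loadings in components:
    print(round(lam, 3), [round(l, 3) for l in loadings])
```

The three eigenvalues sum to 11.9, the interaction sum of squares of the small example, and the first is much larger than the other two.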
As a brief digression, two critical analysis points underlie the principal components analysis. First, the data are not standardized prior to the analysis. So, the principal components analysis is performed essentially on covariances between items and not correlations. Second, the interpretation of the components is not a primary objective of the analysis. What is important is that a useful number of components be retained to yield a stable, interpretable clustering. Output associated with the principal components analysis is oriented toward helping the user distinguish between components with sufficient variability upon which stable, reliable clusters may be formed and those whose variance is essentially noise.
On the issue of standardization, this rescaling of responses by each item's standard deviation typically hinders the clustering process. Standardization reduces the weight or statistical impact of items with greater variability while increasing the influence of less variable items. Considering that the goal of the clustering is to identify differences among respondents, and hence to be driven by the items with the greatest variability, standardization is counterproductive. Consider an example with two items measured on the same rating scale. One item has a standard deviation of 10, the second a standard deviation of 1.5. Clearly, the first provides much greater insight into how respondents differ. The small standard deviation for the second item suggests a very homogeneous set of ratings; everyone rated the item pretty much the same way. Yet standardization changes the standard deviation to 1 for both items. In any subsequent analysis that is sensitive to variability in the data, as principal components analysis is, both items would carry equal weight or statistical influence. As such, the information about respondent differences contributed by the first item has been greatly muted.
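The muting effect can be shown with made-up ratings from six respondents on two items sharing a 10-point scale: one item divides respondents sharply, the other draws near-identical ratings. The sketch measures each item's share of the summed variance, which is what a variance-sensitive method such as principal components analysis responds to, before and after standardization. The data and function names are illustrative.

```python
from statistics import mean, pstdev, pvariance


def standardize(column):
    """Rescale one item's ratings to mean 0 and standard deviation 1."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]


def variance_shares(columns):
    """Fraction of the summed item variances contributed by each item."""
    variances = [pvariance(c) for c in columns]
    total = sum(variances)
    return [v / total for v in variances]


# Made-up ratings from six respondents on two items sharing a 10-point scale.
polarizing = [1, 10, 2, 9, 1, 10]  # respondents disagree sharply
consensus = [8, 9, 8, 9, 8, 9]     # everyone rates it about the same

before = variance_shares([polarizing, consensus])
after = variance_shares([standardize(polarizing), standardize(consensus)])
print([round(s, 3) for s in before])
print([round(s, 3) for s in after])
```

Before standardization the polarizing item carries almost 99% of the variance; afterwards each item carries half.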
An adjunct to the issue of principal component retention is the issue of rotation. It is neither necessary nor generally prudent to rotate the components. A first question to ask is, "Which rotation scheme should be used?" There are many approaches, which may yield many different solutions. Which is "best" may be difficult to assess, and the one ultimately chosen seems more a function of familiarity than of informed judgment. Unrotated principal components are easy enough to interpret that rotation seems pointless.
The unrotated components are very much like contrasts or comparisons among items, reflecting trade-offs implicit in the responses to those items. While the rating task itself may be unconstrained (i.e., respondents are free to use any rating value for any item, regardless of how often that value has been used for other items), the process of row-centering (removing respondent differences) imposes a constraint: after row-centering, the sum of any respondent's ratings is zero. The data can now be interpreted from a relative point of view, relative to the respondent's average. Items about which a respondent felt more strongly (e.g., the associated benefit was more important) receive positive values. Items about which the respondent felt less strongly, and so rated below his or her average, receive negative values. The principal components obtained from row-centered data reflect this relative view and display the contrasts among the items quite clearly.
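The contrast interpretation can be made precise. Writing X for the row-centered (or double-centered) data matrix and 1 for a vector of ones (notation introduced here), every component with a nonzero eigenvalue has loadings that sum to zero:

```latex
\begin{aligned}
X\mathbf{1} &= \mathbf{0}
  && \text{(each respondent's centered ratings sum to zero)} \\
X^{\top}X\,\mathbf{v} &= \lambda\,\mathbf{v}, \quad \lambda > 0
  && \text{(a principal component)} \\
\mathbf{1}^{\top}\mathbf{v}
  &= \tfrac{1}{\lambda}\,\mathbf{1}^{\top}X^{\top}X\,\mathbf{v}
   = \tfrac{1}{\lambda}\,(X\mathbf{1})^{\top}(X\mathbf{v})
   = 0
\end{aligned}
```

Positive loadings on some items are therefore always balanced by negative loadings on others, which is exactly what makes an unrotated component a contrast.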
(As a statistical note, factor analyses or principal components analyses typically encountered in marketing research have, before rotation, a first factor which accounts for a good deal of variability. This factor represents differences among respondent average ratings. Second and subsequent factors reflect the contrasts discussed above since they are estimated after this first factor has been removed. As such, the process of row-centering is no different from simply removing variability due to that first factor.)
As mentioned above, a small number of principal components, those most likely to contain information on how respondents interact with the items, are retained. Component scores are calculated for each respondent. The scores are then entered into a directed K-means procedure for the clustering itself. The directed nature of the K-means procedure identifies well-spaced "seeds", respondents who serve as initial cluster centers. As the clustering proceeds, the centers are updated to reflect the inclusion of new cluster members. Classification is distance-based: a respondent is assigned to the cluster whose center is closest.
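The paper does not spell out how the directed procedure picks its seeds, so the sketch below substitutes a common choice, farthest-point (maximin) seeding, followed by standard Lloyd-style K-means iterations that recompute each center from its current members. Both the seeding rule and the function names are assumptions; the component scores are made up.

```python
from math import dist
from statistics import mean


def farthest_point_seeds(points, k):
    """Choose k well-spaced seeds (maximin rule, an assumed stand-in).

    Start from the point farthest from the overall centroid, then repeatedly
    add the point whose distance to its nearest existing seed is largest.
    """
    centroid = [mean(col) for col in zip(*points)]
    seeds = [max(points, key=lambda p: dist(p, centroid))]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return seeds


def k_means(points, k, max_iter=100):
    """Lloyd-style K-means started from farthest-point seeds."""
    centers = [list(s) for s in farthest_point_seeds(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each respondent to the nearest center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each center to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [mean(col) for col in zip(*members)]
    return labels, centers


# Made-up component scores for six respondents on two retained components.
scores = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(scores, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```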
Finally, cluster stability is assessed by a nearest neighbor density estimation scheme. Stability here may be defined in two contexts. A cluster solution may be considered stable if it can be replicated by a new sample drawn, independently of the first, from the same population of respondents. Stability may also be characterized by the placement of respondents with comparable ratings, called neighbors, into the same cluster. As such, a respondent is considered reliably classified if a large proportion of his or her neighbors are classified in that same cluster, and a cluster solution is considered stable if at least 80% of those placed in a cluster have a majority of their neighbors in that cluster as well.
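The neighbor-based rule can be sketched directly. The density estimation details are not given in the paper, so this covers only the two stated criteria: a respondent is reliably classified when a majority of his or her nearest neighbors share the cluster, and a cluster is stable when at least 80% of its members are reliably classified. The number of neighbors is a hypothetical parameter, and ties in distance are broken by respondent order.

```python
from math import dist


def reliably_classified(points, labels, n_neighbors=5):
    """True for each respondent whose nearest neighbors are mostly in the same cluster."""
    flags = []
    for i, p in enumerate(points):
        ranked = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors = [j for _, j in ranked[:n_neighbors]]
        same = sum(1 for j in neighbors if labels[j] == labels[i])
        flags.append(same > len(neighbors) / 2)
    return flags


def cluster_stability(points, labels, n_neighbors=5, threshold=0.80):
    """Map each cluster to True if at least `threshold` of its members are reliable."""
    flags = reliably_classified(points, labels, n_neighbors)
    result = {}
    for c in sorted(set(labels)):
        member_flags = [f for f, lab in zip(flags, labels) if lab == c]
        result[c] = sum(member_flags) / len(member_flags) >= threshold
    return result


# Made-up scores: two tight groups of four respondents each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]
scrambled = [0, 1, 0, 1, 0, 1, 0, 1]
print(cluster_stability(points, good, n_neighbors=3))       # {0: True, 1: True}
print(cluster_stability(points, scrambled, n_neighbors=3))  # {0: False, 1: False}
```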
