Response measures, data collection methods, and conjoint analysis: A two-attribute case study
Introduction:
Conjoint analysis can be one of the most powerful research tools in the marketer's armamentarium because of its ability to predict consumer preferences for products which have never been directly evaluated or perhaps even developed. Literally hundreds of conjoint studies have been commissioned during the last decade. Yet despite the rather large amount of experience the marketing community has had with the technique, there is no consensus on how best to implement the various steps needed to execute a conjoint analysis study.
For example, differences of opinion exist on the appropriateness of various response measures, data collection methods, model specifications, and estimation procedures. There are various schools of thought on these topics, and each has its supporters. However, surprisingly few published reports have appeared which attempt to determine whether the various implementation procedures have any influence on the predictive validity of the resulting models, even though the predictability of hold-out preference orderings is probably the single most important criterion by which these modeling procedures can be judged.
In order to provide data which can be used to address a subset of these issues, Market Facts conducted a study to determine what impact different response measures and data collection methods have on the predictive validity of a relatively simple conjoint model.
The choice situation selected for this project was consumer preference for men's basic blue jeans which were described in terms of only two attributes: brand and price. Tables One and Two list the six levels of brand and the six levels of price which were used in this study. The various response measures and data collection methods employed included:
- A complete rank ordering of all 36 brand-price combinations (jean concepts)
- Rank orderings of mutually exclusive and exhaustive subsets of all brand-price combinations
- Preference ratings of all brand-price combinations
- Rated paired-comparisons of selected pairs of brand-price combinations
- Simple preference paired-comparisons of selected pairs of brand-price combinations
The listing of these methodologies is ordered from greatest to least in terms of the amount of preference information each attempts to elicit from the respondent. The predictive value of the conjoint model parameter estimates (utilities) should increase with the amount of information the methodology provides as long as the respondent is willing and able to detect and report potentially slight preference differences among the jean concepts. However, to the extent that these conditions are not met, the differences in the amount of preference information these methodologies actually provide should decrease, as should the differences in predictive validity of the conjoint models to which they give rise. The above reasoning assumes, of course, that the conjoint model is correctly specified.
| Table One | Table Two |
| Brands used to describe jean concepts in the estimation section | Prices used to describe jean concepts in the estimation section |
| Lee | $10.49 |
| Levi | $11.99 |
| Penney's | $13.49 |
| Sears | $14.99 |
| Sedgefield | $16.49 |
| Wrangler | $17.99 |
Method:
A sample of 300 consumers was recruited at one of Market Facts' permanent central location facilities. Each qualified respondent was between 13 and 49 years of age and had purchased at least one pair of men's basic blue jeans in the last six months from a non-discount clothing or department store. Individuals were assigned sequentially to one of four equal-sized treatment groups.
The questionnaire consisted of three parts: an estimation section, a validation section and a demographic section. The material in the latter two sections was constant across treatment groups, but the composition of the estimation section differed among treatment groups in the following ways.
The 75 individuals in Group 1 were given a card deck containing the complete set of 36 brand-price combinations, and asked to rate each brand-price combination in terms of their likelihood to purchase using a nine-point scale. To facilitate the rating task, a numbered rating board was provided, and each brand-price combination was rated by placing its card on the appropriately numbered section of the rating board. Respondents were allowed to reconsider previous ratings at any time. After rating all of the cards, they rank-ordered the cards within each rating section, again in terms of likelihood to purchase. This provided a complete rank order of the entire set, in addition to the complete set of ratings.
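The two-step card sort amounts to a lexicographic ordering: cards are ordered first by rating section and then by rank within that section. A minimal sketch follows, assuming that a higher rating means a greater likelihood to purchase and that within-section rank 1 marks the most preferred card; the text does not state the direction of the scale, and the tuple layout and the example ratings are hypothetical.

```python
def complete_rank_order(cards):
    """Return concept labels ordered from most to least preferred.

    Each card is a (label, rating, within_section_rank) tuple.
    Assumption: higher rating = more likely to purchase, and
    within_section_rank 1 = most preferred card inside its rating section.
    """
    ordered = sorted(cards, key=lambda card: (-card[1], card[2]))
    return [label for label, _, _ in ordered]


# Made-up responses for four of the 36 cards, for illustration only.
cards = [
    ("Lee at $10.49", 9, 1),
    ("Levi at $10.49", 9, 2),
    ("Sears at $17.99", 2, 1),
    ("Levi at $13.49", 7, 1),
]
print(complete_rank_order(cards))
```

Because every card falls in exactly one rating section, the combined ordering has no ties as long as the within-section ranks have none.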
The individuals in Group 2 were presented with six sets of six brand-price combinations, and asked to rank the combinations within each set in terms of likelihood to purchase. A Latin Square arrangement was used to partition the complete set of 36 brand-price combinations into six sets of six in such a way that each of the six brands and each of the six prices appeared exactly once in each of the sets.
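One way to obtain such a partition is a cyclic Latin square, in which set k pairs the i-th brand with the (i + k)-th price, wrapping around. The sketch below is illustrative only: the text does not say which Latin square was actually used, and the function name is hypothetical.

```python
BRANDS = ["Lee", "Levi", "Penney's", "Sears", "Sedgefield", "Wrangler"]
PRICES = [10.49, 11.99, 13.49, 14.99, 16.49, 17.99]


def latin_square_sets(brands, prices):
    """Partition all brand-price combinations into len(brands) sets.

    Assumption: a cyclic Latin square. Set k pairs brand i with
    price (i + k) mod n, so each brand and each price occurs exactly
    once per set and every combination occurs in exactly one set.
    """
    n = len(brands)
    if len(prices) != n:
        raise ValueError("need the same number of brands and prices")
    return [
        [(brands[i], prices[(i + k) % n]) for i in range(n)]
        for k in range(n)
    ]


sets = latin_square_sets(BRANDS, PRICES)
for number, concept_set in enumerate(sets, start=1):
    print(number, concept_set)
```

Any other 6 x 6 Latin square would satisfy the same balance conditions; the cyclic one is simply the easiest to generate.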
The individuals in Group 3 were presented with 30 pairs of brand-price combinations and asked to indicate which member of each pair they preferred in terms of purchase interest. Across the 30 pairs, each specific combination appeared either once or twice and each specific brand and each specific price appeared 10 times in all. Within a pair, brands and prices always differed, and each specific brand and each specific price was paired with every other brand and every other price exactly twice. Essentially, an effort was made to construct as connected a design as possible, given the number of pairs presented.
The individuals in Group 4 were presented with the same 30 pairs of brand-price combinations that were seen by the individuals in Group 3. However, instead of just indicating the preferred member of each pair, they were asked to indicate the degree of their preference. A nine-point rating scale was used with the end points representing "greatly prefer" for the respective member of each pair and the middle point representing "no preference". Simple preference paired-comparison data were subsequently created for this group by retaining just the directions of preference and ignoring their magnitudes.
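Reducing a rated pair to a simple preference keeps only the side of the scale the rating falls on. The sketch below assumes the scale is coded 1 to 9, with 1 meaning "greatly prefer" the first member, 9 meaning "greatly prefer" the second member, and 5 meaning "no preference". The coding, and the choice to return `None` for a midpoint rating, are assumptions; the text does not say how "no preference" responses were treated.

```python
def to_simple_preference(rating, midpoint=5):
    """Collapse a nine-point rated comparison to its direction only.

    Assumption: ratings below the midpoint favor the first member of
    the pair, ratings above it favor the second, and the midpoint
    itself carries no direction (returned as None).
    """
    if not 1 <= rating <= 9:
        raise ValueError("rating must be between 1 and 9")
    if rating < midpoint:
        return "first"
    if rating > midpoint:
        return "second"
    return None


# Made-up ratings for four pairs, for illustration only.
ratings = [2, 5, 9, 4]
print([to_simple_preference(r) for r in ratings])
```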
Following the completion of the estimation section, all respondents rank ordered, within each of three sets, six brand-price combinations in terms of likelihood to purchase. These three sets of six jean concepts comprised the validation section and are listed in Table Three.
Conjoint Analysis:
Six sets of individual-level conjoint analyses were performed on the data contained in the estimation sections of the four treatment groups. In addition to the rank-order data from Group 2 and the simple preference paired-comparison data from Group 3, both the rating and rank-order data from Group 1 and the rated as well as simple preference paired-comparison data from Group 4 were analyzed. Thus, the respondents in Groups 1 and 4 each provided two sets of data for analysis.
The output of these analyses consisted of utilities for each of the six brands and six prices, developed separately for each of the 6 x 75 = 450 individual sets of responses. The analyses also produced tau, a measure of the goodness-of-fit of the conjoint model to the data in the estimation set. Only those individual analyses whose tau value exceeded .65 were retained for subsequent analysis. By this criterion, 42 individual sets of responses were discarded, leaving 66, 70, 69, 71, 67, and 65 respondents in groups 1A (ratings), 1B (rankings), 2 (sets of rankings), 3 (simple preference pairs), 4A (simple preference pairs), and 4B (rated pairs), respectively.
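The retention rule can be sketched as follows, under the assumption that tau is Kendall's rank correlation between the model's fitted values and the respondent's observed preferences. The text does not define tau, so this reading, the tau-a variant used here, and the example values are all assumptions.

```python
from itertools import combinations

TAU_CUTOFF = 0.65  # threshold stated in the text


def kendall_tau(observed, fitted):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(observed) != len(fitted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(observed)), 2):
        product = (observed[i] - observed[j]) * (fitted[i] - fitted[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(observed) * (len(observed) - 1) // 2
    return (concordant - discordant) / n_pairs


def is_retained(observed, fitted, cutoff=TAU_CUTOFF):
    """Keep an individual analysis only if its tau exceeds the cutoff."""
    return kendall_tau(observed, fitted) > cutoff


# Made-up example: one adjacent pair of concepts is out of order.
observed = [1, 2, 3, 4]
fitted = [1, 3, 2, 4]
print(round(kendall_tau(observed, fitted), 3), is_retained(observed, fitted))
```

The reported counts are consistent with the stated total: 66 + 70 + 69 + 71 + 67 + 65 = 408 = 450 - 42.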
Prediction Results:
The question of central interest in this study is whether different response measures and data collection methods give rise to conjoint analysis models which differ in predictive validity. In order to answer this question, the individually developed conjoint analysis models were used to predict the preference information contained in the validation section.
Two measures of predictive validity were used. One measure was first choice hit rate, which was scored as 1 or 0 depending on whether or not the first choice jean concept was correctly predicted as first choice. The other measure was the Spearman rank correlation coefficient computed between the actual rank orders of preference and the predicted rank orders of preference. Both measures were calculated for each individual conjoint model for each of the three rank orderings contained in the validation section.
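Both measures can be computed for one respondent and one validation set as sketched below. The sketch takes the predicted total utility of each of the six concepts as given, since the text does not describe how utilities were obtained for validation prices that differ from the estimation prices. The function names and the example utilities are hypothetical, and the Spearman formula used assumes no tied ranks.

```python
def ranks_from_utilities(utilities):
    """Convert utilities to ranks, with rank 1 for the highest utility."""
    order = sorted(range(len(utilities)), key=lambda i: -utilities[i])
    ranks = [0] * len(utilities)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def first_choice_hit(actual_ranks, predicted_ranks):
    """Score 1 if the predicted first choice is the actual first choice."""
    return int(actual_ranks.index(1) == predicted_ranks.index(1))


def spearman(actual_ranks, predicted_ranks):
    """Spearman rank correlation for untied ranks."""
    n = len(actual_ranks)
    d_squared = sum((a - p) ** 2 for a, p in zip(actual_ranks, predicted_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up data: actual ranks of six concepts and their predicted utilities.
actual = [1, 2, 3, 4, 5, 6]
predicted = ranks_from_utilities([2.0, 2.5, 1.0, 0.5, 0.2, -1.0])
print(predicted, first_choice_hit(actual, predicted), spearman(actual, predicted))
```

In this example the model swaps the top two concepts, so the first-choice measure scores 0 even though the rank correlation remains high. The two measures can therefore disagree for the same respondent.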
Preliminary analyses revealed that although the three validation sets differed in predictive validity (see Table Three), there was no evidence of a model by validation set interaction. Consequently, the data were collapsed over validation sets within individuals for each measure. Figures 1 and 2 display the proportion of first choices correctly predicted and the average Spearman rank correlation for each conjoint model within each treatment group.
One-way analyses of variance were performed on the average hit rate and the average Fisher z-transformed Spearman rank correlation computed within individuals. No significant differences in average hit rate were found among treatments.
However, the transformed Spearman rank correlation of Group 2 (sets of rankings) was found to be significantly higher than that of the remaining conditions, with the exception of Group 4B (rated pairs), from which it did not differ statistically. (All tests were conducted at the alpha = .05 level.)
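The Fisher z-transformation maps a correlation r to z = atanh(r), which makes correlations near plus or minus 1 better suited to analysis of variance. A minimal sketch of the per-individual averaging follows. Two points are assumptions not settled by the text: that each of the three correlations is transformed before averaging, and that correlations of exactly 1 or -1 (for which the transform is infinite) are clipped to a value just inside the interval.

```python
import math


def fisher_z(r, clip=0.999):
    """Fisher z-transform of a correlation.

    atanh is undefined at |r| = 1, so r is clipped first; the clip
    value of 0.999 is an arbitrary illustrative choice.
    """
    r = max(-clip, min(clip, r))
    return math.atanh(r)


def mean_transformed(correlations):
    """Average the z-transformed correlations for one individual."""
    z_values = [fisher_z(r) for r in correlations]
    return sum(z_values) / len(z_values)


# Made-up Spearman correlations for one respondent's three validation sets.
z_bar = mean_transformed([0.60, 0.80, 0.90])
print(round(z_bar, 3), round(math.tanh(z_bar), 3))
```

Applying `math.tanh` to the averaged value converts it back to the correlation scale for reporting.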
Discussion:
The response measures and data collection methods employed in this study differed in terms of inherent sensitivity and the degree of cognitive effort required to successfully perform the preference tasks. However, no significant differences emerged in the predictive validity of the resulting conjoint models, with the exception of Group 2 (sets of rankings) as measured by Spearman rank correlation. The Group 2 individuals performed the same task in the estimation section as in the validation section, so their relatively superior performance in the latter task could be due to their being more practiced at that task than the other groups.
An alternative and perhaps more likely explanation for the superior performance of the Group 2 individuals is that their methodology may have achieved a balance between asking for too little preference information and asking for too much, i.e., more than could accurately and consistently be given by the respondent. Either extreme would result in utilities which are less reliably estimated than in a more balanced situation. Respondents found rank-ordering sets of six jean concepts less tedious than rating 36 concepts or responding to 30 paired-comparisons, and easier than rank-ordering 36 concepts in total.
It remains to be seen whether the results of this study will generalize to the more typical and more difficult conjoint study in which products/services vary on more than two attributes. Our suspicion is that greater differences among the methodologies would appear under more varied circumstances with some procedures continuing to seek too little preference information and others continuing to seek too much.
We suspect that, just as in the simple case studied here, having respondents rank-order medium-sized sets of concepts will provide the optimal amount of preference information for conjoint model development. As a possible added benefit, this methodology seems to maintain the greatest degree of respondent interest in the preference task.
Table Three
Sets of jean concepts used for validation
| Set 1 brand | Set 1 price | Set 2 brand | Set 2 price | Set 3 brand | Set 3 price |
| Sedgefield | $14.39 | Wrangler | $13.19 | Levi | $11.09 |
| Penney's | $12.29 | Sedgefield | $16.19 | Wrangler | $14.69 |
| Lee | $10.79 | Levi | $15.29 | Penney's | $15.89 |
| Sears | $12.89 | Penney's | $11.69 | Lee | $14.09 |
| Levi | $16.79 | Sears | $11.39 | Sedgefield | $13.79 |
| Wrangler | $12.59 | Lee | $17.09 | Sears | $15.59 |
| Hit rate | .72 | Hit rate | .70 | Hit rate | .84 |
| Spearman corr | .77 | Spearman corr | .77 | Spearman corr | .85 |
