Effects of Various Rating Scale Descriptors and Administration on Response Profiles With Telephone Interview Data
Ratings given to attitudinal questions administered over the telephone are influenced by the order in which response alternatives are described to respondents, the inclusion of a scale mid-point descriptor, and the choice of end-point labels. If the questions are different, researchers shouldn't expect the answers to be the same.
Rating scales are used extensively in marketing research to quantify attitudes and perceptions toward various consumer products, brands and services. The responses an individual provides to rating scale questions, however, will be influenced to a certain extent by the properties of the scale itself. It is difficult, if not impossible, to extricate this bias from the individual's response in hopes of obtaining a more accurate attitudinal assessment. Therefore, it is imperative that the researcher consider how rating scales with differing characteristics affect a respondent's answers.
This served as the impetus for the research described in this paper. Telephone interviews were conducted using ten-point rating scales that varied slightly in their properties and administration, and the resulting response profiles were compared. The specific research question addressed is whether providing a scale mid-point description, reversing the order in which scale end-points are described, or varying the end-point descriptions influences the resulting response distribution.
Related research conducted with mail survey data is reported in earlier Research on Research papers. Research on Research # 1 explored the effect of varying the order in which the scale end-points are presented to respondents. Refer to Research on Research # 3 for a comparison of semantic differential scales. Research on Research # 28 and # 53 examined the impact of varying the number of scale points.
Method
1,960 nationally representative adults were telephoned and asked to rate their experience with their primary bank or financial institution on a ten-point rating scale. The interviews took place over two weekends separated by a period of 12 days. Six rating scale versions were used, three versions each time period. Individuals were randomly assigned to one of these six versions. Approximately 300 respondents were exposed to each scale version. Exhibit A displays the exact wording of each rating scale version.
Three characteristics of the rating scales varied: 1) the semantic description of the end-points of the scale, 2) inclusion / exclusion of a "middle" (5th) point label (in addition to the end-point labels), and 3) the order in which the scale end-point labels were read to the respondent. Each of these variations represents a separate comparison that was examined, the results of which are described in more detail below.
Results
The response distributions and descriptive statistics for the six rating scale versions are displayed in Exhibits B through E. Results from several tests of differences (top-box, top-2-box, top-3-box, bottom-box, mean, and response distribution) between the comparable distributions are also displayed.
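As an illustration of the box-score comparisons listed above, the following is a minimal sketch of how "top-box" style percentages and a test of their difference might be computed. It assumes a pooled two-proportion z-test with a normal approximation; the paper does not state which test was actually used. The function names and the counts are hypothetical and are not the study's data.

```python
from math import sqrt, erf


def box_share(counts, points):
    """Share of respondents whose rating falls in `points`.

    `counts` maps each scale point (1-10) to a respondent count.
    """
    total = sum(counts.values())
    return sum(counts.get(p, 0) for p in points) / total


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z-test for a difference in proportions.

    Returns (z, two-sided p-value) under the normal approximation.
    This is an assumed test, not necessarily the one used in the study.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical counts for two scale versions (300 respondents each).
# These are illustrative only and are NOT the study's results.
version_a = {1: 6, 2: 3, 3: 6, 4: 9, 5: 30, 6: 24, 7: 45, 8: 75, 9: 42, 10: 60}
version_b = {1: 12, 2: 3, 3: 6, 4: 9, 5: 30, 6: 24, 7: 45, 8: 75, 9: 51, 10: 45}

top_box_a = box_share(version_a, [10])
top_2_box_a = box_share(version_a, [9, 10])
top_3_box_a = box_share(version_a, [8, 9, 10])
bottom_box_a = box_share(version_a, [1])

z, p_value = two_proportion_z(
    version_a[10], sum(version_a.values()),
    version_b[10], sum(version_b.values()),
)
print(f"top-box A = {top_box_a:.1%}, z = {z:.2f}, p = {p_value:.3f}")
```

A difference would be called significant at the 95% confidence level when the two-sided p-value falls below 0.05, and at the 90% level when it falls below 0.10.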
Effect of Labeling a "Middle" Scale Point:
Exhibit B displays a comparison of response profiles from respondents given a mid-point scale description ("Quite Acceptable") in addition to end-point descriptions vs. respondents given only the end-point descriptions. Three times as many respondents gave a rating of "5" when the fifth point was described as when it was not. This difference was significant at the 99% confidence level, as was the difference between profile means.
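For readers who want to reproduce a comparison of profile means from tabulated frequencies, here is a minimal sketch. It assumes a two-sample z-test with unpooled variances, which is reasonable at roughly 300 respondents per version; the paper does not specify the test it used. The function names and the counts are hypothetical.

```python
from math import sqrt, erf


def mean_and_variance(counts):
    """Sample size, mean and sample variance from a frequency table.

    `counts` maps each scale point to the number of respondents choosing it.
    """
    n = sum(counts.values())
    mean = sum(point * c for point, c in counts.items()) / n
    var = sum(c * (point - mean) ** 2 for point, c in counts.items()) / (n - 1)
    return n, mean, var


def mean_difference_z(counts_a, counts_b):
    """Two-sample z-test for a difference in means (unpooled variances).

    Returns (z, two-sided p-value). An assumed large-sample test,
    not necessarily the one used in the study.
    """
    n1, m1, v1 = mean_and_variance(counts_a)
    n2, m2, v2 = mean_and_variance(counts_b)
    se = sqrt(v1 / n1 + v2 / n2)
    z = (m1 - m2) / se
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical counts, NOT the study's data: the labeled version has a
# spike at "5", mimicking the pattern described for Exhibit B.
with_label = {1: 6, 2: 3, 3: 6, 4: 9, 5: 60, 6: 24, 7: 42, 8: 66, 9: 39, 10: 45}
without_label = {1: 6, 2: 3, 3: 6, 4: 9, 5: 20, 6: 30, 7: 51, 8: 78, 9: 45, 10: 52}

z, p_value = mean_difference_z(with_label, without_label)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Significance at the 99% confidence level corresponds to a two-sided p-value below 0.01.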
The Order in which End-Point Labels are Presented:
Exhibit C allows a visual examination of effects attributable to the order in which scale end-points are described to respondents. A greater percentage of respondents endorsed the end-point that was described first, as can be seen when looking at each end-point separately. A significant "top-box" and "bottom-box" difference at the 95% confidence level was detected between responses to the end-points: (1) "Completely Unsatisfied" and (10) "Completely Satisfied." (This effect was not significant when comparing scales where the mid-point was described, but those distributions were too strongly "contaminated" by inclusion of the mid-point description to be of use here.)
Effects of Different Scale End-Point Labels:
Exhibits D and E compare response distributions to scales utilizing different end-point descriptors:
- Exhibit D: "Completely Unsatisfied" to "Completely Satisfied" vs. "Completely Unsatisfactory" to "Perfect in Every Way";
- Exhibit E: "Completely Satisfied" to "Completely Unsatisfied" vs. "Extraordinary" to "Unacceptable."
The particular labels used to anchor the rating scale can affect the entire response distribution. A significant difference at the 95% confidence level was detected between the overall distributions displayed in Exhibit E but not between those in Exhibit D. The "top-box" and "bottom-box" differences reached significance at the 95% confidence level for both sets of scales, with the exception of the "top-box" difference between "Completely Satisfied" and "Perfect in Every Way," which was significant at the 90% confidence level.
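A comparison of overall response distributions, such as the one reported for Exhibits D and E, can be sketched as a chi-square test of homogeneity across the ten scale points. This is an assumption for illustration: the paper does not say which distribution test was used. The function name and the example counts are hypothetical; the critical values are the standard chi-square values for 9 degrees of freedom.

```python
def chi_square_homogeneity(counts_a, counts_b, points=range(1, 11)):
    """Chi-square statistic comparing two response distributions.

    `counts_a` and `counts_b` map scale points to respondent counts.
    Returns (statistic, degrees_of_freedom). Scale points that nobody
    chose in either group are skipped and do not add to the df.
    """
    points = list(points)
    n_a = sum(counts_a.get(p, 0) for p in points)
    n_b = sum(counts_b.get(p, 0) for p in points)
    total = n_a + n_b
    stat = 0.0
    used = 0
    for p in points:
        a, b = counts_a.get(p, 0), counts_b.get(p, 0)
        column = a + b
        if column == 0:
            continue
        used += 1
        # Expected counts if both groups shared one common distribution.
        expected_a = n_a * column / total
        expected_b = n_b * column / total
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat, used - 1


# Standard chi-square critical values for 9 degrees of freedom,
# keyed by confidence level.
CRITICAL_DF9 = {0.90: 14.684, 0.95: 16.919, 0.99: 21.666}

# Hypothetical counts, NOT the study's data.
scale_x = {1: 6, 2: 3, 3: 6, 4: 9, 5: 30, 6: 24, 7: 45, 8: 75, 9: 42, 10: 60}
scale_y = {1: 12, 2: 3, 3: 6, 4: 9, 5: 30, 6: 24, 7: 45, 8: 75, 9: 51, 10: 45}

stat, df = chi_square_homogeneity(scale_x, scale_y)
if df == 9:
    significant_95 = stat > CRITICAL_DF9[0.95]
    print(f"chi-square = {stat:.2f}, df = {df}, significant at 95%: {significant_95}")
```

With sparse low-end categories, as is common for satisfaction ratings, expected counts can fall below the usual rule-of-thumb minimum of five, in which case adjacent scale points might be collapsed before testing.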
Discussion
The order in which response alternatives are described, the inclusion of a scale mid-point descriptor, and the choice of end-point labels all can have a significant impact on the obtained response profile. Therefore, absolute values of means, "top-box" percentages, etc. depend on the specific characteristics of the scale used.
The higher endorsement of scale end-points described first to respondents noted in Exhibit C exemplifies the "primacy effect," which has been observed by other survey researchers. The primacy effect was noted in a mail panel study reported in Research on Research # 1, in which respondents rated a series of questions on a five-point rating scale. One group of respondents received the questions in one order and the order was reversed for the other group of respondents. Although the absolute ratings are affected, differences among attributes ("objects") may not be (as suggested by Research on Research # 53).
One explanation for this effect is that respondents can expend less mental effort or judgment by endorsing the scale end-point described first, as long as that end-point is in line with their general attitude toward the rated object (e.g., the respondent's primary bank or financial institution). This may be especially true for respondents unable to readily access the information needed to form a more "refined" judgment. In such cases, the respondent may not be familiar with the object or may not have previously formed a judgment about it. It is not clear, however, whether this explanation generalizes to other modes of data collection.
A similar explanation can be offered for the observed spike at the mid-point of the response distribution when the mid-point label was described to respondents (see Exhibit B). Some respondents may have endorsed the mid-point if the information they needed to make a judgment about the object was not accessible and a rating of "5" was in line with their general attitude toward the object. This does not necessarily indicate that these respondents merely have a "weak" attitude or no opinion about the object.
It is also possible that context effects influenced respondents' interpretation of the "Quite Acceptable" descriptor when anchored by the end-points (1) "Unacceptable" and (10) "Extraordinary." For example, if respondents were asked to associate the label "Quite Acceptable" with one of ten scale points anchored by these end-points, the median placement would likely be at scale points nearer to the "Extraordinary" anchor than to the "Unacceptable" anchor. As such, had respondents not been given the mid-point label, they may have given a higher rating to the object (i.e., a rating of 6, 7, or 8). The results from the scale without the mid-point label bear this out.
Together, these two interpretations may explain, in part, the observed spike in response frequency at the scale's mid-point value. An obvious and easy way to control this type of bias is to omit the mid-point label when telephone interviewing will be used to collect attitudinal data.
Inappropriately labeled end-points can seriously distort the response profile (means, variances, shape) if they do not provide respondents an appropriate frame of reference to rate the object. This could result in too many people giving a "top-box" or "bottom-box" response, or in too little variance to work with for segmentation purposes. (These concerns are addressed in greater detail in Research on Research # 44.) If "Completely Satisfied" was perceived as less extreme than "Perfect in Every Way" or "Extraordinary" as a ten-point scale descriptor, then this may account for the lower "top-box" percentages among the scales employing the latter descriptors. The same type of conjecture could be made for differences noted at the lower tail of the distributions. If the researcher has access to response profile data derived from rating scales that varied only in the description of the end-point labels, then he/she can compare these distributions to note biases that may be attributable to the particular end-point descriptors used. The researcher can then use this information to make a reasoned decision about the choice of end-point labels to use in future research studies.
Conclusion
The researcher must consider the impact that scale design properties and administration methods have on respondents' answers to rating scale questions if an adequate attitudinal assessment is to be made. As was shown in this research, responses to attitudinal questions asked via telephone interviews can be influenced by the order in which response alternatives are described, the inclusion of a scale mid-point descriptor, and the choice of end-point labels. Even apparently minor or trivial changes (e.g., altering the response descriptor order) can influence the resulting distribution, making the original and altered response profiles non-comparable.
Exhibit A - Wording of Questions
- Overall, considering everything, using a "1" to "10" scale where "1" stands for "Unacceptable", "5" stands for "Quite Acceptable" and "10" stands for "Extraordinary", how would you rate your experience with your primary bank or financial institution?
- Overall, considering everything, using a scale from "1" to "10" where "1" means "Completely Unsatisfied", and "10" means "Completely Satisfied", how would you rate your experience with your primary bank or financial institution?
- Overall, considering everything, using a scale from "1" to "10" where "1" means "Completely Unsatisfactory", and "10" means "Perfect in Every Way", how would you rate your experience with your primary bank or financial institution?
- Overall, considering everything, using a "1" to "10" scale where "10" stands for "Extraordinary", "5" stands for "Quite Acceptable" and "1" stands for "Unacceptable", how would you rate your experience with your primary bank or financial institution?
- Overall, considering everything, using a "1" to "10" scale where "10" stands for "Completely Satisfied", and "1" stands for "Completely Unsatisfied", how would you rate your experience with your primary bank or financial institution?
- Overall, considering everything, using a "1" to "10" scale where "10" stands for "Extraordinary", and "1" stands for "Unacceptable", how would you rate your experience with your primary bank or financial institution?
The first three wordings above were used on Weekend 1 (1st scale end-point described first); the last three were used on Weekend 2 (10th scale end-point described first).
