Number-type Data: Structured vs. Open-ends

Collecting "number-type" information (price paid, phone calls per week, etc.) in a structured or interval approach vs. an "open-end" method can produce quite different results.


Background:

Researchers frequently seek to collect very specific "number-type" information. Prime examples are number of purchases of, or price paid for, a specific product or service. The researcher has the option of formatting the question so that the respondent can answer in one of two ways:

  1. Structured or interval approach – respondent selects the interval which includes his/her answer
  2. Open-end approach – respondent indicates a specific number which describes his/her answer.

The structured approach offers data processing efficiencies. However, it requires knowledge of the shape of eventual frequency distributions.

If the objective of a "number-type" question is merely to divide responses at some pre-determined point (example: under $10 vs. $10 and over), either approach may work equally well.

However, researchers frequently compute means or medians based on responses to "number-type" questions. Actual means or medians can be developed from a question using an "open-end" approach.

On the other hand, such computations from data collected in an interval or structured approach require that a specific value be assigned to each interval.

Medians may offer a higher level of statistical accuracy than means because the margin of error can be no wider than the range of the interval in which the median falls. Mean computation, on the other hand, can be subject to error in the selection of values for each frequency interval, particularly an interval in which no upper limit is defined (e.g. $50 and over).
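
The bound on the median's error can be illustrated with a minimal sketch. The intervals and counts below are hypothetical and are not taken from the study; the function name is likewise only illustrative.

```python
# Hypothetical interval counts (not from the study).
# Each interval is (lower, upper), both bounds inclusive.
intervals = [(0, 9), (10, 19), (20, 29), (30, 49)]
counts = [30, 45, 20, 5]

def median_interval(intervals, counts):
    """Return the interval that contains the median respondent."""
    half = sum(counts) / 2
    running = 0
    for bounds, n in zip(intervals, counts):
        running += n
        if running >= half:
            return bounds
    raise ValueError("no responses")

low, high = median_interval(intervals, counts)

# Whatever value is reported for the median, the true median lies
# somewhere between low and high, so the error cannot exceed this width.
max_error = high - low
print(f"median falls in {low}-{high}; error is at most {max_error}")
```

Note that the open-ended top interval does not enter this calculation unless the median itself falls in it, which is why the median avoids the arbitrary choice that the mean requires.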

Regardless of inherent difficulties, means are frequently calculated by assigning each interval that has upper and lower limits a value equal to its mid-point. The value assigned to the interval with no upper limit is particularly arbitrary and may significantly affect the computed mean.
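
As a rough illustration of how much the top-interval value can matter, the sketch below computes a mid-point mean twice, changing only the value assigned to the open-ended interval. All intervals, counts, and the two candidate top values are hypothetical.

```python
# Hypothetical data (not from the study).
# Each interval is (lower, upper); None marks "no upper limit".
intervals = [(0, 9), (10, 19), (20, 49), (50, None)]
counts = [40, 30, 20, 10]

def grouped_mean(intervals, counts, top_value):
    """Mean of interval data using mid-points for bounded intervals
    and an analyst-chosen value for the open-ended interval."""
    total = 0.0
    for (low, high), n in zip(intervals, counts):
        value = top_value if high is None else (low + high) / 2
        total += value * n
    return total / sum(counts)

# Two equally defensible guesses for the "50 and over" interval.
mean_if_top_is_50 = grouped_mean(intervals, counts, 50)
mean_if_top_is_100 = grouped_mean(intervals, counts, 100)

print(mean_if_top_is_50, mean_if_top_is_100)
```

In this made-up case only 10% of respondents fall in the open-ended interval, yet moving its assigned value from 50 to 100 shifts the overall mean by 5 units.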

Researchers should be aware that means computed in such a manner may differ significantly from "actual" means using an open-end approach – even if the question is identical in all other respects.


Research Method:

Market Facts, Inc. investigated the two alternative approaches to "number-type" responses using Consumer Mail Panel and its Data Gage service. A questionnaire was mailed to 4000 female heads of household. All respondents were asked the same two questions, one about weekly telephone calls and one about weekly grocery store expenditures.

Half the respondents were presented with structured response intervals and asked to "X" the interval which included their answer. The other half was asked to write in the "actual" number which best described their answer. This mail-out resulted in approximately 3200 completed questionnaires.


Results:

The data obtained from the alternative response approaches are compared in the chart in terms of both frequency distribution and mean. Responses to the open-end approach were re-classified into frequency intervals identical to those used in the structured approach. Means were computed for both groups. The means for the open-end group were calculated on the actual reported numbers. In contrast, means for the structured group were computed by assigning a mid-point value to each interval which had defined upper and lower limits. For intervals which had no upper limit (e.g. 51 or more calls), the actual mean of the open-end group within that interval was used to "weight" the responses. Although this "luxury" is not normally possible, it is believed that such a weighting procedure concentrates the analysis on "actual" differences between groups and not on differences resulting from an arbitrary selection of upper-limit interval values.
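
The procedure described above can be sketched roughly as follows. The interval boundaries, the open-end answers, and the structured counts are all hypothetical; only the "51 or more" top interval echoes the example in the text.

```python
from collections import Counter

# Hypothetical intervals; None marks "no upper limit".
intervals = [(0, 10), (11, 25), (26, 50), (51, None)]

# Hypothetical raw answers from the open-end group.
open_end_answers = [3, 8, 10, 15, 20, 25, 30, 40, 50, 60, 75]

# Hypothetical counts per interval from the structured group.
structured_counts = [4, 3, 2, 2]

def interval_index(x):
    """Index of the interval that contains the answer x."""
    for i, (low, high) in enumerate(intervals):
        if x >= low and (high is None or x <= high):
            return i
    raise ValueError(f"{x} fits no interval")

# Step 1: re-classify open-end answers into the structured intervals.
tally = Counter(interval_index(x) for x in open_end_answers)
open_end_counts = [tally[i] for i in range(len(intervals))]

# Step 2: the open-end mean uses the actual reported numbers.
open_end_mean = sum(open_end_answers) / len(open_end_answers)

# Step 3: the open-end group's actual mean inside the top interval
# stands in for the missing mid-point of that interval.
top = [x for x in open_end_answers if interval_index(x) == len(intervals) - 1]
top_mean = sum(top) / len(top)

# Step 4: the structured mean uses mid-points, plus top_mean for the top interval.
values = [top_mean if high is None else (low + high) / 2 for low, high in intervals]
structured_mean = sum(v * n for v, n in zip(values, structured_counts)) / sum(structured_counts)

print(open_end_counts, round(open_end_mean, 2), round(structured_mean, 2))
```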

Detailed statistical analyses were conducted to compare response groups on both the frequency interval and mean observations.


Frequency Interval

For both weekly telephone calls and weekly grocery store expenditures, significant differences exceeding the 95% level of confidence were observed between answers of the open-end and structured response groups.
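
The source does not say which statistical test was used to compare the two frequency distributions. One common choice for this kind of comparison is a chi-square test of homogeneity, and the sketch below assumes that test. The counts are made up, and the 95% critical value for 3 degrees of freedom (7.815) is hardcoded from standard chi-square tables.

```python
# Hypothetical interval counts for each response group (not from the study).
structured = [160, 120, 80, 40]
open_end = [100, 140, 100, 60]

def chi_square_statistic(row_a, row_b):
    """Chi-square statistic for a 2 x k table of counts."""
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        column = a + b
        expected_a = total_a * column / grand
        expected_b = total_b * column / grand
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat

statistic = chi_square_statistic(structured, open_end)

# Degrees of freedom = (2 - 1) * (4 - 1) = 3.
CRITICAL_95_DF3 = 7.815
significant = statistic > CRITICAL_95_DF3

print(round(statistic, 2), significant)
```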


Mean Calculation

Responses to the telephone call question yielded significantly higher mean scores for the open-end response group at the 95% level of confidence. Further analysis of the raw data indicated a likely reason for the higher average score. A disproportionate number of respondents in the open-end group gave answers rounded to the nearest 5 or 10 (20, 25, 30, etc.). A number of the arbitrarily selected answer intervals ended in 0 or 5. Mid-point values were assigned to frequency intervals in computing means among structured group responses wherever possible. This mid-point value was substantially lower than the "actual" number answers of open-end respondents within the same frequency interval. This tended to depress the mean score of respondents in the structured response group.

In contrast, no significant difference was observed in the mean scores calculated for weekly grocery store expenditures, even though the groups differed substantially within frequency intervals. In this case, arbitrarily selected answer intervals presented to the structured response group tended to begin with multiples of 5 or 10 ($25, $50, $75). A relatively larger proportion of open-end group respondents wrote in these numbers. Because mid-point values were used to calculate means for the structured response group, the resulting score was "inflated" relative to the open-end group.
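
The two opposite effects can be reproduced with a small sketch. The interval boundaries and answers below are hypothetical and chosen only to mimic the rounding pattern described: answers piling up on a round number at the top of an interval in one case, and at the bottom of an interval in the other.

```python
def midpoint_bias(low, high, answers):
    """Interval mid-point minus the actual mean of the answers that
    fall inside [low, high]. Negative means the mid-point understates."""
    inside = [a for a in answers if low <= a <= high]
    midpoint = (low + high) / 2
    return midpoint - sum(inside) / len(inside)

# Interval ENDS on a round number (16-20): answers cluster at 20,
# so the mid-point of 18 sits below the typical answer.
calls = [16, 18, 20, 20, 20, 20]
calls_bias = midpoint_bias(16, 20, calls)

# Interval BEGINS on a round number ($25-$49): answers cluster at 25,
# so the mid-point of 37 sits above the typical answer.
dollars = [25, 25, 25, 25, 30, 40]
dollars_bias = midpoint_bias(25, 49, dollars)

print(calls_bias, round(dollars_bias, 2))
```

A negative bias depresses the structured group's mean, as in the telephone call question; a positive bias inflates it, as in the grocery expenditure question.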



In conclusion, if a decision is made to use structured answer alternatives in a "number-type" question, care should be taken in the selection of frequency intervals. The mid-point of such intervals should be close to the mode of that interval.

In addition, extreme caution should be used in comparing answers to seemingly identical "number-type" questions that were collected using differing response techniques.

