Statistical Designs for Ordering and Rotating Products in Product Tests

Introduction


This paper summarizes aspects to consider when designing a product test and is a sequel to Research on Research Paper Number 41, "Some Methodological issues in Product Testing." Issues of design addressed in this paper concern the practical and statistical aspects of physically presenting the products to be tested.

A most important issue, greatly affecting the choice of design, is the impact of order, context or carry-over effects: the effect one product may have on others that are seen with it, and specifically after it, in the product evaluation scheme. This is considered so important that all designs used (wherever they can be so adapted) should ensure that each product follows each other product in a reasonably balanced fashion. Along with the use of randomization, order effects should be balanced relatively well so that the ability to estimate these effects statistically is enhanced. (A well-designed product test cannot eliminate order effects; rather, it allows for the measurement, statistical assessment and separation of such effects from those involving product differences.)

Design features discussed below are most applicable to product tests where the products have no underlying factorial structure. Although the general principles to be presented can be used in conjunction with factorial designs, additional design features, like confounding, may better suit such studies. Lastly, although this paper addresses the presentation of products, the discussion is pertinent and completely applicable to the design of concept tests.

The specifics of any product test design are based primarily on two constraints concerning the size of the product test: 1) the total number of products to be tested and 2) the number of products to be evaluated by each respondent. These considerations, along with sample size (which may be viewed as a constraint by which the design is burdened), dictate the type of design used. The comments below are divided into sections, corresponding to the size of the intended product test. (A number of designs are discussed below. More information, including references to catalogs from which the designs were taken, is available upon request from members of Decision Systems in Chicago.)


A Small Example

Consider a small, simple experiment in which three food products, "A," "B" and "C," are to be tested. Respondents will be asked to evaluate two of the three products, offering information on typical sensory characteristics. A balanced incomplete block design (BIBD), shown in Table One, can be used. The arrangement is balanced in that each product will be evaluated an equal number of times and each pair of products will be tasted together an equal number of times across the entire sample. Further, the design is incomplete since each respondent evaluates only a subset of products rather than the entire set. Products to be evaluated together are arrayed in blocks which are, then, assigned to respondents.

Each product appears twice in this design, in two of the three blocks. Each pair of products is presented once. These blocks are used, or replicated, a number of times to obtain a sample of specific size. Sample size requirements may be stated as either the total sample needed or, more typically, the number of evaluations necessary per product. For example, 100 respondents may be desired to complete the product evaluations. Thirty-three replications of the above three blocks require 99 respondents, close enough to the 100 to be satisfactory. This number of respondents yields 66 evaluations per product, since each product appears in two of the three blocks. Alternatively, 70 evaluations per product might be required, which may then be satisfied by 35 replications of the blocks, for a total sample of 105.
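The arithmetic above can be sketched in a few lines. The function name and its parameters are illustrative only and are not taken from the original paper; the sketch assumes a BIBD, in which ratings are spread evenly over products.

```python
def sample_plan(n_blocks, block_size, n_products, replications):
    """Return (respondents, evaluations per product) for a replicated BIBD.

    Hypothetical helper: one respondent is assigned to each block in
    each replication.
    """
    respondents = n_blocks * replications
    # Each respondent rates block_size products; a BIBD spreads these
    # ratings evenly across the n_products products.
    evaluations_per_product = respondents * block_size // n_products
    return respondents, evaluations_per_product


print(sample_plan(3, 2, 3, 33))  # (99, 66)
print(sample_plan(3, 2, 3, 35))  # (105, 70)
```

The same helper can be used to try other replication counts before fixing the sample size.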

Although balanced with regard to features guaranteed by the BIBD, the blocks listed in Table One are unbalanced with respect to the order of products. The product "A" is in the first position within the blocks in which it appears and would correspond to the product tried first by those respondents to whom those blocks were assigned. Further, product "C" will always be tasted last. In fact, each block represents only one of two possible ways to order products for presentation to respondents. To achieve order balance, products within blocks must be permuted. The two possible permutations per block are presented in Table Two. As such, two permuted sets of three blocks, which yield six product orderings or versions, are created in which order of presentation is balanced in addition to the balance supplied by the BIBD.
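The construction of the six versions can be illustrated with a minimal sketch. The blocks below are an assumption: they are the blocks implied by the description of Table One (product "A" always first, product "C" always last), since the table itself is not reproduced here.

```python
from collections import Counter

# Assumed content of Table One: three products in blocks of two.
blocks = [("A", "B"), ("A", "C"), ("B", "C")]

# Permuted set 1 keeps the listed order; permuted set 2 reverses each block.
permuted_sets = [blocks, [tuple(reversed(block)) for block in blocks]]
versions = [version for permuted_set in permuted_sets for version in permuted_set]

# Count how often each product is tried first and second.
position_counts = Counter(
    (product, position)
    for version in versions
    for position, product in enumerate(version, start=1)
)

for product in "ABC":
    # Prints 2 and 2 for every product: order of presentation is balanced.
    print(product, [position_counts[(product, p)] for p in (1, 2)])
```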



The six versions constitute a completely balanced design which would better serve as a basis for replication to achieve a sample of a specific size. Reconsidering the required sample of 100 respondents, 16 replications of the six versions could be used. Ninety-six respondents are needed (16 x 6) from whom 64 evaluations per product are obtained. (Each set of six versions contributes four evaluations per product, which, when replicated 16 times, produces the 64 evaluations.) Alternatively, if somewhere in the vicinity of 70 ratings are needed for each product, 17 or 18 replications could be used, with total samples of 102 or 108, respectively. When possible, sample size requirements, either in terms of total sample or number of evaluations per product, should be a multiple of the basic number of versions, such as those presented in Table Two. In the examples cited here, multiples of 16 to 18 were considered. Using complete replications assures the researcher of a well balanced design. A situation where complete replication is not practical is discussed later.

The order in which versions (permuted blocks) are assigned to respondents is determined by a three-stage randomization plan. Versions within each permuted set are randomized first. The randomization is done separately for each permuted set. Next, the sequence of permuted sets is randomized, determining the order in which these sets are used. Lastly, the prior two steps are redone separately for each replication. Considering the design shown in Table Two, the order in which versions are listed within each permuted set is determined first. Which permuted set is listed first in its entirety is, then, determined. This randomization scheme is redone for each of the 16 to 18 replications needed. The randomization process can be carried out by the computer generation of random numbers, where the versions within permuted sets and the permuted sets to be randomized are each assigned a random number. Sorting by the assigned number results in a random sequence. In lieu of a computer, tables of random numbers are available in many statistical texts and may be used. A final listing of versions is, then, supplied to the field or interviewing service for implementation.
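One possible computer implementation of the three-stage plan is sketched below. It is an illustration under stated assumptions, not the procedure actually used: the function name and seed argument are hypothetical, the permuted sets are those implied by the description of Table Two, and shuffling is used in place of sorting by assigned random numbers (the two are equivalent in effect).

```python
import random

# Assumed permuted sets, as implied by the description of Table Two.
permuted_sets = [
    [("A", "B"), ("A", "C"), ("B", "C")],
    [("B", "A"), ("C", "A"), ("C", "B")],
]


def randomized_listing(permuted_sets, replications, seed=None):
    """Return the list of versions in the order supplied to the field."""
    rng = random.Random(seed)
    listing = []
    # Stage 3: the two stages below are redone for every replication.
    for _ in range(replications):
        # Stage 1: randomize versions within each permuted set, separately.
        shuffled_sets = [rng.sample(s, len(s)) for s in permuted_sets]
        # Stage 2: randomize the sequence of the permuted sets.
        rng.shuffle(shuffled_sets)
        for shuffled_set in shuffled_sets:
            listing.extend(shuffled_set)
    return listing


listing = randomized_listing(permuted_sets, replications=16, seed=1)
print(len(listing))  # 96
```

Because each replication is kept together, every consecutive run of six respondents in the listing sees all six versions.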

Randomization, as used here, is purely a safeguard against unsuspected sources or influences which may affect product ratings. For example, three interviewers may work in rotation which, prior to randomization, could coincide with specific versions. To the extent different interviewers cause systematic differences in product evaluations (interviewer bias), products within a specific version will be affected. Randomization greatly reduces this possible systematic influence by increasing the chance that each interviewer will work with all versions. Even though it may be difficult to control for or "design around" interviewer effect, randomization serves the goal of more evenly distributing possible disturbing effects across all versions.

The use of randomization within permuted sets within replication (rather than across all replications at once) ensures a reasonably balanced product presentation over time and, specifically, within short time intervals. By keeping all versions within a replication together, albeit in random order from one replication to the next, all products will appear for testing roughly equally often within small subsets of respondents, and, hence, within a small time span. Potentially atypical responses due to time of day will be spread roughly evenly and will affect all products roughly equally.


Additional Comments for Three or Four Products

In general, permutations and randomization of groupings of products can be used, whether respondents are asked to evaluate all products or some subset. If some subset of two or more products is to be evaluated, then a BIBD should serve as the basis for constructing versions. Only one block need be considered for designs in which respondents evaluate all products to be tested. All permutations of order, each defining a version, follow from this block. A three-product test leads to 3! (factorial), or six versions, as listed in Table Three. Examination of product placement in the table shows the virtues of permutation: each product appears equally often in each position, corresponding to the order in which they will be tried, and is preceded and followed by each other product equally often.
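The balance claimed for the six versions of a three-product test can be checked directly. In this sketch, "followed by" is taken to mean immediately followed by, which is an interpretation rather than something stated in the text.

```python
from collections import Counter
from itertools import permutations

# All 3! = 6 orders of the single block containing every product.
versions = list(permutations("ABC"))

# How often each product appears in each position.
position_counts = Counter(
    (product, position)
    for version in versions
    for position, product in enumerate(version, start=1)
)

# How often each product is immediately followed by each other product.
follow_counts = Counter(
    (earlier, later)
    for version in versions
    for earlier, later in zip(version, version[1:])
)

print(len(versions))                 # 6
print(set(position_counts.values()))  # {2}
print(set(follow_counts.values()))    # {2}
```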

These six versions are, then, fully replicated as many times as are necessary to fulfill sample size requirements. Each replicate of the six versions is randomized separately to ensure a balanced product presentation. The number of versions necessary increases considerably for a four-product test in which respondents are exposed to all products. Again, one block containing all products serves as the basis from which 4! or 24 versions are grown. Replication and randomization proceed as before.

The 24 versions need only be re-randomized by replication. (An alternative design, called a "complete" Latin square, requiring fewer blocks, is discussed later.)


Tests of Five Products

Designs in which five products are to be tested are considered next. When two or three products are to be evaluated, again a BIBD serves as the basis for version construction. All permutations are, then, generated for each block to supply a basic set of versions. For example, the needs of a three of five product test design, where respondents evaluate three of the five products, can be satisfied by a BIBD with 10 blocks, which are listed in Table Four.

Six permutations of product order per block (3!) for each of the 10 blocks lead to six permuted sets, a total of 60 versions. Each product receives 36 ratings across this set of versions. (Each product appears six times within the 10 blocks of the BIBD. Since product order can be permuted six ways within each block, 36 product ratings result.) Ideally, for the sake of a balanced design, the number of ratings per product needed would be a multiple of 36, say 72 ratings. If this is the case, the set of 60 versions would be replicated in its entirety. The total sample size would, then, be 120.
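These counts can be verified with a short sketch. It assumes the 10 blocks are all possible three-product subsets of five products, which is a BIBD with 10 blocks; the blocks in Table Four may be listed in a different order.

```python
from collections import Counter
from itertools import combinations, permutations

products = "ABCDE"

# Assumed BIBD: every three-product subset of the five products.
blocks = list(combinations(products, 3))

# Each pair of products appears together in the same number of blocks.
pair_counts = Counter(pair for block in blocks for pair in combinations(block, 2))

# All 3! orders of every block give the basic set of versions.
versions = [order for block in blocks for order in permutations(block)]
ratings = Counter(product for version in versions for product in version)

print(len(blocks))                 # 10
print(set(pair_counts.values()))   # {3}
print(len(versions))               # 60
print(set(ratings.values()))       # {36}
```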

However, say 50 ratings are desired. Thirty-six ratings from a sample of 60, one respondent per version, are too few and 72 ratings from a sample of 120, using two sets of the 60-version design, are too many. Any "in-between" number of versions produces an unbalanced design. Each product may not be rated an equal number of times (although all may be rated approximately 50 times) and each pair of products may not occur equally often, a feature guaranteed by the use of BIBDs. Also, some orderings or permutations of products may occur more frequently than others. A two-step procedure is followed to accommodate the addition of versions to achieve the required sample size with as much balance as possible: 1) a random selection of versions from the original set of 60 is added to the 60-version design and 2) an inspection and possible modification of the versions selected is performed to ensure some semblance of balance along the lines just mentioned. As an example, 84 respondents are required to obtain roughly 50 ratings in the three of five design discussed above (84 respondents x 3 ratings each / 5 products, or about 50 ratings per product). From the set of 60 versions, 24 were randomly selected to complete the sample requirements. The ordering of products within some of these versions can be changed to obtain better balance. If each product appears roughly the same number of times, then no new versions, as replacements for those drawn, are necessary. Once created, the 84 versions are randomized, by block and by version within block for each replication, and listed for use by the interviewers.

A note on the random selection used here: the basic set of 60 versions is composed of six permuted sets of 10 blocks each. The first set of 10 blocks may correspond to the ordering as found in a text cataloging BIBDs. The second set of 10 is a consistent permutation of the first set, e.g., all products in the first and second position switch, and so on. The random selection used to obtain 24 more versions is applied to blocks, not versions. Two of the six permuted sets of blocks are randomly chosen and retained in their entirety. A third set is also selected, randomly, but from which only four versions are chosen to round out the required sample size. This randomization takes into account the inherent structure of the 60 versions, that products are balanced within each of these sets of 10 blocks.
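The selection of the 24 additional versions might be sketched as follows. This is an illustration only: the blocks are assumed to be all three-product subsets of five products, the function name and seed are hypothetical, and the subsequent inspection and manual adjustment for balance is not automated here; the sketch merely reports the ratings per product so that they can be inspected.

```python
import random
from collections import Counter
from itertools import combinations, permutations

# Assumed BIBD blocks (Table Four is not reproduced here).
blocks = list(combinations("ABCDE", 3))

# Six permuted sets: each applies one consistent reordering to all 10 blocks.
permuted_sets = [
    [tuple(block[i] for i in order) for block in blocks]
    for order in permutations(range(3))
]


def extra_versions(permuted_sets, seed=None):
    """Pick two whole permuted sets plus four versions from a third set."""
    rng = random.Random(seed)
    first, second, third = rng.sample(permuted_sets, 3)
    return first + second + rng.sample(third, 4)


base = [version for permuted_set in permuted_sets for version in permuted_set]
design = base + extra_versions(permuted_sets, seed=7)

# Ratings per product, to be inspected for rough balance.
ratings = Counter(product for version in design for product in version)
print(len(design))  # 84
print(sorted(ratings.items()))
```

Under these assumptions each product receives between 48 and 52 ratings, since the two complete sets add 12 ratings per product and the four extra versions add at most four more.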


Latin and Youden Squares

There are too many permutations of orders when respondents are asked to evaluate all five products. Specifically, there are 5! or 120 versions. This type of design is completely balanced but may require too large a sample. A random selection from these 120 versions could be made, but order balance is then at the mercy of randomization and is sacrificed. An effective alternative is a "complete" Latin square design. Such squares are considered complete in that each product is followed by each other product an equal number of times. "Completeness" exists in addition to the usual Latin square feature that each product appears equally often in each position. As such, each product is tried first equally often, as well as in every other position. For odd numbers of products, two squares are required to achieve "completeness."

Therefore, for five products, ten blocks are required, as listed in Table Five.
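One standard way to build such squares is the Williams construction, sketched below. This is offered as an assumption: Table Five is not reproduced here, and the squares it lists may differ from those generated by this code. The function name is hypothetical.

```python
from collections import Counter


def complete_squares(products):
    """Build a 'complete' Latin square design (Williams construction)."""
    n = len(products)
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first, low, high = [0], 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Remaining rows are cyclic shifts of the first row.
    rows = [[(x + shift) % n for x in first] for shift in range(n)]
    # For an odd number of products, add the mirror image of the square.
    if n % 2 == 1:
        rows += [row[::-1] for row in rows]
    return [tuple(products[x] for x in row) for row in rows]


blocks = complete_squares("ABCDE")
position_counts = Counter(
    (product, position) for block in blocks for position, product in enumerate(block)
)
follow_counts = Counter(
    (earlier, later) for block in blocks for earlier, later in zip(block, block[1:])
)

print(len(blocks))                    # 10
print(set(position_counts.values()))  # {2}
print(set(follow_counts.values()))    # {2}
```

With five products, each product appears twice in each position and is immediately followed by each other product exactly twice across the ten blocks.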

These squares can be replicated to get to the necessary sample size. Randomization would proceed in two stages, with blocks of the Latin square treated as versions. The order of blocks is randomized separately within each replication.

Latin squares yield Youden squares when one column of the square is deleted. This is of special interest when designs are needed to accommodate product tests where respondents are to rate one fewer than the total number of products. Of specific interest here is the test where four of five products are to be evaluated. A BIBD can be constructed having five blocks, with 24 permutations of products within each block. This produces too many versions (5 x 24 or 120), requiring too large a sample. However, properties of a BIBD and a complete Latin square can be combined so that some balance is retained. This is a Youden square. A Youden square design constructed from the complete Latin squares shown in Table Five is presented in Table Six.

As was the case with the complete Latin square for the five of five product test referenced above, one replication of the Youden square requires 10 blocks. The appearance and trial of products will be evenly distributed across the design and each product will follow each other product, although not equally often. Other sets of Youden squares can be easily obtained by simply dropping other columns of the complete square(s). As such, five sets of Youden squares of ten blocks each can be created for the four of five product test case, each obtained by dropping one of the five columns of the complete squares.
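The column-dropping step can be sketched as follows. The complete squares here are generated by the Williams construction, which is an assumption (the squares of Table Five are not reproduced and may differ), and the function names are hypothetical.

```python
from collections import Counter


def complete_squares(products):
    """Williams construction (assumed; Table Five may list other squares)."""
    n = len(products)
    first, low, high = [0], 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    rows = [[(x + shift) % n for x in first] for shift in range(n)]
    if n % 2 == 1:
        rows += [row[::-1] for row in rows]  # mirror square for odd n
    return [tuple(products[x] for x in row) for row in rows]


def youden_blocks(products, drop_column):
    """Delete one column (0-based index) from every row of the squares."""
    return [
        row[:drop_column] + row[drop_column + 1:]
        for row in complete_squares(products)
    ]


blocks = youden_blocks("ABCDE", drop_column=4)  # drop the last column
ratings = Counter(product for block in blocks for product in block)
follow_counts = Counter(
    (earlier, later) for block in blocks for earlier, later in zip(block, block[1:])
)

print(len(blocks))                  # 10
print(set(ratings.values()))        # {8}
print(len(follow_counts))           # 20
print(set(follow_counts.values()))  # {1, 2}
```

Under these assumptions every product is rated eight times, and every product immediately follows every other product at least once, though some orderings occur once and others twice.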

For even numbers of products, one "complete" square is truncated to yield a Youden square. In that case, the number of evaluations obtained per product is one fewer than the number of blocks. For the design where respondents see four of five products, the ten blocks from two squares yield eight ratings per product, since each product is dropped once from each square. Sample size needs can be satisfied by replicating any or all of the original sets of Youden squares. Finally, randomization is performed. Blocks are again treated as versions, and randomization is performed independently for each replication.


More Than Five Products to be Tested

For tests involving more than five products, some comments are offered to show the generalizability of the above design considerations. When respondents are asked to evaluate three of six or seven products, the creation of versions following the use of a BIBD and permuted product orders can work well. In the three of six case, 60 versions (10 blocks, six orders per block) are required. Forty-two versions (seven blocks, six orders per block) are needed for the three of seven case. Further, Youden and Latin square arrangements are useful for larger designs.

Basically, the tools used for smaller product tests can be extended for some situations. However, consider a test in which respondents are asked to evaluate four of six products. A BIBD of 15 blocks with 24 order permutations per block is required.

Clearly, too many versions are needed to achieve the kind of balance discussed above. Two alternatives have some utility. First, rely on the 15 blocks from the BIBD and randomly permute order for as many versions (respondents in the study) as are necessary. Or, use a partially balanced incomplete block design (PBIBD) of, say, three blocks with 24 permutations each. (Each set of three blocks supplies two product ratings. With all permutations, 72 versions, 48 product ratings, are obtained.) A PBIBD for the four of six case is displayed in Table Seven.

A characteristic of PBIBDs is that fewer blocks are required to construct a design. All products would be evaluated equally often (twice for the above design), but all pairs of products do not appear with equal frequency. Specifically, two or more sets (called associate classes) of pairs result, each occurring with a different frequency. (Some pairs, such as "AD," appear twice while others, like "CA," appear only once.) As such, some statistical comparisons between products will be calculated with greater precision than others. If some products are of secondary importance, perhaps this PBIBD characteristic could be used to advantage. The assessment of order effects is affected as well. The partial balance supplied by PBIBDs restricts the ability to balance order effects. With complete permutation, for example yielding 72 versions from the three blocks in Table Seven, all products will still precede and succeed all others, but not with equal frequency.
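The partial balance can be illustrated with a hypothetical PBIBD. The three blocks below are an assumption chosen to be consistent with the description above (each product appears twice, "AD" appears twice, "CA" appears once); they are not necessarily the blocks of Table Seven.

```python
from collections import Counter
from itertools import combinations, permutations

# Hypothetical PBIBD for four of six products (not taken from Table Seven).
blocks = [
    ("A", "B", "D", "E"),
    ("B", "C", "E", "F"),
    ("A", "C", "D", "F"),
]

# Each product appears equally often across the blocks.
appearances = Counter(product for block in blocks for product in block)

# Pairs fall into two associate classes with different frequencies.
pair_counts = Counter(
    frozenset(pair) for block in blocks for pair in combinations(block, 2)
)

# All 4! orders of each block give the full set of versions.
versions = [order for block in blocks for order in permutations(block)]
ratings = Counter(product for version in versions for product in version)

print(set(appearances.values()))     # {2}
print(pair_counts[frozenset("AD")])  # 2
print(pair_counts[frozenset("CA")])  # 1
print(len(versions))                 # 72
print(set(ratings.values()))         # {48}
```

In this hypothetical design the pairs "AD," "BE" and "CF" form the class that appears twice; the remaining twelve pairs appear once.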

Smaller, more manageable designs, as characterized by PBIBDs, are traded off against the larger but balanced designs supplied by BIBDs. Which alternative is best depends on the specifics of the design concerning statistical and practical efficiencies. In general, though, the balanced approach provided by a BIBD is to be preferred.