Displaying group differences using Biplots
"Perceptual maps" are among the most commonly used tools for describing and portraying group differences on multiple attributes. The term "perceptual map" is a general one which has been used to refer to a variety of statistical techniques, including discriminant analysis, multidimensional scaling, plots of group means on principal components or factors, and a relatively new technique known as the "biplot." Biplots, unlike most other "mapping" techniques, can be used with many types of data, such as means, percentages, and frequency counts. This report describes the use and interpretation of biplots and describes the relationship of the method to discriminant analysis.
Example
The biplot will be illustrated using data obtained by asking respondents to indicate which of four attributes describe men who wear various brands of after-shave. The attributes comprise a checklist, and respondents were free to check as many attributes as they wished for each brand. Table One shows the percentage of respondents who checked each attribute for each brand. For example, 29.9% of the respondents indicated that men who wear brand A are conservative. These percentages are actually group means, where individual responses are coded as "100" if an attribute is checked and "0" if it is not.
Table One
Percentage of respondents who indicated that men who wear this brand are ...
| Brand | Conservative | Masculine | Romantic | Successful |
| A | 29.9% | 23.7% | 9.4% | 13.8% |
| B | 30.1 | 37.5 | 20.6 | 24.6 |
| C | 18.0 | 34.7 | 25.1 | 24.0 |
| D | 9.3 | 16.1 | 18.4 | 22.1 |
| MEAN | 21.8 | 28.0 | 18.4 | 21.1 |
| S.D. | 8.7 | 8.6 | 5.7 | 4.3 |
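The MEAN and S.D. rows of Table One can be checked directly from the brand percentages. The following is a minimal sketch in Python with NumPy; the variable names are illustrative, and the use of the population standard deviation (dividing by the number of brands, `ddof=0`) is an assumption that happens to reproduce the S.D. row to one decimal place.

```python
import numpy as np

# Table One: rows are brands A-D, columns are the four attributes.
attributes = ["Conservative", "Masculine", "Romantic", "Successful"]
means = np.array([
    [29.9, 23.7,  9.4, 13.8],  # brand A
    [30.1, 37.5, 20.6, 24.6],  # brand B
    [18.0, 34.7, 25.1, 24.0],  # brand C
    [ 9.3, 16.1, 18.4, 22.1],  # brand D
])

# Column summaries across the four brands.
col_mean = means.mean(axis=0)
# ddof=0 divides by the number of brands (an assumption, see lead-in).
col_sd = means.std(axis=0, ddof=0)

for name, m, s in zip(attributes, col_mean, col_sd):
    print(f"{name:<12} mean={m:5.1f}  sd={s:4.1f}")
```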
A biplot of these data is shown in Figure One. In this plot, brands are shown as points and attributes as vectors. The prefix "bi" refers to the fact that both groups and attributes are shown on the plot. Horizontal and vertical reference axes have been drawn through the origin (the (0,0) point). About 99% of the information about brand differences is captured in the biplot, so it is an accurate representation of the data. The following sections describe how to interpret the attribute vectors and brand points.
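The report does not spell out the computation behind the plot. A common construction, assumed here, is a singular value decomposition of the brand means after subtracting the column means; the share of "information" in each dimension is then the squared singular value divided by the sum of squared singular values. The sketch below uses NumPy, and all variable names are illustrative.

```python
import numpy as np

# Table One brand means (rows A-D; columns conservative, masculine,
# romantic, successful).
means = np.array([
    [29.9, 23.7,  9.4, 13.8],
    [30.1, 37.5, 20.6, 24.6],
    [18.0, 34.7, 25.1, 24.0],
    [ 9.3, 16.1, 18.4, 22.1],
])

# Centre each attribute so that only brand differences remain.
centered = means - means.mean(axis=0)

# centered = U @ diag(s) @ Vt, singular values in decreasing order.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Proportion of the total variability carried by each dimension.
explained = s**2 / np.sum(s**2)
two_dim_share = explained[:2].sum()

print("share per dimension:", np.round(explained, 3))
print(f"share in two dimensions: {two_dim_share:.1%}")
```

If the assumption holds, the shares for the first two dimensions should fall close to the figures quoted in this report (roughly 59% and 40%, about 99% together).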
The Attribute Vectors
Each attribute vector has two important components: length and direction. The length of an attribute vector indicates the extent to which brands differ on that attribute: it represents the standard deviation of the brand means on that characteristic. These standard deviations are listed in Table One. The attributes with the longest vectors are those along which the brands are most widely separated. In this example, the attributes "conservative" and "masculine" best differentiate the brands, since they have the longest vectors. The standard deviations, from Table One, are 8.7 and 8.6, respectively. The attribute "successful," which has the shortest vector, is the attribute on which the brands are most similar; the standard deviation is 4.3.
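The link between vector length and standard deviation can be illustrated numerically. The sketch below assumes the attribute vectors are taken from a singular value decomposition of the centred means and scaled by the singular values divided by the square root of the number of brands; this scaling is an assumption, chosen because it makes the full-dimensional length of each vector equal to the standard deviation in Table One. In the two-dimensional plot the lengths can only be the same or slightly shorter.

```python
import numpy as np

attributes = ["Conservative", "Masculine", "Romantic", "Successful"]
means = np.array([
    [29.9, 23.7,  9.4, 13.8],
    [30.1, 37.5, 20.6, 24.6],
    [18.0, 34.7, 25.1, 24.0],
    [ 9.3, 16.1, 18.4, 22.1],
])
n = means.shape[0]

centered = means - means.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# One row per attribute; column k is scaled by s[k] / sqrt(n).
H = Vt.T * s / np.sqrt(n)

full_length = np.linalg.norm(H, axis=1)         # all dimensions
plot_length = np.linalg.norm(H[:, :2], axis=1)  # two plotted dimensions

for name, f, p in zip(attributes, full_length, plot_length):
    print(f"{name:<12} full={f:4.1f}  plotted={p:4.1f}")
```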
The "direction" of an attribute vector is best viewed in terms of its angles with other attribute vectors. Angles between attribute vectors represent correlations among attributes. Correlations of zero are shown as attribute vectors at 90-degree angles, positive correlations as angles less than 90 degrees, and negative correlations as angles greater than 90 degrees. Large positive correlations appear as attribute vectors very close to one another, whereas large negative correlations appear as vectors emanating in nearly opposite directions from the origin. The biplot for the example data reveals a strong positive correlation between "romantic" and "successful"; the actual correlation is .94. This indicates that the brands with the most "romantic" image are those with the most "successful" image, which can be verified by looking at Table One. These two attributes have moderate negative correlations (approximately -.4) with the attribute "conservative" since the angles between "romantic" and "conservative," and between "successful" and "conservative" are greater than 90 degrees. Finally, the attribute "masculine" has moderate positive correlations (about .5) with each of the other attributes.
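The correspondence between angles and correlations can be checked with a short computation. The sketch assumes the same construction as an SVD-based biplot of the centred means (an assumption, since the report does not give the algorithm): in the full set of dimensions the cosine of the angle between two attribute vectors equals their correlation exactly, and in the two plotted dimensions it is an approximation.

```python
import numpy as np

attributes = ["Conservative", "Masculine", "Romantic", "Successful"]
means = np.array([
    [29.9, 23.7,  9.4, 13.8],
    [30.1, 37.5, 20.6, 24.6],
    [18.0, 34.7, 25.1, 24.0],
    [ 9.3, 16.1, 18.4, 22.1],
])
n = means.shape[0]

centered = means - means.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
H = Vt.T * s / np.sqrt(n)  # attribute vectors, one row per attribute


def cosines(vectors):
    """Cosine of the angle between every pair of row vectors."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


corr = np.corrcoef(means, rowvar=False)  # correlations among attributes
cos_full = cosines(H)                    # all dimensions
cos_plot = cosines(H[:, :2])             # two plotted dimensions

print(np.round(corr, 2))
print(np.round(cos_plot, 2))
```

Comparing the two printed matrices shows how much of the correlation structure survives in the two-dimensional display.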
The Group Points
The position of a group (brand) point in the biplot is determined by the means of that group on the attributes. Therefore, distances between group points on the plot reflect differences between group means. Groups which have similar means on all the attributes will appear close together, whereas groups which are different will be farther apart. In this example, the four brands are well separated from one another. Brands B and C are the most similar, since they are the closest together.
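As a rough check on which brands are most alike, the differences between group means can be summarized as ordinary (Euclidean) distances computed from Table One. This is only a sketch: the distances on the plot itself depend on how the group points are scaled, so the numbers below are an assumption-laden stand-in rather than the plotted distances.

```python
from itertools import combinations

import numpy as np

brands = ["A", "B", "C", "D"]
means = np.array([
    [29.9, 23.7,  9.4, 13.8],
    [30.1, 37.5, 20.6, 24.6],
    [18.0, 34.7, 25.1, 24.0],
    [ 9.3, 16.1, 18.4, 22.1],
])

# Euclidean distance between every pair of brands across all attributes.
distances = {
    (brands[i], brands[j]): float(np.linalg.norm(means[i] - means[j]))
    for i, j in combinations(range(len(brands)), 2)
}

for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{pair[0]}-{pair[1]}: {d:5.1f}")

closest = min(distances, key=distances.get)
```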
The direction (or position) of the attribute vectors in the two-dimensional space provides a basis for understanding the nature of the differences among brands. Generally, a brand has relatively high means, compared to other brands, on those attributes whose vectors appear in the same region of the plot as the brand point.
Detailed information about brand differences on each attribute can be obtained by "projecting" the brand points onto the attribute vectors. This is done by drawing perpendicular lines from the brand points to the attribute vectors, as illustrated by the broken lines in Figure Two. Note that the attribute vectors have been extended through the origin. These "extensions" correspond to the "low" ends of the attributes. For example, brands in the lower right portion of the plot would have the highest means on "conservative", whereas brands in the upper left portion of the plot would have the lowest means on this characteristic.
"Predicted" brand means corresponding to the projections of the brand points onto the attribute vectors are shown in Table Two. Comparison of this table with Table One shows close, but not perfect, agreement. This is to be expected since 99% of the information about brand differences is represented in the biplot. Therefore, whatever differences exist between the actual means in Table One and the predicted means in Table Two are minor. All of the predicted values are within one point of the actual values except those for brands B and C on "successful". The predicted values suggest that brand C has a slightly higher mean (25.8) than brand B (22.9) when, in fact, the actual values (from Table One) are about equal: 24.6 for brand B and 24.0 for brand C. However, as noted earlier, "successful" is the least important attribute in terms of discriminating among brands.
Table Two
Predicted percentage of respondents who indicated that men who wear this brand are ...
| Brand | Conservative | Masculine | Romantic | Successful |
| A | 30.2% | 23.4% | 9.3% | 14.6% |
| B | 29.5 | 38.2 | 20.8 | 22.9 |
| C | 18.7 | 33.9 | 24.9 | 25.8 |
| D | 8.9 | 16.5 | 18.5 | 21.2 |
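Predicted means of this kind can be sketched as a two-dimensional reconstruction of the data. The code below assumes the biplot comes from a singular value decomposition of the centred brand means, keeps the first two dimensions, and adds the column means back; whether this is exactly the procedure used to produce Table Two is an assumption, although the results should agree with that table to within rounding if it is.

```python
import numpy as np

means = np.array([
    [29.9, 23.7,  9.4, 13.8],
    [30.1, 37.5, 20.6, 24.6],
    [18.0, 34.7, 25.1, 24.0],
    [ 9.3, 16.1, 18.4, 22.1],
])
# Table Two as printed in this report, used only for comparison.
table_two = np.array([
    [30.2, 23.4,  9.3, 14.6],
    [29.5, 38.2, 20.8, 22.9],
    [18.7, 33.9, 24.9, 25.8],
    [ 8.9, 16.5, 18.5, 21.2],
])

col_mean = means.mean(axis=0)
U, s, Vt = np.linalg.svd(means - col_mean, full_matrices=False)

k = 2  # number of plotted dimensions
# Rank-k reconstruction of the centred means, shifted back by the means.
predicted = col_mean + (U[:, :k] * s[:k]) @ Vt[:k]

print(np.round(predicted, 1))
print("largest gap from actual means:",
      round(float(np.abs(predicted - means).max()), 1))
```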
At this point, one can examine the ordering of and distances among groups on each attribute vector. Men who wear brand A are perceived as relatively conservative, but not masculine, romantic or successful. For brand C, the reverse is true: men who wear it are perceived as masculine, romantic, and successful, but not conservative. Brand B has relatively high means on all four attributes. Finally, brand D is about average (i.e., near the origin) on "romantic" and "successful," but has a clearly less masculine and conservative image than the other brands.
This interpretation coincides with the general impression one obtains by simply looking at the relative positions of the brand points and attribute vectors. Once the meaning associated with the length and direction of the attribute vectors is understood, it is usually not necessary to actually draw the projections of the group points onto these vectors, unless specific information about individual groups on specific attributes is desired. Even then, drawing projections is usually unnecessary, since "predicted" means for each group on each attribute are supplied with biplots, along with "actual" group means, correlations among attributes, distances between all pairs of groups, percentages of information explained by each dimension of the plot, and other information to aid interpretation. The procedure for projecting group points onto attribute vectors is described here in order to more precisely define the positions of the group points and to illustrate the specific information that can be recovered from a biplot.
Biplots vs. Discriminant Analysis
Biplots look very similar to plots which accompany a discriminant analysis. The visual similarity between biplots and "discriminant maps" arises because the objective of both techniques is the same: to describe group differences on several attributes in a few dimensions. The types of computations involved are also quite similar. The major differences between the techniques are (1) complete data for every respondent is required for discriminant analysis, whereas a biplot can be constructed from group means alone, and (2) the biplot does not require adherence to many of the statistical assumptions underlying discriminant analysis.
The choice between the two techniques depends on four factors: (1) the size of the problem (the number of groups and attributes), (2) the scale on which the attributes are measured, (3) the nature of the experimental design underlying the data collection, and (4) the amount of missing data.
When the number of groups and attributes is very large (e.g., 20 or more groups and 50 or more attributes), discriminant analysis can become expensive. In such cases, biplots offer a reasonable and less expensive alternative.
Biplots are preferred when the attributes represent a "checklist" rather than a collection of rating scales, i.e., when individual responses are either "yes" or "no" ("checked" or "not checked"). The data for the example used here was of this type. In such cases, discriminant analysis is not a good choice because some of the assumptions underlying the technique are violated, thus rendering the tests of statistical significance invalid.
A third situation in which biplots are more appropriate is in the case of "incomplete block designs." These are designs in which each respondent rates a subset of products, brands, etc. For example, each respondent may evaluate 3 of 6 products. Discriminant analysis requires that the total variability among responses be divided into two parts: variability due to differences among products and variability within products. For incomplete designs, this cannot be accomplished while taking into account the fact that each respondent rates more than one product. Biplots can be used in this case because no such partitioning of the variability is necessary; only variability due to differences among products is analyzed.
Finally, biplots can be useful when there is a great deal of missing data. Whereas discriminant analysis requires a complete set of responses to all attributes for every respondent, biplots require only that none of the group means on the attributes be missing. If missing data is extensive, replacing the missing values with "estimates" (such as group means) can distort a discriminant map, but would have no effect on a biplot.
In summary, the biplot reveals which groups are most (and least) different, how the groups differ on individual attributes, which attributes best separate the groups, and how the attributes are related to one another. Even with a small set of data, like the example used here, the biplot is an efficient way to present all these features of the data. With very large data sets, some type of graphic display is virtually essential for organizing and interpreting the information.
Some Cautions About Interpreting Biplots
When interpreting group differences in the manner described, one must keep in mind that a group can be viewed only in relation to the other groups in the set. The inclusion of additional groups or attributes can change the picture entirely.
One should also take into consideration the percentage of information about group differences that is actually represented in the plot. The biplot is constructed in a manner that provides the "best" two-dimensional display of the data, in the sense of capturing as much of the information about group differences as possible. In the example used here, two dimensions accounted for nearly all of the information about brand differences. However, with many groups and many attributes, it may happen that only 50 or 60 percent of the variability can be displayed in two dimensions. If this occurs, the distances among groups on individual attributes represented in the plot may not correspond very closely to the actual data. As a rough rule of thumb, if two dimensions account for less than about 75% of the variability in the data, one should exercise caution when examining group differences on individual attributes in the biplot. Although the general impressions conveyed by the plot are usually reasonably accurate in such cases, the details obtained by projecting group points onto individual attribute vectors (i.e., "predicted" group means on individual attributes) are less reliable.
Another consideration is the relative importance of the two dimensions of the plot. The horizontal dimension will always be at least as important as the vertical dimension in explaining (or accounting for) differences among groups. In the example, the horizontal dimension accounted for 59% of the information about brand differences and the vertical dimension for about 40%. The group points on the plot, however, are "standardized" so that the horizontal variability among groups on the plot is the same as the vertical variability. This is necessary in order for the projections of the group points onto the attribute vectors to be accurate. It also means that a given distance between group points in the horizontal direction corresponds to a larger difference between group means than the same distance in the vertical direction. In cases where the first dimension exhausts virtually all of the information about group differences (e.g., 80% or more), one can be misled by what appear to be large vertical distances among groups. This happens when the groups can be ordered along a single continuum (or dimension), and once this is done, very little information about group differences remains. Therefore, in such cases it is usually best to ignore the second dimension and consider only the "horizontal" differences among the groups on the plot.
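The effect of standardizing the group points can be sketched as follows. The construction is an assumption (an SVD-based biplot in which group points are scaled to unit variance on each axis and the attribute vectors absorb the singular values); under it, one unit of distance along an axis corresponds to a difference in means proportional to that axis's singular value, which is larger for the horizontal axis.

```python
import numpy as np

means = np.array([
    [29.9, 23.7,  9.4, 13.8],
    [30.1, 37.5, 20.6, 24.6],
    [18.0, 34.7, 25.1, 24.0],
    [ 9.3, 16.1, 18.4, 22.1],
])
n = means.shape[0]

centered = means - means.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Standardized group points: unit variance on both plotted axes.
G = U[:, :2] * np.sqrt(n)
# Attribute vectors carry the scale of each axis.
H = Vt[:2].T * s[:2] / np.sqrt(n)

# Difference in means represented by one unit of distance on each axis.
units_per_axis = s[:2] / np.sqrt(n)

print("group point s.d. per axis:", np.round(G.std(axis=0), 3))
print("mean difference per unit distance:", np.round(units_per_axis, 1))
```

Multiplying `G` by the transpose of `H` gives back the two-dimensional approximation of the centred means, which is why projections onto the attribute vectors remain accurate under this scaling.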
Other Uses of Biplots
The emphasis so far has been on the use of biplots for examining differences among groups on attributes. However, the technique is very general and therefore has other potential uses as well.
First, the columns of the table don't have to be attributes. For example, one might have a table, similar to Table One, in which rows represent age groups, columns represent brands, and entries in the table indicate frequency of usage of the various brands in each age group. In this case, the biplot can be used to portray age differences in brand usage. For categories with several brands, such as wine, a biplot may convey the age differences more clearly and efficiently than a large table of numbers.
For data sets that are not too large, the biplot can be used to cluster individuals. In this case, the table to be analyzed would have individuals as rows and attributes as columns. The "attributes" could be attitude statements, brands, etc. If the columns are brands, the entries in the table could indicate frequency of use or overall ratings, for example. One would examine the biplot for distinct groups (clusters) of individuals and then use the attribute information in the plot to characterize them. Essentially, the interpretation is the same as outlined earlier, with "groups" replaced by "respondents." The biplot in this case is analogous to performing a principal components ("factor") analysis of the attributes and constructing a plot showing both relationships of the attributes with the factors and factor scores for individuals.
