Function plots of multi-dimensional data
Through a technique known as function plotting, one can display multi-dimensional data in a single two-dimensional plot. Such plots are particularly useful for highlighting differences and similarities among products or groups of respondents when data are collected on several variables.
Introduction:
Marketing data is typically multivariate in nature: respondents are asked to evaluate one or more products on several attributes or to describe themselves in terms of a variety of attitudinal and behavioral characteristics. The greater the amount of information obtained from respondents, the more difficult it becomes to summarize the data in a form that is concise enough to be manageable yet not an over-simplification of the data.
Attitude statements or product characteristics are generally correlated. This suggests possible data reduction through some sort of multivariate analysis, such as factor analysis of the variables, cluster analysis of respondents, or discriminant analysis to differentiate among products or groups of respondents. Such techniques make explicit use of patterns of responses to the variables and typically produce a large amount of numerical information to be interpreted.
Graphic techniques can play an important role in the analysis of multi-dimensional data. However, most types of plots are limited in their ability to depict such data because they can portray responses on only one or two variables at a time.
Table One
Means of Four Cold Remedies on Four Attributes
| Brand | Headache | Coughing | Runny Nose | Fever |
| A | 5.28 (-.27) | 5.50 (.28) | 4.75 (-.12) | 5.69 (-.05) |
| B | 6.17 (.62) | 5.09 (-.13) | 4.24 (-.63) | 5.29 (-.45) |
| C | 5.77 (.22) | 4.79 (-.43) | 5.14 (.27) | 5.69 (-.05) |
| D | 4.98 (-.57) | 5.50 (.28) | 5.35 (.48) | 6.29 (.55) |
| Overall mean | 5.55 (0) | 5.22 (0) | 4.87 (0) | 5.74 (0) |
Note: Deviation of brand mean from overall variable mean is shown in parentheses.
A technique known as function plotting provides a unique and informative way of displaying multi-dimensional data. A single function plot can be used to illustrate differences among groups, products, etc. on a wide range of combinations of the variables. Often a function plot can reveal patterns of differences that may be difficult to identify using other techniques.
An example is used below to illustrate the appearance and interpretation of a function plot. The appendix describes how such plots are constructed.
Example Data:
In a study of the effectiveness of various cold remedies, consumers were asked to rate four products on how well each relieves headache, coughing, runny nose, and fever. Seven-point rating scales were used, with 7 indicating that the product relieves the symptom completely and 1 indicating that it does not relieve the symptom at all.
Table One shows the brand means in the original scale and, in parentheses, as deviations from the overall means of the variables. The overall means are shown in the last row of the table. By expressing the means as deviations, differences among the brands can be seen more clearly, both in the table and in a function plot.
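The deviations can be reproduced directly from the brand means. The following is a minimal sketch, assuming the overall mean of each attribute is the simple unweighted average of the four brand means (the text does not say how it was computed); the names BRAND_MEANS, overall_means, and deviations are illustrative only.

```python
# Brand means from Table One (7-point scale), in the order
# headache, coughing, runny nose, fever.
ATTRIBUTES = ["headache", "coughing", "runny_nose", "fever"]
BRAND_MEANS = {
    "A": [5.28, 5.50, 4.75, 5.69],
    "B": [6.17, 5.09, 4.24, 5.29],
    "C": [5.77, 4.79, 5.14, 5.69],
    "D": [4.98, 5.50, 5.35, 6.29],
}


def overall_means(brand_means):
    """Average each attribute across brands (unweighted: an assumption)."""
    rows = list(brand_means.values())
    return [sum(column) / len(rows) for column in zip(*rows)]


def deviations(brand_means):
    """Subtract the overall attribute mean from every brand mean."""
    centre = overall_means(brand_means)
    return {
        brand: [round(m - c, 2) for m, c in zip(means, centre)]
        for brand, means in brand_means.items()
    }


if __name__ == "__main__":
    for brand, devs in deviations(BRAND_MEANS).items():
        print(brand, dict(zip(ATTRIBUTES, devs)))
```

Because each attribute is centered on its own mean, the deviations for an attribute sum to approximately zero across brands, which is what places the curves on either side of the center of the plot.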
Function Plot of Example Data:
A function plot of these data appears in Figure One (see last page). The plot contains four curves, one for each brand. These curves show the brand means on a weighted combination of variables that gradually changes as one moves from the top to the bottom of the plot. The vertical axis, labeled "T", ranges from -π to π. The value of T is used to generate the variable weights, as described in detail in the appendix. Table Two shows the weights of the variables for several values of T. Each weight ranges from -1 to 1. At any given value of T, the farther to the right a curve is, the more favorably perceived that brand is on the corresponding combination of variables.
If two brands were close on all four variables, their curves would be close over the entire range of the plot. Conversely, the more unlike two brands are, the more their curves would be separated.
Interpreting the Plot:
From the figure, it is readily apparent that no brands are consistently close over the entire range of the plot. This implies that for any pair of brands there is some combination of variables that will differentiate them. Also, since no two curves are parallel, no two brands have similar patterns of high and low means on the variables. Closer scrutiny of the plot reveals that the curves for brands B and D are nearly exact opposites of one another, as are the curves for brands A and C.
Referring back to the brand deviations in Table One, we can determine why these patterns arise. Brands B and D have opposite patterns of means: B is relatively high on headache relief and low on the other attributes, whereas for brand D the reverse is true. Also, brand A is high on cough relief and low on relief of headaches and runny nose; the pattern is reversed for brand C. Since brand means typically are not presented in deviation form, these patterns could have escaped notice without the function plot. This would be especially likely to occur when the number of variables is large.
Identifying the Largest Brand Differences:
Further analysis of the plot would entail identifying combinations of variables that best discriminate among all brands or that separate one particular brand from the others. In this example, differences among the products are greatest at T = 1.96 and appear large for all values of T between 1.4 and 2.5. Table Two shows that in this range headache receives a large positive weight and at least one other variable has a large negative weight. (For T values between 1.7 and 2.5, at least two variables have negative weights.) Therefore, this portion of the plot indicates the extent to which brands are perceived as relatively effective in relieving headaches and ineffective in relieving other symptoms.
Since brand B has a relatively high mean on headache relief and low means on other characteristics (compared to the other brands), its function value is large relative to the other products in this portion of the plot. That is, the curve for brand B is furthest to the right on the plot when T is between 1.4 and 2.5. For brand D, the situation is just the opposite. Since brands A and C are intermediate on headache relief and have mixtures of high and low means on the other attributes, their function values are near zero (closer to the center) in this portion of the plot.
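The search for the value of T with the largest brand differences can also be done numerically. The sketch below assumes the weights sin(T), cos(T), sin(2T), and cos(2T) given in the appendix and measures "difference" as the gap between the highest and lowest curve, which is only one possible definition; the text does not state which measure was used. It works from the rounded deviations in Table One, so the value it finds may differ slightly from the one read off the plot.

```python
import math

# Deviations of brand means from the overall means (Table One), in the order
# headache, coughing, runny nose, fever.
DEVIATIONS = {
    "A": [-0.27, 0.28, -0.12, -0.05],
    "B": [0.62, -0.13, -0.63, -0.45],
    "C": [0.22, -0.43, 0.27, -0.05],
    "D": [-0.57, 0.28, 0.48, 0.55],
}


def function_value(x, t):
    """Weighted combination with weights sin(T), cos(T), sin(2T), cos(2T)."""
    weights = [math.sin(t), math.cos(t), math.sin(2 * t), math.cos(2 * t)]
    return sum(w * v for w, v in zip(weights, x))


def spread(t):
    """Gap between the highest and lowest brand at T (an assumed measure)."""
    values = [function_value(x, t) for x in DEVIATIONS.values()]
    return max(values) - min(values)


def widest_t(steps=2000):
    """Grid-search T over [-pi, pi] for the largest spread."""
    grid = [-math.pi + 2 * math.pi * i / steps for i in range(steps + 1)]
    return max(grid, key=spread)


if __name__ == "__main__":
    t_best = widest_t()
    ranked = sorted(DEVIATIONS, key=lambda b: function_value(DEVIATIONS[b], t_best))
    print(f"T = {t_best:.2f}, spread = {spread(t_best):.2f}, low to high: {ranked}")
```

A variance-based measure across all four brands could be substituted for spread without changing the rest of the sketch.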
Comparing one brand with the others:
If the client's product were brand A, one would probably be interested in the variable weights for values of T where brand A has the largest function value. This occurs when T is between -1.5 and -.7. This is the section of the plot where brand A is perceived more favorably than the others. Table Two indicates that in this range cough relief receives positive weight and at least two other variables have negative weights. Thus, the plot shows the degree to which brands perform relatively well on cough relief and poorly on other symptoms. Brand A is the only product with this pattern of means. Brand C, which has the opposite pattern, has the lowest function value in this range.
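The same comparison can be made programmatically by asking, for each value of T, which brand has the largest function value. This is a rough sketch under the same assumptions as the appendix formula; leader and leading_stretches are hypothetical helper names, and the stretch found from the rounded deviations need not match the range quoted above exactly, since that range was read from the plot.

```python
import math

# Deviations from Table One: headache, coughing, runny nose, fever.
DEVIATIONS = {
    "A": [-0.27, 0.28, -0.12, -0.05],
    "B": [0.62, -0.13, -0.63, -0.45],
    "C": [0.22, -0.43, 0.27, -0.05],
    "D": [-0.57, 0.28, 0.48, 0.55],
}


def function_value(x, t):
    """Weighted combination with weights sin(T), cos(T), sin(2T), cos(2T)."""
    weights = [math.sin(t), math.cos(t), math.sin(2 * t), math.cos(2 * t)]
    return sum(w * v for w, v in zip(weights, x))


def leader(t):
    """Brand with the largest function value at T."""
    return max(DEVIATIONS, key=lambda b: function_value(DEVIATIONS[b], t))


def leading_stretches(brand, steps=2000):
    """Stretches of T in [-pi, pi], found on a grid, where `brand` ranks first."""
    stretches, start, last = [], None, None
    for i in range(steps + 1):
        t = -math.pi + 2 * math.pi * i / steps
        if leader(t) == brand:
            if start is None:
                start = t
            last = t
        elif start is not None:
            stretches.append((start, last))
            start = None
    if start is not None:
        stretches.append((start, last))
    return stretches


if __name__ == "__main__":
    for lo, hi in leading_stretches("A"):
        print(f"brand A ranks first for T from {lo:.2f} to {hi:.2f}")
```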
Usefulness of Function Plots:
As the preceding example illustrates, various features of a function plot may be of interest, depending on the objectives of the study. A single function plot can yield more information concerning group differences than most other graphic techniques because it covers such a wide range of combinations of the variables. This makes function plots especially useful when used with other multivariate analyses. For example, whereas a discriminant analysis may not clearly differentiate the product(s) of primary interest from the others, a function plot may do so quite effectively. Also, it is possible to establish confidence limits around the curves, so one can determine if the differences revealed in the plot are statistically significant. Another use of function plots is in connection with a cluster analysis of respondents. In this case, the plot would graphically illustrate how the cluster groups differ.
Appendix:
Constructing Function Plots
A function plot was described as a plot of the data points (or means) on several weighted combinations of the variables. This section presents a graphic interpretation of what it means to gradually change the variable weights and describes how these weights are determined.
Concepts Behind Function Plots:
The logic and construction of a function plot rely on the notions of using lines to represent variables and of projecting points onto a line. The process is diagrammed in Figure Two. The horizontal axis represents variable X, the vertical axis variable Y, and L some variable that is a combination, or function, of X and Y. In this example, L is drawn at a 45° angle, so that X and Y receive equal weight in computing values on variable L. If L were drawn closer to the X axis, variable X would have greater weight than Y. In the extremes, if L were horizontal, it would be identical to X; that is, X would receive all the weight in computing scores on L, and Y would have no weight at all. Similarly, if L were vertical, it would be identical to variable Y. At any angle in between these extremes, values of variable L are equal to some weighted combination of X and Y.
Example:
In Figure Two, three points (with their respective X and Y values in parentheses) are projected onto both the X axis and line L. Graphically, the projection of a point P on any line L is determined by drawing a line through P that is perpendicular (at a 90° angle) to L. The distance from the origin (the (0,0) point in the figure) to this projected point is the score or value of the point P on the variable that line represents.
The projections of A, B, and C on the X axis are simply their respective X values: 1, 4, and 2. As indicated earlier, scores on variable L (as it appears in the figure) are computed by assigning equal weights to X and Y, e.g., wX + wY, where w is the weight. To keep variable L in a scale commensurate with the scales of X and Y, the sum of the squared weights must equal one, i.e., w² + w² = 1, so w² = .5 and w = .707. Hence, the projections of points A, B, and C on line L have values equal to .707X + .707Y, or .707(1) + .707(2) = 2.12 for A, .707(4) + .707(1) = 3.54 for B, and .707(2) + .707(3) = 3.54 for C.
A more general way of obtaining the weights for X and Y is by using trigonometric functions. If L is at an angle of d° to the X axis, then the weight for X is cos(d) and the weight for Y is sin(d). In this example, d = 45°, and cos(d) = sin(d) = .707, as before. If L was at a 30° angle, then d = 30°, cos(d) = .866, and sin(d) = .5; X would receive greater weight (.866) than Y (.5), as indicated earlier.
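The arithmetic above can be checked with a few lines of code. This sketch takes the three points as A = (1, 2), B = (4, 1), and C = (2, 3), as implied by the calculations in the text, and assumes line L passes through the origin; the function names are illustrative.

```python
import math

# (X, Y) values of the three points used in the worked example.
POINTS = {"A": (1, 2), "B": (4, 1), "C": (2, 3)}


def weights(angle_degrees):
    """Weights for X and Y when L is at the given angle to the X axis."""
    d = math.radians(angle_degrees)
    return math.cos(d), math.sin(d)


def project(point, angle_degrees):
    """Score of a point on L: the weighted combination wX*X + wY*Y."""
    w_x, w_y = weights(angle_degrees)
    x, y = point
    return w_x * x + w_y * y


if __name__ == "__main__":
    for d in (0, 30, 45, 90):
        w_x, w_y = weights(d)
        scores = {name: round(project(p, d), 2) for name, p in POINTS.items()}
        print(f"{d:>2} degrees  wX={w_x:.3f}  wY={w_y:.3f}  {scores}")
```

Since cos²(d) + sin²(d) = 1 for every angle, weights generated this way always satisfy the requirement that the squared weights sum to one.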
Extension of these ideas to several dimensions is straightforward. In a four-dimensional example, where the variables are X1, X2, X3, and X4, the projections of points on a line L would have values equal to w1X1 + w2X2 + w3X3 + w4X4, where w1 through w4 are the weights associated with variables X1 through X4. These weights define the direction or position of L; they change as the direction of L changes.
What a Function Plot Shows:
Regardless of the number of variables, an infinite number of lines could be placed through the space containing the points. Each line represents a different weighted combination of the variables. A function plot shows the projection of each point on a line whose direction is gradually and continuously changed. Since changing the direction of a line is equivalent to changing the weights of the variables, the plot reveals the locations of the points as one gradually changes the variable weights.
The process of changing the weights is accomplished using trigonometric functions. Let X1, X2, ..., Xp represent the values of p variables for a particular point. These values are substituted into the equation
FUNCTION = X1*sin(T) + X2*cos(T) + X3*sin(2T) + X4*cos(2T) + X5*sin(3T) + ...
In this equation, T is an angle measured in radians (1 radian is about 57.3 degrees; there are 2π radians in a circle). The values sin(T), cos(T), sin(2T), cos(2T), ... are weights associated with the variables. Each weight ranges from -1 to 1, but they take on these extremes at different values of T and change at different rates.
As T is gradually changed from -π to π, which corresponds to moving a line in a full circle, each point (or group centroid in the example of the previous section) produces a curve drawn down the page. This curve shows the projection of the point on a moving line, indicating the location of the point as seen from a continuum of directions. (If T were expressed in degrees rather than radians, it would range from -180° to 180°, but the function plot would look the same.)
For the example used earlier, FUNCTION = HEADACHE*sin(T) + COUGHING*cos(T) + RUNNY*sin(2T) + FEVER*cos(2T). Therefore, the variable weights in Table Two are sin(T), cos(T), sin(2T), and cos(2T) for headache, coughing, runny, and fever, respectively.
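To make the construction concrete, the sketch below tabulates the four weights and the resulting function value for brand B at a handful of T values. The T values are arbitrary evenly spaced choices for illustration and are not necessarily those listed in Table Two; the function names are likewise assumptions.

```python
import math


def weights(t):
    """Weights for headache, coughing, runny nose, and fever at T (radians)."""
    return [math.sin(t), math.cos(t), math.sin(2 * t), math.cos(2 * t)]


def function_value(x, t):
    """FUNCTION for one brand: the sum of each weight times its variable."""
    return sum(w * v for w, v in zip(weights(t), x))


def curve(x, steps=8):
    """Function values for one brand on an even grid from -pi to pi."""
    grid = [-math.pi + 2 * math.pi * i / steps for i in range(steps + 1)]
    return [(t, function_value(x, t)) for t in grid]


# Deviations for brand B from Table One: headache, coughing, runny nose, fever.
BRAND_B = [0.62, -0.13, -0.63, -0.45]

if __name__ == "__main__":
    print("     T  headache  coughing     runny     fever  FUNCTION(B)")
    for t, value in curve(BRAND_B):
        row = " ".join(f"{w:9.2f}" for w in weights(t))
        print(f"{t:6.2f} {row} {value:12.2f}")
```

Plotting the pairs returned by curve for each brand, with T on the vertical axis and the function value on the horizontal axis, would give a plot laid out in the manner described for Figure One.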
A function plot does not display the data as seen from every possible vantage point, but it presents more views of the data than any other technique. Reordering the variables or using different trigonometric functions will yield different plots (different views of the data). However, two curves on a function plot will be close across the entire plot only if the corresponding points are close in the p-dimensional space. Also, if the points differ on some variables, the curves will be distant somewhere in the plot. These statements hold for any function plot of the data. Thus, a single plot is sufficient for determining which points are close together in the p-dimensional space.
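One way to see why closeness of curves reflects closeness of points is that sines and cosines of different frequencies are uncorrelated over a full circle, and each squared weight averages one half. For the four-variable form used in the example, the average squared gap between two curves over the range -π to π should therefore equal half the squared straight-line distance between the two points. The sketch below is a numerical check of that relationship for brands B and D, not a proof, and the function names are illustrative.

```python
import math


def function_value(x, t):
    """Weighted combination with weights sin(T), cos(T), sin(2T), cos(2T)."""
    weights = [math.sin(t), math.cos(t), math.sin(2 * t), math.cos(2 * t)]
    return sum(w * v for w, v in zip(weights, x))


def mean_squared_gap(x, y, steps=1000):
    """Average of (FUNCTION_x - FUNCTION_y)^2 over an even grid on [-pi, pi)."""
    total = 0.0
    for i in range(steps):
        t = -math.pi + 2 * math.pi * i / steps
        total += (function_value(x, t) - function_value(y, t)) ** 2
    return total / steps


def squared_distance(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Deviations for brands B and D from Table One.
BRAND_B = [0.62, -0.13, -0.63, -0.45]
BRAND_D = [-0.57, 0.28, 0.48, 0.55]

if __name__ == "__main__":
    gap = mean_squared_gap(BRAND_B, BRAND_D)
    half_distance = 0.5 * squared_distance(BRAND_B, BRAND_D)
    print(f"mean squared gap = {gap:.4f}, half squared distance = {half_distance:.4f}")
```

Reordering the variables changes the shape of each curve but not this average, which is consistent with the statement that any single function plot suffices for judging which points are close together.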
