Bootstrapping

Introduction:

Several statistical methods involving resampling of observations have been developed in recent years. Resampling allows statistics of interest to be estimated when standard statistical assumptions are inappropriate, when no known statistical approach exists, or when known approaches are too complex to carry out routinely. One such resampling technique, called jackknifing, was introduced in Research on Research Report #11. Here, another method, called bootstrapping, is applied to the problem of estimating brand share and its variability.


Estimation of Brand Share and Its Variability:

Brand share is defined (as discussed in Research on Research Report #11) as the ratio of total brand sales across all stores in the market to total sales in the product category across all stores in the market (or across the sample of stores within the market). The result is a single share figure for each market. Because there is just one figure, the usual measures of variability (such as the standard deviation) cannot be calculated directly. Research on Research Report #11 describes a technique called jackknifing for assessing the variability of a brand share estimate; a procedure called bootstrapping can also be used for this purpose.
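In symbols (the notation $x_j$ and $y_j$ is introduced here for illustration and does not appear in the original report), if $x_j$ and $y_j$ are the Brand XYZ sales and product category sales in store j of the n sampled stores, the initial brand share estimate $S_n$ is a single ratio of totals:

```latex
S_n = \frac{\sum_{j=1}^{n} x_j}{\sum_{j=1}^{n} y_j} \times 100\%
```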

The bootstrap estimate of variability is obtained by resampling the store data. Imagine that the data for each store are duplicated an enormous number of times and the resulting duplicates are thoroughly mixed. A number of samples of size n, where n is the number of stores in the original sample, are then selected at random. This is, in effect, sampling the original store data with replacement: after a store is "sampled", it is returned to the dataset, so on every draw each store has a 1/n probability of being selected, even if it has already been drawn ("resampled"). A particular store will typically appear in several of these bootstrap samples and may appear more than once in a given sample. A brand share is calculated for each bootstrap sample. (With a high-speed computer, this entire process can be carried out quickly in a mathematically equivalent way.)
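A single resampling step can be sketched in Python as follows. This is an illustration, not the report's own program: it assumes Python's standard `random` module as the source of random draws, uses the store data from Table One in the example below, and the helper names `brand_share` and `draw_bootstrap_sample` are hypothetical.

```python
import random

# Store data from Table One: (Brand XYZ sales, product category sales), stores 1-5
STORES = [(16, 325), (112, 617), (83, 492), (125, 506), (97, 432)]


def brand_share(sample):
    """Brand share (%): total brand sales divided by total category sales."""
    brand_total = sum(brand for brand, _ in sample)
    category_total = sum(category for _, category in sample)
    return 100.0 * brand_total / category_total


def draw_bootstrap_sample(stores, rng):
    """Draw n stores with replacement: every draw picks each store with probability 1/n."""
    return rng.choices(stores, k=len(stores))


rng = random.Random(1)  # arbitrary fixed seed so the draw is reproducible
sample = draw_bootstrap_sample(STORES, rng)
print(sample, f"{brand_share(sample):.2f}%")
```

Because `choices` draws with replacement, the same store tuple can appear several times in `sample`, exactly as in the bootstrap samples of Table Two.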

When brand shares for all the bootstrap samples have been obtained, an estimate of variability can be calculated. The standard error of the brand share is estimated by the standard deviation of the bootstrap sample brand shares. A bootstrap estimate of brand share itself can also be calculated by averaging the bootstrap sample brand shares. The estimates are obtained by:

$$\bar{S} = \frac{1}{b}\sum_{i=1}^{b} S_i$$

$$\widehat{SE}(S) = \sqrt{\frac{1}{b-1}\sum_{i=1}^{b}\left(S_i - \bar{S}\right)^2}$$

where $S_i$ is the brand share for bootstrap sample i, and b is the number of bootstrap samples. The mean $\bar{S}$ is, generally, a somewhat biased estimate of brand share. The bias may be reduced with the adjustment:

$$\tilde{S} = 2S_n - \bar{S}$$

where $S_n$ is the initial brand share estimate (calculated without bootstrapping). This adjusted estimate of brand share is also less biased than the initial estimate ($S_n$).
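The three estimates can be computed from a list of bootstrap shares with Python's standard `statistics` module. The function name `summarize_bootstrap` is hypothetical; the example feeds it the five brand shares from Table Two and the initial estimate of 18.25%.

```python
import statistics


def summarize_bootstrap(shares, initial_share):
    """Return (mean share S-bar, standard error, bias-adjusted share 2*S_n - S-bar)."""
    mean_share = statistics.mean(shares)
    # statistics.stdev divides by (b - 1), the denominator used for the bootstrap standard error
    std_error = statistics.stdev(shares)
    adjusted = 2 * initial_share - mean_share
    return mean_share, std_error, adjusted


# Brand shares (%) of the five bootstrap samples in Table Two; S_n = 18.25%
mean_share, std_error, adjusted = summarize_bootstrap(
    [16.20, 20.45, 16.91, 20.31, 16.92], 18.25
)
print(f"S-bar = {mean_share:.2f}%, SE = {std_error:.2f}, adjusted = {adjusted:.2f}%")
```

With these inputs the sketch gives a mean of 18.16%, a standard error of about 2.05 and an adjusted estimate of 18.34%.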


Example:

Consider the following data (also used in Research on Research Report #11):


Table One

Store    Brand XYZ Sales   Product Category Sales
1        16                325
2        112               617
3        83                492
4        125               506
5        97                432
Total    433               2372

The initial estimate of brand share is 18.25%. This is calculated by dividing the sum of Brand XYZ sales (433) by the category sales (2372). For reference, the jackknife estimate of brand share is 18.39% and the estimated standard error is 2.61.
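The arithmetic can be checked with a few lines of Python. Purely for contrast, the block also computes the simple average of the five per-store shares; that average weights small and large stores equally, so it is a different statistic from the brand share defined above (the variable names are illustrative).

```python
# Table One: Brand XYZ and product category sales for stores 1-5
brand_sales = [16, 112, 83, 125, 97]
category_sales = [325, 617, 492, 506, 432]

# Initial estimate S_n: total brand sales over total category sales
initial_share = 100 * sum(brand_sales) / sum(category_sales)
print(f"{sum(brand_sales)} / {sum(category_sales)} = {initial_share:.2f}%")
# prints: 433 / 2372 = 18.25%

# For contrast only: the unweighted mean of per-store shares is a different statistic
per_store_shares = [100 * b / c for b, c in zip(brand_sales, category_sales)]
mean_of_store_shares = sum(per_store_shares) / len(per_store_shares)
print(f"{mean_of_store_shares:.2f}%")  # about 17.42%, not the brand share
```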

As an illustration, the bootstrap technique is applied using the 5 bootstrap samples listed below, which are typical of the samples that could be selected. Because resampling is done with replacement, particular stores occur in several of the samples and repeatedly within a sample.


Table Two: Typical Bootstrap Samples

Sample (i)   Stores in Sample   Brand XYZ Total   Product Category Total   Brand Share Si (%)
1            1,4,3,1,5          337               2080                     16.20
2            4,1,4,2,4          503               2460                     20.45
3            3,4,1,3,3          390               2307                     16.91
4            1,5,5,5,4          432               2127                     20.31
5            5,2,1,2,3          420               2483                     16.92
Average                                                                    18.16

The brand share for the first bootstrap sample above is 16.20%, calculated by dividing the sum of Brand XYZ sales (337) by the category sales (2080). An estimate of brand share is obtained by averaging the brand shares for the 5 bootstrap samples; in this instance the bootstrap estimate ($\bar{S}$) is 18.16%. Adjusting for bias:

$$\tilde{S} = 2S_n - \bar{S} = 2(18.25) - 18.16 = 18.34\%$$

The bootstrap estimate of standard error is

$$\widehat{SE} = \sqrt{\frac{\sum_{i=1}^{5}\left(S_i - 18.16\right)^2}{5 - 1}} \approx 2.05$$

Note that the term in the denominator of this calculation is (b - 1) = 4.
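The whole worked example can be rebuilt from the store numbers in Table Two. This sketch carries full precision rather than rounding each share to two decimals, so its adjusted estimate comes out at about 18.35% instead of 18.34%; the difference is purely a rounding effect. Variable names are illustrative.

```python
import statistics

# Table One data keyed by store number: (Brand XYZ sales, product category sales)
STORES = {1: (16, 325), 2: (112, 617), 3: (83, 492), 4: (125, 506), 5: (97, 432)}

# Store numbers drawn in the five bootstrap samples of Table Two (repeats are allowed)
SAMPLES = [[1, 4, 3, 1, 5], [4, 1, 4, 2, 4], [3, 4, 1, 3, 3], [1, 5, 5, 5, 4], [5, 2, 1, 2, 3]]

brand_totals = [sum(STORES[s][0] for s in sample) for sample in SAMPLES]
category_totals = [sum(STORES[s][1] for s in sample) for sample in SAMPLES]
shares = [100 * brand / cat for brand, cat in zip(brand_totals, category_totals)]

# S_n from the full Table One totals (433 / 2372), unrounded
initial_share = 100 * sum(v[0] for v in STORES.values()) / sum(v[1] for v in STORES.values())
mean_share = statistics.mean(shares)      # S-bar
std_error = statistics.stdev(shares)      # divides by b - 1 = 4
adjusted = 2 * initial_share - mean_share

for i, (brand, cat, s) in enumerate(zip(brand_totals, category_totals, shares), start=1):
    print(f"{i}: {brand} / {cat} = {s:.2f}%")
print(f"S-bar = {mean_share:.2f}%, SE = {std_error:.2f}, adjusted = {adjusted:.2f}%")
```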


Choosing the Number of Bootstrap Samples:

The small number of bootstrap samples in the preceding illustration was chosen for simplicity. In actual applications, a large number of bootstrap samples should be selected so that better approximations of brand share and its variability may be obtained. The bootstrap technique was applied to the data in Table One with the number of bootstrap samples (b) ranging from 5 to 10,000. The results appear in Table Three.


Table Three: Bootstrapping with 5 to 10,000 Samples (b)

b        Bootstrap Estimate (%)   Adjusted Bootstrap Estimate (%)   Standard Error (%)   Computing Time (sec)
5        19.45                    17.05                             2.58                 0.9
10       19.57                    16.93                             2.35                 1.0
25       18.74                    17.76                             2.58                 1.2
50       18.13                    18.37                             2.25                 1.9
100      18.30                    18.20                             2.44                 2.6
250      18.03                    18.47                             2.69                 5.0
500      17.72                    18.78                             2.59                 10.2
1,000    18.18                    18.32                             2.49                 19.0
5,000    18.17                    18.33                             2.55                 90.5
10,000   18.16                    18.34                             2.54                 189.1


With 1000 samples, the bootstrap estimate of brand share is 18.18% (18.32% when adjusted for bias), and the bootstrap estimate of the standard error is 2.49. These results differ slightly from those obtained through jackknifing (18.39% and 2.61, respectively). The distribution of brand shares for the 1000 bootstrap samples is presented in the figure below. The other results in Table Three suggest that 1000 bootstrap samples are sufficient in this case: the estimates stabilize once the number of samples reaches about 1000, and little is gained by increasing the number of samples further.
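The experiment behind Table Three can be approximated in a few lines. This is a sketch under stated assumptions: Python's `random` module stands in for whatever generator was originally used, the seed is arbitrary, and the helper names `share` and `bootstrap` are hypothetical, so the printed figures will differ somewhat from Table Three, and computing times bear no relation to those reported.

```python
import random
import statistics

# Table One: (Brand XYZ sales, product category sales) for stores 1-5
STORES = [(16, 325), (112, 617), (83, 492), (125, 506), (97, 432)]


def share(sample):
    """Brand share (%) of a list of (brand, category) store records."""
    return 100 * sum(b for b, _ in sample) / sum(c for _, c in sample)


def bootstrap(stores, b, rng):
    """Return (S-bar, adjusted 2*S_n - S-bar, standard error) from b bootstrap samples."""
    initial = share(stores)
    shares = [share(rng.choices(stores, k=len(stores))) for _ in range(b)]
    mean_share = statistics.mean(shares)
    return mean_share, 2 * initial - mean_share, statistics.stdev(shares)


rng = random.Random(2024)  # arbitrary seed; results vary with it
results = {}
for b in (5, 10, 25, 50, 100, 250, 500, 1000, 5000, 10000):
    results[b] = bootstrap(STORES, b, rng)
    print(b, *(f"{x:.2f}" for x in results[b]))
```

As in Table Three, the small-b rows wander noticeably, while the rows from about 1000 samples onward settle near the same values.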



The bootstrapping procedure with 1000 samples was repeated 10 times to assess the stability and reliability of the bootstrap estimates. The results of these 10 trials are shown below.


Table Four: Bootstrapping with 1000 Samples, 10 Trials

Trial    Bootstrap Estimate (%)   Adjusted Bootstrap Estimate (%)   Standard Error (%)
1        18.12                    18.38                             2.47
2        18.24                    18.26                             2.52
3        18.24                    18.26                             2.57
4        18.18                    18.32                             2.58
5        18.10                    18.40                             2.52
6        18.09                    18.41                             2.53
7        18.23                    18.27                             2.52
8        18.15                    18.35                             2.55
9        18.19                    18.31                             2.46
10       18.18                    18.32                             2.61


The estimates differ little from trial to trial: the bootstrap estimates range from 18.09% to 18.24%, and the standard errors from 2.46 to 2.61. This further indicates that 1000 samples are sufficient for obtaining good estimates in this case.
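The repeated-trial check can be imitated as below. This is a sketch: the seed is arbitrary and Python's `random` module is assumed as the generator, so the ranges it prints will not match Table Four exactly, but they should be similarly narrow. The helper names are hypothetical.

```python
import random
import statistics

# Table One: (Brand XYZ sales, product category sales) for stores 1-5
STORES = [(16, 325), (112, 617), (83, 492), (125, 506), (97, 432)]


def share(sample):
    """Brand share (%) of a list of (brand, category) store records."""
    return 100 * sum(b for b, _ in sample) / sum(c for _, c in sample)


def bootstrap_trial(stores, b, rng):
    """One trial: (S-bar, standard error) from b bootstrap samples."""
    shares = [share(rng.choices(stores, k=len(stores))) for _ in range(b)]
    return statistics.mean(shares), statistics.stdev(shares)


rng = random.Random(7)  # arbitrary seed
trials = [bootstrap_trial(STORES, 1000, rng) for _ in range(10)]
means = [m for m, _ in trials]
ses = [s for _, s in trials]
print(f"S-bar range: {min(means):.2f} to {max(means):.2f}")
print(f"SE range:    {min(ses):.2f} to {max(ses):.2f}")
```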


Use of Bootstrap Estimates for Confidence Intervals:

Given a bootstrap estimate of brand share and its standard error, approximate confidence limits can be placed around the share figure. The limits use the bootstrap estimate of the standard error and the appropriate constant from the t-table, with degrees of freedom equal to one less than the number of stores in the original sample (n - 1). The 95% confidence bounds around the adjusted bootstrap estimate of brand share with 1000 samples are 18.32 ± (2.49)(2.776), or from an 11.41% share to a 25.23% share. The constant 2.776 is the two-sided 95% value from the t-table for (n - 1) = 4 degrees of freedom.
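The interval arithmetic can be sketched as follows, with the t constant hard-coded from the t-table exactly as in the text (the variable names are illustrative).

```python
# Values from the text: adjusted bootstrap estimate and bootstrap standard error (b = 1000)
adjusted_share = 18.32
std_error = 2.49
n_stores = 5

# Two-sided 95% t constant for n - 1 = 4 degrees of freedom, taken from a t-table.
# (If SciPy is available, scipy.stats.t.ppf(0.975, n_stores - 1) gives about 2.7764.)
t_crit = 2.776

half_width = t_crit * std_error
lower, upper = adjusted_share - half_width, adjusted_share + half_width
print(f"{adjusted_share} ± {half_width:.2f}: {lower:.2f}% to {upper:.2f}%")
# prints: 18.32 ± 6.91: 11.41% to 25.23%
```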


Other Applications of the Bootstrap:

Like the jackknife, the bootstrap procedure can be applied in a variety of situations to estimate unusual or complex statistics and their standard errors. Bootstrapping is particularly useful for ratios, and it can be used when people, rather than stores, are the unit of analysis. For example, the bootstrap procedure could be used to estimate the percentage of total viewing time spent watching a particular television station (the ratio of viewing time spent watching the station to total viewing time). The bootstrap can also be used to estimate standard errors of medians, a case where the ordinary (delete-one) jackknife is known to be unreliable.
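The same resampling loop works for any statistic computed from a list of units. The sketch below uses a hypothetical `bootstrap_se` helper and made-up weekly viewing hours for nine respondents (people as the unit of analysis) to estimate the standard error of a median; the data are illustrative only and do not come from the report.

```python
import random
import statistics


def bootstrap_se(data, statistic, b=1000, seed=0):
    """Bootstrap standard error of `statistic`, any function of a list of units."""
    rng = random.Random(seed)  # arbitrary seed for reproducibility
    replicates = [statistic(rng.choices(data, k=len(data))) for _ in range(b)]
    return statistics.stdev(replicates)


# Hypothetical weekly viewing hours for 9 respondents (illustrative values only)
hours = [3.5, 12.0, 7.25, 0.5, 21.0, 9.0, 4.75, 15.5, 6.0]
se = bootstrap_se(hours, statistics.median)
print(f"median = {statistics.median(hours)}, bootstrap SE = {se:.2f}")
```

Passing a different function in place of `statistics.median`, such as a ratio of two totals, gives the corresponding bootstrap standard error with no other change.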


Summary:

The bootstrap procedure is particularly useful for estimating brand shares and their standard errors, which can then be used to calculate approximate confidence bounds. The procedure is also well suited to other ratios and is applicable in a variety of situations.


Market Facts provides the superior options