Two popular methods for choosing the bandwidth $latex {h}&fg=000000$ of a nonparametric density estimator are the plug-in method and cross-validation. For the first, we will focus on the “quick and dirty” plug-in method introduced by Silverman (1986). In cross-validation, we will minimize a modified version of the quadratic risk of $latex {\hat{f}_{h}}&fg=000000$.

**The normal reference rule**

This method works well only if the true density is very smooth. Assume that $latex {f}&fg=000000$ is normally distributed. Then we have

$latex \displaystyle h_{plug}=1.06\sigma n^{-1/5}. &fg=000000$

Usually $latex {\sigma}&fg=000000$ is estimated by $latex {\min\{s,Q/1.34\}}&fg=000000$, where $latex {s}&fg=000000$ is the sample standard deviation and $latex {Q}&fg=000000$ is the interquartile range, i.e., the $latex {75^{\text{th}}}&fg=000000$ percentile minus the $latex {25^{\text{th}}}&fg=000000$ percentile. Here, $latex {Q/1.34}&fg=000000$ gives a consistent estimate of $latex {\sigma}&fg=000000$ if the data come from a $latex {N(\mu,\sigma^{2})}&fg=000000$ distribution.

We can summarize this method as follows:

$latex \displaystyle h_{plug}=1.06\min\left\{ \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}},\frac{Q}{1.34}\right\} n^{-1/5}. &fg=000000$
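As a quick sketch, the normal reference rule above can be implemented in a few lines of Python with NumPy (the function name `silverman_bandwidth` is just illustrative):

```python
import numpy as np

def silverman_bandwidth(x):
    """Normal reference rule: h = 1.06 * min(s, Q/1.34) * n^(-1/5)."""
    n = len(x)
    s = np.std(x, ddof=1)                            # sample standard deviation
    q = np.percentile(x, 75) - np.percentile(x, 25)  # interquartile range
    sigma_hat = min(s, q / 1.34)
    return 1.06 * sigma_hat * n ** (-1 / 5)

# Example on simulated normal data
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 500)
h = silverman_bandwidth(x)
```

For data that are far from normal (e.g. multimodal), this rule tends to oversmooth, which is why the cross-validation approach below is often preferred.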

**Cross-validation**

Define the *integrated squared error* as

$latex \displaystyle \begin{array}{rl} \displaystyle {\rm ISE}(\hat{f}_{h}) & =\displaystyle \int\left(\hat{f}_{h}(x)-f(x)\right)^{2}dx\nonumber \\ & =\displaystyle \int\hat{f}_{h}^{2}(x)dx-2\int\hat{f}_{h}(x)f(x)dx+\int f^{2}(x)dx. \end{array} &fg=000000$

Notice that the MISE is indeed the expected value of the ISE. Our goal is to make the ISE as small as possible. Remark that the last term above does not depend on $latex {h}&fg=000000$, so minimizing this risk is equivalent to minimizing the expected value of

$latex \displaystyle {\rm ISE}(\hat{f}_{h})-\int f^{2}(x)dx=\int\hat{f}_{h}^{2}(x)dx-2\int\hat{f}_{h}(x)f(x)dx &fg=000000$

If we look closer at the term $latex {\int\hat{f}_{h}(x)f(x)dx}&fg=000000$, we notice that it is $latex {\mathbb E(\hat{f}_{h}(X))}&fg=000000$, where the expectation is taken over a new observation $latex {X\sim f}&fg=000000$. A straightforward estimate of this expected value is

$latex \displaystyle \frac{1}{n}\sum_{i=1}^{n}\hat{f}_{h}(X_{i})=\frac{1}{n^{2}h}\sum_{i=1}^{n}\sum_{j=1}^{n}K\left(\frac{X_{j}-X_{i}}{h}\right). \ \ \ \ \ (1)&fg=000000$

The problem is that the observations used to estimate the expectation are not independent of the observations used to build $latex {\hat{f}_{h}}&fg=000000$. The solution is to remove the $latex {i^{\text{th}}}&fg=000000$ observation from $latex {\hat{f}_{h}}&fg=000000$. Then, we define the leave-one-out cross-validation estimator of $latex {\int\hat{f}_{h}(x)f(x)dx}&fg=000000$ as

$latex \displaystyle \frac{1}{n}\sum_{i=1}^{n}\hat{f}_{h,-i}(X_{i}), &fg=000000$

where

$latex \displaystyle \hat{f}_{h,-i}(x)=\frac{1}{n-1}\mathop{\sum_{j=1}^{n}}_{j\neq i}K_{h}(x-X_{j}). &fg=000000$

The following figure illustrates the idea behind leave-one-out cross-validation: at each iteration, one data point serves as the test data and the remaining points as the training data.
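A minimal sketch of the leave-one-out estimates $latex {\hat{f}_{h,-i}(X_{i})}&fg=000000$, assuming a Gaussian kernel (the formulas above hold for any kernel):

```python
import numpy as np

def kde_loo(x, h):
    """Leave-one-out KDE values f_hat_{h,-i}(X_i) with a Gaussian kernel."""
    n = len(x)
    diffs = (x[:, None] - x[None, :]) / h            # (X_i - X_j) / h
    k = np.exp(-0.5 * diffs ** 2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(k, 0.0)                         # drop the j = i term
    return k.sum(axis=1) / ((n - 1) * h)

# Example: LOO density estimates at each sample point
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 100)
vals = kde_loo(x, 0.4)
```

Averaging `vals` gives the leave-one-out estimate of $latex {\int\hat{f}_{h}(x)f(x)dx}&fg=000000$ used below.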

Turning to the $latex {\int\hat{f}_{h}^{2}(x)dx}&fg=000000$ term, we have

$latex \displaystyle \begin{array}{rl} \displaystyle \int\hat{f}_{h}^{2}(x)dx & =\displaystyle \int\left(\frac{1}{n}\sum_{i=1}^{n}K_{h}(x-X_{i})\right)^{2}dx\\ & =\displaystyle \frac{1}{n^{2}h^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}\int K\left(\frac{x-X_{i}}{h}\right)K\left(\frac{x-X_{j}}{h}\right)dx\\ & =\displaystyle \frac{1}{n^{2}h}\sum_{i=1}^{n}\sum_{j=1}^{n}\int K\left(u\right)K\left(\frac{X_{i}-X_{j}}{h}-u\right)du\\ & =\displaystyle \frac{1}{n^{2}h}\sum_{i=1}^{n}\sum_{j=1}^{n}K*K\left(\frac{X_{i}-X_{j}}{h}\right), \end{array} &fg=000000$

where $latex {K*K}&fg=000000$ denotes the convolution of $latex {K}&fg=000000$ with itself.

Finally, we can define a reasonable criterion to choose the bandwidth:

$latex \displaystyle CV(h)=\frac{1}{n^{2}h}\sum_{i=1}^{n}\sum_{j=1}^{n}K*K\left(\frac{X_{i}-X_{j}}{h}\right)-\frac{2}{n(n-1)}\sum_{i=1}^{n}\mathop{\sum_{j=1}^{n}}_{j\neq i}K_{h}(X_{i}-X_{j}). &fg=000000$
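To make this concrete, here is a sketch of $latex {CV(h)}&fg=000000$ for the Gaussian kernel, where the convolution $latex {K*K}&fg=000000$ is available in closed form as the $latex {N(0,2)}&fg=000000$ density. The grid search at the end is just one simple way to minimize the criterion:

```python
import numpy as np

def cv_score(x, h):
    """CV(h) with a Gaussian kernel; here K*K is the N(0, 2) density."""
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    # First term: (1 / n^2 h) * sum_{i,j} (K*K)((X_i - X_j) / h)
    kk = np.exp(-0.25 * d ** 2) / np.sqrt(4 * np.pi)
    first = kk.sum() / (n ** 2 * h)
    # Second term: leave-one-out sum of K_h(X_i - X_j), j != i
    k = np.exp(-0.5 * d ** 2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(k, 0.0)
    second = 2.0 * k.sum() / (n * (n - 1) * h)
    return first - second

# Minimize CV(h) over a coarse grid of candidate bandwidths
rng = np.random.default_rng(1)
x = rng.normal(0, 1, 300)
grid = np.linspace(0.05, 1.0, 40)
h_cv = grid[np.argmin([cv_score(x, h) for h in grid])]
```

In practice one would use a finer grid or a one-dimensional optimizer; the quadratic cost in $latex {n}&fg=000000$ of each evaluation is the main expense.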

**Note:** An alternative way to implement leave-one-out cross-validation is

$latex \displaystyle CV(h)=\int\hat{f}_{h}^{2}(x)dx-\frac{2}{n(n-1)}\sum_{i=1}^{n}\mathop{\sum_{j=1}^{n}}_{j\neq i}K_{h}(X_{i}-X_{j}) &fg=000000$

and then compute the integral numerically.
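A sketch of this alternative, again assuming a Gaussian kernel and using a simple trapezoid rule for $latex {\int\hat{f}_{h}^{2}(x)dx}&fg=000000$ (this version works for kernels whose self-convolution has no closed form):

```python
import numpy as np

def cv_numeric(x, h, m=512):
    """CV(h) with the first term obtained by numerical integration."""
    n = len(x)
    grid = np.linspace(x.min() - 4 * h, x.max() + 4 * h, m)
    # Evaluate f_hat_h on the grid (Gaussian kernel for concreteness)
    d = (grid[:, None] - x[None, :]) / h
    f_hat = np.exp(-0.5 * d ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    # Trapezoid rule for the integral of f_hat^2
    dx = grid[1] - grid[0]
    sq = f_hat ** 2
    first = (sq.sum() - 0.5 * (sq[0] + sq[-1])) * dx
    # Leave-one-out second term, exactly as in the CV(h) formula
    dd = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * dd ** 2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(k, 0.0)
    second = 2.0 * k.sum() / (n * (n - 1) * h)
    return first - second

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 200)
val = cv_numeric(x, 0.3)
```

For the Gaussian kernel both versions agree up to the discretization error of the integral.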

**Sources:**

- Härdle, W. (2004). *Nonparametric and Semiparametric Models*. Springer Series in Statistics. Springer.
- Tsybakov, A. (2009). *Introduction to Nonparametric Estimation*. Springer.
- Silverman, B. W. (1986). *Density Estimation for Statistics and Data Analysis*, volume 26. Chapman & Hall/CRC.
