Today we will apply the ideas of the previous posts to a simple example. But first, let us answer last week's question.

What exactly is the $latex {h_{opt}}&fg=000000$ if we assume that

$latex \displaystyle \displaystyle f(x) = \frac{1}{\sqrt{2\pi}} \text{exp}\left(\frac{-x^2}{2}\right)? &fg=000000$

Here $latex {f(x)}&fg=000000$ is the density of the standard normal distribution. It is easy to see that $latex {f^\prime(x)=(-x)f(x)}&fg=000000$, so we have

$latex \displaystyle \displaystyle \|f^\prime \|_2^2=\frac{1}{\sqrt{4\pi}}\int x^2 \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{\frac{1}{2}}} \text{exp}(-x^2)dx. &fg=000000$

We recognize the integral as the variance of a random variable with mean 0 and standard deviation $latex {\sqrt{1/2}}&fg=000000$, that is, with variance $latex {1/2}&fg=000000$. Then we get

$latex \displaystyle \displaystyle \|f^\prime \|_2^2 = \frac{1}{\sqrt{4\pi}} \frac{1}{2} = \frac{1}{4\sqrt{\pi}}. &fg=000000$
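This value is easy to verify numerically. A minimal R sketch, using only the base functions `dnorm` and `integrate`:

```r
# Numerical check of ||f'||_2^2 for the standard normal density,
# using f'(x) = (-x) f(x)
f_prime_sq <- function(x) ((-x) * dnorm(x))^2
integrate(f_prime_sq, -Inf, Inf)$value  # matches 1/(4*sqrt(pi)) ~ 0.1410474
```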

Finally, using the results of Part III,

$latex \displaystyle \displaystyle h_{opt}= \left(\frac{6}{n\|f^\prime \|_2^2}\right)^{1/3} = \left(\frac{24\sqrt{\pi}}{n}\right)^{1/3}\approx 3.5\, n^{-1/3}. &fg=000000$

This $latex {h_{opt}}&fg=000000$ is not useful in many cases, but it can serve as a rule-of-thumb binwidth.
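As a quick sketch, the rule-of-thumb binwidth for the sample size used in the example below ($latex n=1000$) can be computed directly:

```r
# Rule-of-thumb binwidth: h_opt = (6 / (n * ||f'||_2^2))^(1/3)
n <- 1000                            # sample size of the example below
norm_f_prime <- 1 / (4 * sqrt(pi))   # ||f'||_2^2 for the standard normal
h_opt <- (6 / (n * norm_f_prime))^(1/3)
h_opt                                # about 0.349, i.e. roughly 3.5 * n^(-1/3)
```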

For the example, we will use a sample of 1000 normally distributed numbers and try to apply all the theory seen so far. I wrote this little script in R to compute the MSE and the MISE and to plot the bias-variance trade-off for each (note: this is my very first script in R, so I would appreciate any comments).

[sourcecode language="r"]
# Function to get the breaks
breaks <- function(x,h){
b = floor(min(x)/h) : ceiling(max(x)/h)
b = b*h
return(b)
}
#Generate 1000 numbers with normal standard distribution
x=rnorm(1000)
n=length(x);
# Point to evaluate the MSE
x0 = 0
# Real value of f(x0)
f_x0 = dnorm(x0);
# || f’ ||_2^2 for a normal standard distribution
norm_f_prime = 1/(4*sqrt(pi));
#Sequence of bins
hvec = seq(0.1,0.7,by=0.0005)
Bias = numeric();
Var = numeric();
MSE = numeric();
Bias_MISE = numeric();
Var_MISE = numeric();
MISE = numeric();

par(mfrow=c(1,2))
hist(x,breaks=breaks(x,0.001),freq=F,xlab ="Bandwidth with h=0.001")
hist(x,breaks=breaks(x,2),freq=F,xlab ="Bandwidth with h=2")
par(mfrow=c(1,1))

for(h in hvec){
xhist = hist(x,breaks=breaks(x,h),plot=FALSE)
# Average of the bins near x0 (bin midpoints within (x0-h, x0+h))
bins_near_x0 = xhist$mids > x0 - h & xhist$mids < x0 + h
p = mean(xhist$counts[bins_near_x0])/n;
# Expectation of \hat{f} in x0
E_fhat_x0 = p/h;
#Compute of Bias, Var, MSE and MISE
Bias = c(Bias,E_fhat_x0 - f_x0);
Var = c(Var,(p*(1-p))/(n*h^2))
MSE = c(MSE,tail(Var,1) + tail(Bias,1)^2)
Var_MISE = c(Var_MISE,1/(n*h))
Bias_MISE = c(Bias_MISE,h^2 * norm_f_prime /12)
MISE = c(MISE,tail(Var_MISE,1) + tail(Bias_MISE,1))
}

max_range = range(Bias^2,Var,MSE)
plot(hvec,MSE,ylim=max_range,type="l",col="blue",lwd=3, xlab="Bandwidth h")
lines(hvec,Bias^2,type="l",lty=2,col="red",lwd=3)
lines(hvec,Var,type="S",lty=6,col="black",lwd=3)
legend(x="topleft",legend=c("MSE","Bias^2","Var"),col=c("blue","red","black"),lwd=3,lty=c(1,2,6))

max_range = range(Bias_MISE,Var_MISE,MISE)
plot(hvec,MISE,ylim=max_range,type="l",col="blue",lwd=3,xlab="Bandwidth h")
lines(hvec,Bias_MISE,type="l",lty=5,col="red",lwd=3)
lines(hvec,Var_MISE,type="S",lty=6,col="black",lwd=3)
legend(x="topleft",legend=c("MISE","Bias^2_MISE","Var_MISE"),col=c("blue","red","black"),lwd=3,lty=c(1,5,6))

# h optimal for the point x0
h_x0=hvec[MSE==min(MSE)]

# h optimal for any point using the minimal MISE
h_opt_MISE=hvec[MISE==min(MISE)]

# h optimal for any point using the rule-of-thumb
h_opt = (6/(n*norm_f_prime))^(1/3)
# Histogram with h_opt
breaks = floor((min(x))/h_opt):ceiling((max(x))/h_opt)
breaks = breaks*h_opt
# Plot the histogram
h=hist(x,breaks=breaks,freq=FALSE,ylim=c(0,0.5))
lines(sort(x),dnorm(sort(x)),type="l")
[/sourcecode]

To start, notice that if the binwidth is too small or too large we get a very bad approximation. As we can see in the next plot, the histogram fit is terrible for $latex h=0.001$ and $latex h=2$:

The plots of the MSE and MISE trade-offs are, respectively:

For the MSE at $latex x_0=0$ the optimal value is $latex h=0.255$. In the case of the MISE, the optimal value found by minimization is $latex h=0.349$, while the exact value, using the formula seen before, is $latex h\approx 0.3491$. Finally, we get the best-fitted histogram:

Next week we will start with density estimation using more general kernels, and we will see that the histogram is, in fact, a particular case of one of them.