Today we will apply the ideas of the previous posts to a simple example. But first, let us answer last week's question.

What exactly is $latex {h_{opt}}&fg=000000$ if we assume that

$latex \displaystyle f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(\frac{-x^2}{2}\right)? &fg=000000$

Here $latex {f(x)}&fg=000000$ is the density of the standard normal distribution. It is easy to see that $latex {f^\prime(x)=-xf(x)}&fg=000000$, so we have

$latex \displaystyle \|f^\prime \|_2^2=\frac{1}{\sqrt{4\pi}}\int x^2\, \frac{1}{\sqrt{2\pi}\sqrt{1/2}}\, \exp(-x^2)\,dx. &fg=000000$

We recognize the integral as the variance of a normal random variable with mean equal to $latex {0}&fg=000000$ and variance equal to $latex {1/2}&fg=000000$. Then we get,

$latex \displaystyle \displaystyle \|f^\prime \|_2^2 = \frac{1}{\sqrt{4\pi}} \frac{1}{2} = \frac{1}{4\sqrt{\pi}}. &fg=000000$
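As a quick sanity check (a sketch, not part of the original derivation), we can verify this value numerically in R by integrating $latex {(f^\prime(x))^2 = (xf(x))^2}&fg=000000$ with `integrate`:

```r
# (f'(x))^2 for the standard normal density, since f'(x) = -x f(x)
f_prime_sq <- function(x) (-x * dnorm(x))^2

# Numerical value of || f' ||_2^2
numeric_norm <- integrate(f_prime_sq, -Inf, Inf)$value

# Closed-form value derived above
closed_form <- 1/(4*sqrt(pi))

# The two agree up to the integration tolerance
abs(numeric_norm - closed_form) < 1e-8
```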

Finally, using the results of Part III,

$latex \displaystyle h_{opt}= \left(\frac{6}{n\|f^\prime \|_2^2}\right)^{1/3} = \left(\frac{24\sqrt{\pi}}{n}\right)^{1/3}\approx 3.5\, n^{-1/3}. &fg=000000$

This $latex {h_{opt}}&fg=000000$ is not useful in many cases, since it assumes the true density is known, but it can work as a rule-of-thumb binwidth.
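For instance, with the sample size $latex {n=1000}&fg=000000$ used in the example below, the rule of thumb can be evaluated directly in R (a quick sketch):

```r
n <- 1000

# Exact rule-of-thumb binwidth for the standard normal density
h_opt <- (24*sqrt(pi)/n)^(1/3)

# The rounded version 3.5 * n^(-1/3)
h_approx <- 3.5 * n^(-1/3)

# h_opt is about 0.349, and the rounded constant gives about 0.35
```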

For the example, we will use a sample of 1000 normally distributed numbers and try to apply all the theory seen so far. I wrote this little script in R to compute the MSE and the MISE, and to plot the bias-variance trade-offs (note: this is my very first script in R, so I will appreciate any comments).

[sourcecode language="r"]

# Function to compute the histogram breaks for binwidth h

breaks <- function(x,h){

b = floor(min(x)/h) : ceiling(max(x)/h)

b = b*h

return(b)

}

# Generate 1000 standard normal random numbers

x=rnorm(1000)

n=length(x);

# Point to evaluate the MSE

x0 = 0

# Real value of f(x0)

f_x0 = dnorm(x0);

# || f' ||_2^2 for the standard normal distribution

norm_f_prime = 1/(4*sqrt(pi));

# Sequence of binwidths to evaluate

hvec = seq(0.1,0.7,by=0.0005)

Bias = numeric();

Var = numeric();

MSE = numeric();

Bias_MISE = numeric();

Var_MISE = numeric();

MISE = numeric();

par(mfrow=c(1,2))

hist(x,breaks=breaks(x,0.001),freq=F,xlab ="Bandwidth with h=0.001")

hist(x,breaks=breaks(x,2),freq=F,xlab ="Bandwidth with h=2")

par(mfrow=c(1,1))

for(h in hvec){

xhist = hist(x,breaks=breaks(x,h),plot=FALSE)

# Average the counts of the bins whose midpoints lie within h of x0

bins_near_x0 = abs(xhist$mids - x0) <= h;

p = mean(xhist$counts[bins_near_x0])/n;

# Expectation of \hat{f} in x0

E_fhat_x0 = p/h;

#Compute of Bias, Var, MSE and MISE

Bias = c(Bias,E_fhat_x0 - f_x0);

Var = c(Var,(p*(1-p))/(n*h^2))

MSE = c(MSE,tail(Var,1) + tail(Bias,1)^2)

Var_MISE = c(Var_MISE,1/(n*h))

Bias_MISE = c(Bias_MISE,h^2 * norm_f_prime /12)

MISE = c(MISE,tail(Var_MISE,1) + tail(Bias_MISE,1))

}

#Tradeoff MSE in x0

max_range = range(Bias^2,Var,MSE)

plot(hvec,MSE,ylim=max_range,type="l",col="blue",lwd=3, xlab="Bandwidth h")

lines(hvec,Bias^2,type="l",lty=2,col="red",lwd=3)

lines(hvec,Var,type="S",lty=6,col="black",lwd=3)

legend(x="topleft",legend=c("MSE","Bias^2","Var"),col=c("blue","red","black"),lwd=3,lty=c(1,2,6))

# Tradeoff MISE

max_range = range(Bias_MISE,Var_MISE,MISE)

plot(hvec,MISE,ylim=max_range,type="l",col="blue",lwd=3,xlab="Bandwidth h")

lines(hvec,Bias_MISE,type="l",lty=5,col="red",lwd=3)

lines(hvec,Var_MISE,type="S",lty=6,col="black",lwd=3)

legend(x="topleft",legend=c("MISE","Bias^2_MISE","Var_MISE"),col=c("blue","red","black"),lwd=3,lty=c(1,5,6))

# h optimal for the point x0

h_x0=hvec[MSE==min(MSE)]

# h optimal for any point using the minimal MISE

h_opt_MISE=hvec[MISE==min(MISE)]

# h optimal for any point using the rule-of-thumb

h_opt = (6/(n*norm_f_prime))^(1/3)

# Histogram with h_opt

breaks = floor((min(x))/h_opt):ceiling((max(x))/h_opt)

breaks = breaks*h_opt

# Plot the histogram

h=hist(x,breaks=breaks,freq=FALSE,ylim=c(0,0.5))

lines(sort(x),dnorm(sort(x)),type="l")

[/sourcecode]

To start, notice that if the binwidth is too small or too large we get a very bad approximation. As we can see in the next plot, the histogram fit is terrible for $latex h=0.001$ and $latex h=2$:

The plots of the MSE and MISE trade-offs are respectively

For the MSE at $latex x_0=0$ the optimal value is $latex h=0.255$. For the MISE, the optimal value found by minimization is $latex h=0.349$, while the exact value from the formula above is $latex h=0.34908$. Finally, we get the best-fitted histogram:
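We can also confirm that minimizing the theoretical MISE expression numerically reproduces the closed-form $latex {h_{opt}}&fg=000000$ (a sketch using R's `optimize` with the norm $latex {1/(4\sqrt{\pi})}&fg=000000$ computed above):

```r
n <- 1000
norm_f_prime <- 1/(4*sqrt(pi))

# Theoretical MISE of the histogram: integrated variance + integrated squared bias
mise <- function(h) 1/(n*h) + h^2 * norm_f_prime / 12

# Numerical minimizer vs. the closed-form rule
h_min <- optimize(mise, interval = c(0.01, 1))$minimum
h_formula <- (6/(n*norm_f_prime))^(1/3)

# The two agree within optimize's default tolerance
```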

Next week, we will start density estimation with more general kernels, and we will see that the histogram is, in fact, a particular case of one of them.

###### Related articles

- Density Estimation by Histograms (Part II) (maikolsolis.wordpress.com)
- Density Estimation by Histograms (Part III) (maikolsolis.wordpress.com)
