Density Estimation by Histograms (Part IV)


Today we will apply the ideas of the previous posts to a simple example. First, though, let us answer last week's question.

What exactly is $latex {h_{opt}}&fg=000000$ if we assume that

$latex \displaystyle \displaystyle f(x) = \frac{1}{\sqrt{2\pi}} \text{exp}\left(\frac{-x^2}{2}\right)? &fg=000000$

Here $latex {f(x)}&fg=000000$ is the density of the standard normal distribution. It is easy to see that $latex {f^\prime(x)=(-x)f(x)}&fg=000000$, so we have

$latex \displaystyle \displaystyle \|f^\prime \|_2^2=\frac{1}{\sqrt{4\pi}}\int x^2 \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{\frac{1}{2}}} \text{exp}(-x^2)dx. &fg=000000$

We recognize the integral as the second moment (hence the variance) of a normal random variable with mean $latex {0}&fg=000000$ and variance $latex {1/2}&fg=000000$. Then we get

$latex \displaystyle \displaystyle \|f^\prime \|_2^2 = \frac{1}{\sqrt{4\pi}} \frac{1}{2} = \frac{1}{4\sqrt{\pi}}. &fg=000000$
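As a quick sanity check (this snippet is my own, not part of the derivation), the value $latex {1/(4\sqrt{\pi})\approx 0.1410}&fg=000000$ can be verified numerically in R with `integrate`:

[sourcecode language="r"]
# f'(x) = -x f(x), so ||f'||_2^2 = integral of x^2 f(x)^2
integrand <- function(x) (x * dnorm(x))^2
val <- integrate(integrand, -Inf, Inf)$value
val                 # approx 0.1410474
1/(4*sqrt(pi))      # approx 0.1410474
[/sourcecode]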

Finally, using the results of Part III,

$latex \displaystyle \displaystyle h_{opt}= \left(\frac{6}{n\|f^\prime \|_2^2}\right)^{1/3} = \left(\frac{24\sqrt{\pi}}{n}\right)^{1/3}\approx 3.5\, n^{-1/3}. &fg=000000$

This $latex {h_{opt}}&fg=000000$ is not useful in many cases (it assumes the data are roughly Gaussian), but it can serve as a rule-of-thumb binwidth.
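A minimal helper for this rule-of-thumb could look as follows (the function name `rot_binwidth` is my own, not a standard one):

[sourcecode language="r"]
# Rule-of-thumb binwidth, valid when the data look roughly Gaussian:
# h_opt = (24*sqrt(pi)/n)^(1/3), approx 3.5 * n^(-1/3)
rot_binwidth <- function(n) (24*sqrt(pi)/n)^(1/3)
rot_binwidth(1000)   # approx 0.349
[/sourcecode]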

For the example, we will use a sample of 1000 normally distributed numbers and try out all the theory seen so far. I wrote this little script in R to compute the MSE and MISE and to plot the bias-variance trade-off for each (note: this is my very first script in R, so I will appreciate any comments).

[sourcecode language="r"]
# Function to get the histogram breaks for a given binwidth h
breaks <- function(x, h){
  b <- floor(min(x)/h):ceiling(max(x)/h)
  b * h
}

# Generate 1000 standard normally distributed numbers
n <- 1000
x <- rnorm(n)
# Point at which to evaluate the MSE
x0 <- 0
# Real value of f(x0)
f_x0 <- dnorm(x0)
# || f' ||_2^2 for a standard normal distribution
norm_f_prime <- 1/(4*sqrt(pi))
# Sequence of binwidths
hvec <- seq(0.1, 0.7, by = 0.0005)
Bias <- numeric()
Var <- numeric()
MSE <- numeric()
Bias_MISE <- numeric()
Var_MISE <- numeric()
MISE <- numeric()

# Histograms with a too-small and a too-large binwidth
hist(x, breaks = breaks(x, 0.001), freq = FALSE, xlab = "Bandwidth with h=0.001")
hist(x, breaks = breaks(x, 2), freq = FALSE, xlab = "Bandwidth with h=2")

for(h in hvec){
  xhist <- hist(x, breaks = breaks(x, h), plot = FALSE)
  # Average of the bins near x0
  bins_near_x0 <- abs(xhist$mids - x0) <= h
  p <- mean(xhist$counts[bins_near_x0])/n
  # Expectation of \hat{f} at x0
  E_fhat_x0 <- p/h
  # Compute Bias, Var, MSE and MISE
  Bias <- c(Bias, E_fhat_x0 - f_x0)
  Var <- c(Var, (p*(1-p))/(n*h^2))
  MSE <- c(MSE, tail(Var,1) + tail(Bias,1)^2)
  Var_MISE <- c(Var_MISE, 1/(n*h))
  Bias_MISE <- c(Bias_MISE, h^2 * norm_f_prime / 12)
  MISE <- c(MISE, tail(Var_MISE,1) + tail(Bias_MISE,1))
}

# Trade-off of the MSE at x0
max_range <- range(Bias^2, Var, MSE)
plot(hvec, MSE, ylim = max_range, type = "l", col = "blue", lwd = 3, xlab = "Bandwidth h")
lines(hvec, Bias^2, col = "red", lwd = 3)
lines(hvec, Var, col = "green", lwd = 3)

# Trade-off of the MISE
max_range <- range(Bias_MISE, Var_MISE, MISE)
plot(hvec, MISE, ylim = max_range, type = "l", col = "blue", lwd = 3, xlab = "Bandwidth h")
lines(hvec, Bias_MISE, col = "red", lwd = 3)
lines(hvec, Var_MISE, col = "green", lwd = 3)

# h optimal at the point x0
h_opt_x0 <- hvec[which.min(MSE)]

# h optimal for any point, minimizing the MISE
h_opt_MISE <- hvec[which.min(MISE)]

# h optimal for any point, using the rule-of-thumb
h_opt <- (6/(n*norm_f_prime))^(1/3)

# Plot the histogram with h_opt
hist(x, breaks = breaks(x, h_opt), freq = FALSE, xlab = paste("Bandwidth with h =", h_opt))
[/sourcecode]

To start, notice that if the binwidth is too small or too large we get a very bad approximation. As we can see in the next plot, the histogram fit is terrible for $latex h=0.001$ and $latex h=2$:

Comparison of Histograms with different Binwidths

The plots of the MSE and MISE trade-offs are respectively

Trade-off Bias-Variance of Histogram in x0=0
Trade-off Bias-Variance for MISE

For the MSE at $latex x_0=0$ the optimal value is $latex h=0.255$. In the case of the MISE, the optimal value found by minimization is $latex h=0.349$, while the exact value, using the formula seen before, is $latex h=0.349083021225025$. Finally, we get the best-fitting histogram:
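As a sanity check (my own snippet, using the asymptotic MISE expression from Part III, $latex \text{MISE}(h)\approx \frac{1}{nh}+\frac{h^2\|f^\prime\|_2^2}{12}$), minimizing it numerically with `optimize` recovers the closed-form value:

[sourcecode language="r"]
n <- 1000
norm_f_prime <- 1/(4*sqrt(pi))
# Asymptotic MISE: variance term + squared-bias term
amise <- function(h) 1/(n*h) + h^2 * norm_f_prime/12
h_num <- optimize(amise, interval = c(0.01, 2))$minimum
h_closed <- (24*sqrt(pi)/n)^(1/3)
c(h_num, h_closed)   # both approx 0.349
[/sourcecode]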

Histogram of 1000 standard normally distributed numbers with h=0.349083021225025

Next week we will start density estimation with more general kernels, and we will see that the histogram is, in fact, a particular case of one of them.

