1. Preliminaries Given a random variable $latex {X}&fg=000000$, we define the cumulative distribution function (or distribution function) as follows,

We now briefly give the definition of a kernel density estimator in the multivariate case. Suppose that the data are d-dimensional, so that $latex {X_{i}=(X_{i1},\ldots,X_{id})}&fg=000000$. We will use the product kernel $latex \displaystyle \hat{f}_{h}(x)=\frac{1}{nh_{1}\cdots h_{d}}\sum_{i=1}^{n}\left\{ \prod_{j=1}^{d}K\left(\frac{x_{j}-X_{ij}}{h_{j}}\right)\right\} . &fg=000000$ The risk is given by $latex \displaystyle \mathrm{MISE}\approx\frac{\left(\mu_{2}(K)\right)^{2}}{4}\left[\sum_{j=1}^{d}h_{j}^{4}\int f_{jj}^{2}(x)dx+\sum_{j\neq k}h_{j}^{2}h_{k}^{2}\int f_{jj}f_{kk}dx\right]+\frac{\left(\int K^{2}(x)dx\right)^{d}}{nh_{1}\cdots h_{d}}, &fg=000000$ where $latex {\mu_{2}(K)=\int u^{2}K(u)du}&fg=000000$ and $latex {f_{jj}}&fg=000000$ is the second partial derivative of $latex {f}&fg=000000$ with respect to $latex {x_{j}}&fg=000000$.
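
The product-kernel formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the kernel $latex {K}&fg=000000$ is taken to be the standard normal density, and the names `gaussian_kernel` and `product_kernel_density` are hypothetical helpers, not part of any library.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, assumed here as the univariate kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def product_kernel_density(x, data, bandwidths, kernel=gaussian_kernel):
    """Evaluate the product-kernel density estimate at the point x.

    x          -- sequence of length d (evaluation point)
    data       -- list of n observations, each a sequence of length d
    bandwidths -- sequence (h_1, ..., h_d) of positive bandwidths
    """
    n = len(data)
    d = len(x)
    total = 0.0
    for obs in data:  # sum over the observations i = 1, ..., n
        prod = 1.0
        for j in range(d):  # product over the coordinates j = 1, ..., d
            prod *= kernel((x[j] - obs[j]) / bandwidths[j])
        total += prod
    # Normalise by n * h_1 * ... * h_d
    return total / (n * math.prod(bandwidths))


# Usage: three bivariate observations, one bandwidth per coordinate.
sample = [(0.0, 0.0), (1.0, 0.5), (-0.5, 1.0)]
estimate = product_kernel_density((0.2, 0.3), sample, (0.8, 0.6))
```

Using one bandwidth per coordinate lets each dimension be smoothed on its own scale, which is the point of the $latex {h_{1},\ldots,h_{d}}&fg=000000$ in the formula.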

Two popular methods to find the bandwidth $latex {h}&fg=000000$ for the nonparametric density estimator are the plug-in method and cross-validation. For the first, we will focus on the “quick and dirty” plug-in method introduced by Silverman (1986). In cross-validation, we minimize a modified version of the quadratic risk of $latex {\hat{f}_{h}}&fg=000000$. The …
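
Both approaches can be sketched for the one-dimensional case. The excerpt does not state the formulas, so the following rests on assumptions: a Gaussian kernel, the common normal-reference form of the rule of thumb, $latex {h=1.06\,\hat{\sigma}\,n^{-1/5}}&fg=000000$, and the usual least-squares cross-validation criterion minimised over a user-supplied grid. All function names are hypothetical.

```python
import math
import statistics


def normal_pdf(u, scale):
    """Density of N(0, scale^2) evaluated at u."""
    return math.exp(-0.5 * (u / scale) ** 2) / (scale * math.sqrt(2.0 * math.pi))


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth 1.06 * sigma_hat * n^(-1/5) (assumed form)."""
    n = len(sample)
    return 1.06 * statistics.stdev(sample) * n ** (-1.0 / 5.0)


def lscv_score(h, sample):
    """Least-squares cross-validation score for a Gaussian-kernel estimate.

    Computes  int f_hat^2  -  (2/n) * sum_i f_hat_{-i}(X_i),
    where f_hat_{-i} leaves out the i-th observation.
    """
    n = len(sample)
    integral_sq = 0.0
    leave_one_out = 0.0
    for i in range(n):
        for j in range(n):
            diff = sample[i] - sample[j]
            # Two Gaussian kernels of width h convolve to one of width h * sqrt(2).
            integral_sq += normal_pdf(diff, h * math.sqrt(2.0))
            if i != j:
                leave_one_out += normal_pdf(diff, h)
    return integral_sq / n**2 - 2.0 * leave_one_out / (n * (n - 1))


def cross_validated_bandwidth(sample, grid):
    """Pick the grid value with the smallest cross-validation score."""
    return min(grid, key=lambda h: lscv_score(h, sample))


# Usage on a small made-up sample.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
h_plug_in = rule_of_thumb_bandwidth(data)
h_cv = cross_validated_bandwidth(data, [0.25, 0.5, 1.0, 2.0])
```

The rule of thumb costs almost nothing but leans on the data looking roughly normal, while the cross-validation search makes no such assumption at the price of a quadratic-time score per candidate bandwidth.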

In the last post I forgot to say that we use Nikol'ski classes of densities because the MISE is a risk corresponding to the $latex {\mathbb L^2({\mathbb R})}&fg=000000$ norm. Thus, it is natural to assume that $latex {p}&fg=000000$ is smooth with respect to this norm. Another way to describe smoothness in $latex {\mathbb L^{2}({\mathbb R})}&fg=000000$ are …

Photos of Sergey Nikolskii, from the Russian Academy of Sciences. The MSE gives the error of the estimator $latex {\hat{p}_{n}}&fg=000000$ at an arbitrary point $latex {x_{0}}&fg=000000$, but it is also worth studying a global risk for $latex {\hat{p} _{n}}&fg=000000$. The mean integrated squared error (MISE) is an important global measure, $latex \displaystyle \mathrm{MISE}\triangleq\mathop{\mathbb E}_{p}\int\left(\hat{p} _{n}(x)-p(x)\right)^{2}dx &fg=000000$ …
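
The MISE can be approximated by simulation when the true density is known. The sketch below is an illustration only: it assumes $latex {p}&fg=000000$ is the standard normal density, uses a Gaussian kernel estimator for $latex {\hat{p}_{n}}&fg=000000$, replaces the integral by a Riemann sum on $latex {[-4,4]}&fg=000000$, and replaces the expectation by an average over simulated samples. The function names and default settings are hypothetical.

```python
import math
import random


def gaussian_kde(x, sample, h):
    """Gaussian-kernel density estimate at x with bandwidth h."""
    n = len(sample)
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)
    return total / (n * h * math.sqrt(2.0 * math.pi))


def standard_normal_pdf(x):
    """True density p, assumed standard normal for this experiment."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def monte_carlo_mise(n, h, replications=20, grid_size=81, lo=-4.0, hi=4.0, seed=0):
    """Average the integrated squared error over simulated samples of size n."""
    rng = random.Random(seed)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + k * step for k in range(grid_size)]
    total_ise = 0.0
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Riemann-sum approximation of the integrated squared error.
        ise = step * sum(
            (gaussian_kde(x, sample, h) - standard_normal_pdf(x)) ** 2 for x in grid
        )
        total_ise += ise
    return total_ise / replications


# Usage: compare the approximate MISE for a small and a larger sample size.
mise_small_n = monte_carlo_mise(n=10, h=0.5)
mise_large_n = monte_carlo_mise(n=200, h=0.5)
```

Unlike the MSE at a single point $latex {x_{0}}&fg=000000$, this number summarises the error over the whole real line, so it is a natural quantity to compare across bandwidths or sample sizes.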

I will summarize some ideas about nonparametric estimation, including some basic results needed to develop more advanced theory later. In the first post we talked about density estimation and nonparametric regression. Later, in the posts about the histogram (I, II, III, IV), we saw how the histogram is a nonparametric estimator and we studied its …

Today we will apply the ideas of the other posts to a simple example. First, we answer last week's question: what exactly is $latex {h_{opt}}&fg=000000$ if we assume that $latex \displaystyle f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(\frac{-x^2}{2}\right)? &fg=000000$ Here $latex {f(x)}&fg=000000$ is the density of the standard normal distribution. It is …

Before continuing with today’s post, we answer last week's question: is $latex {\hat{f}_{h}(x)}&fg=000000$ a consistent estimator? The answer is yes, because convergence in mean square implies convergence in probability.
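
The implication can be checked in one line with Markov's inequality applied to the squared error. This is a sketch that assumes the MSE of $latex {\hat{f}_{h}(x)}&fg=000000$ tends to zero as $latex {n\rightarrow\infty}&fg=000000$; the tolerance $latex {\varepsilon>0}&fg=000000$ is arbitrary and introduced here only for the argument: $latex \displaystyle P\left(\left|\hat{f}_{h}(x)-f(x)\right|>\varepsilon\right)\leq\frac{\mathop{\mathbb E}\left[\left(\hat{f}_{h}(x)-f(x)\right)^{2}\right]}{\varepsilon^{2}}=\frac{\mathrm{MSE}\left(\hat{f}_{h}(x)\right)}{\varepsilon^{2}}\longrightarrow0. &fg=000000$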

We continue our presentation on histogram estimation and its statistical properties. Today we start the theory for reducing the mean squared error. In order to study the statistical properties of $latex {\hat{f}_{h}(x)}&fg=000000$, we start by introducing the concept of mean squared error (MSE), or quadratic risk. We define

We are going to introduce the histogram as a simple nonparametric density estimator. For simplicity, I will divide this presentation into several posts. Let $latex {X_1,\ldots,X_n}&fg=000000$ be random variables with pdf $latex {f}&fg=000000$. The histogram is the simplest nonparametric estimator of $latex {f}&fg=000000$.
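
As a rough illustration of the histogram as a density estimator, here is a minimal Python sketch. It assumes equal-width bins $latex {[x_{0}+jh,\,x_{0}+(j+1)h)}&fg=000000$ with a user-chosen origin $latex {x_{0}}&fg=000000$ and binwidth $latex {h}&fg=000000$; this notation and the function name `histogram_density` are assumptions made for the example, not taken from the posts.

```python
import math


def histogram_density(x, sample, h, origin=0.0):
    """Histogram density estimate at x: (count in x's bin) / (n * h)."""
    n = len(sample)
    # Index of the bin [origin + j*h, origin + (j+1)*h) that contains x.
    bin_index = math.floor((x - origin) / h)
    count = sum(1 for xi in sample if math.floor((xi - origin) / h) == bin_index)
    return count / (n * h)


# Usage: four observations, bins of width 1 starting at 0.
observations = [0.1, 0.2, 0.7, 1.5]
print(histogram_density(0.5, observations, h=1.0))  # 0.75 (three of four points in [0, 1))
print(histogram_density(1.2, observations, h=1.0))  # 0.25 (one of four points in [1, 2))
```

Dividing the bin count by $latex {nh}&fg=000000$ rather than by $latex {n}&fg=000000$ alone is what makes the estimate integrate to one.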