# A global measure of risk for kernel estimators in Nikolski classes

###### Photos of Sergey Nikolskii from The Russian Academy of Sciences

The MSE measures the error of the estimator $latex {\hat{p}_{n}}&fg=000000$ at an arbitrary fixed point $latex {x_{0}}&fg=000000$, but it is also worth studying a global risk for $latex {\hat{p} _{n}}&fg=000000$. An important global measure is the mean integrated squared error (MISE),

$latex \displaystyle \mathrm{MISE}\triangleq\mathop{\mathbb E}_{p}\int\left(\hat{p} _{n}(x)-p(x)\right)^{2}dx &fg=000000$

By Fubini's theorem and the bias-variance decomposition of the MSE, we have

$latex \displaystyle \mathrm{MISE}=\int\mathrm{MSE}(x)dx=\int b^{2}(x)dx+\int\sigma^{2}(x)dx. &fg=000000$
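As a quick sanity check of this decomposition, here is a minimal Python sketch (not part of the original derivation). It assumes a Gaussian kernel, a standard normal target density, and arbitrary illustrative values for `n`, `h` and the number of Monte Carlo replications `reps`; the helper names `gaussian_kernel` and `kde` are hypothetical. Integrals are approximated by Riemann sums on a grid, and expectations by averages over replications.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard Gaussian kernel (assumed here for illustration)
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(sample, grid, h):
    # p_hat(x) = (1 / (n h)) * sum_i K((X_i - x) / h), evaluated on the grid
    u = (sample[None, :] - grid[:, None]) / h
    return gaussian_kernel(u).mean(axis=1) / h

rng = np.random.default_rng(0)
n, h, reps = 200, 0.4, 300          # illustrative values, not from the post
grid = np.linspace(-6.0, 6.0, 601)
dx = grid[1] - grid[0]
p_true = gaussian_kernel(grid)      # standard normal density as the target

# One kernel estimate per replication, shape (reps, len(grid))
estimates = np.stack([kde(rng.standard_normal(n), grid, h) for _ in range(reps)])

# Empirical versions of b^2(x) and sigma^2(x) at each grid point
bias_sq = (estimates.mean(axis=0) - p_true) ** 2
variance = estimates.var(axis=0)

# Riemann-sum approximations of the three integrals
mise = ((estimates - p_true) ** 2).mean(axis=0).sum() * dx
int_bias_sq = bias_sq.sum() * dx
int_var = variance.sum() * dx

print(mise, int_bias_sq + int_var)
```

With empirical means and variances over the replications, the two printed numbers agree up to floating-point rounding, because the bias-variance identity holds pointwise for the empirical moments as well.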

As we did for the $latex {\mathrm{MSE}}&fg=000000$, we analyze separately the bias term $latex {\int b^{2}(x)dx}&fg=000000$ and the variance term $latex {\int\sigma^{2}(x)dx}&fg=000000$.

1.1. Variance term of MISE

Let us first study the variance term.

Proposition 1 Suppose that $latex {K:{\mathbb R}\rightarrow{\mathbb R}}&fg=000000$ is a function satisfying

$latex \displaystyle \int K^{2}(u)du<\infty. &fg=000000$

Then for any $latex {h>0}&fg=000000$, $latex {n\geq1}&fg=000000$ and any probability density $latex {p}&fg=000000$ we have

$latex \displaystyle \int\sigma^{2}(x)dx\leq\frac{1}{nh}\int K^{2}(u)du. &fg=000000$

Proof: In Proposition 2 of the last post we got

$latex \displaystyle \sigma^{2}(x)=\frac{1}{nh^{2}}\mathop{\mathrm{Var}}\left(K\left(\frac{X_{1}-x}{h}\right)\right)\leq\frac{1}{nh^{2}}\mathop{\mathbb E}_{p}\left[K^{2}\left(\frac{X_{1}-x}{h}\right)\right] &fg=000000$

for all $latex {x\in{\mathbb R}}&fg=000000$. Therefore

$latex \displaystyle \begin{array}{rl} \int\sigma^{2}(x)dx\leq & \displaystyle\frac{1}{nh^{2}}\int\left[\int K^{2}\left(\frac{z-x}{h}\right)p(z)dz\right]dx\\ = & \displaystyle\frac{1}{nh^{2}}\int p(z)\left[\int K^{2}\left(\frac{z-x}{h}\right)dx\right]dz\\ = & \displaystyle\frac{1}{nh}\int K^{2}\left(u\right)du. \end{array} &fg=000000$

$latex \Box&fg=000000$
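To see the bound of Proposition 1 at work, the following Python sketch evaluates both sides numerically. It is only an illustration under stated assumptions: a Gaussian kernel, a standard normal density $latex {p}&fg=000000$, the arbitrary values `n = 200` and `h = 0.4`, and Riemann sums in place of the integrals. The variance is computed from the exact formula $latex {\sigma^{2}(x)=\frac{1}{nh^{2}}\mathop{\mathrm{Var}}\left(K\left(\frac{X_{1}-x}{h}\right)\right)}&fg=000000$ by quadrature rather than by simulation.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard Gaussian kernel (assumed here for illustration)
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

n, h = 200, 0.4                     # illustrative values, not from the post
grid = np.linspace(-8.0, 8.0, 801)
dx = grid[1] - grid[0]
p = gaussian_kernel(grid)           # standard normal density as the target

# K((z - x) / h) for every pair (x, z); rows index x, columns index z
k = gaussian_kernel((grid[None, :] - grid[:, None]) / h)

# E[K^2((X_1 - x)/h)] and E[K((X_1 - x)/h)] by Riemann sums over z
second_moment = (k**2 * p[None, :]).sum(axis=1) * dx
first_moment = (k * p[None, :]).sum(axis=1) * dx

# sigma^2(x) = Var(K((X_1 - x)/h)) / (n h^2)
sigma2 = (second_moment - first_moment**2) / (n * h**2)
int_var = sigma2.sum() * dx

# Right-hand side of Proposition 1: (1 / (n h)) * int K^2(u) du
k_sq_integral = (gaussian_kernel(grid) ** 2).sum() * dx
bound = k_sq_integral / (n * h)

print(int_var, bound)
```

For the Gaussian kernel, $latex {\int K^{2}(u)du=1/(2\sqrt{\pi})}&fg=000000$, so the bound can also be written in closed form. The integrated variance stays below it because the proof discards the squared-mean term of the variance.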

1.2. Bias term of MISE

The bias term can be controlled only on a suitable class of smooth densities. Here we assume that $latex {p}&fg=000000$ belongs to a Nikol’ski class of functions, defined as follows.

Definition 2 Let $latex {\beta>0}&fg=000000$ and $latex {L>0}&fg=000000$. The Nikol’ski class $latex {\mathcal{H}(\beta,L)}&fg=000000$ is the set of functions $latex {f:{\mathbb R}\rightarrow{\mathbb R}}&fg=000000$ whose derivatives $latex {f^{(l)}}&fg=000000$ of order $latex {l=\left\lfloor \beta\right\rfloor }&fg=000000$ exist and satisfy

$latex \displaystyle \left[\int\left(f^{(l)}(x+t)-f^{(l)}(x)\right)^{2}dx\right]^{1/2}\leq L\left|t\right|^{\beta-l},\quad\forall t\in{\mathbb R}. &fg=000000$

The next inequality will be very useful in Proposition 3.

Lemma (Generalized Minkowski inequality): For any Borel function $latex {g}&fg=000000$ on $latex {{\mathbb R}\times{\mathbb R}}&fg=000000$, we have

$latex \displaystyle \int\left(\int g(u,x)du\right)^{2}dx\leq\left[\int\left(\int g^{2}(u,x)dx\right)^{1/2}du\right]^{2}. &fg=000000$
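A small numerical illustration of the generalized Minkowski inequality may help fix ideas. The Python sketch below is only a check on one hypothetical test function, $latex {g(u,x)=e^{-(u^{2}+x^{2})}(1+\sin(ux))}&fg=000000$, chosen for this example; both integrals are replaced by Riemann sums on a grid.

```python
import numpy as np

u = np.linspace(-5.0, 5.0, 401)
x = np.linspace(-5.0, 5.0, 401)
du, dx = u[1] - u[0], x[1] - x[0]

# Rows index u, columns index x
U, X = np.meshgrid(u, x, indexing="ij")
g = np.exp(-(U**2 + X**2)) * (1.0 + np.sin(U * X))   # hypothetical test function

# Left-hand side: integrate over u first, square, then integrate over x
lhs = ((g.sum(axis=0) * du) ** 2).sum() * dx

# Right-hand side: L2 norm in x for each u, integrate over u, then square
rhs = (np.sqrt((g**2).sum(axis=1) * dx).sum() * du) ** 2

print(lhs, rhs)
```

The discretized inequality holds exactly as well, since it is the triangle inequality for a weighted Euclidean norm applied to the sum over `u`.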

We will assume that $latex {p}&fg=000000$ belongs to the following class of densities

$latex \displaystyle \mathcal{P_{H}}(\beta,L)=\left\{ p\in\mathcal{H}(\beta,L)\left|p\geq0\quad\text{and}\quad\int p(x)dx=1\right.\right\} . &fg=000000$

Proposition 3 Assume that $latex {p\in\mathcal{P_{H}}(\beta,L)}&fg=000000$ and let $latex {K}&fg=000000$ be a kernel of order $latex {l=\left\lfloor \beta\right\rfloor }&fg=000000$ satisfying

$latex \displaystyle \int|u|^{\beta}|K(u)|du<\infty. &fg=000000$

Then, for any $latex {h>0}&fg=000000$ and $latex {n\geq1}&fg=000000$,

$latex \displaystyle \int b^{2}(x)dx\leq C_{2}^{2}h^{2\beta}, &fg=000000$

where

$latex \displaystyle C_{2}=\frac{L}{l!}\int|u|^{\beta}|K(u)|du. &fg=000000$

Proof: For any $latex {x\in{\mathbb R}}&fg=000000$, $latex {u\in{\mathbb R}}&fg=000000$ and $latex {h>0}&fg=000000$, write the Taylor expansion

$latex \displaystyle p(x+uh)=p(x)+p^{\prime}(x)uh+\cdots+\frac{(uh)^{l}}{(l-1)!}\int_{0}^{1}(1-\tau)^{l-1}p^{(l)}(x+\tau uh)d\tau. &fg=000000$

Since the kernel $latex {K}&fg=000000$ is of order $latex {l=\left\lfloor \beta\right\rfloor }&fg=000000$ we obtain

$latex \displaystyle \begin{array}{rl} b(x)= & \displaystyle\int K(u)\frac{(uh)^{l}}{(l-1)!}\left[\int_{0}^{1}(1-\tau)^{l-1}p^{(l)}(x+\tau uh)d\tau\right]du\\ = & \displaystyle\int K(u)\frac{(uh)^{l}}{(l-1)!}\left[\int_{0}^{1}(1-\tau)^{l-1}\left(p^{(l)}(x+\tau uh)-p^{(l)}(x)\right)d\tau\right]du \end{array} &fg=000000$

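Applying the generalized Minkowski inequality twice and using the fact that $latex {p}&fg=000000$ belongs to the class $latex {\mathcal{H}(\beta,L)}&fg=000000$ (together with $latex {|\tau uh|^{\beta-l}\leq|uh|^{\beta-l}}&fg=000000$ for $latex {\tau\in[0,1]}&fg=000000$ and $latex {\int_{0}^{1}(1-\tau)^{l-1}d\tau=1/l}&fg=000000$), we get the following upper bound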

$latex \displaystyle \begin{array}{rl} \int b^{2}(x)dx\leq & \int\left(\int|K(u)|\frac{|uh|^{l}}{(l-1)!}\right.\\ & \left.\left[\int_{0}^{1}(1-\tau)^{l-1}\left|p^{(l)}(x+\tau uh)-p^{(l)}(x)\right|d\tau\right]du\right)^{2}dx\\ \leq & \left(\int\left(\int\left(|K(u)|\frac{|uh|^{l}}{(l-1)!}\right)^{2}\right.\right.\\ & \left.\left.\left[\int_{0}^{1}(1-\tau)^{l-1}\left|p^{(l)}(x+\tau uh)-p^{(l)}(x)\right|d\tau\right]^{2}dx\right)^{\frac{1}{2}}du\right)^{2}\\ \leq & \left(\int|K(u)|\frac{|uh|^{l}}{(l-1)!}\right.\\ & \left.\left(\int\left[\int_{0}^{1}(1-\tau)^{l-1}\left|p^{(l)}(x+\tau uh)-p^{(l)}(x)\right|d\tau\right]^{2}dx\right)^{\frac{1}{2}}du\right)^{2}\\ \leq & \left(\int|K(u)|\frac{|uh|^{l}}{(l-1)!}\right.\\ & \left.\left(\int_{0}^{1}(1-\tau)^{l-1}\left[\int\left(p^{(l)}(x+\tau uh)-p^{(l)}(x)\right)^{2}dx\right]^{\frac{1}{2}}d\tau\right)du\right)^{2}\\ \leq & \left(\int|K(u)|\frac{|uh|^{l}}{(l-1)!}\left(\int_{0}^{1}(1-\tau)^{l-1}L|uh|^{\beta-l}d\tau\right)du\right)^{2}\\ = & C_{2}^{2}h^{2\beta}. \end{array} &fg=000000$

$latex \Box&fg=000000$

Combining Propositions 1 and 3, we find

$latex \displaystyle \mathrm{MISE}\leq C_{2}^{2}h^{2\beta}+\frac{1}{nh}\int K^{2}(u)du, &fg=000000$

and minimizing the right-hand side with respect to $latex {h}&fg=000000$ we get

$latex \displaystyle h_{n}^{*}=\left(\frac{\int K^{2}(u)du}{2\beta C_{2}^{2}}\right)^{1/(2\beta+1)}n^{-1/(2\beta+1)}. &fg=000000$

Lastly, taking $latex {h=h_{n}^{*}}&fg=000000$ we see that

$latex \displaystyle \mathrm{MISE}=O\left(n^{-2\beta/(2\beta+1)}\right),\quad n\rightarrow\infty. &fg=000000$
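The minimization and the resulting rate can be checked numerically. The Python sketch below is an illustration only: the values `beta = 2`, `c2 = 1` and the Gaussian value $latex {\int K^{2}(u)du=1/(2\sqrt{\pi})}&fg=000000$ are arbitrary assumptions (they are not claimed to come from an actual kernel of order $latex {l}&fg=000000$), and the function names `mise_bound` and `optimal_bandwidth` are hypothetical. It compares the closed-form $latex {h_{n}^{*}}&fg=000000$ with a grid search and verifies that the bound at $latex {h_{n}^{*}}&fg=000000$, multiplied by $latex {n^{2\beta/(2\beta+1)}}&fg=000000$, does not depend on $latex {n}&fg=000000$.

```python
import numpy as np

def mise_bound(h, n, beta, c2, k_sq_integral):
    """Upper bound C_2^2 h^(2 beta) + (1 / (n h)) * int K^2(u) du."""
    return c2**2 * h ** (2 * beta) + k_sq_integral / (n * h)

def optimal_bandwidth(n, beta, c2, k_sq_integral):
    """Closed-form minimizer h_n^* of the bound above."""
    exponent = 1.0 / (2 * beta + 1)
    return (k_sq_integral / (2 * beta * c2**2)) ** exponent * n ** (-exponent)

# Illustrative constants (assumptions made for this example only)
beta, c2 = 2.0, 1.0
k_sq_integral = 1.0 / (2.0 * np.sqrt(np.pi))

# Compare the closed form with a brute-force grid search for one sample size
n = 1000
h_star = optimal_bandwidth(n, beta, c2, k_sq_integral)
hs = np.linspace(0.5 * h_star, 1.5 * h_star, 2001)
h_grid = hs[np.argmin(mise_bound(hs, n, beta, c2, k_sq_integral))]

# Check the rate: bound(h_n^*) * n^(2 beta / (2 beta + 1)) should be constant in n
sample_sizes = np.array([1e2, 1e3, 1e4, 1e5])
h_stars = optimal_bandwidth(sample_sizes, beta, c2, k_sq_integral)
scaled = (mise_bound(h_stars, sample_sizes, beta, c2, k_sq_integral)
          * sample_sizes ** (2 * beta / (2 * beta + 1)))

print(h_star, h_grid)
print(scaled)
```

At $latex {h=h_{n}^{*}}&fg=000000$ both terms of the bound are proportional to $latex {n^{-2\beta/(2\beta+1)}}&fg=000000$, which is why the rescaled values coincide across sample sizes.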

This is exactly the same rate as the one obtained for the $latex {\mathrm{MSE}}&fg=000000$.

Recapitulating, under the assumptions of Propositions 1 and 3, if $latex {h=\alpha n^{-1/(2\beta+1)}}&fg=000000$ for some $latex {\alpha>0}&fg=000000$, then the kernel estimator $latex {\hat{p} _{n}}&fg=000000$ satisfies

$latex \displaystyle \sup_{p\in\mathcal{P_{H}}(\beta,L)}\mathop{\mathbb E}_{p}\int\left(\hat{p} _{n}(x)-p(x)\right)^{2}dx\leq Cn^{-2\beta/(2\beta+1)}, &fg=000000$

where $latex {C>0}&fg=000000$ is a constant depending only on $latex {\beta}&fg=000000$, $latex {L}&fg=000000$, $latex {\alpha}&fg=000000$ and the kernel $latex {K}&fg=000000$.

Next time we will change the space of densities and see what happens there. Also, if time allows, we will look at the main problem of kernel estimators and some ideas on how to fix it.

As always, any comment, suggestion or idea is welcome.

Happy 2012 to all, and see you next year!

Source:
Tsybakov, A. (2009). Introduction to nonparametric estimation. Springer.