Let us briefly look at the definition of the kernel density estimator in the multivariate case.

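Suppose that the data $latex {X_{1},\ldots,X_{n}}&fg=000000$ are $latex {d}&fg=000000$-dimensional, so that $latex {X_{i}=(X_{i1},\ldots,X_{id})}&fg=000000$. We will use the product kernel estimator

$latex \displaystyle \hat{f}_{h}(x)=\frac{1}{nh_{1}\cdots h_{d}}\sum_{i=1}^{n}\left\{ \prod_{j=1}^{d}K\left(\frac{x_{j}-X_{ij}}{h_{j}}\right)\right\} , &fg=000000$

where $latex {K}&fg=000000$ is a univariate kernel and $latex {h_{j}}&fg=000000$ is the bandwidth for the $latex {j}&fg=000000$-th coordinate.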
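To make the formula concrete, here is a minimal NumPy sketch of the product kernel estimator evaluated at a single point. The function names (`gaussian_kernel`, `product_kernel_density`), the choice of a Gaussian kernel, and the bandwidth constant are assumptions made for illustration; they are not prescribed by the text above.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the univariate kernel K (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def product_kernel_density(x, data, bandwidths, kernel=gaussian_kernel):
    """Evaluate the product kernel density estimate at a single point x.

    x          : array of shape (d,)
    data       : array of shape (n, d), one observation X_i per row
    bandwidths : array of shape (d,), one bandwidth h_j per coordinate
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    h = np.asarray(bandwidths, dtype=float)
    n = data.shape[0]
    # u[i, j] = (x_j - X_ij) / h_j
    u = (x - data) / h
    # Product over coordinates j for each observation i, then sum over i
    kernel_products = np.prod(kernel(u), axis=1)
    return kernel_products.sum() / (n * np.prod(h))

# Illustration on simulated bivariate standard normal data
rng = np.random.default_rng(0)
sample = rng.standard_normal((2000, 2))
# Bandwidth of order n^{-1/(4+d)} with d = 2; the constant 1 is an arbitrary choice
h = np.full(2, 2000 ** (-1.0 / 6.0))
estimate = product_kernel_density(np.zeros(2), sample, h)
true_value = 1.0 / (2.0 * np.pi)  # bivariate standard normal density at the origin
print(estimate, true_value)
```

The estimate should land near the true density at the origin, with a small downward bias caused by smoothing at a peak.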

The risk is given by

$latex \displaystyle \mathrm{MISE}\approx\frac{\left(\mu_{2}(K)\right)^{2}}{4}\left[\sum_{j=1}^{d}h_{j}^{4}\int f_{jj}^{2}(x)dx+\sum_{j\neq k}h_{j}^{2}h_{k}^{2}\int f_{jj}(x)f_{kk}(x)dx\right]+\frac{\left(\int K^{2}(x)dx\right)^{d}}{nh_{1}\cdots h_{d}} &fg=000000$

where $latex {\mu_{2}(K)=\int u^{2}K(u)du}&fg=000000$ is the second moment of the kernel and $latex {f_{jj}}&fg=000000$ is the second partial derivative of $latex {f}&fg=000000$ with respect to $latex {x_{j}}&fg=000000$. The optimal bandwidth satisfies $latex {h_{j}=O(n^{-1/(4+d)})}&fg=000000$, leading to a risk of order $latex {O(n^{-4/(4+d)})}&fg=000000$ (for further details see Härdle (2004)).
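
As a rough sketch of where this rate comes from, assume for simplicity a common bandwidth $latex {h_{1}=\cdots=h_{d}=h}&fg=000000$ (a simplification made here, not part of the general result). The risk then takes the form

$latex \displaystyle \mathrm{MISE}(h)\approx C_{1}h^{4}+\frac{C_{2}}{nh^{d}}, &fg=000000$

where $latex {C_{1}}&fg=000000$ and $latex {C_{2}}&fg=000000$ are positive constants that depend on $latex {K}&fg=000000$ and $latex {f}&fg=000000$ but not on $latex {n}&fg=000000$. Setting the derivative with respect to $latex {h}&fg=000000$ equal to zero gives

$latex \displaystyle 4C_{1}h^{3}-\frac{dC_{2}}{nh^{d+1}}=0\quad\Longrightarrow\quad h^{*}=\left(\frac{dC_{2}}{4C_{1}n}\right)^{1/(4+d)}=O(n^{-1/(4+d)}). &fg=000000$

Substituting $latex {h^{*}}&fg=000000$ back, both terms are of the same order, since $latex {(h^{*})^{4}}&fg=000000$ and $latex {1/(n(h^{*})^{d})}&fg=000000$ are each $latex {O(n^{-4/(4+d)})}&fg=000000$.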

The interesting consequence of the rate $latex {O(n^{-4/(4+d)})}&fg=000000$ is that it deteriorates quickly as the dimension $latex {d}&fg=000000$ grows: the sample size needed to keep the risk at a fixed level increases exponentially with $latex {d}&fg=000000$. This behavior is known as the curse of dimensionality. Intuitively, the data become more sparse as the dimension increases. The following table from Silverman (1986) shows the sample size required to ensure a relative mean squared error less than 0.1 at the point $latex {x=0}&fg=000000$ when the density is multivariate normal and the optimal bandwidth is selected.

| Dimension | Sample size |
|-----------|-------------|
| 1         | 4           |
| 2         | 19          |
| 3         | 67          |
| 4         | 223         |
| 5         | 768         |
| 6         | 2,790       |
| 7         | 10,700      |
| 8         | 43,700      |
| 9         | 187,000     |
| 10        | 842,000     |
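
The same effect can be seen directly from the rate $latex {n^{-4/(4+d)}}&fg=000000$. The short sketch below asks which sample size in dimension $latex {d}&fg=000000$ gives the same rate as a reference sample size in dimension 1. It ignores all constants, so it will not reproduce Silverman's numbers; the helper names `risk_rate` and `equivalent_sample_size` and the reference size of 100 are hypothetical choices made for illustration.

```python
def risk_rate(n, d):
    """Order of the optimal risk, n^{-4/(4+d)}, ignoring constants."""
    return n ** (-4.0 / (4.0 + d))

def equivalent_sample_size(n_ref, d, d_ref=1):
    """Sample size in dimension d whose rate matches that of n_ref in dimension d_ref.

    Solves n^{-4/(4+d)} = n_ref^{-4/(4+d_ref)} for n.
    """
    return n_ref ** ((4.0 + d) / (4.0 + d_ref))

# Sample sizes that match the rate of n = 100 observations in one dimension
for d in range(1, 11):
    n_d = equivalent_sample_size(100, d)
    print(f"d = {d:2d}: n ~ {n_d:,.0f}")
```

Because the exponent $latex {(4+d)/5}&fg=000000$ is linear in $latex {d}&fg=000000$, the required sample size grows geometrically with the dimension.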

For this reason it is important to look for dimension reduction methods. One such method was proposed by Li (1991) in his article Sliced Inverse Regression for Dimension Reduction. I used this method to find another efficient estimator based on a Taylor approximation (see Solís Chacón, M et al. (2012)). In a future post I will talk a little about the details of those articles.

Sources: 