JdS 2012: Efficient estimation of conditional covariance matrices for dimension reduction

In the framework of the Journées de Statistique 2012 in Brussels, I presented the paper “Efficient estimation of conditional covariance matrices”, written under the direction of Jean-Michel Loubes and Clément Marteau. You can check the program and the slides of the presentation.

Today I will present some ideas about the problem we studied and the solution we found in our research.

[Figure: Simulations from the article of Li (1991)]

1. Background: Sliced Inverse Regression

Consider the problem

$latex \displaystyle Y=\varphi(X)+\epsilon, &fg=000000$

where $latex {X\in{\mathbb R}^{p}}&fg=000000$, $latex {Y\in{\mathbb R}}&fg=000000$ and $latex {\mathbb E\epsilon=0}&fg=000000$. If $latex {p}&fg=000000$ is large, we face the curse of dimensionality.

The idea of Li (1991) was to reformulate the above regression problem as

$latex \displaystyle Y=\phi(\beta_{1}^{\top}X,\ldots,\beta_{K}^{\top}X,\epsilon), &fg=000000$

where the $latex {\beta}&fg=000000$’s are unknown vectors in $latex {{\mathbb R}^{p}}&fg=000000$, $latex {\epsilon}&fg=000000$ is independent of $latex {X}&fg=000000$ and $latex {\phi}&fg=000000$ is an arbitrary function on $latex {{\mathbb R}^{K+1}.}&fg=000000$ This model gathers all the relevant information about the variable $latex {Y}&fg=000000$ through the projection of $latex {X}&fg=000000$ onto the $latex {K\ll p}&fg=000000$ dimensional subspace $latex {(\beta_{1}^{\top}X,\ldots,\beta_{K}^{\top}X)}&fg=000000$. When $latex {K}&fg=000000$ is small, it is thus possible to reduce the dimension by estimating the $latex {\beta}&fg=000000$’s efficiently.
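As a concrete illustration, here is a minimal simulation of such a model in Python/NumPy. The particular choices below ($latex {p=10}&fg=000000$, $latex {K=2}&fg=000000$ and the function $latex {\phi}&fg=000000$) are arbitrary and only meant to produce data of the right form; they are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative choices: p = 10 covariates, K = 2 e.d.r. directions.
n, p = 500, 10
beta1 = np.zeros(p); beta1[0] = 1.0   # first direction
beta2 = np.zeros(p); beta2[1] = 1.0   # second direction

X = rng.standard_normal((n, p))       # elliptically symmetric design
eps = 0.1 * rng.standard_normal(n)

# Y depends on X only through the two projections beta1'X and beta2'X.
Y = np.sin(X @ beta1) + (X @ beta2) ** 2 + eps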

Li showed that the eigenvectors of $latex {\mathop{\mathrm{Cov}}\left(\mathbb E\left(X\vert Y\right)\right)}&fg=000000$ span the same subspace as the $latex {\beta}&fg=000000$’s. We will call them the effective dimension reduction directions (e.d.r.d.’s). The natural question is: why do these $latex {\beta}&fg=000000$’s span the desired subspace?

1.1. Inverse Regression Curve

We will assume the following linearity condition: for any $latex {b\in\mathbb{R}^{p}}&fg=000000$, there are constants $latex {c_{0},\ldots,c_{K}}&fg=000000$ such that

$latex \displaystyle \mathbb{E}(b^{\top}X|\beta_{1}^{\top}X,\ldots,\beta_{K}^{\top}X)=c_{0}+c_{1}\beta_{1}^{\top}X+\cdots+c_{K}\beta_{K}^{\top}X. &fg=000000$

This condition holds when $latex {X}&fg=000000$ has an elliptically symmetric distribution (e.g. the normal distribution). Li showed that

 Under the model $latex {Y=\phi(\beta_{1}^{\top}X,\ldots,\beta_{K}^{\top}X,\epsilon)}&fg=000000$ and Condition 1, the centered inverse regression curve $latex {\mathbb{E}(X|Y)-\mathbb{E}(X)}&fg=000000$ belongs to the linear subspace spanned by $latex {\Sigma_{XX}\beta_{k}\ (k=1,\ldots,K)}&fg=000000$, where $latex {\Sigma_{XX}}&fg=000000$ is the covariance matrix of $latex {X}&fg=000000$.

Assume that $latex {X}&fg=000000$ has been standardized to $latex {Z=\Sigma_{XX}^{-1/2}\left(X-\mathbb{E}X\right)}&fg=000000$ and define $latex {\eta_{k}=\Sigma_{XX}^{1/2}\beta_{k}}&fg=000000$. Then:

Under the same hypotheses as in Theorem 1, the standardized inverse regression curve $latex {\mathbb{E}(Z|Y)}&fg=000000$ is contained in the linear subspace spanned by $latex {\eta_{1},\ldots,\eta_{K}}&fg=000000$ (the standardized e.d.r. directions).

With this Corollary we conclude that $latex {\mathop{\mathrm{Cov}}\left(\mathbb E\left(Z\vert Y\right)\right)}&fg=000000$ is degenerate in any direction orthogonal to the $latex {\eta_{k}}&fg=000000$’s. Indeed, let $latex {v\perp\mathrm{span}(\eta_{1},\ldots,\eta_{K})}&fg=000000$. By Corollary 1, $latex {\mathbb E\left(Z\vert Y\right)^{\top}v=0}&fg=000000$. Thus, $latex {\mathop{\mathrm{Cov}}\left(\mathbb E\left(Z\vert Y\right)\right)v=\mathbb E\left(\mathbb E\left(Z\vert Y\right)\mathbb E\left(Z\vert Y\right)^{\top}\right)v=0}&fg=000000$.

This means that the eigenvectors $latex {\eta_{1},\ldots,\eta_{K}}&fg=000000$ associated with the $latex {K}&fg=000000$ largest eigenvalues of $latex {\mathop{\mathrm{Cov}}\left(\mathbb E\left(Z\vert Y\right)\right)}&fg=000000$ span the standardized e.d.r. subspace. Therefore, $latex {\Sigma_{XX}^{-1/2}\eta_{k}\ (k=1,\ldots,K)}&fg=000000$ are the e.d.r. directions.

The main issue is then to estimate $latex {\mathop{\mathrm{Cov}}\left(\mathbb E\left(Z\vert Y\right)\right)}&fg=000000$. For this, Li proposed the following algorithm:

Given a sample $latex {(X_{i},Y_{i}),\ i=1,\ldots,n}&fg=000000$, the algorithm proceeds as follows (a Python sketch is given after the list):

  • $latex {\left(i\right)}&fg=000000$ Normalize $latex {X_{i}}&fg=000000$ to $latex {Z_{i}=\hat{\Sigma}^{-1/2}(X_{i}-\bar{X})}&fg=000000$, where $latex {\hat{\Sigma}}&fg=000000$ and $latex {\bar{X}}&fg=000000$ are the empirical covariance and mean, respectively.
  • $latex {\left(ii\right)}&fg=000000$ Divide the range of $latex {Y}&fg=000000$ into $latex {H}&fg=000000$ slices, $latex {I_{1},\ldots,I_{H}}&fg=000000$. Denote by $latex {\hat{p}_{h}}&fg=000000$ the proportion of $latex {Y_{i}}&fg=000000$’s which belong to $latex {I_{h}}&fg=000000$.
  • $latex {\left(iii\right)}&fg=000000$ In each slice, compute the empirical mean of the $latex {Z_{i}}&fg=000000$’s, that is, $latex {{\displaystyle \hat{m}_{h}=\frac{1}{n\hat{p}_{h}}\sum_{Y_{i}\in I_{h}}Z_{i}}}&fg=000000$.
  • $latex {\left(iv\right)}&fg=000000$ Build the matrix $latex {\hat{V}=\sum_{h=1}^{H}\hat{p}_{h}\hat{m}_{h}\hat{m}_{h}^{\top}}&fg=000000$, then compute the eigenvalues and eigenvectors of $latex {\hat{V}}&fg=000000$ (weighted principal component analysis).
  • $latex {\left(v\right)}&fg=000000$ Take the eigenvectors associated with the $latex {K}&fg=000000$ largest eigenvalues and call them $latex {\eta_{k}\ (k=1,\ldots,K)}&fg=000000$. The output is $latex {\hat{\beta}_{k}=\hat{\Sigma}^{-1/2}\eta_{k}}&fg=000000$.
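The list above translates almost line by line into code. Below is a minimal sketch of the algorithm in Python/NumPy; it is not the code used for the paper, and the function name sir_directions, the equal-count slicing of the range of $latex {Y}&fg=000000$ and the default $latex {H=10}&fg=000000$ are my own illustrative choices.

import numpy as np

def sir_directions(X, Y, H=10, K=2):
    """Sliced Inverse Regression (Li, 1991): estimate K e.d.r. directions.
    X is an (n, p) array of covariates, Y an (n,) array of responses.
    Returns the estimated directions as the columns of a (p, K) array."""
    n, p = X.shape

    # (i) Standardize: Z_i = Sigma_hat^{-1/2} (X_i - X_bar)
    X_bar = X.mean(axis=0)
    Sigma_hat = np.cov(X, rowvar=False)
    w, U = np.linalg.eigh(Sigma_hat)
    Sigma_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
    Z = (X - X_bar) @ Sigma_inv_sqrt

    # (ii) Slice the range of Y into H slices with (roughly) equal counts
    slices = np.array_split(np.argsort(Y), H)

    # (iii)-(iv) Slice means and V_hat = sum_h p_hat_h m_hat_h m_hat_h^T
    V_hat = np.zeros((p, p))
    for idx in slices:
        p_h = len(idx) / n
        m_h = Z[idx].mean(axis=0)
        V_hat += p_h * np.outer(m_h, m_h)

    # (v) Eigenvectors of V_hat for the K largest eigenvalues, mapped back
    #     to the original scale: beta_hat_k = Sigma_hat^{-1/2} eta_k
    vals, eta = np.linalg.eigh(V_hat)
    eta_K = eta[:, np.argsort(vals)[::-1][:K]]
    return Sigma_inv_sqrt @ eta_K

Applied to the simulated data of the previous sketch, sir_directions(X, Y, H=10, K=2) should return a matrix whose two columns approximately span the space of beta1 and beta2 (up to sign and scale).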

2. New estimator
So far, we have only discussed the motivation for the dimension reduction problem. To solve it, we will build a new type of estimator for the matrix

$latex \displaystyle \mathop{\mathrm{Cov}}\left(\mathbb E\left(X\vert Y\right)\right)=\mathbb E\left(\mathbb E\left(X\vert Y\right)\mathbb E\left(X\vert Y\right)^{\top}\right)-\left(\mathbb E X\right)\left(\mathbb E X\right)^{\top}, &fg=000000$

using ideas developed by Da Veiga & Gamboa (2012), inspired by the earlier work of Laurent (1996). More precisely, since $latex {\left(\mathbb E X\right)\left(\mathbb E X\right)^{\top}}&fg=000000$ can easily be estimated by many standard methods, we will focus on finding an estimator of $latex {\mathbb E\left(\mathbb E\left(X\vert Y\right)\mathbb E\left(X\vert Y\right)^{\top}\right)}&fg=000000$. We will show that this reduces to the estimation of a quadratic functional. The method has the advantage of yielding an efficient estimator in a semi-parametric framework.
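In passing, the matrix identity displayed above is nothing more than the usual covariance formula applied to $latex {W=\mathbb E\left(X\vert Y\right)}&fg=000000$, combined with the tower property:

$latex \displaystyle \mathop{\mathrm{Cov}}(W)=\mathbb E\left(WW^{\top}\right)-\left(\mathbb E W\right)\left(\mathbb E W\right)^{\top},\qquad\mathbb E W=\mathbb E\left(\mathbb E\left(X\vert Y\right)\right)=\mathbb E X. &fg=000000$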

In the next paragraphs I will give a short summary of the method used to build this estimator and study its asymptotic properties.

3. Methodology for this new estimator

We can estimate the quadratic term efficiently using the nonlinear functional $latex {f\mapsto T_{ij}(f)}&fg=000000$, defined for square-integrable functions $latex {f\in\mathbb L^{2}(dx_{i}dx_{j}dy)}&fg=000000$ by

$latex \displaystyle \int\left(\frac{\int x_{i}f(x_{i},x_{j},y)dx_{i}dx_{j}}{\int f(x_{i},x_{j},y)dx_{i}dx_{j}}\right)\left(\frac{\int x_{j}f(x_{i},x_{j},y)dx_{i}dx_{j}}{\int f(x_{i},x_{j},y)dx_{i}dx_{j}}\right)f(x_{i},x_{j},y)dx_{i}dx_{j}dy. \ \ \ \ \ (1)&fg=000000$
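To see why this functional is the right object, note that if $latex {f}&fg=000000$ is the joint density of $latex {(X_{i},X_{j},Y)}&fg=000000$, then for each fixed $latex {y}&fg=000000$

$latex \displaystyle \frac{\int x_{i}f(x_{i},x_{j},y)dx_{i}dx_{j}}{\int f(x_{i},x_{j},y)dx_{i}dx_{j}}=\mathbb E\left(X_{i}\vert Y=y\right), &fg=000000$

so integrating the product of the two conditional means against $latex {f}&fg=000000$ yields $latex {T_{ij}(f)=\mathbb E\left(\mathbb E\left(X_{i}\vert Y\right)\mathbb E\left(X_{j}\vert Y\right)\right)}&fg=000000$, which is exactly the $latex {(i,j)}&fg=000000$ entry of the quadratic term above.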

Da Veiga & Gamboa (2012) and Laurent (1996) considered this kind of estimator in the case $latex {i=j}&fg=000000$; here we extend their method to the present, more general setting. Assume we have at hand an i.i.d. sample $latex {(X_{i}^{(k)},X_{j}^{(k)},Y^{(k)}),\ k=1,\ldots,n}&fg=000000$, from which a preliminary estimator $latex {\hat{f}}&fg=000000$ of $latex {f}&fg=000000$ can be built on a subsample of size $latex {n_{1}<n}&fg=000000$. The Taylor expansion of $latex {T_{ij}(f)}&fg=000000$ in a neighborhood of $latex {\hat{f}}&fg=000000$ is:

$latex \displaystyle T_{ij}(f)=\int H_{1}(\hat{f},x_{i},x_{j},y)f(x_{i},x_{j},y)dx_{i}dx_{j}dy
\\+\int H_{2}(\hat{f},x_{i1},x_{j2},y)f(x_{i1},x_{j1},y)f(x_{i2},x_{j2},y)dx_{i1}dx_{j1}dx_{i2}dx_{j2}dy+\Gamma_{n}&fg=000000$

where $latex {H_{1}(\hat{f},x_{i},x_{j},y)}&fg=000000$ is a linear functional, $latex {H_{2}(\hat{f},x_{i1},x_{j2},y)}&fg=000000$ is a quadratic one and $latex {\Gamma_{n}}&fg=000000$ is an error term.
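To fix ideas about the shape of such an expansion, consider the prototypical one-dimensional quadratic functional $latex {\theta(f)=\int f^{2}}&fg=000000$, of the kind studied by Laurent (1996); this is only an illustration, not the $latex {H_{1}}&fg=000000$ and $latex {H_{2}}&fg=000000$ of our $latex {T_{ij}}&fg=000000$. One has exactly

$latex \displaystyle \int f^{2}=\int 2\hat{f}(x)f(x)dx-\int\hat{f}^{2}+\int\bigl(f(x)-\hat{f}(x)\bigr)^{2}dx, &fg=000000$

so the linear part can be estimated by an empirical mean once $latex {\hat{f}}&fg=000000$ is available, and all the difficulty lies in the quadratic part $latex {\int(f-\hat{f})^{2}}&fg=000000$, which is where the projection onto a finite orthonormal basis enters, as in the next section.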

4. Efficient Estimation of $latex {T_{ij}(f)}&fg=000000$

We build the following estimator by projecting $latex {f}&fg=000000$ onto an orthonormal basis $latex {(p_{l}(x_{i},x_{j},y))_{l\in D}}&fg=000000$ of $latex {\mathbb{L}^{2}(dx_{i}dx_{j}dy)}&fg=000000$, where $latex {D}&fg=000000$ is countable and $latex {M_{n}\subseteq D}&fg=000000$ is a finite subset of indices:

$latex \displaystyle\widehat{T}_{ij}^{(n)}=\frac{1}{n_{2}}\sum_{k=1}^{n_{2}}H_{1}(\hat{f},X_{i}^{(k)},X_{j}^{(k)},Y^{(k)})\\
+\frac{1}{n_{2}(n_{2}-1)}\sum_{l\in M_{n}}\sum_{k\neq k^{\prime}=1}^{n_{2}}p_{l}(X_{i}^{(k)},X_{j}^{(k)},Y^{(k)})\\
\int p_{l}(x_{i},x_{j},Y^{(k^{\prime})})H_{3}(\hat{f},x_{i},x_{j},X_{i}^{(k^{\prime})},X_{j}^{(k^{\prime})},Y^{(k^{\prime})})dx_{i}dx_{j}\\
-\frac{1}{n_{2}(n_{2}-1)}\sum_{l,l^{\prime}\in M_{n}}\sum_{k\neq k^{\prime}=1}^{n_{2}}p_{l}(X_{i}^{(k)},X_{j}^{(k)},Y^{(k)})p_{l^{\prime}}(X_{i}^{(k^{\prime})},X_{j}^{(k^{\prime})},Y^{(k^{\prime})})\\
\int p_{l}(x_{i1},x_{j1},y)p_{l^{\prime}}(x_{i2},x_{j2},y)H_{2}(\hat{f},x_{i1},x_{j2},y)dx_{i1}dx_{j1}dx_{i2}dx_{j2}dy,&fg=000000$

where $latex {H_{3}(f,x_{i1},x_{j1},x_{i2},x_{j2},y)=H_{2}(f,x_{i1},x_{j2},y)+H_{2}(f,x_{i2},x_{j1},y)}&fg=000000$, $latex {n_{2}=n-n_{1}}&fg=000000$ and $latex {\Gamma_{n}=O(1/n)}&fg=000000$.
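To make the structure of $latex {\widehat{T}_{ij}^{(n)}}&fg=000000$ more concrete, here is a hedged Python/NumPy sketch for the much simpler one-dimensional functional $latex {\theta(f)=\int f^{2}}&fg=000000$ discussed above, not for $latex {T_{ij}}&fg=000000$ itself. It reproduces the same ingredients: sample splitting, a preliminary projection estimator $latex {\hat{f}}&fg=000000$ built on the first subsample, and a linear term plus a U-statistic correction computed on the second subsample. The cosine basis, the support $latex {[a,b]}&fg=000000$ and the truncation level M are arbitrary illustrative choices.

import numpy as np

def quadratic_functional_estimate(X, n1=None, M=20, a=0.0, b=1.0):
    """Toy analogue of the sample-splitting/projection estimator, for
    theta(f) = int f(x)^2 dx with f a density supported on [a, b]."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    n1 = n1 or n // 2
    X1, X2 = X[:n1], X[n1:]                  # sample splitting
    n2 = len(X2)

    # Orthonormal cosine basis on [a, b]:
    # p_0 = 1/sqrt(b-a), p_l(x) = sqrt(2/(b-a)) cos(l*pi*(x-a)/(b-a)).
    def basis(x, l):
        u = (np.asarray(x) - a) / (b - a)
        if l == 0:
            return np.ones_like(u) / np.sqrt(b - a)
        return np.sqrt(2.0) * np.cos(l * np.pi * u) / np.sqrt(b - a)

    # Preliminary estimator f_hat = sum_l a_hat_l p_l, built on X1.
    a_hat = np.array([basis(X1, l).mean() for l in range(M)])

    def f_hat(x):
        return sum(a_hat[l] * basis(x, l) for l in range(M))

    # Linear term of the expansion: 2 * int f_hat f - int f_hat^2,
    # with int f_hat f estimated by an empirical mean over X2.
    linear = 2.0 * f_hat(X2).mean() - np.sum(a_hat ** 2)

    # U-statistic correction estimating the projected quadratic part
    # sum_l (a_l - a_hat_l)^2, where a_l = int p_l f.
    P = np.column_stack([basis(X2, l) for l in range(M)]) - a_hat
    G = P @ P.T
    correction = (G.sum() - np.trace(G)) / (n2 * (n2 - 1))

    return linear + correction

For instance, applied to a large i.i.d. sample from the Beta(2,2) distribution, whose density on $latex {[0,1]}&fg=000000$ is $latex {6x(1-x)}&fg=000000$, the output should be close to $latex {\int_{0}^{1}\left(6x(1-x)\right)^{2}dx=1.2}&fg=000000$.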

We can show that

$latex \displaystyle \sqrt{n}\bigl(\widehat{T}_{ij}^{(n)}-T_{ij}(f)\bigr)\rightsquigarrow N(0,C_{ij}(f)), \ \ \ \ \ (2)&fg=000000$

and

$latex \displaystyle \lim_{n\rightarrow\infty}n\mathbb E\left(\widehat{T}_{ij}^{(n)}-T_{ij}(f)\right)^{2}=C_{ij}(f), \ \ \ \ \ (3)&fg=000000$

where

$latex \displaystyle C_{ij}(f)=\mathop{\mathrm{Var}}\left(H_{1}(f,X_{i},X_{j},Y)\right). &fg=000000$

Indeed, we can show that this estimator attains the semi-parametric Cramér-Rao bound

$latex \displaystyle \inf_{\{\mathcal{V}_{r}(f_{0})\}_{r>0}}\liminf_{n\rightarrow\infty}\sup_{f\in\mathcal{V}_{r}(f_{0})}n\mathbb E\left(\widehat{T}_{ij}^{(n)}-T_{ij}(f_{0})\right)^{2}\geq C_{ij}(f_{0}) &fg=000000$

where $latex {\mathcal{V}_{r}(f_{0})=\left\{ f:\Vert f-f_{0}\Vert<r\right\} }&fg=000000$ for $latex {r>0}&fg=000000$ and $latex {f_{0}}&fg=000000$ is the true density. Let us define the estimator matrix $latex {\boldsymbol{\widehat{T}}^{(n)}=\bigl(\widehat{T}_{ij}^{(n)}\bigr)}&fg=000000$ and let $latex {\boldsymbol{H_{1}}(f)}&fg=000000$ denote the matrix with entries $latex {\left(H_{1}(f,x_{i},x_{j},y)\right)_{i,j}}&fg=000000$. We can then show the asymptotic normality of $latex {\boldsymbol{\widehat{T}}^{(n)}}&fg=000000$. In other words,

$latex \displaystyle \sqrt{n}\mathop{\mathrm{vech}}\bigl(\boldsymbol{\widehat{T}}^{(n)}-\boldsymbol{T}(f)\bigr)\rightsquigarrow N\left(0,\boldsymbol{C}(f)\right), &fg=000000$

$latex \displaystyle \lim_{n\rightarrow\infty}n\mathbb E\left(\mathop{\mathrm{vech}}\bigl(\boldsymbol{\widehat{T}}^{(n)}-\boldsymbol{T}(f)\bigr)\mathop{\mathrm{vech}}\bigl(\boldsymbol{\widehat{T}}^{(n)}-\boldsymbol{T}(f)\bigr)^{\top}\right)=\boldsymbol{C}(f) &fg=000000$

where

$latex \displaystyle \boldsymbol{C}(f)=\mathop{\mathrm{Cov}}\left(\mathop{\mathrm{vech}}(\boldsymbol{H_{1}}(f))\right). &fg=000000$
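Finally, as a hedged illustration of how such a matrix estimate could be plugged back into the dimension reduction problem of Section 1: if $latex {\boldsymbol{\widehat{T}}^{(n)}}&fg=000000$ estimates $latex {\mathbb E\left(\mathbb E\left(X\vert Y\right)\mathbb E\left(X\vert Y\right)^{\top}\right)}&fg=000000$, then subtracting $latex {\bar{X}\bar{X}^{\top}}&fg=000000$ gives an estimate of $latex {\mathop{\mathrm{Cov}}\left(\mathbb E\left(X\vert Y\right)\right)}&fg=000000$, and the e.d.r. directions solve the generalized eigenproblem $latex {\mathop{\mathrm{Cov}}\left(\mathbb E\left(X\vert Y\right)\right)\beta=\lambda\Sigma_{XX}\beta}&fg=000000$, which is equivalent to the $latex {\Sigma_{XX}^{-1/2}}&fg=000000$ standardization used earlier. The function below is my own sketch, not part of the paper.

import numpy as np

def edr_from_T(T_hat, X, K=2):
    """Recover K e.d.r. directions from an estimate T_hat of
    E( E(X|Y) E(X|Y)^T ), mirroring the spectral step of SIR."""
    X_bar = X.mean(axis=0)
    Cov_cond = T_hat - np.outer(X_bar, X_bar)   # estimate of Cov(E(X|Y))
    Sigma_hat = np.cov(X, rowvar=False)
    # Generalized eigenproblem Cov_cond b = lambda * Sigma_hat b.
    vals, B = np.linalg.eig(np.linalg.solve(Sigma_hat, Cov_cond))
    order = np.argsort(vals.real)[::-1][:K]
    return B[:, order].real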

References

  • Da Veiga, S. & Gamboa, F. (2012). Efficient estimation of sensitivity indices.
  • Laurent, B. (1996). Efficient estimation of integral functionals of a density. The Annals of Statistics, 24(2), 659–681.
  • Li, K. C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414), 316–327.
  • Solís Chacón, M., Loubes, J. M., Marteau, C., & Da Veiga, S. (2011). Efficient estimation of conditional covariance matrices for dimension reduction.
