Introduction to Minimax Lower Bounds

In my most recent research, I have been working on finding “minimax lower bounds” for certain estimators. Therefore, to learn a little more and to organize my ideas, I am going to start a series of posts on the topic. I intend to review the general method and to introduce some bounds that depend on the divergence between two probability measures. I also want to study the classic results of Le Cam, Fano and Assouad. I hope these posts turn out to be instructive for all of us.

A journey of a thousand miles begins with a single step. Lao-tzu. Photo credit: Larry Becker

In older posts (here, here, here and here), we have already analyzed upper bounds for the MSE or MISE which have the form

$latex \displaystyle \sup_{f\in\mathcal{F}}\mathbb E\left[d^{2}(\hat{f}_{n},f)\right]\leq C\psi_{n} &fg=000000$

where $latex {\psi_{n}}&fg=000000$ goes to zero (in the density estimation case, $latex {\psi_{n}=n^{-4/5}}&fg=000000$) and $latex {C<\infty}&fg=000000$. We would like to know whether these rates are tight, in the sense that no other estimator can do better. To make this precise, we look for a lower bound of the form

$latex \displaystyle \inf_{\hat{f}}\sup_{f\in\mathcal{F}}\mathbb E\left[d^{2}(\hat{f}_{n},f)\right]\geq c\psi_{n}. &fg=000000$
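Before turning to the lower bound, here is a minimal simulation sketch of the upper-bound statement, assuming a standard normal target density and scipy’s gaussian_kde with its default bandwidth (choices made only for this illustration): the empirical MISE of the kernel density estimator should shrink roughly like $latex {n^{-4/5}}&fg=000000$.

```python
# Minimal sketch (illustration only): Monte Carlo check that the MISE of a
# kernel density estimator decays roughly like n^{-4/5} for a smooth density.
# The standard normal target and gaussian_kde's default bandwidth are
# assumptions made only for this example.
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(0)
grid = np.linspace(-5.0, 5.0, 512)
dx = grid[1] - grid[0]
f_true = norm.pdf(grid)  # the "true" density f

def mise(n, reps=50):
    """Monte Carlo approximation of E[ ||f_hat_n - f||_2^2 ] for sample size n."""
    errs = []
    for _ in range(reps):
        sample = rng.standard_normal(n)
        f_hat = gaussian_kde(sample)(grid)  # kernel density estimate on the grid
        errs.append(np.sum((f_hat - f_true) ** 2) * dx)
    return float(np.mean(errs))

for n in (100, 400, 1600, 6400):
    # The empirical MISE and the rate n^{-4/5} should decrease comparably.
    print(n, mise(n), n ** (-4 / 5))
```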

We will assume the following three elements:

  • A class of functions $latex {\mathcal{F}}&fg=000000$ , containing the “true” function $latex {f}&fg=000000$. For example, $latex {\mathcal{F}}&fg=000000$ could be a Hölder class $latex {\Sigma(\beta,L)}&fg=000000$ or a Sobolev class $latex {W(\beta,L)}&fg=000000$.
  • A probability measure family $latex {\{\mathbb{P}_{f},f\in\mathcal{F}\}}&fg=000000$, indexed by $latex {\mathcal{F}}&fg=000000$ on a measurable space $latex {(\mathcal{X},\mathcal{A})}&fg=000000$ associated with the data. For example, in the density model, $latex {\mathbb{P}_{f}}&fg=000000$ is the probability measure associated with a sample $latex {(X_{1},\ldots,X_{n})}&fg=000000$ when the $latex {X_{i}}&fg=000000$’s density is $latex {f}&fg=000000$.
  • A distance $latex {d(\cdot,\cdot)\geq0}&fg=000000$ which measures the discrepancy between the estimator $latex {\hat{f}_{n}}&fg=000000$ and the true function $latex {f}&fg=000000$. For example, $latex {d(f,g)=\Vert f-g\Vert_{2}}&fg=000000$ or $latex {d(f,g)=\vert f(x_{0})-g(x_{0})\vert}&fg=000000$ for a fixed point $latex {x_{0}}&fg=000000$.
    In fact, it is enough to use a semi-distance satisfying the triangle inequality, as illustrated after this list.
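For instance, the pointwise criterion $latex {d(f,g)=\vert f(x_{0})-g(x_{0})\vert}&fg=000000$ is only a semi-distance: it is symmetric and satisfies the triangle inequality, yet it can vanish for $latex {f\neq g}&fg=000000$, since

$latex \displaystyle f(x_{0})=g(x_{0})\quad\Longrightarrow\quad d(f,g)=0 &fg=000000$

even when $latex {f}&fg=000000$ and $latex {g}&fg=000000$ differ elsewhere.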

Define, for a stochastic model $latex {\{\mathbb{P}_{f},f\in\mathcal{F}\}}&fg=000000$ and a distance $latex {d}&fg=000000$, the minimax risk as

$latex \displaystyle \mathcal{R}^{*}\triangleq\inf_{\hat{f}}\sup_{f\in\mathcal{F}}\mathbb E\left[d^{2}(\hat{f}_{n},f)\right] &fg=000000$

where the $latex {\inf_{\hat{f}}}&fg=000000$ is taken over all estimators. The main goal is to get a result of the form

$latex \displaystyle \mathcal{R}^{*}\geq c\psi_{n} &fg=000000$

for $latex {c>0}&fg=000000$ and $latex {\psi_{n}\rightarrow0}&fg=000000$.

Suppose that we have proved that

$latex \displaystyle \liminf_{n\rightarrow\infty}\psi_{n}^{-1}\mathcal{R}^{*}\geq c>0 \ \ \ \ \ (1)&fg=000000$

and that, for a particular estimator $latex {\hat{f}_{n}}&fg=000000$,

$latex \displaystyle \limsup_{n\rightarrow\infty}\sup_{f\in\mathcal{F}}\psi_{n}^{-1}\mathbb E\left[d^{2}(\hat{f}_{n},f)\right]\leq C. &fg=000000$

Since the minimax risk is bounded above by the maximal risk of this particular estimator, the latter implies directly that

$latex \displaystyle \limsup_{n\rightarrow\infty}\psi_{n}^{-1}\mathcal{R}^{*}\leq C. \ \ \ \ \ (2)&fg=000000$

If inequalities (1) and (2) are satisfied simultaneously, we say that $latex {\psi_{n}}&fg=000000$ is the optimal rate of convergence for this problem and that $latex {\hat{f}_{n}}&fg=000000$ attains that rate.
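Indeed, combining (1) and (2) yields

$latex \displaystyle 0<c\leq\liminf_{n\rightarrow\infty}\psi_{n}^{-1}\mathcal{R}^{*}\leq\limsup_{n\rightarrow\infty}\psi_{n}^{-1}\mathcal{R}^{*}\leq C<\infty, &fg=000000$

so the minimax risk behaves like $latex {\psi_{n}}&fg=000000$ up to constants.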

Remark 1 Two rates of convergence $latex {\psi_{n}}&fg=000000$ and $latex {\psi_{n}^{\prime}}&fg=000000$ are equivalent (we write $latex {\psi_{n}\asymp\psi_{n}^{\prime}}&fg=000000$) if

$latex \displaystyle 0<\liminf_{n\rightarrow\infty}\frac{\psi_{n}}{\psi_{n}^{\prime}}\leq\limsup_{n\rightarrow\infty}\frac{\psi_{n}}{\psi_{n}^{\prime}}<\infty. &fg=000000$
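For example, $latex {\psi_{n}=n^{-4/5}}&fg=000000$ and $latex {\psi_{n}^{\prime}=(2n)^{-4/5}}&fg=000000$ are equivalent in this sense, since the ratio $latex {\psi_{n}/\psi_{n}^{\prime}}&fg=000000$ converges to $latex {2^{4/5}}&fg=000000$.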

The big question here is: how can we prove inequality (1), given that it requires bounding $latex {\mathcal{R}^{*}}&fg=000000$ over all estimators $latex {\hat{f}_{n}}&fg=000000$ of $latex {f}&fg=000000$?

At first glance this looks like an impossible task; just imagine the sheer number of possible estimators of $latex {f}&fg=000000$. Fortunately, we can apply a “reduction procedure” to simplify the problem and obtain the required bound. We will study it in the next post.

As always, any thoughts, suggestions or improvements are welcome in the comments.

 
