Given $n$ independent replicates of a jointly distributed pair $(X,Y)
\in {\cal R}^d \times {\cal R}$, we wish to select from a fixed
sequence of model classes ${\cal F}_1, {\cal F}_2, \ldots$ a
deterministic prediction rule $f: {\cal R}^d \to {\cal R}$ whose risk
is small. We investigate the possibility of empirically assessing
the {\em complexity} of each model class, that is, the actual
difficulty of the estimation problem within each class. The estimated
complexities are in turn used to define an adaptive model selection
procedure, which is based on complexity penalized empirical risk.
The available data are divided into two parts. The first is used to
form an empirical cover of each model class, and the second is used
to select a candidate rule from each cover based on empirical risk. The
covering radii are determined empirically to optimize a tight upper
bound on the estimation error. An estimate is chosen from the list of
candidates in order to minimize the sum of class complexity and
empirical risk. A distinguishing feature of the approach is that the
complexity of each model class is assessed empirically, based on the
size of its empirical cover.
Finite sample performance bounds are established for the estimates, and
these bounds are applied to several non-parametric estimation problems.
The estimates are shown to achieve a favorable tradeoff between
approximation and estimation error, and to perform as well as if the
distribution-dependent complexities of the model classes were known
beforehand. In addition, it is shown that the estimate can be consistent,
and even possess near optimal rates of convergence, when each model class
has an infinite VC or pseudo dimension.
For regression estimation with squared loss we modify our estimate to
achieve a faster rate of convergence.