We study model selection strategies based on penalized empirical
loss minimization. We point out a tight relationship between
error estimation and data-based complexity penalization: any
good error estimate may be converted into a data-based penalty
function and the performance of the estimate is governed by
the quality of the error estimate. We consider several penalty
functions, involving error estimates on independent test data,
empirical {\sc vc} dimension, empirical {\sc vc} entropy, ...
We study model selection strategies based on penalized empirical
loss minimization. We point out a tight relationship between
error estimation and data-based complexity penalization: any
good error estimate may be converted into a data-based penalty
function and the performance of the estimate is governed by
the quality of the error estimate. We consider several penalty
functions, involving error estimates on independent test data,
empirical {\sc vc} dimension, empirical {\sc vc} entropy, and
margin-based quantities. We also consider the maximal difference
between the error on the first half of the training data and the
second half, and the expected maximal discrepancy, a closely
related capacity estimate that can be calculated by Monte Carlo
integration. Maximal discrepancy penalty functions are appealing
for pattern classification problems, since their computation is
equivalent to empirical risk minimization over the training data
with some labels flipped.
+