A simple trick for estimating the weight decay parameter

Title: A simple trick for estimating the weight decay parameter
Author: Thorsteinn Rögnvaldsson

Year: 1998

PublicationType: Conference Paper

HostPublication: Neural Networks : Tricks of the Trade

DOI: http://dx.doi.org/10.1007/3-540-49430-8_4

Diva url: http://hh.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:541008

Abstract: We present a simple trick to get an approximate estimate of the weight decay parameter $\lambda$. The method combines early stopping and weight decay, into the estimate $\hat{\lambda} = \|\nabla E(W_{es})\| / \|2 W_{es}\|$, where $W_{es}$ is the set of weights at the early stopping point, and $E(W)$ is the training data fit error. The estimate is demonstrated and compared to the standard cross-validation procedure for $\lambda$ selection on one synthetic and four real life data sets. The result is that $\hat{\lambda}$ is as good an estimator for the optimal weight decay parameter value as the standard search estimate, but orders of magnitude quicker to compute. The results also show that weight decay can produce solutions that are significantly superior to committees of networks trained with early stopping.
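
The estimate in the abstract can be illustrated with a minimal sketch. It assumes a penalized cost of the form $E(W) + \lambda \|W\|^2$, so that the penalty gradient $2 \lambda W$ balances $\nabla E(W)$ at a minimum, and it takes the ratio of the two norms at the early stopping point. The function names and the numbers below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat sequence of numbers."""
    return math.sqrt(sum(v * v for v in values))


def estimate_weight_decay(grad_E, weights):
    """Return ||grad_E|| / ||2 * weights||.

    grad_E  : gradient of the training fit error E at the early stopping point
    weights : the weights W_es at the early stopping point
    Both are flat sequences of floats of the same length.
    """
    if len(grad_E) != len(weights):
        raise ValueError("gradient and weights must have the same length")
    denom = l2_norm([2.0 * w for w in weights])
    if denom == 0.0:
        raise ValueError("weights must not all be zero")
    return l2_norm(grad_E) / denom


# Made-up values standing in for a real early-stopped network.
grad_at_es = [3.0, 4.0]  # hypothetical gradient of E at W_es
w_es = [1.0, 0.0]        # hypothetical early-stopping weights
lam_hat = estimate_weight_decay(grad_at_es, w_es)
print(lam_hat)  # prints 2.5
```

In practice the gradient and the weights would come from a network trained with early stopping, flattened into one vector each; the sketch only shows the final arithmetic.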