Abstract
Barron (1993) obtained a deterministic approximation rate (in L2-norm) of $r^{-1/2}$ for a class of single hidden layer feedforward artificial neural networks (ANN) with $r$ hidden units and sigmoid activation functions when the target function satisfies certain smoothness conditions. Hornik, Stinchcombe, White, and Auer (HSWA, 1994) extended Barron's result to a class of ANNs with possibly non-sigmoid activation approximating the target function and its derivatives simultaneously. Recently Makovoz (1996) obtained an improved degree of approximation rate $r^{-(1+1/d)/2}$ for Barron's ANNs with sigmoid activation function, where $d$ is the dimension of the domain of the target function.
When applying Barron's ANNs with sigmoid activation functions to nonparametrically estimate a regression function (the target), Barron (1994) obtained a root mean square convergence rate of $O_P([n/\log n]^{-1/4})$ for a minimum complexity regression estimator with i.i.d. observations, where $n$ is the sample size (number of training examples). Unfortunately, this rate is not fast enough to establish root-n asymptotic normality for plug-in estimates of functionals of the regression function, according to a recent result obtained by Chen and Shen (1996).
In this paper, we first obtain an improved approximation rate (in Sobolev norm) of $r^{-1/2-\alpha/(d+1)}$, $0<\alpha<1$, for HSWA's ANNs with possibly non-sigmoid activation functions, where $\alpha$ is related to the choice of activation function. We then obtain a root mean square convergence rate of $O_P([n/\log n]^{-(1+2\alpha/(d+1))/[4(1+\alpha/(d+1))]}) = o_P(n^{-1/4})$ for general nonparametric ANN sieve extremum estimators, by letting the number of hidden units $r_n$ increase with the sample size $n$ on the order of $r_n^{2(1+\alpha/(d+1))} \log r_n = O(n)$. Our rates are valid for i.i.d. as well as for uniform mixing and absolutely regular ($\beta$-mixing) stationary time series data. Among other things, this rate provides theoretical justification for the popularity of ANN models in fitting multivariate financial data, since many nonlinear financial time series are plausibly modeled as $\beta$-mixing processes. In addition, the rate is fast enough to deliver root-n asymptotic normality for plug-in estimates of smooth functionals using general ANN sieve estimators. As interesting applications to nonlinear time series, we establish rates for ANN sieve estimators of three different target functions: a multivariate conditional mean function, a joint density, and a conditional density. We also obtain root-n asymptotic normality results for semiparametric models and average derivative statistics.
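The claim that the new rate is faster than $n^{-1/4}$ can be checked directly from the exponent. In the sketch below, $x = \alpha/(d+1)$ and $e(x)$ are shorthand introduced here for illustration and are not notation from the paper; $\alpha$ is the activation-related constant and $d$ the input dimension stated above.

\[
e(x) = \frac{1+2x}{4(1+x)}, \qquad
e(x) - \frac{1}{4} = \frac{(1+2x) - (1+x)}{4(1+x)} = \frac{x}{4(1+x)} > 0 \quad \text{for } x > 0 .
\]

Since $e(x) > 1/4$ whenever $\alpha > 0$, we have $n^{1/4}\,[n/\log n]^{-e(x)} = n^{-(e(x)-1/4)} (\log n)^{e(x)} \to 0$, so the logarithmic factor does not prevent the rate from being $o_P(n^{-1/4})$. The gain $x/(4(1+x))$ shrinks as $d$ grows, so the improvement over the $n^{-1/4}$ benchmark is largest in low dimensions.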
* University of Chicago