How Effective are Neural Networks at Forecasting and Prediction? A Review and Evaluation
MONICA ADYA1* AND FRED COLLOPY2
1University of Maryland at Baltimore County, USA
2Case Western Reserve University, USA
ABSTRACT
Despite increasing applications of artificial neural networks (NNs) to forecasting over the past decade, opinions regarding their contribution are mixed. Evaluating research in this area has been difficult, due to a lack of clear criteria. We identified eleven guidelines that could be used in evaluating this literature. Using these, we examined applications of NNs to business forecasting and prediction. We located 48 studies done between 1988 and 1994. For each, we evaluated how effectively the proposed technique was compared with alternatives (effectiveness of validation) and how well the technique was implemented (effectiveness of implementation). We found that eleven of the studies were both effectively validated and implemented. Another eleven studies were effectively validated and produced positive results, even though there were some problems with respect to the quality of their NN implementations. Of these 22 studies, 18 supported the potential of NNs for forecasting and prediction. © 1998 John Wiley & Sons, Ltd.
KEY WORDS artificial intelligence; machine learning; validation
INTRODUCTION
An artificial neural network (NN) is a computational structure modelled loosely on biological processes. NNs explore many competing hypotheses simultaneously using a massively parallel network composed of relatively simple, non-linear computational elements interconnected by links with variable weights. It is this interconnected set of weights that contains the knowledge generated by the NN. NNs have been successfully used for low-level cognitive tasks such as speech recognition and character recognition. They are being explored for decision support and knowledge induction (Shocken and Ariav, 1994; Dutta, Shekhar and Wong, 1994; Yoon, Guimaraes and Swales, 1994). In general, NN models are specified by network topology, node characteristics, and training or learning rules. NNs are composed of a large number of simple processing units, each interacting
Journal of Forecasting, J. Forecast. 17, 481–495 (1998)
*Correspondence to: Monica Adya, Department of Information Systems, University of Maryland at Baltimore County, Baltimore, MD 21250, USA. E-mail: adya@umbc.edu
with others via excitatory or inhibitory connections. Distributed representation over a large number of units, together with interconnectedness among processing units, provides fault tolerance. Learning is achieved through a rule that adapts connection weights in response to input patterns. Alterations in the weights associated with the connections permit adaptability to new situations (Ralston and Reilly, 1993). Lippmann (1987) surveys the wide variety of topologies that are used to implement NNs.
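The mechanics described above can be sketched in a few lines. The following fragment is our own illustration, not code from any of the reviewed studies: simple non-linear processing units connected by weighted links, with a delta-rule weight update that adapts the output connections in response to a single input pattern.

```python
import math

def sigmoid(x):
    """A simple non-linear processing element."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """One hidden layer; the network's knowledge lives in its weights."""
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(row, x))) for row in w_hidden]
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden))), hidden

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """Delta-rule adjustment of the output weights for one input pattern."""
    y, hidden = forward(x, w_hidden, w_out)
    delta = (target - y) * y * (1.0 - y)  # gradient of squared error at the output
    return [wo + lr * delta * h for wo, h in zip(w_out, hidden)]

# Illustrative weights and pattern (all values invented)
w_hidden = [[0.2, -0.4], [0.7, 0.1]]   # 2 inputs -> 2 hidden units
w_out = [0.3, -0.2]                    # 2 hidden units -> 1 output
x, target = [1.0, 0.0], 1.0

y_before, _ = forward(x, w_hidden, w_out)
w_out = train_step(x, target, w_hidden, w_out)
y_after, _ = forward(x, w_hidden, w_out)
# One weight update moves the output toward the target pattern
```

The sketch omits hidden-layer updates (full backpropagation) for brevity; the point is only the structure: weighted links, non-linear units, and a learning rule that alters the weights.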
Over the past decade, increasing research efforts have been directed at applying NNs to business situations. Despite this, opinions about the value of the technique have been mixed. Some consider them effective for unstructured decision-making (Dutta et al., 1994); other researchers have expressed reservations about their potential, suggesting that stronger empirical evidence is needed (Chatfield, 1993).
The structure of this paper is as follows. First, we explain how studies were selected. Then we describe the criteria that we used to evaluate them. Next, we discuss our findings when we applied the criteria to the studies. Finally, we make some recommendations for improving research in this area.
HOW STUDIES WERE SELECTED
We were interested in the extent to which studies in NN research have contributed to improvements in the accuracy of forecasts and predictions in business. We searched three computer databases (the Social Science Citation Index, the Science Citation Index, and ABI Inform) and the proceedings of the IEEE/INNS Joint International Conferences. Our search yielded a wide range of forecasting and prediction-oriented applications, from weather forecasting to predicting stock prices. For this evaluation we eliminated studies related to weather, biological processes, purely mathematical series, and other non-business applications. We identified additional studies through citations. This process yielded a total of 46 studies. We subsequently surveyed primary authors of the studies to determine if our interpretation of their work was accurate and to locate any other studies that should be included in this review. Twelve (26%) of the authors responded and two identified one additional study each. These two were included in the review. The current review, therefore, includes 48 studies between 1988 and 1994 that used NNs for business forecasts and predictions.
CRITERIA USED TO EVALUATE THE STUDIES
In evaluating the studies, we were interested in answering two questions. First, did the study appropriately evaluate the predictive capabilities of the proposed network? Second, did the study implement the NN in such a way that it stood a reasonable chance of performing well? We call these the effectiveness of validation and the effectiveness of implementation, respectively.
Effectiveness of validation
There is a well-established tradition in forecasting research of comparing techniques on the basis of empirical results. If a new approach is to be taken seriously, it must be evaluated in terms of alternatives that are or could be used. If such a comparison is not conducted it is difficult to argue that the study has taught us much about the value of NNs for forecasting. In fairness to the
researchers conducting the studies, it should be noted that this is not always their objective. Sometimes they are using the forecasting or prediction case as a vehicle to explore the dynamics of a particular technique or domain. (For instance, Piramuthu, Shaw and Gentry, 1994, proposed the use of a modified backpropagation algorithm and tested it in the domain of loan evaluations.) Still, our purpose here is to answer the question: what do these techniques contribute to our understandings and abilities as forecasters?
To evaluate the effectiveness of validation, we applied the three guidelines described in Collopy, Adya and Armstrong (1994).
Comparisons with well-accepted models
Forecasts from a proposed model should perform at least as well as those from some well-accepted reference models. For example, if a proposed model does not produce forecasts that are at least as accurate as those from a naive extrapolation (random walk), it cannot really be argued that the modelling process contributes knowledge about the trend.
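As a concrete illustration of this guideline, the following sketch compares a hypothetical model's forecasts with the random-walk benchmark on mean absolute error. The series and the "proposed model" forecasts are invented for illustration; they are not data from any reviewed study.

```python
# A naive random walk forecasts each value with the previous observed value.
series = [100, 102, 101, 105, 107, 106, 110]

rw_forecasts = series[:-1]   # random-walk forecast for periods 2..n
actuals = series[1:]

# Hypothetical forecasts from a proposed model (invented numbers)
model_forecasts = [101, 102, 104, 106, 107, 109]

def mae(forecasts, actuals):
    """Mean absolute error over matched forecast/actual pairs."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

mae_rw = mae(rw_forecasts, actuals)
mae_model = mae(model_forecasts, actuals)
# The proposed model earns credibility only if mae_model <= mae_rw
```

A model that cannot beat `mae_rw` on such a comparison has not demonstrated that it contributes knowledge about the series.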
Use of ex ante validations
Comparison of forecasts should be based on ex ante (out-of-sample) performance. In other words, the sample used to test the predictive capabilities of a model must be different from the samples used to develop and train the model. This matches the conditions found in real-world tasks, where one must produce predictions about an unknown future or a case for which the results are not available.
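The essence of ex ante validation can be shown in a short sketch. The linear-trend "model" below is a stand-in of our own, not a technique from the reviewed studies; the point is only that fitting uses one segment of the data and evaluation uses strictly held-out observations.

```python
# Invented series; the last two points are reserved for ex ante evaluation
series = [3.0, 4.1, 5.2, 5.9, 7.1, 8.0, 9.2, 9.8]
split = 6
train, test = series[:split], series[split:]   # strictly disjoint samples

# Fit a simple linear trend on the training sample only
n = len(train)
xs = list(range(n))
x_mean, y_mean = sum(xs) / n, sum(train) / n
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, train))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

# Ex ante forecasts: the test points were untouched during fitting
forecasts = [intercept + slope * (split + h) for h in range(len(test))]
errors = [abs(f - a) for f, a in zip(forecasts, test)]
```

Reporting `errors` on the held-out points, rather than the fit on `train`, is what makes the comparison ex ante.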
Use of a reasonable sample of forecasts
The size of the validation samples should be adequate to allow inferences to be drawn. We examined the size of the validation samples used in the classification and time series studies separately. Most of the classification studies used 40 or more cases to validate. Time series studies typically used larger samples; most of them used 75 or more forecasts in their validations.
Effectiveness of implementation
For studies that have effectively validated the NN we asked a second question: how well was the proposed architecture implemented? While a study that suffers from poor validation is not of much use in assessing the applicability of the technique to forecasting situations, one that suffers from poor implementation might still have some value. If a method performs comparatively well, even when it has not benefited from the best possible implementation, there is reason to be encouraged that it will be a contender when it has.
In determining the effectiveness with which a NN had been developed and tested, we used the guidelines for evaluating network performance suggested by Refenes (1995). Our implementation of some of these criteria (particularly that regarding the stability of an implementation) varies from that of Refenes (1995).
• Convergence: Convergence is concerned with the problem of whether the learning procedure is capable of learning the classification defined in a data set. In evaluating this criterion, therefore, we were interested in the in-sample performance of the proposed network, since it determines the network's convergence capability and sets a benchmark for assessing the ex ante performance of the network. If a study does not report the in-sample performance of the network, we suggest caution in accepting its ex ante results.
• Generalization: Generalization measures the ability of NNs to recognize patterns outside the training sample. The accuracy rates achieved during the learning phase typically define the bounds for generalization. If performance on a new sample is similar to that in the convergence phase, the NN is considered to have learned well.
• Stability: Stability is the consistency of results, during the validation phase, with different samples of data. This criterion, then, evaluates whether the NN configuration determined during the learning phase and the results of the generalization phase are consistent across different samples of test data. Studies could demonstrate stability either through the use of iterative resampling from the same data set or by using multiple samples for training and validation.

These criteria are sufficiently general to be applicable to any NN architecture or learning mechanism. Furthermore, they represent a distillation of the literature's best practice. The fact that a study failed to meet the criteria is not necessarily an indictment of that study. If we wish to use empirical studies to make a case for or against the applicability of NNs to forecasting or prediction, though, we must be able to determine which represent good implementations for that purpose.
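To make the bookkeeping behind the three criteria concrete, the sketch below shows which accuracy figure corresponds to each one. It is our illustration under stated assumptions, not the authors' procedure: a trivial threshold classifier stands in for a trained NN, and the labels are noiseless so the numbers stay deterministic.

```python
import random

random.seed(0)

def classify(x, threshold=0.5):
    """Stand-in for a trained network's decision rule."""
    return 1 if x > threshold else 0

def make_sample(n):
    """Generate a sample with noiseless labels matching the decision rule."""
    xs = [random.random() for _ in range(n)]
    labels = [1 if x > 0.5 else 0 for x in xs]
    return xs, labels

def accuracy(xs, labels):
    return sum(classify(x) == y for x, y in zip(xs, labels)) / len(xs)

# Convergence: in-sample accuracy on the training data
train_x, train_y = make_sample(40)
convergence = accuracy(train_x, train_y)

# Generalization: accuracy on a new sample, compared against convergence
test_x, test_y = make_sample(40)
generalization = accuracy(test_x, test_y)

# Stability: consistency of results across several different test samples
stability_scores = [accuracy(*make_sample(40)) for _ in range(5)]
spread = max(stability_scores) - min(stability_scores)
```

A study reporting only `generalization` leaves both the convergence benchmark and the `spread` across samples unknown, which is precisely the gap the criteria are meant to expose.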
In summary, then, studies were classified as being of three types. Those that are well implemented and well validated are of interest whatever their outcome. They can be used either to argue that NNs are useful in forecasting or that they are not, depending upon the outcome. These would seem to be the most valuable studies. The second type are studies which have been well validated, even though their implementation might have suffered in some respects. These are important when the technique they propose does well despite the limitations of the implementation. They can be used to argue that NNs are applicable and to establish a lower bound on their performance. Finally, there are studies that are of little interest from the point of view of telling us about the applicability of neural nets to forecasting and prediction. Some of these have little value because their validation suffers. Others are effectively validated but produce null or negative results. Since it is not possible to determine whether the negative results arise because the technique is not applicable or are the result of implementation difficulties, these studies have little value as forecasting studies.
RESULTS
Twenty-seven of the studies were effectively validated. Appendix A reports our assessment of the validation effectiveness of each of the 48 studies. Eleven of the studies met the criteria for both implementation and validation effectiveness. Of the remaining 37 studies, 16 were effectively validated but had some problems with implementation. Eleven of these reported NN performance that was better than comparative models. Twenty-two (46%) studies, then, produced results that are relevant to evaluating the applicability of neural networks to forecasting and prediction problems. Table I provides a summary.
Five studies that met the criteria for effective validation but failed to meet those for effective implementation produced negative or mixed results. The most common problem with these studies was their failure to report the in-sample performance of the NN, making it difficult to assess the appropriateness of the NN configuration implemented. It also makes it difficult to evaluate
the generalizability of the NN, since there is no benchmark for comparison. Consequently, the results of these studies must be viewed with some reservation. Of the 48 studies, 27 were effectively validated. Appendix B contains the evaluation of the implementations for each of these.

Table I. Relationship of effectiveness to outcomes (number of studies)

                                      NN better   NN worse or inconclusive   Not compared
Problems with validations                11                  3                    7
Problems only with implementation        11                  5                    0
No problems with either criteria          8                  3                    0

Studies in bold contribute to forecasting knowledge.

Effectively validated and implemented
Of the eleven studies that met the criteria for both implementation and validation effectiveness, eight were implemented in classification domains such as bankruptcy prediction. The remaining three studied time-series forecasting. Two of the eight classification studies satisfied all of the effectiveness criteria yet failed to support their hypotheses that NNs would produce superior predictions. Gorr, Nagin and Szczypula (1994) compared linear regression, stepwise polynomial regression, and a three-layer NN with a linear decision rule used by an admissions committee for predicting student GPAs in a professional school. In a study of bankruptcy classification, Udo (1993) reported that NNs performed as well as, or only slightly better than, multiple regression, although this conclusion was not confirmed by statistical tests.

Wilson and Sharda (1994) and Tam and Kiang (1990, 1992) developed NNs for bankruptcy classification. Wilson and Sharda (1994) reported that although NNs performed better than discriminant analysis, the differences were not always significant. The authors trained and tested the network using three sample compositions: 50% each of bankrupt and non-bankrupt firms, 80% of non-bankrupt and 20% of bankrupt firms, and 90% of non-bankrupt and 10% of bankrupt firms. Each such sample was tested against a 50/50, 80/20, and 90/10 training set, yielding a total of nine comparisons. The NN outperformed discriminant analysis on all but one sample combination, for which the performance of the methods was not statistically different. Tam and Kiang (1990, 1992) compared the performance of NNs with multiple alternatives: regression, discriminant analysis, logistic regression, k nearest neighbour, and ID3. They reported that the NNs outperformed all comparative methods when data from one year prior to bankruptcy was used to train the network. In instances where data from two years before bankruptcy was used to train, discriminant analysis outperformed NNs. In both instances, a NN with one hidden layer outperformed a linear network with no hidden layers.

In a similar domain, Salchenberger, Cinar and Lash (1992) and Coats and Fant (1993) used NNs to classify a financial institution as failed or not. Salchenberger et al. (1992) compared the performance of NNs with logit models. The network performed better than logit models in most instances where the training and testing samples had equal representation of failed and non-failed institutions. The NN also outperformed logit models in a diluted sample where about 18% of the sample was comprised of failed institutions' data. Coats and Fant (1993) used the Cascade Correlation algorithm for predicting financial distress. Comparative assessments were made
with discriminant analysis. The NN outperformed discriminant analysis on samples with large percentages of distressed firms, but failed to do so on those with a more equal mix of distressed and non-distressed firms.
Refenes, Azema-Barac and Zapranis (1993) tested NNs in the domain of stock ranking. Comparisons with multiple regression indicated that the proposed network gave better fitness on the test data than multiple regression by an order of magnitude. The network outperformed regression on the validation sample by an average of 36%.
Three of the eleven effective studies compared the performance of alternative models in the prediction of time series. Of these, one indicated mixed results in its comparison of neural networks with alternative techniques. Ho, Hsu and Young (1992) tested a proposed algorithm, the Adaptive Learning Algorithm (ALA), in the domain of short-term load forecasting. The ALA automatically adapts the momentum of the training process as a function of the error. Performance of the network was compared to that of a rule-based system and to the judgmental forecasts of the operator. Although the network performed slightly better than the rule-based system and the operator, the mean absolute errors (MAEs) were not very different for the three approaches, and no tests were performed to determine if the results were significantly better with the NN.
Foster, Collopy and Ungar (1992) compared the performance of linear regression and combining with that of NNs in the prediction of 181 annual and 203 quarterly time series from the M-Competition (Makridakis et al., 1982). They used networks both to make direct predictions and to combine forecasts (network combining). The authors reported that while the direct network performed significantly worse than the comparative methods, network combining significantly outperformed both regression and simple combining. Interestingly, the networks became more conservative as the forecast horizon increased or as the data became more noisy. This reflects the approach that an expert might take with such data.
Connor, Martin and Atlas (1994) compared the performance of various NN configurations in the prediction of time series. They compared the performance of recurrent and feedforward nets for power load forecasting. The recurrent net outperformed the traditional feedforward net while successfully modelling the domain with more parsimony than the competing architecture.
Effectively validated with positive results despite implementation issues
Eleven additional studies that were effectively validated reported NN performance that was better than comparative models. Dutta et al. (1994) used simulated data, corporate bond rating, and product purchase frequency as test beds for their implementation of a NN. NNs performed better than multiple regression on the simulated data, despite a training advantage for the regressions. In the prediction of bond ratings, NNs consistently outperformed regression, while only one configuration outperformed regression in the purchase frequency domain.
Lee and Jhee (1994) used a NN for ARMA model identification with the Extended Sample Autocorrelation Function (ESACF). The NN demonstrated superior classification accuracy on simulated data. The NN was then tested on data from three prior studies where the models were identified using traditional approaches. The authors report that the NN correctly identified the model for US GNP, Consumer Price Index, and caffeine data.
Other studies in the domain of prediction included those by Fletcher and Goss (1993), DeSilets et al. (1992), and Kimoto et al. (1990). Fletcher and Goss (1993) developed NNs for bankruptcy classification and compared their NN with a logit model. The NN outperformed logit models, having a lower prediction error and less variance. DeSilets et al. (1992) compared