A decomposition–ensemble model with data-characteristic-driven reconstruction for crude oil price forecasting
Lean Yu, Zishu Wang, Ling Tang ⇑
School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China
Highlights
A decomposition–ensemble model is proposed for crude oil price forecasting.
A data-characteristic-driven reconstruction is formulated and introduced.
Four steps are involved: decomposition, reconstruction, prediction and ensemble.
Empirical study statistically verifies the effectiveness of the proposed model.
Article info
Article history:
Received 8 April 2015
Received in revised form 29 June 2015
Accepted 12 July 2015
Available online 23 July 2015
Keywords:
Decomposition ensemble model
Divide and conquer
Data characteristics
Reconstruction
Time series analysis
Crude oil price forecasting
Abstract
To enhance prediction accuracy and reduce computational complexity, a decomposition–ensemble methodology with data-characteristic-driven reconstruction is proposed for crude oil price forecasting, based on the two promising principles of “divide and conquer” and “data-characteristic-driven modeling”. The proposed model improves the existing decomposition–ensemble techniques in the “divide and conquer” framework by formulating and incorporating a data-characteristic-driven reconstruction method based on “data-characteristic-driven modeling”. Four main steps are involved in the proposed model, i.e., data decomposition for simplifying the complex data, component reconstruction based on “data-characteristic-driven modeling” for capturing inner factors and reducing computational cost, individual prediction for each reconstructed component via a certain artificial intelligence (AI) tool, and ensemble prediction for the final output. In the proposed data-characteristic-driven reconstruction, all decomposed modes are thoroughly analyzed to explore the hidden data characteristics, and are accordingly reconstructed into some meaningful components. For illustration and verification, the West Texas Intermediate (WTI) and Brent crude oil spot prices are used as the sample data, and the empirical results indicate that the proposed model statistically outperforms all considered benchmark models (including popular AI single models, typical decomposition–ensemble models without reconstruction, and similar decomposition–ensemble models with other existing reconstruction methods), since it has higher prediction accuracy and less computational time.
© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
Crude oil price forecasting has become an increasingly hot issue within the research fields of data analysis and prediction, due to the important role of crude oil in the global energy system and even in the global economic system. However, crude oil price forecasting has been proven to be an extremely difficult task [1]. On the one hand, like other commodities, crude oil price is driven by various market factors, e.g., supply and demand. On the other hand, as a special energy resource, crude oil price is strongly influenced by some exogenous factors, such as irregular events [2], global economic status [3], speculation activities [4], and political and social attitudes [5], whose effects on the crude oil market are sometimes hard to quantify. Therefore, this paper focuses on crude oil price forecasting, especially to investigate the inner hidden factors and further to improve model performance in terms of prediction accuracy and time-saving.
According to the existing literature, abundant time series forecasting models have been proposed and applied to crude oil price forecasting, which can generally be described as [6]:
x_{t+h} = f(X_t) + e_t    (1)
dx.doi/10.1016/j.apenergy.2015.07.025
0306-2619/© 2015 Elsevier Ltd. All rights reserved.
⇑ Corresponding author at: School of Economics and Management, Beijing University of Chemical Technology, 15 Beisanhuan East Road, Beijing 100029, China. Tel./fax: +86 10 64412210.
E-mail address: tangling@mail.buct.edu (L. Tang).
where x_t denotes the crude oil price at time t, X_t = {x_{t-1}, x_{t-2}, ..., x_{t-l}} are the historical values before period t with lag l, h is the prediction horizon, and e_t is the prediction error, assumed to be independent and identically distributed. According to the different designs of the function f(·) and the corresponding parameter estimation methods, the existing models for crude oil price forecasting fall into three main types: traditional econometric models with relatively simple fixed functions and strict data assumptions (e.g., stationarity and linearity) in parameter estimation, artificial intelligence (AI) techniques with flexible functions and powerful self-learning capability in model training, and currently popular hybrid models combining several single models systematically.
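As a minimal sketch of how the general form in Eq. (1) translates into training samples, the following Python snippet pairs each lagged input vector X_t with its target x_{t+h}. The function name make_supervised and the toy price list are hypothetical and serve only as an illustration; the indexing follows Eq. (1) literally, so the target lies h steps after period t.

```python
from typing import List, Tuple


def make_supervised(
    series: List[float], lag: int, horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (X_t, x_{t+h}) pairs in the sense of Eq. (1).

    X_t = [x_{t-1}, x_{t-2}, ..., x_{t-lag}] and the target is x_{t+horizon}.
    The function name and signature are illustrative assumptions.
    """
    if lag < 1 or horizon < 0:
        raise ValueError("lag must be >= 1 and horizon must be >= 0")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # t needs `lag` past values behind it and `horizon` future values ahead.
    for t in range(lag, len(series) - horizon):
        inputs.append([series[t - k] for k in range(1, lag + 1)])
        targets.append(series[t + horizon])
    return inputs, targets


# Hypothetical prices, for illustration only.
prices = [50.0, 51.0, 49.5, 52.0, 53.5, 54.0, 52.5]
X, y = make_supervised(prices, lag=3, horizon=1)
```

Any forecasting function f(·), whether econometric or AI-based, could then be fitted on the pairs (X, y); changing the horizon argument yields the samples for a different prediction horizon.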
As for traditional econometric models, auto-regressive integrated moving average (ARIMA), generalized autoregressive conditional heteroskedasticity (GARCH), random walk (RW), vector auto-regression (VAR) and error correction models (ECM) have popularly been used for crude oil price forecasting. For example, Xiang and Zhuang [7] utilized the ARIMA model to predict the Brent monthly crude oil prices for the sample period from November 2012 to April 2013. Nomikos and Andriosopoulos [8] estimated the conditional mean and volatility of West Texas Intermediate (WTI) daily crude oil spot prices from December 9, 2000 to January 2, 2010, based on the GARCH family models. The RW, one basic time series model, can also be employed as a benchmark model to forecast oil price movements [9]. Mirmirani and Li [10] used the VAR model to predict the US monthly oil price covering the period from January 1980 to November 2002. Lanza et al. [11] used the ECM model to predict the WTI and Brent weekly crude oil prices with the sample period from 1994 to 2002.
As for the AI techniques, artificial neural networks (ANN), support vector regression (SVR) and least squares support vector regression (LSSVR) might be the most predominant models for crude oil price forecasting, and empirical investigations have repeatedly shown their superiority over the traditional linear models. As for the ANN, Movagharnejad et al. [12] introduced the ANN to forecast the quantitative data of crude oil prices over the period from January 2000 to April 2010. Chiroma et al. [13] presented an evolutionary neural network to predict the WTI monthly crude oil price data from May 1987 to December 2011. As for the SVR, Xie et al. [14] compared the SVR with the ARIMA and back-propagation neural network (BPNN), and witnessed the superiority of the SVR in the prediction of the WTI monthly prices from January 1970 to December 2003. Khashman and Nwulu [15] predicted the WTI weekly spot crude oil prices from January 3, 1986 to December 25, 2009, based on the SVR model. Similarly, Li and Ge [16] predicted the crude oil prices from May 1994 to December 1995 based on an ε-SVR model with dynamic error correction. As for the LSSVR, Li et al. [17] predicted the WTI weekly data from January 4, 2008 to October 18, 2013, and argued that the LSSVR outperformed the ARIMA, SVR and BPNN models. However, the AI models have their own limitations, e.g., parameter sensitivity and potential over-fitting [18].
As for hybrid models, under the promising concept of “divide and conquer” (or “decomposition and ensemble”) [19], a series of decomposition–ensemble learning paradigms have recently been developed and have become a predominant type for crude oil price analysis and forecasting. In a typical decomposition–ensemble model, three main steps are involved, i.e., data decomposition for simplifying the complex data, individual prediction for each decomposed mode, and ensemble prediction for the final prediction result [18,20,21]. Existing studies have demonstrated that decomposition–ensemble models, in the effective framework of “divide and conquer”, can provide satisfactory results for both capturing inner factors and enhancing prediction accuracy. Some current studies on crude oil price forecasting via decomposition–ensemble models are listed below. Yu et al. [20] introduced empirical mode decomposition (EMD) to decompose the original data of the WTI daily crude oil spot prices from January 1, 1986 to September 30, 2006 and the Brent daily crude oil spot prices from May 20, 1987 to September 30, 2006, and produced better prediction results. Tang et al. [22] formulated a novel decomposition–ensemble learning paradigm for crude oil price forecasting by utilizing the data decomposition tool of complementary ensemble EMD (CEEMD), and the results supported the efficiency of the decomposition strategy in improving model performance. Yu et al. [1] proposed a compressed sensing based AI learning paradigm for daily crude oil price forecasting and reached a similar conclusion, with the sample period from January 3, 2011 to July 17, 2013. Yu et al. [23] proposed a novel learning paradigm based on ensemble EMD (EEMD) and extended extreme learning machine (EELM), to predict the WTI daily crude oil prices from January 2, 1986 to October 21, 2013. All the above empirical studies showed that the decomposition–ensemble strategy can significantly improve model performance in crude oil price prediction.
However, even though decomposition–ensemble models can effectively model the complex data of crude oil price compared with single models, another important issue may arise concerning model complexity and computational cost. Since decomposition–ensemble models decompose the original data into a series of modes, modeling all decomposed modes might be a quite time-consuming process in the step of individual prediction, sometimes even leading to a poor final result, since the estimation errors for all modes can accumulate in the ensemble prediction step. To address this problem, an additional step of component reconstruction has been introduced into the typical decomposition–ensemble models, between the steps of data decomposition and individual prediction. In component reconstruction, the decomposed modes obtained from the data decomposition step are reconstructed into some certain components for further analysis in the next step of individual prediction.
Accordingly, some modified decomposition–ensemble models with reconstruction have recently been developed and shown to be effective in understanding inner factors, enhancing prediction accuracy and reducing computational cost. For example, Wang et al. [24] implemented the run-length-judgment method to reconstruct the specific modes decomposed by EMD into high frequency, medium frequency, low frequency and trend sequences, and the empirical analysis showed that the novel decomposition–ensemble
Nomenclature
AI      artificial intelligence
ANN     artificial neural network
EEMD    ensemble empirical mode decomposition
EMD     empirical mode decomposition
FNN     feed-forward neural network
FFT     fast Fourier transform
ICSS    iterative cumulative sums of squares
IMF     intrinsic mode function
LSSVR   least squares support vector regression
SVR     support vector regression
L. Yu et al. / Applied Energy 156 (2015) 251–267
model with reconstruction outperformed its original form without reconstruction in crude oil price prediction. Yan et al. [25] introduced fine-to-coarse grouping as the reconstruction method to develop a novel decomposition–ensemble paradigm for uranium resource price forecasting. Zhang et al. [26] reconstructed new components based on sample entropy measurement for wind speed forecasting. However, the above studies conducted the component reconstruction according to only one certain data characteristic, e.g., frequency [24], average [25] or complexity [26], while neglecting other data characteristics. To effectively capture the meaningful components hidden in the data dynamics, a comprehensive data analysis for all decomposed modes is strongly recommended in component reconstruction.
Against this background, this paper aims to improve existing decomposition–ensemble models, especially by formulating a component reconstruction based on data characteristics and exploring the reconstruction rules. In particular, the concept of “data-characteristic-driven modeling” [6] is adopted to help reconstruct the decomposed modes into meaningful components based on the data characteristics hidden in the data itself. According to the “data-characteristic-driven modeling” idea, the data characteristics of all decomposed modes should be carefully investigated when grouping them into meaningful components. In fact, the principle of “data-characteristic-driven modeling” has already been utilized to formulate some powerful forecasting learning paradigms and has significantly improved model performance. For example, Tang et al. [6] proposed a data-characteristic-driven modeling methodology for nuclear energy consumption forecasting, where forecasting methods were carefully designed according to the data characteristics of the observed data. Tang et al. [27] developed a novel mode-characteristic-based decomposition–ensemble model for nuclear energy consumption forecasting, by using “data-characteristic-driven modeling” to predict the decomposed modes. Wang et al. [28] considered the seasonality data characteristic of hydropower consumption data and accordingly proposed a seasonal decomposition based ensemble forecasting approach.
Notably, multi-step-ahead predictions with different horizons h (see Eq. (1)) are also considered in this paper [22,23,29–32]. First, multi-step-ahead predictions can effectively capture the dynamic behavior of crude oil price in the future, which helps practitioners and government agencies make and modify various decisions for different periods. Second, the robustness of the proposed model can be thoroughly verified by checking whether it performs well at different horizons. For these reasons, multi-step-ahead predictions were usually conducted in the existing studies on crude oil price forecasting. For example, Tang et al. [22] conducted 1- to 5-day-ahead predictions for crude oil price forecasting, Yu et al. [23] 1-, 3- and 6-day-ahead predictions, Fan et al. [29] 22-day-ahead prediction, Ye et al. [30] 1- to 3-month-ahead prediction, Jammazi and Aloui [31] 2-, 3- and 4-month-ahead predictions, and Xiong et al. [32] 4-, 8-, 12-, 16-, 20- and 24-week-ahead predictions. Therefore, multi-step-ahead predictions at the horizons of one, two and four weeks are performed in this study to test the robustness of the proposed model.
Generally speaking, this paper proposes a decomposition–ensemble methodology with data-characteristic-driven reconstruction for crude oil price forecasting, by coupling the two promising principles of “divide and conquer” and “data-characteristic-driven modeling”. The proposed model improves the existing decomposition–ensemble techniques in the effective “divide and conquer” framework by formulating and introducing a data-characteristic-driven reconstruction. Accordingly, four main steps are involved in the proposed model, i.e., data decomposition for simplifying the complex data, component reconstruction based on “data-characteristic-driven modeling” for capturing inner factors and reducing computational cost, individual prediction for each reconstructed component, and ensemble prediction for the final prediction results. In data decomposition, the effective data decomposition tool of ensemble EMD (EEMD) is utilized to decompose the complex data of crude oil price into a series of modes. In component reconstruction, the “data-characteristic-driven modeling” concept is introduced, in which all decomposed modes are analyzed in terms of data characteristics and are further reconstructed into some meaningful components. In individual prediction, one certain AI tool, e.g., the ANN or LSSVR, is employed to model all reconstructed components. In ensemble prediction, all predicted values are aggregated into the final forecasting results. For illustration and verification, the WTI and Brent crude oil spot prices are used as the sample data, and the most popular AI single models of ANN and LSSVR, typical decomposition–ensemble models (without reconstruction), and similar decomposition–ensemble models (with other existing reconstruction methods) are introduced as the benchmark models for comparison purposes.
The main aim of this paper is to propose a decomposition–ensemble methodology with data-characteristic-driven reconstruction for crude oil price forecasting, and to compare it with other forecasting techniques (including popular AI single models, typical decomposition–ensemble models without reconstruction, and similar decomposition–ensemble models with other existing reconstruction strategies). The remainder of this paper is organized as follows. Section 2 describes the formulation process of the proposed method. The empirical results are reported and further discussed in Section 3. Section 4 concludes the paper and outlines further research directions.
2. Methodology formulation
This section presents a decomposition–ensemble methodology with data-characteristic-driven reconstruction for crude oil price forecasting, i.e., the data-characteristic-driven reconstruction based decomposition–ensemble model. In particular, Section 2.1 provides an overview of the proposed methodology, and Sections 2.2–2.5 respectively describe the four main steps of the model, together with the related techniques.
2.1. Overview of the proposed methodology
To enhance prediction accuracy and reduce computational complexity, a decomposition–ensemble methodology with data-characteristic-driven reconstruction is proposed for crude oil price forecasting, based on the two promising principles of “divide and conquer” and “data-characteristic-driven modeling”. The proposed model improves the existing decomposition–ensemble techniques in the “divide and conquer” framework [18–20] by incorporating a data-characteristic-driven reconstruction based on the “data-characteristic-driven modeling” concept [6,27,33], i.e., the data-characteristic-driven reconstruction based decomposition–ensemble model.
Since the original data are decomposed into a series of modes in decomposition–ensemble models, modeling all decomposed modes might be a quite time-consuming process in the step of individual prediction, sometimes even leading to a poor final result, since the estimation errors for the modes can accumulate in the ensemble prediction step. To address this problem, an additional step of component reconstruction is introduced into the original decomposition–ensemble model. Four main steps are accordingly involved in the proposed model, i.e., data decomposition via EEMD for simplifying the complex data, component reconstruction based on “data-characteristic-driven modeling” for both capturing inner factors and reducing computational cost, individual prediction for each component via a certain powerful AI tool, and ensemble prediction via the simple addition (ADD) approach for the final prediction results. Accordingly, the data-characteristic-driven reconstruction based decomposition–ensemble methodology can be formulated, as illustrated in Fig. 1.
Step 1: Data decomposition
An efficient data decomposition tool, i.e., the EEMD, is utilized to decompose the original time series x_t (t = 1, 2, ..., T) into N modes, including N-1 intrinsic mode functions (IMFs), c_t(i) (i = 1, 2, ..., N-1), and one residue r_t.
Step 2: Component reconstruction using “data-characteristic-driven modeling”
Based on the principle of “data-characteristic-driven modeling”, a data-characteristic-driven reconstruction method is proposed, including two sub-steps: all decomposed modes are thoroughly analyzed to capture the key data characteristics, and are further reconstructed into some meaningful components, d_t(j) (j = 1, 2, ..., K), according to their own data characteristics. Specifically, the proposed data-characteristic-driven reconstruction approach is formulated by thoroughly investigating the relationship between data characteristics and reconstruction rules, as discussed in Section 2.3.1.
Step 3: Individual prediction
In the third step, one certain powerful AI tool, e.g., the ANN or LSSVR, is employed as the forecasting tool to model the reconstructed components. Accordingly, the prediction results \hat{d}_t(j) for the corresponding component d_t(j) can be obtained.
Step 4: Ensemble prediction
A simple but effective ensemble method, i.e., the ADD, is used as the ensemble tool, since the original series data are decomposed into a linear expansion of modes which are further aggregated into components, i.e., x_t = \sum_{i=1}^{N-1} c_t(i) + r_t = \sum_{j=1}^{K} d_t(j). In particular, all predicted components are simply added up into the final prediction for the original series, i.e., \hat{x}_t = \sum_{j=1}^{K} \hat{d}_t(j).
Sections 2.2–2.5 respectively give details for the above four main steps of the proposed methodology, together with the related techniques.
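The four steps above can be sketched as a small pipeline. The snippet below is only an illustrative skeleton under strong assumptions: toy_decompose is a hypothetical moving-average split standing in for the EEMD, persistence is a placeholder standing in for the ANN or LSSVR, and the grouping of modes is supplied by hand instead of being derived from data characteristics. It is meant to show how additive modes are regrouped into components d_t(j) and how the ADD ensemble sums the component predictions.

```python
from typing import Callable, List, Sequence

Series = List[float]


def toy_decompose(x: Series, window: int = 3) -> List[Series]:
    """Step 1 stand-in (NOT the EEMD): additive split into detail + smooth."""
    smooth: Series = []
    for t in range(len(x)):
        lo = max(0, t - window + 1)
        smooth.append(sum(x[lo:t + 1]) / (t + 1 - lo))
    detail = [xi - si for xi, si in zip(x, smooth)]
    return [detail, smooth]  # the two modes add back up to x


def reconstruct(modes: List[Series], groups: Sequence[Sequence[int]]) -> List[Series]:
    """Step 2: add the modes in each group into one component d_t(j)."""
    length = len(modes[0])
    return [[sum(modes[i][t] for i in g) for t in range(length)] for g in groups]


def persistence(component: Series) -> float:
    """Step 3 placeholder predictor: the next value equals the last value."""
    return component[-1]


def forecast(
    x: Series,
    groups: Sequence[Sequence[int]],
    predictor: Callable[[Series], float] = persistence,
) -> float:
    """Run all four steps and return the ADD ensemble forecast."""
    modes = toy_decompose(x)
    components = reconstruct(modes, groups)
    return sum(predictor(d) for d in components)  # Step 4: simple addition


x_hat = forecast([1.0, 2.0, 4.0, 7.0], groups=[[0], [1]])
```

Because every stage is additive, replacing the stand-ins with a real decomposition tool and a trained predictor would leave the structure of forecast unchanged; only the number of components K, and hence the number of models to train, depends on the grouping.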
2.2. Data decomposition
In data decomposition, an effective data decomposition tool, the EEMD, is employed. While other traditional decomposition techniques hold their respective data assumptions, which might contradict reality, the EMD family are empirical, intuitive, direct and self-adaptive data processing approaches which can effectively capture various hidden patterns in complex data systems even without a priori knowledge. For example, statistical time-domain analysis methods (e.g., exponential moving average (EMA) and the X12-ARIMA) are performed under the data assumptions of stationarity and linearity [34], traditional time–frequency analysis methods (e.g., Fourier transformation) are especially suited to smooth cyclical signals [35], and wavelet analysis can effectively capture the transient actions within signals under the data assumptions of nonstationarity but linearity [36]. Due to the merits of flexibility and self-adaptivity, the EMD family, especially the EEMD, which successfully addresses the disadvantage of mode mixing in the EMD, has widely been applied to various nonstationary and complex data analysis [20].
Fig. 1. Framework of the proposed data-characteristic-driven reconstruction based decomposition–ensemble model.
The EMD algorithm, first proposed by Huang et al. [37], uses the Hilbert–Huang transform (HHT) to decompose the original series data into some independent and nearly periodic IMFs based on local characteristic scales, which meet the following two conditions. First, the numbers of extrema (i.e., both maxima and minima) and zero crossings should be equal or differ by at most one in each function. Second, the functions are symmetric with respect to the local zero mean. The detailed decomposition process of the EMD can be found in Ref. [37]. Finally, the EMD decomposes the original data series x_t (t = 1, 2, ..., T) into a linear expansion of N-1 IMFs c_t(i) (i = 1, 2, ..., N-1) and one residue r_t:
x_t = \sum_{i=1}^{N-1} c_t(i) + r_t    (2)
In practice, the total number of the IMFs can be set to log_2 T, where T is the length of the sample series [38].
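As a hedged illustration of the first IMF condition and the log_2 T rule, the following sketch counts strict interior extrema and sign changes in a candidate mode. The helper names are hypothetical, the test signal is a synthetic sine wave and not crude oil data, and truncating log_2 T to an integer is an assumption made here because the text does not specify the rounding. The second (symmetry) condition is not checked.

```python
import math
from typing import List


def count_extrema(c: List[float]) -> int:
    """Count strict local maxima and minima among interior points."""
    n = 0
    for t in range(1, len(c) - 1):
        is_max = c[t] > c[t - 1] and c[t] > c[t + 1]
        is_min = c[t] < c[t - 1] and c[t] < c[t + 1]
        if is_max or is_min:
            n += 1
    return n


def count_zero_crossings(c: List[float]) -> int:
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(c, c[1:]) if a * b < 0)


def meets_first_imf_condition(c: List[float]) -> bool:
    """Extrema and zero crossings must be equal or differ by at most one."""
    return abs(count_extrema(c) - count_zero_crossings(c)) <= 1


def suggested_num_imfs(length: int) -> int:
    """log_2 T truncated to an integer (the truncation is an assumption)."""
    return int(math.floor(math.log2(length)))


# Synthetic oscillation around zero; the 0.25 offset avoids exact zeros.
wave = [math.sin(2 * math.pi * (t + 0.25) / 20) for t in range(100)]
shifted = [5.0 + v for v in wave]  # same shape, but it never crosses zero

wave_ok = meets_first_imf_condition(wave)
shifted_ok = meets_first_imf_condition(shifted)
```

The shifted series keeps all its extrema but loses its zero crossings, which is why a mode riding on a non-zero local mean fails the condition.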
To overcome the shortcoming of the EMD, i.e., potential mode mixing, Wu and Huang [38] developed the EEMD technique by adding white noise to the original data before decomposition. Such white noise can help extract the true IMFs, and offsets itself via ensemble averaging according to the well-established statistical rule [38]:
e_0 = \frac{e}{\sqrt{NE}}    (3)
where e is the amplitude of the white noise, e_0 is the standard deviation of the final errors, and NE is the number of ensemble members.
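The statistical rule in Eq. (3) can be checked numerically. The sketch below is not an EEMD implementation: it only averages NE independent Gaussian noise draws per time point and compares the spread of the averages with e / sqrt(NE). The function name, the noise amplitude 0.2, the ensemble size 100 and the Gaussian form of the noise are illustrative assumptions.

```python
import math
import random
import statistics


def residual_noise_std(
    amplitude: float, n_members: int, n_points: int = 2000, seed: int = 0
) -> float:
    """Empirical standard deviation of white noise after ensemble averaging."""
    rng = random.Random(seed)
    averaged = []
    for _ in range(n_points):
        # One noise draw per ensemble member at this time point.
        members = [rng.gauss(0.0, amplitude) for _ in range(n_members)]
        averaged.append(sum(members) / n_members)
    return statistics.pstdev(averaged)


e, NE = 0.2, 100                     # hypothetical amplitude and ensemble size
predicted = e / math.sqrt(NE)        # right-hand side of Eq. (3)
observed = residual_noise_std(e, NE)
```

With these settings the observed value should lie close to the predicted one, which illustrates why a larger ensemble leaves less residual noise in the averaged modes, at the price of more decomposition runs.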
2.3. Component reconstruction
Based on the “data-characteristic-driven modeling” idea, a data-characteristic-driven reconstruction method is proposed in this subsection. Two sub-steps are involved. First, all decomposed modes obtained from the previous step are thoroughly analyzed to capture the key hidden data characteristics. Second, they are further reconstructed into some meaningful components, according to their own data characteristics. In particular, Section 2.3.1 first thoroughly investigates the relationship between data characteristics and reconstruction rules, to formulate the data-characteristic-driven reconstruction, and Sections 2.3.2 and 2.3.3 describe the corresponding data characteristic testing techniques for time series data.
2.3.1. Data characteristics and reconstruction rules
Data characteristics of time series data can generally fall into two main categories, i.e., nature characteristics and pattern characteristics [33]. Nature and pattern characteristics explore time series data from distinct perspectives. Nature characteristics directly investigate the data dynamics from a whole-system perspective, amongst which the complexity characteristic can be considered as one important criterion in component reconstruction [26]. Specifically, complexity covers various nonlinear characteristics, e.g., chaoticity, fractality, irregularity and long-range memorability. Pattern characteristics aim to discover the main hidden factors driving the data dynamics, involving cyclicity, seasonality, saltation (or mutability) and so on [6,27].
The term complexity describes an intermediate state between a completely regular process and a completely random process [33]. Generally, a lower level of complexity indicates that the observed series data are more likely to follow a deterministic process which can be finely captured and predicted, while a higher level of complexity indicates that less regular rules control the series data, which are therefore more unpredictable and difficult to understand [6,27]. In component reconstruction, high-level complexity accordingly implies that the target mode might follow an irregular process which is difficult to model, and it is strongly recommended to consider such a mode as one component rather than to combine it with other modes, to avoid amplifying prediction errors.
Since the EEMD tends to decompose the original data into some meaningful modes, pattern characteristics can effectively help discover the main driving factors (or economic meanings) of different modes. Typically, the main factors hidden in economic data are cyclical (including seasonal) patterns, mutable patterns (sudden changes caused by extreme events), and the central trend [24,39]. A cyclical (or seasonal) pattern, which returns to the beginning and repeats itself in the same sequence with peaks and troughs, might be the most important factor corresponding to certain unique hidden rules of the observed data dynamics. Therefore, a mode with the main pattern characteristic of cyclicity (or seasonality) is strongly recommended to be left alone as one component, to avoid cycle mixing. A mutable pattern might correspond to sudden changes stemming from certain emergency events, such as economic crises, political changes and social instability [39]. Even with sudden changes, a mutable pattern is comparatively smooth and predictable in the short term [6]. Thus, the modes with the main pattern characteristic of mutability, provided they are at a low level of complexity, can be grouped together, representing the effects of extreme events. The central trend captures the long-term tendency of the original data and is often accompanied by low-level complexity [27]; such modes can be incorporated into other similar modes, to reduce computational cost with little negative impact on prediction accuracy.
According to the above discussions, the data-characteristic-driven reconstruction approach can be formulated, and the relationship between data characteristics and reconstruction rules is reported in Table 1.
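The reconstruction rules discussed above can be expressed as a small grouping routine. The following is a minimal sketch under stated assumptions: the ModeProfile class, the string labels and the group_modes function are hypothetical names, the complexity level and main pattern of each mode are taken as given (the tests that produce them are not shown), and the precedence used here (high complexity or cyclicity keeps a mode alone, otherwise modes sharing a pattern are merged) is one possible reading of the rules, not necessarily the exact procedure of this paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModeProfile:
    index: int        # position of the mode in the decomposition output
    complexity: str   # "high" or "low" (assumed to be already tested)
    pattern: str      # "cyclicity", "mutability" or "tendency"


def group_modes(profiles: List[ModeProfile]) -> List[List[int]]:
    """Map mode profiles to groups of mode indices (one group per component)."""
    groups: List[List[int]] = []
    merged: Dict[str, List[int]] = {}
    for p in profiles:
        if p.complexity == "high" or p.pattern == "cyclicity":
            # Keep the mode alone: avoids amplifying errors or mixing cycles.
            groups.append([p.index])
        else:
            # Low-complexity mutability/tendency modes: merge similar ones.
            merged.setdefault(p.pattern, []).append(p.index)
    groups.extend(merged.values())
    return groups


# Hypothetical profiles for five modes, for illustration only.
profiles = [
    ModeProfile(0, "high", "mutability"),
    ModeProfile(1, "low", "cyclicity"),
    ModeProfile(2, "low", "mutability"),
    ModeProfile(3, "low", "mutability"),
    ModeProfile(4, "low", "tendency"),
]
groups = group_modes(profiles)
```

In this toy case five modes collapse into four components, so one fewer predictor has to be trained in the individual prediction step.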
As can be seen from Table 1, there are three main conclusions about the data-characteristic-driven reconstruction. (1) As for a mode with high-level complexity, it would be better to consider such an irregular mode as one component, to avoid amplifying prediction errors. (2) As for a mode with the main pattern characteristic of cyclicity (or seasonality), it is strongly recommended to
Table 1
The data-characteristic-driven reconstruction approach.
Data characteristics | Implications | Reconstruction rules
Complexity
  High | The mode might be difficult to be modeled | Consider it as one component, to avoid amplifying prediction errors
  Low | The mode can be finely captured and predicted | Combine it with other similar ones, to reduce computational cost
Main pattern characteristic
  Cyclicity | The mode mainly corresponds to a cyclical (or seasonal) factor | Consider it as one component, to avoid cycle mixing
  Mutability | The mode mainly corresponds to the effect of some certain extreme events | Combine it with other similar ones, to reduce computational cost
  Tendency | The mode mainly corresponds to a long-term tendency | Combine it with other similar ones, to reduce computational cost