Deep neural networks: A promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data
Feng Jia, Yaguo Lei*, Jing Lin, Xin Zhou, Na Lu
State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University, No. 28 Xianning West Road, Xi'an 710049, China
Article info
Article history:
Received 28 January 2015
Received in revised form 5 September 2015
Accepted 26 October 2015
Available online 18 November 2015
Keywords:
Deep learning
Deep neural networks
Intelligent fault diagnosis
Rotating machinery
Massive data

Abstract
To analyze massive data effectively and provide accurate diagnosis results automatically, numerous studies have been conducted on intelligent fault diagnosis of rotating machinery. Among these studies, the methods based on artificial neural networks (ANNs) are commonly used; they employ signal processing techniques to extract features and then input the features to ANNs to classify faults. Though these methods did work in intelligent fault diagnosis of rotating machinery, they still have two deficiencies. (1) The features are manually extracted, depending on much prior knowledge about signal processing techniques and diagnostic expertise. In addition, the manual features are extracted according to a specific diagnosis issue and are probably unsuitable for other issues. (2) The ANNs adopted in these methods have shallow architectures, which limit the capacity of ANNs to learn the complex non-linear relationships in fault diagnosis issues. As a breakthrough in artificial intelligence, deep learning holds the potential to overcome the aforementioned deficiencies. Through deep learning, deep neural networks (DNNs) with deep architectures, instead of shallow ones, could be established to mine the useful information from raw data and approximate complex non-linear functions. Based on DNNs, a novel intelligent method is proposed in this paper to overcome the deficiencies of the aforementioned intelligent diagnosis methods. The effectiveness of the proposed method is validated using datasets from rolling element bearings and planetary gearboxes. The datasets contain massive measured signals involving different health conditions under various operating conditions. The diagnosis results show that the proposed method is able not only to adaptively mine available fault characteristics from the measured signals, but also to obtain superior diagnosis accuracy compared with the existing methods.
© 2015 Elsevier Ltd. All rights reserved.
Mechanical Systems and Signal Processing 72-73 (2016) 303–315
dx.doi/10.ssp.2015.10.025
* Corresponding author.
E-mail address: yaguolei@mail.xjtu.edu (Y. Lei).

1. Introduction
In order to fully inspect the health conditions of rotating machinery, condition monitoring systems are used to collect real-time data, and massive data are therefore acquired after long-time operation of the machines [1]. As the data are generally collected faster than diagnosticians can analyze them [2], there is an urgent need for diagnosis methods that can effectively analyze massive data and automatically provide accurate diagnosis results. Such methods are called intelligent fault diagnosis methods, in which artificial intelligence techniques, such as artificial neural networks (ANNs), support vector machine (SVM), fuzzy inference, etc., are used for distinguishing machinery health conditions [3–5]. Based on the results produced by the intelligent diagnosis methods, it is possible to take appropriate maintenance actions and ensure healthy operation of the machines [6]. Correspondingly, intelligent fault diagnosis methods have been widely investigated and applied in the field of fault diagnosis of rotating machinery [7]. Samanta [8] extracted time-domain features and employed three optimized neural networks to detect pump faults. In addition, Samanta et al. [9] utilized time-domain features to characterize the bearing health conditions and employed ANNs and SVM to diagnose faults of bearings. Statistical features were extracted by Tran et al. [10] for representing the health conditions of induction motors, and then a decision tree and an adaptive neuro-fuzzy inference system (ANFIS) were utilized for distinguishing the faults. Moreover, Tran et al. [11] calculated features from thermal imaging based on bi-dimensional empirical mode decomposition, and then input selected features into a relevance vector machine (RVM) for fault classification. Two features were proposed by Lei et al. [12] to characterize health conditions of planetary gearboxes, and ANFIS was applied to recognize the health conditions. Widodo et al. [13] calculated statistical features from the measured signals and used RVM and SVM to diagnose the bearing faults. Lai et al. [14] introduced cumulants as input features and used a radial basis function network as the fault classifier. A method was presented by Bin et al. [15], utilizing wavelet packets-empirical mode decomposition for feature extraction and a multi-layer perceptron network for fault classification.
Through the literature review, we notice that ANNs are among the most commonly used classifiers in intelligent fault diagnosis methods, which generally include two main steps: fault feature extraction using signal processing techniques and fault classification using ANN classifiers. Feature extraction maps the measured signals onto representative features characterizing the health conditions of machinery. Fault classification distinguishes the health conditions based on the extracted features. Thanks to the representative features from the measured signals and the adaptive learning capability of ANNs, the ANN-based methods are expected to replace diagnosticians in making decisions and to work well in intelligent fault diagnosis [7]. The ANN-based methods reported in the literature, however, have two obvious deficiencies:
(1) The features input into classifiers are extracted and selected by diagnosticians from the measured signals, largely depending on prior knowledge about signal processing techniques and diagnostic expertise. In addition, the features are selected according to a specific diagnosis issue and are probably unsuitable for other issues. Thus it is necessary to adaptively mine the characteristics hidden in the measured signals to reflect the different health conditions of machinery, instead of extracting and selecting features manually.
(2) The ANNs commonly adopted in intelligent fault diagnosis of rotating machinery have shallow architectures, which means that only one hidden layer is included in an ANN architecture, like the ANNs in Refs. [8,9,14,15]. Such simple architectures limit the capacity of ANNs to learn the complex non-linear relationships in fault diagnosis issues. Thus it is necessary to establish a deep architecture network for distinguishing the health conditions of machinery.
Deep learning [16] holds the potential to overcome the aforementioned deficiencies in current intelligent diagnosis methods. It refers to a class of machine learning techniques in which many layers of information processing stages in deep architectures are exploited for pattern classification and other tasks [17]. Using deep learning, deep neural networks (DNNs) with deep architectures can be established. Due to the deep architectures, DNNs are able to adaptively capture the representative information from raw data through multiple non-linear transformations and approximate complex non-linear functions with a small error. Since the idea of deep learning appeared in Science, it has attracted a lot of attention from researchers in different fields [18]. Dahl et al. [19] proposed a pre-trained deep neural network hidden Markov model for large-vocabulary speech recognition and obtained an accuracy improvement compared with traditional models. Krizhevsky et al. [20] developed a DNN-based method for a large-scale visual recognition challenge involving millions of labeled images, and obtained the best result. Deep learning methods were utilized by Baldi et al. [21] to search for exotic particles in high-energy physics, and the results demonstrated that the methods can improve the search capability of colliders. The aforementioned applications indicate that deep learning is a promising tool for dealing with massive data. However, it has attracted little attention in the field of fault diagnosis. Based on the Teager–Kaiser energy operator and a deep belief network trained by deep learning, Tran et al. [22] proposed a new method for diagnosing faults of reciprocating compressor valves. In this method, they treated the deep belief network as a classifier and still manually extracted the features input to the classifier, which ignored the ability of the network to mine fault characteristics.
Based on DNNs trained through deep learning, this paper proposes a novel intelligent diagnosis method to overcome the two deficiencies of the ANN-based methods in fault diagnosis of rotating machinery. In this method, DNNs are utilized to implement both fault feature extraction and intelligent diagnosis. The DNNs are first pre-trained by unsupervised layer-by-layer learning and then fine-tuned with a supervised algorithm, where the unsupervised process helps the fault characteristic mining and the supervised process contributes to constructing the discriminative fault characteristics for classification [23]. The merits of the proposed method are summarized as follows. (1) It is able to adaptively mine fault characteristics from the measured signals for various diagnosis issues. (2) The method is good at establishing the non-linear mapping relationship between the different health conditions of machinery and the corresponding measured signals. Therefore, the proposed method is expected to obtain higher diagnosis accuracy compared with the methods based on shallow ANNs.
The rest of this paper is organized as follows. Section 2 briefly introduces the theoretical background of DNNs. Section 3 is dedicated to a description of the proposed intelligent diagnosis method. In Section 4, the effectiveness of the proposed method is validated using four rolling element bearing datasets and a planetary gearbox dataset. The bearing datasets contain thousands of signals with different fault categories and severities under various operating loads. The gearbox dataset includes tens of thousands of signals with different fault modes and locations under various operating conditions, like different rotating speeds and loads. In addition, the proposed method is compared with several intelligent methods using the same bearing datasets in this section. Conclusions are drawn in Section 5.
2. A brief introduction to DNNs
DNNs have deep architectures containing multiple hidden layers, and each hidden layer conducts a non-linear transformation from the previous layer to the next one [18,24]. Through deep learning as addressed by Hinton et al. [16], DNNs are trained according to the following two main procedures: (1) Pre-train the DNNs layer by layer with unsupervised techniques, like autoencoders. (2) Further fine-tune the DNNs with the back propagation (BP) algorithm for classification.
2.1. Autoencoders
An autoencoder is one type of unsupervised neural network with three layers [24,25], and the output target of the autoencoder is the input data. As depicted in Fig. 1, the autoencoder comprises two parts, the encoder network and the decoder network. The encoder network transforms the input data from a high-dimensional space into codes in a low-dimensional space, and the decoder network reconstructs the inputs from the corresponding codes.
The encoder network is explicitly defined as an encoding function denoted by $f_{\theta}$ [24]. This function is called the encoder. For each measured signal $x^m$ from a dataset $\{x^m\}_{m=1}^{M}$ of rotating machinery, we define

$$h^m = f_{\theta}(x^m) \tag{1}$$

where $h^m$ is the encode vector obtained from $x^m$. The decoder network is defined as a reconstruction function denoted by $g_{\theta'}$, namely the decoder. It maps $h^m$ from the low-dimensional space back into the high-dimensional space, producing a reconstruction

$$\hat{x}^m = g_{\theta'}(h^m) \tag{2}$$

The parameter sets of the encoder and decoder are learned simultaneously on the task of reconstructing the original input as well as possible, attempting to incur the lowest possible reconstruction error $L(x, \hat{x})$ over the $M$ training examples, where $L(x, \hat{x})$ is a loss function that measures the discrepancy between $x$ and $\hat{x}$ [24]. In summary, the autoencoder training aims to find the parameter sets $\theta$ and $\theta'$ minimizing the reconstruction error:

$$\phi_{AE}(\theta, \theta') = \frac{1}{M}\sum_{m=1}^{M} L\big(x^m, g_{\theta'}(f_{\theta}(x^m))\big) \tag{3}$$
The commonly used forms for the encoder and decoder are affine mappings [26], optionally followed by a non-linearity:

$$f_{\theta}(x) = s_f(Wx + b) \tag{4}$$

$$g_{\theta'}(x) = s_g(W^{T}x + d) \tag{5}$$

$$L(x, \hat{x}) = \|x - \hat{x}\|^{2} \tag{6}$$

where $s_f$ and $s_g$ are the encoder and decoder activation functions, respectively. Thus, the parameter sets of the autoencoder are $\theta = \{W, b\}$ and $\theta' = \{W^{T}, d\}$, where $b$ and $d$ are bias vectors, and $W$ and $W^{T}$ are the weight matrices.
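As an illustration of Eqs. (3)–(6), the following is a minimal NumPy sketch of a tied-weight autoencoder forward pass and its mean reconstruction error. The sigmoid activations for $s_f$ and $s_g$, the layer sizes, and the random stand-in data are assumptions made for this example; the text above does not fix them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W, b):
    """Eq. (4): h = s_f(W x + b); a sigmoid is assumed for s_f."""
    return sigmoid(W @ x + b)

def decode(h, W, d):
    """Eq. (5): x_hat = s_g(W^T h + d); tied weights, sigmoid assumed for s_g."""
    return sigmoid(W.T @ h + d)

def reconstruction_error(X, W, b, d):
    """Eq. (3) with the squared-error loss of Eq. (6), averaged over M samples."""
    errors = [np.sum((x - decode(encode(x, W, b), W, d)) ** 2) for x in X]
    return float(np.mean(errors))

# Hypothetical sizes and random data standing in for M measured signals.
rng = np.random.default_rng(0)
n_in, n_code, M = 8, 3, 5
X = rng.random((M, n_in))
W = 0.1 * rng.standard_normal((n_code, n_in))  # shared by encoder and decoder
b = np.zeros(n_code)                           # encoder bias
d = np.zeros(n_in)                             # decoder bias

h = encode(X[0], W, b)          # low-dimensional code of the first signal
x_hat = decode(h, W, d)         # reconstruction in the input space
err = reconstruction_error(X, W, b, d)
```

Training would then adjust `W`, `b` and `d` to reduce `err`; this sketch only evaluates the objective for fixed parameters.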
Fig. 1. Architectural graph of an autoencoder: the encoder maps the input data $x^m$ to a code, and the decoder maps the code to the input data reconstruction $\hat{x}^m$.
2.2. Pre-training and fine-tuning
N autoencoders can be stacked to pre-train an N-hidden-layer DNN. Given an input signal $x^m$, the input layer and the first hidden layer of the DNN are regarded as the encoder network of the first autoencoder. After the first autoencoder is trained by minimizing the reconstruction error in Eq. (3), the trained parameter set $\theta_1$ of the encoder network is used to initialize the first hidden layer of the DNN. The first encode vector $h_1^m$ of $x^m$ is calculated as follows:

$$h_1^m = f_{\theta_1}(x^m) \tag{7}$$

Then, with the encode vector $h_1^m$ as the input data, the first hidden layer and the second hidden layer of the DNN are regarded as the encoder network of the second autoencoder. Correspondingly, the second hidden layer of the DNN is initialized by the second trained autoencoder. The process is conducted in sequence until the Nth autoencoder is trained for initializing the final hidden layer of the DNN. The Nth encode vector $h_N^m$ of $x^m$ is calculated as

$$h_N^m = f_{\theta_N}(h_{N-1}^m) \tag{8}$$

where $\theta_N$ is the parameter set of the Nth autoencoder.
In this way, through training N stacked autoencoders, all the hidden layers of the DNN are pre-trained. This pre-training process has been shown to yield significantly better local minima than random initialization of the DNN and helps achieve better generalization in classification tasks [26,27], as well as in fault diagnosis of rotating machinery.
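The greedy layer-by-layer procedure of Eqs. (7) and (8) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tied weights, sigmoid activations, plain batch gradient descent, learning rate, epoch count, layer sizes, and toy data are all assumptions, and the helper names `train_autoencoder` and `pretrain` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.2, epochs=300, seed=0):
    """Train one tied-weight autoencoder by batch gradient descent on Eq. (3)."""
    rng = np.random.default_rng(seed)
    M, n_in = X.shape
    W = 0.1 * rng.standard_normal((n_hidden, n_in))
    b = np.zeros(n_hidden)
    d = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W.T + b)        # encoder, Eq. (4)
        X_hat = sigmoid(H @ W + d)      # decoder, Eq. (5)
        losses.append(float(np.mean(np.sum((X - X_hat) ** 2, axis=1))))
        # Backpropagate the mean squared reconstruction error.
        delta_out = 2.0 * (X_hat - X) / M * X_hat * (1.0 - X_hat)
        delta_hid = (delta_out @ W.T) * H * (1.0 - H)
        # W appears in both encoder and decoder, so both gradients are summed.
        grad_W = H.T @ delta_out + delta_hid.T @ X
        W -= lr * grad_W
        b -= lr * delta_hid.sum(axis=0)
        d -= lr * delta_out.sum(axis=0)
    return W, b, losses

def pretrain(X, hidden_sizes):
    """Stack autoencoders: each one is trained on the codes of the previous one."""
    params, histories = [], []
    H = X
    for k, n_hidden in enumerate(hidden_sizes):
        W, b, losses = train_autoencoder(H, n_hidden, seed=k)
        params.append((W, b))           # initializes hidden layer k+1 of the DNN
        histories.append(losses)
        H = sigmoid(H @ W.T + b)        # Eqs. (7)/(8): encode vectors for next layer
    return params, H, histories

# Toy data: noisy copies of two hypothetical prototype spectra.
rng = np.random.default_rng(42)
prototypes = rng.random((2, 16))
X = prototypes[rng.integers(0, 2, size=40)] + 0.05 * rng.standard_normal((40, 16))
X = np.clip(X, 0.0, 1.0)

params, codes, histories = pretrain(X, hidden_sizes=[8, 4])
```

Only the encoder parameters `(W, b)` of each autoencoder are kept, since they are what initializes the corresponding hidden layer; the decoder bias `d` is discarded after training.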
Fig. 2. Flowchart of the proposed method.

After the DNN is pre-trained, a fine-tuning process is used in the next step of the DNN training. The output layer of the DNN is employed to contain the output targets for classification tasks. The output of the DNN calculated from the input signal $x^m$ is

$$y^m = f_{\theta_{N+1}}(h_N^m) \tag{9}$$

where $\theta_{N+1}$ is the parameter set of the output layer. In order to approximate the output target properly, the BP algorithm is used to minimize the error of the output by adjusting the parameters in the DNN backwards [28]. Supposing that the output target of $x^m$ is $d^m$, the error criterion is described as

$$\phi_{DNN}(\Theta) = \frac{1}{M}\sum_{m=1}^{M} L(y^m, d^m) \tag{10}$$

where $\Theta = \{\theta_1, \theta_2, \cdots, \theta_{N+1}\}$. The parameter set $\Theta$ can be updated as follows:

$$\Theta = \Theta - \eta\,\frac{\partial \phi_{DNN}(\Theta)}{\partial \Theta} \tag{11}$$

where $\eta$ is the learning rate of the fine-tuning process, which is introduced to ensure convergence of the update procedure [29].
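The update of Eq. (11) applied to the criterion of Eq. (10) can be sketched as a plain gradient-descent step with back propagation. The squared-error loss, sigmoid output layer, one-hot targets, layer sizes, and random initial weights (standing in for pre-trained ones) are assumptions of this example; `forward`, `loss` and `bp_step` are hypothetical helper names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """Return the activations of every layer; the last entry is the output y (Eq. (9))."""
    acts = [X]
    for W, b in params:
        acts.append(sigmoid(acts[-1] @ W.T + b))
    return acts

def loss(X, D, params):
    """Eq. (10) with an assumed squared-error loss L."""
    Y = forward(X, params)[-1]
    return float(np.mean(np.sum((Y - D) ** 2, axis=1)))

def bp_step(X, D, params, eta):
    """One update of Eq. (11): Theta <- Theta - eta * d(phi_DNN)/d(Theta)."""
    acts = forward(X, params)
    M = X.shape[0]
    # Error signal at the output layer.
    delta = 2.0 * (acts[-1] - D) / M * acts[-1] * (1.0 - acts[-1])
    new_params = []
    for k in range(len(params) - 1, -1, -1):
        W, b = params[k]
        grad_W = delta.T @ acts[k]
        grad_b = delta.sum(axis=0)
        if k > 0:
            # Propagate the error backwards through the old weights.
            delta = (delta @ W) * acts[k] * (1.0 - acts[k])
        new_params.append((W - eta * grad_W, b - eta * grad_b))
    return new_params[::-1]

# Hypothetical network: 16 inputs, two hidden layers, 3 health conditions.
rng = np.random.default_rng(0)
sizes = [16, 8, 4, 3]
params = [(0.1 * rng.standard_normal((n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
X = rng.random((30, 16))                 # stand-in input signals
labels = rng.integers(0, 3, size=30)     # stand-in health condition labels
D = np.eye(3)[labels]                    # one-hot output targets d^m

loss_before = loss(X, D, params)
for _ in range(200):
    params = bp_step(X, D, params, eta=0.5)
loss_after = loss(X, D, params)
```

In the method described above, `params` would start from the pre-trained hidden-layer parameters plus a newly added output layer rather than from random values.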
3. DNN-based intelligent diagnosis method
Based on DNNs, this study proposes a novel intelligent fault diagnosis method that adaptively mines the fault characteristics from raw signals of rotating machinery and automatically classifies machinery health conditions using these fault characteristics. The raw signals refer to the measured signals in the frequency domain, i.e., frequency spectra. The main reason for using frequency spectra is that they show how the constitutive components of the measured signals are distributed over discrete frequencies and may provide clear information about the health conditions of rotating machinery [30].
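To make the input representation concrete, the sketch below computes a single-sided amplitude spectrum of a time-domain signal with NumPy's real FFT. The sampling rate, signal length, and two-tone test signal are hypothetical; the text above does not specify how the spectra are scaled, so the amplitude normalization used here is an assumption.

```python
import numpy as np

def frequency_spectrum(signal, fs):
    """Single-sided amplitude spectrum of a real signal sampled at fs (Hz)."""
    n = len(signal)
    amplitude = np.abs(np.fft.rfft(signal)) / n
    # Double every bin except DC (and Nyquist, when n is even) to account
    # for the negative-frequency half that rfft omits.
    if n % 2 == 0:
        amplitude[1:-1] *= 2.0
    else:
        amplitude[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amplitude

# Hypothetical vibration signal: 50 Hz and 120 Hz components, 1 s at 1 kHz.
fs = 1000
n = 1000
t = np.arange(n) / fs
signal = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

freqs, amplitude = frequency_spectrum(signal, fs)
peak_freq = freqs[np.argmax(amplitude)]
```

Each `amplitude` vector computed this way would play the role of one input spectrum to the DNN, with the number of input units equal to its length.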
As shown in Fig. 2, the proposed method includes the following four procedures:
(1) Obtain the frequency spectra of rotating machinery under different health conditions. The spectra compose the training set $\{x^i, d^i\}_{i=1}^{M}$, where $x^i$ is the ith frequency spectrum for training, $d^i$ is the health condition label of $x^i$ and $M$ is the number of frequency spectra.
(2) Build a DNN with multiple hidden layers, in which the number of input units is the dimension of the frequency spectrum $x^i$. Then use the unlabeled training set $x = \{x^i\}_{i=1}^{M}$ to pre-train the DNN layer by layer with a stack of autoencoders, where the number of autoencoders equals the number of hidden layers in the DNN. The process is displayed in Fig. 3. Firstly, regard the first hidden layer of the DNN as the hidden layer of the first autoencoder and use the unlabeled training set $x$ as both input data and output target to train the first autoencoder, as shown in Fig. 3(a). The trained parameters $\{W_1, b_1\}$ of the autoencoder are used to initialize the parameters of the first hidden layer of the DNN, and $h_1$ is the encode vector computed from the frequency spectra of rotating machinery by the first autoencoder. Then, use $h_1$ as the inputs and outputs to train the second autoencoder for initializing the parameters of the second hidden layer of the DNN, and obtain $h_2$ in Fig. 3(b). Finally, continue the training steps in sequence until the Nth autoencoder is trained and the frequency spectra are coded into $h_N$ in Fig. 3(c). In this way, all of the hidden layers of the DNN are pre-trained.

Fig. 3. Diagram illustrating the pre-training process (HL is short for hidden layer): (a) train the first autoencoder of the DNN, (b) train the second autoencoder and (c) train the Nth autoencoder.

(3) Determine the dimension of the output layer according to the number of machinery health conditions. Then implement the BP algorithm to fine-tune the parameters of the DNN through minimizing the error between the