Kesci: Implementing an LSTM in TensorFlow for Time-Series Forecasting (Very Detailed)

Updated: 2023-06-17 04:51:34

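Yunnao (Cloud Brain) Project 3: A Time-Series Forecasting Challenge on Real Industry Data

This article explains how to use an LSTM for time-series forecasting. It focuses on applying the LSTM; for the underlying theory, see the following two articles:

Environment: Python 3.7, TensorFlow 1.14

The dataset used here comes from the Kesci platform and is provided by the Yunnao Machine Learning Practical Training Camp:

The goal of this project is to build a multi-series collaborative forecasting system that combines internal and external features. The dataset consists of several groups of related time series from industry (about 40 groups) together with external-feature time series (about 5 groups). The project works through data exploration, feature engineering, traditional time-series models, machine-learning models, deep-learning models (RNN, LSTM, etc.), model combination, and result analysis, in order to learn both the analysis methods and the practical workflow of time-series forecasting.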

# Load the libraries commonly used for data analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import os
import tensorflow as tf

1 Data import

# File paths
trian_path = '../input/industry/industry_timeries/timeries_train_data/'
test_path = '../input/industry/industry_timeries/timeries_predict_data/'

1.1 A first look at the data

!head -n 2 ../input/industry/industry_timeries/timeries_train_data/11.csv
!head -n 2 ../input/industry/industry_timeries/timeries_predict_data/11.csv

2015,2,1,1.900000,-0.400000,0.787500,75.000000,814.155800

2015,2,2,6.200000,-3.900000,1.762500,77.250000,704.251112

2016,9,1,31.900000,20.400000,26.237500,65.500000

2016,9,2,34.300000,19.300000,26.200000,67.750000

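1.2 Data format

The training data has 8 columns:

Date - year: int

Date - month: int

Date - day: int; the data spans 1 February 2015 to 31 August 2016

Daily maximum temperature, in degrees Celsius (same unit below): float

Daily minimum temperature: float

Daily average temperature: float

Daily average humidity: float

Output: float

The prediction data has no output column; otherwise it has the same format as the training data. It spans 1 September 2016 to 30 November 2016.

The training and prediction sets each contain 46 groups of data, each group representing a different data source. The temperature and humidity information is identical across groups, while the output differs.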
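As a minimal sketch of the format described above, the two sample training rows printed in section 1.1 can be parsed into dates, features, and target. The column names and the `parse_rows` helper are assumptions made for illustration; only the column order comes from the description.

```python
import io
from datetime import date

import numpy as np

# Column order as described above; the names themselves are illustrative.
COLUMNS = ["year", "month", "day", "maxC", "minC", "avgC", "avgH", "target"]

# The two sample rows printed by `head` in section 1.1.
SAMPLE = """2015,2,1,1.900000,-0.400000,0.787500,75.000000,814.155800
2015,2,2,6.200000,-3.900000,1.762500,77.250000,704.251112
"""


def parse_rows(text):
    """Parse comma-separated training rows into (dates, features, target)."""
    raw = np.atleast_2d(np.genfromtxt(io.StringIO(text), delimiter=",", dtype=float))
    dates = [date(int(y), int(m), int(d)) for y, m, d in raw[:, :3]]
    features = raw[:, 3:7]  # maxC, minC, avgC, avgH
    target = raw[:, 7]      # output column (absent in the prediction files)
    return dates, features, target


dates, features, target = parse_rows(SAMPLE)
print(dates[0], features.shape, target)
```

A prediction file would need the same treatment without the last column, since it has only 7 fields per row.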

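1.3 Expected goal

Within the training set and within the test set, the date, temperature, and humidity columns are identical across tables; the only thing that differs is the target output. In other words, we can use these attributes to predict each of the different outputs.

Every table in the training set covers the same time span, and likewise for the test set: the training set runs from 1 February 2015 to 31 August 2016, and the test set from 1 September 2016 to 30 November 2016.

Our aim is to fit a model on the training data and then use it to predict the target for the test file of the same name.

The groups can be modeled as related to one another, or treated as independent.

We can build a separate model for each pair of same-named train and test files, learning 46 independent models.

Alternatively, we can merge the data and build a single model. This is not a simple concatenation, though; it involves multi-task learning:

1. Learn one model on the pooled data from all groups.

2. Learn an additional model for each individual group. That is, one part is a shared model, and an individual model sits on top of it.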
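The shared-plus-individual idea in the two steps above can be sketched roughly as follows. This is not the article's method: it swaps in plain least-squares linear models instead of an LSTM, and all data is synthetic, purely to show how a pooled model and per-group residual models can fit together. Every name here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 3 groups share the same 4 features but have
# different outputs (a common trend plus a group-specific slope and offset).
n_days, n_groups = 200, 3
X = rng.normal(size=(n_days, 4))
common_w = np.array([2.0, -1.0, 0.5, 0.0])
group_w = rng.normal(scale=0.5, size=(n_groups, 4))
group_b = np.array([10.0, -5.0, 3.0])
Y = np.stack(
    [X @ (common_w + group_w[g]) + group_b[g] + rng.normal(scale=0.1, size=n_days)
     for g in range(n_groups)],
    axis=1,
)  # shape (n_days, n_groups)


def fit_linear(features, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, features):
    A = np.column_stack([features, np.ones(len(features))])
    return A @ coef


def mse(a, b):
    return float(np.mean((a - b) ** 2))


# Step 1: one shared model fitted on all groups pooled together.
shared = fit_linear(np.tile(X, (n_groups, 1)), Y.T.reshape(-1))
shared_pred = predict_linear(shared, X)

# Step 2: one small model per group, fitted on what the shared model misses.
individual = [fit_linear(X, Y[:, g] - shared_pred) for g in range(n_groups)]

shared_only = np.mean([mse(shared_pred, Y[:, g]) for g in range(n_groups)])
combined = np.mean(
    [mse(shared_pred + predict_linear(individual[g], X), Y[:, g]) for g in range(n_groups)]
)
print("shared only:", shared_only, "shared + individual:", combined)
```

With neural networks, the same split is often realized as shared layers plus a small per-group head, but that design is outside what the article shows.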

Reference:

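1.4 An overall visualization

We compare the daily average temperature against the 46 different target outputs. The thick red line is the scaled daily average temperature; the thin gray lines are the outputs of the individual regions.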

data = []
name = []
for file_name in os.listdir(trian_path):
    file_path = os.path.join(trian_path, file_name)
    name.append(file_name.split('.')[0])
    d = np.genfromtxt(file_path, delimiter=',', dtype=float)
    data.append(d.transpose())  # shape (8, n_days): one row per CSV column

target_COL = 7
avgC_COL = 5

# plot the output vs. avgC
plt.style.use('ggplot')
plt.figure(figsize=(24, 8))
for i, d in enumerate(data):
    plt.plot(d[target_COL])
# add scaled average daily temperature to the plot
plt.plot((data[0][avgC_COL] + 10) * 40, linewidth=3, color='r')
plt.show()

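2 Build a model on the first CSV file

2.1 Data import

The rows are already ordered from past to present, so no reordering is needed. The forecast involves five attributes: our goal is to use the other four to predict the target for the day following a given date.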
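If a strict next-day setup is wanted, one way to align the data is to pair the features of day t with the target of day t+1. The sketch below uses a toy table and a hypothetical `next_day_pairs` helper; the windowing functions in section 3 instead pair each day's features with the same day's target inside a window.

```python
import numpy as np

# Toy stand-in: 5 days, 4 feature columns followed by 1 target column.
table = np.arange(25, dtype=float).reshape(5, 5)


def next_day_pairs(rows, n_features=4):
    """Pair the features of day t with the target of day t+1."""
    features = rows[:-1, :n_features]   # days 0 .. n-2
    next_target = rows[1:, n_features]  # days 1 .. n-1
    return features, next_target


features, next_target = next_day_pairs(table)
print(features.shape, next_target)
```

The last day has no following target, so one row is lost: 5 days yield 4 training pairs.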

df11 = pd.read_csv(trian_path + '11.csv', header=None,
                   names=['year', 'month', 'day', 'maxC', 'minC', 'avgC', 'avgH', 'target'])
test11 = pd.read_csv(test_path + '11.csv', header=None,
                     names=['year', 'month', 'day', 'maxC', 'minC', 'avgC', 'avgH'])
# .ix was removed in pandas 1.0; .iloc selects the same columns by position
data = df11.iloc[:, 3:8]
test = test11.iloc[:, 3:8]
data.head()

   maxC  minC    avgC    avgH      target
0   1.9  -0.4  0.7875  75.000  814.155800
1   6.2  -3.9  1.7625  77.250  704.251112
2   7.8   2.0  4.2375  72.750  756.958978
3   8.5  -1.2  3.0375  65.875  640.645401
4   7.9  -3.6  1.8625  55.375  631.725130

2.2 Visualize the Z-score standardized data for a rough look at the distributions

from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
data_sd = ss.fit_transform(data)
plt.figure(figsize=(24, 8))
plt.plot(data_sd[:, :4])
plt.plot(data_sd[:, 4], label='target', color='red')
plt.legend(loc='upper left', fontsize=24)
plt.show()
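For reference, Z-score standardization subtracts each column's mean and divides by its standard deviation. The sketch below reproduces that by hand with NumPy on four of the rows shown earlier, and also shows how a standardized target can be mapped back to its original scale; the variable names are illustrative.

```python
import numpy as np

# Four rows of 11.csv as shown earlier: maxC, minC, avgC, avgH, target.
rows = np.array([
    [1.9, -0.4, 0.7875, 75.000, 814.155800],
    [6.2, -3.9, 1.7625, 77.250, 704.251112],
    [7.8,  2.0, 4.2375, 72.750, 756.958978],
    [8.5, -1.2, 3.0375, 65.875, 640.645401],
])

mean = rows.mean(axis=0)
std = rows.std(axis=0)   # population std (ddof=0), which StandardScaler also uses
z = (rows - mean) / std  # every column now has mean 0 and std 1

# Undo the scaling for the target column only (index 4).
target_back = z[:, 4] * std[4] + mean[4]
print(z.mean(axis=0).round(6), z.std(axis=0).round(6))
```

Keeping `mean` and `std` around matters later: predictions made on standardized data have to be mapped back with the same statistics.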

3 LSTM model

# .iloc replaces the removed .ix indexer
data = np.array(df11.iloc[:, 3:8])
test = np.array(test11.iloc[:, 3:8])

3.1 Constants

rnn_unit = 10    # number of hidden units
input_size = 4
output_size = 1
lr = 0.0006      # learning rate
epochs = 500

3.2 Get the training set

def get_train_data(batch_size=60, time_step=20, train_begin=0, train_end=len(data)):
    batch_index = []
    data_train = data[train_begin:train_end]
    # standardize each column
    normalized_train_data = (data_train - np.mean(data_train, axis=0)) / np.std(data_train, axis=0)
    train_x, train_y = [], []  # training set
    # slide a window of length time_step over the data, one day at a time
    for i in range(len(normalized_train_data) - time_step):
        if i % batch_size == 0:
            batch_index.append(i)
        x = normalized_train_data[i:i + time_step, :4]
        y = normalized_train_data[i:i + time_step, 4, np.newaxis]
        train_x.append(x.tolist())
        train_y.append(y.tolist())
    batch_index.append(len(normalized_train_data) - time_step)
    return batch_index, train_x, train_y
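The windowing logic above can be checked on toy numbers. The sketch below mirrors the same index arithmetic with a hypothetical `window_starts` helper and a made-up table of 100 rows, to show how many overlapping windows come out and where the batch boundaries fall.

```python
import numpy as np


def window_starts(n_rows, time_step, batch_size):
    """Return (number of windows, batch boundary indices)."""
    n_windows = n_rows - time_step
    batch_index = [i for i in range(n_windows) if i % batch_size == 0]
    batch_index.append(n_windows)
    return n_windows, batch_index


n_windows, batch_index = window_starts(n_rows=100, time_step=20, batch_size=60)
print(n_windows, batch_index)  # 80 [0, 60, 80]

# Each window is a (time_step, n_features) slice that overlaps its neighbour
# in all but one row.
toy = np.arange(100 * 5, dtype=float).reshape(100, 5)
windows_x = np.stack([toy[i:i + 20, :4] for i in range(n_windows)])
windows_y = np.stack([toy[i:i + 20, 4, np.newaxis] for i in range(n_windows)])
print(windows_x.shape, windows_y.shape)  # (80, 20, 4) (80, 20, 1)
```

Note that the last batch is usually smaller than `batch_size`: here it holds 20 windows rather than 60.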

3.3 Get the test set

def get_test_data(time_step=20, data=data, test_begin=0):
    data_test = data[test_begin:]
    mean = np.mean(data_test, axis=0)
    std = np.std(data_test, axis=0)
    normalized_test_data = (data_test - mean) / std  # standardize
    # number of samples, rounding up so the last partial chunk is included
    size = (len(normalized_test_data) + time_step - 1) // time_step
    test_x, test_y = [], []
    # full, non-overlapping chunks of length time_step
    for i in range(size - 1):
        x = normalized_test_data[i * time_step:(i + 1) * time_step, :4]
        y = normalized_test_data[i * time_step:(i + 1) * time_step, 4]
        test_x.append(x.tolist())
        test_y.extend(y)
    # the remaining rows form the last (possibly shorter) chunk
    test_x.append((normalized_test_data[(i + 1) * time_step:, :4]).tolist())
    test_y.extend((normalized_test_data[(i + 1) * time_step:, 4]).tolist())
    return mean, std, test_x, test_y
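Unlike the training windows, the test chunks do not overlap. As a rough sketch, the helper below (hypothetical, not part of the article) lists the chunk boundaries for the 91 days from 1 September to 30 November 2016, and `denormalize` shows how a standardized prediction could be mapped back to the original target scale using the returned `mean` and `std`. The numeric statistics are made up.

```python
import numpy as np


def chunk_bounds(n_rows, time_step):
    """Start/end indices of consecutive, non-overlapping chunks."""
    size = (n_rows + time_step - 1) // time_step  # ceil(n_rows / time_step)
    return [(i * time_step, min((i + 1) * time_step, n_rows)) for i in range(size)]


def denormalize(pred, mean, std, col=4):
    """Map standardized predictions back to the scale of column `col`."""
    return np.asarray(pred) * std[col] + mean[col]


bounds = chunk_bounds(n_rows=91, time_step=20)
print(bounds)  # [(0, 20), (20, 40), (40, 60), (60, 80), (80, 91)]

# Made-up statistics: target mean 700 and standard deviation 50.
mean = np.array([0.0, 0.0, 0.0, 0.0, 700.0])
std = np.array([1.0, 1.0, 1.0, 1.0, 50.0])
restored = denormalize([0.0, 1.0, -2.0], mean, std)
print(restored)  # values 700, 750 and 600
```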

3.4 Neural network variable definitions

# Weights and biases of the input and output layers
weights = {
    'in': tf.Variable(tf.random_normal([input_size, rnn_unit])),
    'out': tf.Variable(tf.random_normal([rnn_unit, 1]))
}
bias = {
    'in': tf.Variable(tf.constant(0.1, shape=[rnn_unit, ])),
    'out': tf.Variable(tf.constant(0.1, shape=[1, ]))
}

3.5 Build the LSTM model

def lstm(X):
    batch_size = tf.shape(X)[0]
    time_step = tf.shape(X)[1]
    w_in = weights['in']
    b_in = bias['in']
    # flatten to 2-D for the matmul; the result is the input to the hidden layer
    input_2d = tf.reshape(X, [-1, input_size])
    input_rnn = tf.matmul(input_2d, w_in) + b_in
    # reshape back to 3-D as the input to the LSTM cell
    input_rnn = tf.reshape(input_rnn, [-1, time_step, rnn_unit])
    cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_unit)
    init_state = cell.zero_state(batch_size, dtype=tf.float32)
    output_rnn, final_states = tf.nn.dynamic_rnn(
        cell, input_rnn, initial_state=init_state, dtype=tf.float32)
    # flatten again so the output layer yields one prediction per time step
    output = tf.reshape(output_rnn, [-1, rnn_unit])
    w_out = weights['out']
    b_out = bias['out']
    pred = tf.matmul(output, w_out) + b_out
    return pred, final_states
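To make the tensor shapes in the model easier to follow, here is a NumPy-only walk-through of the same three stages: input projection, LSTM recurrence, and output projection. It is a sketch under stated assumptions, not TensorFlow's implementation: the weights are random stand-ins, the gate layout is chosen for illustration, and the update is the standard LSTM rule with a forget bias of 1.0.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, time_step, input_size, rnn_unit = 3, 20, 4, 10


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Randomly initialised stand-ins for the trainable parameters.
w_in = rng.normal(size=(input_size, rnn_unit))
b_in = np.full(rnn_unit, 0.1)
w_out = rng.normal(size=(rnn_unit, 1))
b_out = np.full(1, 0.1)
# One kernel for all four gates: input, candidate, forget, output.
kernel = rng.normal(scale=0.1, size=(2 * rnn_unit, 4 * rnn_unit))
gate_bias = np.zeros(4 * rnn_unit)

X = rng.normal(size=(batch, time_step, input_size))

# 1) Input layer: flatten to 2-D, project, then restore the time axis.
input_rnn = (X.reshape(-1, input_size) @ w_in + b_in).reshape(-1, time_step, rnn_unit)

# 2) LSTM recurrence, starting from an all-zero state.
h = np.zeros((batch, rnn_unit))
c = np.zeros((batch, rnn_unit))
outputs = []
for t in range(time_step):
    gates = np.concatenate([input_rnn[:, t, :], h], axis=1) @ kernel + gate_bias
    i, j, f, o = np.split(gates, 4, axis=1)
    c = sigmoid(f + 1.0) * c + sigmoid(i) * np.tanh(j)  # forget bias of 1.0
    h = sigmoid(o) * np.tanh(c)
    outputs.append(h)
output_rnn = np.stack(outputs, axis=1)  # (batch, time_step, rnn_unit)

# 3) Output layer: one prediction per time step.
pred = output_rnn.reshape(-1, rnn_unit) @ w_out + b_out
print(input_rnn.shape, output_rnn.shape, pred.shape)
```

The prediction comes out flattened to `(batch * time_step, 1)`, which is why the training loss reshapes both prediction and label to one dimension before comparing them.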

3.6 Train the model

def train_lstm(batch_size=60, time_step=20, epochs=epochs, train_begin=0, train_end=len(data)):
    X = tf.placeholder(tf.float32, shape=[None, time_step, input_size])
    Y = tf.placeholder(tf.float32, shape=[None, time_step, output_size])
    batch_index, train_x, train_y = get_train_data(batch_size, time_step, train_begin, train_end)
    with tf.variable_scope("c_lstm"):
        pred, _ = lstm(X)
    # mean squared error between flattened predictions and labels
    loss = tf.reduce_mean(
        tf.square(tf.reshape(pred, [-1]) - tf.reshape(Y, [-1])))
    train_op = tf.train.AdamOptimizer(lr).minimize(loss)
    saver = tf.train.Saver(tf.global_variables(), max_to_keep=15)
    os.makedirs('model_save', exist_ok=True)  # make sure the checkpoint directory exists
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # the number of epochs can be changed; more epochs usually lower the
        # training loss but take longer
        for i in range(epochs):
            for step in range(len(batch_index) - 1):
                _, loss_ = sess.run([train_op, loss], feed_dict={
                    X: train_x[batch_index[step]:batch_index[step + 1]],
                    Y: train_y[batch_index[step]:batch_index[step + 1]]})
            if (i + 1) % 50 == 0:
                print("Number of epochs:", i + 1, " loss:", loss_)
                print("model_save: ", saver.save(sess, 'model_save/modle.ckpt'))
                # I ran this on Windows; this path is where the model is stored,
                # and the checkpoint file is named modle.ckpt
                # On Linux, use 'model_save2/modle.ckpt'
        print("The train has finished")

train_lstm()
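The loss minimized during training is a mean squared error over every time step of every window in the batch. A small NumPy sketch of that computation and of the batch slicing follows; the toy arrays and the `mse_loss` name are illustrative.

```python
import numpy as np


def mse_loss(pred, y):
    """Mean squared error over all windows and time steps, flattened."""
    return float(np.mean(np.square(np.reshape(pred, [-1]) - np.reshape(y, [-1]))))


# Toy stand-ins: 5 windows of 3 time steps each.
y = np.arange(15, dtype=float).reshape(5, 3, 1)
pred = (y + 0.5).reshape(-1, 1)  # the model returns shape (batch * time_step, 1)
print(mse_loss(pred, y))         # 0.25

# Mini-batches are contiguous slices between consecutive batch boundaries.
batch_index = [0, 2, 4, 5]
batches = [y[batch_index[s]:batch_index[s + 1]] for s in range(len(batch_index) - 1)]
print([len(b) for b in batches])  # [2, 2, 1]
```

Because the loss is computed on standardized targets, its value is in units of squared standard deviations, not in the original units of the output column.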


