[Python] A detailed look at resample: time-based aggregation and resampling (Part 1)
First, let's go straight to the official documentation:
label=None, convention='start', kind=None, loffset=None,
limit=None, base=0, on=None, level=None)
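The time parameter used for aggregation: rule
The parameters are as follows:
rule : the offset string or object representing the target conversion, usually a time frequency such as "M", "A", "Q", "BM", "BA", "BQ", or "W";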
axis : int, optional, default 0
closed : {'right', 'left'}; which side of the bin interval is closed. The default is 'left' for all frequency offsets except "M", "A", "Q", "BM", "BA", "BQ", and "W", which all default to 'right'.
label : {'right', 'left'}; which bin edge is used to label the bin. The default is 'left' for all frequency offsets except "M", "A", "Q", "BM", "BA", "BQ", and "W", which all default to 'right'.
convention : {'start', 'end', 's', 'e'}; for PeriodIndex only, controls whether to use the start or end of rule.
kind : {'timestamp', 'period'}, optional; pass 'timestamp' to convert the resulting index to a DatetimeIndex, or 'period' to convert it to a PeriodIndex. By default the input representation is retained.
loffset : adjusts the resampled time labels.
on : for a DataFrame, the column to use for resampling instead of the index. The column must be datetime-like.
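To make the on parameter concrete, here is a minimal sketch that resamples a DataFrame by a datetime column rather than by its index. The column names ts and value are made up for illustration, and the 'min' alias is used instead of 'T' because recent pandas releases deprecate 'T'.

```python
import pandas as pd

# Hypothetical frame: the timestamps live in a regular column, not in the index.
df = pd.DataFrame({
    "ts": pd.date_range("2000-01-01", periods=9, freq="min"),
    "value": range(9),
})

# on="ts" tells resample to bin by that column instead of the index.
three_min = df.resample("3min", on="ts")["value"].sum()
print(three_min)
```

Without on, the same call would require ts to be set as the index first.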
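Examples
1.1 Create a time index and sum the values after aggregating by time
# First, create a Series with 9 one-minute timestamps
index = pd.date_range('1/1/2000', periods=9, freq='T') # generate data at one-minute frequency
series = pd.Series(range(9), index=index)
series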
Out[31]:
2000-01-01 00:00:00 0
2000-01-01 00:01:00 1
2000-01-01 00:02:00 2
2000-01-01 00:03:00 3
2000-01-01 00:04:00 4
2000-01-01 00:05:00 5
2000-01-01 00:06:00 6
2000-01-01 00:07:00 7
2000-01-01 00:08:00 8
Freq: T, dtype: int64
# '3T' is the rule; aggregation runs along axis=0 by default
series.resample('3T')
Out[34]: DatetimeIndexResampler [freq=<3 * Minutes>, axis=0, closed=left, label=left, convention=start, base=0]
series.resample('3T').sum()
Out[35]:
2000-01-01 00:00:00 3
2000-01-01 00:03:00 12
2000-01-01 00:06:00 21
Freq: 3T, dtype: int64
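One way to see what the 3-minute rule does is to rebuild the bins by hand. The sketch below rounds every timestamp down to a multiple of 3 minutes and sums per group, then compares that with resample. It is only an illustration of the default left-closed, left-labeled behavior, not how pandas implements it, and it uses the 'min' alias in place of 'T'.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Built-in: 3-minute bins, summed.
resampled = series.resample("3min").sum()

# By hand: floor each timestamp to its 3-minute bin start, then sum per bin.
manual = series.groupby(series.index.floor("3min")).sum()

print(resampled.tolist())
print(manual.tolist())
```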
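1.2 Left versus right for closed and label
# Which side of the bin interval is closed: the default is 'left' for all frequency offsets except "M", "A", "Q", "BM", "BA", "BQ", and "W", which all default to 'right'
series.resample('W')
Out[37]: DatetimeIndexResampler [freq=<Week: weekday=6>, axis=0, closed=right, label=right, convention=start, base=0]
series.resample('T')
Out[38]: DatetimeIndexResampler [freq=<Minute>, axis=0, closed=left, label=left, convention=start, base=0]
# With a minute frequency such as T, each aggregated bar (K-line) is timed from the start of its bin; with 2T, each bin starts at the first bar and covers 2 bars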
Out[40]:
2000-01-01 00:00:00 0
2000-01-01 00:01:00 1
2000-01-01 00:02:00 2
2000-01-01 00:03:00 3
2000-01-01 00:04:00 4
2000-01-01 00:05:00 5
2000-01-01 00:06:00 6
2000-01-01 00:07:00 7
2000-01-01 00:08:00 8
Freq: T, dtype: int64
# Comparing the two outputs shows that the left side of each bin is closed
series.resample('2T').sum()
Out[42]:
2000-01-01 00:00:00 1
2000-01-01 00:02:00 5
2000-01-01 00:04:00 9
2000-01-01 00:06:00 13
2000-01-01 00:08:00 8
Freq: 2T, dtype: int64
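Since the comment above describes the bins in terms of K-line (candlestick) bars, a sketch with ohlc() may make the picture clearer. The integer values merely stand in for prices, and the variable names are illustrative.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
prices = pd.Series(range(9), index=index)  # stand-in "prices", not real market data

# Merge one-minute bars into two-minute bars: first, max, min and last value per bin.
bars = prices.resample("2min").ohlc()
print(bars)
```

Each row covers two one-minute bars counted from the bin start, except the last bin, which holds only the final value.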
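# With a weekly frequency W, each bin's result is labeled with the last day of the week (Sunday)
index = pd.date_range('1/1/2000', periods=9, freq='3D') # one value every 3 days, matching the series shown in Out[54]
series = pd.Series(range(9), index=index)
series.resample('W').sum()
Out[53]: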
2000-01-02 0
2000-01-09 3
2000-01-16 12
2000-01-23 13
2000-01-30 8
Freq: W-SUN, dtype: int64
series
Out[54]:
2000-01-01 0
2000-01-04 1
2000-01-07 2
2000-01-10 3
2000-01-13 4
2000-01-16 5
2000-01-19 6
2000-01-22 7
2000-01-25 8
Freq: 3D, dtype: int64
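The weekly default can be checked by spelling it out. The sketch below compares resample('W') with an explicit closed='right', label='right' call, and also computes the left/left variant for contrast; the variable names are illustrative.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="3D")
series = pd.Series(range(9), index=index)

# 'W' defaults to closed='right', label='right': bins end on (and include) Sunday.
weekly_default = series.resample("W").sum()
weekly_explicit = series.resample("W", closed="right", label="right").sum()

# Overriding both sides moves the bin edges and the labels.
weekly_left = series.resample("W", closed="left", label="left").sum()

print(weekly_default.equals(weekly_explicit))
print(weekly_left)
```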
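1.3 Customizing the label side, i.e. which bin edge the aggregated data is labeled with
index = pd.date_range('1/1/2000', periods=9, freq='T') # generate data at one-minute frequency
series = pd.Series(range(9), index=index)
series.resample('3T', label='right').sum()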
Out[57]:
2000-01-01 00:03:00 3
2000-01-01 00:06:00 12
2000-01-01 00:09:00 21
Freq: 3T, dtype: int64
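Putting both label settings side by side shows that label only changes the timestamps attached to the bins, not which rows fall into them. A minimal sketch, using the 'min' alias in place of 'T':

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

left = series.resample("3min", label="left").sum()
right = series.resample("3min", label="right").sum()

# Same sums, labels shifted by one bin width.
print(left.index.strftime("%H:%M").tolist(), left.tolist())
print(right.index.strftime("%H:%M").tolist(), right.tolist())
```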
1.4 Customizing the closed side
index = pd.date_range('1/1/2000', periods=9, freq='T') # generate data at one-minute frequency
series = pd.Series(range(9), index=index)
series
Out[63]:
2000-01-01 00:00:00 0
2000-01-01 00:01:00 1
2000-01-01 00:02:00 2
2000-01-01 00:03:00 3
2000-01-01 00:04:00 4
2000-01-01 00:05:00 5
2000-01-01 00:06:00 6
2000-01-01 00:07:00 7
2000-01-01 00:08:00 8
Freq: T, dtype: int64
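# Close the right side of each bin interval: every bin now includes its right edge and excludes its left edge, while the labels still show the left edge
series.resample('3T', closed='right').sum()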
Out[64]:
1999-12-31 23:57:00 0
2000-01-01 00:00:00 6
2000-01-01 00:03:00 15
2000-01-01 00:06:00 15
Freq: 3T, dtype: int64
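The extra 23:57 bin is easier to understand when the right-closed bins are rebuilt by hand. In the sketch below each timestamp is rounded up to the next multiple of 3 minutes (the right edge that contains it), and one bin width is subtracted to get the left-edge label. This illustrates the (left, right] rule only and makes no claim about pandas internals.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

closed_right = series.resample("3min", closed="right").sum()

# (left, right] bins by hand: ceil to the right edge, then step back to the left edge.
manual_labels = series.index.ceil("3min") - pd.Timedelta(minutes=3)
manual = series.groupby(manual_labels).sum()

print(closed_right)
print(manual.tolist())
```

The value at 00:00 sits exactly on a right edge, so it belongs to the bin (23:57, 00:00], which is why a label from the previous day appears.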
1.5 Upsampling to 30-second data with asfreq()
series.resample('30S').asfreq()
Out[66]:
2000-01-01 00:00:00 0.0
2000-01-01 00:00:30 NaN
2000-01-01 00:01:00 1.0
2000-01-01 00:01:30 NaN
2000-01-01 00:02:00 2.0
2000-01-01 00:02:30 NaN
2000-01-01 00:03:00 3.0
2000-01-01 00:03:30 NaN
2000-01-01 00:04:00 4.0
2000-01-01 00:04:30 NaN
2000-01-01 00:05:00 5.0
2000-01-01 00:05:30 NaN
2000-01-01 00:06:00 6.0
2000-01-01 00:06:30 NaN
2000-01-01 00:07:00 7.0
2000-01-01 00:07:30 NaN
2000-01-01 00:08:00 8.0
Freq: 30S, dtype: float64
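As a self-contained sketch of the upsampling step, the snippet below counts how many new slots asfreq() creates and how many of them are empty. It uses the lowercase 's' and 'min' aliases, on the assumption that the installed pandas prefers them over 'S' and 'T'.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Upsample from 1 minute to 30 seconds; new timestamps get NaN because nothing fills them.
upsampled = series.resample("30s").asfreq()

print(len(upsampled))            # number of 30-second slots
print(upsampled.isna().sum())    # number of slots with no original value
```

The integer data becomes float because NaN cannot be stored in an int64 Series.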
1.6 Upsample the Series into 30-second bins and fill the NaN values with the pad method
series.resample('30S').pad()[0:5]
Out[67]:
2000-01-01 00:00:00 0
2000-01-01 00:00:30 0
2000-01-01 00:01:00 1
2000-01-01 00:01:30 1
2000-01-01 00:02:00 2
Freq: 30S, dtype: int64
1.7 Upsample the Series into 30-second bins and fill the NaN values with the bfill method
series.resample('30S').bfill()[0:5]
Out[68]:
2000-01-01 00:00:00 0
2000-01-01 00:00:30 1
2000-01-01 00:01:00 1
2000-01-01 00:01:30 2
2000-01-01 00:02:00 2
Freq: 30S, dtype: int64
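The two fill directions can be compared in one short sketch. ffill() is used here as the forward-fill counterpart of pad, on the assumption that the installed pandas may no longer offer pad on resampler objects.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Forward fill: each new 30-second slot copies the previous known value.
forward = series.resample("30s").ffill().head()

# Backward fill: each new 30-second slot copies the next known value.
backward = series.resample("30s").bfill().head()

print(forward.tolist())
print(backward.tolist())
```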
1.8 Passing a custom function via apply
import numpy as np
def custom_resampler(array_like):
    return np.sum(array_like) + 5
series.resample('3T').apply(custom_resampler)
Out[74]:
2000-01-01 00:00:00 8
2000-01-01 00:03:00 17
2000-01-01 00:06:00 26
Freq: 3T, dtype: int64
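Any function that reduces one bin to a single value can be passed the same way. As a further sketch, the hypothetical helper value_range below returns the spread (max minus min) of each 3-minute bin.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

def value_range(bin_values):
    """Illustrative helper: spread between the largest and smallest value in one bin."""
    return bin_values.max() - bin_values.min()

spread = series.resample("3min").apply(value_range)
print(spread.tolist())
```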
1.9 For a Series with a PeriodIndex, the convention keyword controls whether the start or the end of rule is used.
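To illustrate what "start" and "end" mean for a period, the sketch below converts a single yearly Period to quarterly frequency with Period.asfreq. It does not call resample; it only shows the start-or-end choice that convention expresses, and the 'Y' alias is an assumption about the installed pandas version.

```python
import pandas as pd

year = pd.Period("2012", freq="Y")

# A year spans four quarters; 'start' picks the first one, 'end' the last one.
start_quarter = year.asfreq("Q", how="start")
end_quarter = year.asfreq("Q", how="end")

print(start_quarter, end_quarter)
```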