[python] A detailed look at resample: time-based aggregation and resampling (Part 1)

Updated: 2023-07-06 23:52:36

First, let's look at the signature given in the official documentation:

DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None,

label=None, convention='start', kind=None, loffset=None,

limit=None, base=0, on=None, level=None)

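The key argument is rule, the time frequency to aggregate by.

The parameters are as follows:

rule : the offset string or object representing the target conversion. It is usually a time frequency such as "M", "A", "Q", "BM", "BA", "BQ" or "W".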

axis : int, optional, default 0

closed : {'right', 'left'}; which side of each bin interval is closed. The default is 'left' for all frequency offsets except "M", "A", "Q", "BM", "BA", "BQ" and "W", which all default to 'right'.

label : {'right', 'left'}; which bin edge is used to label each bin. The default is 'left' for all frequency offsets except "M", "A", "Q", "BM", "BA", "BQ" and "W", which all default to 'right'.

convention : {'start', 'end', 's', 'e'}; for PeriodIndex only, controls whether to use the start or end of rule.

kind : {'timestamp', 'period'}, optional; pass 'timestamp' to convert the resulting index to a DatetimeIndex or 'period' to convert it to a PeriodIndex. By default the input representation is retained.

loffset : adjusts the resampled time labels.

on : for a DataFrame, the column to use instead of the index for resampling. The column must be datetime-like.
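
The on parameter is not exercised in the examples that follow, so here is a minimal sketch of it. The DataFrame and its column names ts and value are made up for illustration, and the 'min'/'3min' aliases are used in place of 'T'/'3T' on the assumption that a recent pandas is installed, where the single-letter alias is deprecated.

```python
import pandas as pd

# Hypothetical data: the timestamps live in an ordinary column, not in the index.
df = pd.DataFrame({
    "ts": pd.date_range("2000-01-01", periods=9, freq="min"),
    "value": range(9),
})

# Resample on the "ts" column instead of the index and sum each 3-minute bin.
result = df.resample("3min", on="ts")["value"].sum()
print(result)
```

The result is indexed by the bin labels derived from ts, so the original integer index is not used at all.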

Examples

1.1 Create a time index, then sum the values after aggregating by time

# First, create a series of 9 one-minute timestamps

import pandas as pd

index = pd.date_range('1/1/2000', periods=9, freq='T')  # generate data at one-minute frequency

series = pd.Series(range(9), index=index)

series

Out[31]:

2000-01-01 00:00:00 0

2000-01-01 00:01:00 1

2000-01-01 00:02:00 2

2000-01-01 00:03:00 3

2000-01-01 00:04:00 4

2000-01-01 00:05:00 5

2000-01-01 00:06:00 6

2000-01-01 00:07:00 7

2000-01-01 00:08:00 8

Freq: T, dtype: int64

# '3T' is the rule; aggregation runs along axis=0 by default

series.resample('3T')

Out[34]: DatetimeIndexResampler [freq=<3 * Minutes>, axis=0, closed=left, label=left, convention=start, base=0]

series.resample('3T').sum()

Out[35]:

2000-01-01 00:00:00 3

2000-01-01 00:03:00 12

2000-01-01 00:06:00 21

Freq: 3T, dtype: int64
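
One way to see what the 3-minute sum is doing is to rebuild it by hand: flooring every timestamp to a 3-minute boundary assigns it to its left-closed bin, and a plain groupby then gives the same totals. This is only an illustrative sketch, not how pandas implements resample internally; 'min' is used instead of 'T' on the assumption of a recent pandas.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Built-in resampling into 3-minute bins.
resampled = series.resample("3min").sum()

# Manual equivalent: floor each timestamp to the start of its 3-minute bin,
# then group by that bin start and sum.
manual = series.groupby(series.index.floor("3min")).sum()

print(resampled)
print(manual)
```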

1.2 Which side closed and label default to

# Which side of the interval is closed: the default is 'left' for all frequency offsets except "M", "A", "Q", "BM", "BA", "BQ" and "W", which all default to 'right'

series.resample('W')

Out[37]: DatetimeIndexResampler [freq=<Week: weekday=6>, axis=0, closed=right, label=right, convention=start, base=0]

series.resample('T')

Out[38]: DatetimeIndexResampler [freq=<Minute>, axis=0, closed=left, label=left, convention=start, base=0]

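# With a minute frequency, each aggregated bar (K-line) starts at the time where its bin begins; with 2T, every bar covers 2 rows, counted from the first row onward

series

Out[40]: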

2000-01-01 00:00:00 0

2000-01-01 00:01:00 1

2000-01-01 00:02:00 2

2000-01-01 00:03:00 3

2000-01-01 00:04:00 4

2000-01-01 00:05:00 5

2000-01-01 00:06:00 6

2000-01-01 00:07:00 7

2000-01-01 00:08:00 8

Freq: T, dtype: int64

# Comparing with the series above shows that the bins are closed on the left

series.resample('2T').sum()

Out[42]:

2000-01-01 00:00:00 1

2000-01-01 00:02:00 5

2000-01-01 00:04:00 9

2000-01-01 00:06:00 13

2000-01-01 00:08:00 8

Freq: 2T, dtype: int64

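# With the weekly frequency W, each bin's aggregated result is labelled with the last day of that week (Sunday)

index = pd.date_range('1/1/2000', periods=9, freq='3D')  # generate data every 3 days

series = pd.Series(range(9), index=index)

series.resample('W').sum()

Out[53]: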

2000-01-02 0

2000-01-09 3

2000-01-16 12

2000-01-23 13

2000-01-30 8

Freq: W-SUN, dtype: int64

series

Out[54]:

2000-01-01 0

2000-01-04 1

2000-01-07 2

2000-01-10 3

2000-01-13 4

2000-01-16 5

2000-01-19 6

2000-01-22 7

2000-01-25 8

Freq: 3D, dtype: int64
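
The default sides described in this section can be checked by comparing a default call with one that spells closed and label out. The sketch below does this for a minute frequency and for the weekly frequency; the variable names are illustrative, and 'min' replaces 'T' on the assumption of a recent pandas.

```python
import pandas as pd

minute_series = pd.Series(
    range(9), index=pd.date_range("2000-01-01", periods=9, freq="min")
)
daily_series = pd.Series(
    range(9), index=pd.date_range("2000-01-01", periods=9, freq="3D")
)

# Minute frequencies: the defaults behave like closed='left', label='left'.
minute_default = minute_series.resample("2min").sum()
minute_explicit = minute_series.resample("2min", closed="left", label="left").sum()

# Weekly frequency: the defaults behave like closed='right', label='right'.
weekly_default = daily_series.resample("W").sum()
weekly_explicit = daily_series.resample("W", closed="right", label="right").sum()

print(minute_default.equals(minute_explicit))
print(weekly_default.equals(weekly_explicit))
```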

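1.3 Customizing label, i.e. where the aggregated data is placed

index = pd.date_range('1/1/2000', periods=9, freq='T')  # generate data at one-minute frequency

series = pd.Series(range(9), index=index)

series.resample('3T', label='right').sum()

Out[57]: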

2000-01-01 00:03:00 3

2000-01-01 00:06:00 12

2000-01-01 00:09:00 21

Freq: 3T, dtype: int64
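
Note that label only moves the timestamps attached to the bins; the bins themselves, and therefore the sums, stay the same. A small sketch that makes this explicit, again using 'min' in place of 'T' on the assumption of a recent pandas:

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

left = series.resample("3min", label="left").sum()
right = series.resample("3min", label="right").sum()

# Same values in both results; only the labels differ, by one bin width.
offsets = right.index - left.index
print(left.tolist(), right.tolist())
print(offsets)
```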

1.4 Customizing closed

index = pd.date_range('1/1/2000', periods=9, freq='T')  # generate data at one-minute frequency

series = pd.Series(range(9), index=index)

series

Out[63]:

2000-01-01 00:00:00 0

2000-01-01 00:01:00 1

2000-01-01 00:02:00 2

2000-01-01 00:03:00 3

2000-01-01 00:04:00 4

2000-01-01 00:05:00 5

2000-01-01 00:06:00 6

2000-01-01 00:07:00 7

2000-01-01 00:08:00 8

Freq: T, dtype: int64

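# With closed='right', each bin includes its right edge and excludes its left edge, so the first value falls into a bin that starts at 23:57

series.resample('3T', closed='right').sum()

Out[64]: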

1999-12-31 23:57:00 0

2000-01-01 00:00:00 6

2000-01-01 00:03:00 15

2000-01-01 00:06:00 15

Freq: 3T, dtype: int64
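
To see how many rows each right-closed bin collects, the sketch below puts the per-bin sums and counts side by side. The column names sum and count are just illustrative, and 'min' replaces 'T' on the assumption of a recent pandas.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

right_closed = series.resample("3min", closed="right")

# Each bin is (start, start + 3min]; the label is still the left edge by default.
summary = pd.DataFrame({
    "sum": right_closed.sum(),
    "count": right_closed.count(),
})
print(summary)
```

The first bin, labelled 1999-12-31 23:57:00, holds only the 00:00:00 row, and the last bin holds only 00:07:00 and 00:08:00.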

1.5 Upsampling to 30-second data with asfreq()

series.resample('30S').asfreq()

Out[66]:

2000-01-01 00:00:00 0.0

2000-01-01 00:00:30 NaN

2000-01-01 00:01:00 1.0

2000-01-01 00:01:30 NaN

2000-01-01 00:02:00 2.0

2000-01-01 00:02:30 NaN

2000-01-01 00:03:00 3.0

2000-01-01 00:03:30 NaN

2000-01-01 00:04:00 4.0

2000-01-01 00:04:30 NaN

2000-01-01 00:05:00 5.0

2000-01-01 00:05:30 NaN

2000-01-01 00:06:00 6.0

2000-01-01 00:06:30 NaN

2000-01-01 00:07:00 7.0

2000-01-01 00:07:30 NaN

2000-01-01 00:08:00 8.0

Freq: 30S, dtype: float64
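
The shape of this result can be checked quickly: 9 one-minute rows become 17 half-minute rows, the 8 new rows are NaN, and the dtype switches to float to hold them. A minimal sketch, using '30s' and 'min' as aliases on the assumption of a recent pandas:

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# asfreq() only reindexes to the new frequency; it does not fill the gaps.
upsampled = series.resample("30s").asfreq()

print(len(upsampled), upsampled.isna().sum(), upsampled.dtype)
```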

1.6 Upsample the series into 30-second bins and fill the NaN values with the pad method

series.resample('30S').pad()[0:5]

Out[67]:

2000-01-01 00:00:00 0

2000-01-01 00:00:30 0

2000-01-01 00:01:00 1

2000-01-01 00:01:30 1

2000-01-01 00:02:00 2

Freq: 30S, dtype: int64

1.7 Upsample the series into 30-second bins and fill the NaN values with the bfill method

series.resample('30S').bfill()[0:5]

Out[68]:

2000-01-01 00:00:00 0

2000-01-01 00:00:30 1

2000-01-01 00:01:00 1

2000-01-01 00:01:30 2

2000-01-01 00:02:00 2

Freq: 30S, dtype: int64
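
The two fill directions are easiest to compare side by side. The sketch below uses ffill() for the forward fill rather than pad(), on the assumption that the installed pandas may no longer provide pad(); the two behave the same way where both exist.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Forward fill: each new 30-second row copies the previous known value.
filled_forward = series.resample("30s").ffill()

# Backward fill: each new 30-second row copies the next known value.
filled_backward = series.resample("30s").bfill()

print(filled_forward.iloc[:5].tolist())
print(filled_backward.iloc[:5].tolist())
```

Because no NaN is ever materialized, both results keep the original int64 dtype, unlike the asfreq() output.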

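1.8 Passing a custom function via apply

import numpy as np

def custom_resampler(array_like):
    # Sum the values in each bin, then add 5
    return np.sum(array_like) + 5

series.resample('3T').apply(custom_resampler)

Out[74]: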

2000-01-01 00:00:00 8

2000-01-01 00:03:00 17

2000-01-01 00:06:00 26

Freq: 3T, dtype: int64
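
The function passed to apply is called once per bin with that bin's values, so for this particular function the result should match taking the built-in sum and adding 5 afterwards. A short sketch checking that, with 'min' in place of 'T' on the assumption of a recent pandas:

```python
import numpy as np
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)


def custom_resampler(array_like):
    # Called once per 3-minute bin with the values that fall into it.
    return np.sum(array_like) + 5


applied = series.resample("3min").apply(custom_resampler)
shortcut = series.resample("3min").sum() + 5

print(applied.tolist(), shortcut.tolist())
```

apply is the more general tool; the vectorized form is only available here because the custom logic happens to decompose into a built-in aggregation plus a constant.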

1.9 For a Series with a PeriodIndex, the convention keyword can be used to control whether the start or the end of rule is used.
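
As a rough illustration of convention, the sketch below upsamples two annual periods to quarters. The data is made up, the 'Y' and 'Q' period aliases are assumptions about the installed pandas, and some recent releases emit a deprecation warning for resampling a PeriodIndex, so treat this as a sketch rather than a recommended pattern.

```python
import pandas as pd

# Hypothetical data: one value for 2012 and one for 2013.
annual = pd.Series(
    [1, 2], index=pd.period_range("2012-01-01", freq="Y", periods=2)
)

# convention='start' places each annual value in the first quarter of its year.
at_start = annual.resample("Q", convention="start").asfreq()

# convention='end' places each annual value in the last quarter of its year.
at_end = annual.resample("Q", convention="end").asfreq()

print(at_start)
print(at_end)
```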
