FlinkSqlWith1.14查询-WindowJoin(译)
Window Join
窗⼝连接将时间维度添加到连接标准本⾝。这样做时,窗⼝连接将两个流的元素连接起来,这两个流共享⼀个公共键并位于同⼀个窗⼝中。窗⼝连接的语义与
对于流式查询,与连续表上的其他连接不同,窗⼝连接不会发出中间结果,⽽只会在窗⼝结束时发出最终结果。此外,窗⼝连接会在不再需要时清除所有中间状态。
通常,Window Join 与⼀起使⽤。此外,Window Join 可以在其他基于的操作之后进⾏,例如、和。
⽬前,Window Join 要求 join on 条件包含输⼊表的窗⼝开始相等和输⼊表的窗⼝结束相等。
Window Join ⽀持 INNER/LEFT/RIGHT/FULL OUTER/ANTI/SEMI JOIN。
内 / 左 / 右 / 全外
下⾯显⽰了 INNER / LEFT / RIGHT / FULL OUTER Window Join 语句的语法。
SELECT ...
FROM L [LEFT|RIGHT|FULL OUTER] JOIN R -- L and R are relations applied windowing TVF
ON L.window_start = R.window_start AND L.window_end = R.window_end AND ...
INNER / LEFT / RIGHT / FULL OUTER WINDOW JOIN 的语法⾮常相似,这⾥我们只举⼀个FULL OUTER JOIN 的例⼦。执⾏窗⼝连接时,所有具有公共键和公共翻转窗⼝的元素都会连接在⼀起。我们仅给出⼀个适⽤于 Tumble Window TVF 的 Window Join ⽰例。通过将连接的时间区域限定为固定的五分钟间隔,我们将数据集划分为两个不同的时间窗⼝:[12:00, 12:05) 和 [12:05, 12:10)。L2 和
R2 ⾏⽆法连接在⼀起,因为它们落⼊单独的窗⼝中。
Flink SQL> desc LeftTable;
+----------+------------------------+------+-----+--------+----------------------------------+
| name | type | null | key | extras | watermark |
+----------+------------------------+------+-----+--------+----------------------------------+
宽带号码在哪看| row_time | TIMESTAMP(3) *ROWTIME* | true | | | `row_time` - INTERVAL '1' SECOND |
| num | INT | true | | | |
| id | STRING | true | | | |
+----------+------------------------+------+-----+--------+----------------------------------+
Flink SQL> SELECT * FROM LeftTable;
+------------------+-----+----+椿芽鸡蛋
| row_time | num | id |
+------------------+-----+----+
| 2020-04-15 12:02 | 1 | L1 |
| 2020-04-15 12:06 | 2 | L2 |
| 2020-04-15 12:03 | 3 | L3 |
+------------------+-----+----+
Flink SQL> desc RightTable;
+----------+------------------------+------+-----+--------+----------------------------------+
| name | type | null | key | extras | watermark |
+----------+------------------------+------+-----+--------+----------------------------------+
| row_time | TIMESTAMP(3) *ROWTIME* | true | | | `row_time` - INTERVAL '1' SECOND |
按劳取酬| num | INT | true | | | |
| id | STRING | true | | | |
+----------+------------------------+------+-----+--------+----------------------------------+
Flink SQL> SELECT * FROM RightTable;
+------------------+-----+----+
| row_time | num | id |
+------------------+-----+----+
| 2020-04-15 12:01 | 2 | R2 |
| 2020-04-15 12:04 | 3 | R3 |
| 2020-04-15 12:05 | 4 | R4 |
+------------------+-----+----+
Flink SQL> SELECT L.num as L_Num, L.id as L_Id, R.num as R_Num, R.id as R_Id, L.window_start, L.window_end
FROM (
SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR(row_time), INTERVAL '5' MINUTES))
) L
FULL JOIN (
SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR(row_time), INTERVAL '5' MINUTES))
) R
ON L.num = R.num AND L.window_start = R.window_start AND L.window_end = R.window_end;
+-------+------+-------+------+------------------+------------------+
| L_Num | L_Id | R_Num | R_Id | window_start | window_end |
+-------+------+-------+------+------------------+------------------+
| 1 | L1 | null | null | 2020-04-15 12:00 | 2020-04-15 12:05 |
| null | null | 2 | R2 | 2020-04-15 12:00 | 2020-04-15 12:05 |
| 3 | L3 | 3 | R3 | 2020-04-15 12:00 | 2020-04-15 12:05 |
| 2 | L2 | null | null | 2020-04-15 12:05 | 2020-04-15 12:10 |
| null | null | 4 | R4 | 2020-04-15 12:05 | 2020-04-15 12:10 |
+-------+------+-------+------+------------------+------------------+
注意:为了更好地理解窗⼝化的⾏为,我们简化了时间戳值的显⽰以不显⽰尾随零,例如如果类型为 ,2020-04-15 08:05则应显⽰为2020-04-15 08:05:00.000在 Flink SQL 客户端中TIMESTAMP(3)。
半连接(SEMI)
如果在 Semi Window Joins的右侧⾄少有⼀个匹配⾏,则半窗⼝连接从⼀个左侧记录返回⼀⾏。
Flink SQL> SELECT *
FROM (
SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR(row_time), INTERVAL '5' MINUTES))
) L WHERE L.num IN (
SELECT num FROM (
SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR(row_time), INTERVAL '5' MIN
UTES))
) R WHERE L.window_start = R.window_start AND L.window_end = R.window_end);
+------------------+-----+----+------------------+------------------+-------------------------+
| row_time | num | id | window_start | window_end | window_time |
+------------------+-----+----+------------------+------------------+-------------------------+
| 2020-04-15 12:03 | 3 | L3 | 2020-04-15 12:00 | 2020-04-15 12:05 | 2020-04-15 12:04:59.999 |
+------------------+-----+----+------------------+------------------+-------------------------+
Flink SQL> SELECT *
FROM (
SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR(row_time), INTERVAL '5' MINUTES))小米路由器重置
) L WHERE EXISTS (
SELECT * FROM (
SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR(row_time), INTERVAL '5' MINUTES))
) R WHERE L.num = R.num AND L.window_start = R.window_start AND L.window_end = R.window_end);
+------------------+-----+----+------------------+------------------+-------------------------+
| row_time | num | id | window_start | window_end | window_time |
+------------------+-----+----+------------------+------------------+-------------------------+
| 2020-04-15 12:03 | 3 | L3 | 2020-04-15 12:00 | 2020-04-15 12:05 | 2020-04-15 12:04:59.999 |
+------------------+-----+----+------------------+------------------+-------------------------+
注意:为了更好地理解窗⼝化的⾏为,我们简化了时间戳值的显⽰以不显⽰尾随零,例如如果类型为 ,2020-04-15 08:05则应显⽰为2020-04-15 08:05:00.000在 Flink SQL 客户端中TIMESTAMP(3)。
反连接(ANTI)
Anti Window Joins 是 Inner Window Join 的反⾯:它们包含每个公共窗⼝中的所有未连接⾏。
Flink SQL> SELECT *
FROM (
SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR(row_time), INTERVAL '5' MINUTES))
) L WHERE L.num NOT IN (
SELECT num FROM (
SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR(row_time), INTERVAL '5' MINUTES))
) R WHERE L.window_start = R.window_start AND L.window_end = R.window_end);
+------------------+-----+----+------------------+------------------+-------------------------+
| row_time | num | id | window_start | window_end | window_time |
+------------------+-----+----+------------------+------------------+-------------------------+
| 2020-04-15 12:02 | 1 | L1 | 2020-04-15 12:00 | 2020-04-15 12:05 | 2020-04-15 12:04:59.999 |
| 2020-04-15 12:06 | 2 | L2 | 2020-04-15 12:05 | 2020-04-15 12:10 | 2020-04-15 12:09:59.999 |
+------------------+-----+----+------------------+------------------+-------------------------+
英文动词Flink SQL> SELECT *
FROM (
SELECT * FROM TABLE(TUMBLE(TABLE LeftTable, DESCRIPTOR(row_time), INTERVAL '5' MINUTES))
蒋勋作品
) L WHERE NOT EXISTS (
SELECT * FROM (
SELECT * FROM TABLE(TUMBLE(TABLE RightTable, DESCRIPTOR(row_time), INTERVAL '5' MINUTES))
) R WHERE L.num = R.num AND L.window_start = R.window_start AND L.window_end = R.window_end);
+------------------+-----+----+------------------+------------------+-------------------------+
| row_time | num | id | window_start | window_end | window_time |
+------------------+-----+----+------------------+------------------+-------------------------+
| 2020-04-15 12:02 | 1 | L1 | 2020-04-15 12:00 | 2020-04-15 12:05 | 2020-04-15 12:04:59.999 |
| 2020-04-15 12:06 | 2 | L2 | 2020-04-15 12:05 | 2020-04-15 12:10 | 2020-04-15 12:09:59.999 |
+------------------+-----+----+------------------+------------------+-------------------------+
注意:为了更好地理解窗⼝化的⾏为,我们简化了时间戳值的显⽰以不显⽰尾随零,例如如果类型为 ,2020-04-15 08:05则应显⽰为2020-04-15 08:05:00.000在 Flink SQL 客户端中TIMESTAMP(3)。
对于的英语
局限说明
Join ⼦句局限
⽬前,window join要求join on条件包含窗⼝开始相等和窗⼝结束相等。将来,我们还可以简化join on⼦句,使其只包含窗⼝TVF为TUMBLE或HOP时的窗⼝start相等。
输⼊窗⼝ TVF 的限制
⽬前,左右输⼊的窗⼝ TVF 必须相同。这可以在将来扩展,例如,翻滚窗⼝加⼊具有相同窗⼝⼤⼩的滑动窗⼝。
在直接对 TVF 进⾏窗⼝化之后,对 Window Join 的限制
碗儿糕
⽬前,如果在 之后跟随 Window Join ,则 必须使⽤ Tumble Windows、Hop Windows 或 Cumulate Windows ⽽不是 Session 窗⼝。