clickhou实现排序排名
如何在ClickHou中实现ROW_NUMBER OVER 和DENSE_RANK OVER等同效果的查询,它们在⼀些其他数据库中可⽤于RANK排序。
CH中并没有直接提供对应的开窗函数,需要利⽤⼀些特殊函数变相实现,主要会⽤到下⾯⼏个数组函数,它们分别是:
arrayEnumerate
arrayEnumerateDen
arrayEnumerateUniq
这些函数均接受⼀个数组作为输⼊参数,并返回数组中元素出现的位置,例如:
SELECT
arrayEnumerate([10, 20, 30, 10, 40]) AS row_number,
arrayEnumerateDen([10, 20, 30, 10, 40]) AS den_rank,
arrayEnumerateUniq([10, 20, 30, 10, 40]) AS uniq_rank
简单折纸花
┌─row_number──┬─den_rank──┬─uniq_rank───┐
│ [1,2,3,4,5] │ [1,2,3,1,4] │ [1,1,1,2,1] │
└─────────────┴─────────────┴─────────────┘
--建表语句
CREATE TABLE _loan (
`loan_id` String,--登录ID
`customer_id` String,--⽤户ID
`loan_dt` Nullable (Date),--登录⽇期
`end_date` Nullable (Date),--退出⽇期
`due_days` Nullable (Int64)--与分组排序等暂⽆关字段,此处可以代指⼀些业务字段
)
ENGINE = MergeTree ()
ORDER BY
VID SETTINGS index_granularity = 8192;
我们的⽬标是实现开窗查询:
ROW_NUMBER() OVER( PARTITION BY id ORDER BY val )==>arrayEnumerate
DENSE_RANK() OVER( PARTITION BY id ORDER BY val )==>arrayEnumerateDen
奢华房车UNIQ_RANK() OVER( PARTITION BY id ORDER BY val )==>arrayEnumerateUniq
代码如下
SELECT
customer_id ,
groupArray(loan_dt) AS loan_dt,
groupArray(ifnull(end_date,toDate('2099-12-31'))) AS end_date,
groupArray(due_days) AS due_days,
groupArray(loan_id) AS loan_id,
arrayEnumerate(loan_id) AS row_number,
arrayEnumerateDen(loan_id) AS den_rank,
arrayEnumerateUniq(loan_id) AS uniq_rank
FROM (
SELECT * FROM _loan ORDER BY loan_dt ,loan_id
)
GROUP BY customer_id
数组展开,利⽤ARRAY JOIN将数组展开,并按照customer_id 、loan_id 列排序:
SELECT
customer_id
,loan_id
,loan_dt
,IF(end_date=toDate('2099-12-31'),null,end_date) as end_dt
,due_days
,row_number
,den_rank
,uniq_rank
from
(
SELECT
customer_id ,
groupArray(loan_dt) AS loan_dt,
groupArray(ifnull(end_date,toDate('2099-12-31'))) AS end_date,
groupArray(due_days) AS due_days,
groupArray(loan_id) AS loan_id,
arrayEnumerate(loan_id) AS row_number,
arrayEnumerateDen(loan_id) AS den_rank,
arrayEnumerateUniq(loan_id) AS uniq_rank
FROM (
SELECT * FROM _loan ORDER BY loan_dt ,loan_id
)
GROUP BY customer_id
)
ARRAY JOIN
loan_dt,法国娇兰
loan_id,
end_date,
due_days,
家常红烧牛肉
row_number,
den_rank,
uniq_rank
ORDER BY
customer_id ASC,
row_number ASC ,首尔大学留学条件
den_rank ASC
技巧:因为end_date可能为空值,会导致array长度不⼀致。报错。需要⽤特数值填充然后最后再转换回来。lag/lead实现:
neighbor(column, offt[, default_value])喜欢西游记的理由
函数的结果取决于受影响的数据块和数据块中数据的顺序。
如果使⽤ORDER BY进⾏⼦查询,并从⼦查询外部调⽤该函数,则可以得到预期的结果。
参数
列-列名或标量表达式。
偏移量—从列的当前⾏向前或向后的⾏数。Int64。
默认值-可选。如果偏移量超出块的范围,则返回的值。受影响的数据块的类型。
返回值
如果偏移值不在块边界之外,则当前⾏偏移距离中的列的值。
如果偏移值超出块边界,则列的默认值。如果给定了默认的_值,则将使⽤它。
类型:受影响数据块的类型或默认值类型。
代码如下
SELECT
customer_id
,loan_id
,loan_dt
,IF(end_date=toDate('2099-12-31'),null,end_date) as end_dt
,due_days
,
row_number as row_num
,den_rank
,uniq_rank
对父亲感恩的话,if(neighbor(row_num , 1)>1,neighbor(loan_dt , 1),null) as lead_loan_dt
,if(row_num<>1,neighbor(end_dt, -1),null) as lag_end_dt
from
(
SELECT
customer_id ,
groupArray(loan_dt) AS loan_dt,
groupArray(ifnull(end_date,toDate('2099-12-31'))) AS end_date,
groupArray(due_days) AS due_days,
福州商标groupArray(loan_id) AS loan_id,
arrayEnumerate(loan_id) AS row_number,
arrayEnumerateDen(loan_id) AS den_rank, arrayEnumerateUniq(loan_id) AS uniq_rank
FROM (
SELECT * FROM res_report.ipeso_loan ORDER BY loan_dt ,loan_id )
GROUP BY customer_id
)
ARRAY JOIN
loan_dt,
loan_id,
end_date,
due_days,
row_number,
den_rank,
uniq_rank
ORDER BY
customer_id ASC,
row_number ASC ,
den_rank ASC