Greenplum row store vs. column store, heap tables vs. AO tables: principles and how to choose
Reposted from:
Background
Greenplum supports both row-store and column-store layouts, and both heap tables and append-only (AO) tables. How do they differ, and how should you choose between them?
How row store and column store work
1. Row store organizes data by row: each row is one tuple, stored contiguously. To read a given column, every column before it must first be deformed, so accessing the first column and accessing the last column actually have different costs.
There is a detailed introduction to deform in this document.
Row store summary:
A full table scan has to read more data blocks.
The compression ratio is relatively low.
The cost of reading a column depends on its position: the further back the column, the higher the cost.
Not well suited to vectorized execution or JIT architectures (in short, not suited to batch-style computation).
When the table must be rewritten, the whole table is rewritten, e.g. when adding a column with a default value.
2. Column store organizes data by column: each column maps to one file or a batch of files. Reading any single column costs the same, but reading multiple columns means accessing multiple files, so the more columns you read, the higher the overhead.
Column store summary:
High compression ratio.
Only supported with AO storage (covered later).
Reading any column costs the same.
Very well suited to vectorized execution and JIT architectures; much more efficient for scanning and aggregating large volumes of data.
Reading many columns is more expensive because more files must be accessed, e.g. detail (all-column) queries.
When the table must be rewritten, the whole table need not be touched; e.g. adding a column with a default value only adds the file for that column.
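The two layouts are selected per table via storage options. A minimal sketch of the same schema in both orientations (table and column names are illustrative; the options shown are standard Greenplum reloptions):

```sql
-- Row-oriented AO table
create table t_row (id int8, v int8)
with (appendonly=true, orientation=row)
distributed by (id);

-- Column-oriented AO table; column store also allows per-table compression
create table t_col (id int8, v int8)
with (appendonly=true, orientation=column, compresstype=zlib)
distributed by (id);
```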
When to choose row store
If the workload leans OLTP, e.g. frequent detail queries against the table (returning many columns) plus more updates and deletes, consider row store.
When to choose column store
If the workload leans OLAP and you frequently aggregate over the data, choose column store.
When you need a high compression ratio, choose column store.
For mixed workloads, consider a partitioned table, e.g. partitioned along the time dimension: recent partitions serve mostly detail queries, so use row store for them; historical partitions serve mostly aggregation, so use column store.
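Greenplum lets each partition carry its own storage options, so the mixed case above can be expressed in one table. A sketch with assumed table name and partition bounds:

```sql
-- Hypothetical mixed-storage partitioned table: the historical partition is
-- column-store and compressed, the current one stays row-store for detail queries
create table orders (id int8, created date, amount int8)
with (appendonly=true, orientation=row)
distributed by (id)
partition by range (created)
(
  partition hist start (date '2016-01-01') end (date '2017-01-01')
    with (appendonly=true, orientation=column, compresstype=zlib),
  partition cur start (date '2017-01-01') end (date '2018-01-01')
    with (appendonly=true, orientation=row)
);
```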
How heap tables and AO tables work
1. Heap tables are simply PostgreSQL heap storage. Every change to a heap table generates REDO, which enables point-in-time recovery. However, heap tables cannot support incremental logical backup, because any block of the table may change, making it impractical to track a backup position in heap storage.
At transaction end, durability is guaranteed via the clog and REDO, and REDO is also used to build MIRROR nodes for data redundancy.
2. AO tables are, as the name says, append-only storage. Deletes and updates mark the affected rows in a separate BITMAP file; a bit plus offset alignment determines whether a given row of an AO table has been deleted.
At transaction end, an FSYNC is required to record the offset of the last data block written. (Even if that block holds only a single record, the next transaction will append a fresh block.) The corresponding data blocks are also shipped to the MIRROR for data redundancy.
AO tables are therefore a poor fit for small transactions: every commit forces an FSYNC, and a block left with free space at commit is never reused. (You can test this yourself; single-row commits into an AO table show severe I/O amplification.)
Even so, AO tables are an excellent fit for OLAP: batch data loading, high compression, and logical backup supports incremental backup, since each backup only needs to record the offset it reached, plus a full copy of the (very small) BITMAP of delete markers each time.
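To see the compression side of this in practice, you can create a compressed AO table and ask for the achieved ratio. A sketch (table and data are illustrative; `get_ao_compression_ratio` is a function that ships with Greenplum):

```sql
create table t_ao (id int8, payload text)
with (appendonly=true, orientation=column, compresstype=zlib, compresslevel=5)
distributed by (id);

-- highly repetitive payload, so it should compress well
insert into t_ao select g, repeat('x', 100) from generate_series(1, 100000) g;

-- reports the ratio of uncompressed to compressed size for the AO table
select get_ao_compression_ratio('t_ao');
```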
When to choose heap tables
When writes are dominated by small transactions, choose heap tables.
When you need point-in-time recovery, choose heap tables.
When to choose AO tables
When you need column store, choose AO tables.
When data is loaded in batches, choose AO tables.
Benchmark: row-store deform cost vs. column store
1. Create a function that builds 400-column tables (row-store heap table, row-store AO table, column-store AO table).
create or replace function f(name, int, text) returns void as $$
declare
  res text := '';
begin
  for i in 1..$2 loop
    res := res||'c'||i||' int8,';
  end loop;
  res := rtrim(res, ',');
  if $3 = 'ao_col' then
    res := 'create table '||$1||'('||res||') with (appendonly=true, blocksize=8192, compresstype=none, orientation=column)';
  elsif $3 = 'ao_row' then
    res := 'create table '||$1||'('||res||') with (appendonly=true, blocksize=8192, orientation=row)';
  elsif $3 = 'heap_row' then
    res := 'create table '||$1||'('||res||') with (appendonly=false)';
  else
    raise notice 'use ao_col, ao_row, heap_row as $3';
    return;
  end if;
  execute res;
end;
$$ language plpgsql;
2. Create the tables:
postgres=# select f('tbl_ao_col', 400, 'ao_col');
postgres=# select f('tbl_ao_row', 400, 'ao_row');
postgres=# select f('tbl_heap_row', 400, 'heap_row');
3. Create a function that loads data, where the first column and the last 3 columns carry the test data and all other columns are filled with 1.
create or replace function f_ins1(name, int, int8) returns void as $$
declare
  res text := '';
begin
  for i in 1..($2-4) loop
    res := res||'1,';
  end loop;
  res := 'id,'||res;
  res := rtrim(res, ',');
  res := 'insert into '||$1||' select '||res||',id,random()*10000,random()*100000 from generate_series(1,'||$3||') t(id)';
  execute res;
end;
$$ language plpgsql;
4. Load data:
postgres=# select f_ins1('tbl_ao_col',400,1000000);
5. Create a function that loads data, where the first 4 columns carry the test data and all other columns are filled with 1.
create or replace function f_ins2(name, int, int8) returns void as $$
declare
  res text := '';
begin
  for i in 1..($2-4) loop
    res := res||'1,';
  end loop;
  res := 'id,id,random()*10000,random()*100000,'||res;
  res := rtrim(res, ',');
  res := 'insert into '||$1||' select '||res||' from generate_series(1,'||$3||') t(id)';
  execute res;
end;
$$ language plpgsql;
6. Load data:
postgres=# select f_ins1('tbl_ao_col',400,1000000);
f_ins1
--------
(1 row)
postgres=# insert into tbl_ao_row select * from tbl_ao_col;
INSERT 0 1000000
postgres=# insert into tbl_heap_row select * from tbl_ao_col;
INSERT 0 1000000
7. Analyze the tables:
postgres=# analyze tbl_ao_col ;
ANALYZE
postgres=# analyze tbl_ao_row;
ANALYZE
postgres=# analyze tbl_heap_row;
ANALYZE
8. Table sizes:
postgres=# select pg_size_pretty(pg_relation_size('tbl_ao_col'));
pg_size_pretty
----------------
3060 MB
(1 row)
postgres=# select pg_size_pretty(pg_relation_size('tbl_ao_row'));
pg_size_pretty
----------------
3117 MB
(1 row)
postgres=# select pg_size_pretty(pg_relation_size('tbl_heap_row'));
pg_size_pretty
----------------
3473 MB
(1 row)
9. Row-store heap table, aggregating the leading columns:
postgres=# explain analyze select c2,count(*),sum(c3),avg(c3),min(c3),max(c3) from tbl_heap_row group by c2;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=136132.40..136132.42 rows=1 width=96)
Rows out: 1 rows at destination with 135 ms to end, start offset by 1.922 ms.
-> HashAggregate (cost=136132.40..136132.42 rows=1 width=96)
Group By: tbl_heap_row.c2
Rows out: 1 rows (seg42) with 0.002 ms to first row, 36 ms to end, start offset by 48 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=136132.35..136132.37 rows=1 width=96)
Hash Key: tbl_heap_row.c2
Rows out: 48 rows at destination (seg42) with 53 ms to end, start offset by 48 ms.
-> HashAggregate (cost=136132.35..136132.35 rows=1 width=96)
Group By: tbl_heap_row.c2
Rows out: Avg 1.0 rows x 48 workers. Max 1 rows (seg0) with 0.008 ms to first row, 1.993 ms to end, start offset by 48 ms.
-> Seq Scan on tbl_heap_row (cost=0.00..121134.54 rows=20831 width=16)
Rows out: Avg 20833.3 rows x 48 workers. Max 20854 rows (seg42) with 40 ms to first row, 73 ms to end, start offset by 50 ms.
Slice statistics:
(slice0) Executor memory: 345K bytes.
(slice1) Executor memory: 751K bytes avg x 48 workers, 751K bytes max (seg0).
(slice2) Executor memory: 359K bytes avg x 48 workers, 374K bytes max (seg42).
Statement statistics:
Memory used: 128000K bytes
Settings: optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 138.524 ms
(22 rows)
10. Row-store heap table, aggregating the trailing columns:
postgres=# explain analyze select c398,count(*),sum(c399),avg(c399),min(c399),max(c399) from tbl_heap_row group by c398;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=136576.82..136799.05 rows=9877 width=96)
Rows out: 10001 rows at destination with 212 ms to end, start offset by 1.917 ms.
-> HashAggregate (cost=136576.82..136799.05 rows=206 width=96)
Group By: tbl_heap_row.c398
Rows out: Avg 208.4 rows x 48 workers. Max 223 rows (seg17) with 0.001 ms to first row, 70 ms to end, start offset by 14 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=136132.35..136329.89 rows=206 width=96)
Hash Key: tbl_heap_row.c398
Rows out: Avg 8762.2 rows x 48 workers at destination. Max 9422 rows (seg46) with 93 ms to end, start offset by 48 ms.
-> HashAggregate (cost=136132.35..136132.35 rows=206 width=96)
Group By: tbl_heap_row.c398
Rows out: Avg 8762.2 rows x 48 workers. Max 8835 rows (seg2) with 0.003 ms to first row, 12 ms to end, start offset by 49 ms.
-> Seq Scan on tbl_heap_row (cost=0.00..121134.54 rows=20831 width=16)
Rows out: Avg 20833.3 rows x 48 workers. Max 20854 rows (seg42) with 40 ms to first row, 133 ms to end, start offset by 51 ms.
Slice statistics:
(slice0) Executor memory: 377K bytes.
(slice1) Executor memory: 1156K bytes avg x 48 workers, 1156K bytes max (seg0).
(slice2) Executor memory: 414K bytes avg x 48 workers, 414K bytes max (seg1).
Statement statistics:
Memory used: 128000K bytes
Settings: optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 214.024 ms
(22 rows)
11. Row-store AO table, aggregating the leading columns:
postgres=# explain analyze select c2,count(*),sum(c3),avg(c3),min(c3),max(c3) from tbl_ao_row group by c2;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=124755.04..124755.07 rows=1 width=96)
Rows out: 1 rows at destination with 149 ms to end, start offset by 1.890 ms.
-> HashAggregate (cost=124755.04..124755.07 rows=1 width=96)
Group By: tbl_ao_row.c2
Rows out: 1 rows (seg42) with 0.004 ms to first row, 55 ms to end, start offset by 64 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=124755.00..124755.02 rows=1 width=96)
Hash Key: tbl_ao_row.c2
Rows out: 48 rows at destination (seg42) with 32 ms to end, start offset by 64 ms.
-> HashAggregate (cost=124755.00..124755.00 rows=1 width=96)
Group By: tbl_ao_row.c2
Rows out: Avg 1.0 rows x 48 workers. Max 1 rows (seg0) with 0.001 ms to first row, 46 ms to end, start offset by 59 ms.
-> Append-only Scan on tbl_ao_row (cost=0.00..109755.00 rows=20834 width=16)
Rows out: Avg 20833.3 rows x 48 workers. Max 20854 rows (seg42) with 24 ms to end, start offset by 59 ms.
Slice statistics:
(slice0) Executor memory: 345K bytes.
(slice1) Executor memory: 770K bytes avg x 48 workers, 770K bytes max (seg0).
(slice2) Executor memory: 359K bytes avg x 48 workers, 374K bytes max (seg42).
Statement statistics:
Memory used: 128000K bytes
Settings: optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 152.386 ms
(22 rows)
12. Row-store AO table, aggregating the trailing columns: