
Updated: 2022-11-22 17:26:23


Posted on November 22, 2022 (author: 背单词游戏)

[Original] Uncle's Troubleshooting Notes (22): Hive running multiple concurrent insert overwrite tabl...

Hive 2.1

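I. Problem

We recently needed to write data into several partitions of one table. To shorten the total run time we ran the statements concurrently: multiple SQL statements executed at the same time, each writing a different partition, with dynamic partitioning enabled:

  hive.exec.dynamic.partition=true

  insert overwrite table test_table partition(dt) select * from test_table_another where dt=1;

It turned out that only one statement actually ran; all the others hung.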

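The thread stacks of the Hive thrift server showed that the requests were all blocked in DbTxnManager. The key Hive settings were:

  hive.support.concurrency=true
  hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

The default values and descriptions of these settings:

org.apache.hadoop.hive.conf.HiveConf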

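    HIVE_SUPPORT_CONCURRENCY("hive.support.concurrency", false,
        "Whether Hive supports concurrency control or not. \n" +
        "A ZooKeeper instance must be up and running when using zookeeper Hive lock manager "),
    HIVE_TXN_MANAGER("hive.txn.manager",
        "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager",
        "Set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive\n" +
        "transactions, which also requires appropriate settings for hive.compactor.initiator.on,\n" +
        "hive.compactor.worker.threads, hive.support.concurrency (true), hive.enforce.bucketing\n" +
        "(true), and hive.exec.dynamic.partition.mode (nonstrict).\n" +
        "The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides\n" +
        "no transactions."),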

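II. Code analysis

Every SQL statement executed in Hive eventually reaches org.apache.hadoop.hive.ql.Driver.run. run calls runInternal, so let's go straight to the runInternal code: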

  private CommandProcessorResponse runInternal(String command, boolean alreadyCompiled)
      throws CommandNeedRetryException {
    ...
      if (requiresLock()) {
        // a checkpoint to see if the thread is interrupted or not before an expensive operation
        if (isInterrupted()) {
          ret = handleInterruption("at acquiring the lock.");
        } else {
          ret = acquireLocksAndOpenTxn(startTxnImplicitly);
        }
    ...

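  private boolean requiresLock() {
    if (!checkConcurrency()) {
      return false;
    }
    // Lock operations themselves don't require the lock.
    if (isExplicitLockOperation()) {
      return false;
    }
    if (!HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_LOCK_MAPRED_ONLY)) {
      return true;
    }
    Queue<Task<? extends Serializable>> taskQueue = new LinkedList<Task<? extends Serializable>>();
    taskQueue.addAll(plan.getRootTasks());
    while (taskQueue.peek() != null) {
      Task<? extends Serializable> tsk = taskQueue.remove();
      if (tsk.requireLock()) {
        return true;
      }
    ...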

  private boolean checkConcurrency() {
    boolean supportConcurrency = conf.getBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY);
    if (!supportConcurrency) {
      LOG.info("Concurrency mode is disabled, not creating a lock manager");
      return false;
    }
    return true;
  }

  private int acquireLocksAndOpenTxn(boolean startTxnImplicitly) {
    ...
      txnMgr.acquireLocks(plan, ctx, userFromUGI);
    ...

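runInternal calls requiresLock to decide whether a lock is needed. requiresLock makes two checks:

it calls checkConcurrency, which requires a lock only when hive.support.concurrency=true;

it calls requireLock on each task, and only some tasks need a lock.

If a lock is needed, runInternal calls acquireLocksAndOpenTxn, which in turn calls acquireLocks on the transaction manager to obtain the lock.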
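
The decision described above can be condensed into a small standalone model. This is only a sketch under stated assumptions: the class RequiresLockSketch and its parameters are hypothetical stand-ins for the Driver state and configuration flags, and the final "return false" reflects an assumption about the part of requiresLock that the excerpt elides.

```java
// Standalone model of the requiresLock() decision quoted above.
// Hypothetical names throughout; this is not Hive's Driver class.
final class RequiresLockSketch {

    /**
     * @param supportConcurrency value of hive.support.concurrency
     * @param explicitLockOp     true when the statement is itself a lock operation
     * @param lockMapredOnly     value of the HIVE_LOCK_MAPRED_ONLY setting
     * @param taskRequiresLock   requireLock() result for each task in the plan
     */
    static boolean requiresLock(boolean supportConcurrency,
                                boolean explicitLockOp,
                                boolean lockMapredOnly,
                                boolean... taskRequiresLock) {
        if (!supportConcurrency) {
            return false; // concurrency disabled: never lock
        }
        if (explicitLockOp) {
            return false; // lock operations themselves do not need the lock
        }
        if (!lockMapredOnly) {
            return true; // lock for every statement
        }
        for (boolean needsLock : taskRequiresLock) {
            if (needsLock) {
                return true; // at least one task asks for a lock
            }
        }
        return false; // assumption: no task asked for a lock
    }

    public static void main(String[] args) {
        // Concurrency on, ordinary statement, lock-mapred-only off.
        System.out.println(requiresLock(true, false, false));
        // Concurrency off: the lock path is skipped entirely.
        System.out.println(requiresLock(false, false, false));
    }
}
```

In the scenario of this article hive.support.concurrency is true, so every one of the concurrent statements goes down the lock path.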

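1) First, which tasks require a lock:

org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer

  private void analyzeAlterTablePartMergeFiles(ASTNode ast,
      String tableName, HashMap<String, String> partSpec)
      throws SemanticException {
    ...
      DDLWork ddlWork = new DDLWork(getInputs(), getOutputs(), mergeDesc);
      ddlWork.setNeedLock(true);
    ...

So DDL operations (here, the merge-files task) require a lock.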

2) Next, how the lock is acquired:

org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

  public void acquireLocks(QueryPlan plan, Context ctx, String username) throws LockException {
    try {
      acquireLocksWithHeartbeatDelay(plan, ctx, username, 0);
    ...

  void acquireLocksWithHeartbeatDelay(QueryPlan plan, Context ctx, String username, long delay) throws LockException {
    LockState ls = acquireLocks(plan, ctx, username, true);
    ...

  LockState acquireLocks(QueryPlan plan, Context ctx, String username, boolean isBlocking) throws LockException {
    ...
      switch (output.getType()) {
        case DATABASE:
          compBuilder.setDbName(output.getDatabase().getName());
          break;
        case TABLE:
        case DUMMYPARTITION: // in case of dynamic partitioning lock the table
          t = output.getTable();
          compBuilder.setDbName(t.getDbName());
          compBuilder.setTableName(t.getTableName());
          break;
        case PARTITION:
          compBuilder.setPartitionName(output.getPartition().getName());
          t = output.getPartition().getTable();
          compBuilder.setDbName(t.getDbName());
          compBuilder.setTableName(t.getTableName());
          break;
        default:
          // This is a file or something we don't hold locks for.
          continue;
      }
    ...
    LockState lockState = lockMgr.lock(rqstBuilder.build(), queryId, isBlocking, locks);
    ctx.setHiveLocks(locks);
    return lockState;
  }

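So with dynamic partitioning enabled, the output is a DUMMYPARTITION and the lock granularity is DbName + TableName. Every statement asks for the same table-level lock, so only one of the concurrent statements can obtain it and the others have to wait.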
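
To make the granularity difference concrete, here is a minimal standalone sketch of how the switch above maps an output to a lock target. The class LockGranularitySketch, the string key format, and the "equal keys contend" rule are all illustrative assumptions; Hive's real lock manager also deals with lock modes and hierarchy, which this model ignores.

```java
// Illustrative model of the lock target chosen per output type.
// Hypothetical names; the key format is invented for this sketch.
final class LockGranularitySketch {

    enum OutputType { DATABASE, TABLE, DUMMYPARTITION, PARTITION }

    /** Builds a lock key from the same fields the switch statement fills in. */
    static String lockKey(OutputType type, String db, String table, String partition) {
        switch (type) {
            case DATABASE:
                return db;
            case TABLE:
            case DUMMYPARTITION: // dynamic partitioning: the whole table is the target
                return db + "." + table;
            case PARTITION:      // static partition: the partition name is part of the target
                return db + "." + table + "/" + partition;
            default:
                throw new IllegalArgumentException("unexpected type: " + type);
        }
    }

    /** Simplification: two write requests contend when their keys are equal. */
    static boolean contend(String keyA, String keyB) {
        return keyA.equals(keyB);
    }

    public static void main(String[] args) {
        // Two dynamic-partition writes, intended for dt=1 and dt=2.
        String dyn1 = lockKey(OutputType.DUMMYPARTITION, "default", "test_table", null);
        String dyn2 = lockKey(OutputType.DUMMYPARTITION, "default", "test_table", null);
        // The same two writes with static partition specs.
        String st1 = lockKey(OutputType.PARTITION, "default", "test_table", "dt=1");
        String st2 = lockKey(OutputType.PARTITION, "default", "test_table", "dt=2");

        System.out.println("dynamic contend: " + contend(dyn1, dyn2)); // true
        System.out.println("static contend:  " + contend(st1, st2));   // false
    }
}
```

The partition value never reaches the lock target in the DUMMYPARTITION branch, which is why statements that write different partitions still queue behind each other.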

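III. Summary

There are several ways to solve the problem:

1. Disable dynamic partitioning: hive.exec.dynamic.partition=false

2. Disable concurrency support: hive.support.concurrency=false

3. Disable transactions: hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager

Any one of the three is enough. Option 1 is recommended, because the scenario above does not actually need dynamic partitioning: each statement already knows which partition it writes.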
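
For option 1, each concurrent statement has to name its target partition explicitly. The helper below is a minimal sketch of generating such statements. StaticPartitionSqlSketch is a hypothetical name, the column names id and name are invented because the article does not list the table's columns, and the partition value is assumed to be a trusted literal (no escaping is done). With a static partition spec the select list normally has to leave out the partition column, which is why the columns are spelled out instead of using select *.

```java
// Builds a static-partition variant of the statement from the Problem section.
// Hypothetical helper; table names come from the article, column names do not.
final class StaticPartitionSqlSketch {

    /**
     * @param dt      partition value, assumed to be a trusted literal
     * @param columns non-partition columns to copy (hypothetical names)
     */
    static String insertOverwrite(String dt, String... columns) {
        return "insert overwrite table test_table partition(dt='" + dt + "') "
            + "select " + String.join(", ", columns)
            + " from test_table_another where dt='" + dt + "'";
    }

    public static void main(String[] args) {
        // One statement per partition; these can then be submitted concurrently.
        for (String dt : new String[] {"1", "2", "3"}) {
            System.out.println(insertOverwrite(dt, "id", "name") + ";");
        }
    }
}
```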
