[Original] Uncle's Troubleshooting Notes (22): Hive running multiple
insert overwrite table ... statements at the same time
Hive 2.1
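I. Problem
Recently we had a scenario that writes data to several partitions of one table. To shorten the run time we ran multiple SQL statements concurrently, each writing a different partition, with dynamic partitioning enabled:
hive.exec.dynamic.partition=true
insert overwrite table test_table partition(dt) select * from test_table_another where dt=1;
It turned out that only one SQL statement ran at a time; all the others hung;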
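Thread dumps of the Hive thrift server showed that the requests were all blocked in DbTxnManager. The key Hive settings were:
hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
Default values and descriptions of these settings: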
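org.apache.hadoop.hive.conf.HiveConf
HIVE_SUPPORT_CONCURRENCY("hive.support.concurrency", false,
    "Whether Hive supports concurrency control or not. \n" +
    "A ZooKeeper instance must be up and running when using zookeeper Hive lock manager "),
HIVE_TXN_MANAGER("hive.txn.manager",
    "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager",
    "Set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive\n" +
    "transactions, which also requires appropriate settings for hive.compactor.initiator.on,\n" +
    "hive.compactor.worker.threads, hive.support.concurrency (true), hive.enforce.bucketing\n" +
    "(true), and hive.exec.dynamic.partition.mode (nonstrict).\n" +
    "The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides\n" +
    "no transactions."),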
II. Code analysis
Every SQL statement executed in Hive eventually reaches Driver.run, and run calls runInternal, so we go straight to the runInternal code:
private CommandProcessorResponse runInternal(String command, boolean alreadyCompiled)
    throws CommandNeedRetryException {
  ...
  if (requiresLock()) {
    // a checkpoint to see if the thread is interrupted or not before an expensive operation
    if (isInterrupted()) {
      ret = handleInterruption("at acquiring the lock.");
    } else {
      ret = acquireLocksAndOpenTxn(startTxnImplicitly);
    }
  ...
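private boolean requiresLock() {
  if (!checkConcurrency()) {
    return false;
  }
  // Lock operations themselves don't require the lock.
  if (isExplicitLockOperation()) {
    return false;
  }
  if (!HiveConf.getBoolVar(conf, ConfVars.HIVE_LOCK_MAPRED_ONLY)) {
    return true;
  }
  Queue<Task<? extends Serializable>> taskQueue = new LinkedList<Task<? extends Serializable>>();
  taskQueue.addAll(plan.getRootTasks());
  while (taskQueue.peek() != null) {
    Task<? extends Serializable> tsk = taskQueue.remove();
    if (tsk.requireLock()) {
      return true;
    }
    ...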
private boolean checkConcurrency() {
  boolean supportConcurrency = conf.getBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY);
  if (!supportConcurrency) {
    LOG.info("Concurrency mode is disabled, not creating a lock manager");
    return false;
  }
  return true;
}
private int acquireLocksAndOpenTxn(boolean startTxnImplicitly) {
  ...
  txnMgr.acquireLocks(plan, ctx, userFromUGI);
  ...
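runInternal calls requiresLock to decide whether a lock is needed. requiresLock makes two checks:
it calls checkConcurrency, which requires a lock only when hive.support.concurrency=true;
it calls requireLock on the tasks in the plan, because only some tasks need a lock (this per-task check applies only when HIVE_LOCK_MAPRED_ONLY is set; otherwise requiresLock returns true right away);
If a lock is needed, runInternal calls acquireLocksAndOpenTxn, which in turn calls acquireLocks on the transaction manager to obtain the lock;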
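The decision flow described here can be condensed into a small standalone model. This is only a sketch: RequiresLockSketch and TaskInfo are hypothetical names, not Hive classes. The task tree is flattened into a list, and the final "return false" after the loop is an assumption, because that part of the excerpt is elided.

```java
import java.util.List;

/** Simplified, standalone model of the lock decision; not Hive's own classes. */
final class RequiresLockSketch {

    /** Hypothetical stand-in for a task in the query plan. */
    static final class TaskInfo {
        final String name;
        final boolean requireLock;

        TaskInfo(String name, boolean requireLock) {
            this.name = name;
            this.requireLock = requireLock;
        }
    }

    /**
     * Mirrors the order of the checks in the excerpt:
     * concurrency switch, explicit lock statement, HIVE_LOCK_MAPRED_ONLY, per-task flag.
     */
    static boolean requiresLock(boolean supportConcurrency,
                                boolean explicitLockOperation,
                                boolean lockMapredOnly,
                                List<TaskInfo> tasks) {
        if (!supportConcurrency) {
            return false; // hive.support.concurrency=false: never lock
        }
        if (explicitLockOperation) {
            return false; // lock statements themselves do not take a lock
        }
        if (!lockMapredOnly) {
            return true; // lock for every kind of query
        }
        for (TaskInfo task : tasks) {
            if (task.requireLock) {
                return true; // at least one task asks for a lock
            }
        }
        return false; // assumption: no task asked for a lock
    }
}
```

With hive.support.concurrency=true and the map-reduce-only switch off, the model returns true no matter which tasks the plan contains, which matches the situation in this post.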
1) First, which tasks need a lock:
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer
private void analyzeAlterTablePartMergeFiles(ASTNode ast,
    String tableName, HashMap<String, String> partSpec)
    throws SemanticException {
  ...
  DDLWork ddlWork = new DDLWork(getInputs(), getOutputs(), mergeDesc);
  ddlWork.setNeedLock(true);
  ...
So DDL operations need a lock;
2) Next, how the lock is acquired:
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
public void acquireLocks(QueryPlan plan, Context ctx, String username) throws LockException {
  try {
    acquireLocksWithHeartbeatDelay(plan, ctx, username, 0);
  ...
void acquireLocksWithHeartbeatDelay(QueryPlan plan, Context ctx, String username, long delay) throws LockException {
  LockState ls = acquireLocks(plan, ctx, username, true);
  ...
LockState acquireLocks(QueryPlan plan, Context ctx, String username, boolean isBlocking) throws LockException {
  ...
  switch (output.getType()) {
    case DATABASE:
      compBuilder.setDbName(output.getDatabase().getName());
      break;
    case TABLE:
    case DUMMYPARTITION: // in case of dynamic partitioning lock the table
      t = output.getTable();
      compBuilder.setDbName(t.getDbName());
      compBuilder.setTableName(t.getTableName());
      break;
    case PARTITION:
      compBuilder.setPartitionName(output.getPartition().getName());
      t = output.getPartition().getTable();
      compBuilder.setDbName(t.getDbName());
      compBuilder.setTableName(t.getTableName());
      break;
    default:
      // This is a file or something we don't hold locks for.
      continue;
  }
  ...
  LockState lockState = lockMgr.lock(rqstBuilder.build(), queryId, isBlocking, locks);
  ctx.setHiveLocks(locks);
  return lockState;
}
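So when dynamic partitioning is enabled, the output is a DUMMYPARTITION and the lock granularity is DbName + TableName. As a result only one of the concurrent SQL statements can obtain the lock, and the others have to wait;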
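To make the effect of the lock granularity concrete, here is a minimal standalone sketch. LockGranularitySketch, lockKey and maxConcurrentWriters are hypothetical names, and the string key format is invented for illustration; Hive builds lock components rather than strings. The sketch also assumes that every writer holds its key exclusively.

```java
import java.util.HashSet;
import java.util.List;

/** Standalone illustration of table-level vs partition-level lock keys; not Hive code. */
final class LockGranularitySketch {

    enum OutputType { TABLE, DUMMYPARTITION, PARTITION }

    /** Builds an illustrative lock key following the switch in the excerpt. */
    static String lockKey(OutputType type, String db, String table, String partition) {
        switch (type) {
            case TABLE:
            case DUMMYPARTITION:
                // dynamic partitioning: only db + table are set, so the whole table is locked
                return db + "." + table;
            case PARTITION:
                // static partition: the partition name is part of the key
                return db + "." + table + "/" + partition;
            default:
                throw new IllegalArgumentException("unsupported type: " + type);
        }
    }

    /** With exclusive keys, at most one writer per distinct key can run at a time. */
    static int maxConcurrentWriters(List<String> keys) {
        return new HashSet<>(keys).size();
    }
}
```

Two dynamic-partition writers on test_table both map to the same key, so maxConcurrentWriters returns 1. Two writers with static partitions dt=1 and dt=2 get two distinct keys and can run side by side.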
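III. Summary
There are several ways to solve the problem:
1. Disable dynamic partitioning: hive.exec.dynamic.partition=false
2. Disable concurrency: hive.support.concurrency=false
3. Disable transactions: hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
Any one of the three works. Option 1 is recommended, because the scenario above does not need dynamic partitioning: each statement writes a single, known dt value;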
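For option 1, each concurrent statement has to name its target partition statically. The sketch below generates one such statement per dt value. StaticPartitionSqlSketch and the column names col1 and col2 are hypothetical. The columns are listed explicitly because, with a static partition spec, the select list must not include the partition column, which select * would return. The dt values are assumed to be trusted, unquoted literals as in the original statement.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds one static-partition insert per dt value; an illustration only. */
final class StaticPartitionSqlSketch {

    static List<String> buildStatements(String target, String source,
                                        List<String> columns, List<String> dtValues) {
        // explicit non-partition columns instead of "select *"
        String columnList = String.join(", ", columns);
        List<String> statements = new ArrayList<>();
        for (String dt : dtValues) {
            statements.add("insert overwrite table " + target
                    + " partition(dt=" + dt + ")"
                    + " select " + columnList
                    + " from " + source
                    + " where dt=" + dt);
        }
        return statements;
    }
}
```

Each generated statement can then be submitted on its own connection. Because the output is a concrete partition rather than a dummy partition, the statements no longer compete for the same table-level lock.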
Published: 2022-11-22 17:26:23