sgbm算法_OpenCV源代码分析——SGBM
上⼀篇看了较为简单的BM算法,后⾯⼜看了下opencv⾥的SGBM算法。之所以叫SGBM是因为opencv并没有使⽤MI作为匹配代价,⽽是仍然使⽤了块匹配的⽅法,相关cost的度量为Birchfield-Tomasi metric。⽽且opencv提供了多种cost aggregation的⽅式,包括只使⽤3个、5个或全部8个⽅向的⽅法。总体上的实现也⽐较直观,结合论⽂也⽐较好懂。
对于理论就不再赘述了,需要的话可以直接参考我之前的⽂章
David LEE:⼀⽂读懂经典双⽬稠密匹配算法SGM z
这⾥就直接探讨opencv中关于SGBM的源代码。
/*
computes disparity for "roi" in img2 and write it to disp1buf.
督查报告that is, disp1buf(x, y)=d means that img1(x+roi.x, y+roi.y) ~ img2(x+roi.x-d, y+roi.y).
minD <= d < maxD.
disp2full is the rever disparity map, that is:
disp2full(x+roi.x,y+roi.y)=d means that img2(x+roi.x, y+roi.y) ~ img1(x+roi.x+d, y+roi.y)
note that disp1buf will have the same size as the roi and
disp2full will have the same size as img1 (or img2).
On exit disp2buf is not the final disparity, it is an intermediate result that becomes
final after all the tiles are procesd.
the disparity in disp1buf is written with sub-pixel accuracy
(4 fractional bits, e StereoSGBM::DISP_SCALE),
using quadratic interpolation, while the disparity in disp2buf
is written as is, without interpolation.
disp2cost also has the same size as img1 (or img2).
It contains the minimum current cost, ud to find the best disparity, corresponding to the minimal cost.
*/
static void computeDisparitySGBM( const Mat& img1, const Mat& img2,
Mat& disp1, const StereoSGBMParams& params,
Mat& buffer )
{
const int ALIGN = 16;
const int DISP_SHIFT = StereoMatcher::DISP_SHIFT;
const int DISP_SCALE = (1 << DISP_SHIFT);
const CostType MAX_COST = SHRT_MAX;
int minD = params.minDisparity, maxD = minD + params.numDisparities;
Size SADWindowSize;
SADWindowSize.width = SADWindowSize.height = params.SADWindowSize > 0 ? params.SADWindowSize : 5;
int ftzero = std::max(params.preFilterCap, 15) | 1;
int uniquenessRatio = params.uniquenessRatio >= 0 ? params.uniquenessRatio : 10;
int disp12MaxDiff = params.disp12MaxDiff > 0 ? params.disp12MaxDiff : 1; // consisit check 所允许的最⼤差异
int P1 = params.P1 > 0 ? params.P1 : 2, P2 = std::max(params.P2 > 0 ? params.P2 : 5, P1+1);
int k, width = ls, height = ws;
int minX1 = std::max(maxD, 0), maxX1 = width + std::min(minD, 0);
int D = maxD - minD, width1 = maxX1 - minX1; // 视差值的有效搜索范围
int INVALID_DISP = minD - 1, INVALID_DISP_SCALED = INVALID_DISP*DISP_SCALE;
int INVALID_DISP = minD - 1, INVALID_DISP_SCALED = INVALID_DISP*DISP_SCALE;
int SW2 = SADWindowSize.width/2, SH2 = SADWindowSize.height/2;
bool fullDP = de == StereoSGBM::MODE_HH;
int npass = fullDP ? 2 : 1;
const int TAB_OFS = 256*4, TAB_SIZE = 256 + TAB_OFS*2;
PixType clipTab[TAB_SIZE];
/
/ 同stereoBM, xsobel索引表
for( k = 0; k < TAB_SIZE; k++ )
clipTab[k] = (PixType)(std::min(std::max(k - TAB_OFS, -ftzero), ftzero) + ftzero);
if( minX1 >= maxX1 )
{
disp1 = Scalar::all(INVALID_DISP_SCALED);
return;
}
CV_Asrt( D % 16 == 0 );
// NR - the number of directions. the loop on x below that computes Lr assumes that NR == 8.
// if you change NR, plea, modify the loop as well.
int D2 = D+16, NRD2 = NR2*D2;
// the number of L_r(.,.) and min_k L_r(.,.) lines in the buffer:
// for 8-way dynamic programming we need the current row and
// the previous row, i.e. 2 rows in total
const int NLR = 2;
const int LrBorder = NLR - 1;
// for each possible stereo match (img1(x,y) <=> img2(x-d,y))
// we keep pixel difference cost (C) and the summary cost over NR directions (S).截偏旁
// we also keep all the partial costs for the previous line L_r(x,d) and also min_k L_r(x, k)
size_t costBufSize = width1*D;
size_t CSBufSize = costBufSize*(fullDP ? height : 1); // cost aggregation 采⽤8个⽅向的话则需要WHD⼤⼩的内存
size_t minLrSize = (width1 + LrBorder*2)*NR2, LrSize = minLrSize*D2; // minLr 存储最优视差对应的最⼩cost,⼤⼩为有效宽度(再加2)乘以8个⽅向
// Lr 在此基础上需要存储160个视差值所对应的全部cost。
// DP 算法每次只⽤到上⼀⾏或下⼀⾏的信息,因此只保留两⾏的数据,最后乘以NR2
int hsumBufNRows = SH2*2 + 2; // ⼀个匹配窗⼝所包含的⾏数再加1,⽅便滑动窗⼝法每次多算⼀⾏
size_t totalBufSize = (LrSize + minLrSize)*NLR*sizeof(CostType) + // minLr[] and Lr[]
costBufSize*(hsumBufNRows + 1)*sizeof(CostType) + // hsumBuf, pixdiff // costBufSize * 1 是pixeldiff的内存⼤⼩,⼀⾏各个像素的所有可能视差值对应的cost // costBufSize * hsumBufNRows 是hsumBuf的内存⼤⼩,滑动窗⼝内各⾏的像素对应的所有视差值的cost
CSBufSize*2*sizeof(CostType) + // C, S // C 和S 都需要保存WHD个计算结果
width*16*img1.channels()*sizeof(PixType) + // temp buffer for computing per-pixel cost
width*(sizeof(CostType) + sizeof(DispType)) + 1024; // disp2cost + disp2
if( pty() || !buffer.isContinuous() ||
// summary cost over different (nDirs) directions
CostType* Cbuf = (CostType*)alignPtr(buffer.ptr(), ALIGN); // C
CostType* Sbuf = Cbuf + CSBufSize; // S
CostType* hsumBuf = Sbuf + CSBufSize;
CostType* pixDiff = hsumBuf + costBufSize*hsumBufNRows;
CostType* disp2cost = pixDiff + costBufSize + (LrSize + minLrSize)*NLR; // 预留给hsumBuf, Lr[]和minLr[]
DispType* disp2ptr = (DispType*)(disp2cost + width);
PixType* tempBuf = (PixType*)(disp2ptr + width);
// add P2 to every C(x,y). it saves a few operations in the inner loops
for(k = 0; k < (int)CSBufSize; k++ )
Cbuf[k] = (CostType)P2;
for( int pass = 1; pass <= npass; pass++ )
{
int x1, y1, x2, y2, dx, dy;
int x1, y1, x2, y2, dx, dy;
if( pass == 1 ) // 正向遍历,先计算正向遍历可以计算的4个⽅向的Lr和minLr,并保存结果在C和S中
{
y1 = 0; y2 = height; dy = 1;
mg42机枪x1 = 0; x2 = width1; dx = 1;
}
el // 逆序遍历
{
小麻球
y1 = height-1; y2 = -1; dy = -1;
x1 = width1-1; x2 = -1; dx = -1;
}
// 处理⽅向变化时重新分配指针
CostType *Lr[NLR]={0}, *minLr[NLR]={0};
for( k = 0; k < NLR; k++ )
{
// shift Lr[k] and minLr[k] pointers, becau we allocated them with the borders,
/
/ and will occasionally u negative indices with the arrays
// we need to shift Lr[k] pointers by 1, to give the space for d=-1.
// however, then the alignment will be imperfect, i.e. bad for SSE,
// thus we shift the pointers by 8 (8*sizeof(short) == 16 - ideal alignment)
Lr[k] = pixDiff + costBufSize + LrSize*k + NRD2*LrBorder + 8;
// 8是为了s优化,NRD2 * LrBorder 是边界⼀个像素8个⽅向的所有视差值对应的⼤⼩
memt( Lr[k] - LrBorder*NRD2 - 8, 0, LrSize*sizeof(CostType) );
minLr[k] = pixDiff + costBufSize + LrSize*NLR + minLrSize*k + NR2*LrBorder;
memt( minLr[k] - LrBorder*NR2, 0, minLrSize*sizeof(CostType) );
}
for( int y = y1; y != y2; y += dy )
{
int x, d;
DispType* disp1ptr = disp1.ptr<DispType>(y); // 视差图第y⾏
CostType* C = Cbuf + (!fullDP ? 0 : y*costBufSize); // 跳过前y⾏
CostType* S = Sbuf + (!fullDP ? 0 : y*costBufSize);
if( pass == 1 ) // compute C on the first pass, and reu it on the cond pass, if any.
{
int dy1 = y == 0 ? 0 : y + SH2, dy2 = y == 0 ? SH2 : dy1;
对方的英文for( k = dy1; k <= dy2; k++ ) // y = 0,先把滑动窗⼝第上半部分计算好存储在hsumBuf中
// y != 0,计算y + SH2⾏
{
/
/ 类似于stereoBM,保存并复⽤计算结果
CostType* hsumAdd = hsumBuf + (std::min(k, height-1) % hsumBufNRows)*costBufSize;
if( k < height )
{
// 计算第k⾏,每个像素的所有视差值的cost
calcPixelCostBT( img1, img2, k, minD, maxD, pixDiff, tempBuf, clipTab, TAB_OFS, ftzero );
memt(hsumAdd, 0, D*sizeof(CostType));
for( x = 0; x <= SW2*D; x += D ) // 累加左半边窗⼝的cost
{
int scale = x == 0 ? SW2 + 1 : 1; // 第⼀⾏要加SW2 + 1次,因为第⼀个窗⼝从第⼀列开始,取不到前⾯的列就需要累加第⼀列 // 后⾯相减 (pixSub) 也是⼀样的
for( d = 0; d < D; d++ )
hsumAdd[d] = (CostType)(hsumAdd[d] + pixDiff[x + d]*scale); // 累加当前⾏前SW2 + 1个像素每个视差值的cost
}
if( y > 0 )
{
// 滑动窗⼝法
const CostType* hsumSub = hsumBuf + (std::max(y - SH2 - 1, 0) % hsumBufNRows)*costBufSize; // 所要减去的对应⾏
const CostType* Cprev = !fullDP || y == 0 ? C : C - costBufSize; // 上⼀⾏的C
const CostType* Cprev = !fullDP || y == 0 ? C : C - costBufSize; // 上⼀⾏的C
for( x = D; x < width1*D; x += D )
{
const CostType* pixAdd = pixDiff + std::min(x + SW2*D, (width1-1)*D); // 所要加上和减去的列
const CostType* pixSub = pixDiff + std::max(x - (SW2+1)*D, 0);
{
for( d = 0; d < D; d++ )
{
// 滑动窗⼝法
int hv = hsumAdd[x + d] = (CostType)(hsumAdd[x - D + d] + pixAdd[d] - pixSub[d]);
// 当前⾏某个像素视差为d的cost为上⼀⾏对应的cost加上该像素匹配的cost
C[x + d] = (CostType)(Cprev[x + d] + hv - hsumSub[x + d]);
}
}
}
}
el
{
for( x = D; x < width1*D; x += D ) // 开始对第0⾏进⾏计算,直到最后⼀个像素
{
const CostType* pixAdd = pixDiff + std::min(x + SW2*D, (width1-1)*D); // 前⾯累加了当前⾏前SW2个像素,现在从第SW2 + 1个像素开始累加 const CostType* pixSub = pixDiff + std::max(x - (SW2+1)*D, 0); // 从-SW2列开始减去
for( d = 0; d < D; d++ ) // 当前像素y = 0, x = x / D 的所有视差值
hsumAdd[x + d] = (CostType)(hsumAdd[x - D + d] + pixAdd[d] - pixSub[d]); // 滑动窗⼝,列⽅向上减去上⼀列,加上下⼀列
}
}
}
if( y == 0 ) // 针对第⼀⾏,初始化C
// 后⾯⾏的C 都需要前⼀⾏的信息
{
int scale = k == 0 ? SH2 + 1 : 1; // 第⼀⾏需要多加8次,因为后⾯的窗⼝从第⼀⾏开始,取不到前⾯的⾏就需要累加第⼀⾏的值
// 后⾯相减 (pixSub) 也是⼀样的
for( x = 0; x < width1*D; x++ )
C[x] = (CostType)(C[x] + hsumAdd[x]*scale);
}
}
// also, clear the S buffer
for( k = 0; k < width1*D; k++ )
S[k] = 0;
}
// clear the left and the right borders
// 置0处理,⼀⽅⾯防⽌后⾯的数组越界,⼀⽅⾯⽅便后续累加
memt( Lr[0] - NRD2*LrBorder - 8, 0, NRD2*LrBorder*sizeof(CostType) );
memt( Lr[0] + width1*NRD2 - 8, 0, NRD2*LrBorder*sizeof(CostType) );
memt( minLr[0] - NR2*LrBorder, 0, NR2*LrBorder*sizeof(CostType) );
memt( minLr[0] + width1*NR2, 0, NR2*LrBorder*sizeof(CostType) );
/
*
[formula 13 in the paper]
compute L_r(p, d) = C(p, d) +
min(L_r(p-r, d),
L_r(p-r, d-1) + P1,
L_r(p-r, d+1) + P1,
min_k L_r(p-r, k) + P2) - min_k L_r(p-r, k)
where p = (x,y), r is one of the directions.
we process all the directions at once:
0: r=(-dx, 0)
1: r=(-1, -dy)
2: r=(0, -dy)
3: r=(1, -dy)
4: r=(-2, -dy)
4: r=(-2, -dy)
5: r=(-1, -dy*2)
6: r=(1, -dy*2)
7: r=(2, -dy)
*/
// y 处理完,开始处理x
for( x = x1; x != x2; x += dx ) // 注意pass = npass (2) 时为逆序
{
int xm = x*NR2, xd = xm*D2;
// 待处理的4个⽅向的前⼀个像素与当前像素视差值相差⼤于2时的cost
// 待处理的4个⽅向对英语官⽅注释中的0~3。注意dx, dy的符号
int delta0 = minLr[0][xm - dx*NR2] + P2, delta1 = minLr[1][xm - NR2 + 1] + P2;
int delta2 = minLr[1][xm + 2] + P2, delta3 = minLr[1][xm + NR2 + 3] + P2;
// 待处理的4个⽅向的Lr的指针
// NRD2为视差范围*⽅向数,减去后对应前⼀个像素所有8个⽅向的所有视差值所对应的Lr值。Lr从第0个⽅向开始存储
// D2的值为视差范围,每加上⼀个D2就意味着跳过⼀个⽅向
// 与前⾯的delta对应,但数组下标差D2倍,因为minLr存储的是8个⽅向的Lr的8个最⼩值,⽽Lr存储 8*视差范围的所有值
/* 1 2 3
0 x 0
1 2 3 */
CostType* Lr_p0 = Lr[0] + xd - dx*NRD2; // pass = 1, dx = 1,为当前⾏上⼀列像素Lr的第0个⽅向
// pass = 2, dx = -1,为当前⾏下⼀列像素Lr的第0个⽅向
CostType* Lr_p1 = Lr[1] + xd - NRD2 + D2; // pass = 1,上⼀⾏上⼀列像素的Lr的第1个⽅向
陈皮泡水功效与作用// pass = 2,下⼀⾏上⼀列像素的Lr的第1个⽅向
CostType* Lr_p2 = Lr[1] + xd + D2*2; // pass = 1,上⼀⾏当前列像素Lr的第2个⽅向
// pass = 2,下⼀⾏当前列像素Lr的地2个⽅向
CostType* Lr_p3 = Lr[1] + xd + NRD2 + D2*3; // pass = 1,上⼀⾏下⼀列像素Lr的第3个⽅向
// pass = 2,下⼀⾏下⼀列像素Lr的第3个⽅向
// 解释:pass = 1时好理解。当pass = 2时,从最后⼀⾏的最后⼀列开始逆序处理。由于最后⼀⾏只能
在正向遍历时计算出前4个⽅向的Lr值, // 因此再处理倒数第⼆⾏时,只能访问下⼀⾏当前4个⽅向的Lr。
// 当pass = 2时,第⼀个⽅向Lr_p0,正序处理时它应该是上⼀列的像素,⽽逆序处理时应该是下⼀列的像素,因此由dx控制⽅向。
// ⽽⾏⽅向上上下⾏的差异在处理完分别交换Lr, minLr内的指针时就处理好了。
// 之后,倒数第⼆⾏开始逆序处理时,其计算结果也更新在了Lr的前4个⽅向中,相当于把正序处理时的结果覆盖了。
// 但因为⽤不到了,只要S能够正确累加被覆盖也没有关系
Lr_p0[-1] = Lr_p0[D] = Lr_p1[-1] = Lr_p1[D] =
Lr_p2[-1] = Lr_p2[D] = Lr_p3[-1] = Lr_p3[D] = MAX_COST; // 将多申请的前后两个视差值置为最⼤cost,为后⾯循环中的数组越界做准备
CostType* Lr_p = Lr[0] + xd; // 将指针调节⾄当前像素
const CostType* Cp = C + x*D;
CostType* Sp = S + x*D;
{
int minL0 = MAX_COST, minL1 = MAX_COST, minL2 = MAX_COST, minL3 = MAX_COST;
for( d = 0; d < D; d++ )
{
// 4个⽅向的相同disp的cost 和不同disp加上惩罚值后的cost 的计算
int Cpd = Cp[d], L0, L1, L2, L3;
L0 = Cpd + std::min((int)Lr_p0[d], std::min(Lr_p0[d-1] + P1, std::min(Lr_p0[d+1] + P1, delta0))) - delta0;
L1 = Cpd + std::min((int)Lr_p1[d], std::min(Lr_p1[d-1] + P1, std::min(Lr_p1[d+1] + P1, delta1))) - delta1;
L2 = Cpd + std::min((int)Lr_p2[d], std::min(Lr_p2[d-1] + P1, std::min(Lr_p2[d+1] + P1, delta2))) - delt
a2;
L3 = Cpd + std::min((int)Lr_p3[d], std::min(Lr_p3[d-1] + P1, std::min(Lr_p3[d+1] + P1, delta3))) - delta3;
// 存储对应的Lr值并获取各个⽅向的minLr
Lr_p[d] = (CostType)L0;
混凝土养生minL0 = std::min(minL0, L0);
Lr_p[d + D2] = (CostType)L1;
博物馆学minL1 = std::min(minL1, L1);
Lr_p[d + D2*2] = (CostType)L2;
minL2 = std::min(minL2, L2);