Towards Internet-scale Multi-view Stereo


Yasutaka Furukawa¹, Brian Curless², Steven M. Seitz¹,², Richard Szeliski³ — ¹Google Inc., ²University of Washington, ³Microsoft Research
Abstract
This paper introduces an approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections. The main idea is to decompose the collection into a set of overlapping sets of photos that can be processed in parallel, and to merge the resulting reconstructions. This overlapping clustering problem is formulated as a constrained optimization and solved iteratively. The merging algorithm, designed to be parallel and out-of-core, incorporates robust filtering steps to eliminate low-quality reconstructions and enforce global visibility constraints. The approach has been tested on several large datasets downloaded from Flickr.com, including one with over ten thousand images, yielding a 3D reconstruction with nearly thirty million points.
1. Introduction
The state of the art in 3D reconstruction from images has undergone a revolution in the last few years. Coupled with the explosion of imagery available online and advances in computing, we have the opportunity to run reconstruction algorithms at massive scale. Indeed, we can now attempt to reconstruct the entire world, i.e., every building, landscape, and (static) object that can be photographed.
The most important technological ingredients towards this goal are already in place. Matching techniques (e.g., SIFT [17]) provide accurate correspondences, structure-from-motion (SFM) algorithms use the correspondences to estimate precise camera poses, and multi-view-stereo (MVS) methods take images with poses as input and produce dense 3D models with accuracy nearly on par with laser scanners [22]. Indeed, this type of pipeline has already been demonstrated by a few research groups [11, 12, 14, 19], with impressive results.
To reconstruct everything, one key challenge is scalability.¹ In particular, how can we devise reconstruction algorithms that operate at Internet-scale, i.e., on the millions of images available on Internet sites such as Flickr.com?
¹ There are other challenges such as handling complex BRDFs and lighting variations, which we do not address in this paper.
Figure 1. Our dense reconstruction of Piazza San Marco (Venice) from 13,703 images with 27,707,825 reconstructed MVS points (further upsampled ×9 for high quality point-based rendering).
Given recent progress on Internet-scale matching and SFM (notably Agarwal et al.'s Rome-in-a-day project [1]), we focus our efforts in this paper on the last stage of the pipeline, i.e., Internet-scale MVS.
MVS algorithms are based on the idea of correlating measurements from several images at once to derive 3D surface information. Many MVS algorithms aim at reconstructing a global 3D model by using all the images available simultaneously [9, 13, 20, 24]. Such an approach is not feasible as the number of images grows. Instead, it becomes important to select the right subset of images, and to cluster them into manageable pieces.
We propose a novel view selection and clustering scheme that allows a wide class of MVS algorithms to scale up to massive photo sets. Combined with a new merging method that robustly filters out low-quality or erroneous points, we demonstrate our approach running for thousands of images of large sites and one entire city. Our system is the first to demonstrate an unstructured MVS approach at city-scale.
We propose an overlapping view clustering problem [2], in which the goal is to decompose the set of input images into clusters that have small overlap. Overlap is important for the MVS problem, as a strict partition would undersample surfaces near cluster boundaries. Once clustered, we apply a state-of-the-art MVS algorithm to reconstruct dense 3D points, and then merge the resulting reconstructions into a single dense point-based model. Robust filtering algorithms are introduced to handle reconstruction errors and the vast variations in reconstruction quality that occur between distant and nearby views of objects in Internet photo collections. The filters are designed to be out-of-core and parallel, in order to process a large number of MVS points efficiently. We show visualizations of models containing tens of millions of points (see Figure 1).
1.1. Related Work
Scalability has rarely been a consideration in prior MVS algorithms, as prior datasets have been either relatively small [22] or highly structured (e.g., a video sequence which can be decomposed into short time intervals [19]).
Nevertheless, some algorithms lend themselves naturally to parallelization. In particular, several algorithms operate by solving for a depth map for each image, using a local neighborhood of nearby images, and then merge the resulting reconstructions [11, 12, 18, 19]. Each depth map can be computed independently and in parallel. However, the depth maps tend to be noisy and highly redundant, leading to wasted computational effort. Therefore, these algorithms typically require additional post-processing steps to clean up and merge the depth maps.
Many of the best performing MVS algorithms instead reconstruct a global 3D model directly from the input images [9, 13, 20, 24]. Global methods can avoid redundant computations and often do not require a clean-up post-process, but scale poorly. One exception is Jancosek et al. [14], who achieve scalability by designing the algorithm out-of-core. However, this is a sequential algorithm. In contrast, we seek an out-of-core algorithm that is also parallelizable.
With depth-map based MVS algorithms, several authors have succeeded in large-scale MVS reconstructions [18, 19]. Pollefeys et al. [19] present a real-time MVS system for long image sequences. They estimate a depth map for each input image, reduce noise by fusing nearby depth maps, and merge the resulting depth maps into a single mesh model. Micusik et al. [18] propose a piece-wise planar depth map computation algorithm with very similar clean-up and merging steps. However, both methods have been tested only on highly structured, street-view datasets obtained by a video camera mounted on a moving van, and not the unstructured photo collections that we consider in this paper, which pose additional challenges.
Besides scalability, variation in reconstruction quality is another challenge in handling large unorganized image collections, as surfaces may be imaged from both close up and far away. Goesele et al. [12] proposed the first MVS method applied to Internet photo collections, which handles variation in image sampling resolutions by selecting images with the most compatible resolution. Gallup et al. [10] select images at different baselines and image resolutions to control depth accuracy.
Images {I_1, I_2, ...} + SFM points {P_1, P_2, ...} → Image clusters {C_1, C_2, ...}
Figure 2. Our view clustering algorithm takes images {I_i}, SFM points {P_j}, and their associated visibility information {V_j}, then produces overlapping image clusters {C_k}.
Both of these methods handle variation by selecting images prior to reconstruction. These techniques may be used in conjunction with the methods proposed here, but the major difference in our work is that we also handle the variation in a post-processing step, when merging reconstructions. We note that some prior depth map merging algorithms take into account estimates of noise, e.g., by taking weighted combinations of depth samples to recover a mesh [4, 25]. While such approaches can handle noise variation, we find they do not perform well for large Internet photo collections, where resolution variation is a major factor, because combining high and low resolution geometries in the standard ways will tend to attenuate high resolution detail. We instead propose a simple merging strategy that filters out low resolution geometry, which we have found to be robust and well-tailored to recovering a point-based model as output.
The rest of the paper is organized as follows. The view-clustering algorithm is explained in Section 2, and details of the MVS point merging and rendering are given in Section 3. Experimental results are provided in Section 4, and we conclude the paper in Section 5. Our implementation of the proposed view-clustering algorithm is available at [6].
2. View Clustering
We assume that our input images {I_i} have been processed by an SFM algorithm to yield camera poses and a sparse set of 3D points {P_j}, each of which is visible in a set of images denoted by V_j. We treat the SFM points as sparse samples of the dense reconstruction that MVS will produce. As such, they can be used as a basis for view clustering. More specifically, the goal of view clustering is to find (an unknown number of) overlapping image clusters {C_k} such that each cluster is of manageable size, and each SFM point can be accurately reconstructed by at least one of the clusters (see Figure 2).
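To fix notation for the sketches that follow, here is a minimal Python model of the clustering input; the class and field names are illustrative assumptions, not taken from the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Camera:
    """Minimal stand-in for a posed input image I_i."""
    image_id: int
    resolution: tuple          # (width, height) in pixels

@dataclass
class SfmPoint:
    """A sparse SFM point P_j with its visibility set V_j (image ids)."""
    position: tuple            # (x, y, z) in world coordinates
    visible_images: set = field(default_factory=set)

# A view cluster C_k is simply a set of image ids; view clustering outputs a
# list of such (possibly overlapping) sets.
```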
2.1. Problem Formulation
The clustering formulation is designed to satisfy the following three constraints: (1) redundant images are excluded from the clusters (compactness); (2) each cluster is small enough for an MVS reconstruction (size constraint); and (3) MVS reconstructions from the clusters result in minimal loss of content and detail compared to that obtainable by processing the full image set (coverage). Compactness is important for computational efficiency but also to improve accuracy, as Internet photo collections often contain hundreds or thousands of photos acquired from nearly the same viewpoint, and a cluster consisting entirely of near-duplicate views will yield a noisy reconstruction due to insufficient baseline. More concretely, our objective is to minimize the total number of images ∑_k |C_k| in the output clusters, subject to the following two constraints. The first is an upper bound on the size of each cluster so that an MVS algorithm can be used for each cluster independently: ∀k, |C_k| ≤ α. α is determined by computational resources, particularly memory limitations.
The second encourages the coverage of the final MVS reconstructions as follows. We say an SFM point P_j is covered if it is sufficiently well reconstructed by the cameras in at least one cluster C_k. To quantify this notion of "well-reconstructed," we introduce a function f(P, C) that measures the expected reconstruction accuracy achieved at a 3D location P by a set of images C. This function depends on the camera baselines and pixel sampling rates (see the Appendix for our definition of f). We say that P_j is covered if its reconstruction accuracy in at least one of the clusters C_k is at least λ times f(P_j, V_j), which is the expected accuracy obtained when using all of P_j's visible images V_j:

  P_j is covered if  max_k f(P_j, C_k ∩ V_j) ≥ λ f(P_j, V_j),

where λ = 0.7 in our experiments. The coverage constraint is that for each set of SFM points visible in one image, the ratio of covered points must be at least δ (also set to 0.7 in our experiments). Note that we enforce this coverage ratio on each image, instead of on the entire reconstruction, to encourage good spatial coverage and uniformity.
In summary, our overlapping clustering formulation is defined as follows:

  Minimize  ∑_k |C_k|                                              (compactness)
  subject to
    • ∀k: |C_k| ≤ α,                                               (size)
    • ∀i: (# of covered points in I_i) / (# of points in I_i) ≥ δ. (coverage)
There are a couple of points worth noting about this formulation. First, the minimization causes redundant images to be discarded whenever the constraints can be achieved with a smaller set of images. Second, the proposed formulation automatically allows overlapping clusters. Finally, the formulation implicitly incorporates image quality factors (e.g., sensor noise, blur, poor exposure), as poor quality images have fewer SFM points, and are thus more costly to include with the coverage constraint.
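As an illustration of how the coverage test can be evaluated, the sketch below assumes the accuracy function f(P, C) is available as a callable (the paper defines it in its Appendix from camera baselines and pixel sampling rates) and reuses the SfmPoint model above.

```python
def is_covered(point, clusters, f, lam=0.7):
    """P_j is covered if some cluster C_k reaches lam * f(P_j, V_j)."""
    full_accuracy = f(point, point.visible_images)
    return any(
        f(point, cluster & point.visible_images) >= lam * full_accuracy
        for cluster in clusters
    )

def image_coverage_ok(image_id, points, clusters, f, delta=0.7, lam=0.7):
    """Per-image coverage constraint: the ratio of covered SFM points among
    the points visible in this image must be at least delta."""
    visible_points = [p for p in points if image_id in p.visible_images]
    if not visible_points:
        return True
    covered = sum(is_covered(p, clusters, f, lam) for p in visible_points)
    return covered / len(visible_points) >= delta
```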
Figure 3. The view clustering algorithm consists of four steps, where the last two steps are iterated until all the constraints are satisfied.
2.2. View Clustering Algorithm
Solving the proposed clustering problem is challenging, because the constraints are not in a form readily handled by existing methods like k-means, normalized cuts [16, 23], etc. Before presenting our algorithm, we first introduce some neighborhood relations for images and SFM points. A pair of images I_l and I_m are defined to be neighbors if there exists an SFM point that is visible in both images. Similarly, a pair of image sets are neighbors if there exists a pair of images (one from each set) that are neighbors. Finally, a pair of SFM points P_j and P_k are defined to be neighbors if 1) they have similar visibility, that is, their visible image sets V_j and V_k are neighbors according to the above definition, and 2) the projected locations of P_j and P_k are within τ1 pixels in every image in (V_j ∪ V_k), where τ1 = 64 is used.
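These neighborhood tests reduce to simple set and reprojection checks; a rough sketch follows, where `project(point, image_id)` is an assumed helper returning pixel coordinates, and the image-set neighborhood test is approximated by requiring a shared visible image.

```python
def images_are_neighbors(point_visibilities, img_a, img_b):
    """Two images are neighbors if some SFM point is visible in both;
    point_visibilities is the collection of visibility sets V_j."""
    return any(img_a in V and img_b in V for V in point_visibilities)

def points_are_neighbors(p, q, project, tau1=64):
    """SFM points are neighbors if their visibility sets are related and their
    projections stay within tau1 pixels in every image of V_j union V_k.
    (The set-level visibility test is simplified to a shared visible image.)"""
    if not (p.visible_images & q.visible_images):
        return False
    for img in p.visible_images | q.visible_images:
        (xp, yp), (xq, yq) = project(p, img), project(q, img)
        if max(abs(xp - xq), abs(yp - yq)) > tau1:
            return False
    return True
```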
Figure 3 provides an overview of our approach, which consists of four steps. The first two steps are pre-processing, while the last two steps are repeated in an iterative loop.

1. SFM filter – merging SFM points: Having accurate measures of point visibility is key to the success of our view clustering procedure. Undetected or unmatched image features lead to errors in the point visibility estimates V_j (typically in the form of missing images). We obtain more reliable visibility estimates by aggregating visibility data over a local neighborhood, and merging points in that neighborhood. The position of the merged point is the average of its neighbors, while the visibility becomes the union. This step also significantly reduces the number of SFM points and improves the running time of the remaining three steps. Specifically, starting from a set of SFM points, we randomly select one point, merge it with its neighbors, output the merged point, and remove both the point and its neighbors from the input set. We repeat the procedure until the input set is empty. The set of merged points becomes the new point set, which, with some abuse of notation, is also denoted as {P_j}.² See Figure 4 for a sample output of this step.
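A minimal sketch of this greedy merge, reusing the SfmPoint model and a neighbor predicate such as the one above:

```python
import random

def merge_sfm_points(points, are_neighbors):
    """SFM filter: repeatedly pick a random point, merge it with its neighbors
    (average position, union of visibility), and remove the whole group."""
    remaining = list(points)
    merged = []
    while remaining:
        seed = remaining.pop(random.randrange(len(remaining)))
        neighbors = [p for p in remaining if are_neighbors(seed, p)]
        remaining = [p for p in remaining if p not in neighbors]
        group = [seed] + neighbors
        position = tuple(sum(c) / len(group)
                         for c in zip(*(p.position for p in group)))
        visibility = set().union(*(p.visible_images for p in group))
        merged.append(SfmPoint(position=position, visible_images=visibility))
    return merged
```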
² An even better approach would be to re-detect and match new image features to improve visibility information as in [8]. However, this algorithm would be significantly more expensive.
SFM points → Merged SFM points; View points → View point clusters
Figure 4. Top: The first step of our algorithm is to merge SFM points to enrich visibility information (SFM filter). Bottom: Sample results of our view clustering algorithm. View points belonging to extracted clusters are illustrated in different colors.
2. Image selection – removing redundant images: Starting with the full image set, we test each image and remove it if the coverage constraint still holds after the removal. The removal test is performed for all the images enumerated in increasing order of image resolution (# of pixels), so that low-resolution images are removed first. Note that images are permanently discarded in this step to speed up the following main optimization steps.
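One way to prototype this pruning pass, continuing the sketches above, is shown below; `image_coverage_ok` stands in for the coverage test, and treating the remaining image set as a single cluster at this pre-clustering stage is an assumption of the sketch.

```python
def remove_redundant_images(cameras, points, f, delta=0.7, lam=0.7):
    """Visit images from lowest to highest resolution and permanently drop an
    image whenever the per-image coverage constraint still holds without it."""
    kept = {cam.image_id for cam in cameras}
    for cam in sorted(cameras, key=lambda c: c.resolution[0] * c.resolution[1]):
        trial = kept - {cam.image_id}
        if all(image_coverage_ok(i, points, [trial], f, delta, lam)
               for i in trial):
            kept = trial
    return kept
```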
3. Cluster division – enforcing the size constraint: Next, we enforce the size constraint by splitting clusters, while ignoring coverage. More concretely, we divide an image cluster into smaller components if it violates the size constraint. The division of a cluster is performed by the Normalized-Cuts algorithm [23] on a visibility graph, where nodes are images. The edge weight e_lm between an image pair (I_l, I_m) measures how much I_l and I_m together contribute to MVS reconstruction at relevant SFM points:

  e_lm = ∑_{P_j ∈ Θ_lm} f(P_j, {I_l, I_m}) / f(P_j, V_j),

where Θ_lm denotes the set of SFM points visible in both I_l and I_m. Intuitively, images with high MVS contribution have high edge weights among them and are less likely to be cut. The division of a cluster repeats until the size constraint is satisfied for all the clusters.
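For prototyping, the split can be approximated with scikit-learn's spectral clustering as a stand-in for the Normalized-Cuts solver used in the paper; `edge_weight(l, m)` is assumed to implement the e_lm formula above.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def split_oversized_clusters(clusters, edge_weight, alpha):
    """Recursively bisect any cluster larger than alpha using spectral
    clustering on the pairwise edge-weight (affinity) matrix."""
    result, stack = [], [sorted(c) for c in clusters]
    while stack:
        cluster = stack.pop()
        if len(cluster) <= alpha:
            result.append(set(cluster))
            continue
        W = np.array([[edge_weight(l, m) if l != m else 0.0 for m in cluster]
                      for l in cluster])
        labels = SpectralClustering(n_clusters=2,
                                    affinity="precomputed").fit_predict(W)
        parts = [[img for img, lab in zip(cluster, labels) if lab == part]
                 for part in (0, 1)]
        if not parts[0] or not parts[1]:      # degenerate cut: split in half
            mid = len(cluster) // 2
            parts = [cluster[:mid], cluster[mid:]]
        stack.extend(parts)
    return result
```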
4. Image addition – enforcing coverage: The coverage constraint may have been violated in step 3, and we now add images to each cluster in order to cover more SFM points and reestablish coverage. In this step, we first construct a list of possible actions, where each action measures the effectiveness of adding an image to a cluster to increase coverage. More concretely, for each uncovered SFM point P_j, let C_k = argmax_{C_l} f(P_j, C_l) be the cluster with the maximum reconstruction accuracy. Then, for P_j, we create an action {(I → C_k), g} that adds image I (∈ V_j, ∉ C_k) to C_k, where g measures the effectiveness and is defined as f(P_j, C_k ∪ {I}) − f(P_j, C_k). Note that we only consider actions that add images to C_k, instead of every cluster that could cover P_j, for computational efficiency. Since actions with the same image and cluster are generated from multiple SFM points, we merge such actions while summing up the measured effectiveness g. Actions in the list are sorted in decreasing order of their effectiveness.

Having constructed an action list, one approach would be to take the action with the highest score, then recompute the list again, which is computationally too expensive. Instead, we consider actions whose scores are more than 0.7 times the highest score in the list, and repeatedly take an action from the top of the list. Since an action may change the effectiveness of other similar actions, after taking one action, we remove any conflicting ones from the list, where two actions {(I → C), g}, {(I′ → C′), g′} are conflicting if I and I′ are neighbors. The list construction and image addition repeat until the coverage constraint is satisfied. After the image addition, the size constraint may be violated, and the last two steps are repeated until both constraints are satisfied.
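A rough sketch of one round of this image-addition pass, reusing `is_covered` and f from earlier; `neighbor_images(a, b)` is an assumed image-level neighbor predicate, and the action bookkeeping is simplified to a dictionary keyed by (image, cluster index).

```python
def add_images_for_coverage(clusters, points, f, neighbor_images,
                            lam=0.7, score_ratio=0.7):
    """Build the action list over all uncovered points, then greedily apply
    the strongest non-conflicting actions within score_ratio of the best."""
    actions = {}                                 # (image_id, cluster_idx) -> g
    for p in points:
        if is_covered(p, clusters, f, lam):
            continue
        k = max(range(len(clusters)), key=lambda i: f(p, clusters[i]))
        for img in p.visible_images - clusters[k]:
            gain = f(p, clusters[k] | {img}) - f(p, clusters[k])
            actions[(img, k)] = actions.get((img, k), 0.0) + gain
    ranked = sorted(actions.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return clusters
    threshold = score_ratio * ranked[0][1]
    taken = []
    for (img, k), gain in ranked:
        if gain < threshold:
            break
        if any(neighbor_images(img, other) for other, _ in taken):
            continue                             # conflicts with a taken action
        clusters[k] = clusters[k] | {img}
        taken.append((img, k))
    return clusters
```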
We note that the size and coverage constraints are not difficult to satisfy; indeed, an extreme solution is to create a small cluster for each SFM point with sufficient baseline/resolution. In this extreme case, the resulting clusters will likely contain many duplicates and therefore have a poor compactness score. Typically, our approach of splitting clusters then adding a few images (usually at boundaries) tends to rapidly and easily satisfy the constraints while achieving reasonable (though not optimal) compactness scores; it terminates in a couple of iterations in all of our experiments. While our approach is not globally optimal, we note that achieving optimal compactness is not critical for our application.
3. MVS Filtering and Rendering
Having extracted image clusters, the Patch-based MVS software (PMVS) by Furukawa et al. [7] is used to reconstruct 3D points for each cluster independently. Any MVS algorithm could be used, but we chose PMVS, which is publicly available. In this section, we propose two filters that are used in merging reconstructed points to handle reconstruction errors and variations in reconstruction quality (see Figures 5 and 6). Our filtering algorithms are designed to be out-of-core and operate in parallel to handle a large number of MVS points efficiently. We now describe the two filtering algorithms, discuss their scalability, and explain how merged MVS points are rendered.
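As a sketch of the out-of-core, parallel structure shared by both filters, each reference cluster can be handled by an independent worker that reads its points from disk and writes filtered results back; the file names and helper signatures here are illustrative assumptions, not from the released PMVS code.

```python
import pickle
from multiprocessing import Pool
from pathlib import Path

def process_reference_cluster(args):
    """Worker: load one reference cluster's MVS points, run a filter pass
    against the other clusters (via files on disk), and save the result."""
    cluster_id, cluster_dir, filter_fn = args
    points = pickle.loads(Path(cluster_dir, f"cluster_{cluster_id}.pkl").read_bytes())
    kept = filter_fn(cluster_id, points, cluster_dir)
    Path(cluster_dir, f"filtered_{cluster_id}.pkl").write_bytes(pickle.dumps(kept))
    return cluster_id, len(kept)

def run_filter_in_parallel(num_clusters, cluster_dir, filter_fn, workers=8):
    """The loop over reference clusters parallelizes trivially (the highlighted
    loops in Figure 5); filter_fn must be a picklable top-level function."""
    jobs = [(k, cluster_dir, filter_fn) for k in range(num_clusters)]
    with Pool(workers) as pool:
        return dict(pool.map(process_reference_cluster, jobs))
```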
3.1. Quality Filter
The same surface region may be reconstructed in multiple clusters with varying reconstruction quality: nearby clusters produce dense, accurate points, while distant clusters produce sparse, noisy points. We want to filter out the latter, which is realized by the following quality filter. Let P_j and V_j denote an MVS point and its visibility information estimated by the MVS algorithm, respectively.
Quality Filter
  For each reference cluster C_k on each node
    Initialize histograms of MVS points reconstructed from C_k;
    For each cluster C_l
      For each point P in C_k
        Compute histogram entry H_l for P;
    Filter out points in C_k using the histogram;

Visibility Filter
  For each reference cluster C_k on each node
    For each image I in C_k
      Compute a depth map for I;
    For each remaining cluster C_l
      For each point P in C_l
        For each image I in C_k
          If P and I conflict
            Increment conflict count for P;
      Save conflict counts for C_l to file;
  For each cluster C_k on each node
    Read files containing conflict counts for C_k;
    Filter out points in C_k with conflict counts;

Figure 5. Quality and visibility filters are used in merging MVS reconstructions, where both filters are designed to be out-of-core as well as parallel. Left: MVS point P is tested against the filters. Right: Pseudo code, where loops highlighted in blue can be executed in parallel.
Suppose P_j has been reconstructed from cluster C_k (a reference cluster). We first collect MVS points {Q_m} and their associated visibility information {V_m} from all the clusters 1) that have compatible normals with P_j, the angle difference being less than 90°; and 2) whose projected locations are within τ2 pixels from that of P_j in every image in V_j (τ2 = 8 in our experiments). From the collected MVS points, we compute a histogram {H_l}, where H_l is the sum of reconstruction accuracies f(Q_m, V_m) associated with MVS points reconstructed from C_l. Since a cluster with accurate and dense points should have a significantly larger value than the others, P_j is filtered out if the corresponding histogram value H_k is less than half the maximum: H_k < 0.5 max_l H_l. We repeat this procedure by examining each reference cluster in turn, which can be executed in parallel.
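A compact sketch of this per-point quality test; `accuracy(q)` stands in for f(Q_m, V_m), `normal_compatible` and `projections_agree` are assumed helpers for the 90° normal check and the τ2 reprojection check, and each collected point is assumed to carry the id of the cluster it was reconstructed from.

```python
from collections import defaultdict

def quality_filter(ref_points, ref_cluster_id, all_points,
                   accuracy, normal_compatible, projections_agree):
    """Keep a point from the reference cluster only if the accuracy mass its
    own cluster contributes is at least half that of the best cluster."""
    kept = []
    for p in ref_points:
        hist = defaultdict(float)           # cluster id -> sum of f(Q_m, V_m)
        for q in all_points:
            if normal_compatible(p, q) and projections_agree(p, q, tau2=8):
                hist[q.cluster_id] += accuracy(q)
        if hist and hist[ref_cluster_id] >= 0.5 * max(hist.values()):
            kept.append(p)
    return kept
```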
3.2. Visibility Filter
The visibility filter enforces consistency in the visibility information associated with MVS points over the entire reconstruction. The filter is, in fact, very similar to the one used in PMVS [7, 9]. The difference is that PMVS enforces the intra-cluster consistency inside each cluster, while our filter enforces inter-cluster visibility consistency over an entire reconstruction by comparing PMVS outputs from all the clusters. More concretely, for each MVS point, we count the number of times it conflicts with reconstructions from other clusters. The point is filtered out if the conflict count is more than three. Conflict counts are computed as follows.

Let Θ_k denote the set of MVS points reconstructed from cluster C_k (a reference cluster). We construct depth maps for the images in C_k by projecting Θ_k into their visible images. Depth maps also store the reconstruction accuracies associated with the MVS points.³ We compute the conflict count of each MVS point P in non-reference clusters as the number of depth maps in C_k that conflict with P.
³ PMVS recovers a point for every abutting set of 2×2 pixels in our setting, and depth maps are computed at half resolution. For an image belonging to multiple clusters, multiple depth maps are computed.
Before MVS Filters | After MVS Filters
Figure 6. MVS points reconstructed from three view clusters before and after the MVS filters for St Peter's Basilica. Before the filtering, MVS points have large overlaps. Each cluster has its own 3D space, highlighted in colored rectangles, where reconstructions become the most accurate over all the clusters. Points outside such a space are mostly removed by our filters.
P is defined to conflict with a depth map if P is closer to the camera than the depth map by a small margin, and the reconstruction accuracy of P is less than half the value stored in the depth map. Note that we repeat this procedure by changing the reference cluster one by one, which can again be executed in parallel. The conflict count of the same MVS point is computed multiple times from different executions of this step, and their sum is tested against the threshold.
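The conflict test can be sketched as follows; `project_to(point, image)` is assumed to return a pixel key and the point's depth in that image, each depth map is modeled as a dictionary from pixel to (depth, accuracy), and the numeric depth margin is an illustrative stand-in for the paper's "small margin".

```python
def count_conflicts(point, point_accuracy, depth_maps, project_to, margin=0.01):
    """Number of reference-cluster depth maps that conflict with the point:
    the point lies in front of the stored depth by more than `margin` while
    its accuracy is below half the accuracy stored in the depth map."""
    conflicts = 0
    for image, dmap in depth_maps:
        pixel, depth = project_to(point, image)
        if pixel is None or pixel not in dmap:
            continue
        stored_depth, stored_accuracy = dmap[pixel]
        if depth < stored_depth - margin and point_accuracy < 0.5 * stored_accuracy:
            conflicts += 1
    return conflicts

def visibility_filter(points, total_conflict_counts, max_conflicts=3):
    """Drop a point once its conflict count, summed over all reference
    clusters, exceeds the threshold (three in the paper)."""
    return [p for p, c in zip(points, total_conflict_counts) if c <= max_conflicts]
```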
3.3. Scalability
MVS reconstruction and filtering are the most computationally expensive and memory intensive parts of our system. Here, we focus on memory complexity, which is the more critical factor for scalability.⁴
The memory expense of the MVS reconstruction step depends on the choice of the core MVS algorithm, but is not an issue with our system, because the number of images in each cluster is bounded by the size constraint α from the view clustering step.
⁴ Memory consumption is more critical for MVS, because PMVS (and some other MVS algorithms [12]) can utilize SFM visibility information to restrict the sets of images to be matched instead of exhaustively trying every possible pair. Therefore, the running time of such algorithms is more or less linear in the amount of surface area to be reconstructed, while the memory limitation is an unavoidable issue.
