Towards Internet-scale Multi-view Stereo


Yasutaka Furukawa¹, Brian Curless², Steven M. Seitz¹,², Richard Szeliski³ — ¹Google Inc., ²University of Washington, ³Microsoft Research
Abstract
This paper introduces an approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections. The main idea is to decompose the collection into a set of overlapping sets of photos that can be processed in parallel, and to merge the resulting reconstructions. This overlapping clustering problem is formulated as a constrained optimization and solved iteratively. The merging algorithm, designed to be parallel and out-of-core, incorporates robust filtering steps to eliminate low-quality reconstructions and enforce global visibility constraints. The approach has been tested on several large datasets downloaded from Flickr.com, including one with over ten thousand images, yielding a 3D reconstruction with nearly thirty million points.
1. Introduction
The state of the art in 3D reconstruction from images has undergone a revolution in the last few years. Coupled with the explosion of imagery available online and advances in computing, we have the opportunity to run reconstruction algorithms at massive scale. Indeed, we can now attempt to reconstruct the entire world, i.e., every building, landscape, and (static) object that can be photographed.
The most important technological ingredients towards this goal are already in place. Matching techniques (e.g., SIFT [17]) provide accurate correspondences, structure-from-motion (SFM) algorithms use the correspondences to estimate precise camera poses, and multi-view-stereo (MVS) methods take images with poses as input and produce dense 3D models with accuracy nearly on par with laser scanners [22]. Indeed, this type of pipeline has already been demonstrated by a few research groups [11, 12, 14, 19], with impressive results.
To reconstruct everything, one key challenge is scalability.¹ In particular, how can we devise reconstruction algorithms that operate at Internet-scale, i.e., on the millions of images available on Internet sites such as Flickr.com?
¹ There are other challenges such as handling complex BRDFs and lighting variations, which we do not address in this paper.
Figure 1. Our dense reconstruction of Piazza San Marco (Venice) from 13,703 images with 27,707,825 reconstructed MVS points (further upsampled ×9 for high quality point-based rendering).
Given recent progress on Internet-scale matching and SFM (notably Agarwal et al.'s Rome-in-a-day project [1]), we focus our efforts in this paper on the last stage of the pipeline, i.e., Internet-scale MVS.
MVS algorithms are based on the idea of correlating measurements from several images at once to derive 3D surface information. Many MVS algorithms aim at reconstructing a global 3D model by using all the images available simultaneously [9, 13, 20, 24]. Such an approach is not feasible as the number of images grows. Instead, it becomes important to select the right subset of images, and to cluster them into manageable pieces.
We propose a novel view selection and clustering scheme that allows a wide class of MVS algorithms to scale up to massive photo sets. Combined with a new merging method that robustly filters out low-quality or erroneous points, we demonstrate our approach running for thousands of images of large sites and one entire city. Our system is the first to demonstrate an unstructured MVS approach at city-scale.
We propose an overlapping view clustering problem [2], in which the goal is to decompose the set of input images into clusters that have small overlap. Overlap is important for the MVS problem, as a strict partition would undersample surfaces near cluster boundaries. Once clustered, we apply a state-of-the-art MVS algorithm to reconstruct dense 3D points, and then merge the resulting reconstructions into a single dense point-based model. Robust filtering algorithms are introduced to handle reconstruction errors and the vast variations in reconstruction quality that occur between distant and nearby views of objects in Internet photo collections. The filters are designed to be out-of-core and parallel, in order to process a large number of MVS points efficiently. We show visualizations of models containing tens of millions of points (see Figure 1).
1.1. Related Work
Scalability has rarely been a consideration in prior MVS algorithms, as prior datasets have been either relatively small [22] or highly structured (e.g., a video sequence which can be decomposed into short time intervals [19]).
Nevertheless, some algorithms lend themselves naturally to parallelization. In particular, several algorithms operate by solving for a depth map for each image, using a local neighborhood of nearby images, and then merge the resulting reconstructions [11, 12, 18, 19]. Each depth map can be computed independently and in parallel. However, the depth maps tend to be noisy and highly redundant, leading to wasted computational effort. Therefore, these algorithms typically require additional post-processing steps to clean up and merge the depth maps.
Many of the best performing MVS algorithms instead reconstruct a global 3D model directly from the input images [9, 13, 20, 24]. Global methods can avoid redundant computations and often do not require a clean-up post-process, but scale poorly. One exception is Jancosek et al. [14], who achieve scalability by designing the algorithm out-of-core. However, this is a sequential algorithm. In contrast, we seek an out-of-core algorithm that is also parallelizable.
With depth-map based MVS algorithms, several authors have succeeded in large-scale MVS reconstructions [18, 19]. Pollefeys et al. [19] present a real-time MVS system for long image sequences. They estimate a depth map for each input image, reduce noise by fusing nearby depth maps, and merge the resulting depth maps into a single mesh model. Micusik et al. [18] propose a piece-wise planar depth map computation algorithm with very similar clean-up and merging steps. However, both methods have been tested only on highly structured, street-view datasets obtained by a video camera mounted on a moving van, and not the unstructured photo collections that we consider in this paper, which pose additional challenges.
Besides scalability, variation in reconstruction quality is another challenge in handling large unorganized image collections, as surfaces may be imaged from both close up and far away. Goesele et al. [12] proposed the first MVS method applied to Internet photo collections, which handles variation in image sampling resolutions by selecting images with the most compatible resolution. Gallup et al. [10] select images at different baselines and image resolutions to control depth accuracy.
Images {I_1, I_2, ...} + SFM points {P_1, P_2, ...} → Image clusters {C_1, C_2, ...}
Figure 2. Our view clustering algorithm takes images {I_i}, SFM points {P_j}, and their associated visibility information {V_j}, then produces overlapping image clusters {C_k}.
Both of these methods handle variation by selecting images prior to reconstruction. These techniques may be used in conjunction with the methods proposed here, but the major difference in our work is that we also handle the variation in a post-processing step, when merging reconstructions. We note that some prior depth map merging algorithms take into account estimates of noise, e.g., by taking weighted combinations of depth samples to recover a mesh [4, 25]. While such approaches can handle noise variation, we find they do not perform well for large Internet photo collections, where resolution variation is a major factor, because combining high and low resolution geometries in the standard ways will tend to attenuate high resolution detail. We instead propose a simple merging strategy that filters out low resolution geometry, which we have found to be robust and well-tailored to recovering a point-based model as output.
The rest of the paper is organized as follows. The view-clustering algorithm is explained in Section 2, and details of the MVS point merging and rendering are given in Section 3. Experimental results are provided in Section 4, and we conclude the paper in Section 5. Our implementation of the proposed view-clustering algorithm is available at [6].
2. View Clustering
We assume that our input images {I_i} have been processed by an SFM algorithm to yield camera poses and a sparse set of 3D points {P_j}, each of which is visible in a set of images denoted by V_j. We treat the SFM points as sparse samples of the dense reconstruction that MVS will produce. As such, they can be used as a basis for view clustering. More specifically, the goal of view clustering is to find (an unknown number of) overlapping image clusters {C_k} such that each cluster is of manageable size, and each SFM point can be accurately reconstructed by at least one of the clusters (see Figure 2).
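To fix notation for the sketches that follow, here is a minimal Python model of the clustering input; the class and field names are illustrative assumptions, not taken from the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Camera:
    """Minimal stand-in for a posed input image I_i."""
    image_id: int
    resolution: tuple          # (width, height) in pixels

@dataclass
class SfmPoint:
    """A sparse SFM point P_j with its visibility set V_j (image ids)."""
    position: tuple            # (x, y, z) in world coordinates
    visible_images: set = field(default_factory=set)

# A view cluster C_k is simply a set of image ids; view clustering outputs a
# list of such (possibly overlapping) sets.
```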
2.1. Problem Formulation
The clustering formulation is designed to satisfy the following three constraints: (1) redundant images are excluded from the clusters (compactness); (2) each cluster is small enough for an MVS reconstruction (size constraint); and (3) MVS reconstructions from the clusters result in minimal loss of content and detail compared to that obtainable by processing the full image set (coverage). Compactness is important for computational efficiency but also to improve accuracy, as Internet photo collections often contain hundreds or thousands of photos acquired from nearly the same viewpoint, and a cluster consisting entirely of near-duplicate views will yield a noisy reconstruction due to insufficient baseline. More concretely, our objective is to minimize the total number of images ∑_k |C_k| in the output clusters, subject to the following two constraints. The first is an upper bound on the size of each cluster so that an MVS algorithm can be used for each cluster independently: ∀k, |C_k| ≤ α. α is determined by computational resources, particularly memory limitations.
The second encourages the coverage of the final MVS reconstructions as follows. We say an SFM point P_j is covered if it is sufficiently well reconstructed by the cameras in at least one cluster C_k. To quantify this notion of "well-reconstructed," we introduce a function f(P, C) that measures the expected reconstruction accuracy achieved at a 3D location P by a set of images C. This function depends on the camera baselines and pixel sampling rates (see the Appendix for our definition of f). We say that P_j is covered if its reconstruction accuracy in at least one of the clusters C_k is at least λ times f(P_j, V_j), which is the expected accuracy obtained when using all of P_j's visible images V_j:

  P_j is covered if  max_k f(P_j, C_k ∩ V_j) ≥ λ f(P_j, V_j),

where λ = 0.7 in our experiments. The coverage constraint is that for each set of SFM points visible in one image, the ratio of covered points must be at least δ (also set to 0.7 in our experiments). Note that we enforce this coverage ratio on each image, instead of on the entire reconstruction, to encourage good spatial coverage and uniformity.
In summary, our overlapping clustering formulation is defined as follows:

  Minimize  ∑_k |C_k|                                              (compactness)
  subject to
    • ∀k: |C_k| ≤ α,                                               (size)
    • ∀i: (# of covered points in I_i) / (# of points in I_i) ≥ δ. (coverage)
There are a couple of points worth noting about this formulation. First, the minimization causes redundant images to be discarded whenever the constraints can be achieved with a smaller set of images. Second, the proposed formulation automatically allows overlapping clusters. Finally, the formulation implicitly incorporates image quality factors (e.g., sensor noise, blur, poor exposure), as poor quality images have fewer SFM points, and are thus more costly to include with the coverage constraint.
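As an illustration of how the coverage test can be evaluated, the sketch below assumes the accuracy function f(P, C) is available as a callable (the paper defines it in its Appendix from camera baselines and pixel sampling rates) and reuses the SfmPoint model above.

```python
def is_covered(point, clusters, f, lam=0.7):
    """P_j is covered if some cluster C_k reaches lam * f(P_j, V_j)."""
    full_accuracy = f(point, point.visible_images)
    return any(
        f(point, cluster & point.visible_images) >= lam * full_accuracy
        for cluster in clusters
    )

def image_coverage_ok(image_id, points, clusters, f, delta=0.7, lam=0.7):
    """Per-image coverage constraint: the ratio of covered SFM points among
    the points visible in this image must be at least delta."""
    visible_points = [p for p in points if image_id in p.visible_images]
    if not visible_points:
        return True
    covered = sum(is_covered(p, clusters, f, lam) for p in visible_points)
    return covered / len(visible_points) >= delta
```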
Figure 3. The view clustering algorithm consists of four steps, where the last two steps are iterated until all the constraints are satisfied.
2.2. View Clustering Algorithm
Solving the proposed clustering problem is challenging, because the constraints are not in a form readily handled by existing methods like k-means, normalized cuts [16, 23], etc. Before presenting our algorithm, we first introduce some neighborhood relations for images and SFM points. A pair of images I_l and I_m are defined to be neighbors if there exists an SFM point that is visible in both images. Similarly, a pair of image sets are neighbors if there exists a pair of images (one from each set) that are neighbors. Finally, a pair of SFM points P_j and P_k are defined to be neighbors if 1) they have similar visibility, that is, their visible image sets V_j and V_k are neighbors according to the above definition, and 2) the projected locations of P_j and P_k are within τ1 pixels in every image in (V_j ∪ V_k), where τ1 = 64 is used.
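These neighborhood tests reduce to simple set and reprojection checks; a rough sketch follows, where `project(point, image_id)` is an assumed helper returning pixel coordinates, and the image-set neighborhood test is approximated by requiring a shared visible image.

```python
def images_are_neighbors(point_visibilities, img_a, img_b):
    """Two images are neighbors if some SFM point is visible in both;
    point_visibilities is the collection of visibility sets V_j."""
    return any(img_a in V and img_b in V for V in point_visibilities)

def points_are_neighbors(p, q, project, tau1=64):
    """SFM points are neighbors if their visibility sets are related and their
    projections stay within tau1 pixels in every image of V_j union V_k.
    (The set-level visibility test is simplified to a shared visible image.)"""
    if not (p.visible_images & q.visible_images):
        return False
    for img in p.visible_images | q.visible_images:
        (xp, yp), (xq, yq) = project(p, img), project(q, img)
        if max(abs(xp - xq), abs(yp - yq)) > tau1:
            return False
    return True
```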
Figure 3 provides an overview of our approach, which consists of four steps. The first two steps are pre-processing, while the last two steps are repeated in an iterative loop.

1. SFM filter – merging SFM points: Having accurate measures of point visibility is key to the success of our view clustering procedure. Undetected or unmatched image features lead to errors in the point visibility estimates V_j (typically in the form of missing images). We obtain more reliable visibility estimates by aggregating visibility data over a local neighborhood, and merging points in that neighborhood. The position of the merged point is the average of its neighbors, while the visibility becomes the union. This step also significantly reduces the number of SFM points and improves the running time of the remaining three steps. Specifically, starting from a set of SFM points, we randomly select one point, merge it with its neighbors, output the merged point, and remove both the point and its neighbors from the input set. We repeat the procedure until the input set is empty. The set of merged points becomes the new point set, which, with some abuse of notation, is also denoted as {P_j}.² See Figure 4 for a sample output of this step.
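A minimal sketch of this greedy merge, reusing the SfmPoint model and a neighbor predicate such as the one above:

```python
import random

def merge_sfm_points(points, are_neighbors):
    """SFM filter: repeatedly pick a random point, merge it with its neighbors
    (average position, union of visibility), and remove the whole group."""
    remaining = list(points)
    merged = []
    while remaining:
        seed = remaining.pop(random.randrange(len(remaining)))
        neighbors = [p for p in remaining if are_neighbors(seed, p)]
        remaining = [p for p in remaining if p not in neighbors]
        group = [seed] + neighbors
        position = tuple(sum(c) / len(group)
                         for c in zip(*(p.position for p in group)))
        visibility = set().union(*(p.visible_images for p in group))
        merged.append(SfmPoint(position=position, visible_images=visibility))
    return merged
```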
² An even better approach would be to re-detect and match new image features to improve visibility information as in [8]. However, this algorithm would be significantly more expensive.
SFM points → Merged SFM points; View points → View point clusters
Figure 4. Top: The first step of our algorithm is to merge SFM points to enrich visibility information (SFM filter). Bottom: Sample results of our view clustering algorithm. View points belonging to extracted clusters are illustrated in different colors.
2. Image selection – removing redundant images: Starting with the full image set, we test each image and remove it if the coverage constraint still holds after the removal. The removal test is performed for all the images enumerated in increasing order of image resolution (# of pixels), so that low-resolution images are removed first. Note that images are permanently discarded in this step to speed up the following main optimization steps.
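One way to prototype this pruning pass, continuing the sketches above, is shown below; `image_coverage_ok` stands in for the coverage test, and treating the remaining image set as a single cluster at this pre-clustering stage is an assumption of the sketch.

```python
def remove_redundant_images(cameras, points, f, delta=0.7, lam=0.7):
    """Visit images from lowest to highest resolution and permanently drop an
    image whenever the per-image coverage constraint still holds without it."""
    kept = {cam.image_id for cam in cameras}
    for cam in sorted(cameras, key=lambda c: c.resolution[0] * c.resolution[1]):
        trial = kept - {cam.image_id}
        if all(image_coverage_ok(i, points, [trial], f, delta, lam)
               for i in trial):
            kept = trial
    return kept
```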
3. Cluster division – enforcing the size constraint: Next, we enforce the size constraint by splitting clusters, while ignoring coverage. More concretely, we divide an image cluster into smaller components if it violates the size constraint. The division of a cluster is performed by the Normalized-Cuts algorithm [23] on a visibility graph, where nodes are images. The edge weight e_lm between an image pair (I_l, I_m) measures how much I_l and I_m together contribute to MVS reconstruction at relevant SFM points:

  e_lm = ∑_{P_j ∈ Θ_lm} f(P_j, {I_l, I_m}) / f(P_j, V_j),

where Θ_lm denotes the set of SFM points visible in both I_l and I_m. Intuitively, images with high MVS contribution have high edge weights among them and are less likely to be cut. The division of a cluster repeats until the size constraint is satisfied for all the clusters.
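For prototyping, the split can be approximated with scikit-learn's spectral clustering as a stand-in for the Normalized-Cuts solver used in the paper; `edge_weight(l, m)` is assumed to implement the e_lm formula above.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def split_oversized_clusters(clusters, edge_weight, alpha):
    """Recursively bisect any cluster larger than alpha using spectral
    clustering on the pairwise edge-weight (affinity) matrix."""
    result, stack = [], [sorted(c) for c in clusters]
    while stack:
        cluster = stack.pop()
        if len(cluster) <= alpha:
            result.append(set(cluster))
            continue
        W = np.array([[edge_weight(l, m) if l != m else 0.0 for m in cluster]
                      for l in cluster])
        labels = SpectralClustering(n_clusters=2,
                                    affinity="precomputed").fit_predict(W)
        parts = [[img for img, lab in zip(cluster, labels) if lab == part]
                 for part in (0, 1)]
        if not parts[0] or not parts[1]:      # degenerate cut: split in half
            mid = len(cluster) // 2
            parts = [cluster[:mid], cluster[mid:]]
        stack.extend(parts)
    return result
```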
4. Image addition – enforcing coverage: The coverage constraint may have been violated in step 3, and we now add images to each cluster in order to cover more SFM points and reestablish coverage. In this step, we first construct a list of possible actions, where each action measures the effectiveness of adding an image to a cluster to increase coverage. More concretely, for each uncovered SFM point P_j, let C_k = argmax_{C_l} f(P_j, C_l) be the cluster with the maximum reconstruction accuracy. Then, for P_j, we create an action {(I → C_k), g} that adds image I (∈ V_j, ∉ C_k) to C_k, where g measures the effectiveness and is defined as f(P_j, C_k ∪ {I}) − f(P_j, C_k). Note that we only consider actions that add images to C_k, instead of every cluster that could cover P_j, for computational efficiency. Since actions with the same image and cluster are generated from multiple SFM points, we merge such actions while summing up the measured effectiveness g. Actions in the list are sorted in decreasing order of their effectiveness.

Having constructed an action list, one approach would be to take the action with the highest score, then recompute the list again, which is computationally too expensive. Instead, we consider actions whose scores are more than 0.7 times the highest score in the list, and repeatedly take an action from the top of the list. Since an action may change the effectiveness of other similar actions, after taking one action, we remove any conflicting ones from the list, where two actions {(I → C), g}, {(I′ → C′), g′} are conflicting if I and I′ are neighbors. The list construction and image addition repeat until the coverage constraint is satisfied. After the image addition, the size constraint may be violated, and the last two steps are repeated until both constraints are satisfied.
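A rough sketch of one round of this image-addition pass, reusing `is_covered` and f from earlier; `neighbor_images(a, b)` is an assumed image-level neighbor predicate, and the action bookkeeping is simplified to a dictionary keyed by (image, cluster index).

```python
def add_images_for_coverage(clusters, points, f, neighbor_images,
                            lam=0.7, score_ratio=0.7):
    """Build the action list over all uncovered points, then greedily apply
    the strongest non-conflicting actions within score_ratio of the best."""
    actions = {}                                 # (image_id, cluster_idx) -> g
    for p in points:
        if is_covered(p, clusters, f, lam):
            continue
        k = max(range(len(clusters)), key=lambda i: f(p, clusters[i]))
        for img in p.visible_images - clusters[k]:
            gain = f(p, clusters[k] | {img}) - f(p, clusters[k])
            actions[(img, k)] = actions.get((img, k), 0.0) + gain
    ranked = sorted(actions.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return clusters
    threshold = score_ratio * ranked[0][1]
    taken = []
    for (img, k), gain in ranked:
        if gain < threshold:
            break
        if any(neighbor_images(img, other) for other, _ in taken):
            continue                             # conflicts with a taken action
        clusters[k] = clusters[k] | {img}
        taken.append((img, k))
    return clusters
```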
We note that the size and coverage constraints are not difficult to satisfy; indeed, an extreme solution is to create a small cluster for each SFM point with sufficient baseline/resolution. In this extreme case, the resulting clusters will likely contain many duplicates and therefore have a poor compactness score. Typically, our approach of splitting clusters then adding a few images (usually at boundaries) tends to rapidly and easily satisfy the constraints while achieving reasonable (though not optimal) compactness scores; it terminates in a couple of iterations in all of our experiments. While our approach is not globally optimal, we note that achieving optimal compactness is not critical for our application.
3. MVS Filtering and Rendering
Having extracted image clusters, the Patch-based MVS software (PMVS) by Furukawa et al. [7] is used to reconstruct 3D points for each cluster independently. Any MVS algorithm could be used, but we chose PMVS, which is publicly available. In this section, we propose two filters that are used in merging reconstructed points to handle reconstruction errors and variations in reconstruction quality (see Figures 5 and 6). Our filtering algorithms are designed to be out-of-core and operate in parallel to handle a large number of MVS points efficiently. We now describe the two filtering algorithms, discuss their scalability, and explain how merged MVS points are rendered.
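As a sketch of the out-of-core, parallel structure shared by both filters, each reference cluster can be handled by an independent worker that reads its points from disk and writes filtered results back; the file names and helper signatures here are illustrative assumptions, not from the released PMVS code.

```python
import pickle
from multiprocessing import Pool
from pathlib import Path

def process_reference_cluster(args):
    """Worker: load one reference cluster's MVS points, run a filter pass
    against the other clusters (via files on disk), and save the result."""
    cluster_id, cluster_dir, filter_fn = args
    points = pickle.loads(Path(cluster_dir, f"cluster_{cluster_id}.pkl").read_bytes())
    kept = filter_fn(cluster_id, points, cluster_dir)
    Path(cluster_dir, f"filtered_{cluster_id}.pkl").write_bytes(pickle.dumps(kept))
    return cluster_id, len(kept)

def run_filter_in_parallel(num_clusters, cluster_dir, filter_fn, workers=8):
    """The loop over reference clusters parallelizes trivially (the highlighted
    loops in Figure 5); filter_fn must be a picklable top-level function."""
    jobs = [(k, cluster_dir, filter_fn) for k in range(num_clusters)]
    with Pool(workers) as pool:
        return dict(pool.map(process_reference_cluster, jobs))
```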
3.1. Quality Filter
The same surface region may be reconstructed in multiple clusters with varying reconstruction quality: nearby clusters produce dense, accurate points, while distant clusters produce sparse, noisy points. We want to filter out the latter, which is realized by the following quality filter. Let P_j and V_j denote an MVS point and its visibility information estimated by the MVS algorithm, respectively.
Quality Filter
  For each reference cluster C_k on each node
    Initialize histograms of MVS points reconstructed from C_k;
    For each cluster C_l
      For each point P in C_k
        Compute histogram entry H_l for P;
    Filter out points in C_k using the histogram;

Visibility Filter
  For each reference cluster C_k on each node
    For each image I in C_k
      Compute a depth map for I;
    For each remaining cluster C_l
      For each point P in C_l
        For each image I in C_k
          If P and I conflict
            Increment conflict count for P;
      Save conflict counts for C_l to file;
  For each cluster C_k on each node
    Read files containing conflict counts for C_k;
    Filter out points in C_k with conflict counts;

Figure 5. Quality and visibility filters are used in merging MVS reconstructions, where both filters are designed to be out-of-core as well as parallel. Left: MVS point P is tested against the filters. Right: Pseudo code, where loops highlighted in blue can be executed in parallel.
Suppose P_j has been reconstructed from cluster C_k (a reference cluster). We first collect MVS points {Q_m} and their associated visibility information {V_m} from all the clusters 1) that have compatible normals with P_j, the angle difference being less than 90°; and 2) whose projected locations are within τ2 pixels from that of P_j in every image in V_j (τ2 = 8 in our experiments). From the collected MVS points, we compute a histogram {H_l}, where H_l is the sum of reconstruction accuracies f(Q_m, V_m) associated with MVS points reconstructed from C_l. Since a cluster with accurate and dense points should have a significantly larger value than the others, P_j is filtered out if the corresponding histogram value H_k is less than half the maximum: H_k < 0.5 max_l H_l. We repeat this procedure by examining each reference cluster in turn, which can be executed in parallel.
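A compact sketch of this per-point quality test; `accuracy(q)` stands in for f(Q_m, V_m), `normal_compatible` and `projections_agree` are assumed helpers for the 90° normal check and the τ2 reprojection check, and each collected point is assumed to carry the id of the cluster it was reconstructed from.

```python
from collections import defaultdict

def quality_filter(ref_points, ref_cluster_id, all_points,
                   accuracy, normal_compatible, projections_agree):
    """Keep a point from the reference cluster only if the accuracy mass its
    own cluster contributes is at least half that of the best cluster."""
    kept = []
    for p in ref_points:
        hist = defaultdict(float)           # cluster id -> sum of f(Q_m, V_m)
        for q in all_points:
            if normal_compatible(p, q) and projections_agree(p, q, tau2=8):
                hist[q.cluster_id] += accuracy(q)
        if hist and hist[ref_cluster_id] >= 0.5 * max(hist.values()):
            kept.append(p)
    return kept
```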
3.2. Visibility Filter
The visibility filter enforces consistency in the visibility information associated with MVS points over the entire reconstruction. The filter is, in fact, very similar to the one used in PMVS [7, 9]. The difference is that PMVS enforces the intra-cluster consistency inside each cluster, while our filter enforces inter-cluster visibility consistency over an entire reconstruction by comparing PMVS outputs from all the clusters. More concretely, for each MVS point, we count the number of times it conflicts with reconstructions from other clusters. The point is filtered out if the conflict count is more than three. Conflict counts are computed as follows.

Let Θ_k denote the set of MVS points reconstructed from cluster C_k (a reference cluster). We construct depth maps for the images in C_k by projecting Θ_k into their visible images. Depth maps also store the reconstruction accuracies associated with the MVS points.³ We compute the conflict count of each MVS point P in non-reference clusters as the number of depth maps in C_k that conflict with P.
³ PMVS recovers a point for every abutting set of 2×2 pixels in our setting, and depth maps are computed at half resolution. For an image belonging to multiple clusters, multiple depth maps are computed.
Before MVS Filters | After MVS Filters
Figure 6. MVS points reconstructed from three view clusters before and after the MVS filters for St Peter's Basilica. Before the filtering, MVS points have large overlaps. Each cluster has its own 3D space, highlighted in colored rectangles, where reconstructions become the most accurate over all the clusters. Points outside such a space are mostly removed by our filters.
P is defined to conflict with a depth map if P is closer to the camera than the depth map by a small margin, and the reconstruction accuracy of P is less than half the value stored in the depth map. Note that we repeat this procedure by changing the reference cluster one by one, which can again be executed in parallel. The conflict count of the same MVS point is computed multiple times from different executions of this step, and their sum is tested against the threshold.
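The conflict test can be sketched as follows; `project_to(point, image)` is assumed to return a pixel key and the point's depth in that image, each depth map is modeled as a dictionary from pixel to (depth, accuracy), and the numeric depth margin is an illustrative stand-in for the paper's "small margin".

```python
def count_conflicts(point, point_accuracy, depth_maps, project_to, margin=0.01):
    """Number of reference-cluster depth maps that conflict with the point:
    the point lies in front of the stored depth by more than `margin` while
    its accuracy is below half the accuracy stored in the depth map."""
    conflicts = 0
    for image, dmap in depth_maps:
        pixel, depth = project_to(point, image)
        if pixel is None or pixel not in dmap:
            continue
        stored_depth, stored_accuracy = dmap[pixel]
        if depth < stored_depth - margin and point_accuracy < 0.5 * stored_accuracy:
            conflicts += 1
    return conflicts

def visibility_filter(points, total_conflict_counts, max_conflicts=3):
    """Drop a point once its conflict count, summed over all reference
    clusters, exceeds the threshold (three in the paper)."""
    return [p for p, c in zip(points, total_conflict_counts) if c <= max_conflicts]
```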
3.3. Scalability
MVS reconstruction and filtering are the most computationally expensive and memory intensive parts of our system. Here, we focus on memory complexity, which is the more critical factor for scalability.⁴
The memory expense of the MVS reconstruction step depends on the choice of the core MVS algorithm, but is not an issue with our system, because the number of images in each cluster is bounded by the size constraint α from the view clustering step.
⁴ Memory consumption is more critical for MVS, because PMVS (and some other MVS algorithms [12]) can utilize SFM visibility information to restrict the sets of images to be matched instead of exhaustively trying every possible pair. Therefore, the running time of such algorithms is more or less linear in the amount of surface area to be reconstructed, while the memory limitation is an unavoidable issue.
