Single Image 3D Object Detection and Pose Estimation for Grasping
Menglong Zhu1, Konstantinos G. Derpanis2, Yinfei Yang1, Samarth Brahmbhatt1,
Mabel Zhang1, Cody Phillips1, Matthieu Lecce1 and Kostas Daniilidis1
Abstract—We present a novel approach for detecting 3D objects and estimating their 3D pose in single images of cluttered scenes. Objects are given in terms of 3D models without accompanying texture cues. A deformable parts-based model is trained on clusters of silhouettes of similar poses and produces hypotheses about possible object locations at test time. Objects are simultaneously segmented and verified inside each hypothesis bounding region by selecting the set of superpixels whose collective shape matches the model silhouette. A final iteration on the 6-DOF object pose minimizes the distance between the selected image contours and the actual projection of the 3D model. We demonstrate successful grasps using our detection and pose estimate with a PR2 robot. Extensive evaluation on a novel ground truth dataset shows the considerable benefit of using shape-driven cues for detecting objects in cluttered scenes.
I. INTRODUCTION
In this paper, we address the problem of a robot detecting and localizing 3D objects of known 3D shape in single images of cluttered scenes. In the context of grasping and manipulation, object recognition has been defined as simultaneous detection and segmentation in the 2D image and 3D localization. 3D object recognition has experienced a revived interest in both the robotics and computer vision communities, with RGB-D sensors having simplified the foreground-background segmentation problem. Nevertheless, difficulties remain, as such sensors cannot generally be used in outdoor environments yet.

The goal of this paper is to detect and localize objects in single view RGB images of environments containing arbitrary ambient illumination and substantial clutter for the purpose of autonomous grasping. Objects can be of arbitrary color and interior texture and, thus, we assume knowledge of only their 3D model without any appearance/texture information. Using 3D models makes an object detector immune to intra-class texture variations.
We further abstract the 3D model by only using its 2D silhouette, and thus detection is driven by the shape of the 3D object's projected occluding boundary. Object silhouettes with corresponding viewpoints that are tightly clustered on the viewsphere are used as positive exemplars to train the state-of-the-art Deformable Parts Model (DPM) discriminative classifier [1]. We term this shape-aware version S-DPM.

1The authors are with the GRASP Laboratory, Department of Computer and Information Science, University of Pennsylvania, 3330 Walnut Street, Philadelphia, PA, USA. {menglong,yinfeiy,samarthb,zmen,codyp,mlecce,kostas}@cis.upenn.edu
2Konstantinos G. Derpanis is with the Department of Computer Science, Ryerson University, 245 Church Street, Toronto, Ontario, Canada. son.ca
Fig. 1: Demonstration of the proposed approach on a PR2 robot platform. a) Single view input image, with the object of interest highlighted with a black rectangle. b) Object model (in green) is projected with the estimated pose in 3D, ready for grasping. The Kinect point cloud is shown for the purpose of visualization.
This detector simultaneously detects the object and coarsely estimates the object's pose. The focus of the current paper is on instance-based rather than category-based object detection and localization; however, our approach can be extended to multiple instance category recognition, since S-DPM is agnostic to whether the positive exemplars are multiple poses from a single instance (as considered in the current paper) or multiple poses from multiple instances.
We propose to use an S-DPM classifier as a first high recall step yielding several bounding box hypotheses. Given the hypotheses, we solve for segmentation and localization simultaneously. After over-segmenting the hypothesis region into superpixels, we select the superpixels that best match a model boundary using a shape-based descriptor, the chordiogram [2]. A chordiogram-based matching distance is used to compute the foreground segment and rerank the hypotheses. Finally, using the full 3D model, we estimate all 6-DOF of the object by efficiently iterating on the pose and computing matches using dynamic programming.
Our approach advances the state-of-the-art as follows:
• In terms of assumptions, our approach is among the few in the literature that can detect 3D objects in single images of cluttered scenes independent of their appearance.
• It combines the high recall of an existing discriminative classifier with the high precision of a holistic shape descriptor, achieving a simultaneous segmentation and detection reranking.
• Due to the segmentation, it selects the correct image contours to use for 3D pose refinement, a task that was previously only possible with stereo or depth sensors.
Fig. 2: Overview of the proposed approach. From left-to-right: a) The input image. b) S-DPM inference on the gPb contour image yielding an object detection hypothesis. c) The hypothesis bounding box (red) is segmented into superpixels. d) The set of superpixels with the closest chordiogram distance to the model silhouette is selected. Pose is iteratively refined such that the model projection aligns well with the foreground mask silhouette. e) To visualize the pose accuracy, the side of the 3D model facing the camera is textured with the corresponding 2D pixel color; three textured synthetic views of the final pose estimate are shown.
In the video supplement, we demonstrate our approach with a (PR2) robot grasping 3D objects on a cluttered table based on a single view RGB image. Figure 8 shows an example of the process. We report 3D pose accuracy by comparing the estimated pose rendered by the proposed approach with a ground truth point cloud recovered with an RGB-D sensor. Such grasping capability with accurate pose is crucial for robot operation, where popular RGB-D sensors cannot be used (e.g., outdoors) and stereo sensors are challenged by the uniformity of the object's appearance within its boundary. We also document an extensive evaluation on outdoor imagery with diverse backgrounds. The dataset contains a set of 3D object models, annotated single-view imagery of heavily cluttered outdoor scenes1, and indoor imagery of cluttered tabletops in RGB-D images.
II. RELATED WORK
Geometry-based object recognition arguably predates appearance-based approaches. A major advantage of these approaches is their invariance to material properties, viewpoint and illumination. We first survey approaches that use a 3D model, either synthetic or obtained from 3D reconstruction. Next, we describe approaches using multiple view exemplars annotated with their pose. We close with a brief description of 2D shape-based approaches and approaches applied to RGB-D test data.
Early approaches based on using explicit 3D models are summarized in Grimson's book [3] and focus on efficient techniques for voting in pose space. Horaud [4] investigated object recognition under perspective projection using a constructive algorithm for objects that contain straight contours and planar faces. Hausler [5] derived an analytical method for alignment under perspective projection using the Hough transform and global geometric constraints. Aspect graphs in their strict mathematical definition (each node sees the same set of singularities) were not considered practical enough for recognition tasks, but the notion of sampling the viewspace for the purpose of recognition was introduced again in [6], which was applied in single images with no background. A Bayesian method for 3D reconstruction from a single image was proposed based on the contours of objects with sharp surface intersections [7]. Sethi et al. [8] compute global invariant signatures for each object from its silhouette under weak perspective projection. This approach was later extended [9] to perspective projection by sampling a large set of epipoles for each image to account for a range of potential viewpoints. Liebelt et al. work with a view space of rendered models in [10], and a generative geometry representation is developed in [11]. Villamizar et al. [12] use a shared feature database that creates pose hypotheses verified by a Random Fern pose specific classifier. In [13], a 3D point cloud model is extracted from multiple view exemplars for clustering pose specific appearance features. Others extend deformable part models to combine viewpoint estimates and 3D parts consistent across views, e.g., [14]. In [15], a novel combination of local and global geometric cues was used to filter 2D image to 3D model correspondences. Others have pursued approaches that not only segment the object and estimate the 3D pose but also adjust the 3D shape of the object model. For instance, Gaussian Process Latent Variable Models were used for the dimensionality reduction of the manifold of shapes, and a two-step iteration optimizes over shape and pose, respectively [16]. The drawback of these approaches is that in the case of scene clutter they do not consider the selection of image contours. Further, in some cases tracking is used for finding the correct shape. This limits applicability to the analysis of image sequences, rather than a single image, as is the focus in the current paper.

1The annotated dataset and 3D models are available at the project page: www.seas.upenn.edu/~menglong/outdoor-3d-objects.html
Our approach resembles early proposals that avoid appearance cues and use only the silhouette, e.g., [6]. None of the above or the exemplar-based approaches surveyed below address the amount of clutter considered here, and in most cases the object of interest occupies a significant portion of the field of view.
Early view exemplar-based approaches typically assume an orthographic projection model that simplifies the analysis. Ullman [17] represented a 3D object by a linear combination of a small number of images, enabling an alignment of the unknown object with a model by computing the coefficients of the linear combination, and, thus, reducing the problem to 2D. In [18], this approach was generalized to objects bounded by smooth surfaces, under orthographic projection, based on the estimation of curvature from three or five images. Much of the multiview object detector work based on discrete 2D views (e.g., [19]) has been founded on successful approaches to single view object detection, e.g., [1]. Savarese and Fei-Fei [20] presented an approach for object categorization that combines appearance-based descriptors, including the canonical view for each part, and transformations between parts. This approach reasons about 3D surfaces based on image appearance features. In [21], detection is achieved simultaneously with contour and pose selection using convex relaxation. Hsiao et al. [22] also use exemplars for feature correspondences and show that ambiguity should be resolved during hypothesis testing and not at the matching phase. A drawback of these approaches is their reliance on discriminative texture-based features that are hardly present for the types of textureless objects considered in the current paper.
As far as RGB-D training and test examples are concerned, the most general and representative approach is [23]. Here, an object-pose tree structure was proposed that simultaneously detects and selects the correct object category and instance, and refines the pose. In [24], a viewpoint feature histogram is proposed for detection and pose estimation. Several similar representations are now available in the Point Cloud Library (PCL) [25]. We will not delve here into approaches that extract the target objects during scene parsing in RGB-D images, but refer the reader to [26] and the citations therein.
The 2D shape descriptor we use, the chordiogram [2], belongs to approaches based on the optimal assembly of image regions. Given an over-segmented image (i.e., superpixels), these approaches determine a subset of spatially contiguous regions whose collective shape [2] or appearance [27] features optimize a particular similarity measure with respect to a given object model. An appealing property of region-based methods is that they specify the image domain where the object-related features are computed and thus avoid contaminating object-related measurements with background clutter.
III. TECHNICAL APPROACH
An overview of the components of our approach is shown in Fig. 2. 3D models are acquired using a low-cost depth sensor (Sec. III-A). To detect an object robustly based only on shape information, the gPb contour detector [28] is applied to the RGB input imagery (Sec. III-B). Detected contours are fed into a parts-based object detector trained on model silhouettes (Sec. III-C). Detection hypotheses are over-segmented, and shape verification simultaneously computes the foreground segments and reranks the hypotheses (Sec. III-E). Section III-D describes the shape descriptor used for shape verification. The obtained object mask enables the application of an iterative 3D pose refinement algorithm to accurately recover the 6-DOF object pose based on the initial coarse pose estimate rendered by the object detector (Sec. III-F).

Fig. 3: Comparison of the two edge detection results on the same image. (left-to-right) Input image, Canny edges and gPb, respectively.
A. 3D model acquisition and rendering
3D CAD models have been shown to be very useful for object detection and pose estimation both in 2D images and 3D point clouds. We utilize a low-cost RGB-D depth sensor and a dense surface reconstruction algorithm, KinectFusion [29], to efficiently reconstruct 3D object models from the depth measurements of real objects. The 3D object model is acquired on a turntable with the camera pointing in a fixed position. After the model is reconstructed with the scene, we manually remove the background and fill holes in the model.

To render object silhouettes from arbitrary poses, we synthesize a virtual camera at discretized viewpoints around the object center at a fixed distance. Each viewpoint is parameterized by the azimuth, a, elevation, e, and distance, d, of the camera relative to the object. Viewpoints are uniformly sampled on the viewsphere at a fixed distance and at every ten degrees for both the azimuth and elevation.
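The viewpoint enumeration above can be sketched as follows. This is a minimal illustration, not the paper's code; the elevation range of -80 to 80 degrees is an assumption, since the paper only states the ten-degree step for both angles.

```python
import math

def sample_viewpoints(distance=1.0, step_deg=10):
    """Enumerate virtual camera centers on a viewsphere of fixed radius.

    Azimuth covers the full circle; elevation spans -80..80 degrees
    (an assumed range). Each entry is (azimuth, elevation, camera center),
    with the camera looking at the object origin.
    """
    views = []
    for a in range(0, 360, step_deg):          # azimuth, degrees
        for e in range(-80, 81, step_deg):     # elevation, degrees
            az, el = math.radians(a), math.radians(e)
            # spherical -> Cartesian camera center in the object frame
            x = distance * math.cos(el) * math.cos(az)
            y = distance * math.cos(el) * math.sin(az)
            z = distance * math.sin(el)
            views.append((a, e, (x, y, z)))
    return views
```

With a 10-degree step this yields 36 azimuths per elevation ring, and every camera center lies exactly at the chosen distance from the object center.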
B. Image features
Our approach to shape-based recognition benefits from recent advances in image contour detection. In unconstrained natural environments, the Canny edge detector [30] generally responds uniformly to both object occlusion boundaries and texture. One can falsely piece together the silhouette of a target object from a dense set of edge pixels. The state-of-the-art contour detection algorithm gPb [28] computes the likelihood of each pixel being an object contour and thus suppresses many edges due to texture/clutter. Figure 3 shows an example of Canny edge detection and gPb on the same input image. Compared to Canny edges, gPb suppresses ubiquitous edge responses from background clutter. Given detected contours in the image, we seek to localize the subset of contour pixels that best represents the object silhouette. We will show that for cluttered scenes, discriminative power is essential to achieve high recall with the desired precision.
Fig. 4: Spray bottle detection using S-DPM. (first row, left-to-right) Root appearance model, part appearance models centered at their respective anchor points, and the quadratic deformation cost; brighter regions indicate larger penalty cost. (second row) Input image and detection response map of the spray bottle; red, yellow and blue indicate large, intermediate and low detection responses, respectively.
C. Object detection
The Deformable Parts Model (DPM) [1] is arguably the most successful object detector to date. DPM is a star-structured conditional random field (CRF), with a root part, F_0, capturing the holistic appearance of the object and several parts (P_1, ..., P_n) connected to the root, where P_i = (F_i, v_i, s_i, a_i, b_i). Each model part has a default relative position (the anchor point), v_i, with respect to the root position. Parts are also allowed to translate around the anchor point with a quadratic offset distance penalty, parameterized by the coefficients a_i and b_i. The anchor points are learned from the training data, and the scales of the root and parts, s_i, are fixed. The detection score is defined as:
Σ_{i=0}^{n} F_i · φ(H, p_i) − Σ_{i=1}^{n} [a_i · (x̃_i, ỹ_i) + b_i · (x̃_i², ỹ_i²)],    (1)
where φ(H, p_i) is the histogram of oriented gradients (HOG) [31] feature extracted at image location p_i, and (x̃_i, ỹ_i) is the offset to the part anchor point with respect to the root position p_0. At test time, the root and part model weights are each separately convolved with the HOG feature of the input image. Due to the star structure of the model, maximizing the above score function at each image location can be computed efficiently via dynamic programming. To deal with intra-class variation, DPM is generalized by composing several components, each trained on a subset of training instances of similar aspect ratio. We refer to [1] for more details.

To simultaneously detect an object and coarsely estimate its pose from the edge map using only model silhouette shape information, we train a shape-aware modified version of DPM, which we term S-DPM. Each component of the learned S-DPM corresponds to a coarse pose of the object. More specifically, the silhouettes of the synthetic views of the object are clustered into 16 discrete poses by grouping nearby viewpoints. An S-DPM component is trained with the silhouettes of a coarse pose cluster used as positive training data, and silhouettes of other poses and objects and random background edges used as negatives. Figure 4 shows an example of a trained spray bottle model. During inference, each of the model components is evaluated on the input contour imagery, and the hypotheses with a detection score above a threshold are retained. Detections of different components are combined via non-maximum suppression. This step retains high scoring detections and filters out neighboring lower scoring ones whose corresponding 2D bounding box overlaps with that of the local maximum by greater than 50% (PASCAL criteria [32]). The coarse pose of the object is determined by the maximum scoring component at each image location.
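The non-maximum suppression step with the 50% PASCAL overlap criterion can be sketched as below; this is a generic greedy NMS, a plausible stand-in for the paper's unspecified implementation.

```python
def iou(b1, b2):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter) if inter > 0 else 0.0

def non_max_suppression(detections, overlap=0.5):
    """detections: list of (score, box). Keep local maxima; discard any box
    whose overlap with an already-kept, higher-scoring box exceeds the
    threshold (the 50% PASCAL criterion)."""
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, kb) <= overlap for _, kb in kept):
            kept.append((score, box))
    return kept
```

In the paper's pipeline the surviving detections come from different S-DPM components, so the highest-scoring component at a location also fixes the coarse pose estimate there.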
D. Shape descriptor
We represent the holistic shape of each S-DPM detected object with the chordiogram descriptor [2]. Given the object silhouette, this representation captures the distribution of geometric relationships (relative location and normals) between pairs of boundary edges, termed chords. Formally, the chordiogram is a K-dimensional histogram of all chord features on the boundary of a segmented object. A chord is a pair of points (p, q) on the object boundary. The chord feature d_pq = (l_pq, ψ_pq, θ_p − ψ_pq, θ_q − ψ_pq) is defined by the chord vector length l_pq, orientation ψ_pq, and the normals θ_p and θ_q of the object boundary at p and q. The edge normal direction points towards the segment interior to distinguish the same edge under different foreground selections of bordering superpixels. Figure 5 shows two examples of chord features and their corresponding chordiogram feature bins when the bordering foreground superpixels differ. The chordiogram is translation invariant, since it only relates the relative position of boundary pixels rather than their absolute position in the image.
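A minimal sketch of the chordiogram computation follows. The bin counts are illustrative choices, not the paper's (which uses a K-dimensional histogram without stating K here); the point is the chord feature d_pq and its translation invariance.

```python
import numpy as np

def chordiogram(points, normals, n_len=4, n_ang=8, max_len=1.0):
    """Sketch of the chordiogram: histogram the chord features
    d_pq = (l_pq, psi_pq, theta_p - psi_pq, theta_q - psi_pq)
    over all ordered pairs of boundary points.

    points:  (N, 2) array of boundary coordinates.
    normals: (N,) array of boundary normal angles theta, pointing
             toward the segment interior.
    Returns a normalized K-dimensional descriptor (K = n_len * n_ang**3).
    """
    N = len(points)
    hist = np.zeros((n_len, n_ang, n_ang, n_ang))
    two_pi = 2 * np.pi
    for p in range(N):
        for q in range(N):
            if p == q:
                continue
            v = points[q] - points[p]
            l = np.linalg.norm(v)                 # chord length l_pq
            psi = np.arctan2(v[1], v[0]) % two_pi # chord orientation psi_pq
            tp = (normals[p] - psi) % two_pi      # theta_p - psi_pq
            tq = (normals[q] - psi) % two_pi      # theta_q - psi_pq
            b0 = min(int(l / max_len * n_len), n_len - 1)
            b1 = int(psi / two_pi * n_ang) % n_ang
            b2 = int(tp / two_pi * n_ang) % n_ang
            b3 = int(tq / two_pi * n_ang) % n_ang
            hist[b0, b1, b2, b3] += 1
    return hist.ravel() / hist.sum()
```

Because every feature depends only on differences of point coordinates, translating the boundary leaves the descriptor unchanged, which is the translation invariance noted above.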
E. Shape verification for silhouette extraction
We use the chordiogram descriptor for two tasks: (i) to recover the object foreground (i.e., the silhouette) for accurate 3D pose estimation, and (ii) to improve detection precision and recall by verifying that the shape of the foreground segmentation resembles the model mask.

The fact that S-DPM operates on HOG features provides flexibility in dealing with contour extraction measurement noise and local shape variance due to pose variation. However, S-DPM only outputs the detections of the object hypotheses rather than the exact location of the object contour. Even within the object hypothesis windows, the subset of edge pixels that corresponds to the object silhouette is not apparent. In addition, contour-based object detection in cluttered scenes is susceptible to false detections caused by piecing together irrelevant contours.
Fig. 5: Chordiogram construction. The image on the left denotes the object, with the gray highlighted superpixels under consideration. The chord features f_pq and f_p'q' fall into different bins of the chordiogram shown on the right. At each boundary point, the foreground selection of bordering superpixels defines the normal direction.
To recover exact object contour pixel locations and reduce false positives, an additional shape matching step is required on top of the object hypotheses. Here, we propose using the collective shape of a subset of superpixels within each hypothesis region to verify the presence of an object.

For each detection hypothesis region, superpixels are computed directly from gPb [28]. Searching over the entire space of superpixel subsets for the optimal match between the collective shape of the superpixels and the object model is combinatorial and impractical. Instead, we use a greedy algorithm to efficiently perform the search. In practice, with limited superpixels to select from, our greedy approach recovers the correct region with high probability. Figure 6 shows example results of shape verification. The greedy algorithm begins with a set of connected superpixels as a seed region and greedily searches over adjacent superpixels, picking the superpixel that yields the smallest χ² distance to the chordiogram of the model silhouette. Intuitively, if we have a set of superpixels forming a large portion of the object with a few missing pieces, adding these pieces yields the best score. The initial seeds are formed by choosing all triplets of adjacent superpixels, and limiting examination to the top five seeds that yield the smallest χ² distance. The connectivity graph of superpixels is a planar graph with limited node degrees. The complexity of finding triplets in such a planar graph is O(N log N) in the number of nodes.

Once the correct foreground superpixels are selected, the detection bounding box is re-cropped to reflect the recovered foreground mask. Empirically, this cropping step yields better localization of the detection result over the S-DPM, as measured in terms of precision and recall; see Sec. IV. Edges of the foreground mask are extracted and used in the subsequent processing stage for accurate 6-DoF continuous pose estimation.
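The greedy growth step can be sketched as below. The `descriptor` callable stands in for computing the chordiogram of the union of the selected superpixels (and its name, like `adjacency`, is a hypothetical interface, not the paper's API); seed generation from adjacent triplets is omitted for brevity.

```python
import numpy as np

def chi2(h1, h2, eps=1e-12):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def greedy_select(seed, adjacency, descriptor, model_hist):
    """Grow a superpixel set from `seed`, at each step adding the adjacent
    superpixel whose inclusion most reduces the chi^2 distance between the
    collective-shape descriptor and the model chordiogram.

    adjacency:  dict mapping superpixel id -> set of neighbor ids.
    descriptor: callable returning the chordiogram of a set of superpixels.
    Stops when no addition improves the distance.
    """
    region = set(seed)
    best = chi2(descriptor(region), model_hist)
    while True:
        frontier = {n for s in region for n in adjacency[s]} - region
        scored = [(chi2(descriptor(region | {c}), model_hist), c)
                  for c in frontier]
        if not scored:
            return region, best
        d, c = min(scored)
        if d >= best:                 # no neighbor improves the match
            return region, best
        region.add(c)
        best = d
```

In the full method this is run from the five best triplet seeds, and the lowest-distance result gives the foreground mask used for re-cropping and pose refinement.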
Fig. 6: Shape descriptor-based verification examples. (left-to-right) Detection hypothesis window of the object, superpixel over-segmentation of the hypothesis region, visualization of the coarse object pose from the object detector, and the selected foreground mask.
F. Pose refinement
Robotic grasping requires an accurate estimate of an object's 3D pose. To improve upon the coarse pose estimate provided by the S-DPM, we perform a final iterative pose refinement step to recover the full continuous 6-DoF pose. This step is restricted to the region of the verified superpixel mask. Our iterative refinement process consists of two steps: (i) determining the correspondence between the projected occluding boundary of the 3D model and the contour points along the object segmentation mask, and (ii) estimating an optimal object pose based on these correspondences.

The contour correspondences are estimated using dynamic programming (DP) to ensure local matching smoothness. Given the initial (coarse) pose output from the object detection stage, the 3D object model is rendered to the image and its corresponding projected occluding boundary is extracted. Each point on the contour is represented by a descriptor capturing close-range shape information. The 31-dimensional contour descriptor includes the gradient orientation of a contour point (the central point) and the gradient orientations of the nearest 15 points on each side of the central point along the contour. The gradient orientation of the central point is subtracted from all elements of the descriptor, which gives in-plane rotation invariance. The matching cost between each pair is set to the l2 distance between the feature descriptors extracted at each point. DP is then used to establish the correspondences between contour points.
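The descriptor and matching cost can be sketched as follows. The DP here is a plain monotone (DTW-style) alignment over open contours, a simplified stand-in for the paper's correspondence step, whose exact recurrence is not specified.

```python
import numpy as np

def contour_descriptors(angles, k=15):
    """31-dimensional descriptor per contour point: the gradient orientation
    of the point and of its k nearest neighbors on each side along the
    (closed) contour, with the central orientation subtracted for in-plane
    rotation invariance. `angles` is a length-N array of orientations."""
    N = len(angles)
    idx = np.arange(-k, k + 1)
    desc = np.stack([angles[(i + idx) % N] - angles[i] for i in range(N)])
    return (desc + np.pi) % (2 * np.pi) - np.pi  # wrap to (-pi, pi]

def match_contours(model_angles, image_angles, k=15):
    """Monotone DP alignment between model and image contour points using
    the l2 distance between their descriptors; returns the total cost."""
    A = contour_descriptors(model_angles, k)
    B = contour_descriptors(image_angles, k)
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]
```

Subtracting the central orientation means a globally rotated contour (all orientations shifted by a constant) matches at zero cost, which is the in-plane rotation invariance described above.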
Fig. 7: Representative images from the introduced outdoor dataset. The dataset was captured using a ground robot and includes diverse terrain, e.g., rocks, sand and grass, with illumination changes. Portions of the terrain are non-flat. Objects are scattered around the scene and typically do not occupy a major portion of the scene.
To estimate the refined pose we use the motion field equations [33]:

u(x, y) = (1/Z)(x t_z − t_x) + ω_x (xy) − ω_y (x² + 1) + ω_z (y)
v(x, y) = (1/Z)(y t_z − t_y) − ω_x (y² + 1) − ω_y (xy) + ω_z (x),
where u(x, y), v(x, y) denote the horizontal and vertical components of the displacement vectors, respectively, between the model and matched image contour points, computed by DP, Z(x, y) denotes the depth of the 3D model point for the current pose estimate, and the Euler angles (ω_x, ω_y, ω_z) and 3D translation vector (t_x, t_y, t_z) denote the (locally) optimal motion of the object yielding the refined pose. The motion update of the current pose is recovered using least squares. This procedure is applied iteratively until convergence. In practice, we usually observe fast convergence with only three to five iterations. The running time of the pose refinement is about one second on an Intel 2.4 GHz i7 CPU.
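One least-squares step of this update can be sketched as below, stacking the two motion field equations per correspondence into a linear system in the six motion unknowns. This is a minimal illustration following the equations as written above, not the paper's implementation.

```python
import numpy as np

def pose_update(pts, depths, disp):
    """Solve the motion field equations in least squares for the motion
    (t_x, t_y, t_z, w_x, w_y, w_z) best explaining the contour displacements.

    pts:    normalized image points (x, y) on the projected model contour.
    depths: depth Z of the corresponding 3D model point at the current pose.
    disp:   DP-matched displacements (u, v) to the image contour points.
    """
    rows, rhs = [], []
    for (x, y), Z, (u, v) in zip(pts, depths, disp):
        # u = (1/Z)(x t_z - t_x) + w_x xy - w_y (x^2 + 1) + w_z y
        rows.append([-1 / Z, 0, x / Z, x * y, -(x * x + 1), y])
        rhs.append(u)
        # v = (1/Z)(y t_z - t_y) - w_x (y^2 + 1) - w_y xy + w_z x
        rows.append([0, -1 / Z, y / Z, -(y * y + 1), -x * y, x])
        rhs.append(v)
    motion, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return motion  # (t_x, t_y, t_z, w_x, w_y, w_z)
```

Iterating this step (re-rendering the model at the updated pose, re-matching contours, solving again) gives the three-to-five-iteration convergence reported above.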
IV. EXPERIMENTS
Outdoor detection evaluation. We introduce a challenging outdoor dataset for 3D object detection containing heavy background clutter. This dataset was collected from a moving robot and consists of eight sequences containing a total of 3403 test images; the dimensions of each image are 512×386. Figure 7 shows a set of representative imagery from the introduced dataset. The scenes contain a variety of terrain, e.g., grass, rock, sand, and wood, observed under various illumination conditions. The dataset represents the task of a robot navigating a complex environment and searching for objects of interest. The objects of interest are mostly comprised of textureless daily equipment, such as a watering pot, gas tank, watering can, spray bottle, dust pan, and liquid container. For each frame, 2D bounding boxes that tightly outline each object are provided. Further, the dataset includes the corresponding 3D model files used in our empirical evaluation.
On the outdoor dataset, we performed a shape-based object detection evaluation. We compared four methods on a detection task on the introduced dataset: DOT [34], S-DPM with only the root model, full S-DPM with root and parts, and the full S-DPM plus shape verification (proposed approach). Both DOT and S-DPM used the same training instances from Sec. III-A with a slight difference. For S-DPM, we trained one model component for each of 16 discrete poses. For DOT, we used the same quantization of the viewsphere but trained with 10 different depths ranging from close to far in the scene. During testing, S-DPM is run at different scales by building an image pyramid. The input to both methods was the same gPb thresholded image. In all our experiments, the threshold is set to 40 (gPb responses range between 0 and 255), where edges with responses below the threshold were suppressed. The default parameters of gPb were used. We did not observe a noticeable difference in the detection and pose estimation accuracy when varying the gPb parameter settings.
Table III shows a comparison of the average precision for detection on the outdoor dataset. The proposed approach, consisting of the full S-DPM plus shape verification, achieves the best mean average precision. It demonstrates that shape verification improves detection due to the refinement of the bounding box to reflect the recovered silhouette. Full S-DPM outperforms both the root-only S-DPM and DOT. This shows the benefit of the underlying flexibility in S-DPM.

Table top evaluation. We evaluated our pose refinement approach under two settings. First, we recorded an indoor RGB-D dataset, with multiple objects on a table, from a head mounted Kinect on a PR2 robot. The RGB-D data is used as ground truth. We evaluated using three objects, a watering can, gas tank and watering pot, placed at two different distances from the robot on the table and in two different poses for each distance. For each scene, the target object was detected among all objects on the table and segmented using shape verification, and then the 6-DoF pose was estimated, as described in Sec. III-F. The model point cloud was projected into the scene, and Iterative Closest Point (ICP) [35] was performed between the model point cloud and the Kinect point cloud. We report ICP errors for both rotation and translation in Tables I and II, respectively. Errors in the rotations and translations are small for different angles and different depths. Translation errors in the X and Y directions are smaller than in the Z direction. Since Z is the depth direction, it is most affected by the 3D model acquisition and robot calibration. Both measurements show our method is robust and suitable for grasping tasks.
In addition, using the object pose estimated from our approach, we demonstrate with a PR2 robot successful