Single Image 3D Object Detection and Pose Estimation for Grasping
Menglong Zhu1, Konstantinos G. Derpanis2, Yinfei Yang1, Samarth Brahmbhatt1,
Mabel Zhang1, Cody Phillips1, Matthieu Lecce1 and Kostas Daniilidis1
Abstract—We present a novel approach for detecting 3D objects and estimating their 3D pose in single images of cluttered scenes. Objects are given in terms of 3D models without accompanying texture cues. A deformable parts-based model is trained on clusters of silhouettes of similar poses and produces hypotheses about possible object locations at test time. Objects are simultaneously segmented and verified inside each hypothesis bounding region by selecting the set of superpixels whose collective shape matches the model silhouette. A final iteration on the 6-DOF object pose minimizes the distance between the selected image contours and the actual projection of the 3D model. We demonstrate successful grasps using our detection and pose estimate with a PR2 robot. Extensive evaluation on a novel ground truth dataset shows the considerable benefit of using shape-driven cues for detecting objects in cluttered scenes.
I. INTRODUCTION
In this paper, we address the problem of a robot detecting and localizing 3D objects of known 3D shape in single images of cluttered scenes. In the context of grasping and manipulation, object recognition has been defined as simultaneous detection and segmentation in the 2D image and 3D localization. 3D object recognition has experienced a revived interest in both the robotics and computer vision communities, with RGB-D sensors having simplified the foreground-background segmentation problem. Nevertheless, difficulties remain, as such sensors cannot generally be used in outdoor environments yet.

The goal of this paper is to detect and localize objects in single view RGB images of environments containing arbitrary ambient illumination and substantial clutter for the purpose of autonomous grasping. Objects can be of arbitrary color and interior texture and, thus, we assume knowledge of only their 3D model without any appearance/texture information. Using 3D models makes an object detector immune to intra-class texture variations.
We further abstract the 3D model by only using its 2D silhouette, and thus detection is driven by the shape of the 3D object's projected occluding boundary. Object silhouettes with corresponding viewpoints that are tightly clustered on the viewsphere are used as positive exemplars to train the state-of-the-art Deformable Parts Model (DPM) discriminative classifier [1]. We term this shape-aware version S-DPM.

1The authors are with the GRASP Laboratory, Department of Computer and Information Science, University of Pennsylvania, 3330 Walnut Street, Philadelphia, PA, USA. {menglong,yinfeiy,samarthb,zmen,codyp,mlecce,kostas}@cis.upenn.edu
2Konstantinos G. Derpanis is with the Department of Computer Science, Ryerson University, 245 Church Street, Toronto, Ontario, Canada. son.ca
Fig. 1: Demonstration of the proposed approach on a PR2 robot platform. a) Single view input image, with the object of interest highlighted with a black rectangle. b) Object model (in green) is projected with the estimated pose in 3D, ready for grasping. The Kinect point cloud is shown for the purpose of visualization.
This detector simultaneously detects the object and coarsely estimates the object's pose. The focus of the current paper is on instance-based rather than category-based object detection and localization; however, our approach can be extended to multiple instance category recognition, since S-DPM is agnostic to whether the positive exemplars are multiple poses from a single instance (as considered in the current paper) or multiple poses from multiple instances.
We propose to use an S-DPM classifier as a first high recall step yielding several bounding box hypotheses. Given the hypotheses, we solve for segmentation and localization simultaneously. After over-segmenting the hypothesis region into superpixels, we select the superpixels that best match a model boundary using a shape-based descriptor, the chordiogram [2]. A chordiogram-based matching distance is used to compute the foreground segment and rerank the hypotheses. Finally, using the full 3D model, we estimate all 6-DOF of the object by efficiently iterating on the pose and computing matches using dynamic programming.
Our approach advances the state-of-the-art as follows:
• In terms of assumptions, our approach is among the few in the literature that can detect 3D objects in single images of cluttered scenes independent of their appearance.
• It combines the high recall of an existing discriminative classifier with the high precision of a holistic shape descriptor, achieving a simultaneous segmentation and detection reranking.
• Due to the segmentation, it selects the correct image contours to use for 3D pose refinement, a task that was previously only possible with stereo or depth sensors.
Fig. 2: Overview of the proposed approach. From left-to-right: a) The input image. b) S-DPM inference on the gPb contour image yielding an object detection hypothesis. c) The hypothesis bounding box (red) is segmented into superpixels. d) The set of superpixels with the closest chordiogram distance to the model silhouette is selected. Pose is iteratively refined such that the model projection aligns well with the foreground mask silhouette. e) To visualize the pose accuracy, the side of the 3D model facing the camera is textured with the corresponding 2D pixel color; three textured synthetic views of the final pose estimate are shown.
In the video supplement, we demonstrate our approach with a (PR2) robot grasping 3D objects on a cluttered table based on a single view RGB image. Figure 8 shows an example of the process. We report 3D pose accuracy by comparing the estimated pose rendered by the proposed approach with a ground truth point cloud recovered with an RGB-D sensor. Such grasping capability with accurate pose is crucial for robot operation, where popular RGB-D sensors cannot be used (e.g., outdoors) and stereo sensors are challenged by the uniformity of the object's appearance within its boundary. We also document an extensive evaluation on outdoor imagery with diverse backgrounds. The dataset contains a set of 3D object models, annotated single-view imagery of heavily cluttered outdoor scenes1, and indoor imagery of cluttered tabletops in RGB-D images.
II. RELATED WORK
Geometry-based object recognition arguably predates appearance-based approaches. A major advantage of these approaches is their invariance to material properties, viewpoint and illumination. We first survey approaches that use a 3D model, either synthetic or obtained from 3D reconstruction. Next, we describe approaches using multiple view exemplars annotated with their pose. We close with a brief description of 2D shape-based approaches and approaches applied to RGB-D test data.
Early approaches based on using explicit 3D models are summarized in Grimson's book [3] and focus on efficient techniques for voting in pose space. Horaud [4] investigated object recognition under perspective projection using a constructive algorithm for objects that contain straight contours and planar faces. Hausler [5] derived an analytical method for alignment under perspective projection using the Hough transform and global geometric constraints. Aspect graphs in their strict mathematical definition (each node sees the same set of singularities) were not considered practical enough for recognition tasks, but the notion of sampling the viewspace for the purpose of recognition was introduced again in [6], which was applied in single images with no background. A Bayesian method for 3D reconstruction from a single image was proposed based on the contours of objects with sharp surface intersections [7]. Sethi et al. [8] compute global invariant signatures for each object from its silhouette under weak perspective projection. This approach was later extended [9] to perspective projection by sampling a large set of epipoles for each image to account for a range of potential viewpoints. Liebelt et al. work with a view space of rendered models in [10], and a generative geometry representation is developed in [11]. Villamizar et al. [12] use a shared feature database that creates pose hypotheses verified by a Random Fern pose specific classifier. In [13], a 3D point cloud model is extracted from multiple view exemplars for clustering pose specific appearance features. Others extend deformable part models to combine viewpoint estimates and 3D parts consistent across views, e.g., [14]. In [15], a novel combination of local and global geometric cues was used to filter 2D image to 3D model correspondences. Others have pursued approaches that not only segment the object and estimate the 3D pose but also adjust the 3D shape of the object model. For instance, Gaussian Process Latent Variable Models were used for the dimensionality reduction of the manifold of shapes, and a two-step iteration optimizes over shape and pose, respectively [16]. The drawback of these approaches is that in the case of scene clutter they do not consider the selection of image contours. Further, in some cases tracking is used for finding the correct shape. This limits applicability to the analysis of image sequences, rather than a single image, as is the focus in the current paper.

1The annotated dataset and 3D models are available at the project page: www.seas.upenn.edu/~menglong/outdoor-3d-objects.html
Our approach resembles early proposals that avoid appearance cues and use only the silhouette, e.g., [6]. None of the above or the exemplar-based approaches surveyed below address the amount of clutter considered here, and in most cases the object of interest occupies a significant portion of the field of view.
Early view exemplar-based approaches typically assume an orthographic projection model that simplifies the analysis. Ullman [17] represented a 3D object by a linear combination of a small number of images, enabling an alignment of the unknown object with a model by computing the coefficients of the linear combination, and, thus, reducing the problem to 2D. In [18], this approach was generalized to objects bounded by smooth surfaces, under orthographic projection, based on the estimation of curvature from three or five images. Much of the multiview object detector work based on discrete 2D views (e.g., [19]) has been founded on successful approaches to single view object detection, e.g., [1]. Savarese and Fei-Fei [20] presented an approach for object categorization that combines appearance-based descriptors, including the canonical view for each part, and transformations between parts. This approach reasons about 3D surfaces based on image appearance features. In [21], detection is achieved simultaneously with contour and pose selection using convex relaxation. Hsiao et al. [22] also use exemplars for feature correspondences and show that ambiguity should be resolved during hypothesis testing and not at the matching phase. A drawback of these approaches is their reliance on discriminative texture-based features that are hardly present for the types of textureless objects considered in the current paper.
As far as RGB-D training and test examples are concerned, the most general and representative approach is [23]. Here, an object-pose tree structure was proposed that simultaneously detects and selects the correct object category and instance, and refines the pose. In [24], a viewpoint feature histogram is proposed for detection and pose estimation. Several similar representations are now available in the Point Cloud Library (PCL) [25]. We will not delve here into approaches that extract the target objects during scene parsing in RGB-D images, but refer the reader to [26] and the citations therein.
The 2D shape descriptor we use, the chordiogram [2], belongs to approaches based on the optimal assembly of image regions. Given an over-segmented image (i.e., superpixels), these approaches determine a subset of spatially contiguous regions whose collective shape [2] or appearance [27] features optimize a particular similarity measure with respect to a given object model. An appealing property of region-based methods is that they specify the image domain where the object-related features are computed and thus avoid contaminating object-related measurements with background clutter.
III. TECHNICAL APPROACH
An overview of the components of our approach is shown in Fig. 2. 3D models are acquired using a low-cost depth sensor (Sec. III-A). To detect an object robustly based only on shape information, the gPb contour detector [28] is applied to the RGB input imagery (Sec. III-B). Detected contours are fed into a parts-based object detector trained on model silhouettes (Sec. III-C). Detection hypotheses are over-segmented, and shape verification simultaneously computes the foreground segments and reranks the hypotheses (Sec. III-E). Section III-D describes the shape descriptor used for shape verification. The obtained object mask enables the application of an iterative 3D pose refinement algorithm to accurately recover the 6-DOF object pose based on the initial coarse pose estimate rendered by the object detector (Sec. III-F).

Fig. 3: Comparison of the two edge detection results on the same image. (left-to-right) Input image, Canny edges and gPb, respectively.
A. 3D model acquisition and rendering
3D CAD models have been shown to be very useful for object detection and pose estimation both in 2D images and 3D point clouds. We utilize a low-cost RGB-D depth sensor and a dense surface reconstruction algorithm, KinectFusion [29], to efficiently reconstruct 3D object models from the depth measurements of real objects. The 3D object model is acquired on a turntable with the camera pointing in a fixed position. After the model is reconstructed with the scene, we manually remove the background and fill holes in the model.

To render object silhouettes from arbitrary poses, we synthesize a virtual camera at discretized viewpoints around the object center at a fixed distance. Each viewpoint is parameterized by the azimuth, a, elevation, e, and distance, d, of the camera relative to the object. Viewpoints are uniformly sampled on the viewsphere at a fixed distance and at every ten degrees for both the azimuth and elevation.
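The viewpoint enumeration above can be sketched as follows. This is a minimal illustration, not the paper's code; the elevation range of -80 to 80 degrees is an assumption, since the paper only states the ten-degree step for both angles.

```python
import math

def sample_viewpoints(distance=1.0, step_deg=10):
    """Enumerate virtual camera centers on a viewsphere of fixed radius.

    Azimuth covers the full circle; elevation spans -80..80 degrees
    (an assumed range). Each entry is (azimuth, elevation, camera center),
    with the camera looking at the object origin.
    """
    views = []
    for a in range(0, 360, step_deg):          # azimuth, degrees
        for e in range(-80, 81, step_deg):     # elevation, degrees
            az, el = math.radians(a), math.radians(e)
            # spherical -> Cartesian camera center in the object frame
            x = distance * math.cos(el) * math.cos(az)
            y = distance * math.cos(el) * math.sin(az)
            z = distance * math.sin(el)
            views.append((a, e, (x, y, z)))
    return views
```

With a 10-degree step this yields 36 azimuths per elevation ring, and every camera center lies exactly at the chosen distance from the object center.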
B. Image features
Our approach to shape-based recognition benefits from recent advances in image contour detection. In unconstrained natural environments, the Canny edge detector [30] generally responds uniformly to both object occlusion boundaries and texture. One can falsely piece together the silhouette of a target object from a dense set of edge pixels. The state-of-the-art contour detection algorithm gPb [28] computes the likelihood of each pixel being an object contour and thus suppresses many edges due to texture/clutter. Figure 3 shows an example of Canny edge detection and gPb on the same input image. Compared to Canny edges, gPb suppresses ubiquitous edge responses from background clutter. Given detected contours in the image, we seek to localize the subset of contour pixels that best represents the object silhouette. We will show that for cluttered scenes, discriminative power is essential to achieve high recall with the desired precision.
Fig. 4: Spray bottle detection using S-DPM. (first row, left-to-right) Root appearance model, part appearance models centered at their respective anchor points, and the quadratic deformation cost; brighter regions indicate larger penalty cost. (second row) Input image and detection response map of the spray bottle; red, yellow and blue indicate large, intermediate and low detection responses, respectively.
C. Object detection
The Deformable Parts Model (DPM) [1] is arguably the most successful object detector to date. DPM is a star-structured conditional random field (CRF), with a root part, F_0, capturing the holistic appearance of the object and several parts (P_1, ..., P_n) connected to the root, where P_i = (F_i, v_i, s_i, a_i, b_i). Each model part has a default relative position (the anchor point), v_i, with respect to the root position. Parts are also allowed to translate around the anchor point with a quadratic offset distance penalty, parameterized by the coefficients a_i and b_i. The anchor points are learned from the training data, and the scales of the root and parts, s_i, are fixed. The detection score is defined as:
Σ_{i=0}^{n} F_i · φ(H, p_i) − Σ_{i=1}^{n} [a_i · (x̃_i, ỹ_i) + b_i · (x̃_i², ỹ_i²)],    (1)
where φ(H, p_i) is the histogram of oriented gradients (HOG) [31] feature extracted at image location p_i, and (x̃_i, ỹ_i) is the offset to the part anchor point with respect to the root position p_0. At test time, the root and part model weights are each separately convolved with the HOG feature of the input image. Due to the star structure of the model, maximizing the above score function at each image location can be computed efficiently via dynamic programming. To deal with intra-class variation, DPM is generalized by composing several components, each trained on a subset of training instances of similar aspect ratio. We refer to [1] for more details.

To simultaneously detect an object and coarsely estimate its pose from the edge map using only model silhouette shape information, we train a shape-aware modified version of DPM, which we term S-DPM. Each component of the learned S-DPM corresponds to a coarse pose of the object. More specifically, the silhouettes of the synthetic views of the object are clustered into 16 discrete poses by grouping nearby viewpoints. An S-DPM component is trained with the silhouettes of a coarse pose cluster used as positive training data, and silhouettes of other poses and objects and random background edges used as negatives. Figure 4 shows an example of a trained spray bottle model. During inference, each of the model components is evaluated on the input contour imagery, and the hypotheses with a detection score above a threshold are retained. Detections of different components are combined via non-maximum suppression. This step retains high scoring detections and filters out neighboring lower scoring ones whose corresponding 2D bounding box overlaps with that of the local maximum by greater than 50% (PASCAL criteria [32]). The coarse pose of the object is determined by the maximum scoring component at each image location.
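The non-maximum suppression step with the 50% PASCAL overlap criterion can be sketched as below; this is a generic greedy NMS, a plausible stand-in for the paper's unspecified implementation.

```python
def iou(b1, b2):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter) if inter > 0 else 0.0

def non_max_suppression(detections, overlap=0.5):
    """detections: list of (score, box). Keep local maxima; discard any box
    whose overlap with an already-kept, higher-scoring box exceeds the
    threshold (the 50% PASCAL criterion)."""
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, kb) <= overlap for _, kb in kept):
            kept.append((score, box))
    return kept
```

In the paper's pipeline the surviving detections come from different S-DPM components, so the highest-scoring component at a location also fixes the coarse pose estimate there.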
D. Shape descriptor
We represent the holistic shape of each S-DPM detected object with the chordiogram descriptor [2]. Given the object silhouette, this representation captures the distribution of geometric relationships (relative location and normals) between pairs of boundary edges, termed chords. Formally, the chordiogram is a K-dimensional histogram of all chord features on the boundary of a segmented object. A chord is a pair of points (p, q) on the object boundary. The chord feature d_pq = (l_pq, ψ_pq, θ_p − ψ_pq, θ_q − ψ_pq) is defined by the chord vector length l_pq, orientation ψ_pq, and the normals θ_p and θ_q of the object boundary at p and q. The edge normal direction points towards the segment interior to distinguish the same edge under different foreground selections of bordering superpixels. Figure 5 shows two examples of chord features and their corresponding chordiogram feature bins when the bordering foreground superpixels differ. The chordiogram is translation invariant, since it only relates the relative position of boundary pixels rather than their absolute position in the image.
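A minimal sketch of the chordiogram computation follows. The bin counts are illustrative choices, not the paper's (which uses a K-dimensional histogram without stating K here); the point is the chord feature d_pq and its translation invariance.

```python
import numpy as np

def chordiogram(points, normals, n_len=4, n_ang=8, max_len=1.0):
    """Sketch of the chordiogram: histogram the chord features
    d_pq = (l_pq, psi_pq, theta_p - psi_pq, theta_q - psi_pq)
    over all ordered pairs of boundary points.

    points:  (N, 2) array of boundary coordinates.
    normals: (N,) array of boundary normal angles theta, pointing
             toward the segment interior.
    Returns a normalized K-dimensional descriptor (K = n_len * n_ang**3).
    """
    N = len(points)
    hist = np.zeros((n_len, n_ang, n_ang, n_ang))
    two_pi = 2 * np.pi
    for p in range(N):
        for q in range(N):
            if p == q:
                continue
            v = points[q] - points[p]
            l = np.linalg.norm(v)                 # chord length l_pq
            psi = np.arctan2(v[1], v[0]) % two_pi # chord orientation psi_pq
            tp = (normals[p] - psi) % two_pi      # theta_p - psi_pq
            tq = (normals[q] - psi) % two_pi      # theta_q - psi_pq
            b0 = min(int(l / max_len * n_len), n_len - 1)
            b1 = int(psi / two_pi * n_ang) % n_ang
            b2 = int(tp / two_pi * n_ang) % n_ang
            b3 = int(tq / two_pi * n_ang) % n_ang
            hist[b0, b1, b2, b3] += 1
    return hist.ravel() / hist.sum()
```

Because every feature depends only on differences of point coordinates, translating the boundary leaves the descriptor unchanged, which is the translation invariance noted above.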
E. Shape verification for silhouette extraction
We use the chordiogram descriptor for two tasks: (i) to recover the object foreground (i.e., the silhouette) for accurate 3D pose estimation, and (ii) to improve detection precision and recall by verifying that the shape of the foreground segmentation resembles the model mask.

The fact that S-DPM operates on HOG features provides flexibility in dealing with contour extraction measurement noise and local shape variance due to pose variation. However, S-DPM only outputs the detections of the object hypotheses rather than the exact location of the object contour. Even within the object hypothesis windows, the subset of edge pixels that corresponds to the object silhouette is not apparent. In addition, contour-based object detection in cluttered scenes is susceptible to false detections caused by piecing together irrelevant contours.
Fig. 5: Chordiogram construction. The image on the left denotes the object, with the gray highlighted superpixels under consideration. The chord features f_pq and f_p'q' fall into different bins of the chordiogram shown on the right. At each boundary point, the foreground selection of bordering superpixels defines the normal direction.
To recover exact object contour pixel locations and reduce false positives, an additional shape matching step is required on top of the object hypotheses. Here, we propose using the collective shape of a subset of superpixels within each hypothesis region to verify the presence of an object.

For each detection hypothesis region, superpixels are computed directly from gPb [28]. Searching over the entire space of superpixel subsets for the optimal match between the collective shape of the superpixels and the object model is combinatorial and impractical. Instead, we use a greedy algorithm to efficiently perform the search. In practice, with limited superpixels to select from, our greedy approach recovers the correct region with high probability. Figure 6 shows example results of shape verification. The greedy algorithm begins with a set of connected superpixels as a seed region and greedily searches over adjacent superpixels, picking the superpixel that yields the smallest χ² distance to the chordiogram of the model silhouette. Intuitively, if we have a set of superpixels forming a large portion of the object with a few missing pieces, adding these pieces yields the best score. The initial seeds are formed by choosing all triplets of adjacent superpixels, and limiting examination to the top five seeds that yield the smallest χ² distance. The connectivity graph of superpixels is a planar graph with limited node degrees. The complexity of finding triplets in such a planar graph is O(N log N) in the number of nodes.

Once the correct foreground superpixels are selected, the detection bounding box is re-cropped to reflect the recovered foreground mask. Empirically, this cropping step yields better localization of the detection result over the S-DPM, as measured in terms of precision and recall; see Sec. IV. Edges of the foreground mask are extracted and used in the subsequent processing stage for accurate 6-DoF continuous pose estimation.
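The greedy growth step can be sketched as below. The `descriptor` callable stands in for computing the chordiogram of the union of the selected superpixels (and its name, like `adjacency`, is a hypothetical interface, not the paper's API); seed generation from adjacent triplets is omitted for brevity.

```python
import numpy as np

def chi2(h1, h2, eps=1e-12):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def greedy_select(seed, adjacency, descriptor, model_hist):
    """Grow a superpixel set from `seed`, at each step adding the adjacent
    superpixel whose inclusion most reduces the chi^2 distance between the
    collective-shape descriptor and the model chordiogram.

    adjacency:  dict mapping superpixel id -> set of neighbor ids.
    descriptor: callable returning the chordiogram of a set of superpixels.
    Stops when no addition improves the distance.
    """
    region = set(seed)
    best = chi2(descriptor(region), model_hist)
    while True:
        frontier = {n for s in region for n in adjacency[s]} - region
        scored = [(chi2(descriptor(region | {c}), model_hist), c)
                  for c in frontier]
        if not scored:
            return region, best
        d, c = min(scored)
        if d >= best:                 # no neighbor improves the match
            return region, best
        region.add(c)
        best = d
```

In the full method this is run from the five best triplet seeds, and the lowest-distance result gives the foreground mask used for re-cropping and pose refinement.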
Fig. 6: Shape descriptor-based verification examples. (left-to-right) Detection hypothesis window of the object, superpixel over-segmentation of the hypothesis region, visualization of the coarse object pose from the object detector, and the selected foreground mask.
F. Pose refinement
Robotic grasping requires an accurate estimate of an object's 3D pose. To improve upon the coarse pose estimate provided by the S-DPM, we perform a final iterative pose refinement step to recover the full continuous 6-DoF pose. This step is restricted to the region of the verified superpixel mask. Our iterative refinement process consists of two steps: (i) determining the correspondence between the projected occluding boundary of the 3D model and the contour points along the object segmentation mask, and (ii) estimating an optimal object pose based on these correspondences.

The contour correspondences are estimated using dynamic programming (DP) to ensure local matching smoothness. Given the initial (coarse) pose output from the object detection stage, the 3D object model is rendered to the image and its corresponding projected occluding boundary is extracted. Each point on the contour is represented by a descriptor capturing close-range shape information. The 31-dimensional contour descriptor includes the gradient orientation of a contour point (the central point) and the gradient orientations of the nearest 15 points on each side of the central point along the contour. The gradient orientation of the central point is subtracted from all elements of the descriptor, which gives in-plane rotation invariance. The matching cost between each pair is set to the l2 distance between the feature descriptors extracted at each point. DP is then used to establish the correspondences between contour points.
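The descriptor and matching cost can be sketched as follows. The DP here is a plain monotone (DTW-style) alignment over open contours, a simplified stand-in for the paper's correspondence step, whose exact recurrence is not specified.

```python
import numpy as np

def contour_descriptors(angles, k=15):
    """31-dimensional descriptor per contour point: the gradient orientation
    of the point and of its k nearest neighbors on each side along the
    (closed) contour, with the central orientation subtracted for in-plane
    rotation invariance. `angles` is a length-N array of orientations."""
    N = len(angles)
    idx = np.arange(-k, k + 1)
    desc = np.stack([angles[(i + idx) % N] - angles[i] for i in range(N)])
    return (desc + np.pi) % (2 * np.pi) - np.pi  # wrap to (-pi, pi]

def match_contours(model_angles, image_angles, k=15):
    """Monotone DP alignment between model and image contour points using
    the l2 distance between their descriptors; returns the total cost."""
    A = contour_descriptors(model_angles, k)
    B = contour_descriptors(image_angles, k)
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]
```

Subtracting the central orientation means a globally rotated contour (all orientations shifted by a constant) matches at zero cost, which is the in-plane rotation invariance described above.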
Fig. 7: Representative images from the introduced outdoor dataset. The dataset was captured using a ground robot and includes diverse terrain, e.g., rocks, sand and grass, with illumination changes. Portions of the terrain are non-flat. Objects are scattered around the scene and typically do not occupy a major portion of the scene.
To estimate the refined pose we use the motion field equations [33]:

u(x, y) = (1/Z)(x t_z − t_x) + ω_x (xy) − ω_y (x² + 1) + ω_z (y)
v(x, y) = (1/Z)(y t_z − t_y) − ω_x (y² + 1) − ω_y (xy) + ω_z (x),
where u(x, y), v(x, y) denote the horizontal and vertical components of the displacement vectors, respectively, between the model and matched image contour points, computed by DP, Z(x, y) denotes the depth of the 3D model point for the current pose estimate, and the Euler angles (ω_x, ω_y, ω_z) and 3D translation vector (t_x, t_y, t_z) denote the (locally) optimal motion of the object yielding the refined pose. The motion update of the current pose is recovered using least squares. This procedure is applied iteratively until convergence. In practice, we usually observe fast convergence with only three to five iterations. The running time of the pose refinement is about one second on an Intel 2.4 GHz i7 CPU.
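One least-squares step of this update can be sketched as below, stacking the two motion field equations per correspondence into a linear system in the six motion unknowns. This is a minimal illustration following the equations as written above, not the paper's implementation.

```python
import numpy as np

def pose_update(pts, depths, disp):
    """Solve the motion field equations in least squares for the motion
    (t_x, t_y, t_z, w_x, w_y, w_z) best explaining the contour displacements.

    pts:    normalized image points (x, y) on the projected model contour.
    depths: depth Z of the corresponding 3D model point at the current pose.
    disp:   DP-matched displacements (u, v) to the image contour points.
    """
    rows, rhs = [], []
    for (x, y), Z, (u, v) in zip(pts, depths, disp):
        # u = (1/Z)(x t_z - t_x) + w_x xy - w_y (x^2 + 1) + w_z y
        rows.append([-1 / Z, 0, x / Z, x * y, -(x * x + 1), y])
        rhs.append(u)
        # v = (1/Z)(y t_z - t_y) - w_x (y^2 + 1) - w_y xy + w_z x
        rows.append([0, -1 / Z, y / Z, -(y * y + 1), -x * y, x])
        rhs.append(v)
    motion, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return motion  # (t_x, t_y, t_z, w_x, w_y, w_z)
```

Iterating this step (re-rendering the model at the updated pose, re-matching contours, solving again) gives the three-to-five-iteration convergence reported above.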
IV. EXPERIMENTS
Outdoor detection evaluation. We introduce a challenging outdoor dataset for 3D object detection containing heavy background clutter. This dataset was collected from a moving robot and consists of eight sequences containing a total of 3403 test images; the dimensions of each image are 512×386. Figure 7 shows a set of representative imagery from the introduced dataset. The scenes contain a variety of terrain, e.g., grass, rock, sand, and wood, observed under various illumination conditions. The dataset represents the task of a robot navigating a complex environment and searching for objects of interest. The objects of interest are mostly comprised of textureless daily equipment, such as a watering pot, gas tank, watering can, spray bottle, dust pan, and liquid container. For each frame, 2D bounding boxes that tightly outline each object are provided. Further, the dataset includes the corresponding 3D model files used in our empirical evaluation.
On the outdoor dataset, we performed a shape-based object detection evaluation. We compared four methods on a detection task on the introduced dataset: DOT [34], S-DPM with only the root model, full S-DPM with root and parts, and the full S-DPM plus shape verification (proposed approach). Both DOT and S-DPM used the same training instances from Sec. III-A with a slight difference. For S-DPM, we trained one model component for each of 16 discrete poses. For DOT, we used the same quantization of the viewsphere but trained with 10 different depths ranging from close to far in the scene. During testing, S-DPM is run at different scales by building an image pyramid. The input to both methods was the same gPb thresholded image. In all our experiments, the threshold is set to 40 (gPb responses range between 0 and 255), where edges with responses below the threshold were suppressed. The default parameters of gPb were used. We did not observe a noticeable difference in the detection and pose estimation accuracy when varying the gPb parameter settings.
Table III shows a comparison of the average precision for detection on the outdoor dataset. The proposed approach, consisting of the full S-DPM plus shape verification, achieves the best mean average precision. It demonstrates that shape verification improves detection due to the refinement of the bounding box to reflect the recovered silhouette. Full S-DPM outperforms both the root-only S-DPM and DOT. This shows the benefit of the underlying flexibility in S-DPM.

Table top evaluation. We evaluated our pose refinement approach under two settings. First, we recorded an indoor RGB-D dataset, with multiple objects on a table, from a head mounted Kinect on a PR2 robot. The RGB-D data is used as ground truth. We evaluated using three objects, a watering can, gas tank and watering pot, placed at two different distances from the robot on the table and in two different poses for each distance. For each scene, the target object was detected among all objects on the table and segmented using shape verification, and then the 6-DoF pose was estimated, as described in Sec. III-F. The model point cloud was projected into the scene, and Iterative Closest Point (ICP) [35] was performed between the model point cloud and the Kinect point cloud. We report ICP errors for both rotation and translation in Tables I and II, respectively. Errors in the rotations and translations are small for different angles and different depths. Translation errors in the X and Y directions are smaller than in the Z direction. Since Z is the depth direction, it is most affected by the 3D model acquisition and robot calibration. Both measurements show our method is robust and suitable for grasping tasks.
In addition, using the object pose estimated from our approach, we demonstrate with a PR2 robot successful