Fast 3D Recognition and Pose Using the Viewpoint Feature Histogram


Radu Bogdan Rusu, Gary Bradski, Romain Thibaux, John Hsu
Willow Garage
68 Willow Rd., Menlo Park, CA 94025, USA
{rusu,bradski,thibaux,hsu}@
Abstract—We present the Viewpoint Feature Histogram (VFH), a descriptor for 3D point cloud data that encodes geometry and viewpoint. We demonstrate experimentally, on a set of 60 objects captured with stereo cameras, that VFH can be used as a distinctive signature, allowing simultaneous recognition of the object and its pose. The pose is accurate enough for robot manipulation, and the computational cost is low enough for real-time operation. VFH was designed to be robust to large surface noise and missing depth information in order to work reliably on stereo data.
I. INTRODUCTION
As part of a long-term goal to develop reliable capabilities in the area of perception for mobile manipulation, we address a table-top manipulation task involving objects that can be manipulated by one robot hand. Our robot is shown in Fig. 1. In order to manipulate an object, the robot must reliably identify it, as well as its 6 degree-of-freedom (6DOF) pose. This paper proposes a method to identify both at the same time, reliably and at high speed.
We make the following assumptions.
• Objects are rigid and relatively Lambertian. They can be shiny, but not reflective or transparent.
• Objects are in light clutter. They can be easily segmented in 3D and can be grabbed by the robot hand without obstruction.
• The item of interest can be grabbed directly, so it is not occluded.
• Items can be grasped even given an approximate pose. The gripper on our robot can open to 9 cm and each grip is 2.5 cm wide, which allows an object 8.5 cm wide to be grasped when the pose is off by +/- 10 degrees.
Despite these assumptions, our problem has several properties that make the task difficult.
• The objects need not contain texture.
• Our dataset includes objects of very similar shapes, for example many slight variations of the typical wine glass.
• To be usable, the recognition accuracy must be very high, typically much higher than, say, for image retrieval tasks, since false positives have very high costs and so must be kept extremely rare.
• To interact usefully with humans, recognition cannot take more than a fraction of a second. This puts constraints on computation, but more importantly this precludes the use of accurate but slow 3D acquisition using lasers. Instead we rely on stereo data, which suffers from higher noise and missing data.

Fig. 1. A PR2 robot from Willow Garage, showing its grippers and stereo cameras.
Our focus is perception for mobile manipulation. Working on a mobile versus a stationary robot means that we can't depend on instrumenting the external world with active vision systems or special lighting, but we can put such devices on the robot. In our case, we use projected texture¹ to yield dense stereo depth maps at 30 Hz. We also cannot ensure environmental conditions. We may move from a sunlit room to a dim hallway into a room with no light at all. The projected texture gives us a fair amount of resilience to local lighting conditions as well.
¹ Not structured light; this is random texture.
Although this paper focuses on 3D depth features, 2D imagery is clearly important, for example for shiny and transparent objects, or to distinguish items based on texture, such as telling apart a Coke can from a Diet Coke can. In our case, the textured light alternates with no light to allow for 2D imagery aligned with the texture-based dense depth; however, adding 2D visual features will be studied in future work. Here, we look for an effective purely 3D feature.

Our philosophy is that one should use or design a recognition algorithm that fits one's engineering needs such as scalability, training speed, incremental training needs, and so on, and then find features that make the recognition performance of that architecture meet one's specifications. For reasons of online training, and because of large memory availability, we choose fast approximate K-Nearest Neighbors (K-NN) implemented in the FLANN library [1] as our recognition architecture. The key contribution of this paper is then the design of a new, computationally efficient 3D feature that yields object recognition and 6DOF pose.
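As an illustration of this recognition architecture, the sketch below classifies a query VFH descriptor against a database of stored object views by nearest-neighbor lookup over histograms. It is only a minimal stand-in: the actual system uses FLANN's fast approximate K-NN [1], whereas this example uses brute-force distances, and the array shapes, label scheme and the 263-bin descriptor length are assumptions made for the example.

```python
import numpy as np

def knn_classify(train_vfh, train_labels, query_vfh, k=3):
    """Return labels and distances of the k stored views closest to the query."""
    # Brute-force L2 distance between the query histogram and every stored histogram.
    dists = np.linalg.norm(train_vfh - query_vfh, axis=1)
    nearest = np.argsort(dists)[:k]
    return [train_labels[i] for i in nearest], dists[nearest]

# Usage with random placeholder data standing in for real VFH descriptors.
rng = np.random.default_rng(0)
train_vfh = rng.random((1000, 263))                       # 1000 stored object views
train_labels = [f"object_{i % 60}" for i in range(1000)]  # hypothetical 60-class labels
labels, dists = knn_classify(train_vfh, train_labels, rng.random(263))
print(labels, dists)
```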
The structure of this paper is as follows: Related work is described in Section II. Next, we give a brief description of our system architecture in Section III. We discuss our surface normal and segmentation algorithm in Section IV, followed by a discussion of the Viewpoint Feature Histogram in Section V. Experimental setup and resulting computational and recognition performance are described in Section VI. Conclusions and future work are discussed in Section VII.
II. RELATED WORK
The problem that we are trying to solve requires global (3D object level) classification based on estimated features. This has been under investigation for a long time in various research fields, such as computer graphics, robotics, and pattern matching; see [2]–[4] for comprehensive reviews. We address the most relevant work below.
Some of the widely used 3D point feature extraction approaches include: spherical harmonic invariants [5], spin images [6], curvature maps [7], or more recently, Point Feature Histograms (PFH) [8], and conformal factors [9]. Spherical harmonic invariants and spin images have been successfully used for the problem of object recognition for densely sampled datasets, though their performance seems to degrade for noisier and sparser datasets [4]. Our stereo data is noisier and sparser than typical line scan data, which motivated the use of our new features. Conformal factors are based on conformal geometry, which is invariant to isometric transformations, and thus obtains good results on databases of watertight models. Its main drawback is that it can only be applied to manifold meshes, which can be problematic in stereo. Curvature maps and PFH descriptors have been studied in the context of local shape comparisons for data registration. A side study [10] applied the PFH descriptors to the problem of surface classification into 3D geometric primitives, although only for data acquired using precise laser sensors. A different point fingerprint representation using the projections of geodesic circles onto the tangent plane at a point p_i was proposed in [11] for the problem of surface registration. As the authors note, geodesic distances are more sensitive to surface sampling noise, and thus are unsuitable for real sensed data without a priori smoothing and reconstruction. A decomposition of objects into parts learned using spin images is presented in [12] for the problem of vehicle identification.
Methods relying on global features include descriptors such as Extended Gaussian Images (EGI) [13], eigen shapes [14], or shape distributions [15]. The latter samples statistics of the entire object and represents them as distributions of shape properties; however, they do not take into account how the features are distributed over the surface of the object. Eigen shapes show promising results, but they have limits on their discrimination ability since important higher order variances are discarded. EGIs describe objects based on the unit normal sphere, but have problems handling arbitrarily curved objects.
The work in [16] makes use of spin-image signatures and normal-based signatures to achieve classification rates over 90% with synthetic and CAD model datasets. The datasets used, however, are very different than the ones acquired using noisy 640×480 stereo cameras such as the ones used in our work. In addition, the authors do not provide timing information on the estimation and matching parts, which is critical for applications such as ours. A system for fully automatic 3D model-based object recognition and segmentation is presented in [17] with good recognition rates of over 95% for a database of 55 objects. Unfortunately, the computational performance of the proposed method is not suitable for real-time use, as the authors report the segmentation of an object model in a cluttered scene to take around 2 minutes. Moreover, the objects in the database are scanned using a high resolution Minolta scanner and their geometric shapes are very different. As shown in Section VI, the objects used in our experiments are much more similar in terms of geometry, so such a registration-based method would fail. In [18], the authors propose a system for recognizing 3D objects in photographs. The techniques presented can only be applied in the presence of texture information, and require a cumbersome generation of models in an offline step, which makes this unsuitable for our work.
As previously presented, our requirements are real-time object recognition and pose identification from noisy real-world datasets acquired using projective texture stereo cameras. Our 3D object classification is based on an extension of the recently proposed Fast Point Feature Histogram (FPFH) descriptors [8], which record the relative angular directions of surface normals with respect to one another. The FPFH performs well in classification applications and is robust to noise, but it is invariant to viewpoint.
This paper proposes a novel descriptor that encodes the viewpoint information and has two parts: (1) an extended FPFH descriptor that reduces the computational complexity from the O(k·n) of FPFH to O(n), where n is the number of points in the point cloud and k is the number of points used in each local neighborhood; (2) a new signature that encodes important statistics between the viewpoint and the surface normals on the object. We call this new feature the Viewpoint Feature Histogram (VFH), as detailed below.
III. ARCHITECTURE
Our system architecture employs the following processing steps:
• Synchronized, calibrated and epipolar-aligned left and right images of the scene are acquired.
• A dense depth map is computed from the stereo pair.
• Surface normals in the scene are calculated.
• Planes are identified and segmented out, and the remaining point clouds from non-planar objects are clustered in Euclidean space.
• The Viewpoint Feature Histogram (VFH) is calculated over large enough objects (here, objects having at least 100 points).
  – If there are multiple objects in a scene, they are processed front to back relative to the camera.
  – Occluded point clouds with less than 75% of the number of points of the frontal objects are noted but not identified.
• Fast approximate K-NN is used to classify the object and its view.
Some steps from the early processing pipeline are shown in Figure 2. Shown left to right, top to bottom in that figure are: a moderately complex scene with many different vertical and horizontal surfaces, the resulting depth map, the estimated surface normals, and the objects segmented from the planar surfaces in the scene.
Fig. 2. Early processing steps row-wise, top to bottom: a scene, its depth map, surface normals, and segmentation into planes and outlier objects.
For computing 3D depth maps, we use 640×480 stereo with textured light. The texture flashes on only very briefly as the cameras take a picture, resulting in lights that look dim to the human eye but bright to the camera. Texture flashes only every other frame so that raw imagery without texture can be gathered, alternating with densely textured scenes. The stereo has a 38 degree field of view and is designed for close-in manipulation tasks; thus the objects that we deal with are from 0.5 to 1.5 meters away. The stereo algorithm that we use was developed in [19] and uses the implementation in the OpenCV library [20] as described in detail in [21], running at 30 Hz.
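For readers unfamiliar with this part of the pipeline, the following sketch shows how a dense depth map can be obtained from a rectified stereo pair with OpenCV's generic block matcher and converted to metric depth. It is not the specific algorithm of [19]–[21]; the image file names, focal length and baseline below are placeholder values, not parameters from the paper.

```python
import cv2
import numpy as np

# "left.png"/"right.png" are placeholder file names for a rectified stereo pair.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Generic OpenCV block matcher (a stand-in, not the algorithm of [19]).
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

focal_px, baseline_m = 500.0, 0.09            # assumed calibration values
depth = np.zeros_like(disparity)
valid = disparity > 0
depth[valid] = focal_px * baseline_m / disparity[valid]  # Z = f * B / d, in meters
```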
IV. SURFACE NORMALS AND 3D SEGMENTATION
We employ segmentation prior to the actual feature estimation because in robotic manipulation scenarios we are only interested in certain precise parts of the environment, and thus computational resources can be saved by tackling only those parts. Here, we are looking to manipulate reachable objects that lie on horizontal surfaces. Therefore, our segmentation scheme proceeds by extracting the horizontal surfaces first.

Fig. 3. From left to right: raw point cloud dataset, planar and cluster segmentation, more complex segmentation.
Compared to our previous work [22], we have improved the planar segmentation algorithms by incorporating surface normals into the sample selection and model estimation steps. We also took care to carefully build SSE-aligned data structures in memory for any computationally expensive operation. By rejecting candidates which do not support our constraints, our system can segment data at about 7 Hz, including normal estimation, on a regular Core2Duo laptop using a single core. To get frame rate performance (realtime), we use a voxelized data structure over the input point cloud and downsample with a leaf size of 0.5 cm. The surface normals are therefore estimated only for the downsampled result, but using the information in the original point cloud. The planar components are extracted using a RMSAC (Randomized MSAC) method that takes into account weighted averages of distances to the model together with the angle of the surface normals. We then select candidate table planes using a heuristic combining the number of inliers which support the planar model as well as their proximity to the camera viewpoint. This approach emphasizes the part of the space where the robot manipulators can reach and grasp the objects.
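A simplified version of the downsample-then-estimate-normals idea is sketched below: the cloud is reduced with a 0.5 cm voxel grid, and a PCA normal is computed for each downsampled point from its neighborhood in the original cloud. This is a plain NumPy/SciPy illustration, not the SSE-optimized implementation described above; the 2 cm search radius and the orientation toward the camera origin are assumptions made for the example.

```python
import numpy as np
from scipy.spatial import cKDTree

def voxel_downsample(points, leaf=0.005):
    """Replace every occupied 0.5 cm voxel by the centroid of its points."""
    keys = np.floor(points / leaf).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    counts = np.bincount(inverse).astype(float)
    centroids = np.zeros((counts.size, 3))
    for dim in range(3):
        centroids[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return centroids

def estimate_normals(seeds, full_cloud, radius=0.02, viewpoint=np.zeros(3)):
    """PCA normal at each downsampled point, using neighbors from the original cloud."""
    tree = cKDTree(full_cloud)
    normals = np.zeros_like(seeds)
    for i, p in enumerate(seeds):
        idx = tree.query_ball_point(p, radius)
        if len(idx) < 3:                        # not enough support for a plane fit
            normals[i] = np.array([0.0, 0.0, 1.0])
            continue
        nbrs = full_cloud[idx]
        _, eigvec = np.linalg.eigh(np.cov(nbrs.T))
        n = eigvec[:, 0]                        # direction of smallest variance
        if np.dot(viewpoint - p, n) < 0:        # orient the normal toward the camera
            n = -n
        normals[i] = n
    return normals
```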
The segmentation of object candidates supported by the table surface is performed by looking at points whose projection falls inside the bounding 2D polygon for the table, and applying single-link clustering. The result of these processing steps is a set of Euclidean point clusters. This works to reliably segment objects that are separated by about half their minimum radius from each other. An example can be seen in Figure 3.
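The single-link clustering mentioned above can be approximated with a simple region-growing pass over a kd-tree, as in the sketch below; the 2 cm cluster tolerance is an illustrative value, and the 100-point minimum follows the threshold given in Section III.

```python
import numpy as np
from collections import deque
from scipy.spatial import cKDTree

def euclidean_clusters(points, tolerance=0.02, min_size=100):
    """Group points whose nearest-neighbor chains stay within `tolerance` meters."""
    tree = cKDTree(points)
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, cluster = deque([seed]), [seed]
        while queue:
            idx = queue.popleft()
            for nbr in tree.query_ball_point(points[idx], tolerance):
                if nbr in unvisited:
                    unvisited.remove(nbr)
                    queue.append(nbr)
                    cluster.append(nbr)
        if len(cluster) >= min_size:             # keep only "large enough" objects
            clusters.append(np.array(cluster))
    return clusters
```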
To resolve further ambiguities with respect to the chosen candidate clusters, such as objects stacked on other planar objects (such as books), we repeat the above step by treating each additional horizontal planar structure on top of the table candidates as a table itself and repeating the segmentation step (see results in Figure 3).
We emphasize that this segmentation step is of extreme importance for our application, because it allows our methods to achieve favorable computational performance by extracting only the regions of interest in a scene (i.e., objects that are to be manipulated, located on horizontal surfaces). In cases where our "light clutter" assumption does not hold and the geometric Euclidean clustering is prone to failure, a more sophisticated segmentation scheme based on texture properties could be implemented.
V. VIEWPOINT FEATURE HISTOGRAM
In order to accurately and robustly classify points with respect to their underlying surface, we borrow ideas from the recently proposed Point Feature Histogram (PFH) [10]. The PFH is a histogram that collects the pairwise pan, tilt and yaw angles between every pair of normals on a surface patch (see Figure 4). In detail, for a pair of 3D points ⟨p_i, p_j⟩ and their estimated surface normals ⟨n_i, n_j⟩, the set of normal angular deviations can be estimated as:
α = v · n_j
φ = u · (p_j − p_i) / d
θ = arctan(w · n_j, u · n_j)          (1)
where u, v, w represent a Darboux frame coordinate system chosen at p_i, and d is the Euclidean distance between p_i and p_j. Then, the Point Feature Histogram at a patch of points P = {p_i}, i = 1···n, captures all the sets of ⟨α, φ, θ⟩ between all pairs of ⟨p_i, p_j⟩ from P, and bins the results in a histogram. The bottom left part of Figure 4 presents the selection of the Darboux frame and a graphical representation of the three angular features.
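As a concrete illustration of Eq. (1), the following snippet computes the three angular features for a single pair of oriented points, with the Darboux frame built as in the bottom left of Figure 4 (u taken as the source normal). Exact normalization conventions may differ slightly between implementations; this is a sketch, not the reference implementation.

```python
import numpy as np

def pair_features(p_i, n_i, p_j, n_j):
    """Alpha, phi, theta of Eq. (1) for one pair of 3D points with unit normals."""
    d_vec = p_j - p_i
    d = np.linalg.norm(d_vec)
    u = n_i                                    # first Darboux axis: source normal
    v = np.cross(d_vec, u)
    v /= np.linalg.norm(v)                     # second Darboux axis
    w = np.cross(u, v)                         # third Darboux axis
    alpha = np.dot(v, n_j)
    phi = np.dot(u, d_vec) / d
    theta = np.arctan2(np.dot(w, n_j), np.dot(u, n_j))
    return alpha, phi, theta
```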
Because all possible pairs of points are considered, the computational complexity of a PFH is O(n²) in the number of surface normals n. In order to make a more efficient algorithm, the Fast Point Feature Histogram [8] was developed. The FPFH measures the same angular features as PFH, but estimates the sets of values only between every point and its k nearest neighbors, followed by a reweighting of the resultant histogram of a point with the neighboring histograms, thus reducing the computational complexity to O(k·n).
Our past work [22] has shown that a global descriptor (GFPFH) can be constructed from the classification results of many local FPFH features, and used on a wide range of confusable objects (20 different types of glasses, bowls, mugs) in 500 scenes, achieving 96.69% on object class recognition. However, the categorized objects were only split into 4 distinct classes, which leaves the scaling problem open. Moreover, the GFPFH is susceptible to the errors of the local classification results, and is more cumbersome to estimate.
In any case, for manipulation, we require that the robot not only identifies objects, but also recognizes their 6DOF poses for grasping. FPFH is invariant both to object scale (distance) and object pose, and so cannot achieve the latter task.
In this work, we decided to leverage the strong recognition results of FPFH, but to add in viewpoint variance while retaining invariance to scale, since the dense stereo depth map gives us scale/distance directly. Our contribution to the problem of object recognition and pose identification is to extend the FPFH to be estimated for the entire object cluster (as seen in Figure 4), and to compute additional statistics between the viewpoint direction and the normals estimated at each point. To do this, we used the key idea of mixing the viewpoint direction directly into the relative normal angle calculation in the FPFH. Figure 6 presents this idea, with the new feature consisting of two parts: (1) a viewpoint direction component (see Figure 5) and (2) a surface shape component comprised of an extended FPFH (see Figure 4).

The viewpoint component is computed by collecting a histogram of the angles that the viewpoint direction makes with each normal. Note, we do not mean the view angle to each normal, as this would not be scale invariant, but instead we mean the angle between the central viewpoint direction translated to each normal. The second component measures the relative pan, tilt and yaw angles as described in [8], [10], but now measured between the viewpoint direction at the central point and each of the normals on the surface. We call the newly assembled feature the Viewpoint Feature Histogram (VFH). Figure 6 presents the resultant assembled VFH for a random object.
Fig. 5. The Viewpoint Feature Histogram is created from the extended Fast Point Feature Histogram as seen in Figure 4, together with the statistics of the relative angles between each surface normal and the central viewpoint direction.
The computational complexity of VFH is O(n). In our experiments, we divided the viewpoint angles into 128 bins and the α, φ and θ angles into 45 bins each, for a total of 263 dimensions. The estimation of a VFH takes about 0.3 ms on average on a 2.23 GHz single core of a Core2Duo machine using optimized SSE instructions.
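The sketch below assembles such a 263-bin descriptor for one segmented cluster: a 128-bin viewpoint component plus three 45-bin histograms for α, φ and θ measured between the centroid's Darboux frame and every surface normal. Bin ranges and normalization here are illustrative assumptions and may not match the released implementation exactly.

```python
import numpy as np

def viewpoint_feature_histogram(points, normals, viewpoint):
    """Concatenate a 128-bin viewpoint component with 3 x 45 extended-FPFH bins."""
    c = points.mean(axis=0)                           # cluster centroid
    n_c = normals.mean(axis=0)
    n_c /= np.linalg.norm(n_c)                        # central normal, used as u
    view_dir = viewpoint - c
    view_dir /= np.linalg.norm(view_dir)

    # Viewpoint component: angle between the central viewpoint direction
    # (translated to each point) and that point's normal.
    cos_view = np.clip(normals @ view_dir, -1.0, 1.0)
    vp_hist, _ = np.histogram(np.arccos(cos_view), bins=128, range=(0.0, np.pi))

    # Extended FPFH component: alpha, phi, theta between the centroid frame
    # and every surface normal (see Eq. (1) and Figure 4).
    alphas, phis, thetas = [], [], []
    for p, n in zip(points, normals):
        d_vec = p - c
        d = np.linalg.norm(d_vec)
        v = np.cross(d_vec, n_c)
        if d < 1e-9 or np.linalg.norm(v) < 1e-9:      # skip degenerate pairs
            continue
        v /= np.linalg.norm(v)
        w = np.cross(n_c, v)
        alphas.append(np.dot(v, n))
        phis.append(np.dot(n_c, d_vec) / d)
        thetas.append(np.arctan2(np.dot(w, n), np.dot(n_c, n)))

    a_hist, _ = np.histogram(alphas, bins=45, range=(-1.0, 1.0))
    p_hist, _ = np.histogram(phis, bins=45, range=(-1.0, 1.0))
    t_hist, _ = np.histogram(thetas, bins=45, range=(-np.pi, np.pi))

    vfh = np.concatenate([vp_hist, a_hist, p_hist, t_hist]).astype(float)
    return vfh / max(vfh.sum(), 1.0)                  # normalized 263-D descriptor
```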
Fig. 4. The extended Fast Point Feature Histogram collects the statistics of the relative angles between the surface normals at each point and the surface normal at the centroid of the object. The bottom left part of the figure describes the three angular features for an example pair of points (Darboux frame: u = n_c, v = (p_5 − c) × u, w = u × v).
Fig. 6. An example of the resultant Viewpoint Feature Histogram for one of the objects used. Note the two concatenated components: the viewpoint component and the extended FPFH component.
VI. VALIDATION AND EXPERIMENTAL RESULTS

To evaluate our proposed descriptor and system architecture, we collected a large dataset consisting of over 60 IKEA kitchenware objects, as shown in Figure 8. The objects consisted of many kinds each of: wine glasses, tumblers, drinking glasses, mugs, bowls, and a couple of boxes. In each of these categories, many of the objects were distinguished only by subtle variations in shape, as can be seen for example in the confusions in Figure 10. We captured over 54000 scenes of the objects by spinning them on a turn table 180°² at each of 2 offsets on a platform that tilted 0, 8, 16, 22 and 30 degrees. Each 180° rotation was captured with about 90 images. The turn table is shown in Fig. 7. We additionally worked with a subset of 20 objects in 500 lightly cluttered scenes with varying arrangements of horizontal and vertical surfaces, using the same data set provided in [22]. No pose information was available for this second dataset, so we only ran experiments separately for object recognition results.

² We didn't go 360 degrees so that we could keep the calibration box in view.

Fig. 7. The turn table used to collect views of objects with known orientation.
The complete source code used to generate our experimental results, together with both object databases, is available under a BSD open source license in our ROS repository at Willow Garage³. We are currently taking steps towards creating a web page with complete tutorials on how to fully replicate the experiments presented herein.
Both the objects in the [22] dataset as well as the ones we acquired constitute valid examples of objects of daily use that our robot needs to be able to reliably identify and manipulate. While 60 objects is far from the number of objects the robot eventually needs to be able to recognize, it may be enough if we assume that the robot knows what
³ ros
