Geometric ordering of concepts, logical disjunction, and learning by induction
To appear in Compositional Connectionism in Cognitive Science, AAAI Fall Symposium Series, Washington, DC, October 22-24, 2004.
Dominic Widdows
Maya Design
Abstract
In many of the abstract geometric models which have been used to represent concepts and their relationships, regions possessing some cohesive property such as convexity or linearity have played a significant role. When the implication or containment relationship is used as an ordering relationship in such models, this gives rise to logical operators for which the disjunction of two concepts is often larger than the set union obtained in Boolean models. This paper describes some of the characteristic properties of such broad non-distributive composition operations, which are related to the traditional inductive hypotheses of learning algorithms. The lattice of subspaces of a vector space is presented as a special example, in which the subspace lattice is formally related to the tensor algebra, used already for composition in quantum mechanics and Holographic Reduced Representations.
1 Closure conditions in geometric models
The hypothesis that concepts can be represented by points and more general regions in spatial models has been used by psychologists and cognitive scientists to simulate human language learning (Landauer and Dumais 1997) and to represent sensory stimuli such as tastes and colours (Gärdenfors 2000, §1.5). Of the traditional practical applications of such a spatial approach, the vector space model for information retrieval (Salton and McGill 1983) is notable, as are its generalizations such as latent semantic analysis (Landauer and Dumais 1997), in which distributions of word usage learned from corpora become condensed into a low-dimensional representation and used, among other things, for discriminating between different senses of ambiguous words (Schütze 1998).

Schütze's (1998) paper exemplifies some of the opportunities and challenges involved in such a spatial approach—these include learning to represent individual objects as points in a geometric space (in this case, word vectors), combining the points into appropriate sentence or document vectors (in this case, using addition of vectors), and extrapolating from observed points of information to apportion the geometric space into cohesive regions corresponding to recognizable concepts (in this case, using clustering).

The last question—how are empirical observations gathered into classes described by the same word or represented by the same concept?—is of traditional importance in philosophy and many related disciplines. The extrapolation from observed data to classifying previously unexperienced situations is implemented in a variety of theoretical models and practical applications, using smoothing and clustering (Manning and Schütze 1999, Ch. 14), by exploiting a natural general-to-specific ordering on the space of observations (Mitchell 1997, Ch. 2), and by using similarity or distance measures to gauge the influence exerted by a cluster of observations upon its conceptual hinterland (Gärdenfors 2000, Ch. 3, 4).
Mathematically, such extrapolation techniques are related to closure conditions, a set being closed if it has no tendency to include new members. A traditional example of closure is in the field of topology, which describes a set as being closed if it contains the limit point of every possible sequence of elements.
Figure 1: Two non-convex sets (dark grey) and the points added to form their convex closures (light grey).

A more easily-grasped example is given by the property of convexity. A set S is said to be convex if for any two points A and B in
S, the straight line AB lies entirely within S. The convex closure of S is formed by taking the initial set and all such straight lines, this being the smallest convex set containing S. Figure 1 shows two simple non-convex regions and their convex closures.
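To make the closure operation concrete, the following is a minimal sketch (assuming the numpy and scipy libraries, with a made-up point set): the Delaunay triangulation of a finite point set covers exactly its convex closure, so it can be used to test whether a new point falls inside that closure.

```python
# Convex-closure membership test; the point set is illustrative only.
import numpy as np
from scipy.spatial import Delaunay

points = np.array([[0, 0], [2, 0], [2, 2], [0, 2], [1, 1.8]])
closure = Delaunay(points)  # triangulates the convex closure of the points

# (1, 1) is not one of the original points, but lies in their convex closure:
print(closure.find_simplex(np.array([1.0, 1.0])) >= 0)  # True
# (3, 3) lies outside the closure, so find_simplex returns -1:
print(closure.find_simplex(np.array([3.0, 3.0])) >= 0)  # False
```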
One of the best developed uses of such closure methods for obtaining stable conceptual representations is in Formal Concept Analysis, where the closure operation is given by the relationship between the intent and the extent of a concept (Ganter and Wille 1999, §1.1). An important closure operation we will consider later is the linear span of a set of vectors, which can also be thought of as the smallest subspace containing those vectors.

2 Ordering, containment, implication and disjunction
The link between these geometric structures and logical operations arises because the ordering relationship given by containment or inclusion can be used to model logical implication, an equivalence introduced by Aristotle. That is to say, suppose the set A represents those situations where the assertion Ã is true, and B represents those situations where the assertion B̃ is true. If A is contained in B, then every situation in which Ã is true is also a situation in which B̃ is true, which is equivalent to saying that Ã implies B̃. The similar treatment of classes of objects and propositions about them is discussed by Boole (1854, Ch. 4).
Containment (between two sets) and implication (between two logical statements) are both ordering relationships, so both geometric regions and logical assertions have the basic structure of an ordered set (Davey and Priestley 1990, Ch. 1). The disjunction of two assertions (or their corresponding geometric regions) is given by their least upper bound in this ordered set. That is to say that for two statements Ã and B̃, their disjunction Ã ∨ B̃ is the most specific statement implied by both Ã and B̃, which corresponds geometrically to the smallest region containing both A and B.
Figure 2: The convex closure of the union of two sets.

Now, suppose that concepts in our geometric model are represented only by convex sets. Then the least upper bound or disjunction of two concepts will be represented by the smallest convex set which contains them both, which is the convex closure of their set union (Figure 2). Note the similarity between Figure 2 and the convex closures in Figure 1, the only difference being that the initial (dark grey) set is no longer connected. The resulting convex closure contains points which are in neither of the two original sets, and so this disjunction operation fails to obey the Boolean distributive law (Davey and Priestley 1990, Ch. 4). In this way, the behaviour of concepts under logical connectives is determined by the closure condition, used to distinguish those regions which are cohesive enough to represent concepts from those which are not. The physical and philosophical consequences of relaxing the distributive law are discussed by Putnam (1976).
In language, we often encounter sets which do not correspond to any lexicalized concept. For example, there is no word in English for the set consisting of all rabbits, pigs and dolphins, and the most specific word which refers to all of these creatures (mammals) also refers to many other species. In this way, the concept mammal can be thought of as the disjunction of rabbits, pigs and dolphins in a concept lattice, and this disjunction does not obey the distributive law because there are many mammals which are not rabbits, not pigs, and not dolphins.
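This failure of distributivity can be exhibited directly in a toy concept lattice. The following Python sketch uses a hypothetical miniature lexicon (the species and concepts are illustrative only): joins and meets are computed relative to the admissible concepts, and dolphin ∧ (rabbit ∨ pig) differs from (dolphin ∧ rabbit) ∨ (dolphin ∧ pig).

```python
# Toy concept lattice: the only admissible sets are the lexicalized concepts,
# so the join of two concepts is the smallest concept containing their union.
ANIMALS = frozenset({"rabbit", "pig", "dolphin", "whale", "sparrow"})
CONCEPTS = {
    frozenset(),                                       # bottom
    frozenset({"rabbit"}), frozenset({"pig"}), frozenset({"dolphin"}),
    frozenset({"rabbit", "pig", "dolphin", "whale"}),  # "mammal"
    ANIMALS,                                           # top
}

def join(a, b):
    """Least upper bound: smallest concept containing both a and b."""
    return min((c for c in CONCEPTS if a <= c and b <= c), key=len)

def meet(a, b):
    """Greatest lower bound: largest concept contained in both a and b."""
    return max((c for c in CONCEPTS if c <= a and c <= b), key=len)

rabbit, pig, dolphin = (frozenset({x}) for x in ("rabbit", "pig", "dolphin"))

print(join(rabbit, pig))  # the "mammal" concept, which contains dolphins

lhs = meet(dolphin, join(rabbit, pig))                 # dolphin
rhs = join(meet(dolphin, rabbit), meet(dolphin, pig))  # bottom (empty set)
print(lhs == rhs)  # False: the distributive law fails
```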
We have so far demonstrated that representing concepts using only those sets which satisfy some closure condition leads to the use of logical operators that violate classical Boolean assumptions such as the distributive law (Boole 1854, Ch. 2, Eq. 4), and that there are linguistic situations where this divergence from the classical theory offers a reasonable interpretation, since many possible sets do not correspond to concepts represented in the lexicon. For more information on geometric ordering and its relationship with disjunction in a concept lattice, see Widdows (2004, Ch. 1, 8).

3 Inductive bias for machine learning and non-distributivity
Machine learning algorithms are in various ways dependent on the inductive hypothesis, which more or less states that the situations we will encounter in the future are coherent with those we encountered in the past. However, we do not expect to encounter situations in the future which are identical to those we have encountered in the past (cf. the dictum of Heraclitus, “you can never step into the same river twice”), and so some means must be found for correctly interpreting future observations based on the general features they have in common with previous encounters.
In practice, this means that we always assume that the training data for any learning algorithm is incomplete, and the algorithm has to generalize from this training data to new situations. The method of generalization depends on the inductive bias of the algorithm, which is the set of premises which, when combined with the training data, yields deductive conclusions in novel situations (Mitchell 1997, §2.7).
Inductive bias can thus be viewed as a kind of closure operation on the set of training examples. Insofar as the training examples provide a set of different situations, any of which are possible, the final classifier learned from the training data combined with the inductive hypothesis can be thought of as a disjunction, whose arguments are given by the training examples and whose inductive bias is given by the closure condition needed to smooth out the training examples to form a cohesive conceptual representation. For example, Li and Abe (1998) use the closure condition that any cohesive set of nouns is given by a tree-cut in a noun hierarchy, and use the minimum description length (MDL) principle to approximate such tree-cuts from training data in order to predict which kinds of nouns are usually taken as arguments by different verbs.
Introducing inductive bias therefore dispenses with the distributive law, which would impose the condition that the disjunction of the training examples should simply be the set of all training examples, preventing the generalization to new situations.
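As an illustration, the following Python sketch adopts a deliberately simple hypothetical inductive bias (every concept is an axis-aligned rectangle), so that the learned classifier is precisely the rectangular closure of the positive examples, and accepts points that never appeared in training.

```python
# Inductive bias as closure: the hypothesis space is axis-aligned rectangles,
# so learning returns the smallest rectangle containing the positive examples.
# The training data below is made up.
import numpy as np

positives = np.array([[1.0, 2.0], [3.0, 1.0], [2.0, 4.0]])

# Closure of the examples: their least upper bound among rectangles.
lo, hi = positives.min(axis=0), positives.max(axis=0)

def classify(x):
    """Accept exactly the points inside the rectangular closure."""
    return bool(np.all(lo <= x) and np.all(x <= hi))

# (2, 2) was never observed, yet the bias generalizes to include it, so the
# "disjunction" of the examples is strictly larger than their set union:
print(classify(np.array([2.0, 2.0])))  # True
```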
4 Vector lattices and composition
This final section describes some of the special properties of vector spaces which make them particularly appropriate for this sort of conceptual analysis.
The standard closure condition in a vector space V is given by the property of linearity—a subset U ⊆ V is considered to be a vector subspace of V if for all a and b in U and all scalars λ and µ, the linear combination λa + µb is also in U. The simplest subspaces are lines and planes which pass through the origin. The corresponding closure operation on a set of vectors is to take their linear span,

span{a1, ..., an} = {λ1a1 + ... + λnan}.
For example, the linear span of the lines OA and OB in Figure 3 is the plane OAB. Thus, in the natural logic of the vector space, the plane OAB is the disjunction of the lines OA and OB (Putnam 1976).
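Membership in a linear span is also easy to test numerically. The following minimal sketch (assuming numpy; the vectors standing in for OA and OB are illustrative) checks whether a vector lies in the plane spanned by two others by comparing matrix ranks.

```python
# A vector v lies in span{a1, ..., an} exactly when appending v to the
# spanning set leaves the matrix rank unchanged.
import numpy as np

def in_span(v, spanning_vectors, tol=1e-10):
    """True if v lies in the linear span of the given vectors."""
    A = np.array(spanning_vectors, dtype=float)
    rank_with_v = np.linalg.matrix_rank(np.vstack([A, v]), tol=tol)
    return rank_with_v == np.linalg.matrix_rank(A, tol=tol)

OA, OB = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])

# Any linear combination of OA and OB lies in the plane OAB they span...
print(in_span(2 * OA + 3 * OB, [OA, OB]))            # True
# ...but a vector with a component outside that plane does not:
print(in_span(np.array([0.0, 0.0, 1.0]), [OA, OB]))  # False
```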
Figure 3: Lines and planes in a vector space.

To date, vector models in information retrieval have mainly treated words and documents as points rather than regions (though as stated, the process of clustering and classifying the points leads to a consideration of regions in the space). The drawback of using only this most basic unit is that no point is contained in any other, and so all of the logical structure we have described goes unexploited. Using the lattice of closed subspaces is an appealing alternative since it gives a natural ordering on concepts: so far, experiments have shown that unwanted content can be removed from search results more effectively by representing the unwanted content as the vector disjunction of the unwanted keywords (the plane they span), rather than treating the keywords as separate points (Widdows 2003; Widdows 2004, Ch. 7).
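The following sketch indicates how such a removal could be implemented in the spirit of Widdows (2003): the query vector is projected onto the orthogonal complement of the subspace spanned by the unwanted keyword vectors. It assumes numpy, and the vectors are randomly generated stand-ins rather than word vectors learned from a corpus.

```python
# Remove unwanted content by projecting the query orthogonally to the
# subspace (a plane, for two keywords) spanned by the unwanted keyword vectors.
import numpy as np

def negate_subspace(query, unwanted):
    """Project `query` onto the orthogonal complement of span(unwanted)."""
    # Orthonormal basis U for the unwanted subspace, via QR decomposition.
    U, _ = np.linalg.qr(np.array(unwanted, dtype=float).T)
    return query - U @ (U.T @ query)

rng = np.random.default_rng(0)
query = rng.normal(size=8)
unwanted = [rng.normal(size=8), rng.normal(size=8)]  # two unwanted keywords

q = negate_subspace(query, unwanted)

# The negated query is orthogonal to every unwanted keyword vector:
print(all(abs(float(np.dot(q, u))) < 1e-10 for u in unwanted))  # True
```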
Another composition operation on vectors is the tensor product (used by Plate (2003) for composition in connectionist memory models, and a standard way of representing the interaction between particles in quantum mechanics). Unlike vector addition (but like language!) tensor composition is not bound by the commutative law. The lattice of subspaces of a vector space is closely related to the tensor product via the Plücker embedding (Ward and Wells 1990, §1.3), and both the tensor representations and the lattice of subspaces possess a rich and well-studied algebraic structure (cf. Grassmannian algebra, Ward and Wells 1990, §1.1).
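The order-sensitivity of tensor composition is easy to exhibit with outer products; the following minimal numpy sketch shows only the bare operation (Plate's Holographic Reduced Representations compress such products, which this sketch does not attempt).

```python
# Vector addition commutes; the tensor (outer) product does not.
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0, 5.0])

print(np.array_equal(a + b, b + a))                    # True
print(np.array_equal(np.outer(a, b), np.outer(b, a)))  # False: transposes
```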
Vector models therefore give some of the most mathematically sophisticated tools with which the combined geometric and logical approach for representing concepts might be developed.

References
Boole, G. (1854). An Investigation of the Laws of Thought. Macmillan. Dover edition, 1958.
Davey, B. A. and H. A. Priestley (1990). Lattices and Order. Cambridge University Press.
Ganter, B. and R. Wille (1999). Formal Concept Analysis: Mathematical Foundations. Springer.
Gärdenfors, P. (2000). Conceptual Spaces: The Geometry of Thought. Bradford Books, MIT Press.
Landauer, T. and S. Dumais (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review 104(2), 211–240.
Li, H. and N. Abe (1998). Generalizing case frames using a thesaurus and the MDL principle. Computational Linguistics 24(2), 217–244.
Manning, C. D. and H. Schütze (1999). Foundations of Statistical Natural Language Processing. Cambridge, Massachusetts: The MIT Press.
Mitchell, T. (1997). Machine Learning. McGraw-Hill.
Plate, T. (2003). Holographic Reduced Representations: Distributed Representation for Cognitive Structures. CSLI Publications.
Putnam, H. (1976). The logic of quantum mechanics. In Mathematics, Matter and Method, pp. 174–197. Cambridge University Press.
Salton, G. and M. McGill (1983). Introduction to Modern Information Retrieval. New York, NY: McGraw-Hill.
Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics 24(1), 97–124.
Ward, R. S. and R. O. Wells (1990). Twistor Geometry and Field Theory. Cambridge University Press.
Widdows, D. (2003). Orthogonal negation in vector spaces for modelling word-meanings and document retrieval. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan.
Widdows, D. (2004). Geometry and Meaning. CSLI Publications. In press.