A Holistic Lexicon-Based Approach to Opinion Mining
Xiaowen Ding
Department of Computer Science
University of Illinois at Chicago
851 S. Morgan Street
Chicago, IL 60607-0753
xding@cs.uic.edu
Bing Liu
Department of Computer Science
University of Illinois at Chicago
851 S. Morgan Street
Chicago, IL 60607-0753
liub@cs.uic.edu
Philip S. Yu
Department of Computer Science
University of Illinois at Chicago
851 S. Morgan Street
Chicago, IL 60607-0753
psyu@cs.uic.edu
ABSTRACT
One of the important types of information on the Web is the opinions expressed in user generated content, e.g., customer reviews of products, forum posts, and blogs. In this paper, we focus on customer reviews of products. In particular, we study the problem of determining the semantic orientations (positive, negative or neutral) of opinions expressed on product features in reviews. This problem has many applications, e.g., opinion mining, summarization and search. Most existing techniques utilize a list of opinion (bearing) words (also called the opinion lexicon) for the purpose. Opinion words are words that express desirable (e.g., great, amazing, etc.) or undesirable (e.g., bad, poor, etc.) states. These approaches, however, all have some major shortcomings. In this paper, we propose a holistic lexicon-based approach to solving the problem by exploiting external evidence and linguistic conventions of natural language expressions. This approach allows the system to handle opinion words that are context dependent, which cause major difficulties for existing algorithms. It also deals with many special words, phrases and language constructs which have impacts on opinions based on their linguistic patterns. It also has an effective function for aggregating multiple conflicting opinion words in a sentence. A system, called Opinion Observer, based on the proposed technique has been implemented. Experimental results using a benchmark product review data set and some additional reviews show that the proposed technique is highly effective. It outperforms existing methods significantly.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Information filtering. I.2.7 [Natural Language Processing] – Text analysis
General Terms
Algorithms, Experimentation.
Keywords
Opinion mining, sentiment analysis, context dependent opinions.
1. INTRODUCTION
With the rapid expansion of e-commerce over the past 10 years, more and more products are sold on the Web, and more and more people are buying products online. In order to enhance customer shopping experience, it has become a common practice for online merchants to enable their customers to write reviews on products that they have purchased. With more and more users becoming comfortable with the Web, an increasing number of people are writing reviews. As a result, the number of reviews that a product receives grows rapidly. Some popular products can get hundreds of reviews or more at some large merchant sites. Many reviews are also long, which makes it hard for a potential customer to read them to make an informed decision on whether to purchase the product. If he/she only reads a few reviews, he/she only gets a biased view. The large number of reviews also makes it hard for product manufacturers or businesses to keep track of customer opinions and sentiments on their products and services. It is thus highly desirable to produce a summary of reviews [13, 21] (see below and also Section 3).
In the past few years, many researchers studied the problem, which is called opinion mining or sentiment analysis [1, 3, 13, 15, 28, 37]. The main tasks are (1) to find product features that have been commented on by reviewers and (2) to decide whether the comments are positive or negative.
Both tasks are very challenging. In this paper, we focus on task (2). That is, given a set of product features of a product, we want to accurately identify the semantic orientations of opinions expressed on each product feature by each reviewer. Semantic orientation means whether the opinion is positive, negative or neutral. We will formally define the problem in Section 3, where we will see that our task is realistic and has many applications. Although several works on opinion mining exist, there is still not a general framework or model that clearly articulates various aspects of the problem and their relationships. We make an attempt in this paper in Section 3. In [13], a lexicon-based method is proposed to use opinion bearing words (or simply opinion words) to perform task (2). Opinion words are words that are commonly used to express positive or negative opinions (or sentiments), e.g., “amazing”, “great”, “poor” and “expensive”. The method basically counts the number of positive and negative opinion words that are near the product feature in each review sentence. If there are more positive opinion words than negative opinion words, the final opinion on the feature is positive, and otherwise negative. The opinion lexicon or the set of opinion words was obtained through a bootstrapping process using WordNet (wordnet.princeton.edu/) [8]. This method is simple and efficient, and gives reasonable results. However, this technique has some major shortcomings.
First of all, it does not have an effective mechanism for dealing with context dependent opinion words. There are many such words. For example, the word “small” can indicate a positive or a negative opinion on a product feature depending on the product feature and the context. There is probably no way to know the semantic orientation of a context dependent opinion word by looking at only the word and the product feature that it modifies without prior knowledge of the product or the product feature. Asking a domain expert or user to provide such knowledge is not scalable due to the huge number of products, product features and opinion words. Several researchers have attempted the problem [11, 16, 28]. However, their approaches still have some major limitations as we will see in the next section. In this paper, we propose a holistic lexicon-based approach to solving the problem, which improves the lexicon-based method in [13]. Instead of looking at the current sentence alone, this approach exploits external information and evidence in other sentences and other reviews, and some linguistic conventions in natural language expressions, to infer orientations of opinion words. No prior domain knowledge or user inputs are needed. Based on our experimental results, we are fairly confident that context dependent opinion words no longer present a major problem. Second, when there are multiple conflicting opinion words in a sentence, existing methods are unable to deal with them well. We propose a new method to aggregate orientations of such words by considering the distance between each opinion word and the product feature. This turns out to be highly effective.
To complete the proposed approach, a set of linguistic patterns is devised to handle special words, phrases and constructs based on their underlying meanings or usage patterns, which have not been handled satisfactorily so far by existing methods.
The proposed technique has been evaluated using the benchmark review data set used in [13, 28], which consists of a large number of reviews of five products, and also a new data set consisting of reviews of three products. The results show that the new method outperforms the existing methods significantly.
2. RELATED WORK
Opinion analysis has been studied by many researchers in recent years. Two main research directions are sentiment classification and feature-based opinion mining. Sentiment classification investigates ways to classify each review document as positive, negative, or neutral. Representative works on classification at the document level include [4, 5, 9, 12, 26, 27, 29, 32]. These works are different from ours as we are interested in opinions expressed on each product feature rather than the whole review.
Sentence level subjectivity classification is studied in [10], which determines whether a sentence is a subjective sentence (but may not express a positive or negative opinion) or a factual one. Sentence level sentiment or opinion classification is studied in [10, 13, 17, 23, 28, 33, etc.]. Our work is different from the sentence level analysis as we identify opinions on each feature. A review sentence can contain multiple features, and the orientations of opinions expressed on these features can also be different, e.g., “the voice quality of this phone is great and so is the reception, but the battery life is short.” Here, “voice quality”, “reception” and “battery life” are features. The opinions on “voice quality” and “reception” are positive, and the opinion on “battery life” is negative. Other related works at both the document and sentence levels include those in [2, 10, 15, 16, 36].
Most sentence level and even document level classification methods are based on the identification of opinion words or phrases. There are basically two types of approaches: (1) corpus-based approaches, and (2) dictionary-based approaches. Corpus-based approaches find co-occurrence patterns of words to determine the sentiments of words or phrases, e.g., the works in [10, 32, 34]. Dictionary-based approaches use synonyms and antonyms in WordNet to determine word sentiments based on a set of seed opinion words. Such approaches are studied in [1, 8, 13, 17]. [13] proposes the idea of opinion mining and summarization. It uses a lexicon-based method to determine whether the opinion expressed on a product feature is positive or negative. A related method is used in [17]. These methods are improved in [28] by a more sophisticated method based on relaxation labeling. We will show in Section 5 that the proposed technique performs much better than both of these methods. In [37], a system is reported for analyzing movie reviews in the same framework. However, the system is domain specific. Other recent work related to sentiment analysis includes [3, 15, 16, 18, 19, 20, 21, 22, 24, 30, 34]. [14] studies the extraction of comparative sentences and relations, which is different from this work as we do not deal with comparative sentences in this research.
Our holistic lexicon-based approach to identifying the orientations of context dependent opinion words is closely related to works that identify domain opinion words [11, 16]. Both [11] and [16] use conjunction rules to find such words from large domain corpora. The conjunction rule basically states that when two opinion words are linked by “and” in a sentence, their opinion orientations are the same. For example, in the sentence “this room is beautiful and spacious”, both “beautiful” and “spacious” are positive opinion words. Based on this rule or language convention, if we do not know whether “spacious” is positive or negative, but know that “beautiful” is positive, we can infer that “spacious” is also positive. Although our approach will also use this linguistic rule or convention, our method is different in two aspects. First, we argue that finding domain opinion words is still problematic because in the same domain the same word may indicate different opinions depending on what features it is applied to. For example, in the following review sentences in the camera domain, “the battery life is very long” and “it takes a long time to focus”, “long” is positive in the first sentence, but negative in the second. Thus, we need to consider both the feature and the opinion word rather than only the opinion word as in [11, 16]. Second, our approach does not need to find opinion orientations of domain opinion words up front or offline. It makes the decision whenever needed, i.e., online, which is more flexible. [28] also uses similar rules to compute opinion orientations based on relaxation labeling. However, as we will see, [28] produces poorer results than the proposed method.
3. PROBLEM DEFINITION
This section first defines the general problem of semantic analysis of reviews and then highlights the specific instance of the problem that we aim to solve.
In general, opinions can be expressed on anything, e.g., a product, an individual, an organization, an event, a topic, etc. We use the general term object to denote the entity that has been commented on. The object has a set of components (or parts) and also a set of attributes (or properties). Thus the object can be hierarchically decomposed according to the part-of relationship, i.e., each component may also have its sub-components and so on. For example, a product (e.g., a car, a digital camera) can have different components, an event can have sub-events, a topic can have sub-topics, etc. Formally, we have the following definition:
Definition (object): An object O is an entity which can be a product, person, event, organization, or topic. It is associated with a pair, O: (T, A), where T is a hierarchy or taxonomy of components (or parts), sub-components, and so on, and A is a set of attributes of O. Each component has its own set of sub-components and attributes.
Example 1: A particular brand of digital camera is an object. It has a set of components, e.g., lens, battery, etc., and also a set of attributes, e.g., picture quality, size, etc. The battery component also has its own set of attributes, e.g., battery life, battery size, etc.
Essentially, an object is represented as a tree. The root is the object itself. Each non-root node is a component or sub-component of the object. Each link is a part-of relationship. Each node is also associated with a set of attributes. An opinion can be expressed on any node and any attribute of the node.
Example 2: Following Example 1, one can express an opinion on the camera (the root node), e.g., “I do not like this camera”, or on one of its attributes, e.g., “the picture quality of this camera is poor”. Likewise, one can also express an opinion on any one of the camera’s components or on an attribute of a component.
To simplify our discussion, we use the word “features” to represent both components and attributes, which allows us to omit the hierarchy. Using features for products is also quite common in practice. For an ordinary user, it is probably too complex to use a hierarchical representation of features and opinions. We note that in this framework the object itself is also treated as a feature.
Let the review be r. In the most general case, r consists of a sequence of sentences r = ⟨s1, s2, …, sm⟩.
Definition (explicit and implicit feature): If a feature f appears in review r, it is called an explicit feature in r. If f does not appear in r but is implied, it is called an implicit feature in r.
Example 3: “battery life” in the following sentence is an explicit feature:
“The battery life of this camera is too short”.
“Size” is an implicit feature in the following sentence as it does not appear in the sentence but is implied:
“This camera is too large”.
Here, “large” is called a feature indicator.
Definition (opinion passage on a feature): The opinion passage on feature f of an object evaluated in r is a group of consecutive sentences in r that express a positive or negative opinion on f. It is possible that a sequence of sentences (at least one) in a review together express an opinion on an object or a feature of the object. It is also possible that a single sentence expresses opinions on more than one feature:
“The picture quality is good, but the battery life is short”.
Most current research focuses on sentences, i.e., each passage consists of a single sentence. In our subsequent discussion, we use sentences and passages interchangeably as we work on sentences as well.
Definition (explicit and implicit opinion): An explicit opinion on feature f is a subjective sentence that directly expresses a positive or negative opinion. An implicit opinion on feature f is an objective sentence that implies an opinion.
Example 4: The following sentence expresses an explicit positive opinion:
“The picture quality of this camera is amazing.”
The following sentence expresses an implicit negative opinion:
“The earphone broke in two days.”
Although this sentence states an objective fact (assuming it is true), it implicitly expresses a negative opinion on the earphone.
Definition (opinion holder): The holder of a particular opinion is the person or the organization that holds the opinion.
In the case of product reviews, forum postings and blogs, opinion holders are usually the authors of the postings. Opinion holders are more important in news articles because they often explicitly state the person or organization that holds a particular view. For example, the opinion holder in the sentence “John expressed his disagreement on the treaty” is “John”. In this work, we will not study opinion holders (see [17]).
Definition (semantic orientation of an opinion): The semantic orientation of an opinion on a feature f states whether the opinion is positive, negative or neutral.
We now put things together to define a model of an object and a set of opinions on the object. An object is represented with a finite set of features, F = {f1, f2, …, fn}. Each feature fi in F can be expressed with a finite set of words or phrases Wi, which are synonyms. That is, we have a set of corresponding synonym sets W = {W1, W2, …, Wn} for the n features. Since each feature fi in F has a name (denoted by fi), we have fi ∈ Wi. Each author or opinion holder j comments on a subset of the features Sj ⊆ F. For each feature fk ∈ Sj that opinion holder j comments on, he/she chooses a word or phrase from Wk to describe the feature, and then expresses a positive, negative or neutral opinion on it.
This simple model covers most but not all cases (see the discussion after the output definition below). It introduces three main practical problems. Given a collection of reviews D as input, we have:
Problem 1: Both F and W are unknown. Then, in opinion analysis, we need to perform three tasks:
Task 1: Identifying and extracting object features that have been commented on in each review d∈D.
Task 2: Determining whether the opinions on the features are positive, negative or neutral.
Task 3: Grouping synonyms of features, as different people may use different words to express the same feature.
Problem 2: F is known but W is unknown. This is similar to Problem 1, but slightly easier. All three tasks for Problem 1 still need to be performed, but Task 3 becomes the problem of matching discovered features with the set of given features F.
Problem 3: W is known (then F is also known). We only need to perform Task 2 above, namely, determining whether the opinions on the known features are positive, negative or neutral after all the sentences that contain them are extracted.
Clearly, the first problem is the most difficult to solve. Problem 2 is slightly easier. Problem 3 is the easiest, but still realistic.
Example 5: A cellular phone company wants to analyze customer reviews on a few models of its phones. It is quite realistic to produce the feature set F that the company is interested in and also the set of synonyms of each feature Wi (although the sets might not be complete). Then there is no need to perform Tasks 1 and 3 (which are very challenging problems).
Output: The final output for each evaluative text d is a set of pairs. Each pair is denoted by (f, SO), where f is a feature and SO is the semantic or opinion orientation (positive or negative) expressed in d on feature f. We ignore neutral opinions in the output as they are not usually useful.
This model covers most but not all cases. For example, it does not cover the situation described in the following sentence: “the viewfinder and the lens of this camera are too close”, which expresses a negative opinion on the distance between the two components. However, such cases are rare in practice. We will use this simplified model in the rest of this paper. Note also that this model does not consider the strength of each opinion [33], i.e., whether the opinion is strongly negative (or positive) or weakly negative (or positive), but it can be added.
There are many ways to use the results. A simple way is to produce a feature-based summary of opinions on the object [13]. That is, for each feature, we can show how many reviewers expressed negative opinions and how many reviewers expressed positive opinions. What is important is that this is a structured summary produced from unstructured text. The summary can also be easily visualized to give a clear view of opinions on different object features from existing users [21].
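To make this output concrete, the following is a minimal sketch (in Python) of building such a feature-based summary from (f, SO) pairs; the pair list below is a toy assumption rather than real review output.

# A minimal sketch: counting positive/negative opinions per feature from
# (feature, orientation) pairs. The pairs are an illustrative assumption.
from collections import defaultdict

pairs = [("picture quality", 1), ("battery life", -1),
         ("picture quality", 1), ("battery life", 1)]

summary = defaultdict(lambda: {"positive": 0, "negative": 0})
for feature, so in pairs:
    summary[feature]["positive" if so > 0 else "negative"] += 1

for feature, counts in summary.items():
    print(f"{feature}: {counts['positive']} positive, {counts['negative']} negative")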
The rest of the paper focuses on solving Problem 3. That is, we assume that all features are given, which is realistic for specific domains as Example 5 shows. Our task is to determine whether the opinion expressed by each reviewer on each product feature is positive, negative or neutral.
4. THE PROPOSED TECHNIQUE
We now present the proposed technique. The main idea is to use the opinion words around each product feature in a review sentence to determine the opinion orientation on the product feature. As we discussed earlier, the key difficulties are: (1) how to combine multiple opinion words (which may be conflicting) to arrive at the final decision, (2) how to deal with context or domain dependent opinion words without any prior knowledge from the user, and (3) how to deal with many important language constructs which can change the semantic orientations of opinion words. We propose several novel techniques which make use of the review and sentence context, and general natural language rules, to deal with these problems.
4.1. Opinion Words, Phrases and Idioms
Opinion (or sentiment) words and phrases are words and phrases that express positive or negative sentiments. Words that encode a desirable state (e.g., great, awesome) have a positive orientation, while words that represent an undesirable state have a negative orientation (e.g., disappointing). While orientations apply to most adjectives, there are adjectives that have no orientation (e.g., external, digital). There are also many words whose semantic orientations depend on the contexts in which they appear. For example, the word “long” in the following two sentences has completely different orientations, one positive and one negative:
“The battery of this camera lasts very long”
“This program takes a long time to run”
In the proposed method, we deal with this problem. Although words that express positive or negative orientations are usually adjectives and adverbs, verbs and nouns can be used to express opinions as well, e.g., verbs such as “like” and “hate”, and nouns such as “junk” and “rubbish”.
Researchers have compiled sets of such words and phrases for adjectives, adverbs, verbs, and nouns respectively. Such lists are collectively called the opinion lexicon. Each set is usually obtained through a bootstrapping process [13] using WordNet. In this work, we used the lists from [13]. However, these lists only contain opinion words that are adjectives and adverbs. We added verb and noun lists identified in the same way. We also have lists of context dependent opinion words.
In order to make use of the different lists, we need to perform part-of-speech (POS) tagging, as many words can have multiple POS tags depending on their usage. The part-of-speech of a word is a linguistic category that is defined by its syntactic or morphological behavior. Common POS categories in English are: noun, verb, adjective, adverb, pronoun, preposition, conjunction and interjection. In this project, we used the NLProcessor linguistic parser [25] for POS tagging.
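As an illustration of this step, below is a minimal POS tagging sketch in Python. The paper uses the NLProcessor parser [25]; the snippet substitutes NLTK purely for illustration (an assumption, not the authors' tool) and assumes its tokenizer and tagger models are installed.

# A minimal sketch of POS tagging to collect adjective/adverb/verb/noun
# opinion candidates. NLTK is a stand-in for the NLProcessor parser [25].
import nltk  # assumes the 'punkt' and 'averaged_perceptron_tagger' data are available

sentence = "The battery life of this camera is too short"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))  # [('The', 'DT'), ('battery', 'NN'), ...]

OPINION_POS = {"JJ", "JJR", "JJS",                       # adjectives
               "RB", "RBR", "RBS",                       # adverbs
               "VB", "VBD", "VBG", "VBN", "VBP", "VBZ",  # verbs
               "NN", "NNS"}                              # nouns
candidates = [w for w, tag in tagged if tag in OPINION_POS]
print(candidates)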
Idioms: Apart from opinion words, there are also idioms. We also identified positive, negative and context dependent idioms. In fact, most idioms express strong opinions, e.g., “cost (somebody) an arm and a leg”. We annotated more than 1000 idioms. Although this task is time consuming, it is only a one-time effort and the annotated idioms can be used by the community.
Non-opinion phrases containing opinion words: An important issue that needs to be handled is that some phrases have no opinions but contain opinion words, e.g., “pretty large”, where “pretty” is a positive opinion word, but the whole phrase has no opinion or has a context dependent opinion. Such phrases need to be identified and used to overwrite the opinion words in them.
4.2. Aggregating Opinions for a Feature
Using the final lists of positive, negative and context dependent words, and idioms, the system identifies the (positive, negative or neutral) opinion orientation expressed on each product feature in a review sentence as follows:
• Given a sentence s that contains a set of features, opinion words in the sentence are identified first. Note that a sentence may express opinions on multiple features. For each feature f in the sentence, we compute an orientation score for the feature. A positive word is assigned the semantic orientation score of +1, and a negative word is assigned the semantic orientation score of −1. All the scores are then summed up using the following score function:

score(f) = Σ_{wi: wi ∈ s ∧ wi ∈ V} wi.SO / dis(wi, f)          (1)

where wi is an opinion word, V is the set of all opinion words (including idioms), s is the sentence that contains the feature f, and dis(wi, f) is the distance between feature f and opinion word wi in the sentence s. wi.SO is the semantic orientation of the word wi. The multiplicative inverse in the formula is used to give low weights to opinion words that are far away from the feature f.
The reason that the new function works better than the simple summation of opinions in [13] is that far away opinion words may not modify the current feature. However, setting a distance range/limit within which the opinion words are considered does not perform well either, because in some cases the opinion words may be far away. The proposed new function deals with both problems nicely.
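For concreteness, the following is a minimal sketch of Equation (1) under simplifying assumptions: a toy opinion lexicon, token-level distance, and no handling of negations, “but” clauses, or context dependent words (which the full algorithm adds).

# A minimal sketch of the distance-weighted aggregation in Equation (1).
# The word lists and the distance measure are simplified assumptions.
POSITIVE = {"great", "amazing", "good"}
NEGATIVE = {"poor", "bad", "short"}

def feature_score(tokens, feature_index):
    # sum of w.SO / dis(w, f) over all opinion words w in the sentence
    score = 0.0
    for i, word in enumerate(tokens):
        so = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        if so != 0 and i != feature_index:
            score += so / abs(i - feature_index)  # multiplicative inverse of the distance
    return score

tokens = "the picture quality is good but the battery life is short".split()
print(feature_score(tokens, tokens.index("quality")))  # 0.375: nearby "good" dominates
print(feature_score(tokens, tokens.index("life")))     # -0.25: nearby "short" dominates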
Note that the feature itself can be an opinion word as it may be an adjective representing a feature indicator, e.g., “reliable” in the sentence “This camera is very reliable”. In this case, score(f) is +1 or −1 depending on whether f (e.g., “reliable”) is positive or negative (in this case, Equation (1) is not used).
• If the final score is positive, then the opinion on the feature in the sentence s is positive. If the final score is negative, then the opinion on the feature is negative. It is neutral otherwise. The algorithm is given in Figure 2, where the variable orientation in OpinionOrientation holds the total score.
Several constructs need special handling, for which a set of linguistic rules is used:
Negation Rules: A negation word or phrase usually reverses the opinion expressed in a sentence. Negation words include traditional words such as “no”, “not”, and “never”, and also pattern-based negations such as “stop” + “vb-ing”, “quit” + “vb-ing” and “cease” + “to vb”. Here, vb is the POS tag for verb and “vb-ing” is vb in its -ing form. The following rules are applied for negations:
Negation Negative → Positive // e.g., “no problem”
Negation Positive → Negative // e.g., “not good”
Negation Neutral → Negative // e.g., “does not work”, where “work” is a neutral verb.
For pattern-based negations, the system detects the patterns and then applies the rules above. For example, the sentence “the camera stopped working after 3 days” conforms to the pattern “stop” + “vb-ing”, and is assigned the negative orientation by applying the last rule, as “working” is neutral.
Note that “Negative” and “Positive” above represent negative and positive opinion words respectively.
Non-negation phrases containing negation words: There are also non-negation phrases that contain negation words, e.g., “not” in “I like this camera not just because it is beautiful” does not indicate negation because of the phrase “not just”. Again, such phrases need to be identified to overwrite the negation words in them.
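A minimal sketch of the three negation rules, assuming each opinion word has already been scored as +1 (positive), −1 (negative) or 0 (neutral); pattern detection (“stop” + “vb-ing”, etc.) and the non-negation phrase check are omitted here.

# Negation Negative -> Positive, Negation Positive -> Negative,
# Negation Neutral -> Negative.
def apply_negation(orientation):
    if orientation == -1:
        return 1    # e.g., "no problem"
    return -1       # e.g., "not good" (positive) or "does not work" (neutral)

print(apply_negation(-1))  # +1
print(apply_negation(0))   # -1, as in "the camera stopped working"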
Algorithm OpinionOrientation()
for each sentence si that contains a set of features do
    features = features contained in si;
    for each feature fj in features do
        orientation = 0;
        if feature fj is in the "but" clause then
            orientation = apply the "but" clause rule
        else remove the "but" clause from si if it exists;
            for each unmarked opinion word ow in si do
                // ow can be a TOO word or Negation word as well
                orientation += wordOrientation(ow, fj, si);
            endfor
        endif
        if orientation > 0 then fj's orientation in si = 1
        else if orientation < 0 then fj's orientation in si = -1
        else fj's orientation in si = 0
        endif
        if fj is an adjective then
            (fj).orientation += fj's orientation in si;
        else let oij be the nearest adjective to fj in si;
            (fj, oij).orientation += fj's orientation in si;
        endif
    endfor
endfor
// Context dependent opinion word handling
for each fj with orientation = 0 in sentence si do
    if fj is an adjective then
        fj's orientation in si = (fj).orientation
    else // synonym and antonym rules should be applied too
        let oij be the nearest opinion word to fj in si;
        if (fj, oij) exists then
            fj's orientation in si = (fj, oij).orientation
        endif
    endif
    if fj's orientation in si = 0 then
        fj's orientation in si = apply the inter-sentence conjunction rule
    endif
endfor

Procedure wordOrientation(word, feature, sentence)
if word is a Negation word then
    orientation = apply Negation Rules;
    mark words in sentence used by Negation rules
elif word is a TOO word then
    orientation = apply TOO Rules;
    mark words in sentence used by TOO rules
else
    orientation = orientation of word in opinionWord_list
endif
orientation = orientation / dis(word, feature)

Figure 2: Predicting the orientations of opinions on product features
“But” Clause Rules: A sentence containing “but” also needs special treatment. The opinions before “but” and after “but” are usually opposite to each other. Phrases such as “with the exception of”, “except that”, and “except for” behave similarly to “but” and are handled in the same way.
“but” clauses are handled as follows:
If the product feature fj appears in the "but" clause then
    for each unmarked opinion word ow in the "but" clause of the sentence si do
        // ow can be a TOO word (see below) or Negation word
        orientation += wordOrientation(ow, fj, si);
    endfor
    If orientation ≠ 0 then
        return orientation
    else
        orientation = orientation of the clause before "but"
        If orientation ≠ 0 then
            return -1 * orientation
        else return 0  // neutral
    endif
The algorithm above basically says that we follow the semantic orientation of the “but” clause first. If we cannot get an orientation there, we look at the clause before “but” and negate its orientation.
Non-but clauses containing but-like words: Similar to negations and opinion words, a sentence containing “but” does not necessarily change the opinion orientation. For example, “but” in “I not only like the picture quality of this camera, but also its size” does not change the opinion after “but” because of the phrase “but also”.
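A minimal sketch of the “but” clause rule above, assuming the two clause scores have already been computed with Equation (1); the helper name is an illustrative assumption, not the system's actual interface.

def but_rule(but_clause_score, before_but_score):
    # Follow the "but" clause first; otherwise negate the clause before "but".
    if but_clause_score != 0:
        return but_clause_score
    if before_but_score != 0:
        return -1 * before_but_score
    return 0  # neither clause carries an opinion

# "The picture quality is good, but the battery life is <context dependent word>"
print(but_rule(0, 1))   # -1: negate the positive clause before "but"
print(but_rule(-1, 1))  # -1: the "but" clause decides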
4.3. Handling Context Dependent Opinions
We now deal with context dependent opinion words. Three rules (or linguistic conventions) are proposed, which use the contextual information in other reviews of the same product, other sentences in the same review, and even other clauses of the same sentence to infer the orientation of the opinion word in question. Since this method makes use of global information rather than only local information, we call this approach the holistic approach.
1. Intra-sentence conjunction rule: For example, we have the sentence “the battery life is very long”. It is not clear whether “long” indicates a positive or a negative opinion on the product feature “battery life”. Our algorithm tries to see whether any other reviewer said that “long” is positive (or negative). For example, another reviewer wrote “This camera takes great pictures and has a long battery life”. From this sentence, we can discover that “long” is positive for “battery life” because it is conjoined with the positive opinion word “great”. We call this the intra-sentence conjunction rule, which means that a sentence only expresses one opinion orientation unless there is a “but” word which changes the direction. The following sentence is unlikely: “This camera takes great pictures and has a short battery life.” It is much more natural to say: “This camera takes great pictures, but has a short battery life.”
2. Pseudo intra-sentence conjunction rule: Sometimes, one may not use an explicit conjunction “and”. Let us use the example sentence “the battery life is long” again. We have no idea whether “long” is positive or negative for “battery life”. A similar strategy can be applied. For instance, another reviewer might have written the following: “The camera has a long battery life, which is great”. This sentence indicates that the semantic orientation of “long” for “battery life” is positive due to “great”, although no explicit “and” is used.
Using these two rules, we consider two cases:
Adjectives as feature indicators: In this case, an adjective is a feature indicator. For example, “small” is a feature indicator that indicates the feature “size” in the sentence “this camera is very small”. It is not clear from this sentence whether “small” is positive or negative. The above two rules can be applied to determine the semantic orientation of “small” for “camera”.
Explicit features that are not adjectives: In this case, we use opinion words near the feature words to determine the opinion orientations on the feature words. For example, in the sentence “the battery life of this camera is long”, “battery life” is the given feature and “long” is a nearby opinion word. Again, we can apply the above two rules to find the semantic orientation of “long” for “battery life”.
3. Inter-sentence conjunction rule: If the above two rules cannot decide the opinion orientation, we use the context of the previous or next sentence (or clauses) to decide. That is, we extend the intra-sentence conjunction rule to neighboring sentences. The idea is that people usually express the same opinion (positive or negative) across sentences unless there is an indication of opinion change using words such as “but” and “however”. For example, the following sentences are natural: “The picture quality is amazing. The battery life is long.”
However, the following sentences are not natural: “The picture quality is amazing. The battery life is short.”
It is much more natural to say: “The picture quality is amazing. However, the battery life is short.”
Below, we give the algorithm. The variable orientation is the opinion score on the current feature. Note that the algorithm only uses neighboring sentences. Neighboring clauses in the same sentence can be used in a similar way too.
if the previous sentence exists and has an opinion then
    if there is not a "However" or "But" word to change the direction of the current sentence then
        orientation = the orientation of the last clause of the previous sentence
    else orientation = opposite orientation of the last clause of the previous sentence
elif the next sentence exists and has an opinion then
    if there is not a "However" or "But" word to change the direction of the next sentence then
        orientation = the orientation of the first clause of the next sentence
    else orientation = opposite orientation of the last clause of the next sentence
else orientation = 0
endif
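A minimal sketch of the inter-sentence conjunction rule above, assuming the neighboring sentences have already been scored (non-zero means an opinion is present) and a boolean flag marks a direction-changing word such as “however” or “but”; both are illustrative assumptions.

def inter_sentence_rule(prev_score, prev_direction_change, next_score, next_direction_change):
    # Borrow a neighboring sentence's opinion, flipping it on "however"/"but".
    if prev_score != 0:
        return -prev_score if prev_direction_change else prev_score
    if next_score != 0:
        return -next_score if next_direction_change else next_score
    return 0  # still undecided

# "The picture quality is amazing. However, the battery life is <?>."
print(inter_sentence_rule(1, True, 0, False))  # -1: "However" reverses the previous opinion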
For rule 1 and rule 2, we should also note the following. It is possible that in the reviews of a product the same opinion word for the same feature has conflicting orientations. For example, another reviewer may say that “small” is negative for camera size: