arXiv:cs/9902011v1 [cs.DL] 7 Feb 1999
Content-Based Book Recommending Using Learning for Text Categorization
Raymond J. Mooney
Department of Computer Sciences
Loriene Roy
Graduate School of Library and Information Science
University of Texas, Austin, TX 78712
Email: mooney@cs.utexas.edu, loriene@gslis.utexas.edu
ABSTRACT
Recommender systems improve access to relevant products and information by making personalized suggestions based on previous examples of a user’s likes and dislikes. Most existing recommender systems use social filtering methods that base recommendations on other users’ preferences. By contrast, content-based methods use information about an item itself to make suggestions. This approach has the advantage of being able to recommend previously unrated items to users with unique interests and to provide explanations for its recommendations. We describe a content-based book recommending system that utilizes information extraction and a machine-learning algorithm for text categorization. Initial experimental results demonstrate that this approach can produce accurate recommendations.
KEYWORDS: Recommender systems, information filtering, machine learning, text categorization
INTRODUCTION
There is a growing interest in recommender systems that suggest music, films, books, and other products and services to users based on examples of their likes and dislikes [19, 26, 11]. A number of successful startup companies like Firefly, Net Perceptions, and LikeMinds have formed to provide recommending technology. On-line book stores like Amazon and BarnesAndNoble have popular recommendation services, and many libraries have a long history of providing reader’s advisory services [2, 21]. Such services are important since readers’ preferences are often complex and not readily reduced to keywords or standard subject categories, but rather best illustrated by example. Digital libraries should be able to build on this tradition of assisting readers by providing cost-effective, informed, and personalized automated recommendations for their patrons.
Existing recommender systems almost exclusively utilize a form of computerized matchmaking called collaborative or social filtering. The system maintains a database of the preferences of individual users, finds other users whose known preferences correlate significantly with a given patron, and recommends to a person other items enjoyed by their matched patrons. This approach assumes that a given user’s tastes are generally the same as those of another user of the system and that a sufficient number of user ratings are available. Items that have not been rated by a sufficient number of users cannot be effectively recommended. Unfortunately, statistics on library use indicate that most books are utilized by very few patrons [12]. Therefore, collaborative approaches naturally tend to recommend popular titles, perpetuating homogeneity in reading choices. Also, since significant information about other users is required to make recommendations, this approach raises concerns about privacy and access to proprietary customer data.
Learning individualized profiles from descriptions of examples (content-based recommending [3]), on the other hand, allows a system to uniquely characterize each patron without having to match their interests to someone else’s. Items are recommended based on information about the item itself rather than on the preferences of other users. This also allows for the possibility of providing explanations that list content features that caused an item to be recommended, potentially giving readers confidence in the system’s recommendations and insight into their own preferences. Finally, a content-based approach can allow users to provide initial subject information to aid the system.
Machine learning for text categorization has been applied to content-based recommending of web pages [25] and newsgroup messages [15]; however, to our knowledge, it has not previously been applied to book recommending. We have been exploring content-based book recommending by applying automated text-categorization methods to semi-structured text extracted from the web. Our current prototype system, LIBRA (Learning Intelligent Book Recommending Agent), uses a database of book information extracted from web pages. Users provide 1–10 ratings for a selected set of training books; the system then learns a profile of the user using a Bayesian learning algorithm and produces a ranked list of the most recommended additional titles from the system’s catalog.
As evidence for the promise of this approach, we present initial experimental results on several data sets of books randomly selected from particular genres such as mystery, science, literary fiction, and science fiction and rated by different users. We use standard experimental methodology from machine learning and present results for several evaluation metrics on independent test data including rank correlation coefficient and average rating of top-ranked books.
The remainder of the paper is organized as follows. Section 2 provides an overview of the system including the algorithm used to learn user profiles. Section 3 presents results of our initial experimental evaluation of the system. Section 4 discusses topics for further research, and Section 5 presents our conclusions on the advantages and promise of content-based book recommending.
SYSTEM DESCRIPTION
Extracting Information and Building a Database
First, an Amazon subject search is performed to obtain a list of book-description URLs of broadly relevant titles. LIBRA then downloads each of these pages and uses a simple pattern-based information-extraction system to extract data about each title. Information extraction (IE) is the task of locating specific pieces of information from a document, thereby obtaining useful structured data from unstructured text [16, 9]. Specifically, it involves finding a set of substrings from the document, called fillers, for each of a set of specified slots. When applied to web pages instead of natural language text, such an extractor is sometimes called a wrapper [14].
The current slots utilized by the recommender are: title, authors, synopses, published reviews, customer comments, related authors, related titles, and subject terms. Amazon produces the information about related authors and titles using collaborative methods; however, LIBRA simply treats them as additional content about the book. Only books that have at least one synopsis, review, or customer comment are retained as having adequate content information. A number of other slots are also extracted (e.g., publisher, date, ISBN, price, etc.) but are currently not used by the recommender. We have initially assembled databases for literary fiction (3,061 titles), science fiction (3,813 titles), mystery (7,285 titles), and science (6,177 titles).
Since the layout of Amazon’s automatically generated pages is quite regular, a fairly simple extraction system is sufficient. LIBRA’s extractor employs a simple pattern matcher that uses pre-filler, filler, and post-filler patterns for each slot, as described by [6]. In other applications, more sophisticated information extraction methods and inductive learning of extraction rules might be useful [7].
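To make the pre-filler, filler, and post-filler idea concrete, here is a minimal sketch using regular expressions. The slot names follow the list given earlier, but the HTML snippet and the specific patterns are invented for illustration; they are assumptions, not the patterns LIBRA used.

```python
import re

# Hypothetical (pre-filler, filler, post-filler) patterns, one triple per slot.
# These are illustrative assumptions, not LIBRA's actual extraction rules.
SLOT_PATTERNS = {
    "title": (r"<title>", r".*?", r"</title>"),
    "authors": (r"by\s+<a[^>]*>", r"[^<]+", r"</a>"),
    "subjects": (r'<li class="subject">', r"[^<]+", r"</li>"),
}


def extract_slots(page, slot_patterns=SLOT_PATTERNS):
    """Return {slot: [fillers]} for every slot in slot_patterns."""
    slots = {}
    for slot, (pre, filler, post) in slot_patterns.items():
        # Only the text matched by the filler group is kept.
        regex = re.compile(pre + "(" + filler + ")" + post, re.DOTALL)
        slots[slot] = [m.strip() for m in regex.findall(page)]
    return slots


# A made-up page fragment, used only to exercise the patterns above.
page = """
<title>The Fabric of Reality</title>
by <a href="/author/1">David Deutsch</a>
<li class="subject">Physics</li><li class="subject">Cosmology</li>
"""
print(extract_slots(page))
```

A slot may receive several fillers (as with the two subject terms here), which is why each slot maps to a list.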
The text in each slot is then processed into an unordered bag of words (tokens) and the examples represented as a vector of bags of words (one bag for each slot). A book’s title and authors are also added to its own related-title and related-author slots, since a book is obviously “related” to itself, and this allows overlap in these slots with books listed as related to it. Some minor additions include the removal of a small list of stop-words, the preprocessing of author names into unique tokens of the form first-initial
\[ P(c_j \mid D) = \frac{P(c_j)}{P(D)} \prod_{i=1}^{|D|} P(a_i \mid c_j) \tag{1} \]
where a_i is the i-th word in the document, and |D| is the length of the document in words. Since for any given document, the prior P(D) is a constant, this factor can be ignored if all that is desired is a ranking rather than a probability estimate. A ranking is produced by sorting documents by their odds ratio, P(c_1|D)/P(c_0|D), where c_1 represents the positive class and c_0 represents the negative class. An example is classified as positive if the odds are greater than 1, and negative otherwise.
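The ranking rule above can be sketched in a few lines. This is a minimal illustration for a single bag of words, not LIBRA’s implementation: it works in log space to avoid numerical underflow and uses add-one smoothing for word probabilities, both of which are assumptions made here. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def train(docs, labels):
    """Estimate log P(c) and smoothed log P(word | c) for classes 0 and 1.

    Assumes both classes occur at least once in `labels`.
    """
    vocab = {w for d in docs for w in d}
    model = {"vocab": vocab, "prior": {}, "cond": {}}
    for c in (0, 1):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        counts = Counter(w for d in class_docs for w in d)
        model["prior"][c] = math.log(len(class_docs) / len(docs))
        # Add-one smoothing (an assumption of this sketch).
        denom = sum(counts.values()) + len(vocab)
        model["cond"][c] = {w: math.log((counts[w] + 1) / denom) for w in vocab}
    return model


def log_odds(model, doc):
    """log [P(c1 | D) / P(c0 | D)]; the constant P(D) cancels in the ratio."""
    score = model["prior"][1] - model["prior"][0]
    for w in doc:
        if w in model["vocab"]:  # words never seen in training are skipped
            score += model["cond"][1][w] - model["cond"][0][w]
    return score


# Toy training data: 1 = liked, 0 = disliked.
train_docs = [["quantum", "universe"], ["universe", "physics"], ["romance", "castle"]]
train_labels = [1, 1, 0]
model = train(train_docs, train_labels)

candidates = {"A": ["universe", "quantum"], "B": ["castle", "romance"]}
ranked = sorted(candidates, key=lambda k: log_odds(model, candidates[k]), reverse=True)
print(ranked)
```

A log-odds value above 0 corresponds to odds greater than 1, so the same score serves both for ranking and for the positive/negative decision.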
In our case, since books are represented as a vector of “documents,” d_m, one for each slot (where s_m denotes the m-th slot), the probability of each word given the category and the slot, P(w_k|c_j, s_m), must be estimated and the posterior category probabilities for a book, B, computed using:
\[ P(c_j \mid B) = P(c_j) \,\ldots \]
Slot             Word           Strength
WORDS            ZUBRIN         9.85
WORDS            SMOLIN         9.39
WORDS            TREFIL         8.77
WORDS            DOT            8.67
SUBJECTS         COMPARATIVE    8.39
AUTHOR           D ZUBRIN       7.63
AUTHOR           R MORAVEC      7.63
RELATED-AUTHORS  B RADFORD      7.63
WORDS            LEE            7.57
WORDS            MORAVEC        7.57
WORDS            WAGNER         7.57
RELATED-TITLES   CONNECTIONIST  7.51
RELATED-TITLES   BELOW          7.51
Table 1: Sample Positive Profile Features
an average of 11.5 seconds, and probabilistically categorized new test examples at an average rate of about 200 books per second. An optimized implementation could no doubt significantly improve performance even further.
A profile can be partially illustrated by listing the features most indicative of a positive or negative rating. Table 1 presents the top 20 features for a sample profile learned for recommending science books. Strength measures how much more likely a word in a slot is to appear in a positively rated book than a negatively rated one, computed as:
\[ \mathrm{Strength}(w_k, s_j) = \log\bigl( P(w_k \mid c_1, s_j) / P(w_k \mid c_0, s_j) \bigr) \tag{6} \]
Producing, Explaining, and Revising Recommendations
Once a profile is learned, it is used to predict the preferred ranking of the remaining books based on posterior probability of a positive categorization, and the top-scoring recommendations are presented to the user.
The system also has a limited ability to “explain” its recommendations by listing the features that most contributed to a book’s high rank. For example, given the profile illustrated above, LIBRA presented the explanation shown in Table 2. The strength of a cue in this case is multiplied by the number of times it appears in the description in order to fully indicate its influence on the ranking. The positiveness of a feature can in turn be explained by listing the user’s training examples that most influenced its strength, as illustrated in Table 3 where “Count” gives the number of times the feature appeared in the description of the rated book.
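As an illustration of Equation 6 and of the count-weighted explanation just described, the sketch below computes feature strengths from per-class word probabilities and ranks the features of one book by strength times count. The probability values, slot contents, and function names are made up for the example, and the natural logarithm is assumed; in practice the probabilities would come from the learned profile.

```python
import math
from collections import Counter


def strength(p_pos, p_neg):
    """Equation 6: log of P(w | c1, s) / P(w | c0, s) (natural log assumed)."""
    return math.log(p_pos / p_neg)


def explain(book, probs, top_n=3):
    """Rank a book's (slot, word) features by strength * count.

    book:  {slot: [tokens]}
    probs: {(slot, word): (P(w | c1, slot), P(w | c0, slot))}
    """
    rows = []
    for slot, tokens in book.items():
        for word, count in Counter(tokens).items():
            if (slot, word) in probs:
                p_pos, p_neg = probs[(slot, word)]
                rows.append((slot, word, count * strength(p_pos, p_neg)))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows[:top_n]


# Hypothetical probabilities, for illustration only.
probs = {
    ("WORDS", "UNIVERSE"): (0.020, 0.005),
    ("WORDS", "PHYSICS"): (0.010, 0.005),
    ("SUBJECTS", "SCIENCE"): (0.030, 0.030),
}
book = {"WORDS": ["UNIVERSE", "UNIVERSE", "PHYSICS"], "SUBJECTS": ["SCIENCE"]}
for slot, word, score in explain(book, probs):
    print(f"{slot:10s} {word:10s} {score:.2f}")
```

A word that is equally likely in both classes has strength 0 and contributes nothing to the explanation, however often it occurs.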
After reviewing the recommendations (and perhaps disrecommendations), the user may assign their own rating to examples they believe to be incorrectly ranked and retrain the system to produce improved recommendations. As with relevance feedback in information retrieval [27], this cycle can be repeated several times in order to produce the best results. Also, as new examples are provided, the system can track any change in a user’s preferences and alter its recommendations based on the additional information.

The Fabric of Reality: The Science of Parallel Universes-And Its Implications by David Deutsch recommended because:
Slot  Word  Strength

Title                                          Rating  Count
The Life of the Cosmos                         10      15
Before the Beginning: Our Universe and Others  8       7
Unveiling the Edge of Time                     10      3
Black Holes: A Traveler’s Guide                9       3
The Inflationary Universe                      9       2
Table 3: Sample Feature Explanation
EXPERIMENTAL RESULTS
Methodology
Data Collection: Several data sets were assembled to evaluate LIBRA. The first two were based on the first 3,061 adequate-information titles (books with at least one abstract, review, or customer comment) returned for the subject search “literature fiction.” Two separate sets were randomly selected from this dataset, one with 936 books and one with 935, and rated by two different users. These sets will be called LIT1 and LIT2, respectively. The remaining sets were based on all of the adequate-information Amazon titles for “mystery” (7,285 titles), “science” (6,177 titles), and “science fiction” (3,813 titles). From each of these sets, 500 titles were chosen at random and rated by a user (the same user rated both the science and science fiction books). These sets will be called MYST, SCI, and SF, respectively.

Data  Books  Avg. Rating  % Positive
LIT1
LIT2  935    4.53         41.2
MYST
SCI   500    4.15         31.2
SF
Table 4

Data  1    2    3   4   5    6    7   8   9   10
LIT1  271  78   67  74  106  125  83  70  40  22
LIT2
MYST  73   11   7   8   29   46   45  64  66  151
SCI
SF    56   119  75  83  67   33   28  21  12  6
Table 5: Data Rating Distributions
In order to present a quantitative picture of performance on a realistic sample, books to be rated were selected at random. However, this means that many books may not have been familiar to the user, in which case the user was asked to supply a rating based on reviewing the Amazon page describing the book. Table 4 presents some statistics about the data and Table 5 presents the number of books in each rating category. Note that overall the data sets have quite different ratings distributions.
Performance Evaluation: To test the system, we performed 10-fold cross-validation, in which each data set is randomly split into 10 equal-size segments and results are averaged over 10 trials, each time leaving a separate segment out for independent testing, and training the system on the remaining data [22]. In order to observe performance given varying amounts of training data, learning curves were generated by testing the system after training on increasing subsets of the overall training data. A number of metrics were used to measure performance on the novel test data, including:
• Classification accuracy (Acc): The percentage of examples correctly classified as positive or negative.
• Recall (Rec): The percentage of positive examples classified as positive.
• Precision (Pr): The percentage of examples classified as positive which are positive.
• Precision at Top 3 (Pr3): The percentage of the 3 top ranked examples which are positive.
• Precision at Top 10 (Pr10): The percentage of the 10 top ranked examples which are positive.
• F-Measure (F): A weighted average of precision and recall frequently used in information retrieval:
  F = (2 · Pr · Rec) / (Pr + Rec)
Data  N    Acc   Rec   Pr    Pr3   Pr10  F     Rt3   Rt10  r_s
LIT1  5
LIT1  10   65.5  51.3  53.3  86.7  76.0  49.7  6.63  6.65  0.35
LIT1  20
LIT1  40   73.9  65.1  63.6  86.7  81.0  63.4  7.40  7.32  0.64
LIT1  100
LIT1  840  79.8  62.8  75.9  96.7  94.0  68.5  8.57  8.03  0.74
LIT2  5    59.0  57.6  52.4  70.0  74.0  53.3  6.80  6.82  0.31
LIT2  10
LIT2  20   69.5  67.2  63.2  93.3  91.0  64.1  8.20  7.84  0.59
LIT2  40
LIT2  100  78.0  78.5  71.2  96.7  94.0  74.4  8.77  8.22  0.72
LIT2  840
MYST  5
MYST  10   75.6  87.9  82.4  90.0  90.0  83.8  8.40  8.34  0.40
MYST  20
MYST  40   85.2  95.4  85.9  96.7  94.0  90.3  8.37  8.52  0.50
MYST  100
MYST  450  85.8  93.2  88.1  96.7  98.0  90.5  8.90  8.97  0.61
SCI   5    62.8  63.8  46.3  73.3  60.0  51.1  6.97  6.17  0.35
SCI   10
SCI   20   75.4  66.0  64.2  96.7  80.0  63.1  8.37  7.03  0.51
SCI   40
SCI   100  81.8  74.4  72.2  93.3  83.0  72.3  8.50  7.29  0.65
SCI   450
SF    5
SF    10   64.6  49.0  28.9  53.3  36.0  31.5  5.83  4.72  0.15
SF    20
SF    40   72.6  58.9  40.1  70.0  43.0  43.0  6.47  5.26  0.39
SF    100
SF    450  79.2  82.2  49.1  90.0  63.0  60.6  7.70  6.26  0.61
Table 6: Summary of Results
• Rating of Top 3 (Rt3): The average user rating assigned to the 3 top ranked examples.
• Rating of Top 10 (Rt10): The average user rating assigned to the 10 top ranked examples.
• Rank Correlation (r_s): Spearman’s rank correlation coefficient between the system’s ranking and that imposed by the user’s ratings (−1 ≤ r_s ≤ 1); ties are handled using the method recommended by [1].
The top 3 and top 10 metrics are given since many users will be primarily interested in getting a few top-ranked recommendations. Rank correlation gives a good overall picture of how the system’s continuous ranking of books agrees with the user’s, without requiring that the system actually predict the numerical rating score assigned by the user. A correlation coefficient of 0.3 to 0.6 is generally considered “moderate” and above 0.6 is considered “strong.”
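The metrics listed above are straightforward to compute from a scored list of test books. The sketch below is one possible implementation, not the evaluation code used for the reported results. It assumes that a rating above 5 counts as positive (exposed as a `threshold` parameter) and handles ties in the Spearman coefficient by assigning average ranks, which may differ from the method recommended by [1]. All function names are illustrative.

```python
def precision_recall_f(predicted, actual):
    """predicted/actual are parallel lists of booleans (True = positive)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    precision = tp / sum(predicted) if any(predicted) else 0.0
    recall = tp / sum(actual) if any(actual) else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f


def top_k(scores, ratings, k, threshold=5):
    """Precision and average rating of the k highest-scored items."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    top = [ratings[i] for i in order]
    return sum(r > threshold for r in top) / len(top), sum(top) / len(top)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the average ranks of x and y.

    Undefined (division by zero) if either list is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data: system scores (e.g. log odds) and the user's 1-10 ratings.
scores = [2.1, 0.3, -1.0, 1.5, -0.2]
ratings = [9, 6, 2, 8, 6]
predicted = [s > 0 for s in scores]
actual = [r > 5 for r in ratings]
print(precision_recall_f(predicted, actual))
print(top_k(scores, ratings, k=3))
print(spearman(scores, ratings))
```

Because the rank correlation depends only on the ordering of the scores, it can be computed from log odds directly without converting them to predicted ratings.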
Basic Results
The results are summarized in Table 6, where N represents the number of training examples utilized and results are shown for a number of representative points along the learning curve. Overall, the results are quite encouraging even when the system is given relatively small training sets. The SF data set is clearly the most difficult since there are very few highly-rated books.
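The learning-curve protocol behind each value of N can be sketched as follows: split the data into 10 folds, and for each fold train on growing subsets of the remaining examples before scoring the held-out fold. The `train_fn` and `eval_fn` callables, the shuffling seed, and the choice of taking the first n training examples are placeholders assumed for this sketch; how the training subsets were actually drawn is not specified here.

```python
import random


def learning_curve(examples, sizes, train_fn, eval_fn, folds=10, seed=0):
    """Average eval_fn over `folds` trials for each training-set size."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    parts = [data[i::folds] for i in range(folds)]  # near-equal segments
    totals = {n: 0.0 for n in sizes}
    for i, test in enumerate(parts):
        train = [x for j, part in enumerate(parts) if j != i for x in part]
        for n in sizes:
            model = train_fn(train[:n])  # first n training examples
            totals[n] += eval_fn(model, test)
    return {n: total / folds for n, total in totals.items()}


# Toy usage: a "model" that always predicts the majority training label.
examples = [(i, i % 4 != 0) for i in range(100)]  # 75% positive


def train_fn(train):
    positives = sum(label for _, label in train)
    return positives * 2 >= len(train)


def eval_fn(model, test):
    return sum(label == model for _, label in test) / len(test)


curve = learning_curve(examples, sizes=[5, 20, 90], train_fn=train_fn, eval_fn=eval_fn)
print(curve)
```

Reusing the same folds for every training size keeps the points along the curve comparable, since each size is evaluated on identical test segments.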
The “top n” metrics are perhaps the most relevant to many users. Consider precision at top 3, which is fairly consistently in the 90% range after only 20 training examples (the exceptions are LIT1 until 70 examples¹ and SF until 450 examples). Therefore, LIBRA’s top recommendations are highly likely to be viewed positively by the user. Note that the “% Positive” column in Table 4 gives the probability that a randomly chosen example from a given data set will be positively rated. Therefore, for every data set, the top 3 and top 10 recommendations are always substantially more likely than random to be rated positively, even after only 5 training examples.
Figure 1: LIT1 Rank Correlation (correlation coefficient versus number of training examples, for LIBRA and LIBRA-NR)
Considering the average rating of the top 3 recommendations, it is fairly consistently above an 8 after only 20 training examples (the exceptions again are LIT1 until 100 examples and SF). For every data set, the top 3 and top 10 recommendations are always rated substantially higher than a randomly selected example (cf. the average rating from Table 4).
Looking at the rank correlation, except for SF, there is at least a moderate correlation (r_s ≥ 0.3) after only 10 examples, and SF exhibits a moderate correlation after 40 examples. This becomes a strong correlation (r_s ≥ 0.6) for LIT1 after only 20 examples, for LIT2 after 40 examples, for SCI after 70 examples, for MYST after 300 examples, and for SF after 450 examples.
Results on the Role of Collaborative Content
Since collaborative and content-based approaches to recommending have somewhat complementary strengths and weaknesses, an interesting question that has already attracted some initial attention [3, 4] is whether they can be combined to produce even better results. Since LIBRA exploits content about related authors and titles that Amazon produces using collaborative methods, an interesting question is whether this collaborative content actually helps its performance. To examine this issue, we conducted an “ablation” study in which the slots for related authors and related titles were removed from LIBRA’s representation of book content. The resulting system, called LIBRA-NR, was compared to the original one using the same 10-fold training and test sets. The statistical significance of any differences in performance between the two systems was evaluated using a 1-tailed paired t-test requiring a significance level of p < 0.05.
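A 1-tailed paired t-test of this kind can be sketched with the standard library alone. The per-fold scores below are invented, and pairing the two systems fold by fold is an assumption about how the comparison was set up. With 10 folds there are 9 degrees of freedom, for which the one-tailed critical value at p = 0.05 is about 1.833.

```python
import math
import statistics


def paired_t(a, b):
    """t statistic for the mean of the paired differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Invented per-fold rank correlations for the two systems (illustration only).
libra = [0.62, 0.58, 0.66, 0.61, 0.64, 0.60, 0.59, 0.65, 0.63, 0.62]
libra_nr = [0.55, 0.57, 0.60, 0.56, 0.61, 0.58, 0.54, 0.59, 0.60, 0.57]

T_CRIT_DF9 = 1.833  # one-tailed critical value, alpha = 0.05, 9 degrees of freedom
t = paired_t(libra, libra_nr)
print(f"t = {t:.2f}, significant: {t > T_CRIT_DF9}")
```

The test is one-tailed because the question is directional: whether the system with the related-author and related-title slots does better than the one without them.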
Overall, the results indicate that the use of collaborative content has a significant positive effect. Figures 1, 2, and 3 show sample learning curves for different important metrics for a few data sets. For the LIT1 rank-correlation results shown in Figure 1, there is a consistent, statistically-significant difference in performance from 20 examples onward. For the MYST results on precision at top 10 shown in Figure 2, there is a consistent, statistically-significant difference in performance from 40 examples onward. For the SF results on average rating of the top 3, there is a statistically-significant difference at 10, 100, 150, 200, and 450 examples. The results shown are some of the most consistent differences for each of the metrics; however, all of the datasets demonstrate some significant advantage of using collaborative content according to one or more metrics. Therefore, information obtained from collaborative methods can be used to improve content-based recommending, even when the actual user data underlying the collaborative method is unavailable due to privacy or proprietary concerns.
Figure 2: MYST Precision at Top 10 (% precision at top 10 versus number of training examples, for LIBRA and LIBRA-NR)
Figure 3: SF Average Rating of Top 3 (rating of top 3 versus number of training examples, for LIBRA and LIBRA-NR)
FUTURE WORK
We are currently developing a web-based interface so that LIBRA can be experimentally evaluated in practical use with a larger body of users. We plan to conduct a study in which each user selects their own training examples, obtains recommendations, and provides final informed ratings after reading one or more selected books.