P. Brusilovsky, A. Kobsa, and W. Nejdl (Eds.): The Adaptive Web, LNCS 4321, pp. 325 – 341, 2007. © Springer-Verlag Berlin Heidelberg 2007
10
Content-Based Recommendation Systems
Michael J. Pazzani 1 and Daniel Billsus 2
1 Rutgers University, ASBIII, 3 Rutgers Plaza
New Brunswick, NJ 08901 pazzani@rutgers.edu
2 FX Palo Alto Laboratory, Inc., 3400 Hillview Ave, Bldg. 4
Palo Alto, CA 94304
Abstract. This chapter discusses content-based recommendation systems, i.e., systems that recommend an item to a user based upon a description of the item and a profile of the user’s interests. Content-based recommendation systems may be used in a variety of domains, including recommending web pages, news articles, restaurants, television programs, and items for sale. Although the details of various systems differ, content-based recommendation systems share in common a means for describing the items that may be recommended, a means for creating a profile of the user that describes the types of items the user likes, and a means of comparing items to the user profile to determine what to recommend. The profile is often created and updated automatically in response to feedback on the desirability of items that have been presented to the user.
10.1 Introduction
A common scenario for modern recommendation systems is a Web application with which a user interacts. Typically, a system presents a summary list of items to a user, and the user selects among the items to receive more details on an item or to interact with the item in some way. For example, online news sites present web pages with headlines (and occasionally story summaries) and allow the user to select a headline to read a story. E-commerce sites often present a page with a list of individual products and then allow the user to see more details about a selected product and purchase the product. Although the web server transmits HTML and the user sees a web page, the web server typically has a database of items and dynamically constructs web pages with a list of items. Because there are often many more items available in a database than would easily fit on a web page, it is necessary to select a subset of items to display to the user or to determine an order in which to display the items.
Content-based recommendation systems analyze item descriptions to identify items that are of particular interest to the user. Because the details of recommendation systems differ based on the representation of items, this chapter first discusses alternative item representations. Next, recommendation algorithms suited for each representation are discussed. The chapter concludes with a discussion of variants of the approaches, the strengths and weaknesses of content-based recommendation systems, and directions for future research and development.
10.1.1 Item Representation
Items that can be recommended to the user are often stored in a database table. Table 10.1 shows a simple database with records (i.e., “rows”) that describe three restaurants. The column names (e.g., Cuisine or Service) are properties of restaurants. These properties are also called “attributes,” “characteristics,” “fields,” or “variables” in different publications. Each record contains a value for each attribute. A unique identifier, ID in Table 10.1, allows items with the same name to be distinguished and serves as a key to retrieve the other attributes of the record.
Table 10.1. A restaurant database

ID     Name            Cuisine  Service  Cost
10002  Chris’s Cafe    French   Table    Medium
10003  Jacques Bistro  French   Table    High
The database depicted in Table 10.1 could be used to drive a web site that lists and recommends restaurants. This is an example of structured data in which there is a small number of attributes, each item is described by the same set of attributes, and there is a known set of values that the attributes may have. In this case, many machine learning algorithms may be used to learn a user profile, or a menu interface can easily be created to allow a user to create a profile. The next section of this chapter discusses several approaches to creating a user profile from structured data.
Of course, a web page typically has more information than is shown in Table 10.1, such as a text description of the restaurant, a restaurant review, or even a menu. These may easily be stored as additional fields in the database and a web page can be created with templates to display the text fields (as well as the structured data). However, free text data creates a number of complications when learning a user profile. For example, a profile might indicate that there is an 80% probability that a particular user would like a French restaurant. This might be added to the profile because a user gave a positive review of four out of five French restaurants. However, unrestricted text fields are typically unique and there would be no opportunity to provide feedback on five restaurants described as “A charming café with attentive staff overlooking the river.”
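The 80% figure in the example above can be reproduced with a small sketch. The function name `attribute_like_rates` and the feedback format (pairs of an item record and a liked/disliked flag) are assumptions made for illustration; the chapter does not prescribe a particular estimator.

```python
from collections import defaultdict

def attribute_like_rates(feedback, attribute):
    """Return {attribute value: fraction of rated items the user liked}."""
    liked = defaultdict(int)
    total = defaultdict(int)
    for item, user_liked in feedback:
        value = item[attribute]
        total[value] += 1
        if user_liked:
            liked[value] += 1
    return {value: liked[value] / total[value] for value in total}

# Hypothetical feedback: the user liked four of five French restaurants.
feedback = [({"Cuisine": "French"}, True)] * 4 + [({"Cuisine": "French"}, False)]
rates = attribute_like_rates(feedback, "Cuisine")
print(rates)  # {'French': 0.8}
```

This works only because many items share the value "French"; a unique free-text description would yield one observation per value, which is the complication the paragraph describes.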
An extreme example of unstructured data may occur in news articles. Table 10.2 shows an example of a part of a news article. The entire article can be treated as a large unrestricted text field.
Table 10.2. Part of a newspaper article
Lawmakers Fine-Tuning Energy Plan
SACRAMENTO, Calif. -- With California's energy reserves remaining all but depleted, lawmakers prepared to work through the weekend fine-tuning a plan Gov. Gray Davis says will put the state in the power business for "a long time to come." The proposal involves partially taking over California's two largest utilities and signing long-term contracts of up to 10 years to buy electricity from wholesalers.
Unrestricted texts such as news articles are examples of unstructured data. Unlike structured data, there are no attribute names with well-defined values. Furthermore, the full complexity of natural language may be present in the text field including polysemous words (the same word may have several meanings) and synonyms (different words may have the same meaning). For example, in the article in Table 10.2, “Gray” is a name rather than a color, and “power” and “electricity” refer to the same underlying concept.
Many domains are best represented by semi-structured data in which there are some attributes with a set of restricted values and some free-text fields. A common approach to dealing with free text fields is to convert the free text to a structured representation. For example, each word may be viewed as an attribute, with a Boolean value indicating whether the word is in the article or with an integer value indicating the number of times the word appears in the article.
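As a minimal sketch of this conversion, the following code turns a free-text field into Boolean and integer-count attributes over a fixed vocabulary. The tokenizer (lowercase alphabetic runs) and the function names are assumptions chosen for brevity, not part of the chapter.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def boolean_features(text, vocabulary):
    """One Boolean attribute per vocabulary word: is the word in the text?"""
    present = set(tokenize(text))
    return {word: word in present for word in vocabulary}

def count_features(text, vocabulary):
    """One integer attribute per vocabulary word: how often does it occur?"""
    counts = Counter(tokenize(text))
    return {word: counts[word] for word in vocabulary}

text = ("The proposal involves partially taking over California's two largest "
        "utilities and signing long-term contracts of up to 10 years to buy "
        "electricity from wholesalers.")
vocabulary = ["electricity", "to", "power"]

print(boolean_features(text, vocabulary))  # {'electricity': True, 'to': True, 'power': False}
print(count_features(text, vocabulary))    # {'electricity': 1, 'to': 2, 'power': 0}
```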
Many personalization systems that deal with unrestricted text use a technique to create a structured representation that originated with text search systems [34]. In this formalism, rather than using words, the root forms of words are typically created through a process called stemming [30]. The goal of stemming is to create a term that reflects the common meaning behind words such as “compute,” “computation,” “computer,” “computes” and “computers.” The value of a variable associated with a term is a real number that represents the importance or relevance. This value is called the tf*idf weight (term-frequency times inverse document frequency). The tf*idf weight, w(t,d), of a term t in a document d is a function of the frequency of t in the document (tf_{t,d}), the number of documents that contain the term (df_t) and the number of documents in the collection (N).¹
\[
w(t,d) = \frac{tf_{t,d}\,\log\left(\frac{N}{df_t}\right)}{\sqrt{\sum_i \left(tf_{t_i,d}\right)^2 \log\left(\frac{N}{df_{t_i}}\right)^2}} \tag{10.1}
\]
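Equation 10.1 can be sketched in a few lines. The function name `tfidf_weights`, the use of the natural logarithm, and the toy document-frequency table are assumptions for illustration; the document is assumed to be already tokenized and stemmed, and every term is assumed to have a nonzero document frequency.

```python
import math
from collections import Counter

def tfidf_weights(doc_terms, doc_freq, n_docs):
    """Length-normalized tf*idf weights of the terms of one document (Eq. 10.1).

    doc_terms: list of (stemmed) terms in the document
    doc_freq:  {term: number of documents in the collection containing it}
    n_docs:    number of documents in the collection (N)
    """
    tf = Counter(doc_terms)
    # Numerator of Eq. 10.1 for each term: tf * log(N / df)
    raw = {t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf}
    # Denominator: square root of the sum of squared numerators over all terms
    norm = math.sqrt(sum(v * v for v in raw.values()))
    if norm == 0:
        return {t: 0.0 for t in raw}
    return {t: v / norm for t, v in raw.items()}

# Hypothetical collection statistics, not taken from the chapter.
doc_terms = ["power", "power", "util", "state"]
doc_freq = {"power": 10, "util": 5, "state": 100}
weights = tfidf_weights(doc_terms, doc_freq, n_docs=100)
for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}-{weight:.3f}")
```

In this toy collection "state" occurs in every document, so its weight is zero regardless of how often it appears in the document, while the rarer and more frequent-in-document "power" ranks highest.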
Table 10.3 shows the tf*idf representation (also called the vector space representation) of the complete article excerpted in Table 10.2. The terms are ordered by the tf*idf weight. The intuition behind the weight is that the terms with the highest weight occur more often in that document than in the other documents, and therefore are more central to the topic of the document. Note that terms such as “util” (a stem of “utility”), “power,” and “megawatt” are among the highest weighted terms, capturing the meaning of the article.

¹ Note that in the description of tf*idf weights, the word “document” is traditionally used since the original motivation was to retrieve documents. While the chapter will stick with the original terminology, in a recommendation system, the documents correspond to a text description of an item to be recommended. Note that the equations here are representative of the class of formulae called tf*idf. In general, tf*idf systems have weights that increase monotonically with term frequency and decrease monotonically with document frequency.
Table 10.3. tf*idf representation of the article in Table 10.2
util-0.339 power-0.329 megawatt-0.309 electr-0.217 energi-0.206 california-0.181 debt-0.128 lawmak-0.128 state-0.122 wholesal-0.119 partial-0.106 consum-0.105 alert-0.103 scroung-0.096 advoc-0.09 testi-0.088 bail-out-0.088 crisi-0.085 amid-0.084 price-0.083 long-0.082 bond-0.081 plan-0.081 term-0.08 grid-0.078 reserv-0.077 blackout-0.076 bid-0.076 market-0.074 fine-0.073 deregul-0.07 spiral-0.068 deplet-0.068 liar-0.066.
Of course, this representation does not capture the context in which a word is used. It loses the relationships between words in the description. For example, a description of a steak house might contain the sentence, “there is nothing on the menu that a vegetarian would like” while the description of a vegetarian restaurant might mention “vegan” rather than vegetarian. In a manually created structured database, the cuisine attribute having a value of “vegetarian” would indicate that the restaurant is indeed a vegetarian one. In contrast, when converting an unstructured text description to structured data, the presence of the word vegetarian does not always indicate that a restaurant is vegetarian and the absence of the word vegetarian does not always indicate that the restaurant is not a vegetarian restaurant. As a consequence, techniques for creating user profiles that deal with structured data need to differ somewhat from those techniques that deal with unstructured data or unstructured data automatically and imprecisely converted to structured data.
One variant on using words as terms is to use sets of contiguous words as terms. For example, in the article in Table 10.2, terms such as “energy reserves” and “power business” might be more descriptive of the content than the words treated as individual terms. Of course, terms such as “all but” would also be included, but one would expect that these have very low weights, in the same way that “all” and “but” individually have low weights and are not among the most important terms in Table 10.3.
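A minimal sketch of generating contiguous-word terms follows. The function name `contiguous_terms` and the choice of word pairs (n = 2) are assumptions; the chapter does not fix the length of the word sets.

```python
def contiguous_terms(words, n=2):
    """Return every run of n adjacent words, joined into a single term."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "energy reserves remaining all but depleted".split()
print(contiguous_terms(words))
# ['energy reserves', 'reserves remaining', 'remaining all', 'all but', 'but depleted']
```

The output contains both the descriptive term "energy reserves" and the uninformative "all but"; a weighting scheme such as tf*idf is then expected to rank the latter low.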
10.2 User Profiles
A profile of the user’s interests is used by most recommendation systems. This profile may consist of a number of different types of information. Here, we concentrate on two types of information:
1. A model of the user’s preferences, i.e., a description of the types of items that interest the user. There are many possible alternative representations of this description, but one common representation is a function that for any item predicts the likelihood that the user is interested in that item. For efficiency purposes, this function may be used to retrieve the n items most likely to be of interest to the user.
2. A history of the user’s interactions with the recommendation system. This may include storing the items that a user has viewed together with other information about the user’s interaction (e.g., whether the user has purchased the item or a rating that the user has given the item). Other types of history include saving queries typed by the user (e.g., that a user searched for an Italian restaurant in the 90210 zip code).
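The two kinds of profile information listed above can be sketched as a small data structure: a scoring function standing in for the preference model, and a list standing in for the interaction history. The class name `UserProfile`, its methods, and the hand-written scoring function are all hypothetical; in practice the model would typically be learned rather than written by hand.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class UserProfile:
    # Preference model: maps an item to the predicted likelihood of interest.
    score: Callable[[dict], float]
    # Interaction history: (item ID, event) pairs such as views or purchases.
    history: list = field(default_factory=list)

    def record(self, item_id, event):
        """Append one interaction to the history."""
        self.history.append((item_id, event))

    def top_n(self, items, n):
        """Return the n items the model scores as most likely to interest the user."""
        return heapq.nlargest(n, items, key=self.score)

# Items taken from Table 10.1; the scoring rule below is an assumed example.
restaurants = [
    {"ID": 10002, "Name": "Chris's Cafe", "Cuisine": "French", "Cost": "Medium"},
    {"ID": 10003, "Name": "Jacques Bistro", "Cuisine": "French", "Cost": "High"},
]

def likes_french_but_not_expensive(item):
    base = 0.8 if item["Cuisine"] == "French" else 0.2
    return base - (0.1 if item["Cost"] == "High" else 0.0)

profile = UserProfile(score=likes_french_but_not_expensive)
profile.record(10003, "viewed")
best = profile.top_n(restaurants, 1)
print(best[0]["Name"])  # Chris's Cafe
```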
There are several uses of the history of user interactions. First, the system can simply display recently visited items to facilitate the user returning to these items. Second, the system can filter out of its recommendations any item that the user has already purchased or read.² Another important use of the history in content-based recommendation systems is to serve as training data for a machine learning algorithm that creates a user model. The next section will discuss several different approaches to learning a user model. Here, we briefly describe approaches of manually providing the information used by recommendation systems: user customization and rule-based recommendation systems.
In user customization, a recommendation system provides an interface that allows users to construct a representation of their own interests. Often check boxes are used to allow a user to select from the known values of attributes, e.g., the cuisine of restaurants, the names of favorite sports teams, the favorite sections of a news site, or the genre of favorite movies. In other cases, a form allows a user to type words that occur in the free text descriptions of items, e.g., the name of a musician or author that interests the user. Once the user has entered this information, a simple database matching process is used to find items that meet the specified criteria and display them to the user.
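The matching process described here can be sketched as a simple filter. The function name `matches`, the `Description` field, and the example description texts are assumptions for illustration; only the structured attribute values come from Table 10.1.

```python
def matches(item, selections, keywords=()):
    """True if the item satisfies every check-box selection and typed keyword.

    selections: {attribute: set of values the user ticked}
    keywords:   words the user typed, matched against the free-text description
    """
    for attribute, allowed in selections.items():
        if item.get(attribute) not in allowed:
            return False
    text = item.get("Description", "").lower()
    return all(keyword.lower() in text for keyword in keywords)

# Structured values from Table 10.1; the descriptions are made up.
restaurants = [
    {"ID": 10002, "Name": "Chris's Cafe", "Cuisine": "French", "Cost": "Medium",
     "Description": "A charming cafe with attentive staff overlooking the river."},
    {"ID": 10003, "Name": "Jacques Bistro", "Cuisine": "French", "Cost": "High",
     "Description": "Classic bistro fare in a formal dining room."},
]

selected = [r["Name"] for r in restaurants
            if matches(r, {"Cuisine": {"French"}}, keywords=["river"])]
print(selected)  # ["Chris's Cafe"]
```

The filter returns an unordered set of matches, so it can yield too few or too many items and gives no basis for ranking them.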
There are several limitations of user customization systems. First, they require effort from the user and it is difficult to get many users to make this effort. This is particularly true when the user’s interests change, e.g., a user may not follow football during the season but then become interested in the Superbowl. Second, customization systems do not provide a way to determine the order in which to present items and can find either too few or too many matching items to display.
Figure 10.1 shows book recommendations at Amazon.com. Although Amazon.com is usually thought of as a good example of collaborative recommendation (see Chapter 9 of this book [35]), parts of the user’s profile can be viewed as a content-based profile. For example, Amazon contains a feature called “favorites” that represents the categories of items preferred by users. These favorites are either calculated by keeping track of the categories of items purchased by users or may be set manually by the user. Figure 10.2 shows an example of a user customization interface in which a user can select the categories.
In rule-based recommendation systems, the recommendation system has rules to recommend other products based on the user history. For example, a system may contain a rule that recommends the sequel to a book or movie to people who have purchased the earlier item in the series. Another rule might recommend a new CD by an artist to users that purchased earlier CDs by that artist. Rule-based systems may capture several common reasons for making recommendations, but they do not offer the same detailed personalized recommendations that are available with other recommendation systems.
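The sequel rule mentioned above can be sketched as follows. The catalog layout (a `series` name and a `position` within the series per item), the item IDs, and the function name `recommend_sequels` are hypothetical; a real system would hold many such rules.

```python
def recommend_sequels(purchased_ids, catalog):
    """Recommend the next item in a series for each purchased item.

    catalog: {item ID: {"series": name, "position": 1-based index in the series}}
    Items the user already owns are not recommended.
    """
    by_series = {(item["series"], item["position"]): item_id
                 for item_id, item in catalog.items()}
    recommendations = []
    for item_id in purchased_ids:
        item = catalog[item_id]
        next_id = by_series.get((item["series"], item["position"] + 1))
        if (next_id is not None
                and next_id not in purchased_ids
                and next_id not in recommendations):
            recommendations.append(next_id)
    return recommendations

# Hypothetical catalog: a three-part series and one standalone title.
catalog = {
    "b1": {"series": "Saga", "position": 1},
    "b2": {"series": "Saga", "position": 2},
    "b3": {"series": "Saga", "position": 3},
    "c1": {"series": "Standalone", "position": 1},
}

print(recommend_sequels(["b1"], catalog))        # ['b2']
print(recommend_sequels(["b1", "b2"], catalog))  # ['b3']
```

The rule fires identically for every user with the same purchases, which illustrates why such systems are less finely personalized than approaches that learn a profile.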
² Of course, in some situations it is appropriate to recommend an item the user has purchased and in other situations it is not. For example, a system should continue to recommend an item that wears out or is expended, such as a razor blade or print cartridge, while there is little value in recommending a CD or DVD a user owns.