
Personalized Recommendation on Dynamic Content Using Predictive Bilinear Models

Wei Chu

Yahoo! Labs

2821 Mission College Blvd

Santa Clara, CA 95054

Seung-Taek Park

Yahoo! Labs

2821 Mission College Blvd

Santa Clara, CA 95054

ABSTRACT

In Web-based services of dynamic content (such as news articles), recommender systems face the difficulty of identifying new high-quality items in a timely manner and providing recommendations for new users. We propose a feature-based machine learning approach to personalized recommendation that is capable of handling the cold-start issue effectively. We maintain profiles of content of interest, in which temporal characteristics of the content, such as popularity and freshness, are updated in real time. We also maintain profiles of users, including demographic information and a summary of user activities within Yahoo! properties. Based on all features in user and content profiles, we develop predictive bilinear regression models to provide accurate personalized recommendations of new items for both existing and new users. This approach results in an offline model with light computational overhead compared with other recommender systems that require online re-training. The proposed framework is general and flexible for other personalized tasks. The superior performance of our approach is verified on a large-scale data set collected from the Today-Module on Yahoo! Front Page, with comparison against six competitive approaches.

Categories and Subject Descriptors

H.1.0 [Models and Principles]: General; H.3.3 [Information Search and Retrieval]: Information filtering; H.3.5 [Online Information Services]: Web-based services

General Terms

Algorithms, Experimentation, Design, Performance

Keywords

Personalization, Dynamic Features, Bilinear, Regression, Ranking, User and Content Profiles, Recommender Systems

1. INTRODUCTION

The Internet provides an unparalleled opportunity for organizations to deliver digital content to their visitors instantaneously. Content consumers usually have a short attention span, while there may be a large number of content vendors. The biggest challenge most organizations face is not a lack of content, but how to optimize the content they already own by identifying the most appropriate customers at the right time. Personalized recommendation has become a desirable feature of e-business Web sites to improve customer satisfaction and customer retention [8], by tailoring content presentation to suit an individual's needs rather than taking the traditional "one-size-fits-all" approach.

Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others.

WWW 2009, April 20–24, 2009, Madrid, Spain.

ACM 978-1-60558-487-4/09/04.

Personalized recommendation involves a process of gathering and storing information about site visitors, managing the content assets, analyzing current and past user interactive behavior, and, based on the analysis, delivering the right content to each visitor [31]. Search engines help index available content assets and return relevant information to users, if the users are looking for something specific that can be summarized as a keyword query. However, in many cases, users are looking for things that might interest them, but do not have a concrete need in mind when browsing a Web site. In such cases, it is a recommendation engine that presents the most plausible content that the user may want, based on her interests as demonstrated by her past activities.

Traditional recommendation engines can be divided into three different approaches: rule-based filtering, content-based filtering, and collaborative filtering [32]. Rule-based filtering creates a user-specific utility function and then applies it to the items under consideration. This approach is closely related to customization, which requires users to identify themselves, configure their individual settings, and maintain their personalized environment over time [21]. It is easy to fail since the burden of responsibility falls on the users. Content-based filtering generates a profile for a user based on the content descriptions of the items previously rated by the user. The main drawback of this approach is that the recommended items are similar to the items previously seen by the user. Mladenic [30] provided a survey of the commonly used text-learning techniques in the context of content filtering. Collaborative filtering (CF) is one of the most successful and widely used recommender system technologies [37]. CF analyzes users' ratings to recognize commonalities between users on the basis of their historical ratings, and then generates new recommendations based on like-minded users' preferences. CF provides a good solution to "a closed world", where overlaps in ratings across users are relatively high and the universe of content items is almost static.

In many scenarios, such as news filtering [15], where the content universe changes rapidly and a significant portion of users are new users, CF will suffer from the cold-start problem. Several hybrid recommender systems have been developed to tackle the cold-start problem by combining two or more recommendation techniques. The inability of CF to recommend new items is commonly alleviated by coupling it with content-based filtering, such as in Fab [3], a recommender system for Web content. Burke [10] provided a comprehensive analysis of approaches to generating hybrid recommendation engines.

Although hybridization can alleviate some of the weaknesses associated with CF and other recommendation techniques, there are still a few important issues that have not been well studied in the literature:

• Dynamic Content: Not only does the item set undergo insertions and deletions frequently, but the content value, and hence the appraisal from users, also changes rapidly. For example, the lifetime of breaking news on the Internet is usually a couple of hours, and the value of the news (such as click-through rate) decays over time as people get to know it; see Figure 3(a) for an example. Traditional recommender systems usually treat users' feedback as static, so that feedback on the same items given at different time stamps is still comparable. This assumption does not hold on dynamic content. Rebuilding the model on very recent data is typically an expensive task, and tends to lose the long-term interests of users. On dynamic content, recommender systems always face the cold-start problem for new items.

• Users with Open Profiles: A typical user profile in a CF system is a list of ratings on items of interest. In practice, we can legally collect user information to develop a general profile for a site visitor [19], which is not limited to the content universe only. The general profile may include declared demographic information, activities on relevant sites, consumption history, etc. The objective is to provide valuable insight into users' preferences, interests and wants. Clearly, the general profile can help tackle the cold-start problem on new users. Demographic recommender systems [34] aim to segment users based on personal attributes and make recommendations according to demographic class. However, the history of user ratings and content features have not been jointly exploited to form "people-to-people" correlation.

In this paper, we propose a machine learning approach to handling both issues in personalized recommendation. The key idea is to maintain profiles for both content and users, and build a feature-based bilinear regression model to quantify the associations between heterogeneous features by fitting the historical interactive data. The feature-based predictive model can then be applied to recommending new and existing items for both new and existing users.

The goodness of dynamic content over time is a crucial ingredient in content management. We insert dynamic features, such as instantaneous click-through rate (CTR) to indicate temporal popularity, into the content feature set. We continuously update the dynamic features in the delivery phase by aggregating users' interactions over content items in real time. We demonstrate that maintaining content profiles with dynamic features is an effective strategy to overcome the cold-start problem on dynamic content.

Figure 1: An illustration of unfolding a multi-dimensional event.

The open profiles of users provide valuable information about user preferences and interests that helps in recommending content for new users. Historical feedback given by users on content of interest, such as ratings or click streams, directly reveals users' opinions on the content universe. The bilinear regression models we propose can discover association patterns between the general user profiles and the content features by exploiting the interactive data (the typical user profile in traditional CF). The established associations are then applied to evaluating individualized appraisal of currently available items for accurate and prompt personalized recommendations in real time.

This work is motivated by a personalized content optimization task for the Today-Module on Yahoo! Front Page. The effectiveness of the bilinear models is verified on a large-scale real-world data set collected in the application. This approach results in an offline model, except for the online-tracked dynamic features in content profiles. The computational overhead in online recommendation is minor compared with recommender systems that require online re-training. The framework is general and flexible, and can be adapted to other personalized tasks.

The paper is organized as follows. We introduce data representation in Section 2, which includes content profiling, user profiling and interactive feedback. In Section 3 we describe a family of probabilistic bilinear models in detail, covering training algorithms and further discussions on potential capabilities. We review related work in Section 4. We report the experimental results on the data set collected from the Today-Module, with comparison against six competitive alternatives, in Section 5 and conclude in Section 6.

2. DATA REPRESENTATION

The observational data is naturally recorded in multi-dimensional format. A logistic event is associated with at least three types of objects, user × content × timestamp. The multi-dimensional events can always be flattened into two-way form without loss of generality; see Figure 1 for an illustration. In personalization on dynamic content, we can treat content × timestamp as items of interest. Note that the dimension of timestamp is usually not considered in traditional recommender systems. The flattened dimensions form a new content item space, in which features are extracted for profiling. We generate and maintain three sets of data: content profiles, user profiles, and interactive feedback on content items of interest.
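The flattening step described above can be illustrated with a minimal sketch. The `Event` record, its field names, and the hourly bucketing of timestamps are assumptions made for this example only; the paper does not specify how timestamps are discretized.

```python
from collections import namedtuple

# Hypothetical event record; field names and hourly bucketing are assumptions.
Event = namedtuple("Event", ["user_id", "content_id", "timestamp", "feedback"])

def flatten_events(events, bucket_seconds=3600):
    """Map (user, content, timestamp) events to two-way (user, item) records.

    An 'item' here is a (content_id, time_bucket) pair, so the same article
    observed in different hours counts as a different item of interest.
    """
    flattened = []
    for e in events:
        time_bucket = int(e.timestamp // bucket_seconds)
        item_key = (e.content_id, time_bucket)
        flattened.append((e.user_id, item_key, e.feedback))
    return flattened

events = [
    Event("u1", "article-7", 1000.0, 1),
    Event("u2", "article-7", 1500.0, -1),
    Event("u1", "article-7", 8000.0, -1),  # same article, two hours later
]
pairs = flatten_events(events)
```

In this sketch the first two events share one item key, while the third maps to a separate item because it falls into a later time bucket.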

2.1 Content Profiles

When a content item is either created or acquired, the information related to the content, such as manufacturer, product name and categories, constitutes an initial part of the profile. Continuous refinement of the content profile helps to optimize the use of the content assets. In the delivery phase, the content is delivered to users and interactions on the content are logged and analyzed, providing the ability to assess the content popularity in real time. The content popularity over time is a crucial ingredient in content management, since the commercial value of most content varies or decays over time, especially for breaking news. We consider generalized content items here, which are re-defined with both temporal characteristics and other conditions. In a content profile, there are at least two groups of features:

• Static descriptors: such as categories, manufacturer name, title, and bag of words of textual content.

• Temporal characteristics: such as popularity, click-through rate (CTR) and price at the current time stamp, or the hours elapsed after content acquisition.

We can collect any features related to the content items. For example, in search the items become webpages fused with a query, and then joint features, such as contextual co-occurrences, can be constructed.

Each content item is represented as a column vector, denoted by $z$, where $z \in \mathbb{R}^C$ and $C$ is the number of content features.
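As a rough sketch of how a content profile with both feature groups might be maintained, the class below keeps static descriptors fixed and updates click counters per logged impression. The class name, the smoothing pseudo-counts, and the choice of CTR plus elapsed hours as the dynamic features are all assumptions for illustration, not the paper's feature set.

```python
class ContentProfile:
    """Toy content profile: static descriptors plus online-tracked CTR and age."""

    def __init__(self, static_features, created_at):
        self.static_features = list(static_features)  # e.g. one-hot categories
        self.created_at = created_at                  # acquisition time, seconds
        self.views = 0
        self.clicks = 0

    def record(self, clicked):
        """Update the counters after one logged impression."""
        self.views += 1
        if clicked:
            self.clicks += 1

    def ctr(self, prior_clicks=1.0, prior_views=20.0):
        """Smoothed CTR; the pseudo-counts are an assumption, not from the paper."""
        return (self.clicks + prior_clicks) / (self.views + prior_views)

    def to_vector(self, now):
        """Return z = [static features, CTR, hours since acquisition]."""
        hours_elapsed = (now - self.created_at) / 3600.0
        return self.static_features + [self.ctr(), hours_elapsed]

profile = ContentProfile(static_features=[1.0, 0.0, 0.0], created_at=0.0)
for clicked in [True, False, False, True]:
    profile.record(clicked)
z = profile.to_vector(now=7200.0)
```

The smoothing keeps the CTR estimate stable for a brand-new item with few impressions, which is one simple way to soften the cold-start effect on dynamic features.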

2.2 User Profiles

The objective of collecting visitor information is to develop a user profile that describes a site visitor's interests, consumption history, and other descriptors important to the site owner. A review of various user profiling techniques is provided in [19]. Explicit profiling requests each visitor to declare personal information, such as age, gender and occupation, or to fill out questionnaires that explicitly state their preferences. Implicit profiling tracks the visitors' behavior and is generally transparent to the visitor. Browsing and purchasing patterns are the behaviors most often assessed. The profile combined with demographic, transaction, and navigation data implicitly represents a user's preferences and recent interests.

The user feature space is spanned by legally usable features. Each user is represented as a column vector, denoted by $x$, where $x \in \mathbb{R}^D$ and $D$ is the dimensionality of the user feature space.

2.3 Interactive Feedback

In traditional collaborative filtering (CF), the feedback given by users on content of interest is used as user profiles to evaluate commonalities between users. In our regression approach, we separate the feedback from user profiles. The feedback on content of interest is utilized as targets that relate patterns in user features to content features.

Although the interactions between the users and the available items vary depending on the types of items involved, we can always observe or measure some feedback from the user side. For example, a user may purchase a product or a service after review, and even rate it later. For content posted on a Web page, a user may click to see more details. The ratings and actions (click or not, purchase or not) provide explicit feedback (see footnote 1). A range of efforts have attempted to measure various kinds of implicit feedback indicators, from linger time [13] to eye movements [36]. We focus on two types of feedback in this paper:

• Continuous scores: most implicit feedback and ratings can be converted to continuous scores.

• Binary actions: such as click or not, purchase or not after reviewing an item.

We have collected three sets of data, including content features, user profiles and interactive data between users and items. Let us index the $i$-th user as $x_i$ and the $j$-th content item as $z_j$, and denote by $r_{ij}$ the interaction between the user $x_i$ and the item $z_j$. We only observe interactions on a small subset of all possible user/item pairs, and denote by $\mathbb{O}$ the set of observations $\{r_{ij}\}$.

3. BILINEAR REGRESSION MODELS

The user and content profiles provide timely descriptions of users and items respectively. As the two feature spaces are usually dichotomous, it is hard to apply the contextual data mining techniques [9] here. However, the interactive feedback reveals the correlations between user patterns and content features. In this section, we describe a family of predictive bilinear models to discover pattern affinities between heterogeneous features. A set of weight coefficients is introduced to capture the pairwise associations between user and content features. The parametric model is optimized by fitting the observed interactive feedback.

3.1 Bilinear Indicator

The bilinear models can be regarded as a special case in the Tucker family [14], which has been widely applied in machine learning applications. For example, Tenenbaum and Freeman [39] developed a bilinear model for separating "style" and "content" in images, and recently Chu and Ghahramani [11] derived a probabilistic framework of the Tucker family for modeling structural dependency from partially observed high-dimensional array data.

We define an indicator as a bilinear function of $x_i$ and $z_j$ in the following:

$$s_{ij} = \sum_{a=1}^{C} \sum_{b=1}^{D} x_{i,b}\, z_{j,a}\, w_{ab}, \qquad (1)$$

where $D$ and $C$ are the dimensionality of user and content features respectively, $z_{j,a}$ denotes the $a$-th feature of $z_j$ and $x_{i,b}$ denotes the $b$-th feature of $x_i$. The weight variable $w_{ab}$ is independent of user and content features and quantifies the affinity of the two factors $x_{i,b}$ and $z_{j,a}$ in interactions (see footnote 2). The scalar $s_{ij}$ is generated by mixing the basis vectors with coefficients given by the Kronecker product of $x_i$ and $z_j$. The indicator can be equivalently rewritten as

$$s_{ij} = w^\top (z_j \otimes x_i),$$

where $w$ is a column vector of entries $\{w_{ab}\}$, and $z_j \otimes x_i$ denotes the Kronecker product of $x_i$ and $z_j$, a column vector of entries $\{x_{i,b} z_{j,a}\}$. In matrix form, eq (1) can be rewritten as

$$s_{ij} = x_i^\top W z_j,$$

where $W$ denotes a $D \times C$ matrix with entries $\{w_{ab}\}$, which describes a linear projection from the user feature space onto the item feature space. The projected user profile $W^\top x_i$ is aligned to the item features and denoted by $\tilde{x}_i$, which can be interpreted as the user's preferences on item characteristics. Then the indicator becomes a dot product, $s_{ij} = \tilde{x}_i^\top z_j = \sum_{a=1}^{C} \tilde{x}_{i,a} z_{j,a}$.

To further examine the feature functions, let us distinguish dynamic features in the item feature vector as

$$z_j = \begin{bmatrix} z_{j,s} \\ z_{j,d} \end{bmatrix},$$

where $z_{j,s}$ denotes static features and $z_{j,d}$ denotes dynamic features that vary over time. The indicator $s_{ij}$ can then be rewritten as follows,

$$s_{ij} = \left[ x_i^\top W_s,\; x_i^\top W_d \right] \begin{bmatrix} z_{j,s} \\ z_{j,d} \end{bmatrix} = \tilde{x}_{i,s}^\top z_{j,s} + \tilde{x}_{i,d}^\top z_{j,d}, \qquad (2)$$

where $W_s$ and $W_d$ denote the columns in $W$ associated with the static and dynamic item features respectively, and $\tilde{x}_{i,s}$ and $\tilde{x}_{i,d}$ denote the $i$-th user's preferences on the static and dynamic item features respectively.

Note that a user's score $s_{ij}$ on an item is composed of three parts: $\tilde{x}_{i,s}^\top z_{j,s}$ reflects long-term personal preferences on content features learnt from historical activities; $z_{j,d}$ carries the dynamic characteristics, which in our work include temporal popularity over the whole user population, that is, article quality; and the tradeoff between static personal preferences and article quality is determined by $\tilde{x}_{i,d}$.

On cold-start with new items, the user's preferences on static item features, $\tilde{x}_{i,s}^\top z_{j,s}$, play an important role, as the dynamic features cannot be accurately estimated at the beginning stage. Similarly, on cold-start with new users, recommendations are fully determined by the users' preferences on content features $\tilde{x}_i$, which are projected from the user profile $x_i$ (see footnote 3).

As we will show in the following, the projection $W$ can be learnt from the historical interactive feedback.

3.2 Probabilistic Framework

We employ appropriate likelihood functions to relate the indicator $s_{ij}$ to different types of observed interactions.

• Continuous scores with Gaussian measurement noise:

$$p(r_{ij} \mid s_{ij}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(r_{ij} - s_{ij})^2}{2\sigma^2} \right),$$

where $\sigma$ stands for the noise level (see footnote 4).

• Binary actions with $r_{ij} \in \{-1, 1\}$. The logistic function is widely used as the likelihood function, which is defined as

$$p(r_{ij} \mid s_{ij}) = \frac{1}{1 + \exp(-r_{ij} s_{ij} + \gamma)},$$

where $\gamma$ denotes a bias term, usually set at 1.

Given a set of $w$, the likelihood of observing the interactive data can be evaluated by

$$p(\mathbb{O} \mid w) = \prod_{ij} p(r_{ij} \mid s_{ij}), \qquad (3)$$

where the index $ij$ runs over the observational set $\mathbb{O}$.

We also specify a zero-mean Gaussian distribution over the weight variables as a prior,

$$p(w) = \left( \frac{1}{\sqrt{2\pi}\,\varsigma} \right)^{CD} \exp\left( -\frac{\sum_{ab} w_{ab}^2}{2\varsigma^2} \right), \qquad (4)$$

where $\varsigma^2$ is the variance.

Based on Bayes' theorem, the posterior distribution of $w$ is proportional to the product of the likelihood and the prior,

$$p(w \mid \mathbb{O}) \propto p(\mathbb{O} \mid w)\, p(w), \qquad (5)$$

where $p(w)$ is the prior distribution defined as in eq (4) and $p(\mathbb{O} \mid w)$ is the likelihood defined as in eq (3).

Footnote 1: Clicks and user purchase history are often considered as implicit feedback in other collaborative filtering literature since these may not reflect real user preferences. For example, a user may find that an article is uninteresting after clicking and reading it. However, we refer to these actions as explicit feedback since the user intentions behind the actions are clearer than those of other implicit feedback such as linger time and eye movement.

Footnote 2: In practice, we also insert an individual-specific offset for each user. The final scalar is evaluated as

$$s_{ij} = \sum_{a=1}^{C} \sum_{b=1}^{D} x_{i,b}\, z_{j,a}\, w_{ab} + \mu_i,$$

where $\mu_i \in \mathbb{R}$ denotes a user-specific offset. Here $\mu_i$ is used to account for the user's activity level, since some users are active clickers while others are casual users.

Footnote 3: There is an implicit assumption that the user profile is rich enough to be transformed into preferences on item characteristics. This condition can be easily satisfied in practice.

Footnote 4: In practice, the noise level could be prefixed at an appropriate value based on the signal/noise ratio.
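To make the notation of Sections 3.1 and 3.2 concrete, the following is a small sketch in plain Python that evaluates the indicator of eq (1), the projected user profile $W^\top x_i$, and the two log-likelihoods. The function names and the toy numbers are assumptions for illustration; $W$ is stored as $D$ rows by $C$ columns to match the matrix form above.

```python
import math

def bilinear_score(x, z, W):
    """Eq (1): s = sum_a sum_b x[b] * z[a] * W[b][a], with W stored as D x C."""
    return sum(x[b] * W[b][a] * z[a] for b in range(len(x)) for a in range(len(z)))

def project_user(x, W):
    """x_tilde = W^T x: user preferences expressed in item-feature space."""
    return [sum(W[b][a] * x[b] for b in range(len(x))) for a in range(len(W[0]))]

def log_lik_gaussian(r, s, sigma=1.0):
    """log of the Gaussian likelihood for a continuous score r."""
    return -((r - s) ** 2) / (2 * sigma ** 2) - 0.5 * math.log(2 * math.pi * sigma ** 2)

def log_lik_logistic(r, s, gamma=1.0):
    """log of 1 / (1 + exp(-r*s + gamma)) for a binary action r in {-1, +1}."""
    return -math.log1p(math.exp(-r * s + gamma))

x = [1.0, 2.0]                 # D = 2 user features (toy values)
z = [0.5, 0.0, 1.0]            # C = 3 item features (toy values)
W = [[0.1, 0.2, 0.3],
     [0.0, -0.1, 0.4]]         # D x C weight matrix (toy values)

s = bilinear_score(x, z, W)
x_tilde = project_user(x, W)
s_via_projection = sum(p * q for p, q in zip(x_tilde, z))
```

Both routes give the same score, which is the point of the dot-product form: the user is projected once, after which each candidate item costs only $C$ multiplications.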

3.3 Offline Modeling

In this section, we describe a training algorithm in batch mode to estimate the posterior distribution of the weight coefficients $p(w \mid \mathbb{O})$ as in eq (5). For continuous scores with Gaussian noise, the posterior distribution is still a Gaussian due to the conjugate property. With non-Gaussian likelihood functions, the posterior distribution becomes non-Gaussian. However, we can always approximate the true distribution by a Gaussian distribution. One of the most popular techniques is the Laplace approximation [26], which finds the mode of the true posterior as the approximate mean and approximates the inverse covariance matrix by the Hessian matrix of the negative log posterior, that is, the second order derivatives with respect to the weights at the mode point.

The mode, also known as the maximum-a-posteriori (MAP) estimate, can be found by maximizing the joint probability $p(\mathbb{O} \mid w)\, p(w)$. The optimization problem is equivalent to minimizing the negative logarithm of the joint probability,

$$\min_w L(w) = \frac{1}{2\varsigma^2} \sum_{ab} w_{ab}^2 - \sum_{ij} \log p(r_{ij} \mid s_{ij}), \qquad (6)$$

where $\varsigma^2$ controls the tradeoff between the regularization term and the data fit. The gradient with respect to $w_{ab}$ can be computed as follows,

$$\frac{\partial L(w)}{\partial w_{ab}} = \frac{w_{ab}}{\varsigma^2} - \sum_{ij} \frac{\partial \log p(r_{ij} \mid s_{ij})}{\partial s_{ij}}\, x_{i,b}\, z_{j,a}, \qquad (7)$$

and gradient-descent packages can then be employed to find the minimum. Note that the objective functional is convex and the minimum is unique. The detailed formulations are given in Table 1 and the gradient-descent algorithm is summarized in Table 2. Each objective/gradient evaluation costs $O(NCD)$, where $CD$ is the size of $w$ and $N$ is the size of the observed set $\mathbb{O}$. Note that matrix inversion can be applied directly to the case of continuous targets to obtain the solution, but the computational cost is $O(NC^2D^2 + C^3D^3)$, which is very expensive for cases with a large number of features.

Table 1: The logarithm likelihood functions and the first-order derivatives.

Target | $\log p(r_{ij} \mid s_{ij})$ | $\partial \log p(r_{ij} \mid s_{ij}) / \partial s_{ij}$
Continuous | $-\dfrac{(r_{ij} - s_{ij})^2}{2\sigma^2} - \dfrac{1}{2}\log(2\pi\sigma^2)$ | $-\dfrac{s_{ij} - r_{ij}}{\sigma^2}$
Binary | $-\log\left(1 + \exp(-r_{ij} s_{ij} + \gamma)\right)$ | $r_{ij}\left(1 - p(r_{ij} \mid s_{ij})\right)$

For $\gamma = 0$ the binary derivative equals $r_{ij}\, p(-r_{ij} \mid s_{ij})$.

Table 2: The gradient-descent algorithm for MAP.

1. Initialize $w = 0$, given $\sigma^2$ and $\varsigma^2$.
2. While objective/gradient evaluation at $w$ is requested:
   Compute the objective as in eq (6);
   Compute the gradients for $w$ as in eq (7);
   Return the objective/gradients to the package.
3. Until the optimization package returns the final $w$.
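The loop in Table 2 can be sketched in a few lines for the binary case. This is only an illustration: it substitutes plain fixed-step gradient descent for the optimization package the paper refers to, uses $\gamma = 0$ by default, and runs on a four-example synthetic data set invented for the example. The step size and iteration count are likewise assumptions.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def objective_and_gradient(W, data, prior_var, gamma=0.0):
    """Eq (6) and eq (7) for binary targets r in {-1, +1}.

    W is a D x C list of lists; data is a list of (x, z, r) triples.
    """
    D, C = len(W), len(W[0])
    obj = sum(w * w for row in W for w in row) / (2.0 * prior_var)
    grad = [[w / prior_var for w in row] for row in W]
    for x, z, r in data:
        s = sum(x[b] * W[b][a] * z[a] for b in range(D) for a in range(C))
        margin = r * s - gamma
        obj += math.log1p(math.exp(-margin))        # -log p(r | s)
        dlogp_ds = r * (1.0 - sigmoid(margin))      # d log p(r | s) / d s
        for b in range(D):
            for a in range(C):
                grad[b][a] -= dlogp_ds * x[b] * z[a]
    return obj, grad

def fit_map(data, D, C, prior_var=1.0, step=0.1, iterations=200):
    """Fixed-step gradient descent from W = 0 (a stand-in for a real optimizer)."""
    W = [[0.0] * C for _ in range(D)]
    history = []
    for _ in range(iterations):
        obj, grad = objective_and_gradient(W, data, prior_var)
        history.append(obj)
        W = [[W[b][a] - step * grad[b][a] for a in range(C)] for b in range(D)]
    return W, history

# Synthetic data: users with feature 0 click items with feature 0, and so on.
data = [
    ([1.0, 0.0], [1.0, 0.0], +1),
    ([1.0, 0.0], [0.0, 1.0], -1),
    ([0.0, 1.0], [0.0, 1.0], +1),
    ([0.0, 1.0], [1.0, 0.0], -1),
]
W_map, history = fit_map(data, D=2, C=2)
```

On this toy data the learned weights are positive for matching feature pairs and negative for mismatched ones, and the prior keeps them from growing without bound even though the data is perfectly separable.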

3.4 Prediction

The MAP estimate, denoted as $w_{MAP}$, is then applied to new user/item pairs for prediction. For any pair of $x_i$ and $z_j$ in test, the best guess of the indicator $s_{ij}$ is determined as follows,

$$\hat{s}_{ij} = \sum_{a=1}^{C} \sum_{b=1}^{D} x_{i,b}\, z_{j,a}\, w^{MAP}_{ab}, \qquad (8)$$

where $w^{MAP}_{ab}$ is an entry of the MAP estimate $w_{MAP}$.
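As a usage-oriented sketch of eq (8), the snippet below ranks a handful of candidate items for a user who has profile features but no click history. The weight matrix, item names and feature values are hypothetical and chosen only to show the mechanics.

```python
def rank_items(x, items, W):
    """Return item ids sorted by predicted score (eq 8), highest first.

    items maps an item id to its feature vector z; W is stored as D x C.
    """
    C = len(W[0])
    # Project the user once: x_tilde[a] = sum_b x[b] * W[b][a].
    x_tilde = [sum(x[b] * W[b][a] for b in range(len(x))) for a in range(C)]
    scores = {
        item_id: sum(x_tilde[a] * z[a] for a in range(C))
        for item_id, z in items.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores

W = [[0.4, -0.4, 0.5],
     [-0.4, 0.4, 0.5]]                 # hypothetical learned weights, D=2, C=3
items = {
    "sports-story": [1.0, 0.0, 0.2],   # last feature: current CTR (dynamic)
    "finance-story": [0.0, 1.0, 0.1],
    "fresh-story": [0.0, 0.0, 0.9],
}
new_user = [1.0, 0.0]                  # profile features only, no click history
ranking, scores = rank_items(new_user, items, W)
```

Because scoring needs only the user's profile vector and the current item features, the same call works for a new user and for a newly inserted item.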

3.5 Discussions

In this section, we discuss model selection and some potential extensions of the framework we propose, such as online learning and active learning.

3.5.1 Model Selection

The prior variance $\varsigma^2$ is an important model parameter in the regression framework. The most common approach in practice to determine the best model setting is cross validation. In k-fold cross validation, the original training data is randomly partitioned into several folds, whereas in our application, which has time series of dynamical features, we have to split the training data at a temporal point into two folds, usually with size ratio 2:1. Given a particular set of model parameters, we run the training algorithm on the fold of earlier data to estimate the weight coefficients, and test the resulting model on the left-out fold to obtain the validation error. The predictive performance indicates the goodness of the model parameter setting. We run a grid search over a set of parameter values to find the optimal one, on which we observe the best performance on the validation data. The optimal weight coefficients in the regression model are finally obtained by training on the whole training data set using the best set of model parameters.
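The temporal split and grid search described above might look as follows. The record layout, the helper names, and especially the stand-in model (a shrunken mean rather than the bilinear regression) are assumptions chosen to keep the sketch short and runnable.

```python
def temporal_split(records, train_fraction=2.0 / 3.0):
    """Split time-stamped records at a temporal point: (earlier fold, later fold)."""
    ordered = sorted(records, key=lambda rec: rec["timestamp"])
    cut = int(round(len(ordered) * train_fraction))
    return ordered[:cut], ordered[cut:]

def grid_search(records, candidates, train_fn, error_fn):
    """Pick the prior variance with the lowest error on the later fold."""
    earlier, later = temporal_split(records)
    errors = {}
    for prior_var in candidates:
        model = train_fn(earlier, prior_var)
        errors[prior_var] = error_fn(model, later)
    best = min(errors, key=errors.get)
    return best, errors

# Stand-in model for illustration only: a shrunken mean of the targets,
# where a smaller prior variance shrinks harder towards zero.
def train_fn(fold, prior_var):
    total = sum(rec["target"] for rec in fold)
    return total / (len(fold) + 1.0 / prior_var)

def error_fn(model, fold):
    return sum((rec["target"] - model) ** 2 for rec in fold) / len(fold)

records = [{"timestamp": t, "target": 1.0} for t in range(9)]
best_var, errors = grid_search(records, [0.01, 1.0, 100.0], train_fn, error_fn)
```

After the best value is found, the final model would be retrained on all records with that setting, as the text describes.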

3.5.2 Online Learning and Active Learning

In this work we focus only on training an offline model coupled with dynamic features, whereas the probabilistic framework we employ provides the capacity for online learning as well. Assumed-density filtering (ADF) is a one-pass, sequential method for computing an approximate posterior distribution [17]. In ADF, observations are processed one by one, updating the posterior distribution, which is usually approximated as a Gaussian, before processing the next observation. The approximate posterior is found by minimizing KL-divergence to preserve a specific set of posterior expectations. More recently, Expectation Propagation [29] extended ADF to incorporate iterative refinement of the approximations, which iterates additional passes over the observations and does not require them to be processed in order of arrival, as in time series.
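For the continuous-score case, the one-observation-at-a-time update is easy to write down, because a Gaussian prior with a Gaussian likelihood gives an exactly Gaussian posterior and no approximation is needed. The sketch below shows that special case only; the function name and toy numbers are assumptions, and the logistic likelihood would require an additional moment-matching step that is not shown.

```python
def gaussian_online_update(mean, cov, phi, r, noise_var):
    """One sequential posterior update for r = w . phi + Gaussian noise.

    mean: list of length n; cov: n x n list of lists; phi: feature vector,
    for instance the Kronecker product of item and user features.
    """
    n = len(mean)
    cov_phi = [sum(cov[i][k] * phi[k] for k in range(n)) for i in range(n)]
    predictive_var = noise_var + sum(phi[i] * cov_phi[i] for i in range(n))
    residual = r - sum(mean[i] * phi[i] for i in range(n))
    gain = [c / predictive_var for c in cov_phi]
    new_mean = [mean[i] + gain[i] * residual for i in range(n)]
    new_cov = [[cov[i][k] - gain[i] * cov_phi[k] for k in range(n)] for i in range(n)]
    return new_mean, new_cov

prior_var = 1.0
mean = [0.0, 0.0]
cov = [[prior_var, 0.0], [0.0, prior_var]]
mean, cov = gaussian_online_update(mean, cov, phi=[1.0, 0.0], r=1.0, noise_var=1.0)
```

After this single observation the first weight moves halfway towards the target and its variance halves, while the second weight, which the observation says nothing about, is left unchanged.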

Learning could be made more efficient if we can actively select salient data points. Within the probabilistic regression framework, the expected informativeness of a new observation can be measured by the change in entropy of the posterior distribution of the weight coefficients after inclusion of the candidate [24]. The new posterior distribution with the inclusion of the unused sample can be approximated as a Gaussian by ADF-like online learning algorithms. Based on information-theoretical principles, the entropy gain on the posterior distribution of weight variables can then be applied as the criterion for candidate selection.
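The entropy criterion has a simple closed form in the Gaussian case, which the following sketch uses. For a Gaussian posterior with covariance $\Sigma$ and a Gaussian-noise observation with feature vector $\phi$, the entropy drops by $\frac{1}{2}\log(1 + \phi^\top \Sigma \phi / \sigma^2)$, independent of the observed value. The function names and candidates are hypothetical, and the non-Gaussian case would need the approximations discussed above.

```python
import math

def entropy_reduction(cov, phi, noise_var):
    """Drop in posterior entropy (nats) from observing r = w . phi + noise.

    For a Gaussian posterior and Gaussian noise this equals
    0.5 * log(1 + phi' cov phi / noise_var) and does not depend on r.
    """
    n = len(phi)
    quad = sum(phi[i] * cov[i][k] * phi[k] for i in range(n) for k in range(n))
    return 0.5 * math.log(1.0 + quad / noise_var)

def select_candidate(cov, candidates, noise_var=1.0):
    """Return the name of the candidate with the largest entropy reduction."""
    return max(candidates, key=lambda name: entropy_reduction(cov, candidates[name], noise_var))

cov = [[0.1, 0.0], [0.0, 2.0]]     # the second weight is still very uncertain
candidates = {"probe-known": [1.0, 0.0], "probe-uncertain": [0.0, 1.0]}
chosen = select_candidate(cov, candidates)
```

The candidate that probes the poorly determined weight is preferred, which matches the intuition that informative samples are those the current posterior is least sure about.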

4. RELATED WORK

Our work is closely related to adaptive news systems, one of the most popular types of personalized Web-based service [6]. The most relevant previous work to our study is the Google News recommender system [15], a content-agnostic system which combines three different algorithms using a linear model to generate recommendations in the news domain. However, since their proposed approach is pure collaborative filtering, it does not solve the cold-start problem for new users. Even though ratings from new users can be updated in near real-time by gridifying their algorithm, it still needs to wait until new users provide ratings or clicks before making recommendations. Also, the reported results are based on two heavy user data sets (top 5K heavy users with 370K clicks and 500K users with 10M clicks), where effects of new and casual users have not been considered. In our application of the Today-Module on Yahoo! Front Page, 40% of clickers are new clickers with no historical clicks, 82% of clickers have at most 5 historical clicks, and 92% of clickers have no more than 10 historical clicks, as shown in Figure 3(b). Another key difference is that Google News [15] is a content-agnostic system which does not resort to either content features or user information.

YourNews [2] allows users to customize their interest profiles through a user model interface. The study on user behavior shows the benefit from customization but also cautions about the downside on system performance. In our application, we build up user and content profiles without any solicitation from users. Newsjunkie [18] provided personalized news feeds for users by measuring news novelty in the context of stories the users have already read. Our content profiles can also maintain dynamic features in addition to context novelty, such as popularity and freshness. Our model also leverages user profiles to facilitate cold-start on new users.

Our work is also related to personalized search, though the tasks are quite different. Micarelli et al. [28] gave a nice review on this direction. Personalized search builds models of short-term and long-term user needs based on observed user actions, which is able to satisfy the users better than standard search engines based on traditional Information Retrieval (IR) techniques. Speretta and Gauch [38] devel-
