[WWW09] Personalized Recommendation on Dynamic Content Using Predictive Bilinear Models


Personalized Recommendation on Dynamic Content Using Predictive Bilinear Models
Wei Chu
Yahoo! Labs.
2821 Mission College Blvd
Santa Clara, CA 95054
Seung-Taek Park
Yahoo! Labs.
2821 Mission College Blvd
Santa Clara, CA 95054
ABSTRACT
In Web-based services of dynamic content (such as news articles), recommender systems face the difficulty of identifying new high-quality items in a timely manner and providing recommendations for new users. We propose a feature-based machine learning approach to personalized recommendation that is capable of handling the cold-start issue effectively. We maintain profiles of content of interest, in which temporal characteristics of the content, such as popularity and freshness, are updated in real time. We also maintain profiles of users, including demographic information and a summary of user activities within Yahoo! properties. Based on all features in user and content profiles, we develop predictive bilinear regression models to provide accurate personalized recommendations of new items for both existing and new users. This approach results in an offline model with light computational overhead compared with other recommender systems that require online re-training. The proposed framework is general and flexible for other personalized tasks. The superior performance of our approach is verified on a large-scale data set collected from the Today-Module on Yahoo! Front Page, with comparison against six competitive approaches.
Categories and Subject Descriptors
H.1.0 [Models and Principles]: General; H.3.3 [Information Search and Retrieval]: Information filtering; H.3.5 [Online Information Services]: Web-based services
General Terms
Algorithms, Experimentation, Design, Performance
Keywords
Personalization, Dynamic Features, Bilinear, Regression, Ranking, User and Content Profiles, Recommender Systems
1. INTRODUCTION
The Internet provides an unparalleled opportunity for organizations to deliver digital content to their visitors instantaneously. Content consumers usually have a short attention span, while there may be a large number of content vendors. The biggest challenge most organizations face is not lack of content, but how to optimize the content they already own by identifying the most appropriate customers at the right time. Personalized recommendation has become a desirable feature of e-business Web sites to improve customer satisfaction and customer retention [8], by tailoring content presentation to suit an individual’s needs rather than taking the traditional “one-size-fits-all” approach.
Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others.
WWW 2009, April 20–24, 2009, Madrid, Spain.
ACM 978-1-60558-487-4/09/04.
Personalized recommendation involves a process of gathering and storing information about site visitors, managing the content assets, analyzing current and past user interactive behavior, and, based on the analysis, delivering the right content to each visitor [31]. Search engines help index available content assets and return relevant information to users, if the users are looking for something specific that can be summarized as a keyword query. However, in many cases, users are looking for things that might interest them, but do not have anything concrete in mind when browsing a Web site. In such cases, it is a recommendation engine that presents the most plausible content that the user may want, based on her interests as demonstrated by her past activities.
Traditional recommendation engines can be divided into three different approaches: rule-based filtering, content-based filtering, and collaborative filtering [32]. Rule-based filtering creates a user-specific utility function and then applies it to the items under consideration. This approach is closely related to customization, which requires users to identify themselves, configure their individual settings, and maintain their personalized environment over time [21]. It is easy to fail since the burden of responsibility falls on the users. Content-based filtering generates a profile for a user based on the content descriptions of the items previously rated by the user. The main drawback of this approach is that the recommended items are similar to the items previously seen by the user. Mladenic [30] provided a survey of the commonly used text-learning techniques in the context of content filtering. Collaborative filtering (CF) is one of the most successful and widely used recommender system technologies [37]. CF analyzes users’ ratings to recognize commonalities between users on the basis of their historical ratings, and then generates new recommendations based on like-minded users’ preferences. CF provides a good solution to “a closed world”, where overlaps in ratings across users are relatively high and the universe of content items is almost static.
In many scenarios, such as news filtering [15], where the content universe changes rapidly and a significant portion of users are new users, CF will suffer from the cold-start problem. Several hybrid recommender systems have been developed to tackle the cold-start problem by combining two or more recommendation techniques. The inability of CF to recommend new items is commonly alleviated by coupling it with content-based filtering, such as in Fab [3], a recommender system for Web content. Burke [10] provided a comprehensive analysis of approaches to generating hybrid recommendation engines.
Although hybridization can alleviate some of the weaknesses associated with CF and other recommendation techniques, there are still a few important issues that haven’t been well studied in the literature:
• Dynamic Content: Not only does the item set undergo insertions and deletions frequently, but the content value, and hence the appraisal from users, changes rapidly as well. For example, the lifetime of breaking news on the Internet is usually a couple of hours, and the value of the news (such as click-through rate) decays over time as people get to know it; see Figure 3(a) for an example. Traditional recommender systems usually treat users’ feedback as static, so that feedback on the same items given at different time stamps is still comparable. This assumption doesn’t hold on dynamic content. Rebuilding the model on very recent data is typically an expensive task, and tends to lose long-term interests of users. On dynamic content, recommender systems always face the cold-start problem for new items.
• Users with Open Profiles: A typical user profile in a CF system is a list of ratings on items of interest. In practice, we can legally collect user information to develop a general profile for a site visitor [19], which is not limited to the content universe only. The general profile may include declared demographic information, activities on relevant sites, consumption history, etc. The objective is to provide valuable insight into users’ preferences, interests and wants. Clearly, the general profile can help tackle the cold-start problem on new users. Demographic recommender systems [34] aim to segment users based on personal attributes and make recommendations according to demographic class. However, the history of user ratings and content features haven’t been jointly exploited to form “people-to-people” correlation.
In this paper, we propose a machine learning approach to handling both issues in personalized recommendation. The key idea is to maintain profiles for both content and users, and build a feature-based bilinear regression model to quantify the associations between heterogeneous features by fitting the historical interactive data. The feature-based predictive model can then be applied to recommending new and existing items for both new and existing users.
The goodness of dynamic content over time is a crucial ingredient in content management. We insert dynamic features, such as instantaneous click-through rate (CTR) to indicate temporal popularity, into the content feature set. We continuously update the dynamic features in the delivery phase by aggregating users’ interactions over content items in real time. We demonstrate that maintaining content profiles with dynamic features is an effective strategy to overcome the cold-start problem on dynamic content.
Figure 1: An illustration of unfolding a multi-dimensional event.
The open profiles of users provide valuable information about user preferences and interests that helps in recommending content for new users. Historical feedback given by users on content of interest, such as ratings or click streams, directly reveals users’ opinions on the content universe. The bilinear regression models we propose can discover association patterns between the general user profiles and the content features by exploiting the interactive data (the typical user profile in traditional CF). The established associations are then applied to evaluate individualized appraisal of currently available items for accurate and prompt personalized recommendations in real time.
This work is motivated by a personalized content optimization task for the Today-Module on Yahoo! Front Page. The effectiveness of the bilinear models is verified on a large-scale real-world data set collected in the application. This approach results in an offline model, except for the online-tracked dynamic features in content profiles. The computational overhead in online recommendation is minor compared with recommender systems that require online re-training. The framework is general and flexible, and can be adapted to other personalized tasks.
The paper is organized as follows: we introduce data representation in Section 2, which includes content profiling, user profiling and interactive feedback; in Section 3 we describe a family of probabilistic bilinear models in detail, covering training algorithms and further discussions on potential capabilities; we review related work in Section 4; we report the experimental results on the data set collected from the Today-Module, with comparison against six competitive alternatives, in Section 5, and conclude in Section 6.
2. DATA REPRESENTATION
The observational data is naturally recorded in multi-dimensional format. A logged event is associated with at least three types of objects, user × content × timestamp. The multi-dimensional events can always be flattened into two-way form without loss of generality; see Figure 1 for an illustration. In personalization on dynamic content, we can treat content × timestamp as items of interest. Note that the dimension of timestamp is usually not considered in traditional recommender systems. The flattened dimensions form a new content item space, in which features are extracted for profiling. We generate and maintain three sets of data: content profiles, user profiles, and interactive feedback on content items of interest.
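As a minimal sketch of the flattening step, the snippet below maps logged (user, content, timestamp) events to two-way (user, item) pairs, where an item is a content identifier paired with a time bucket. The hourly bucket, the tuple layout and the name `flatten_events` are assumptions made for illustration; the paper does not specify how timestamps are discretized.

```python
from collections import defaultdict

def flatten_events(events, bucket_seconds=3600):
    """Map (user, content, timestamp, feedback) events to a two-way form.

    Each item of interest is the pair (content, time bucket), so the same
    article shown in different hours is treated as a different item.
    The bucket size is a hypothetical choice, not a value from the paper.
    """
    two_way = defaultdict(list)
    for user, content, timestamp, feedback in events:
        item = (content, int(timestamp // bucket_seconds))
        two_way[(user, item)].append(feedback)
    return dict(two_way)

events = [
    ("u1", "article_a", 10, 1),      # falls in hour 0
    ("u1", "article_a", 4000, -1),   # hour 1: same article, new item
    ("u2", "article_a", 20, -1),     # hour 0
]
pairs = flatten_events(events)
```

Keying on (user, item) keeps the data in the two-way form expected by the later sections, while the time bucket preserves the temporal dimension inside the item definition.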
2.1 Content Profiles
When a content item is either created or acquired, the information related to the content, such as manufacturer, product name and categories, constitutes an initial part of the profile. Continuous refinement of the content profile helps to optimize the use of the content assets. In the delivery phase, the content is delivered to users and interactions on the content are logged and analyzed, providing the ability to assess the content popularity in real time. The content popularity over time is a crucial ingredient in content management, since the commercial value of most content varies or decays over time, especially for breaking news.
We consider generalized content items here, which are redefined with both temporal characteristics and other conditions. In a content profile, there are at least two groups of features:
• Static descriptors: such as categories, manufacturer name, title, bag of words of textual content, etc.
• Temporal characteristics: such as popularity, click-through rate (CTR) and price at the current time stamp, or the hours elapsed after content acquisition.
We can collect any features related to the content items. For example, in search the items become webpages fused with a query, and then joint features, such as contextual co-occurrences, can be constructed.
Each content item is represented as a column vector, denoted by $z$, where $z \in \mathbb{R}^C$ and $C$ is the number of content features.
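To make the two feature groups concrete, here is a small sketch that assembles a content vector $z$ from static descriptors and two temporal characteristics. The add-one smoothing of the CTR, the choice of exactly two dynamic features and the function name `content_vector` are illustrative assumptions, not details taken from the paper.

```python
def content_vector(static_features, clicks, views, created_at, now):
    """Build z = [static features, dynamic features] as a flat list.

    Dynamic part (hypothetical choice): a smoothed click-through rate and
    the hours elapsed since the content was acquired.
    """
    # Add-one smoothing keeps the CTR defined when an item has no views yet.
    ctr = (clicks + 1.0) / (views + 2.0)
    hours_elapsed = (now - created_at) / 3600.0
    return list(static_features) + [ctr, hours_elapsed]

# Three static indicator features, then the dynamic features at time `now`.
z = content_vector([1.0, 0.0, 1.0], clicks=9, views=98, created_at=0, now=7200)
```

Recomputing only the last two entries as new clicks and views arrive is what keeps the profile current without touching the static part.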
2.2 User Profiles
The objective of collecting visitor information is to develop a user profile that describes a site visitor’s interests, consumption history, and other descriptors important to the site owner. A review of various user profiling techniques is provided in [19]. Explicit profiling requests each visitor to declare personal information, such as age, gender and occupation, or to fill out questionnaires that explicitly state their preferences. Implicit profiling tracks the visitors’ behavior and is generally transparent to the visitor. Browsing and purchasing patterns are the behaviors most often assessed. The profile combined with demographic, transaction, and navigation data implicitly represents a user’s preferences and recent interests.
The user feature space is spanned by legally usable features. Each user is represented as a column vector, denoted by $x$, where $x \in \mathbb{R}^D$ and $D$ is the dimensionality of the user feature space.
2.3 Interactive Feedback
In traditional collaborative filtering (CF), the feedback given by users on content of interest is used as user profiles to evaluate commonalities between users. In our regression approach, we separate the feedback from user profiles. The feedback on content of interest is utilized as targets that relate patterns in user features to content features.
Although the interactions between the users and the available items vary depending on the types of items involved, we can always observe or measure some feedback from the user side. For example, a user may purchase a product or a service after review, and even rate it later. For content posted on a Web page, a user may click to see more details. The ratings and actions (click or not, purchase or not) provide explicit feedback (see footnote 1). There is a range of efforts to measure various kinds of implicit feedback indicators, from linger time [13] to eye movements [36]. We focus on two types of feedback in this paper:
• Continuous scores: most implicit feedback and ratings can be converted to continuous scores.
• Binary actions: such as click or not, purchase or not after reviewing an item.
We have collected three sets of data, including content features, user profiles and interactive data between users and items. Let us index the $i$-th user as $x_i$ and the $j$-th content item as $z_j$, and denote by $r_{ij}$ the interaction between the user $x_i$ and the item $z_j$. We only observe interactions on a small subset of all possible user/item pairs, and denote by $O$ the set of observations $\{r_{ij}\}$.
3. BILINEAR REGRESSION MODELS
The user and content profiles provide timely descriptions of users and items respectively. As the two feature spaces are usually dichotomous, it is hard to apply the contextual data mining techniques [9] here. However, the interactive feedback reveals the correlations between user patterns and content features. In this section, we describe a family of predictive bilinear models to discover pattern affinities between heterogeneous features. A set of weight coefficients is introduced to capture the pairwise associations between user and content features. The parametric model is optimized by fitting the observed interactive feedback.
3.1 Bilinear Indicator
The bilinear models can be regarded as a special case in the Tucker family [14], which has been widely applied in machine learning applications. For example, Tenenbaum and Freeman [39] developed a bilinear model for separating “style” and “content” in images, and recently Chu and Ghahramani [11] derived a probabilistic framework of the Tucker family for modeling structural dependency from partially observed high-dimensional array data.
We define an indicator as a bilinear function of $x_i$ and $z_j$ in the following:
$$s_{ij} = \sum_{a=1}^{C} \sum_{b=1}^{D} x_{i,b}\, z_{j,a}\, w_{ab}, \qquad (1)$$
where $D$ and $C$ are the dimensionality of user and content features respectively, $z_{j,a}$ denotes the $a$-th feature of $z_j$ and $x_{i,b}$ denotes the $b$-th feature of $x_i$. The weight variable $w_{ab}$ is independent of user and content features and quantifies the affinity of the two factors $x_{i,b}$ and $z_{j,a}$ in interactions (see footnote 2). The scalar $s_{ij}$ is generated by mixing the basis vectors with coefficients given by the Kronecker product of $x_i$ and $z_j$. The indicator can be equivalently rewritten as
$$s_{ij} = w^\top (z_j \otimes x_i),$$
where $w$ is a column vector of entries $\{w_{ab}\}$, and $z_j \otimes x_i$ denotes the Kronecker product of $x_i$ and $z_j$, a column vector of entries $\{x_{i,b} z_{j,a}\}$. In matrix form, eq(1) can be rewritten as
$$s_{ij} = x_i^\top W z_j,$$
where $W$ denotes a $D \times C$ matrix with entries $\{w_{ab}\}$, which describes a linear projection from the user feature space onto the item feature space. The projected user profile $W^\top x_i$ is aligned to the item features, denoted by $\tilde{x}_i$, which can be interpreted as the user’s preferences on item characteristics. Then the indicator becomes a dot product, $s_{ij} = \tilde{x}_i^\top z_j = \sum_{a=1}^{C} \tilde{x}_{i,a} z_{j,a}$.
To further examine the feature functions, let us distinguish dynamic features in the item feature vector as $z_j = [z_{j,s}; z_{j,d}]$, where $z_{j,s}$ denotes static features and $z_{j,d}$ denotes dynamic features that vary along time. The indicator $s_{ij}$ can then be rewritten as follows,
$$s_{ij} = \left[ x_i^\top W_s,\; x_i^\top W_d \right] \begin{bmatrix} z_{j,s} \\ z_{j,d} \end{bmatrix} = \tilde{x}_{i,s}^\top z_{j,s} + \tilde{x}_{i,d}^\top z_{j,d}, \qquad (2)$$
where $W_s$ and $W_d$ denote the columns in $W$ associated with the static and dynamic item features respectively, and $\tilde{x}_{i,s}$ and $\tilde{x}_{i,d}$ denote the $i$-th user’s preferences on the static and dynamic item features respectively.
Note that a user’s score $s_{ij}$ on an item is composed of three parts: $\tilde{x}_{i,s}^\top z_{j,s}$ reflects long-term personal preferences on content features learnt from historical activities; $z_{j,d}$ carries dynamic characteristics, which in our work include temporal popularity over the whole user population, i.e. article quality; and the tradeoff between static personal preferences and article quality is determined by $\tilde{x}_{i,d}$.
On cold-start with new items, the user’s preferences on static item features, $\tilde{x}_{i,s}^\top z_{j,s}$, play an important role, as the dynamic features cannot be accurately estimated at the beginning stage. Similarly, on cold-start with new users, recommendations are fully determined by the users’ preferences on content features $\tilde{x}_i$, which are projected from the user profile $x_i$ (see footnote 3).
As we will show in the following, the projection $W$ can be learnt from the historical interactive feedback.
Footnote 1: Clicks and user purchase history are often considered implicit feedback in other collaborative filtering literature, since these may not reflect real user preferences. For example, a user may find that an article is uninteresting after clicking and reading it. However, we refer to these actions as explicit feedback, since the user intentions behind these actions are clearer than those of other implicit feedback such as linger time and eye movement.
Footnote 2: In practice, we also insert an individual-specific offset for each user. The final scalar is evaluated as
$$s_{ij} = \sum_{a=1}^{C} \sum_{b=1}^{D} x_{i,b}\, z_{j,a}\, w_{ab} + \mu_i,$$
where $\mu_i \in \mathbb{R}$ denotes a user-specific offset. Here $\mu_i$ is used to account for the user’s activity level, since some users are active clickers while others are casual users.
Footnote 3: There is an implicit assumption that the user profile is rich enough to be transformed into preferences on item characteristics. This condition can be easily satisfied in practice.
3.2 Probabilistic Framework
We employ appropriate likelihood functions to relate the indicator $s_{ij}$ to different types of observed interactions.
• Continuous scores with Gaussian measurement noise:
$$p(r_{ij} | s_{ij}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(r_{ij} - s_{ij})^2}{2\sigma^2} \right),$$
where $\sigma$ stands for the noise level (see footnote 4).
• Binary actions with $r_{ij} \in \{-1, 1\}$. The logistic function is widely used as the likelihood function, which is defined as
$$p(r_{ij} | s_{ij}) = \frac{1}{1 + \exp(-r_{ij} s_{ij} + \gamma)},$$
where $\gamma$ denotes a bias term, usually set at 1.
Given the weights $w$, the likelihood of observing the interactive data can be evaluated by
$$p(O | w) = \prod_{ij} p(r_{ij} | s_{ij}), \qquad (3)$$
where the index $ij$ runs over the observational set $O$.
We also specify a standard Gaussian distribution over the weight variables as a prior,
$$p(w) = \frac{1}{\sqrt{2\pi}\,\varsigma} \exp\left( -\frac{\sum_{ab} w_{ab}^2}{2\varsigma^2} \right), \qquad (4)$$
where $\varsigma^2$ is the variance.
Based on Bayes’ theorem, the posterior distribution of $w$ is proportional to the product of the likelihood and the prior,
$$p(w | O) \propto p(O | w)\, p(w), \qquad (5)$$
where $p(w)$ is the prior distribution defined as in eq(4) and $p(O | w)$ is the likelihood defined as in eq(3).
Footnote 4: In practice, the noise level could be prefixed at an appropriate value based on the signal/noise ratio.
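As a rough sketch of eqs (3) to (5), the following code evaluates the two log-likelihoods and the unnormalized log posterior for a small weight matrix. The nested-list storage `W[b][a]` for $w_{ab}$, the function names and the default `gamma=0.0` are assumptions made for this illustration (the text above states that the bias is usually set at 1).

```python
import math

def indicator(W, x, z):
    """Bilinear indicator s = x^T W z, with W stored as D rows of C weights."""
    return sum(x[b] * W[b][a] * z[a]
               for b in range(len(x)) for a in range(len(z)))

def log_lik_gaussian(r, s, sigma=1.0):
    """Log of the Gaussian likelihood of a continuous score r given s."""
    return (-((r - s) ** 2) / (2.0 * sigma ** 2)
            - 0.5 * math.log(2.0 * math.pi * sigma ** 2))

def log_lik_logistic(r, s, gamma=0.0):
    """Log of the logistic likelihood of a binary action r in {-1, +1}."""
    t = -r * s + gamma
    # -log(1 + exp(t)), written in a numerically stable way.
    return -(max(t, 0.0) + math.log1p(math.exp(-abs(t))))

def log_posterior(W, observations, log_lik, prior_var=1.0):
    """log p(O|w) + log p(w), dropping the constant normalizer of the prior."""
    log_prior = -sum(w * w for row in W for w in row) / (2.0 * prior_var)
    log_likelihood = sum(log_lik(r, indicator(W, x, z))
                         for x, z, r in observations)
    return log_prior + log_likelihood

W0 = [[0.0, 0.0], [0.0, 0.0]]
obs = [([1.0, 0.0], [1.0, 0.0], 1), ([0.0, 1.0], [1.0, 0.0], -1)]
value_at_zero = log_posterior(W0, obs, log_lik_logistic)
```

With all weights at zero every click is predicted with probability one half, so the value is simply the number of observations times the log of one half.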
3.3 Offline Modeling
In this section, we describe a training algorithm in batch mode to estimate the posterior distribution of the weight coefficients $p(w|O)$ as in eq(5). For continuous scores with Gaussian noise, the posterior distribution is still a Gaussian due to the conjugate property. With non-Gaussian likelihood functions, the posterior distribution becomes non-Gaussian. However, we can always approximate the true distribution by a Gaussian distribution. One of the most popular techniques is the Laplace approximation [26], which finds the mode of the true posterior as the approximate mean and approximates the inverse covariance matrix by the Hessian matrix, i.e. the second-order derivatives with respect to the weights at the mode.
The mode, also known as the maximum-a-posteriori (MAP) estimate, can be found by maximizing the joint probability $p(O|w)\,p(w)$. The optimization problem is equivalent to minimizing the negative logarithm of the joint probability,
$$\min_w L(w) = \frac{1}{2\varsigma^2} \sum_{ab} w_{ab}^2 - \sum_{ij} \log p(r_{ij} | s_{ij}), \qquad (6)$$
where $\varsigma^2$ plays the role of a tradeoff parameter. The gradient with respect to $w_{ab}$ can be computed as follows,
$$\frac{\partial L(w)}{\partial w_{ab}} = \frac{w_{ab}}{\varsigma^2} - \sum_{ij} \frac{\partial \log p(r_{ij} | s_{ij})}{\partial s_{ij}}\, x_{i,b}\, z_{j,a}, \qquad (7)$$
and gradient-descent packages can then be employed to find the minimum. Note that the objective functional is convex and the minimum is unique. The detailed formulations are given in Table 1 and the gradient-descent algorithm is summarized in Table 2. Each objective/gradient evaluation costs $O(NCD)$, where $CD$ is the size of $w$ and $N$ is the size of the observed set $O$. Note that matrix inversion can be applied directly to the case of continuous targets to obtain a solution, but the computational cost is $O(NC^2D^2 + C^3D^3)$, which is very expensive for cases with a large number of features.
Table 1: The logarithm likelihood functions and the first-order derivatives.
Target | $\log p(r_{ij}|s_{ij})$ | $\partial \log p(r_{ij}|s_{ij}) / \partial s_{ij}$
Continuous | $-\frac{(r_{ij}-s_{ij})^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)$ | $\frac{r_{ij}-s_{ij}}{\sigma^2}$
Binary | $-\log(1+\exp(-r_{ij}s_{ij}+\gamma))$ | $r_{ij}\, p(-r_{ij}|s_{ij})$
Table 2: The gradient-descent algorithm for MAP.
1. Initialize $w = 0$, given $\sigma^2$ and $\varsigma^2$.
2. While objective/gradient evaluation at $w$ is requested:
   Compute the objective as in eq(6);
   Compute the gradients for $w$ as in eq(7);
   Return the objective/gradients to the package.
3. Until the optimization package returns the final $w$.
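The procedure of Table 2 can be sketched for binary targets as below. This is an illustration under stated assumptions rather than the authors’ implementation: a fixed-step gradient descent stands in for the optimization package, `gamma` defaults to 0, the weights are stored as `W[b][a]`, and the step size, iteration count and toy data are hypothetical.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def indicator(W, x, z):
    """Bilinear indicator s = x^T W z, with W stored as D rows of C weights."""
    return sum(x[b] * W[b][a] * z[a]
               for b in range(len(x)) for a in range(len(z)))

def objective_and_gradient(W, data, prior_var=1.0, gamma=0.0):
    """Objective of eq (6) and gradient of eq (7) for the logistic likelihood."""
    D, C = len(W), len(W[0])
    loss = sum(w * w for row in W for w in row) / (2.0 * prior_var)
    grad = [[W[b][a] / prior_var for a in range(C)] for b in range(D)]
    for x, z, r in data:
        t = -r * indicator(W, x, z) + gamma
        loss += max(t, 0.0) + math.log1p(math.exp(-abs(t)))  # -log p(r|s)
        # d log p / d s = r * (1 - p(r|s)); this equals the Table 1 entry
        # r * p(-r|s) when gamma = 0.
        dlogp_ds = r * sigmoid(t)
        for b in range(D):
            for a in range(C):
                grad[b][a] -= dlogp_ds * x[b] * z[a]
    return loss, grad

def fit_map(data, D, C, prior_var=1.0, step=0.1, iters=500):
    """Plain gradient descent from w = 0 (step 1 of Table 2)."""
    W = [[0.0] * C for _ in range(D)]
    for _ in range(iters):
        _, grad = objective_and_gradient(W, data, prior_var)
        for b in range(D):
            for a in range(C):
                W[b][a] -= step * grad[b][a]
    return W

# Toy data: each user type clicks (+1) on the matching item type only.
data = [([1.0, 0.0], [1.0, 0.0], 1), ([1.0, 0.0], [0.0, 1.0], -1),
        ([0.0, 1.0], [1.0, 0.0], -1), ([0.0, 1.0], [0.0, 1.0], 1)]
loss_at_zero, _ = objective_and_gradient([[0.0, 0.0], [0.0, 0.0]], data)
W_map = fit_map(data, D=2, C=2)
loss_at_map, _ = objective_and_gradient(W_map, data)
```

On this toy set the learnt weights are positive for matching user/item feature pairs and negative otherwise, and the Gaussian prior keeps them bounded even though the data are perfectly separable.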
3.4 Prediction
The MAP estimate, denoted as $w^{\mathrm{MAP}}$, is then applied to new user/item pairs for prediction. For any pair of $x_i$ and $z_j$ in test, the best guess of the indicator $s_{ij}$ is determined as follows,
$$\hat{s}_{ij} = \sum_{a=1}^{C} \sum_{b=1}^{D} x_{i,b}\, z_{j,a}\, w^{\mathrm{MAP}}_{ab}, \qquad (8)$$
where $w^{\mathrm{MAP}}_{ab}$ is an entry of the MAP estimate $w^{\mathrm{MAP}}$.
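A short sketch of eq (8) used for ranking: it scores candidate items for one user and also checks that the double sum agrees with the Kronecker form $w^\top (z_j \otimes x_i)$ from Section 3.1. The weight values, feature vectors and item names are made up for the example, and `W[b][a]` is an assumed storage layout for $w_{ab}$.

```python
def score(W, x, z):
    """Eq (8): sum over a and b of x_b * z_a * w_ab, with W[b][a] = w_ab."""
    return sum(x[b] * z[a] * W[b][a]
               for a in range(len(z)) for b in range(len(x)))

def score_kronecker(W, x, z):
    """Same value via w^T (z ⊗ x), flattening w in the matching order."""
    D, C = len(x), len(z)
    w = [W[b][a] for a in range(C) for b in range(D)]
    kron = [z[a] * x[b] for a in range(C) for b in range(D)]
    return sum(wi * ki for wi, ki in zip(w, kron))

def rank_items(W, x, items):
    """Return item names sorted by decreasing predicted indicator."""
    return sorted(items, key=lambda name: score(W, x, items[name]), reverse=True)

# Hypothetical 2 x 3 weights: two user features, two static item features
# and one dynamic item feature (e.g. a current CTR) in the last column.
W = [[0.5, -0.2, 1.0],
     [-0.3, 0.8, 1.0]]
new_user = [1.0, 0.0]   # profile features only, no click history needed
items = {"sports": [1.0, 0.0, 0.1], "finance": [0.0, 1.0, 0.4]}
ranking = rank_items(W, new_user, items)
```

Because the score depends only on feature vectors, the same call works for a user or an item that never appeared in training, which is the cold-start property the model relies on.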
3.5 Discussions
In this section, we discuss model selection and some potentials of the framework we proposed, such as online learning and active learning.
3.5.1 Model Selection
The prior variance $\varsigma^2$ is an important model parameter in the regression framework. The most common approach in practice to determine the best model setting is cross validation. In k-fold cross validation, the original training data is randomly partitioned into several folds, whereas in our application, which has time series of dynamic features, we have to split the training data at a temporal point into two folds, usually with size ratio 2:1. Given a particular set of model parameters, we run the training algorithm on the fold of earlier data to estimate the weight coefficients, and test the resulting model on the left-out fold to obtain the validation error. The predictive performance indicates the goodness of the model parameter setting. We run a grid search over a set of parameter values to find the optimal one, on which we observe the best performance on the validation data. The optimal weight coefficients in the regression model are finally obtained by training on the whole training data set using the best set of model parameters.
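The temporal split and grid search described above might look as follows. The record layout, the helper names and the toy "model" (a shrunken mean standing in for the bilinear regression) are assumptions chosen only to keep the example self-contained.

```python
def temporal_split(records, train_fraction=2.0 / 3.0):
    """Split time-stamped records into an earlier and a later fold (about 2:1)."""
    ordered = sorted(records, key=lambda rec: rec["time"])
    cut = int(round(len(ordered) * train_fraction))
    return ordered[:cut], ordered[cut:]

def grid_search(records, candidates, train_fn, error_fn):
    """Pick the candidate prior variance with the lowest validation error.

    train_fn(train, prior_var) -> model and error_fn(model, valid) -> float
    are caller-supplied placeholders, not functions from the paper.
    """
    train, valid = temporal_split(records)
    best_var, best_err = None, float("inf")
    for prior_var in candidates:
        err = error_fn(train_fn(train, prior_var), valid)
        if err < best_err:
            best_var, best_err = prior_var, err
    return best_var, best_err

# Toy stand-in: the "model" is a mean shrunk toward zero by the prior variance.
def train_fn(train, prior_var):
    mean = sum(rec["y"] for rec in train) / len(train)
    return mean * prior_var / (prior_var + 1.0)

def error_fn(model, valid):
    return sum((rec["y"] - model) ** 2 for rec in valid) / len(valid)

records = [{"time": t, "y": 1.0} for t in range(9)]
train, valid = temporal_split(records)
best_var, best_err = grid_search(records, [0.1, 1.0, 10.0], train_fn, error_fn)
```

After the best value is chosen, the final model would be retrained on the full training set with that value, as the paragraph above describes.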
3.5.2 Online Learning and Active Learning
In this work we focus only on training an offline model coupled with dynamic features, whereas the probabilistic framework we employ provides the capacity for online learning as well. Assumed-density filtering (ADF) is a one-pass, sequential method for computing an approximate posterior distribution [17]. In ADF, observations are processed one by one, updating the posterior distribution, which is usually approximated as a Gaussian, before processing the next observation. The approximate posterior is found by minimizing KL-divergence to preserve a specific set of posterior expectations. More recently, Expectation Propagation [29] extends ADF to incorporate iterative refinement of the approximations; it makes additional passes over the observations and does not require them to be processed in order of arrival, as in time series.
Learning could be made more efficient if we can actively select salient data points. Within the probabilistic regression framework, the expected informativeness of a new observation can be measured by the change in entropy of the posterior distribution of the weight coefficients after inclusion of the candidate [24]. The new posterior distribution with the inclusion of the unused sample can be approximated as a Gaussian by ADF-like online learning algorithms. Based on information-theoretical principles, the entropy gain on the posterior distribution of weight variables can then be applied as the criterion for candidate selection.
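For the special case of continuous scores with Gaussian noise, the entropy change has a closed form: if the current posterior over the weights is Gaussian with covariance $\Sigma$ and a candidate has feature vector $\phi = z_j \otimes x_i$, the entropy drops by $\frac{1}{2}\log(1 + \phi^\top \Sigma \phi / \sigma^2)$. The sketch below uses that standard linear-Gaussian result to rank candidates; it is an illustration of the selection criterion, not a procedure given in the paper, and the covariance values and candidate names are hypothetical.

```python
import math

def kron(z, x):
    """Kronecker product z ⊗ x as a flat list of entries z_a * x_b."""
    return [za * xb for za in z for xb in x]

def entropy_gain(cov, phi, noise_var=1.0):
    """Entropy reduction of a Gaussian posterior after one continuous
    observation with feature vector phi (Gaussian likelihood only)."""
    n = len(phi)
    quad = sum(phi[i] * cov[i][j] * phi[j] for i in range(n) for j in range(n))
    return 0.5 * math.log(1.0 + quad / noise_var)

def pick_most_informative(cov, candidates, noise_var=1.0):
    """Return the candidate name with the largest expected entropy reduction."""
    return max(candidates,
               key=lambda name: entropy_gain(cov, candidates[name], noise_var))

# Two weights: the first is still uncertain, the second is well determined.
cov = [[1.0, 0.0],
       [0.0, 0.01]]
x = [1.0]  # a single user feature, so z ⊗ x has the same length as z
candidates = {"unexplored": kron([1.0, 0.0], x), "explored": kron([0.0, 1.0], x)}
choice = pick_most_informative(cov, candidates)
```

For the logistic likelihood the posterior is not Gaussian, so the same criterion would need an approximate posterior, for instance from the Laplace approximation or an ADF-style update mentioned above.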
4. RELATED WORK
Our work is closely related to adaptive news systems, one of the most popular types of personalized Web-based service [6]. The most relevant previous work to our study is the Google News recommender system [15], a content-agnostic system which combines three different algorithms using a linear model to generate recommendations in the news domain. However, since the approach proposed there is pure collaborative filtering, it does not solve the cold-start problem for new users. Even though ratings from new users can be updated in near real-time by gridifying their algorithm, it still needs to wait until new users provide ratings or clicks before making recommendations. Also, the reported results are based on two heavy-user data sets (top 5K heavy users with 370K clicks and 500K users with 10M clicks), where effects of new and casual users haven’t been considered. In our application of the Today-Module on Yahoo! Front Page, 40% of clickers are new clickers with no historical clicks, 82% of clickers have at most 5 historical clicks, and 92% of clickers have no more than 10 historical clicks, as shown in Figure 3(b). Another key difference is that Google News [15] is a content-agnostic system which doesn’t resort to either content features or user information. YourNews [2] allows users to customize their interest profiles through a user model interface. The study on user behavior shows the benefit from customization but also cautions about the downside on system performance. In our application, we build up user and content profiles without any solicitation from users. Newsjunkie [18] provided personalized news feeds for users by measuring news novelty in the context of stories the users have already read. Our content profiles can also maintain dynamic features in addition to context novelty, such as popularity and freshness. Our model also leverages user profiles to facilitate cold-start on new users.
Our work is also related to personalized search, though the tasks are quite different. Micarelli et al. [28] gave a nice review of this direction. Personalized search builds models of short-term and long-term user needs based on observed user actions, which is able to satisfy the users better than standard search engines based on traditional Information Retrieval (IR) techniques. Speretta and Gauch [38] devel-
