Item-based Collaborative Filtering Recommendation Algorithms

Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl
GroupLens Research Group/Army HPC Research Center
Department of Computer Science and Engineering
University of Minnesota, Minneapolis, MN 55455
{sarwar,karypis,konstan,riedl}@cs.umn.edu
Appears in WWW10, May 1-5, 2001, Hong Kong.
Abstract
Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information, products or services during a live interaction. These systems, especially the k-nearest neighbor collaborative filtering based ones, are achieving widespread success on the Web. The tremendous growth in the amount of available information and the number of visitors to Web sites in recent years poses some key challenges for recommender systems. These are: producing high quality recommendations, performing many recommendations per second for millions of users and items, and achieving high coverage in the face of data sparsity. In traditional collaborative filtering systems the amount of work increases with the number of participants in the system. New recommender system technologies are needed that can quickly produce high quality recommendations, even for very large-scale problems. To address these issues we have explored item-based collaborative filtering techniques. Item-based techniques first analyze the user-item matrix to identify relationships between different items, and then use these relationships to indirectly compute recommendations for users.
In this paper we analyze different item-based recommendation generation algorithms. We look into different techniques for computing item-item similarities (e.g., item-item correlation vs. cosine similarities between item vectors) and different techniques for obtaining recommendations from them (e.g., weighted sum vs. regression model).
Finally, we experimentally evaluate our results and compare them to the basic k-nearest neighbor approach. Our experiments suggest that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.
1 Introduction
The amount of information in the world is increasing far more quickly than our ability to process it. All of us have known the feeling of being overwhelmed by the number of new books, journal articles, and conference proceedings coming out each year. Technology has dramatically reduced the barriers to publishing and distributing information. Now it is time to create the technologies that can help us sift through all the available information to find that which is most valuable to us.
One of the most promising such technologies is collaborative filtering [19, 27, 14, 16]. Collaborative filtering works by building a database of preferences for items by users. A new user, Neo, is matched against the database to discover neighbors, which are other users who have historically had similar taste to Neo. Items that the neighbors like are then recommended to Neo, as he will probably also like them. Collaborative filtering has been very successful in both research and practice, and in both information filtering applications and E-commerce applications. However, there remain important research questions in overcoming two fundamental challenges for collaborative filtering recommender systems.
The first challenge is to improve the scalability of the collaborative filtering algorithms. These algorithms are able to search tens of thousands of potential neighbors in real-time, but the demands of modern systems are to search tens of millions of potential neighbors. Further, existing algorithms have performance problems with individual users for whom the site has large amounts of information. For instance, if a site is using browsing patterns as indications of content preference, it may have thousands of data points for its most frequent visitors. These "long user rows" reduce the number of neighbors that can be searched per second, further reducing scalability.
The second challenge is to improve the quality of the recommendations for the users. Users need recommendations they can trust to help them find items they will like. Users will "vote with their feet" by refusing to use recommender systems that are not consistently accurate for them.
In some ways these two challenges are in conflict, since the less time an algorithm spends searching for neighbors, the more scalable it will be, and the worse its quality. For this reason, it is important to treat the two challenges simultaneously so the solutions discovered are both useful and practical.
In this paper, we address these issues of recommender systems by applying a different approach: item-based algorithms. The bottleneck in conventional collaborative filtering algorithms is the search for neighbors among a large user population of potential neighbors [12]. Item-based algorithms avoid this bottleneck by exploring the relationships between items first, rather than the relationships between users. Recommendations for users are computed by finding items that are similar to other items the user has liked. Because the relationships between items are relatively static, item-based algorithms may be able to provide the same quality as the user-based algorithms with less online computation.
1.1 Related Work
In this section we briefly present some of the research literature related to collaborative filtering, recommender systems, data mining and personalization.
Tapestry [10] is one of the earliest implementations of collaborative filtering-based recommender systems. This system relied on the explicit opinions of people from a close-knit community, such as an office workgroup. However, recommender systems for large communities cannot depend on each person knowing the others. Later, several ratings-based automated recommender systems were developed. The GroupLens research system [19, 16] provides a pseudonymous collaborative filtering solution for Usenet news and movies. Ringo [27] and Video Recommender [14] are email and web-based systems that generate recommendations on music and movies respectively. A special issue of Communications of the ACM [20] presents a number of different recommender systems.
Other technologies have also been applied to recommender systems, including Bayesian networks, clustering, and Horting. Bayesian networks create a model based on a training set with a decision tree at each node and edges representing user information. The model can be built off-line over a matter of hours or days. The resulting model is very small, very fast, and essentially as accurate as nearest neighbor methods [6]. Bayesian networks may prove practical for environments in which knowledge of user preferences changes slowly with respect to the time needed to build the model, but they are not suitable for environments in which user preference models must be updated rapidly or frequently.
Clustering techniques work by identifying groups of users who appear to have similar preferences. Once the clusters are created, predictions for an individual can be made by averaging the opinions of the other users in that cluster. Some clustering techniques represent each user with partial participation in several clusters. The prediction is then an average across the clusters, weighted by degree of participation. Clustering techniques usually produce less-personal recommendations than other methods, and in some cases, the clusters have worse accuracy than nearest neighbor algorithms [6]. Once the clustering is complete, however, performance can be very good, since the size of the group that must be analyzed is much smaller. Clustering techniques can also be applied as a "first step" for shrinking the candidate set in a nearest neighbor algorithm or for distributing nearest-neighbor computation across several recommender engines. While dividing the population into clusters may hurt the accuracy of recommendations to users near the fringes of their assigned cluster, pre-clustering may be a worthwhile trade-off between accuracy and throughput.
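The weighted-average prediction for partial cluster membership can be sketched in Python as follows. This is a minimal illustration, not the method of any cited system: the cluster assignments and participation weights are hypothetical inputs (in practice they would come from a clustering step), and skipping clusters in which nobody rated the item, then renormalizing the remaining weights, is an assumption of this sketch.

```python
from typing import Dict, List, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cluster_average(ratings: Ratings, members: List[str], item: str) -> Optional[float]:
    """Mean rating of `item` among the cluster members who rated it (None if nobody did)."""
    values = [ratings[u][item] for u in members if item in ratings[u]]
    return sum(values) / len(values) if values else None


def predict_soft_cluster(
    ratings: Ratings,
    clusters: Dict[str, List[str]],
    participation: Dict[str, float],
    item: str,
) -> Optional[float]:
    """Average the per-cluster means, weighted by the user's degree of participation.

    Assumption: clusters with no rating for `item` are skipped and the
    remaining participation weights are renormalized.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for cluster_id, weight in participation.items():
        avg = cluster_average(ratings, clusters[cluster_id], item)
        if avg is not None and weight > 0:
            weighted_sum += weight * avg
            total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None


ratings = {"alice": {"book": 5.0}, "bob": {"book": 4.0}, "carol": {"book": 2.0}, "dave": {}}
clusters = {"c1": ["alice", "bob"], "c2": ["carol"]}
# Hypothetical membership: dave participates 75% in c1 and 25% in c2.
print(predict_soft_cluster(ratings, clusters, {"c1": 0.75, "c2": 0.25}, "book"))  # 3.875
```

Here c1 averages 4.5 and c2 averages 2.0, so the prediction is 0.75 × 4.5 + 0.25 × 2.0 = 3.875.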
Horting is a graph-based technique in which nodes are users, and edges between nodes indicate the degree of similarity between two users [1]. Predictions are produced by walking the graph to nearby nodes and combining the opinions of the nearby users. Horting differs from nearest neighbor in that the graph may be walked through other users who have not rated the item in question, thus exploring transitive relationships that nearest neighbor algorithms do not consider. In one study using synthetic data, Horting produced better predictions than a nearest neighbor algorithm [1].
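The transitive walk that distinguishes Horting from nearest neighbor can be illustrated with a short Python sketch. It is deliberately simplified and is not the Horting algorithm of [1], which defines its own graph relations and prediction rules. Here the edge weights are hypothetical similarity scores in (0, 1], each reached user's opinion is weighted by the product of edge weights along the breadth-first path that first reached it, and `max_hops` is an illustrative parameter.

```python
from collections import deque
from typing import Dict, Optional

Graph = Dict[str, Dict[str, float]]    # user -> {neighbor: similarity in (0, 1]}
Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def predict_by_graph_walk(
    graph: Graph, ratings: Ratings, start: str, item: str, max_hops: int = 2
) -> Optional[float]:
    """Breadth-first walk from `start`. Every user reached within `max_hops`
    who rated `item` contributes that rating, weighted by the product of the
    edge similarities on the path that first reached them. Users who did not
    rate the item are still walked through (the transitive step)."""
    path_weight = {start: 1.0}
    frontier = deque([(start, 0)])
    weighted_sum = 0.0
    total_weight = 0.0
    while frontier:
        user, hops = frontier.popleft()
        if user != start and item in ratings.get(user, {}):
            weighted_sum += path_weight[user] * ratings[user][item]
            total_weight += path_weight[user]
        if hops == max_hops:
            continue
        for neighbor, sim in graph.get(user, {}).items():
            if neighbor not in path_weight:
                path_weight[neighbor] = path_weight[user] * sim
                frontier.append((neighbor, hops + 1))
    return weighted_sum / total_weight if total_weight > 0 else None


# a -- b -- c : only c has rated the movie; b has not.
graph = {"a": {"b": 0.8}, "b": {"a": 0.8, "c": 0.5}, "c": {"b": 0.5}}
ratings = {"a": {}, "b": {}, "c": {"movie": 4.0}}
print(predict_by_graph_walk(graph, ratings, "a", "movie"))              # 4.0, reached via b
print(predict_by_graph_walk(graph, ratings, "a", "movie", max_hops=1))  # None
```

A nearest-neighbor method restricted to a's direct neighbors would find no opinion here, since b never rated the movie; the two-hop walk reaches c through b.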
Schafer et al. [26] present a detailed taxonomy and examples of recommender systems used in E-commerce, and show how they can provide one-to-one personalization and at the same time capture customer loyalty. Although these systems have been successful in the past, their widespread use has exposed some of their limitations, such as the problems of sparsity in the data set, problems associated with high dimensionality and so on. The sparsity problem in recommender systems has been addressed in [23, 11]. The problems associated with high dimensionality in recommender systems have been discussed in [4], and the application of dimensionality reduction techniques to address these issues has been investigated in [24].
Our work explores the extent to which item-based recommenders, a new class of recommender algorithms, are able to solve these problems.
1.2 Contributions
This paper has three primary research contributions:
1. Analysis of the item-based prediction algorithms and identification of different ways to implement their subtasks.
2. Formulation of a precomputed model of item similarity to increase the online scalability of item-based recommendations.
3. An experimental comparison of the quality of several different item-based algorithms to the classic user-based (nearest neighbor) algorithms.
1.3 Organization
The rest of the paper is organized as follows. The next section provides a brief background in collaborative filtering algorithms. We first formally describe the collaborative filtering process and then discuss its two variants, the memory-based and model-based approaches. We then present some challenges associated with the memory-based approach. In Section 3, we present the item-based approach and describe different sub-tasks of the algorithm in detail. Section 4 describes our experimental work. It provides details of our data sets, evaluation metrics, methodology and results of different experiments, and a discussion of the results. The final section provides some concluding remarks and directions for future research.
2 Collaborative Filtering Based Recommender Systems
Recommender systems apply data analysis techniques to the problem of helping users find the items they would like to purchase at E-Commerce sites by producing a predicted likeliness score or a list of top-N recommended items for a given user. Item recommendations can be made using different methods. Recommendations can be based on demographics of the users, overall top selling items, or past buying habits of users as a predictor of future items. Collaborative Filtering (CF) [19, 27] is the most successful recommendation technique to date. The basic idea of CF-based algorithms is to provide item recommendations or predictions based on the opinions of other like-minded users. The opinions of users can be obtained explicitly from the users or by using some implicit measures.
Figure 1: The Collaborative Filtering Process. (Diagram: the input is the ratings table of users $u_1, \ldots, u_m$ by items $i_1, \ldots, i_n$, with the active user $u_a$ and the item $i_j$ for which a prediction is sought; the CF algorithm's output interface is either $P_{a,j}$, the prediction on item j for the active user, or $\{T_{i1}, T_{i2}, \ldots, T_{iN}\}$, a Top-N list of items for the active user.)
2.0.1 Overview of the Collaborative Filtering Process
The goal of a collaborative filtering algorithm is to suggest new items or to predict the utility of a certain item for a particular user based on the user's previous likings and the opinions of other like-minded users. In a typical CF scenario, there is a list of m users $\mathcal{U} = \{u_1, u_2, \ldots, u_m\}$ and a list of n items $\mathcal{I} = \{i_1, i_2, \ldots, i_n\}$. Each user $u_i$ has a list of items $I_{u_i}$ about which the user has expressed his/her opinions. Opinions can be given explicitly by the user as a rating score, generally within a certain numerical scale, or can be derived implicitly from purchase records, by analyzing timing logs, by mining web hyperlinks and so on [28, 16]. Note that $I_{u_i} \subseteq \mathcal{I}$ and it is possible for $I_{u_i}$ to be a null set. There exists a distinguished user $u_a \in \mathcal{U}$, called the active user, for whom the task of a collaborative filtering algorithm is to find an item likeliness that can be of two forms.
• Prediction is a numerical value, $P_{a,j}$, expressing the predicted likeliness of item $i_j \notin I_{u_a}$ for the active user $u_a$. This predicted value is within the same scale (e.g., from 1 to 5) as the opinion values provided by $u_a$.
• Recommendation is a list of N items, $I_r \subset \mathcal{I}$, that the active user will like the most. Note that the recommended list must consist of items not already purchased by the active user, i.e., $I_r \cap I_{u_a} = \emptyset$. This interface of CF algorithms is also known as Top-N recommendation.
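The difference between the two output forms can be sketched in Python. Given predicted scores $P_{a,j}$ from any CF algorithm (here a hypothetical, hard-coded dictionary), a Top-N list keeps the N highest-scoring items while enforcing $I_r \cap I_{u_a} = \emptyset$. Breaking ties by item id is an assumption made only to keep the output deterministic.

```python
from typing import Dict, List, Set


def top_n(predicted: Dict[str, float], already_rated: Set[str], n: int) -> List[str]:
    """Return the N items with the highest predicted score P_{a,j},
    skipping items in I_{u_a} (already rated or purchased by the active user)."""
    candidates = [item for item in predicted if item not in already_rated]
    # Highest score first; ties broken by item id so the result is deterministic.
    candidates.sort(key=lambda item: (-predicted[item], item))
    return candidates[:n]


predicted = {"i1": 4.2, "i2": 3.1, "i3": 4.8, "i4": 2.0}  # hypothetical P_{a,j} values
print(top_n(predicted, already_rated={"i3"}, n=2))  # ['i1', 'i2']
```

Item i3 has the highest score but is excluded because the active user already has it.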
Figure 1 shows the schematic diagram of the collaborative filtering process. CF algorithms represent the entire m × n user-item data as a ratings matrix, $\mathcal{A}$. Each entry $a_{i,j}$ in $\mathcal{A}$ represents the preference score (rating) of the i-th user on the j-th item. Each individual rating is within a numerical scale, and it can also be 0, indicating that the user has not yet rated that item. Researchers have devised a number of collaborative filtering algorithms that can be divided into two main categories: Memory-based (user-based) and Model-based (item-based) algorithms [6]. In this section we provide a detailed analysis of CF-based recommender system algorithms.
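As a concrete illustration of the ratings matrix, the Python sketch below builds $\mathcal{A}$ from (user, item, rating) triples, using 0 for unrated entries as described above. The triple input format and the helper name `build_ratings_matrix` are assumptions for illustration; note that 0 works as the "not rated" marker only when real ratings start above 0 (for example, on a 1 to 5 scale).

```python
from typing import Dict, List, Sequence, Tuple


def build_ratings_matrix(
    triples: Sequence[Tuple[str, str, float]],
) -> Tuple[List[List[float]], Dict[str, int], Dict[str, int]]:
    """Build the m x n user-item matrix A from (user, item, rating) triples.

    A[i][j] holds user i's rating of item j; 0.0 means "not yet rated".
    Returns the matrix plus the user-to-row and item-to-column index maps.
    """
    users: Dict[str, int] = {}
    items: Dict[str, int] = {}
    for user, item, _ in triples:
        users.setdefault(user, len(users))  # assign rows in order of first appearance
        items.setdefault(item, len(items))  # assign columns likewise
    matrix = [[0.0] * len(items) for _ in range(len(users))]
    for user, item, rating in triples:
        matrix[users[user]][items[item]] = rating
    return matrix, users, items


A, user_index, item_index = build_ratings_matrix(
    [("u1", "i1", 5.0), ("u2", "i2", 4.0), ("u1", "i3", 3.0)]
)
print(A)  # [[5.0, 0.0, 3.0], [0.0, 4.0, 0.0]]
```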
Memory-based Collaborative Filtering Algorithms
Memory-based algorithms utilize the entire user-item database to generate a prediction. These systems employ statistical techniques to find a set of users, known as neighbors, that have a history of agreeing with the target user (i.e., they either rate different items similarly or they tend to buy similar sets of items). Once a neighborhood of users is formed, these systems use different algorithms to combine the preferences of neighbors to produce a prediction or top-N recommendation for the active user. These techniques, also known as nearest-neighbor or user-based collaborative filtering, are more popular and widely used in practice.
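A compact user-based (nearest-neighbor) predictor can be sketched in Python as below. The specific choices are assumptions, not the only options: Pearson correlation over co-rated items as the measure of agreement, only positively correlated users kept as neighbors, and a similarity-weighted average of the k neighbors' ratings as the combination step (mean-centered variants are also common). The function names are hypothetical.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def pearson_users(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation between two users over the items both have rated."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * math.sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den > 0 else 0.0


def predict_user_based(ratings: Ratings, active: str, item: str, k: int = 2) -> Optional[float]:
    """Similarity-weighted average of the ratings of the k most similar users who rated `item`."""
    # Scans every other user on each request: the neighbor search that limits scalability.
    scored = [
        (pearson_users(ratings[active], ratings[other]), other)
        for other in ratings
        if other != active and item in ratings[other]
    ]
    neighbors = sorted([s for s in scored if s[0] > 0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None
    return sum(sim * ratings[other][item] for sim, other in neighbors) / total


ratings = {
    "neo": {"a": 5.0, "b": 3.0, "c": 1.0},
    "u1": {"a": 4.0, "b": 2.0, "c": 1.0, "d": 5.0},
    "u2": {"a": 1.0, "b": 3.0, "c": 5.0, "d": 1.0},
}
print(predict_user_based(ratings, "neo", "d"))
```

In this toy data u2's ratings are exactly reversed relative to Neo's (correlation -1), so only u1 is kept as a neighbor and the prediction for d equals u1's rating of 5.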
Model-based Collaborative Filtering Algorithms
Model-based collaborative filtering algorithms provide item recommendations by first developing a model of user ratings. Algorithms in this category take a probabilistic approach and envision the collaborative filtering process as computing the expected value of a user prediction, given his/her ratings on other items. The model building process is performed by different machine learning algorithms such as Bayesian networks, clustering, and rule-based approaches. The Bayesian network model [6] formulates a probabilistic model for the collaborative filtering problem. The clustering model treats collaborative filtering as a classification problem [2, 6, 29] and works by clustering similar users in the same class and estimating the probability that a particular user is in a particular class C, and from there computes the conditional probability of ratings. The rule-based approach applies association rule discovery algorithms to find associations between co-purchased items and then generates item recommendations based on the strength of the association between items [25].
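The rule-based idea can be illustrated with pairwise association rules and their confidence. The Python sketch below is a simplification for illustration, not the algorithm of [25]: it uses only single-item rules A → B with confidence = (baskets containing A and B) / (baskets containing A), scores each unpurchased item by its strongest rule from the user's purchases, and the `min_conf` threshold and function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Set, Tuple


def pair_confidence(baskets: List[Set[str]]) -> Dict[Tuple[str, str], float]:
    """confidence(A -> B) = #baskets containing A and B / #baskets containing A."""
    item_count: Counter = Counter()
    pair_count: Counter = Counter()
    for basket in baskets:
        item_count.update(basket)
        # Ordered pairs in both directions, since A -> B and B -> A differ.
        pair_count.update(permutations(sorted(basket), 2))
    return {(a, b): n / item_count[a] for (a, b), n in pair_count.items()}


def recommend_by_rules(
    baskets: List[Set[str]], purchased: Set[str], n: int = 2, min_conf: float = 0.5
) -> List[str]:
    """Score each unpurchased item by its strongest rule from a purchased item."""
    conf = pair_confidence(baskets)
    scores: Dict[str, float] = {}
    for (a, b), c in conf.items():
        if a in purchased and b not in purchased and c >= min_conf:
            scores[b] = max(scores.get(b, 0.0), c)
    # Strongest association first; ties broken by item id.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


baskets = [{"bread", "butter"}, {"bread", "butter", "jam"}, {"bread", "milk"}, {"butter", "jam"}]
print(recommend_by_rules(baskets, {"jam"}))  # ['butter', 'bread']
```

Both baskets with jam also contain butter (confidence 1.0), while only one of them contains bread (confidence 0.5).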
2.0.2 Challenges of User-based Collaborative Filtering Algorithms
User-based collaborative filtering systems have been very successful in the past, but their widespread use has revealed some potential challenges, such as:
• Sparsity. In practice, many commercial recommender systems are used to evaluate large item sets (e.g., sites that recommend books or music albums). In these systems, even active users may have purchased well under 1% of the items (1% of 2 million books is 20,000 books). Accordingly, a recommender system based on nearest neighbor algorithms may be unable to make any item recommendations for a particular user. As a result the accuracy of recommendations may be poor.
• Scalability. Nearest neighbor algorithms require computation that grows with both the number of users and the number of items. With millions of users and items, a typical web-based recommender system running existing algorithms will suffer serious scalability problems.
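The sparsity figures above are simple arithmetic. The Python sketch below reproduces the 1% example and shows one way to measure the density of a ratings table, that is, the fraction of the m × n matrix that holds a rating. The dictionary-of-dictionaries layout and the `density` helper are assumptions for illustration.

```python
from typing import Dict


def density(ratings: Dict[str, Dict[str, float]], n_items: int) -> float:
    """Fraction of the m x n user-item matrix that contains a rating."""
    filled = sum(len(items) for items in ratings.values())
    return filled / (len(ratings) * n_items)


catalog_size = 2_000_000
print(catalog_size // 100)  # 1% of 2 million books: 20000

ratings = {"u1": {"i1": 5.0, "i2": 3.0}, "u2": {"i3": 4.0}}
print(density(ratings, n_items=10))  # 3 ratings in a 2 x 10 matrix: 0.15
```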
The weakness of nearest neighbor algorithms for large, sparse databases led us to explore alternative recommender system algorithms. Our first approach attempted to bridge the sparsity by incorporating semi-intelligent filtering agents into the system [23, 11]. These agents evaluated and rated each item using syntactic features. By providing a dense ratings set, they helped improve coverage and quality. The filtering agent solution, however, did not address the fundamental problem of poor relationships among like-minded but sparse-rating users. To explore that, we took an algorithmic approach and used Latent Semantic Indexing (LSI) to capture the similarity between users and items in a reduced dimensional space [24, 25]. In this paper we look into another technique, the model-based approach, in addressing these challenges, especially the scalability challenge. The main idea here is to analyze the user-item representation matrix to identify relations between different items and then to use these relations to compute the prediction score for a given user-item pair. The intuition behind this approach is that a user would be interested in purchasing items that are similar to the items the user liked earlier and would tend to avoid items that are similar to the items the user didn't like earlier. These techniques don't require identifying the neighborhood of similar users when a recommendation is requested; as a result they tend to produce much faster recommendations. A number of different schemes have been proposed to compute the association between items, ranging from probabilistic approaches [6] to more traditional item-item correlations [15, 13]. We present a detailed analysis of our approach in the next section.
3 Item-based Collaborative Filtering Algorithm
In this section we study a class of item-based recommendation algorithms for producing predictions for users. Unlike the user-based collaborative filtering algorithm discussed in Section 2, the item-based approach looks into the set of items the target user has rated, computes how similar they are to the target item i, and then selects the k most similar items $\{i_1, i_2, \ldots, i_k\}$. At the same time their corresponding similarities $\{s_{i1}, s_{i2}, \ldots, s_{ik}\}$ are also computed. Once the most similar items are found, the prediction is computed by taking a weighted average of the target user's ratings on these similar items. We describe these two aspects, namely the similarity computation and the prediction generation, in detail here.
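The prediction step can be sketched in Python as follows, assuming the item-item similarities have already been computed into a lookup table. The nested-dictionary layout, the helper name, and the sample values are hypothetical. Normalizing by the sum of absolute similarities is an assumption here; with positive similarities it keeps the prediction on the rating scale.

```python
from typing import Dict, Optional


def predict_item_based(
    user_ratings: Dict[str, float],
    similarity: Dict[str, Dict[str, float]],
    target: str,
    k: int = 2,
) -> Optional[float]:
    """Predict the active user's rating of `target` as a weighted average of
    the user's ratings on the k rated items most similar to `target`."""
    sims = similarity.get(target, {})
    # Only items the user has actually rated are candidates.
    rated = [(sims[j], j) for j in user_ratings if j != target and j in sims]
    top_k = sorted(rated, reverse=True)[:k]
    norm = sum(abs(s) for s, _ in top_k)
    if norm == 0:
        return None
    return sum(s * user_ratings[j] for s, j in top_k) / norm


user_ratings = {"a": 5.0, "b": 3.0, "c": 1.0}
similarity = {"t": {"a": 0.9, "b": 0.5, "c": 0.1}}  # hypothetical s_{t,j} values
print(predict_item_based(user_ratings, similarity, "t", k=2))  # (0.9*5 + 0.5*3) / 1.4
```

Because the similarity table does not depend on the active user, it can be built off-line; the online work per prediction is only a lookup and a k-term weighted sum.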
3.1 Item Similarity Computation
One critical step in the item-based collaborative filtering algorithm is to compute the similarity between items and then to select the most similar items. The basic idea in similarity computation between two items i and j is to first isolate the users who have rated both of these items and then to apply a similarity computation technique to determine the similarity $s_{i,j}$. Figure 2 illustrates this process; here the matrix rows represent users and the columns represent items.
Figure 2: Isolation of the co-rated items and similarity computation. (Diagram: a ratings matrix with users 1 to m as rows and items 1 to n as columns; the similarity $s_{i,j}$ between items i and j is computed only from the users who rated both.)
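The isolate-then-compute step can be sketched in Python, with each item stored as one column of the ratings matrix (a mapping from user to rating, so unrated cells are simply absent). Pearson correlation is used as the example similarity technique; taking the item means over the co-rated users only is an assumption of this sketch, and the function names are hypothetical.

```python
import math
from typing import Dict, List

ItemColumn = Dict[str, float]  # user -> rating for one item (one column of the matrix)


def co_rated_users(col_i: ItemColumn, col_j: ItemColumn) -> List[str]:
    """Users who rated both items i and j."""
    return sorted(col_i.keys() & col_j.keys())


def correlation_sim(col_i: ItemColumn, col_j: ItemColumn) -> float:
    """Pearson correlation between items i and j over their co-rated users."""
    users = co_rated_users(col_i, col_j)
    if len(users) < 2:
        return 0.0  # not enough overlap to estimate a correlation
    mean_i = sum(col_i[u] for u in users) / len(users)
    mean_j = sum(col_j[u] for u in users) / len(users)
    num = sum((col_i[u] - mean_i) * (col_j[u] - mean_j) for u in users)
    den = math.sqrt(sum((col_i[u] - mean_i) ** 2 for u in users)) * math.sqrt(
        sum((col_j[u] - mean_j) ** 2 for u in users)
    )
    return num / den if den > 0 else 0.0


item_i = {"u1": 5.0, "u2": 3.0, "u3": 4.0, "u4": 2.0}
item_j = {"u1": 4.0, "u2": 2.0, "u4": 1.0, "u5": 5.0}
print(co_rated_users(item_i, item_j))             # ['u1', 'u2', 'u4']
print(round(correlation_sim(item_i, item_j), 6))  # 1.0
```

Users u3 and u5 rated only one of the two items, so they are dropped; on the remaining users both items deviate from their means identically, giving a correlation of 1.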
There are a number of different ways to compute the similarity between items. Here we present three such methods. These are cosine-based similarity, correlation-based similarity and adjusted-cosine similarity.
3.1.1 Cosine-based Similarity
In this case, two items are thought of as two vectors in the m-dimensional user-space. The similarity between them is measured by computing the cosine of the angle between the two vectors. Formally, in the m × n ratings matrix in Figure 2, the similarity between items i and j, denoted by sim(i, j), is given by
$$\mathrm{sim}(i,j) = \cos(\vec{i},\vec{j}) = \frac{\vec{i}\cdot\vec{j}}{\|\vec{i}\|_2 \, \|\vec{j}\|_2}$$
where "·" denotes the dot product of the two vectors.
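The cosine formula translates directly into Python. In this sketch the two item vectors are columns of the m × n ratings matrix with 0 for unrated cells, so a user who rated only one of the two items adds nothing to the dot product but still contributes to that item's norm. Returning 0 for an all-zero vector is an assumption made to avoid dividing by zero, and `cosine_sim` is a hypothetical name.

```python
import math
from typing import Sequence


def cosine_sim(col_i: Sequence[float], col_j: Sequence[float]) -> float:
    """cos(i, j) = (i . j) / (||i||_2 * ||j||_2) for two item columns of the
    m x n ratings matrix; unrated entries are 0."""
    dot = sum(a * b for a, b in zip(col_i, col_j))
    norm_i = math.sqrt(sum(a * a for a in col_i))
    norm_j = math.sqrt(sum(b * b for b in col_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # an item nobody rated has no direction
    return dot / (norm_i * norm_j)


# Columns of a 4-user ratings matrix (0 = not rated).
item_i = [5.0, 3.0, 0.0, 1.0]
item_j = [4.0, 0.0, 0.0, 1.0]
print(cosine_sim(item_i, item_j))  # equals 21 / sqrt(35 * 17)
```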
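3.1.2 Correlation-based Similarity
In this case, the similarity between two items i and j is measured by computing the Pearson correlation between them. Let U denote the set of users who rated both i and j (the co-rated cases isolated in Figure 2); the correlation similarity is then
$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^2}\,\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^2}}$$
Here $R_{u,i}$ denotes the rating of user u on item i, and $\bar{R}_i$ is the average rating of the i-th item.
3.1.3 Adjusted Cosine Similarity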
One fundamental difference between the similarity computation in user-based CF and item-based CF is that in the case of user-based CF the similarity is computed along the rows of the matrix, but in the case of item-based CF the similarity is computed along the columns, i.e., each pair in the co-rated set corresponds to a different user (Figure 2). Computing similarity using the basic cosine measure in the item-based case has one important drawback: the differences in rating scale between different users are not taken into account. The adjusted cosine similarity offsets this drawback by subtracting
