手机版

推荐系统netflix获奖算法(2)

时间：2025-07-01 来源：未知

小中大

字号：

赢得netflix推荐系统大奖的算法

notdeliverarealcontributionafterbeingincorporatedwithintheoverallblend.

III.BASELINEPREDICTORS

Collaborative lteringmodelstrytocapturetheinteractionsbetweenusersanditemsthatproducethedifferentratingvalues.However,manyoftheobservedratingvaluesareduetoeffectsassociatedwitheitherusersoritems,independentlyoftheirinteraction.AprimeexampleisthattypicalCFdataexhibitlargeuseranditembiases–i.e.,systematictendenciesforsomeuserstogivehigherratingsthanothers,andforsomeitemstoreceivehigherratingsthanothers.

Wewillencapsulatethoseeffects,whichdonotinvolveuser-iteminteraction,withinthebaselinepredictors.Becausethesepredictorstendtocapturemuchoftheobservedsignal,itisvitaltomodelthemaccurately.Thisenablesisolatingthepartofthesignalthattrulyrepresentsuser-iteminteraction,andsubjectingittomoreappropriateuserpreferencemodels.Denotebyµtheoverallaveragerating.Abaselinepredic-tionforanunknownratingruiisdenotedbybuiandaccountsfortheuseranditemeffects:

bui=µ+bu+bi

(1)

Theparametersbuandbiindicatetheobserveddeviationsofuseruanditemi,respectively,fromtheaverage.Forexample,supposethatwewantabaselineestimatefortheratingofthemovieTitanicbyuserJoe.Now,saythattheaverageratingoverallmovies,µ,is3.7stars.Furthermore,Titanicisbetterthananaveragemovie,soittendstoberated0.5starsabovetheaverage.Ontheotherhand,Joeisacriticaluser,whotendstorate0.3starslowerthantheaverage.Thus,thebaselineestimateforTitanic’sratingbyJoewouldbe3.9starsbycalculating3.7 0.3+0.5.

Awaytoestimatetheparametersisbydecouplingthecalculationofthebi’sfromthecalculationofthebu’s.First,foreachitemiweset

bi=∑u∈R(i)(rui µ)λ(i)|.(2)

1+|RThen,foreachuseruweset

bu=

∑i∈R(u)(rui µ bi)

λ+|R(u)|

.

(3)

2Averagesareshrunktowardszerobyusingtheregularizationparameters,λ1,λ2,whicharedeterminedbyvalidationontheProbeset.Wewereusing:λ1=25,λ2=10.Wheneverthisworkreferstobaselinepredictorsestimatedfashion,theyaredenotedbyb

inthisdecoupled

ui.Amoreaccurateestimationofbuandbiwilltreatthemsymmetrically,bysolvingtheleastsquaresproblem

min

b(rui µ bu bi)2+λ3(∑b2u+∑b2

i).

(4)

(u,i∑)∈K

u

i

Hereinafter,b denotesalluseranditembiases(busand

bis).The rstterm∑’sthat(u,i)∈K(rui µ+bu+bi)2strivesto ndbu’sandbiterm,λ3(∑ub2u+∑ib2 tthegivenratings.Theregularizing

i),avoidsover ttingbypenalizingthe

magnitudesoftheparameters.Thisleastsquareproblemcan

2

besolvedfairlyef cientlybythemethodofstochasticgradientdescent.Inpractice,wewereusingmorecomprehensiveversionsof(4),towhichweturnnow.A.Timechangingbaselinepredictors

Muchofthetemporalvariabilityinthedataisincludedwithinthebaselinepredictors,throughtwomajortemporaleffects.The rstaddressesthefactthatanitem’spopularitymaychangeovertime.Forexample,moviescangoinandoutofpopularityastriggeredbyexternaleventssuchastheappearanceofanactorinanewmovie.Thisismanifestedinourmodelsbytreatingtheitembiasbiasafunctionoftime.Thesecondmajortemporaleffectallowsuserstochangetheirbaselineratingsovertime.Forexample,auserwhotendedtorateanaveragemovie“4stars”,maynowratesuchamovie“3stars”.Thismayre ectseveralfactorsincludinganaturaldriftinauser’sratingscale,thefactthatratingsaregiveninthecontextofotherratingsthatweregivenrecentlyandalsothefactthattheidentityoftheraterwithinahouseholdcanchangeovertime.Hence,inourmodelswetaketheparameterbuasafunctionoftime.Thisinducesatemplateforatimesensitivebaselinepredictorforu’sratingofiatdaytui:

bui=µ+bu(tui)+bi(tui)

(5)

Here,bu(·)andbi(·)arerealvaluedfunctionsthatchangeovertime.Theexactwaytobuildthesefunctionsshouldre ectareasonablewaytoparameterizetheinvolvingtemporalchanges.

Amajordistinctionisbetweentemporaleffectsthatspanextendedperiodsoftimeandmoretransienteffects.Wedonotexpectmovielikeabilityto uctuateonadailybasis,butrathertochangeovermoreextendedperiods.Ontheotherhand,weobservethatusereffectscanchangeonadailybasis,re ectinginconsistenciesnaturaltocustomerbehavior.Thisrequires nertimeresolutionwhenmodelinguser-biasescomparedwithalowerresolutionthatsuf cesforcapturingitem-relatedtimeeffects.

Westartwithourchoiceoftime-changingitembiasesbi(t).Wefounditadequatetosplittheitembiasesintotime-basedbins,usingaconstantitembiasforeachtimeperiod.Thedecisionofhowtosplitthetimelineintobinsshouldbalancethedesiretoachieve nerresolution(hence,smallerbins)withtheneedforenoughratingsperbin(hence,largerbins).Infact,thereisawidevarietyofbinsizesthatyieldaboutthesameaccuracy.Inourimplementation,eachbincorrespondstoroughlytenconsecutiveweeksofdata,leadingto30binsspanningalldaysinthedataset.AdaytisassociatedwithanintegerBin(t)(anumberbetween1and30inourdata),suchthatthemoviebiasissplitintoastationarypartandatimechangingpart:

bi(t)=bi+bi,Bin(t)(6)Whilebinningtheparametersworkswellontheitems,

itismoreofachallengeontheusers’side.Ontheonehand,wewouldlikea nerresolutionforuserstodetectveryshortlivedtemporaleffects.Ontheotherhand,wedonotexpectenoughratingsperusertoproducereliableestimatesforisolatedbins.Differentfunctionalformscanbe

…… 此处隐藏：2856字，全部文档内容请下载后查看。喜欢就下载吧 ……

推荐系统netflix获奖算法(2).doc 将本文的Word文档下载到电脑，方便复制、编辑、收藏和打印

下载这篇word文档