赢得netflix推荐系统大奖的算法
rule:
r ui=bui+|R(u)| 1
(ruj b
+|N(u)| j∈∑
1
uj)wijR(u)
j∈∑
cij
N(u)
Here,then2
(17)item-itemweightswijandcijrepresenttheadjustmentsweneedtomaketothepredictedratingofitemi,givenaratingofitemj.Itwasprovengreatlybene cialtousetwosetsofitem-itemweights:one(thewijs)isrelatedtothevaluesoftheratings,andtheotherdisregardstheratingvalue,consideringonlywhichitemswererated(thecijs).Theseweightsareautomaticallylearnedfromthebiases.Theconstantsb
thedatatogetherwith
ujareprecomputedaccordingto(2)–(3).
Whenadaptingrule(17)toaddresstemporaldynamics,twocomponentsshouldbeconsideredseparately.First,isthebaselinepredictorportion,bui=µ+bi+bu,whichexplainsmostoftheobservedsignal.Second,isthepartthatcapturesthemore|R(u)| 1
informativesignal,∑j∈R(u)(ruj b dealingwith)| 1user-iteminteraction
uj)wij+|N(ufromthe∑j∈N(u)cij.Forthe
baselinepart,nothingchangesfactormodel,andwemakeittime-aware,accordingtoeither(10),or(11).Thelatteroneaddsfrequenciesandisgenerallypreferred.However,capturingtemporaldynamicswithintheinteractionpartrequiresadifferentstrategy.
Item-itemweights(wijandcij)re ectinherentitemcharac-teristicsandarenotexpectedtodriftovertime.Thelearningprocessshouldmakesurethattheycaptureunbiasedlongtermvalues,withoutbeingtooaffectedfromdriftingaspects.Indeed,thetime-changingnatureofthedatacanmaskmuchofthelongertermitem-itemrelationshipsifnottreatedad-equately.Forinstance,auserratingbothitemsiandjhighinashorttimeperiod,isagoodindicatorforrelatingthem,therebypushinghigherthevalueofwij.Ontheotherhand,ifthosetworatingsaregiven veyearsapart,whiletheuser’staste(ifnotheridentity)couldconsiderablychange,thisislessevidenceofanyrelationbetweentheitems.Ontopofthis,wewouldarguethatthoseconsiderationsareprettymuchuser-dependent–someusersaremoreconsistentthanothersandallowrelatingtheirlongertermactions.
Ourgoalhereistodistillaccuratevaluesfortheitem-itemweights,despitetheinterferingtemporaleffects.Firstweneedtoparameterizethedecayingrelationsbetweentwoitemsratedbyuseru.Weadoptanexponentialdecayformedbythefunctione βu· t,whereβu>0controlstheuserspeci cdecayrateandshouldbelearnedfromthedata.Thisleadstothepredictionrule
r 1
ui=bui+|N(u)| j|cij+
(18)
j∈∑
e βu·|tui tuN(u)
|R(u)| 1
e βu·|tui tuj|((ruj b
).j∈∑
uj)wijR(u)
Theinvolvedparametersarelearnedbyminimizingtheas-sociatedregularizedsquarederror.Minimizationisperformed
bystochasticgradientdescentfor20–30iterations.Themodelisapplieddirectlytotherawdata,soallparameters(biasesandmovie-movieweights)arelearnedsimultaneously.Learning
6
ofbias-relatedparametersisgovernedbythesameconstants
discussedinSec.III.Asforthemovie-movieweights(bothwijandcij),theirlearningrateis0.005withregularizationconstantof0.002.Finally,theupdateoftheexponentβu,usesaparticularlysmallstepsizeof1e-7,withregularizationconstantequaling0.01.
Wealsoexperimentedwithotherdecayforms,likethemorecomputationally-friendly(1+βu t) 1,whichresultedinthesameaccuracy,withanimprovedrunningtime.(Noneedtochangemeta-parameters.)
Asinthefactorcase,properlyconsideringtemporaldy-namicsimprovestheaccuracyoftheneighborhoodmodel.TheRMSEdecreasesfrom0.9002[7]to0.8870(seenextsubsection).Toourbestknowledge,thisissigni cantlybetterthanpreviouslyknownresultsbyneighborhoodmethods.Toputthisinsomeperspective,thisresultisevenbetterthanthosereported[1,2,11,15]byusinghybridapproachessuchasapplyinganeighborhoodapproachonresidualsofotheralgorithms.Alessonisthataddressingtemporaldynamicsinthedatacanhaveamoresigni cantimpactonaccuracythandesigningmorecomplexlearningalgorithms.
A.What’sintheblend?
Weranthetime-awareneighborhoodmodel,withbiasesfollowing(10)for20,25,and30iterationsofstochasticgradientdescent.TheresultingRMSEswere0.8887,0.8885and0.8887,respectively.Theresultswith20and30iterationsareintheblend.
Wealsotriedextending(18)withanon-normalizedterm.Thisinvolvedaddingathirdsetofmovie-movieweights,dij,asfollows:
r 1
ui=bui+|N(u)| βu·|tui tuj|cij+
j∈∑
e N(u)
|R(u)|
1e βu·|tui tuj|((ruj b
)+j∈∑
uj)wijR(u)
j∈∑
e γu·|tui tuj|((ruj b
uj)dij).R(u)
Here,wealsotriedtoemphasizetheveryadjacentratingsmadebytheuser.Therefore,thenewdecay-controllingcon-stants,theγus,wereinitializedwitharelativelyhighvalueof0.5(comparedtoinitializingβuwithzero.)Inaddition,fordijweusedaslowerlearningrateof1e-5.Learningwasdoneby25iterationsofstochasticgradientdescent.TheresultwithRMSE=0.8881isincludedintheblend.Inretrospect,webelievethatsuchaminisculeRMSEreductiondoesnotjustifyaddingathirdsetofmovie-movieweights.
Finally,weranthetime-awareneighborhoodmodel,withbiasesfollowing(11)for20iterationsofstochasticgradientdescent.(Thethirdsetofmovie-movieweightswasnotused.)TheresultofRMSE=ter,werefertothismodelas[PQ3].