The algorithm that won the Netflix recommender system Grand Prize
not deliver a real contribution after being incorporated within the overall blend.
III. BASELINE PREDICTORS
Collaborative filtering models try to capture the interactions between users and items that produce the different rating values. However, many of the observed rating values are due to effects associated with either users or items, independently of their interaction. A prime example is that typical CF data exhibit large user and item biases, i.e., systematic tendencies for some users to give higher ratings than others, and for some items to receive higher ratings than others.
We will encapsulate those effects, which do not involve user-item interaction, within the baseline predictors. Because these predictors tend to capture much of the observed signal, it is vital to model them accurately. This enables isolating the part of the signal that truly represents user-item interaction, and subjecting it to more appropriate user preference models. Denote by $\mu$ the overall average rating. A baseline prediction for an unknown rating $r_{ui}$ is denoted by $b_{ui}$ and accounts for the user and item effects:
$$b_{ui} = \mu + b_u + b_i \qquad (1)$$
The parameters $b_u$ and $b_i$ indicate the observed deviations of user $u$ and item $i$, respectively, from the average. For example, suppose that we want a baseline estimate for the rating of the movie Titanic by user Joe. Now, say that the average rating over all movies, $\mu$, is 3.7 stars. Furthermore, Titanic is better than an average movie, so it tends to be rated 0.5 stars above the average. On the other hand, Joe is a critical user, who tends to rate 0.3 stars lower than the average. Thus, the baseline estimate for Titanic's rating by Joe would be 3.9 stars, by calculating $3.7 - 0.3 + 0.5$.
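As a minimal sketch of equation (1), the baseline is just the global mean plus a user offset and an item offset. The dictionary-based lookup and the zero fallback for a user or item without a learned bias are assumptions made for illustration, not something the text specifies; the numbers reproduce the Titanic/Joe example.

```python
from typing import Mapping


def baseline_prediction(
    mu: float,
    user_bias: Mapping[str, float],
    item_bias: Mapping[str, float],
    user: str,
    item: str,
) -> float:
    """Equation (1): b_ui = mu + b_u + b_i."""
    # Assumption: an unseen user or item contributes a zero deviation.
    return mu + user_bias.get(user, 0.0) + item_bias.get(item, 0.0)


# Titanic/Joe example from the text: mu = 3.7, b_Joe = -0.3, b_Titanic = +0.5
estimate = baseline_prediction(3.7, {"Joe": -0.3}, {"Titanic": 0.5}, "Joe", "Titanic")
print(f"{estimate:.1f}")  # 3.9
```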
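One way to estimate the parameters is by decoupling the calculation of the $b_i$'s from the calculation of the $b_u$'s. First, for each item $i$ we set

$$b_i = \frac{\sum_{u \in R(i)} (r_{ui} - \mu)}{\lambda_1 + |R(i)|} \qquad (2)$$

Then, for each user $u$ we set

$$b_u = \frac{\sum_{i \in R(u)} (r_{ui} - \mu - b_i)}{\lambda_2 + |R(u)|} \qquad (3)$$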
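Equations (2) and (3) can be sketched as two passes over the ratings: item biases are computed first from the raw deviations, then user biases from what remains after subtracting them. The defaults λ1 = 25 and λ2 = 10 are the values this work reports from validation on the Probe set; the `(user, item, rating)` tuple format and the toy ratings are hypothetical.

```python
from collections import defaultdict


def decoupled_biases(ratings, lam1=25.0, lam2=10.0):
    """Item biases via eq. (2), then user biases via eq. (3).

    ratings: list of (user, item, rating) tuples (an assumed format).
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Eq. (2): b_i = sum_{u in R(i)} (r_ui - mu) / (lam1 + |R(i)|)
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for _, i, r in ratings:
        item_sum[i] += r - mu
        item_cnt[i] += 1
    b_item = {i: item_sum[i] / (lam1 + item_cnt[i]) for i in item_sum}

    # Eq. (3): b_u = sum_{i in R(u)} (r_ui - mu - b_i) / (lam2 + |R(u)|)
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        user_sum[u] += r - mu - b_item[i]
        user_cnt[u] += 1
    b_user = {u: user_sum[u] / (lam2 + user_cnt[u]) for u in user_sum}
    return mu, b_user, b_item


toy = [("joe", "titanic", 4), ("ann", "titanic", 5), ("joe", "alien", 2), ("ann", "alien", 3)]
mu, b_user, b_item = decoupled_biases(toy)                  # shrunk: titanic bias = 2/27
_, _, raw_item = decoupled_biases(toy, lam1=0.0, lam2=0.0)  # unshrunk: titanic bias = 1.0
```

With λ1 = λ2 = 0 the formulas reduce to plain average deviations; with the default shrinkage, an item backed by only two ratings has its bias pulled strongly toward zero, which is the intended effect for sparsely rated items.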
Averages are shrunk towards zero by using the regularization parameters $\lambda_1, \lambda_2$, which are determined by validation on the Probe set. We were using $\lambda_1 = 25$ and $\lambda_2 = 10$. Whenever this work refers to baseline predictors estimated in this decoupled fashion, they are denoted by $\tilde{b}_{ui}$. A more accurate estimation of $b_u$ and $b_i$ will treat them symmetrically, by solving the least squares problem
$$\min_{b_*} \sum_{(u,i) \in \mathcal{K}} (r_{ui} - \mu - b_u - b_i)^2 + \lambda_3 \Big( \sum_u b_u^2 + \sum_i b_i^2 \Big) \qquad (4)$$
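A minimal stochastic gradient descent sketch for problem (4), which the text notes can be solved this way. The learning rate, λ3, epoch count and seed are hypothetical placeholders, since no values for them are given here. The per-rating update applies the λ3 penalty each time a bias is visited, a common simplification that weights each bias's penalty by its number of ratings rather than matching (4) term for term.

```python
import random
from collections import defaultdict


def sgd_biases(ratings, lam3=0.02, lr=0.05, epochs=200, seed=0):
    """Approximately solve (4) for b_u and b_i by stochastic gradient descent.

    ratings: iterable of (user, item, rating) tuples (an assumed format).
    """
    data = list(ratings)
    mu = sum(r for _, _, r in data) / len(data)
    b_user, b_item = defaultdict(float), defaultdict(float)  # biases start at 0
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)  # visit ratings in a fresh random order each epoch
        for u, i, r in data:
            err = r - mu - b_user[u] - b_item[i]  # residual of one squared-error term
            # Step against the gradient of err^2 + lam3 * (b_u^2 + b_i^2);
            # the constant factor 2 is absorbed into lr.
            b_user[u] += lr * (err - lam3 * b_user[u])
            b_item[i] += lr * (err - lam3 * b_item[i])
    return mu, dict(b_user), dict(b_item)


toy = [("joe", "titanic", 4), ("ann", "titanic", 5), ("joe", "alien", 2), ("ann", "alien", 3)]
mu, b_user, b_item = sgd_biases(toy)
print(b_user["ann"] > 0 > b_user["joe"], b_item["titanic"] > 0 > b_item["alien"])  # True True
```

Unlike the decoupled estimates, both bias families are fitted jointly here, so a user's bias is not forced to absorb whatever the item pass left behind.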
Hereinafter, $b_*$ denotes all user and item biases ($b_u$'s and $b_i$'s). The first term, $\sum_{(u,i) \in \mathcal{K}} (r_{ui} - \mu - b_u - b_i)^2$, strives to find $b_u$'s and $b_i$'s that fit the given ratings. The regularizing term, $\lambda_3 \big( \sum_u b_u^2 + \sum_i b_i^2 \big)$, avoids overfitting by penalizing the magnitudes of the parameters. This least squares problem can be solved fairly efficiently by the method of stochastic gradient descent. In practice, we were using more comprehensive versions of (4), to which we turn now.

A. Time changing baseline predictors
Much of the temporal variability in the data is included within the baseline predictors, through two major temporal effects. The first addresses the fact that an item's popularity may change over time. For example, movies can go in and out of popularity as triggered by external events such as the appearance of an actor in a new movie. This is manifested in our models by treating the item bias $b_i$ as a function of time. The second major temporal effect allows users to change their baseline ratings over time. For example, a user who tended to rate an average movie "4 stars" may now rate such a movie "3 stars". This may reflect several factors, including a natural drift in a user's rating scale, the fact that ratings are given in the context of other ratings that were given recently, and also the fact that the identity of the rater within a household can change over time. Hence, in our models we take the parameter $b_u$ as a function of time. This induces a template for a time sensitive baseline predictor for $u$'s rating of $i$ at day $t_{ui}$:
$$b_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui}) \qquad (5)$$
Here, $b_u(\cdot)$ and $b_i(\cdot)$ are real valued functions that change over time. The exact way to build these functions should reflect a reasonable way to parameterize the involved temporal changes.
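Template (5) leaves the functional forms open, so the sketch below simply takes b_u(·) and b_i(·) as arbitrary callables of a day index. The linearly drifting user bias and the constant item bias in the usage example are hypothetical toy functions mirroring the "4 stars becomes 3 stars" scenario, not the parameterization this work adopts.

```python
from typing import Callable

# A time-dependent bias maps a day index t to a rating offset.
BiasFn = Callable[[int], float]


def time_aware_baseline(mu: float, b_u: BiasFn, b_i: BiasFn, t_ui: int) -> float:
    """Equation (5): b_ui = mu + b_u(t_ui) + b_i(t_ui)."""
    return mu + b_u(t_ui) + b_i(t_ui)


# Hypothetical toy forms: the user's offset drifts down by 1 star over 1000 days,
# while the item offset is held constant.
def drifting_user(t: int) -> float:
    return 0.3 - 0.001 * t


def steady_item(t: int) -> float:
    return 0.5


early = time_aware_baseline(3.7, drifting_user, steady_item, t_ui=0)
late = time_aware_baseline(3.7, drifting_user, steady_item, t_ui=1000)
print(f"{early:.1f} -> {late:.1f}")  # 4.5 -> 3.5
```

The same user and item thus receive different baselines depending only on the rating day, which is exactly what the static form (1) cannot express.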
A major distinction is between temporal effects that span extended periods of time and more transient effects. We do not expect movie likeability to fluctuate on a daily basis, but rather to change over more extended periods. On the other hand, we observe that user effects can change on a daily basis, reflecting inconsistencies natural to customer behavior. This requires finer time resolution when modeling user biases, compared with a lower resolution that suffices for capturing item-related time effects.
We start with our choice of time-changing item biases $b_i(t)$. We found it adequate to split the item biases into time-based bins, using a constant item bias for each time period. The decision of how to split the timeline into bins should balance the desire to achieve finer resolution (hence, smaller bins) with the need for enough ratings per bin (hence, larger bins). In fact, there is a wide variety of bin sizes that yield about the same accuracy. In our implementation, each bin corresponds to roughly ten consecutive weeks of data, leading to 30 bins spanning all days in the dataset. A day $t$ is associated with an integer $\mathrm{Bin}(t)$ (a number between 1 and 30 in our data), such that the movie bias is split into a stationary part and a time changing part:
$$b_i(t) = b_i + b_{i,\mathrm{Bin}(t)} \qquad (6)$$

While binning the parameters works well on the items, it is more of a challenge on the users' side. On the one hand, we would like a finer resolution for users to detect very short lived temporal effects. On the other hand, we do not expect enough ratings per user to produce reliable estimates for isolated bins. Different functional forms can be