手机版

推荐系统netflix获奖算法(2)

发布时间:2021-06-07   来源:未知    
字号:

赢得netflix推荐系统大奖的算法

notdeliverarealcontributionafterbeingincorporatedwithintheoverallblend.

III.BASELINEPREDICTORS

Collaborative lteringmodelstrytocapturetheinteractionsbetweenusersanditemsthatproducethedifferentratingvalues.However,manyoftheobservedratingvaluesareduetoeffectsassociatedwitheitherusersoritems,independentlyoftheirinteraction.AprimeexampleisthattypicalCFdataexhibitlargeuseranditembiases–i.e.,systematictendenciesforsomeuserstogivehigherratingsthanothers,andforsomeitemstoreceivehigherratingsthanothers.

Wewillencapsulatethoseeffects,whichdonotinvolveuser-iteminteraction,withinthebaselinepredictors.Becausethesepredictorstendtocapturemuchoftheobservedsignal,itisvitaltomodelthemaccurately.Thisenablesisolatingthepartofthesignalthattrulyrepresentsuser-iteminteraction,andsubjectingittomoreappropriateuserpreferencemodels.Denotebyµtheoverallaveragerating.Abaselinepredic-tionforanunknownratingruiisdenotedbybuiandaccountsfortheuseranditemeffects:

bui=µ+bu+bi

(1)

Theparametersbuandbiindicatetheobserveddeviationsofuseruanditemi,respectively,fromtheaverage.Forexample,supposethatwewantabaselineestimatefortheratingofthemovieTitanicbyuserJoe.Now,saythattheaverageratingoverallmovies,µ,is3.7stars.Furthermore,Titanicisbetterthananaveragemovie,soittendstoberated0.5starsabovetheaverage.Ontheotherhand,Joeisacriticaluser,whotendstorate0.3starslowerthantheaverage.Thus,thebaselineestimateforTitanic’sratingbyJoewouldbe3.9starsbycalculating3.7 0.3+0.5.

Awaytoestimatetheparametersisbydecouplingthecalculationofthebi’sfromthecalculationofthebu’s.First,foreachitemiweset

bi=∑u∈R(i)(rui µ)λ(i)|.(2)

1+|RThen,foreachuseruweset

bu=

∑i∈R(u)(rui µ bi)

λ+|R(u)|

.

(3)

2Averagesareshrunktowardszerobyusingtheregularizationparameters,λ1,λ2,whicharedeterminedbyvalidationontheProbeset.Wewereusing:λ1=25,λ2=10.Wheneverthisworkreferstobaselinepredictorsestimatedfashion,theyaredenotedbyb

inthisdecoupled

ui.Amoreaccurateestimationofbuandbiwilltreatthemsymmetrically,bysolvingtheleastsquaresproblem

min

b(rui µ bu bi)2+λ3(∑b2u+∑b2

i).

(4)

(u,i∑)∈K

u

i

Hereinafter,b denotesalluseranditembiases(busand

bis).The rstterm∑’sthat(u,i)∈K(rui µ+bu+bi)2strivesto ndbu’sandbiterm,λ3(∑ub2u+∑ib2 tthegivenratings.Theregularizing

i),avoidsover ttingbypenalizingthe

magnitudesoftheparameters.Thisleastsquareproblemcan

2

besolvedfairlyef cientlybythemethodofstochasticgradientdescent.Inpractice,wewereusingmorecomprehensiveversionsof(4),towhichweturnnow.A.Timechangingbaselinepredictors

Muchofthetemporalvariabilityinthedataisincludedwithinthebaselinepredictors,throughtwomajortemporaleffects.The rstaddressesthefactthatanitem’spopularitymaychangeovertime.Forexample,moviescangoinandoutofpopularityastriggeredbyexternaleventssuchastheappearanceofanactorinanewmovie.Thisismanifestedinourmodelsbytreatingtheitembiasbiasafunctionoftime.Thesecondmajortemporaleffectallowsuserstochangetheirbaselineratingsovertime.Forexample,auserwhotendedtorateanaveragemovie“4stars”,maynowratesuchamovie“3stars”.Thismayre ectseveralfactorsincludinganaturaldriftinauser’sratingscale,thefactthatratingsaregiveninthecontextofotherratingsthatweregivenrecentlyandalsothefactthattheidentityoftheraterwithinahouseholdcanchangeovertime.Hence,inourmodelswetaketheparameterbuasafunctionoftime.Thisinducesatemplateforatimesensitivebaselinepredictorforu’sratingofiatdaytui:

bui=µ+bu(tui)+bi(tui)

(5)

Here,bu(·)andbi(·)arerealvaluedfunctionsthatchangeovertime.Theexactwaytobuildthesefunctionsshouldre ectareasonablewaytoparameterizetheinvolvingtemporalchanges.

Amajordistinctionisbetweentemporaleffectsthatspanextendedperiodsoftimeandmoretransienteffects.Wedonotexpectmovielikeabilityto uctuateonadailybasis,butrathertochangeovermoreextendedperiods.Ontheotherhand,weobservethatusereffectscanchangeonadailybasis,re ectinginconsistenciesnaturaltocustomerbehavior.Thisrequires nertimeresolutionwhenmodelinguser-biasescomparedwithalowerresolutionthatsuf cesforcapturingitem-relatedtimeeffects.

Westartwithourchoiceoftime-changingitembiasesbi(t).Wefounditadequatetosplittheitembiasesintotime-basedbins,usingaconstantitembiasforeachtimeperiod.Thedecisionofhowtosplitthetimelineintobinsshouldbalancethedesiretoachieve nerresolution(hence,smallerbins)withtheneedforenoughratingsperbin(hence,largerbins).Infact,thereisawidevarietyofbinsizesthatyieldaboutthesameaccuracy.Inourimplementation,eachbincorrespondstoroughlytenconsecutiveweeksofdata,leadingto30binsspanningalldaysinthedataset.AdaytisassociatedwithanintegerBin(t)(anumberbetween1and30inourdata),suchthatthemoviebiasissplitintoastationarypartandatimechangingpart:

bi(t)=bi+bi,Bin(t)(6)Whilebinningtheparametersworkswellontheitems,

itismoreofachallengeontheusers’side.Ontheonehand,wewouldlikea nerresolutionforuserstodetectveryshortlivedtemporaleffects.Ontheotherhand,wedonotexpectenoughratingsperusertoproducereliableestimatesforisolatedbins.Differentfunctionalformscanbe

推荐系统netflix获奖算法(2).doc 将本文的Word文档下载到电脑,方便复制、编辑、收藏和打印
×
二维码
× 游客快捷下载通道(下载后可以自由复制和排版)
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
注:下载文档有可能出现无法下载或内容有问题,请联系客服协助您处理。
× 常见问题(客服时间:周一到周五 9:30-18:00)