手机版

推荐系统netflix获奖算法(3)

发布时间:2021-06-07   来源:未知    
字号:

赢得netflix推荐系统大奖的算法

consideredforparameterizingtemporaluserbehavior,withvaryingcomplexityandaccuracy.

Onesimplemodelingchoiceusesalinearfunctiontocaptureapossiblegradualdriftofuserbias.Foreachuseru,wedenotethemeandateofratingbytu.Now,ifuratedamovieondayt,thentheassociatedtimedeviationofthisratingisde nedas

devu(t)=sign(t tu)·|t tu|β.

Here|t tu|measuresthenumberofdaysbetweendatestandtu.WesetthevalueofβbyvalidationontheProbeset;inourimplementationβ=0.4.Weintroduceasinglenewparameterforeachusercalledαusothatwegetour rstde nitionofatime-dependentuser-bias:

bu(1)

(t)=bu+αu·devu(t)

(7)

Thissimplelinearmodelforapproximatingadriftingbehaviorrequireslearningtwoparametersperuser:buandαu.

Thelinearfunctionformodelingtheuserbiasmesheswellwithgradualdriftsintheuserbehavior.However,wealsoobservesuddendriftsemergingas“spikes”associatedwithasingledayorsession.Forexample,wehavefoundthatmultipleratingsausergivesinasingleday,tendtoconcentratearoundasinglevalue.Suchaneffectneednotspanmorethanasingleday.Thismayre ectthemoodoftheuserthatday,theimpactofratingsgiveninasingledayoneachother,orchangesintheactualraterinmulti-personaccounts.Toaddresssuchshortlivedeffects,weassignasingleparameterperuserandday,absorbingtheday-speci cvariability.Thisparameterisdenotedbybut.

IntheNet ixdata,auserrateson40differentdaysonaverage.Thus,workingwithbutrequires,onaverage,40parameterstodescribeeachuserbias.Itisexpectedthatbutisinadequateasastandaloneforcapturingtheuserbias,sinceitmissesallsortsofsignalsthatspanmorethanasingleday.Thus,itservesasanadditivecomponentwithinthepreviouslydescribedschemes.Theuserbiasmodel(7)becomes

bu(3)

(t)=bu+αu·devu(t)+but.

(8)

Thediscussionsofarleadstothebaselinepredictorbui=µ+bu+αu·devu(tui)+bu,tui+bi+bi,Bin(tui).

(9)

Ifusedasastandalonepredictor,itsresultingRMSEwouldbe0.9605.

Anothereffectwithinthescopeofbaselinepredictorsisrelatedtothechangingscaleofuserratings.Whilebi(t)isauser-independentmeasureforthemeritofitemiattimet,userstendtorespondtosuchameasuredifferently.Forexample,differentusersemploydifferentratingscales,andasingleusercanchangehisratingscaleovertime.Accordingly,therawvalueofthemoviebiasisnotcompletelyuser-independent.Toaddressthis,weaddatime-dependentscalingfeaturetothebaselinepredictors,denotedbycu(t).Thus,thebaselinepredictor(9)becomes

bui=µ+bu+αu·devu(tui)+bu,tui+(bi+bi,Bin(tui))·cu(tui).

(10)

3

Alldiscussedwaystoimplementbu(t)wouldbevalidforimplementingcu(t)aswell.Wechosetodedicateaseparateparameterperday,resultingin:cu(t)=cu+cut.Asusual,cuisthestablepartofcu(t),whereascutrepresentsday-speci cvariability.

Addingthemultiplicativefactorcu(t)tothebaselinepre-dictor(asper(10))lowersRMSEto0.9555.Interestingly,thisbasicmodel,whichcapturesjustmaineffectsdisregardinguser-iteminteractions,canexplainalmostasmuchofthedatavariabilityasthecommercialNet ixCinematchrecommendersystem,whosepublishedRMSEonthesameQuizsetis0.9514[4].B.Frequencies

ItwasbroughttoourattentionbyourcolleaguesatthePragmaticTheoryteam(PT)thatthenumberofratingsausergaveonaspeci cdayexplainsasigni cantportionofthevariabilityofthedataduringthatday.Formally,denotebyFuitheoverallnumberofratingsthatuserugaveondaytui.ThevalueofFuiwillbehenceforthdubbeda“frequency”,followingPT’snotation.InpracticeweworkwitharoundedlogarithmofFui,denotedbyfui= logaFui .1

Interestingly,eventhoughfuiissolelydrivenbyuseru,itwillin uencetheitem-biases,ratherthantheuser-biases.Accordingly,foreachitemiweintroduceatermbif,capturingthebiasspeci cfortheitemiatlog-frequencyf.Baselinepredictor(10)isextendedtobe

bui=µ+bu+αu·devu(tui)+bu,tui+(bi+bi,Bin(tui))·cu(tui)+bi,fui.

(11)

Wenotethatitwouldbesensibletomultiplybi,fuibycu(tui),butwehavenotexperimentedwiththis.

Theeffectofaddingthefrequencytermtothemoviebiasisquitedramatic.RMSEdropsfrom0.9555to0.9278.Notably,itshowsabaselinepredictorwithapredictionaccuracysigni cantlybetterthanthatoftheoriginalNet ixCinematchalgorithm.

Here,itisimportanttoremindthatabaselinepredictor,nomatterhowaccurate,cannotyieldpersonalizedrecommenda-tionsonitsown,asitmissesallinteractionsbetweenusersanditems.Inasense,itiscapturingtheportionofthedatathatislessrelevantforestablishingrecommendationsandindoingsoenablesderivingaccuraterecommendationsbysubjectingothermodelstocleanerdata.Nonetheless,weincludedtwoofthemoreaccuratebaselinepredictorsinourblend.

Whyfrequencieswork?:Inordertograspthesourceoffrequenciescontribution,wemaketwoempiricalobservations.First,wecouldseethatfrequenciesareextremelypowerfulforastandalonebaselinepredictor,butaswewillsee,theycontributemuchlesswithinafullmethod,wheremostoftheirbene tdisappearswhenaddingtheuser-movieinteractionterms(matrixfactorizationorneighborhood).Secondisthefactthatfrequenciesseemtobemuchmorehelpfulwhenusedwithmoviebiases,butnotsowhenusedwithuser-relatedparameters.

1Notice

thatFuiisstrictlypositivewheneveritisused,sothelogarithmis

wellde ned.

推荐系统netflix获奖算法(3).doc 将本文的Word文档下载到电脑,方便复制、编辑、收藏和打印
×
二维码
× 游客快捷下载通道(下载后可以自由复制和排版)
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
注:下载文档有可能出现无法下载或内容有问题,请联系客服协助您处理。
× 常见问题(客服时间:周一到周五 9:30-18:00)