手机版

推荐系统netflix获奖算法

发布时间:2021-06-07   来源:未知    
字号:

赢得netflix推荐系统大奖的算法

1

TheBellKorSolutiontotheNet ixGrandPrize

YehudaKorenAugust2009

I.INTRODUCTION

Thisarticledescribespartofourcontributiontothe“Bell-Kor’sPragmaticChaos” nalsolution,whichwontheNet ixGrandPrize.TheotherportionofthecontributionwascreatedwhileworkingatAT&TwithRobertBellandChrisVolinsky,asreportedinour2008ProgressPrizereport[3].The nalsolutionincludesallthepredictorsdescribedthere.Inthisarticlewedescribeonlythenewerpredictors.

Sowhatisnewoverlastyear’ssolution?Firstwefurtherim-provedthebaselinepredictors(Sec.III).Thisinturnimprovesourothermodels,whichincorporatethosepredictors,likethematrixfactorizationmodel(Sec.IV).Inaddition,anextensionoftheneighborhoodmodelthataddressestemporaldynamicswasintroduced(Sec.V).OntheRestrictedBoltzmannMa-chines(RBM)front,weuseanewRBMmodelwithsuperioraccuracybyconditioningthevisibleunits(Sec.VI).The naladditionistheintroductionofanewblendingalgorithm,whichisbasedongradientboosteddecisiontrees(GBDT)(Sec.VII).

II.PRELIMINARIES

TheNet ixdatasetcontainsmorethan100milliondate-stampedmovieratingsperformedbyanonymousNet ixcus-tomersbetweenDec31,1999andDec31,2005[4].Thisdatasetgivesratingsaboutm=480,189usersandn=17,770movies(aka,items).

Thecontestwasdesignedinatraining-testsetformat.AHold-outsetofabout4.2millionratingswascreatedconsistingofthelastninemoviesratedbyeachuser(orfewerifauserhadnotratedatleast18moviesovertheentireperiod).Theremainingdatamadeupthetrainingset.TheHold-outsetwasrandomlysplitthreeways,intosubsetscalledProbe,Quiz,andTest.TheProbesetwasattachedtothetrainingset,andlabels(theratingthattheusergavethemovie)wereattached.TheQuizandTestsetsmadeupanevaluationset,whichisknownastheQualifyingset,thatcompetitorswererequiredtopredictratingsfor.Onceacompetitorsubmitspre-dictions,theprizemasterreturnstherootmeansquarederror(RMSE)achievedontheQuizset,whichispostedonapublicleaderboard(/leaderboard).RMSEvaluesmentionedinthisarticlecorrespondtotheQuizset.Ultimately,thewinneroftheprizeistheonethatscoresbestontheTestset,andthosescoreswereneverdisclosedbyNet ix.Thisprecludescleversystemswhichmight“game”thecompetitionbylearningabouttheQuizsetthroughrepeatedsubmissions.

Comparedwiththetrainingdata,theHold-outsetcontainsmanymoreratingsbyusersthatdonotratemuchandare

Y.KoreniswithYahoo!Research,

Haifa,

ISRAEL.

Email:yehuda@

thereforehardertopredict.Inaway,thisrepresentsrealrequirementsforacollaborative ltering(CF)system,whichneedstopredictnewratingsfromolderones,andtoequallyaddressallusers,notjusttheheavyraters.

Wereservespecialindexingletterstodistinguishusersfrommovies:forusersu,v,andformoviesi,j.Aratingruiindicatesthepreferencebyuseruofmoviei.Valuesarerangingfrom1(star)indicatingnointerestto5(stars)indicatingastronginterest.Wedistinguishpredictedratingsfromknownones,byusingthenotationr uiforthepredictedvalueofrui.

Thescalartuidenotesthetimeofratingrui.Here,timeismeasuredindays,sotuicountsthenumberofdayselapsedsincesomeearlytimepoint.About99%ofthepossibleratingsaremissing,becauseausertypicallyratesonlyasmallportionofthemovies.The(u,i)pairsforwhichruiisknownarestoredinthetrainingsetK={(u,i)|ruiisknown}.NoticethatKincludesalsotheProbeset.EachuseruisassociatedwithasetofitemsdenotedbyR(u),whichcontainsalltheitemsforwhichratingsbyuareavailable.Likewise,R(i)denotesthesetofuserswhorateditemi.Sometimes,wealsouseasetdenotedbyN(u),whichcontainsallitemsforwhichuprovidedarating,eveniftheratingvalueisunknown.Thus,N(u)extendsR(u)byalsoconsideringtheratingsintheQualifyingset.

Modelsfortheratingdataarelearnedby ttingthepre-viouslyobservedratings(trainingset).However,ourgoalistogeneralizethoseinawaythatallowsustopredictfuture,unknownratings(Qualifyingset).Thus,cautionshouldbeex-ercisedtoavoidover ttingtheobserveddata.Weachievethisbyregularizingthelearnedparameters,whosemagnitudesarepenalized.Theextentofregularizationiscontrolledbytunableconstants.Unlessotherwisestated,weuseL2regularization.Thisisagoodplacetoaddsomewordsontheconstantscontrollingouralgorithms(includingstepsizes,regularization,andnumberofiterations).ExactvaluesoftheseconstantsaredeterminedbyvalidationontheProbeset.Inallcasesbutone(tobementionedbelow),suchvalidationisdoneinamanualgreedymanner.Thatis,whenanewlyintroducedconstantneedstogettuned,weexecutemultiplerunsofthealgorithmsandpickthevaluethatyieldsthebestRMSEontheNet ixProbeset[4].Thisschemedoesnotresultinoptimalsettingsforseveralreasons.First,onceaconstantissetwedonotrevisititsvalue,eventhoughfutureintroductionofotherconstantsmayrequiremodifyingearliersettings.Second,weusethesameconstantsundermultiplevariantsofthesamealgorithm(e.g.,multipledimensionalitiesofafactorizationmodel),whereasamoredelicatetuningwouldrequireadifferentsettingforeachvariant.Wechosethisconvenient,butlessaccuratemethod,becauseourexperienceshowedthatovertuningtheaccuracyofasinglepredictordoes

推荐系统netflix获奖算法.doc 将本文的Word文档下载到电脑,方便复制、编辑、收藏和打印
×
二维码
× 游客快捷下载通道(下载后可以自由复制和排版)
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
注:下载文档有可能出现无法下载或内容有问题,请联系客服协助您处理。
× 常见问题(客服时间:周一到周五 9:30-18:00)