The algorithm that won the Netflix recommender-system Grand Prize
representation). The movie representation is still based on the timeSVD++ model. The resulting RMSE is also 0.8661.
Finally, we added k-NN features on top of the timeSVD++ features. That is, for each (u, i) pair, we found the top 20 movies most similar to i that were rated by u. We added the scores of these movies, each multiplied by its respective similarity, as additional features. Similarities here were shrunk Pearson correlations [1]. This slightly reduces the RMSE to 0.8660. GBDT was also used to solve a separate regression problem per movie. For each user we computed a 50-D characteristic vector formed by the values of the 50 hidden units of a respective RBM. Then, for each movie, we used GBDT to solve the regression problem of mapping the 50-D user vectors to the true user ratings of that movie. The result, with RMSE=0.9248, will be denoted as [PQ7] in the following description.
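The k-NN feature construction described above can be sketched in Python. This is a minimal sketch under stated assumptions: ratings are held in a plain dict of dicts, the Pearson correlation is computed on raw ratings over common raters (the actual models may correlate residuals instead), the shrinkage factor n/(n + λ) with λ = 100 is illustrative, and the names `shrunk_pearson` and `knn_features` are hypothetical, not taken from [1].

```python
import math
from typing import Dict, List

# Hypothetical layout: user -> {movie: rating}
Ratings = Dict[str, Dict[str, float]]


def shrunk_pearson(ratings: Ratings, i: str, j: str, lam: float = 100.0) -> float:
    """Pearson correlation of movies i and j over users who rated both,
    shrunk toward zero by n / (n + lam), n = number of common raters.
    lam = 100 is an illustrative assumption."""
    common = [u for u, r in ratings.items() if i in r and j in r]
    n = len(common)
    if n < 2:
        return 0.0
    xi = [ratings[u][i] for u in common]
    xj = [ratings[u][j] for u in common]
    mi, mj = sum(xi) / n, sum(xj) / n
    cov = sum((a - mi) * (b - mj) for a, b in zip(xi, xj))
    vi = sum((a - mi) ** 2 for a in xi)
    vj = sum((b - mj) ** 2 for b in xj)
    if vi == 0 or vj == 0:
        return 0.0
    rho = cov / math.sqrt(vi * vj)
    return n / (n + lam) * rho


def knn_features(ratings: Ratings, u: str, i: str, k: int = 20,
                 lam: float = 100.0) -> List[float]:
    """For the pair (u, i): among movies rated by u, keep the k most similar
    to i and return rating * similarity for each, zero-padded to length k."""
    scored = []
    for j, r in ratings[u].items():
        if j == i:
            continue
        scored.append((shrunk_pearson(ratings, i, j, lam), r))
    # Highest (signed) similarity first
    scored.sort(key=lambda t: t[0], reverse=True)
    feats = [s * r for s, r in scored[:k]]
    feats += [0.0] * (k - len(feats))
    return feats
```

The fixed-length, zero-padded vector lets these values be appended directly to the timeSVD++ features as GBDT inputs; the shrinkage keeps similarities backed by only a few common raters close to zero.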
B. List of BellKor's Probe-Qualifying pairs
We list the 24 BellKor predictors that participated in the GBDT blending. Notice that many more of our predictors are in the final blend of Qualifying results (as mentioned earlier in this article). However, only for those listed below do we possess corresponding Probe results, which require extra computational resources to fully re-train each model while excluding the Probe set from the training set.
Post Progress Prize 2008 predictors
Those were mentioned earlier in this document:
1) PQ1
2) PQ2
3) PQ3
4) PQ4
5) PQ5
6) PQ6
7) PQ7
Progress Prize 2008 predictors
The following is based on our notation in [3]:
8) SVD++(1) (f=200)
9) Integrated (f=100, k=300)
10) SVD++(3) (f=500)
11) First neighborhood model of Sec. 2.2 of [3] (RMSE=0.9002)
12) A neighborhood model mentioned towards the end of Sec. 2.2 of [3] (RMSE=0.8914)

Progress Prize 2007 predictors
The following is based on our notation in [2]:
13) Predictor #40
14) Predictor #35
15) Predictor #67
16) Predictor #75
17) NNMF (60 factors) with adaptive user factors
18) Predictor #81
19) Predictor #73
20) 100-neighbor User-kNN on residuals of all global effects but the last 4
21) Predictor #85
22) Predictor #45
23) Predictor #83
24) Predictor #106
One last predictor, with RMSE=0.8713, is in the final blend. It is based on the blending technique described on page 12 of [3]. The technique was applied to the four predictors indexed above by 2, 9, 12, and 13.
VIII. CONCLUDING REMARKS
Granting the Grand Prize celebrates the conclusion of the Netflix Prize competition. Wide participation, extensive press coverage and many publications all reflect the immense success of the competition. Dealing with movies, a subject close to the hearts of many, was definitely a good start. Yet much could have gone wrong, but did not, thanks to several enabling factors. The first success factor is on the organizational side: Netflix. They did a great service to the field by releasing a precious dataset, an act that is rare, yet courageous and important to the progress of science. Beyond this, both the design and the conduct of the competition were flawless and non-trivial. For example, the size of the data was right on target: much larger and more representative than comparable datasets, yet small enough to make the competition accessible to anyone with a commodity PC. As another example, I would mention the split of the test set into three parts, Probe, Quiz, and Test, which was essential to ensure the fairness of the competition. Despite being planned well ahead, it proved to be a decisive factor at the very last minute of the competition, three years later. The second success factor is the wide engagement of many competitors. This created positive buzz, leading to the enrollment of many more. Much was said and written about the collaborative spirit of the competitors, who openly published and discussed their innovations on the web forum and through scientific publications. The feeling was of a big community progressing together, making the experience more enjoyable and efficient for all participants. In fact, this facilitated the nature of the competition, which proceeded like a long marathon rather than a series of short sprints.
Another helpful factor was a touch of luck. The most prominent instance is the choice of the 10% improvement goal. Any small deviation from this number would have made the competition either too easy or impossibly difficult. In addition, the goddess of luck ensured the most suspenseful finish lines in both the 2007 Progress Prize and the 2009 Grand Prize, matching the best sporting events.
The science of recommender systems is a prime beneficiary of the contest. Many new people became involved in the field and made their contributions. There is a clear spike in related publications, and the Netflix dataset is the direct catalyst for the development of some of the better algorithms known in the field. Out of the numerous new algorithmic contributions, I would like to highlight one: those humble baseline predictors (or biases), which capture main effects in the data. While the literature mostly concentrates on the more sophisticated algorithmic aspects, we have learned that an accurate treatment of main effects is probably at least as significant as coming up with modeling breakthroughs.
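To make the notion of baseline predictors concrete, the sketch below fits the simple model b_ui = μ + b_u + b_i, where μ is the global mean rating, b_i a movie bias and b_u a user bias. It is a minimal sketch under assumptions: biases are estimated by one pass of regularized averaging (movie biases first, then user biases on the movie-corrected residuals), the shrinkage constants 25 and 10 are illustrative, and the names `fit_baselines` and `predict` are hypothetical. It is not the exact bias treatment used in the final blend.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, float]  # (user, movie, rating)


def fit_baselines(data: List[Triple], lam_item: float = 25.0,
                  lam_user: float = 10.0):
    """Estimate mu, user biases b_u and movie biases b_i by regularized
    averaging. lam_item and lam_user are illustrative shrinkage constants."""
    mu = sum(r for _, _, r in data) / len(data)

    # Movie bias: shrunk mean deviation of the movie's ratings from mu.
    item_sum: Dict[str, float] = defaultdict(float)
    item_cnt: Dict[str, int] = defaultdict(int)
    for _, i, r in data:
        item_sum[i] += r - mu
        item_cnt[i] += 1
    b_item = {i: item_sum[i] / (lam_item + item_cnt[i]) for i in item_sum}

    # User bias: shrunk mean of what remains after removing mu and b_i.
    user_sum: Dict[str, float] = defaultdict(float)
    user_cnt: Dict[str, int] = defaultdict(int)
    for u, i, r in data:
        user_sum[u] += r - mu - b_item[i]
        user_cnt[u] += 1
    b_user = {u: user_sum[u] / (lam_user + user_cnt[u]) for u in user_sum}
    return mu, b_user, b_item


def predict(mu: float, b_user: Dict[str, float], b_item: Dict[str, float],
            u: str, i: str) -> float:
    """Baseline prediction; unseen users or movies contribute a zero bias."""
    return mu + b_user.get(u, 0.0) + b_item.get(i, 0.0)
```

Any user or movie absent from the training data falls back to μ, and the shrinkage pulls the biases of sparsely rated users and movies toward zero.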
Finally, we were lucky to win this competition, but recognize the important contributions of the many other contestants,