Partial reference solutions to exercises from selected chapters of The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Second Edition).
The Elements of Statistical Learning – Chapter 4
oxstar@SJTU, January 6, 2011
Ex. 4.1 Show how to solve the generalized eigenvalue problem $\max_a a^T B a$ subject to $a^T W a = 1$ by transforming it to a standard eigenvalue problem.
Answer. $W$ is the common covariance matrix; assume it is positive definite (so that $W^{1/2}$ exists and is invertible) and define
\[
b = W^{1/2}a, \qquad a = W^{-1/2}b, \qquad a^T = b^T W^{-1/2}.
\]
Hence the generalized eigenvalue problem becomes
\[
\max_a\, a^T B a = \max_b\, b^T W^{-1/2} B W^{-1/2} b
\]
subject to
\[
a^T W a = b^T W^{-1/2}\, W\, W^{-1/2} b = b^T b = 1.
\]
So the problem is transformed into a standard eigenvalue problem: the maximizer is the leading eigenvector $b$ of $W^{-1/2} B W^{-1/2}$, and $a = W^{-1/2} b$.
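A quick numerical sanity check of this reduction (a minimal NumPy sketch; the matrices $B$ and $W$ below are random stand-ins for the between- and within-class covariance matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
# Random symmetric B and positive definite W, for illustration only.
A = rng.standard_normal((p, p))
B = (A + A.T) / 2
C = rng.standard_normal((p, p))
W = C @ C.T + p * np.eye(p)

# W^{-1/2} via the eigendecomposition of W.
w, V = np.linalg.eigh(W)
W_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T

# Standard eigenvalue problem for W^{-1/2} B W^{-1/2}.
M = W_inv_sqrt @ B @ W_inv_sqrt
vals, vecs = np.linalg.eigh(M)
b = vecs[:, -1]              # eigenvector of the largest eigenvalue
a = W_inv_sqrt @ b           # map back: a = W^{-1/2} b

print(a @ W @ a)             # constraint a^T W a = 1 (up to rounding)
print(a @ B @ a, vals[-1])   # attained maximum equals the top eigenvalue
```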
Ex. 4.2 Suppose we have features $x \in \mathbb{R}^p$, a two-class response, with class sizes $N_1, N_2$, and the target coded as $-N/N_1$, $N/N_2$.

1. Show that the LDA rule classifies to class 2 if
\[
x^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1) > \frac12\hat\mu_2^T\hat\Sigma^{-1}\hat\mu_2 - \frac12\hat\mu_1^T\hat\Sigma^{-1}\hat\mu_1 + \log\frac{N_1}{N} - \log\frac{N_2}{N},
\]
and class 1 otherwise.

2. Consider minimization of the least squares criterion
\[
\sum_{i=1}^N (y_i - \beta_0 - \beta^T x_i)^2.
\]
Show that the solution $\hat\beta$ satisfies
\[
\Big[(N-2)\hat\Sigma + \frac{N_1N_2}{N}\hat\Sigma_B\Big]\beta = N(\hat\mu_2-\hat\mu_1)
\]
(after simplification), where $\hat\Sigma_B = (\hat\mu_2-\hat\mu_1)(\hat\mu_2-\hat\mu_1)^T$.

3. Hence show that $\hat\Sigma_B\beta$ is in the direction $(\hat\mu_2-\hat\mu_1)$, and thus
\[
\hat\beta \propto \hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1).
\]
Therefore the least squares regression coefficient is identical to the LDA coefficient, up to a scalar multiple.

4. Show that this result holds for any (distinct) coding of the two classes.
5. Find the solution $\hat\beta_0$, and hence the predicted values $\hat f = \hat\beta_0 + \hat\beta^T x$. Consider the following rule: classify to class 2 if $\hat f_i > 0$ and to class 1 otherwise. Show that this is not the same as the LDA rule unless the classes have equal numbers of observations. (Fisher, 1936; Ripley, 1996)

Proof
1. Consider the log-ratio of the class posteriors (equation (4.9) in the textbook):
\[
\log\frac{\Pr(G=2\mid X=x)}{\Pr(G=1\mid X=x)} = \log\frac{\pi_2}{\pi_1} - \frac12(\mu_2+\mu_1)^T\Sigma^{-1}(\mu_2-\mu_1) + x^T\Sigma^{-1}(\mu_2-\mu_1).
\]
When it is positive, the LDA rule classifies $x$ to class 2. Estimating the parameters of the Gaussian distributions from the training data ($\hat\pi_1 = N_1/N$, $\hat\pi_2 = N_2/N$), the rule becomes
\[
x^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1) > \frac12(\hat\mu_2+\hat\mu_1)^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1) + \log\hat\pi_1 - \log\hat\pi_2
\]
\[
= \frac12\hat\mu_2^T\hat\Sigma^{-1}\hat\mu_2 - \frac12\hat\mu_1^T\hat\Sigma^{-1}\hat\mu_1 + \log\frac{N_1}{N} - \log\frac{N_2}{N},
\]
where the cross terms cancel because $\hat\mu_1^T\hat\Sigma^{-1}\hat\mu_2 = \hat\mu_2^T\hat\Sigma^{-1}\hat\mu_1$.
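As a sanity check of this equivalence, the sketch below (synthetic two-class data; seed, dimensions, and class sizes are arbitrary) verifies that the plug-in log-posterior-ratio equals the difference of the two sides of the rearranged rule:

```python
import numpy as np

rng = np.random.default_rng(1)
p, N1, N2 = 3, 30, 70
N = N1 + N2
X1 = rng.standard_normal((N1, p))
X2 = rng.standard_normal((N2, p)) + 1.0

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
# Pooled within-class covariance estimate (divisor N - 2).
S = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (N - 2)
Sinv = np.linalg.inv(S)

x = rng.standard_normal(p)   # an arbitrary query point
# Estimated log Pr(G=2|x) / Pr(G=1|x), equation (4.9) with plug-in estimates.
log_ratio = (np.log(N2 / N1)
             - 0.5 * (mu2 + mu1) @ Sinv @ (mu2 - mu1)
             + x @ Sinv @ (mu2 - mu1))
# Rearranged rule from part 1.
lhs = x @ Sinv @ (mu2 - mu1)
rhs = (0.5 * mu2 @ Sinv @ mu2 - 0.5 * mu1 @ Sinv @ mu1
       + np.log(N1 / N) - np.log(N2 / N))
print(np.isclose(log_ratio, lhs - rhs))   # True: same classification rule
```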
2. Let $\tilde\beta$ denote $(\beta_0, \beta)$ and set the partial derivatives of $\mathrm{RSS}(\tilde\beta)$ to zero:
\[
\frac{\partial\,\mathrm{RSS}(\tilde\beta)}{\partial\beta_0} = -2\sum_{i=1}^N (y_i - \beta_0 - \beta^T x_i) = 0, \tag{1}
\]
\[
\frac{\partial\,\mathrm{RSS}(\tilde\beta)}{\partial\beta} = -2\sum_{i=1}^N x_i (y_i - \beta_0 - \beta^T x_i) = 0. \tag{2}
\]
From these we can derive
\[
\beta_0 = \frac1N\sum_{i=1}^N (y_i - \beta^T x_i), \quad\text{// from (1)} \tag{3}
\]
\[
\sum_{i=1}^N x_i\big[\beta^T(x_i-\bar x)\big] = \sum_{i=1}^N x_i y_i - \frac1N\sum_{i=1}^N x_i \sum_{j=1}^N y_j, \quad\text{// substituting (3) into (2)} \tag{4}
\]
For the right-hand side of (4),
\[
\sum_{i=1}^N x_i y_i - \frac1N\sum_{i=1}^N x_i \sum_{j=1}^N y_j
= \sum_{k=1}^2 \sum_{g_i=k} x_i\Big(y_k - \frac{N_1 y_1 + N_2 y_2}{N}\Big) \tag{5}
\]
\[
= \sum_{k=1}^2 N_k\hat\mu_k\,\frac{N y_k - (N_1 y_1 + N_2 y_2)}{N} \quad\text{// } \textstyle\sum_{g_i=k} x_i = N_k\hat\mu_k \tag{6}
\]
\[
= \frac{N_1 N_2}{N}(y_2 - y_1)(\hat\mu_2 - \hat\mu_1) \tag{7}
\]
\[
= N(\hat\mu_2 - \hat\mu_1). \quad\text{// } y_1 = -N/N_1,\ y_2 = N/N_2 \tag{8}
\]
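A quick numerical check of (5)–(8) (a minimal NumPy sketch with synthetic data; the coding $y_1 = -N/N_1$, $y_2 = N/N_2$ is the one given in the exercise):

```python
import numpy as np

rng = np.random.default_rng(2)
p, N1, N2 = 4, 20, 50
N = N1 + N2
X = rng.standard_normal((N, p))
g = np.array([1] * N1 + [2] * N2)
y = np.where(g == 1, -N / N1, N / N2)

mu1 = X[g == 1].mean(axis=0)
mu2 = X[g == 2].mean(axis=0)

rhs = X.T @ y - X.sum(axis=0) * y.sum() / N   # right-hand side of (4)
print(np.allclose(rhs, N * (mu2 - mu1)))      # True, as in (8)
```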
We also have
\[
(N-2)\hat\Sigma = \sum_{k=1}^2 \sum_{g_i=k} (x_i-\hat\mu_k)(x_i-\hat\mu_k)^T \tag{9}
\]
\[
= \sum_{k=1}^2 \sum_{g_i=k} \big(x_i x_i^T - 2 x_i\hat\mu_k^T + \hat\mu_k\hat\mu_k^T\big) \tag{10}
\]
\[
= \sum_{k=1}^2 \Big(\sum_{g_i=k} x_i x_i^T - N_k\hat\mu_k\hat\mu_k^T\Big) \quad\text{// } \textstyle\sum_{g_i=k} x_i = N_k\hat\mu_k \tag{11}
\]
\[
= \sum_{i=1}^N x_i x_i^T - \big(N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T\big), \tag{12}
\]
and
\[
\sum_{i=1}^N x_i\bar x^T = \Big(\sum_{i=1}^N x_i\Big)\bar x^T \tag{13}
\]
\[
= \frac1N(N_1\hat\mu_1 + N_2\hat\mu_2)(N_1\hat\mu_1 + N_2\hat\mu_2)^T. \tag{14}
\]
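The identity (12) is easy to verify numerically (synthetic data; class sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
p, N1, N2 = 3, 15, 25
X = rng.standard_normal((N1 + N2, p))
g = np.array([1] * N1 + [2] * N2)
mu1, mu2 = X[g == 1].mean(axis=0), X[g == 2].mean(axis=0)

# Pooled within-class scatter (N - 2) * Sigma-hat, equation (9).
scatter = sum((X[g == k] - m).T @ (X[g == k] - m)
              for k, m in [(1, mu1), (2, mu2)])
# Right-hand side of (12).
rhs = X.T @ X - N1 * np.outer(mu1, mu1) - N2 * np.outer(mu2, mu2)
print(np.allclose(scatter, rhs))   # True
```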
Meanwhile,
\[
\sum_{i=1}^N x_i\big[\beta^T(x_i-\bar x)\big] = \sum_{i=1}^N x_i\big[(x_i-\bar x)^T\beta\big] = \Big(\sum_{i=1}^N x_i x_i^T - \sum_{i=1}^N x_i\bar x^T\Big)\beta \tag{15}
\]
\[
= \Big[(N-2)\hat\Sigma + N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T - \frac1N(N_1\hat\mu_1+N_2\hat\mu_2)(N_1\hat\mu_1+N_2\hat\mu_2)^T\Big]\beta \quad\text{// from (12) and (14)} \tag{16}
\]
\[
= \Big[(N-2)\hat\Sigma + \frac{N_1N_2}{N}\big(\hat\mu_2\hat\mu_2^T - \hat\mu_1\hat\mu_2^T - \hat\mu_2\hat\mu_1^T + \hat\mu_1\hat\mu_1^T\big)\Big]\beta. \tag{17}
\]
Combining this with (4) and (8),
\[
\Big[(N-2)\hat\Sigma + \frac{N_1N_2}{N}\hat\Sigma_B\Big]\beta = N(\hat\mu_2-\hat\mu_1). \quad\text{// from (8)} \tag{18}
\]
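The normal-equation form (18) can be checked end to end by fitting the least squares problem with NumPy (a minimal sketch; sizes and seed are arbitrary, and the targets use the exercise's coding):

```python
import numpy as np

rng = np.random.default_rng(4)
p, N1, N2 = 3, 20, 30
N = N1 + N2
X = rng.standard_normal((N, p))
g = np.array([1] * N1 + [2] * N2)
y = np.where(g == 1, -N / N1, N / N2)

# Least squares fit with intercept: y ~ beta0 + X beta.
X1 = np.hstack([np.ones((N, 1)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
beta = coef[1:]

mu1, mu2 = X[g == 1].mean(axis=0), X[g == 2].mean(axis=0)
S = sum((X[g == k] - m).T @ (X[g == k] - m)
        for k, m in [(1, mu1), (2, mu2)]) / (N - 2)
SB = np.outer(mu2 - mu1, mu2 - mu1)

lhs = ((N - 2) * S + (N1 * N2 / N) * SB) @ beta
print(np.allclose(lhs, N * (mu2 - mu1)))   # True: equation (18)
```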
3. $\hat\Sigma_B\beta = (\hat\mu_2-\hat\mu_1)(\hat\mu_2-\hat\mu_1)^T\beta$, and $(\hat\mu_2-\hat\mu_1)^T\beta$ is a scalar, therefore $\hat\Sigma_B\beta$ is in the direction $(\hat\mu_2-\hat\mu_1)$, and
\[
\hat\beta = \frac{1}{N-2}\,\hat\Sigma^{-1}\Big[N(\hat\mu_2-\hat\mu_1) - \frac{N_1N_2}{N}\hat\Sigma_B\beta\Big] \quad\text{// from (18)} \tag{19}
\]
\[
= \frac{1}{N-2}\Big[N - \frac{N_1N_2}{N}(\hat\mu_2-\hat\mu_1)^T\beta\Big]\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1) \tag{20}
\]
\[
\propto \hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1). \tag{21}
\]

4. For an arbitrary (distinct) coding, the right-hand side of (18) is $\frac{N_1N_2}{N}(y_2-y_1)(\hat\mu_2-\hat\mu_1)$ by (7). Replacing $N$ with $\frac{N_1N_2}{N}(y_2-y_1)$ (from (7) and (8)) in (20), we still have
\[
\hat\beta = \frac{N_1N_2}{N(N-2)}\Big[(y_2-y_1) - (\hat\mu_2-\hat\mu_1)^T\beta\Big]\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1) \propto \hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1).
\]
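The following sketch checks parts 3 and 4 together: for several arbitrary distinct codings (the codings listed are illustrative choices, not from the exercise), the fitted slope vector stays parallel to $\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1)$:

```python
import numpy as np

rng = np.random.default_rng(5)
p, N1, N2 = 3, 20, 30
N = N1 + N2
X = rng.standard_normal((N, p))
g = np.array([1] * N1 + [2] * N2)
mu1, mu2 = X[g == 1].mean(axis=0), X[g == 2].mean(axis=0)
S = sum((X[g == k] - m).T @ (X[g == k] - m)
        for k, m in [(1, mu1), (2, mu2)]) / (N - 2)
d = np.linalg.solve(S, mu2 - mu1)      # LDA direction
v = d / np.linalg.norm(d)

X1 = np.hstack([np.ones((N, 1)), X])
for c1, c2 in [(-1.0, 1.0), (0.0, 1.0), (3.0, -7.0)]:  # arbitrary distinct codings
    y = np.where(g == 1, c1, c2)
    beta = np.linalg.lstsq(X1, y, rcond=None)[0][1:]
    u = beta / np.linalg.norm(beta)
    print(np.allclose(u, v) or np.allclose(u, -v))     # True: beta is parallel to d
```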
5. The decision boundary is $\hat f = 0$. With the given coding, $\sum_{i=1}^N y_i = 0$, so from (3)
\[
\hat\beta_0 = \frac1N\sum_{i=1}^N (y_i - \hat\beta^T x_i) = -\frac1N\sum_{i=1}^N \hat\beta^T x_i = -\hat\mu^T\hat\beta,
\quad\text{where } \hat\mu = \bar x = \frac1N(N_1\hat\mu_1 + N_2\hat\mu_2).
\]
When the classes have equal numbers of observations, i.e. $N_1 = N_2 = N/2$,
\[
\hat\mu = \frac12(\hat\mu_1 + \hat\mu_2). \tag{22}
\]
Then the predicted values satisfy
\[
\hat f = \hat\beta_0 + \hat\beta^T x = (x - \hat\mu)^T\hat\beta \propto (x-\hat\mu)^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1) \tag{23}
\]
\[
= x^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1) - \frac12(\hat\mu_1+\hat\mu_2)^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1). \quad\text{// from (22)} \tag{24}
\]
Meanwhile, the LDA rule takes the zero of the log-ratio of the class posteriors as its boundary ($N_1 = N_2$, so $\hat\pi_1 = \hat\pi_2$):
\[
\log\frac{\Pr(G=2\mid X=x)}{\Pr(G=1\mid X=x)} = -\frac12(\hat\mu_2+\hat\mu_1)^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1) + x^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1). \tag{25}
\]
Comparing (24) with (25), we find that they are the same rule. But when $N_1 \neq N_2$, the two rules are obviously different.
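Finally, a sketch comparing the two decision rules on random query points (the Gaussian class model, shift, and seed are arbitrary illustrative choices): with equal class sizes the rules agree everywhere; with unequal sizes they generally disagree on some points:

```python
import numpy as np

rng = np.random.default_rng(6)
p = 2

def rules_agree(N1, N2):
    N = N1 + N2
    X = np.vstack([rng.standard_normal((N1, p)),
                   rng.standard_normal((N2, p)) + 1.5])
    g = np.array([1] * N1 + [2] * N2)
    y = np.where(g == 1, -N / N1, N / N2)
    mu1, mu2 = X[g == 1].mean(axis=0), X[g == 2].mean(axis=0)
    S = sum((X[g == k] - m).T @ (X[g == k] - m)
            for k, m in [(1, mu1), (2, mu2)]) / (N - 2)
    Sinv = np.linalg.inv(S)
    # Least squares rule: classify to class 2 if f(x) > 0.
    X1 = np.hstack([np.ones((N, 1)), X])
    b0, *b = np.linalg.lstsq(X1, y, rcond=None)[0]
    xs = rng.standard_normal((2000, p))   # random query points
    f = b0 + xs @ np.array(b)
    # LDA rule: classify to class 2 if the log posterior ratio > 0.
    lda = (np.log(N2 / N1) - 0.5 * (mu2 + mu1) @ Sinv @ (mu2 - mu1)
           + xs @ Sinv @ (mu2 - mu1))
    return np.all((f > 0) == (lda > 0))

print(rules_agree(50, 50))   # True: equal class sizes
print(rules_agree(20, 80))   # generally False: the rules differ
```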
Ex. 4.3 Suppose we transform the original predictors $X$ to $\hat Y$ via linear regression. In detail, let $\hat Y = X(X^TX)^{-1}X^TY = X\hat B$, where $Y$ is the indicator response matrix. Similarly, for any input $x \in \mathbb{R}^p$, we get a transformed vector $\hat y = \hat B^T x \in \mathbb{R}^K$. Show that LDA using $\hat Y$ is identical to LDA in the original space.
Proof. The transformed vector is $\hat y = \hat B^T x$, so the class means and the pooled covariance in the transformed space satisfy
\[
\hat\mu_k^y = \frac{1}{N_k}\sum_{g_i=k}\hat y_i = \frac{1}{N_k}\sum_{g_i=k}\hat B^T x_i = \hat B^T\hat\mu_k^x, \tag{26}
\]
\[
\hat\mu_\ell^y = \hat B^T\hat\mu_\ell^x \quad\text{(likewise for any class } \ell\text{)}, \tag{27}
\]
\[
\hat\Sigma_y = \frac{1}{N-K}\sum_{k=1}^K\sum_{g_i=k}(\hat y_i-\hat\mu_k^y)(\hat y_i-\hat\mu_k^y)^T \tag{28}
\]
\[
= \frac{1}{N-K}\sum_{k=1}^K\sum_{g_i=k}\hat B^T(x_i-\hat\mu_k^x)(x_i-\hat\mu_k^x)^T\hat B \tag{29}
\]
\[
= \hat B^T\hat\Sigma_x\hat B. \tag{30}
\]
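Relations (26)–(30) are straightforward to confirm numerically (a minimal sketch; the data, balanced class structure, and seed are synthetic choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(9)
p, K, N = 4, 3, 60
X = rng.standard_normal((N, p))
g = np.repeat(np.arange(K), N // K)    # 20 observations per class
Y = np.eye(K)[g]                       # indicator response matrix

B = np.linalg.solve(X.T @ X, X.T @ Y)  # B-hat = (X^T X)^{-1} X^T Y
Yhat = X @ B                           # transformed training data

mux = np.array([X[g == k].mean(axis=0) for k in range(K)])
muy = np.array([Yhat[g == k].mean(axis=0) for k in range(K)])
Sx = sum((X[g == k] - mux[k]).T @ (X[g == k] - mux[k])
         for k in range(K)) / (N - K)
Sy = sum((Yhat[g == k] - muy[k]).T @ (Yhat[g == k] - muy[k])
         for k in range(K)) / (N - K)

print(np.allclose(muy, mux @ B))       # (26)-(27): mu^y_k = B^T mu^x_k
print(np.allclose(Sy, B.T @ Sx @ B))   # (30): Sigma_y = B^T Sigma_x B
```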
Substituting (26), (27), and (30) for $\hat\mu$ and $\hat\Sigma$ in equation (4.9):
\[
\log\frac{\Pr(G=k\mid \hat Y=\hat y)}{\Pr(G=\ell\mid \hat Y=\hat y)}
= \log\frac{\pi_k}{\pi_\ell} - \frac12(\hat\mu_k^y+\hat\mu_\ell^y)^T\hat\Sigma_y^{-1}(\hat\mu_k^y-\hat\mu_\ell^y) + \hat y^T\hat\Sigma_y^{-1}(\hat\mu_k^y-\hat\mu_\ell^y)
\]
\[
= \log\frac{\pi_k}{\pi_\ell} - \frac12(\hat\mu_k^x+\hat\mu_\ell^x)^T\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T(\hat\mu_k^x-\hat\mu_\ell^x) + x^T\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T(\hat\mu_k^x-\hat\mu_\ell^x).
\]
In fact, if
\[
\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T(\hat\mu_k^x-\hat\mu_\ell^x) = \hat\Sigma_x^{-1}(\hat\mu_k^x-\hat\mu_\ell^x),
\]
then LDA using $\hat Y$ is identical to LDA in the original space. So we will prove it below.
$Y$ is an indicator response matrix, so $Y^TY = \Omega = \operatorname{diag}(N_1,\dots,N_K)$ and, writing $y_k$ for its $k$-th column,
\[
N_k\hat\mu_k^x = \sum_{g_i=k} x_i = X^T y_k, \tag{31}
\]
\[
\hat\Sigma_x = \frac{1}{N-K}\Big(\sum_{i=1}^N x_i x_i^T - \sum_{k=1}^K N_k\hat\mu_k^x(\hat\mu_k^x)^T\Big) \quad\text{// from (12)} \tag{32}
\]
\[
= \frac{1}{N-K}\Big(X^TX - \sum_{k=1}^K \frac{1}{N_k}(X^Ty_k)(X^Ty_k)^T\Big) \quad\text{// from (31)} \tag{33}
\]
\[
= \frac{1}{N-K}\big(X^TX - X^TY\Omega^{-1}Y^TX\big). \tag{34}
\]
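Equation (34) can be verified numerically (a minimal NumPy sketch; $X$, the labels, and the balanced class sizes are synthetic choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
p, K, N = 4, 3, 60
X = rng.standard_normal((N, p))
g = np.repeat(np.arange(K), N // K)   # 20 observations per class
Y = np.eye(K)[g]                      # indicator response matrix
Nk = Y.sum(axis=0)

# Pooled within-class covariance estimate (divisor N - K), as in (32).
mus = np.array([X[g == k].mean(axis=0) for k in range(K)])
S = sum((X[g == k] - mus[k]).T @ (X[g == k] - mus[k])
        for k in range(K)) / (N - K)

Omega_inv = np.diag(1.0 / Nk)         # (Y^T Y)^{-1}
S34 = (X.T @ X - X.T @ Y @ Omega_inv @ Y.T @ X) / (N - K)
print(np.allclose(S, S34))            # True: matches (34)
```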
We need the projection matrix
\[
H = \hat\Sigma_x\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T,
\]
which satisfies $HH = H$. We use $X^TY$ for assistance. By the definition of $\hat B$,
\[
\hat B^T = \big((X^TX)^{-1}X^TY\big)^T = Y^TX(X^TX)^{-1},
\]
and, using (34),
\[
\hat\Sigma_x\hat B = \frac{1}{N-K}\big(X^TX - X^TY\Omega^{-1}Y^TX\big)(X^TX)^{-1}X^TY
= \frac{1}{N-K}X^TY\big(I - \Omega^{-1}Y^TX(X^TX)^{-1}X^TY\big).
\]
Let $P = \hat B^TX^TY = Y^TX(X^TX)^{-1}X^TY$; then
\[
\hat\Sigma_x\hat B = \frac{1}{N-K}X^TY(I - \Omega^{-1}P), \qquad
\hat B^T\hat\Sigma_x\hat B = \frac{1}{N-K}P(I - \Omega^{-1}P).
\]
$P(I-\Omega^{-1}P)$ is invertible (which holds generically), so $P$ and $I-\Omega^{-1}P$ are both invertible, hence
\[
(\hat B^T\hat\Sigma_x\hat B)^{-1} = (N-K)\big(P(I-\Omega^{-1}P)\big)^{-1} = (N-K)(I-\Omega^{-1}P)^{-1}P^{-1}
\]
and
\[
HX^TY = \frac{1}{N-K}X^TY(I-\Omega^{-1}P)\cdot(N-K)(I-\Omega^{-1}P)^{-1}P^{-1}\cdot P = X^TY.
\]
Here we have proved $HX^TY = X^TY$, i.e. $HX^Ty_k = X^Ty_k$ for every class $k$. Therefore
\[
HN_k\hat\mu_k^x = N_k\hat\mu_k^x \quad\text{// from (31)}
\]
\[
\hat\Sigma_x\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T\hat\mu_k^x = \hat\mu_k^x \quad\text{// definition of } H
\]
\[
\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T\hat\mu_k^x = \hat\Sigma_x^{-1}\hat\mu_k^x
\]
\[
\hat B(\hat B^T\hat\Sigma_x\hat B)^{-1}\hat B^T(\hat\mu_k^x - \hat\mu_\ell^x) = \hat\Sigma_x^{-1}(\hat\mu_k^x - \hat\mu_\ell^x),
\]
which is exactly the required identity.
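The identity just proved can also be checked directly (a minimal sketch with synthetic data; the choice $k=2$, $\ell=1$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
p, K, N = 4, 3, 60
X = rng.standard_normal((N, p)) + 0.5          # raw predictors, nonzero mean
g = np.repeat(np.arange(K), N // K)
Y = np.eye(K)[g]

B = np.linalg.solve(X.T @ X, X.T @ Y)          # B-hat = (X^T X)^{-1} X^T Y
mus = np.array([X[g == k].mean(axis=0) for k in range(K)])
S = sum((X[g == k] - mus[k]).T @ (X[g == k] - mus[k])
        for k in range(K)) / (N - K)

d = mus[1] - mus[0]                            # mu_k - mu_l for k = 2, l = 1
lhs = B @ np.linalg.solve(B.T @ S @ B, B.T @ d)
rhs = np.linalg.solve(S, d)
print(np.allclose(lhs, rhs))                   # True: the two LDA rules coincide
```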