Generalizing Subcategorization Frames Acquired from Corpora(5)

Published: 2021-06-08   Source: unknown

This paper presents a method of improving the quality of subcategorization frames (SCFs) acquired from corpora in order to augment a lexicon of a lexicalized grammar. We first estimate a confidence value that a word can have each SCF, and create an SCF con

Table 1: Tree families of the XTAG English grammar mapped from 23 out of 163 SCF types

Tree family      Description
Tnx0Vnx1         Transitive
Tnx0Vs1          Sentential complement
Tnx0Vnx2nx1      Ditransitive
Tnx0Vnx1Pnx2     Multiple anchor ditransitive with PP
Tnx0Vnx1pnx2     Ditransitive with PP
Tnx0Vplnx1       Transitive verb particle
Tnx0Vpl          Intransitive verb particle
Tnx0Vnx1s2       Sentential complement with NP
Tnx0Vpnx1        Intransitive with PP
Ts0Vnx1          Transitive sentential subject
Tnx0Vax1         Intransitive with adjective
Tnx0Vplnx2nx1    Ditransitive verb particle

[Figure: plot with axes Recall and Precision, each ranging from 0 to 1; curves: confidence cut-off 0.01, confidence cut-off 0.03, confidence cut-off 0.05]

In order to evaluate our method, we split the SCF lexicon of the XTAG English grammar into a training portion and a test portion. The training portion includes 9,427 SCFs for 8,399 words, while the test portion includes 433 SCFs for 280 words. The test portion is selected from the SCF lexicon for words that are observed in the acquired SCF lexicon. We extract SCF confidence-value vectors from the training portion and combine them with the SCF confidence-value vectors obtained from the acquired SCFs. The number of the resulting data objects is 8,679.⁵ We also make use of the SCF confidence-value vectors obtained from the training SCF lexicon as initial centroids by regarding ε as 0. The total number of them was 35.⁶ We then performed clustering of these 8,679 data objects into 35 clusters.
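The seeding step described above can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: it substitutes plain k-means with squared Euclidean distance for the paper's own clustering algorithm (which is defined outside this section), and the words, vectors, and the function name `seeded_kmeans` are hypothetical toy values. The only idea it mirrors is that the initial centroids come from a trusted lexicon rather than from random initialization.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed metric, not the paper's)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(
    data: Dict[str, Vector], seeds: List[Vector], max_iter: int = 100
) -> Tuple[Dict[str, int], List[Vector]]:
    """Cluster word vectors with k-means, starting from the given centroids."""
    centroids = [tuple(s) for s in seeds]
    assignment: Dict[str, int] = {}
    for _ in range(max_iter):
        # Assign every word to its nearest centroid.
        new_assignment = {
            word: min(
                range(len(centroids)),
                key=lambda k: squared_distance(vec, centroids[k]),
            )
            for word, vec in data.items()
        }
        if new_assignment == assignment:
            break  # converged: no word changed cluster
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [data[w] for w, c in assignment.items() if c == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Hypothetical confidence-value vectors over three SCFs.
data = {
    "give": (0.9, 0.8, 0.1),
    "send": (0.95, 0.7, 0.0),
    "sleep": (0.05, 0.0, 0.9),
    "run": (0.2, 0.1, 0.8),
}
# Hypothetical seed centroids: binary vectors, as a hand-built lexicon would give.
seeds = [(1.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

assignment, centroids = seeded_kmeans(data, seeds)
print(assignment)
```

In this toy run the number of clusters equals the number of seed vectors, which parallels the 35 seed vectors and 35 clusters reported above.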

We finally evaluate precision and recall of the resulting SCFs by comparing them with the test SCF lexicon of the XTAG English grammar.
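As a minimal sketch of this comparison, precision and recall can be computed over (word, SCF) pairs. Counting pairs across the whole lexicon (rather than averaging per word) is an assumption here, since this section does not say how the scores are aggregated. The function name `precision_recall` and the toy lexicons are hypothetical; the frame names are borrowed from Table 1.

```python
from typing import Dict, Set, Tuple


def precision_recall(
    predicted: Dict[str, Set[str]], gold: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Precision and recall over (word, SCF) pairs."""
    pred_pairs = {(w, f) for w, frames in predicted.items() for f in frames}
    gold_pairs = {(w, f) for w, frames in gold.items() for f in frames}
    correct = len(pred_pairs & gold_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


# Hypothetical acquired lexicon and test lexicon.
predicted = {"give": {"Tnx0Vnx1", "Tnx0Vs1"}, "say": {"Tnx0Vs1"}}
gold = {"give": {"Tnx0Vnx1", "Tnx0Vnx2nx1"}, "say": {"Tnx0Vs1", "Tnx0Vnx1"}}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Here two of the three predicted pairs are in the test lexicon, and two of its four pairs are recovered.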

We first compare the confidence cut-off with the frequency cut-off to investigate the effects of Bayesian estimation. Figure 4 shows the precision and recall of the resulting SCF sets using the confidence cut-off and the frequency cut-off. We measured the precision and recall of the SCF sets obtained using the confidence cut-off with recognition threshold t = 0.01 (confidence cut-off 0.01), 0.03 (confidence cut-off 0.03), and 0.05 (confidence cut-off 0.05) by varying the threshold for the confidence value from 0 to 1. We also measured those for the SCF sets obtained using the frequency cut-off by varying the threshold for the relative frequency from 0 to 1. The graph indicates that the confidence cut-offs outperformed the frequency cut-off. When we compare confidence cut-offs with different recognition thresholds, a higher recognition threshold improves precision, while a lower recognition threshold improves recall. This result is consistent with our expectations.

Figure 4: Precision and recall of the resulting SCFs using confidence cut-off and frequency cut-off

We then compare the centroid cut-off with the confidence cut-off to observe the effects of clustering using information in the lexicon of the XTAG English grammar. Figure 5 shows the precision and recall of the resulting SCF sets using the centroid cut-off and the confidence cut-off with the recognition threshold t = 0.03 by varying the threshold for the confidence value. In order to show the effects of the information in the training SCF lexicon, centroid cut-off 0.03* denotes the SCFs obtained by clustering only the SCF confidence-value vectors of the acquired SCFs, with random initialization. The graph shows that clustering is meaningful only when we make use of the reliable SCF confidence-value vectors obtained from the manually tai-

[Figure: plot with axes Recall and Precision, each ranging from 0 to 1; curves: centroid cut-off 0.03, centroid cut-off 0.03*]

Figure 5: Precision and recall of the resulting SCFs using confidence cut-off and frequency cut-off

⁵ We used the SCF confidence-value vectors for words which are included in the XTAG English grammar. When both the training SCF lexicon and the acquired SCF lexicon have the same words, we simply used an SCF confidence-value vector obtained from the acquired SCF lexicon.

⁶ We used the SCF confidence-value vectors that appear with more than two words.
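The threshold sweep behind the precision-recall curves in the evaluation can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the per-pair scores stand in for either confidence values or relative frequencies, a pair is kept when its score is greater than or equal to the threshold, and precision is defined as 1.0 when nothing is selected. The scores, the gold set, and the function name `sweep` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (word, SCF)


def sweep(
    scores: Dict[Pair, float], gold: Set[Pair], thresholds: List[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each cut-off threshold."""
    curve = []
    for th in thresholds:
        # Keep only the (word, SCF) pairs whose score passes the cut-off.
        selected = {pair for pair, score in scores.items() if score >= th}
        correct = len(selected & gold)
        # Assumed convention: an empty selection counts as precision 1.0.
        precision = correct / len(selected) if selected else 1.0
        recall = correct / len(gold) if gold else 0.0
        curve.append((th, precision, recall))
    return curve


# Hypothetical scores for acquired (word, SCF) pairs.
scores = {
    ("give", "Tnx0Vnx1"): 0.9,
    ("give", "Tnx0Vnx2nx1"): 0.6,
    ("give", "Tnx0Vs1"): 0.2,
    ("say", "Tnx0Vs1"): 0.8,
}
# Hypothetical test lexicon.
gold = {
    ("give", "Tnx0Vnx1"),
    ("give", "Tnx0Vnx2nx1"),
    ("say", "Tnx0Vs1"),
    ("say", "Tnx0Vnx1"),
}

curve = sweep(scores, gold, [0.0, 0.5, 1.0])
for th, p, r in curve:
    print(f"threshold={th:.1f} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold in this toy example removes the one incorrect pair first, so precision rises while recall stays flat until correct pairs start to be cut as well.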
