
Generalizing Subcategorization Frames Acquired from Corpora(2)

Published: 2021-06-08   Source: unknown

This paper presents a method of improving the quality of subcategorization frames (SCFs) acquired from corpora in order to augment a lexicon of a lexicalized grammar. We first estimate a confidence value that a word can have each SCF, and create an SCF con

the lexicon of the XTAG English grammar, and then compared the results with those obtained by naive frequency cut-off.

(#S(EPATTERN :TARGET |ftp|
   :SUBCAT (VSUBCAT NONE) :CLASSES (22 2985) :RELIABILITY 0
   :FREQSCORE 0.01640195 :FREQCNT 2
   :TLTL (VVD VV0)
   :SLTL (((|ssh| NN1))) :OLT1L NIL :OLT2L NIL
   :OLT3L NIL :LRL 0))

Figure 1: An acquired SCF for a verb “ftp”
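As a small illustration, the fields of an entry like the one in Figure 1 that matter for this study (the TARGET stem, the first number of CLASSES, and FREQCNT) can be pulled out with a few regular expressions. The sketch below is not part of the acquisition system. The function name `parse_scf_entry` and the dictionary keys are hypothetical, and the code assumes that tokens in the entry are separated by whitespace as laid out in Figure 1.

```python
import re

# Entry text laid out as in Figure 1 (whitespace between tokens is assumed).
ENTRY = """(#S(EPATTERN :TARGET |ftp|
   :SUBCAT (VSUBCAT NONE) :CLASSES (22 2985) :RELIABILITY 0
   :FREQSCORE 0.01640195 :FREQCNT 2
   :TLTL (VVD VV0)
   :SLTL (((|ssh| NN1))) :OLT1L NIL :OLT2L NIL
   :OLT3L NIL :LRL 0))"""


def parse_scf_entry(text):
    """Return the word stem, SCF ID, and frequency count of one entry.

    Hypothetical helper: only the three fields discussed in the text
    are extracted; all other fields are ignored.
    """
    target = re.search(r":TARGET\s*\|([^|]+)\|", text)
    classes = re.search(r":CLASSES\s*\(\s*(\d+)", text)
    freqcnt = re.search(r":FREQCNT\s*(\d+)", text)
    if not (target and classes and freqcnt):
        raise ValueError("entry lacks a TARGET, CLASSES, or FREQCNT field")
    return {
        "stem": target.group(1),
        "scf_id": int(classes.group(1)),    # first number in CLASSES
        "freq_count": int(freqcnt.group(1)),
    }


entry = parse_scf_entry(ENTRY)
```

For the entry in Figure 1 this yields the stem `ftp`, SCF ID 22, and a frequency count of 2.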

Figure 2: Probability distributions of SCFs for apply

2 Background

2.1 Acquisition of SCFs for Lexicalized Grammars

We start by acquiring SCFs for a lexicalized grammar from corpora by the method described in (Carroll and Fang, 2004).

In their study, they first acquire fine-grained SCFs by the method proposed by (Briscoe and Carroll, 1997; Korhonen, 2002). Figure 1 shows an example of one acquired SCF entry for a verb “ftp.” Each acquired SCF entry has several fields about the observed SCF. We explain here only the portion related to this study. The TARGET field is a word stem (|ftp| in Figure 1), the first number in the CLASSES field indicates an SCF ID (22 in Figure 1), and FREQCNT shows how often words derivable from the word stem had the SCF identified by the SCF ID (2 times in Figure 1) in the training corpus. The obtained SCFs comprise a total of 163 types of relatively fine-grained SCFs, which are originally based on the SCFs in the ANLT (Boguraev and Briscoe, 1987) and COMLEX (Grishman et al., 1994) dictionaries. In this example, the SCF ID 22 corresponds to the SCF of an intransitive verb. They then obtain SCFs for the target lexicalized grammar (the LinGO English Resource Grammar (Flickinger, 2000) in their study) by using a handcrafted translation map from these 163 types to one of the types of SCFs in the target grammar. They report a coverage improvement of 4.5 percentage points (52.7% to 57.2%), at the cost of roughly doubling the parsing time (9.78 sec. to 21.78 sec.).

This approach is easily extensible to any lexicalized grammar, provided that the grammar has an organized lexicon architecture that derives possible lexical entries from each SCF the grammar defines. Existing lexicalized grammars are usually equipped with this kind of organization, e.g., lexical types in the LinGO ERG and tree families in the XTAG English grammar.

2.2 Clustering of Verb SCF Distributions

There is some related work on clustering of SCF probability distributions (Schulte im Walde and Brew, 2002; Korhonen et al., 2003). These studies aim at obtaining verb semantic classes, which are closely related to the syntactic behavior of argument selection.

Schulte im Walde and Brew (2002) employed clustering of verb SCF distributions to induce verb semantic classes. They first represent a verb SCF distribution by an n-dimensional vector for each verb. Each element in the SCF distribution represents the probability that a verb appears with the corresponding SCF. They then perform k-Means clustering (Forgy, 1965) of these vectors in order to obtain verb semantic classes.
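The clustering step described above can be sketched as follows. This is a minimal illustration, not the setup used by Schulte im Walde and Brew: the squared Euclidean distance, the seed, the toy verbs, and their probabilities are all assumptions made for the example. Initial centroids are drawn from the data points, which is commonly called Forgy initialization.

```python
import random


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, seed=0, max_iter=100):
    """Cluster vectors with Lloyd-style k-means; return (labels, centroids)."""
    rng = random.Random(seed)
    # Forgy-style initialization: use k distinct data points as centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented toy data: each row is one verb's probability distribution
# over three hypothetical SCFs (each row sums to 1).
verbs = ["verb_a", "verb_b", "verb_c", "verb_d"]
distributions = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
]

labels, centroids = kmeans(distributions, k=2)
clusters = {}
for verb, label in zip(verbs, labels):
    clusters.setdefault(label, []).append(verb)
```

On this toy data the first two verbs end up in one cluster and the last two in the other. Because every input row is a probability distribution, each centroid is also a distribution over SCFs.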

Korhonen et al. (2003) also conducted clustering of verb SCF distributions using different clustering methods, including nearest neighbors clustering and Information Bottleneck clustering (Tishby et al., 1999). They investigated the effect of polysemic verbs on clustering.

Although these studies demonstrated that a certain classification of verbs can be obtained by clustering verb SCF distributions, they do not focus on improving the quality of the SCF lexicon. In this paper, we focus on the problem of identifying whether a word can have each SCF, and try to obtain word classes whose element words have the same set of SCFs.

3 Method

The basic idea of our method is first to obtain word classes whose element words have the same set of SCFs, using not only acquired SCFs but also existing SCFs in the target grammar. We then eliminate implausible acquired SCFs and add plausible unseen SCFs according to the set of SCFs represented by the centroids of the resulting clusters.

3.1 Representation of Confidence Values for SCFs

We represent an SCF confidence-value vector of each word w_i with a vector v_i, an object for clustering. Each element v_ij in v_i represents the confidence value of SCF
