[Figure 1: Example of the MRS representation produced by the ERG for the sentence 'every dog probably chased some white cat', shown in full notation (with handles and the qeq constraints in H-CONS), in abbreviated unscoped notation, and as the fully scoped forms.]
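To illustrate the relationship between the unscoped form and the scoped forms in Figure 1, the following is a minimal sketch (in Python, not the actual LKB/ERG machinery) of how the floating scopal elements of 'every dog probably chased some white cat' can be nested around the core relation to enumerate candidate scoped readings. The relation spellings and the brute-force permutation strategy are illustrative assumptions; a real MRS resolver additionally filters readings against the qeq constraints in H-CONS.

from itertools import permutations

# Scopal elements whose relative scope the unscoped MRS leaves open
# (spellings here are illustrative, not the ERG's relation names).
scopal = ["every(x, dog(x))", "some(y, white(y) & cat(y))", "probably"]
core = "chase(x, y)"

def scoped_readings(scopal, core):
    # Nest the scopal elements around the core relation in every order;
    # each fully scoped reading is wrapped in the prpstn message relation.
    for order in permutations(scopal):
        body = core
        for element in reversed(order):
            body = element + "[" + body + "]"
        yield "prpstn[" + body + "]"

for reading in scoped_readings(scopal, core):
    print(reading)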
‘wrong’ choice (i.e., one not licensed by the grammar). One danger with this is that the input might be too underspecified, but the approach we are taking to this is to let corpus data act as an oracle to guide a generator. Specifically, in the context of the speech prosthesis project mentioned in Section 2.3., the input to the generator may underspecify closed-class words, such as determiners. Although the grammar provides some constraints (e.g. 'much' may not occur with plural nouns), a totally underspecified determiner would generally result in the generation of far more strings than the application could plausibly require. Frequency information from corpora can be used to make a best guess in an application such as this, where complete precision is not a requirement.
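As a rough sketch of how corpus frequencies could act as such an oracle, the fragment below ranks the strings a generator produces for an underspecified determiner slot by the corpus count of the determiner they contain. The counts, candidate strings, and the best_guess helper are hypothetical illustrations, not part of the LKB or the ERG.

# Hypothetical corpus counts for determiners (illustrative numbers only).
determiner_counts = {"the": 62000, "some": 3500, "these": 2500, "much": 900}

def best_guess(candidates):
    # Choose the candidate whose initial determiner is most frequent;
    # candidates ruled out by the grammar (e.g. 'much' with a plural
    # noun) are assumed to have been filtered out already.
    def score(candidate):
        determiner = candidate.split()[0].lower()
        return determiner_counts.get(determiner, 0)
    return max(candidates, key=score)

candidates = ["the dogs barked", "some dogs barked", "these dogs barked"]
print(best_guess(candidates))   # -> "the dogs barked"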
We need to do more work to define appropriate general-purpose interfaces to the semantics, especially for applications that use the grammar for generation. However, we believe MRS will eventually be highly suitable as a representation for a shared resource grammar.
4. Efficiency improvements via shared resources
The LinGO ERG has been central in a collaborative effort to improve technology for processing HPSG (and similar formalisms). Until recently, it was often argued that high-level grammar formalisms are too slow for real-world applications. Groups from the DFKI, Saarland University, the University of Tokyo and CSLI have collaborated on improving efficiency by using the LinGO grammar as a common reference point. All the sites used the [incr tsdb()] system to measure various parameters in parsing with the ERG on common test-suites. The LKB system was used as a baseline to validate correctness. It was also used to preprocess the type hierarchy and lexicon, to avoid the necessity for each group to write a parser for the syntax used in the ERG; practically speaking, this is much faster than attempting to get the groups to agree on a common syntax for the grammar definition files. Similarly, the other systems have made use of various expanded forms of the grammar that can be output by the LKB, enabling them to bypass processing stages which are irrelevant for core parser comparison, such as expansion of the type hierarchy to form a semi-lattice, and morphological processing.
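The core of such a comparison is simply aligning per-item measurements from the different parsers on the same test-suite items. The sketch below shows that bookkeeping under the assumption of per-item CSV exports; the file layout and column names are invented for illustration and do not reflect the actual [incr tsdb()] database schema.

import csv
from collections import defaultdict

def load_times(path):
    # Map item identifier -> CPU time, from a hypothetical per-item export.
    times = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            times[row["item_id"]] = float(row["cpu_seconds"])
    return times

def align(profiles):
    # Collect times per item across systems so that comparisons are
    # always made on exactly the same inputs.
    table = defaultdict(dict)
    for system, path in profiles.items():
        for item, t in load_times(path).items():
            table[item][system] = t
    return table

# e.g. align({"LKB": "lkb_items.csv", "PET": "pet_items.csv"})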
Some small changes to the ERG were made to allow this comparison: for instance, there were a few places where the LKB's assumption that feature structures may not contain cycles was used in order to rule out otherwise valid unifications. However, it was not difficult to remove these cases, and the ERG itself is now neutral with respect to cyclicity assumptions in that it does not generate cyclic structures (modulo bugs).
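For readers unfamiliar with the cyclicity issue, the sketch below shows the kind of check involved: a feature structure is treated as a graph of nodes connected by features, and a structure is cyclic if some path loops back to a node already on the current path. The nested-dictionary encoding and the is_cyclic helper are illustrative assumptions, not the LKB's internal representation.

def is_cyclic(fs, _path=None):
    # Return True if the feature structure (nested dicts, where shared
    # sub-dicts encode reentrancy) contains a cycle.
    if _path is None:
        _path = set()
    if id(fs) in _path:
        return True          # re-entered a node still on the current path
    if not isinstance(fs, dict):
        return False
    _path.add(id(fs))
    cyclic = any(is_cyclic(value, _path) for value in fs.values())
    _path.discard(id(fs))
    return cyclic

# A reentrant but acyclic structure:
shared = {"PERS": "3rd"}
acyclic = {"SUBJ": {"AGR": shared}, "COMPS": {"AGR": shared}}
# A cyclic structure (a node that contains itself):
cyclic = {"SUBJ": {}}
cyclic["SUBJ"]["SELF"] = cyclic
print(is_cyclic(acyclic), is_cyclic(cyclic))   # False True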
The combination of the LinGO ERG and the [incr tsdb()] test-suite machinery has enabled much more detailed cross-platform performance evaluation on realistic grammars than has previously been possible. The result has been a combination of processing techniques developed by the various groups, to mutual benefit. Kiefer et al. (1999) give a partial report on some of this work; Flickinger et al. (2000) contains detailed discussions by most of the participants in this collaboration. The LKB now incorporates techniques that were adapted from other systems, especially PAGE and PET (Callmeier, 2000).
5. Other collaborations and future work
In addition to the work on processing efficiency and performance profiling described above, the ERG and the LKB also form the basis of well-established collaborations on the extraction of stochastic lexicalized tree grammars (Neumann, 1997) and on the integration with discourse models and pragmatics (Copestake and Lascarides, 1998). Emerging joint projects include the adaptation of the ERG lexical type hierarchy to large English and Japanese lexicons developed for MT, in collaboration with Francis Bond and other researchers from NTT. We are actively investigating possible