coverage is not unduly domain- or application-specific. This means that the application-specific test suite has to be supplemented by one which contains (hand-constructed) examples which illustrate particular grammatical phenomena. For instance, the test sentence shown below would indicate whether the grammar could handle extraction from prepositional phrases:
Which office does Jones sleep in?
The sentences are not supposed to be realistic but must minimize ambiguity and use standardized vocabulary and phrases in order to make the results as clear-cut as possible. This test suite should also contain negative examples, to ensure that the grammar does not overgenerate. The CSLI test suite (which is largely drawn from the HP test suite) contains around 1350 sentences (about 400 of which are ungrammatical). Again, this is used both as a target and for regression testing.
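As a rough illustration of how a test suite with both positive and negative items can be scored, the sketch below separates undergeneration (a grammatical item with no parse) from overgeneration (an ungrammatical item with at least one parse). It is a minimal sketch under stated assumptions, not the LKB's own test machinery: `SuiteItem`, `run_suite` and `toy_count_parses` are hypothetical names, the toy "parser" is a lookup table standing in for a real one, and the negative sentence is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SuiteItem:
    sentence: str
    grammatical: bool  # False marks a negative (ungrammatical) example


def run_suite(
    items: List[SuiteItem], count_parses: Callable[[str], int]
) -> Tuple[List[str], List[str]]:
    """Return (undergenerated, overgenerated) sentences for one test run."""
    undergenerated, overgenerated = [], []
    for item in items:
        n = count_parses(item.sentence)
        if item.grammatical and n == 0:
            undergenerated.append(item.sentence)  # coverage gap
        elif not item.grammatical and n > 0:
            overgenerated.append(item.sentence)  # grammar is too permissive
    return undergenerated, overgenerated


# Hypothetical stand-in for a parser: it "parses" only memorised sentences.
KNOWN = {"Which office does Jones sleep in?": 1}


def toy_count_parses(sentence: str) -> int:
    return KNOWN.get(sentence, 0)


suite = [
    SuiteItem("Which office does Jones sleep in?", True),
    SuiteItem("Which office does Jones sleep in the?", False),  # invented negative item
]

under, over = run_suite(suite, toy_count_parses)
print("undergenerated:", under)
print("overgenerated:", over)
```

Keeping the two failure lists separate means the same run can serve both uses mentioned above: the undergenerated list tracks progress towards the coverage target, while any change in either list between runs flags a regression.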
The issue of evaluating the results of parsing a test suite is complex, since hand-checking each result is too time-consuming. We use derivation trees as one way of summarizing the results of a parse which is useful for regression checking on the grammar, and especially for ensuring that revisions to the parser have not created bugs. The MRS representation allows a flexible way of checking semantic structures for equivalence: ensuring that the underspecified semantics is well-formed by constructing a scoped representation also detects some grammar bugs.
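The regression use of derivation trees can be sketched as a comparison between a stored set of trees and the set produced after a revision. This is only an illustrative sketch: the nested-tuple tree encoding, the rule names `subj-head` and `head-adj`, and the function `diff_profiles` are all assumptions made for the example and do not reflect the LKB's actual data formats.

```python
from typing import Dict, FrozenSet, Tuple

# A derivation tree is encoded here as a nested tuple:
# (rule_name, child, child, ...), with plain strings as leaves.
Profile = Dict[str, FrozenSet[tuple]]


def diff_profiles(
    gold: Profile, current: Profile
) -> Dict[str, Tuple[FrozenSet[tuple], FrozenSet[tuple]]]:
    """Map each sentence whose derivations changed to (lost, gained) tree sets."""
    changes = {}
    for sentence in sorted(set(gold) | set(current)):
        old = gold.get(sentence, frozenset())
        new = current.get(sentence, frozenset())
        if old != new:
            changes[sentence] = (old - new, new - old)
    return changes


# Stored results from an earlier run (hypothetical rule names).
gold = {
    "Jones sleeps.": frozenset({("subj-head", "Jones", "sleeps")}),
}

# Results after a revision that introduced a spurious second analysis.
after_revision = {
    "Jones sleeps.": frozenset(
        {("subj-head", "Jones", "sleeps"), ("head-adj", "Jones", "sleeps")}
    ),
}

changes = diff_profiles(gold, after_revision)
for sentence, (lost, gained) in changes.items():
    print(sentence, "lost:", sorted(lost), "gained:", sorted(gained))
```

Only the sentences that appear in the result need to be inspected by hand, which is the point of summarizing parses in a comparable form.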
Currently, the only way of checking generation in the absence of an application which constructs input representations is to analyze a test sentence and then attempt to generate from the results. We are actively investigating more satisfactory methods, but discussion of this leads into issues of defining interfaces for generation, which we cannot explore adequately in this paper.
3.2. Collaborative grammar coding
Any grammar development environment should provide explicit support for collaborative development. At a very basic level, the LKB facilitates code manipulation by allowing the grammar source to be split into multiple files, on the basis of functionality, for instance. We maintain source files for grammars using the standard CVS source control system, which allows multiple people to work on the same file and automatically merges different versions if the edits do not overlap. Although there is a possibility of introducing errors in this way, we have found the process works well if developers check in and update their source reasonably frequently.
We have successfully taught many students how to use the LKB system and develop small grammars, but the learning curve required to understand the LinGO ERG well enough to collaborate on it is very steep. Several people have contributed substantially to the LinGO grammar (see Section 6), but only four of them have been in a position to do large-scale work on the core grammar, though several more people have been involved in more peripheral activities, such as adding to the lexicon.
We believe that our experience is reasonably representative.⁴ It is often suggested that the problem with grammar engineering is that there is a lack of modularity, but it is not clear to us that this is correct. In software engineering generally, there are two conflicting goals: it is desirable to divide a task into components with hidden internal structure which can be developed independently of each other, but it is also desirable to avoid duplication of functionality. Different programming languages emphasize different paradigms: for instance, Modula-2 provided strong support for hiding data and functions but the object-oriented programming language C++ emphasizes commonality instead (Stroustrup, 1991). Information-hiding is often referred to as modularity in the software engineering literature: this is a much stronger sense than the idea of simply dividing up code. Some paradigms are more appropriate than others for specific application areas: e.g., Stroustrup argues that object-oriented programming (OOP) is more suited to graphics than to classical arithmetic.
In our experience with the LinGO ERG and previous grammar design work, commonality completely eclipses information-hiding in grammar design. While generalization in software engineering is motivated by practical considerations of avoiding errors and time-wasting due to redundancy, [...] Information-hiding is almost the antithesis of this, since it inherently involves having some parts of the representation which are only used in specified subsystems of the grammar. Consider the discovery that a feature which is used in the description of long-distance dependencies correlates with a phenomenon in morphology. This would be regarded as good news by a grammar developer and not as a failure of modularity, because it is a generalization that enhances the predictive power of the system. Furthermore, information-hiding modules are only useful in software development if they can be defined in the initial design, but they are inherently inflexible and therefore do not work well for more exploratory programming.
Because we cannot isolate individual linguistic phenomena, we cannot expect someone to work on an analysis without some knowledge of the rest of the grammar. But there are other notions of modularity. As with OOP, the inheritance hierarchy allows developers to work on expanding leaves without affecting the more general nodes. Some developers' tasks primarily involve classification. For instance, a lexicon can be extended by someone with little knowledge of the grammar because they can copy the classes allocated to words they know are similar. Similarly, a relatively untrained developer can add morphological rules, even though the morphology component cannot be a module in the information-hiding sense, because HPSG is a monostratal theory.
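The two kinds of work described above, adding leaves to an inheritance hierarchy and classifying new words by analogy with known ones, can be sketched as follows. This is a deliberately simplified illustration, not the LKB's type system: it models only named types with parents (no feature structures, constraints or default unification), and the class `TypeHierarchy`, the function `add_like`, and the type names `verb`, `intrans-verb` and `unerg-verb` are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


class TypeHierarchy:
    """Toy multiple-inheritance hierarchy: each type records only its parents."""

    def __init__(self) -> None:
        self.parents: Dict[str, Tuple[str, ...]] = {"top": ()}

    def add_type(self, name: str, parents: Iterable[str]) -> None:
        parents = tuple(parents)
        if name in self.parents:
            raise ValueError(f"type already defined: {name}")
        for p in parents:
            if p not in self.parents:
                raise KeyError(f"unknown parent type: {p}")
        # Adding a type never modifies the entry of any existing type.
        self.parents[name] = parents

    def ancestors(self, name: str) -> Set[str]:
        seen: Set[str] = set()
        stack = list(self.parents[name])
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(self.parents[t])
        return seen


def add_like(lexicon: Dict[str, str], new_word: str, similar_word: str) -> None:
    """Classify a new word by copying the type of a word known to be similar."""
    lexicon[new_word] = lexicon[similar_word]


hierarchy = TypeHierarchy()
hierarchy.add_type("verb", ["top"])
hierarchy.add_type("intrans-verb", ["verb"])

lexicon = {"sleep": "intrans-verb"}
add_like(lexicon, "snore", "sleep")  # no knowledge of the hierarchy needed

before = hierarchy.ancestors("intrans-verb")
hierarchy.add_type("unerg-verb", ["intrans-verb"])  # new leaf type
after = hierarchy.ancestors("intrans-verb")
print(lexicon["snore"], before == after)
```

In this sketch the more general nodes are untouched by construction, since a new type only records links to its parents. A real typed feature structure grammar additionally has to check that inherited constraints remain consistent, which this example does not attempt.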
The LKB system has extensive tools for developing inheritance hierarchies. Unlike any other feature structure based system, it incorporates a fully order-independent version of default unification (Lascarides and Copestake,