IEEE Standard Format
Support Vector Machines to do classification because of their effective learning ability even in high-dimensional feature spaces. Rather than using a non-linear Support Vector Machine (SVM), Dumais et al. (1998) compared linear SVM with four other learning algorithms, namely Find Similar, Decision Trees, Naive Bayes, and Bayes Nets; their results also support SVM for text classification because of its high accuracy, fast speed, and simple model. Sebastiani (2002) also recommends Neural Networks as a potential choice for text classification, in that their accuracy is only slightly lower than that of SVM.

The cross-document comparison of small pieces of text, using linguistic features such as noun phrases and synonyms, is introduced by Hatzivassiloglou et al. (1999). The similarity of two paragraphs is defined by the same action being conducted on the same object by the same actor. Therefore, drawing features from nouns and verbs would generally reduce a paragraph to several primitive elements. In addition to the similar primitive elements, restrictions such as ordering, distance, and primitive (matching noun and verb pairs) are also implemented to exclude weakly related features.

Feature selection methods can effectively reduce the dimensionality of the dataset (Ikonomakis, 2005) while preserving classification performance. To decide which words are to be kept, an evaluation function was introduced by Soucy and Mineau (2003) to measure how much information can be gained by classifying on a single word. Another improvement, by Han et al. (2004), is to use Principal Component Analysis (PCA) to reduce the dimensionality when transforming features.

Nigam and McCallum (2000) combine Expectation-Maximization with a Naive Bayes classifier, training the classifier on a certain amount of labeled texts followed by a large amount of unlabeled documents, which achieves automatic training without a huge amount of hand-designed training data.
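The idea of scoring a single word by how much information it gives about the class, mentioned above for feature selection, can be illustrated with a small sketch. It assumes that information gain (entropy reduction) serves as the evaluation function; this is one common choice and is not necessarily the exact function of Soucy and Mineau (2003). The toy corpus, the labels, and the helper names `entropy`, `information_gain`, and `select_features` are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, word):
    """Entropy reduction from splitting the documents on presence of `word`.

    Assumption: information gain stands in for the evaluation function;
    the cited work may define its function differently.
    """
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    remainder = (len(with_word) / n) * entropy(with_word) \
        + (len(without_word) / n) * entropy(without_word)
    return entropy(labels) - remainder


def select_features(docs, labels, k):
    """Return the k words with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda w: information_gain(docs, labels, w),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy corpus: each document is a set of words.
docs = [
    {"stock", "market", "the"},
    {"market", "price", "the"},
    {"goal", "match", "the"},
    {"match", "team", "the"},
]
labels = ["finance", "finance", "sport", "sport"]

top_words = select_features(docs, labels, 2)
print(top_words)
```

In this toy corpus, a word that occurs in every document (such as "the") has zero gain and would be dropped, while words that separate the two classes perfectly receive the maximum gain of one bit.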
IV. IBM WATSON

The IBM Watson project has shown us that a computer system for open-domain question-answering (QA) can beat human champions in Jeopardy. As Ferrucci (2012) mentioned, the structure of Watson is more complicated than any single agent, as it has hundreds of algorithms working together, in the way that Minsky (1988) introduced in Society of Mind. Generally, Watson consists of parts which are DeepQA, Natural Language Processing (NLP), Machine Learning (ML), and Semantic Web and Cloud Computing (Gliozzo et al., 2013).

The DeepQA system analyzes the question with different algorithms, giving different interpretations of questions and forming queries for each question (Ferrucci, 2012). It provides all the possible answers to the question together with the evidence and the scores for each candidate, which generates a ranking of candidate answers by likelihood of correctness. The Machine Learning algorithms are used to train the weights in its evaluating and analyzing algorithms (Gliozzo et al., 2013).

The clue that Watson uses in searching is called the lexical answer type (LAT), which tells Watson what the question is asking about and what kind of thing it needs to look for. Before searching, it generates prior knowledge of a type label, known as a ‘direction’, for each candidate answer and searches for evidence for and against this ‘type direction’ (Ferrucci, 2012). DeepQA also relies heavily on grammar-based and syntactic analysis techniques, for example, relation extraction techniques for obtaining possible relations between words, based on a rule-based approach. In addition, the ability to break the question down into sub-questions by logic also improved Watson's performance (Ferrucci, 2012), enabling Watson to find results for each smaller question and combine them. Corresponding to the ability to break down questions, it can also generate the score for the original question based on the evidence for the sub-questions.

To simulate human knowledge, Watson also uses a self-contained database. However, this requirement has led to its great hardware cost. Watson also needs to do automatic text analysis and knowledge extraction to update its database, because of the enormous amount of work and the insurance of