Selecting important information while accounting for repetitions is a hard task for both summarization and question answering. We propose a formal model that represents a collection of documents in a two-dimensional space of textual and conceptual units wi
to just words but also lead to more pronounced improvement when our model is employed. A more detailed analysis of the above experiments, together with a discussion of the advantages and disadvantages of our evaluation schema, can be found in (Filatova and Hatzivassiloglou, 2004).
6 Conclusion
In this paper we proposed a formal model for information selection and redundancy avoidance in summarization and question answering. Within this two-dimensional model, summarization and question answering entail mapping textual units onto conceptual units and then selecting a subset of textual units that maximizes the information content of the covered conceptual units. Formalizing the process allows us to benefit from theoretical results, including suitable approximation algorithms. Experiments using DUC data showed that, because it packs information better, this approach does indeed lead to improvements over a straightforward content selection method.
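The selection step summarized above can be illustrated with a minimal sketch. It assumes that the mapping from textual units to conceptual units and the concept weights are already given, and that the length limit is counted in textual units; the function name `greedy_select`, its parameters, and the toy data are hypothetical and are not taken from the experiments. The sketch uses the standard adaptive greedy heuristic for weighted maximum coverage, which is one instance of the kind of approximation algorithm referred to above and is not necessarily the exact variant evaluated in this paper.

```python
from typing import Dict, List, Set


def greedy_select(
    unit_concepts: Dict[str, Set[str]],
    concept_weight: Dict[str, float],
    budget: int,
) -> List[str]:
    """Pick at most `budget` textual units; at each step take the unit that
    adds the most weight from concepts not covered so far."""
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(unit_concepts)  # shallow copy; the caller's dict is untouched

    def gain(unit: str) -> float:
        # Only concepts that are still uncovered contribute to the gain,
        # so repeated information is not rewarded twice.
        new_concepts = remaining[unit] - covered
        return sum(concept_weight.get(c, 0.0) for c in new_concepts)

    while remaining and len(chosen) < budget:
        # sorted() makes tie-breaking deterministic (alphabetical by unit id).
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # no remaining unit adds new information
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Toy data (hypothetical): four sentences and four weighted concepts.
units = {
    "s1": {"a", "b"},
    "s2": {"a", "b", "c"},
    "s3": {"d"},
    "s4": {"c"},
}
weights = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 2.0}
print(greedy_select(units, weights, budget=2))  # ['s2', 's3']
```

In this toy run, `s1` is never chosen even though it is the second-heaviest sentence in isolation, because everything it covers is already covered by `s2`. A selector that ranks units by their individual weight alone would pick `s2` and `s1` and cover less total weight.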
7 Acknowledgements
We wish to thank Rocco Servedio and Mihalis Yannakakis for valuable discussions of the theoretical foundations of the set cover problem. This work was supported by ARDA under the Advanced Question Answering for Intelligence (AQUAINT) project MDA908-02-C-0008.
References
Regina Barzilay and Michael Elhadad. 1997. Using lexical chains for text summarization. In Proceedings of the ACL/EACL 1997 Workshop on Intelligent Scalable Text Summarization, Spain.
Sasha Blair-Goldensohn, Kathleen R. McKeown, and Andrew Hazen Schlaikjer. 2003. DefScriber: A hybrid system for definitional QA. In Proceedings of 26th Annual International ACM SIGIR Conference, Toronto, Canada, July.
Elena Filatova and Vasileios Hatzivassiloglou. 2003. Domain-independent detection, extraction, and labeling of atomic events. In Proceedings of Recent Advances in Natural Language Processing Conference, RANLP, Bulgaria.
Elena Filatova and Vasileios Hatzivassiloglou. 2004. Event-based extractive summarization. In Proceedings of ACL Workshop on Summarization, Barcelona, Spain, July.
Jade Goldstein, Vibhu Mittal, Jaime Carbonell, and Jamie Callan. 2000. Creating and evaluating multi-document sentence extract summaries. In Proceedings of the Ninth International Conference on Information and Knowledge Management, pages 165–172.
Donna Harman and Ellen Voorhees, editors. 2001. Proceedings of the Document Understanding Conference (DUC). NIST, New Orleans, USA.
Vasileios Hatzivassiloglou, Judith L. Klavans, Melissa L. Holcombe, Regina Barzilay, Min-Yen Kan, and Kathleen R. McKeown. 2001. Simfinder: A flexible clustering tool for summarization. In Proceedings of Workshop on Automatic Summarization, NAACL, Pittsburgh, USA.
Dorit S. Hochbaum. 1997. Approximating covering and packing problems: Set cover, vertex cover, independent set, and related problems. In Dorit S. Hochbaum, editor, Approximation Algorithms for NP-hard Problems, pages 94–143. PWS Publishing Company, Boston, MA.
H. P. Edmundson. 1968. New methods in automatic extracting. Journal of the Association for Computing Machinery, 23(1):264–285, April.
Julian Kupiec, Jan Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of 18th Annual International ACM SIGIR Conference, pages 68–73, Seattle, USA.
Chin-Yew Lin and Eduard Hovy. 1997. Identifying topic by position. In Proceedings of the 5th Conference on Applied Natural Language Processing, ANLP, Washington, DC.
Chin-Yew Lin and Eduard Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of 2003 Language Technology Conference (HLT-NAACL 2003), Edmonton, Canada, May.
H. P. Luhn. 1959. The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2):159–165, April.
Daniel Marcu. 1997. From discourse structures to text summaries. In Proceedings of the ACL/EACL 1997 Workshop on Intelligent Scalable Text Summarization, pages 82–88, Spain.
Simone Teufel and Marc Moens. 1997. Sentence extraction as a classification task. In Proceedings of the ACL/EACL 1997 Workshop on Intelligent Scalable Text Summarization, Spain.
Ellen M. Voorhees. 2003. Evaluating answers to definition questions. In Proceedings of HLT-NAACL, Edmonton, Canada, May.
Hong Yu and Vasileios Hatzivassiloglou. 2003. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Sapporo, Japan, July.