手机版

Simple Text Processing

发布时间:2021-06-06   来源:未知    
字号:

data retrieval information retrieval matching exact partial, best match inference deduction induction model deterministic probabilistic query language artificial natural query specification complete incomplete items wanted matching relevant error response

1

Document Management, Department of Linguistics, Uppsala University, VT992

Information Retrieval

Document Life Cycle

Document Management, Department of Linguistics, Uppsala University, VT993

Document Management, Department of Linguistics, Uppsala University, VT994

Data Retrieval versus Information Retrieval

Precision and Recall

matching inference model query language query speci cation items wanted error response

data retrieval exact deduction deterministic arti cial complete matching sensitive

information retrieval partial, best match induction probabilistic natural incomplete relevant insensitive

precision= j Dj\ D j Djr

\ recall= j D D Dj j jr r

deduction: p(a); 8x: p(x) ! q(x) ) q(a) induction: p(a); q(a) ) 8x: p(x) ! q(x)

D: documents returned by the system D: relevant documentsr

data retrieval information retrieval matching exact partial, best match inference deduction induction model deterministic probabilistic query language artificial natural query specification complete incomplete items wanted matching relevant error response

Document Management, Department of Linguistics, Uppsala University, VT995

Document Management, Department of Linguistics, Uppsala University, VT996

Simple Text Processing

Searching StringsSearch Pattern

searching strings string replacement sorting tokenization text segmentation (sentences, phrases, MWU's) frequency counts

simple strings expressionsString Searching Problems

string matching string distance longest common subsequence approximate string matching longest repeated substring

Document Management, Department of Linguistics, Uppsala University, VT997

Document Management, Department of Linguistics, Uppsala University, VT998

String Matching

String Matching - Algorithms (1)

Given pattern string x, with j x j= m, and text string y, with j y j= n, where m; n> 0 and m n, if x occurs as a substring of y then determine the position within y of the rst occurrence of x, i.e. return the least value of i such that y(i; i+ m? 1)= x(1; m).

at successive positions. worst case: O(mn) average case: O(m+ n) Knuth-Morris-Pratt: calculate a next-table for the pattern x which will be used for shifting the pattern when mismatching worst case: O(m+ n) Boyer-Moore: scan the string string from left to right, but match symbols from right to left (disregard portions which cannot possibly match the pattern) worst case: O(m+ n) average case: O(n=m) (large alphabet, small pattern)

Brute Force: compare pattern x with substrings of y

Simple Text Processing.doc 将本文的Word文档下载到电脑,方便复制、编辑、收藏和打印
×
二维码
× 游客快捷下载通道(下载后可以自由复制和排版)
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
注:下载文档有可能出现无法下载或内容有问题,请联系客服协助您处理。
× 常见问题(客服时间:周一到周五 9:30-18:00)