手机版

Simple Text Processing(2)

发布时间:2021-06-06   来源:未知    
字号:

data retrieval information retrieval matching exact partial, best match inference deduction induction model deterministic probabilistic query language artificial natural query specification complete incomplete items wanted matching relevant error response

Document Management, Department of Linguistics, Uppsala University, VT999

Document Management, Department of Linguistics, Uppsala University, VT9910

String Matching - Algorithms (2)

String Distance

Sunday,Hume: mismatch optimizationQuick Search Maximal Shift Optimal Mismatch Tuned Boyer-Moore Least Cost Harrison: test for the occurrence of the pattern but don't nd its location, needs preprocessed static texts, uses hashing functions Karp-Rabin: comparison of signatures in a 'Brute-Forcemanner', uses hashing functions

Given strings x and y, with j x j= m, j y j= n, where m; n> 0, and a string distance metric, d, nd d(x; y). (Determine a minimal sequence of editing operations necessary to transform x to y

.

Document Management, Department of Linguistics, Uppsala University, VT9911

Document Management, Department of Linguistics, Uppsala University, VT9912

Longest Common Subsequence

Approximate String Matching

A subsequence of a string is obtained by eliminating zero or more symbols, not necessarily contiguous, from the string. A common subsequence of two strings is a string which occurs as a subsequence of both strings. A longest common subsequence of two strings is a common subsequence of the strings having maximal length.

Given a pattern string x, with j x j= m, text string y, with j y j= n, where m; n> 0 and m n, an integer k 0 and a distance function d, nd all the substrings, s, of y such that d(x; s) k.

data retrieval information retrieval matching exact partial, best match inference deduction induction model deterministic probabilistic query language artificial natural query specification complete incomplete items wanted matching relevant error response

Document Management, Department of Linguistics, Uppsala University, VT9913

Document Management, Department of Linguistics, Uppsala University, VT9914

Dynamic Programming - String Distancef (a; b)= 0 if a matches b, otherwise f (a; b)= 1 d1 1= f (x1; y1) 8i> 1: d 1= d?1 1+ f (x; y1 ) 8j> 1: d1= d1?1+ f (x1; y ) 8i; j> 1: d= min(d?1+1; d?1+1; d?1?1+ f (x; y )); i; i; i;j;j j i;j i;j i;j i;j i j;

Dynamic Programming - LCSf (a; b)= 1 if a matches b, otherwise f (a; b)= 0 d1 1= f (x1; y1) 8i> 1: d 1= d?1 1+ f (x; y1 ) 8j> 1: d1= d1?1+ f (x1; y ) 8i; j> 1: d= max(d?1; d?1; d?1?1+f (x; y ))i; i; i;j;j j i;j i;j i;j i;j i j

b a l a n s a r m e n s

b 0 1 2 3 4 5 6 7 8 9 10 11

a 1 0 1 2 3 4 5 6 7 8 9 10

l 2 1 0 1 2 3 4 5 6 7 8 9

a 3 2 1 0 1 2 3 4 5 6 7 8

n 4 3 2 1 0 1 2 3 4 5 6 7

c 5 4 3 2 1 1 2 3 4 5 6 7

e 6 5 4 3 2 2 2 3 4 4 5 6

7 6 5 4 3 3 3 3 4 5 5 6

a 8 7 6 5 4 4 3 4 4 5 6 6

r 9 8 7 6 5 5 4 3 4 5 6 7

m 10 9 8 7 6 6 5 4 3 4 5 6

b a l a n s a r m e n s

b 1 1 1 1 1 1 1 1 1 1 1 1

a 1 2 2 2 2 2 2 2 2 2 2 2

l 1 2 3 3 3 3 3 3 3 3 3 3

a 1 2 3 4 4 4 4 4 4 4 4 4

n 1 2 3 4 5 5 5 5 5 5 5 5

c 1 2 3 4 5 5 5 5 5 5 5 5

e 1 2 3 4 5 5 5 5 5 6 6 6

1 2 3 4 5 5 5 5 5 6 6 6

a 1 2 3 4 5 5 6 6 6 6 6 6

r 1 2 3 4 5 5 6 7 7 7 7 7

m 1 2 3 4 5 5 6 7 8 8 8 8

Document Management, Department of Linguistics, Uppsala University, VT9915

Document Management, Department of Linguistics, Uppsala University, VT9916

Heaviest Common Subsequence

Approximate String Matching

Landau-Vishkin k-mismatch: based on Knuth-Morris-

Using weight functions: 1. f (xi; yj )= w(i; j; xi) if xi matches yj, otherwise f (xi; yj )= 0 2. f (xi; yj )= w(i; j; xi; yj )

Landau-Vishkin k-distance: based on dynamic programming

Pratt string matching

Simple Text Processing(2).doc 将本文的Word文档下载到电脑,方便复制、编辑、收藏和打印
×
二维码
× 游客快捷下载通道(下载后可以自由复制和排版)
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
注:下载文档有可能出现无法下载或内容有问题,请联系客服协助您处理。
× 常见问题(客服时间:周一到周五 9:30-18:00)