data retrieval information retrieval matching exact partial, best match inference deduction induction model deterministic probabilistic query language artificial natural query specification complete incomplete items wanted matching relevant error response
Document Management, Department of Linguistics, Uppsala University, VT999
Document Management, Department of Linguistics, Uppsala University, VT9910
String Matching - Algorithms (2)
String Distance
Sunday,Hume: mismatch optimizationQuick Search Maximal Shift Optimal Mismatch Tuned Boyer-Moore Least Cost Harrison: test for the occurrence of the pattern but don't nd its location, needs preprocessed static texts, uses hashing functions Karp-Rabin: comparison of signatures in a 'Brute-Forcemanner', uses hashing functions
Given strings x and y, with j x j= m, j y j= n, where m; n> 0, and a string distance metric, d, nd d(x; y). (Determine a minimal sequence of editing operations necessary to transform x to y
.
Document Management, Department of Linguistics, Uppsala University, VT9911
Document Management, Department of Linguistics, Uppsala University, VT9912
Longest Common Subsequence
Approximate String Matching
A subsequence of a string is obtained by eliminating zero or more symbols, not necessarily contiguous, from the string. A common subsequence of two strings is a string which occurs as a subsequence of both strings. A longest common subsequence of two strings is a common subsequence of the strings having maximal length.
Given a pattern string x, with j x j= m, text string y, with j y j= n, where m; n> 0 and m n, an integer k 0 and a distance function d, nd all the substrings, s, of y such that d(x; s) k.
data retrieval information retrieval matching exact partial, best match inference deduction induction model deterministic probabilistic query language artificial natural query specification complete incomplete items wanted matching relevant error response
Document Management, Department of Linguistics, Uppsala University, VT9913
Document Management, Department of Linguistics, Uppsala University, VT9914
Dynamic Programming - String Distancef (a; b)= 0 if a matches b, otherwise f (a; b)= 1 d1 1= f (x1; y1) 8i> 1: d 1= d?1 1+ f (x; y1 ) 8j> 1: d1= d1?1+ f (x1; y ) 8i; j> 1: d= min(d?1+1; d?1+1; d?1?1+ f (x; y )); i; i; i;j;j j i;j i;j i;j i;j i j;
Dynamic Programming - LCSf (a; b)= 1 if a matches b, otherwise f (a; b)= 0 d1 1= f (x1; y1) 8i> 1: d 1= d?1 1+ f (x; y1 ) 8j> 1: d1= d1?1+ f (x1; y ) 8i; j> 1: d= max(d?1; d?1; d?1?1+f (x; y ))i; i; i;j;j j i;j i;j i;j i;j i j
b a l a n s a r m e n s
b 0 1 2 3 4 5 6 7 8 9 10 11
a 1 0 1 2 3 4 5 6 7 8 9 10
l 2 1 0 1 2 3 4 5 6 7 8 9
a 3 2 1 0 1 2 3 4 5 6 7 8
n 4 3 2 1 0 1 2 3 4 5 6 7
c 5 4 3 2 1 1 2 3 4 5 6 7
e 6 5 4 3 2 2 2 3 4 4 5 6
7 6 5 4 3 3 3 3 4 5 5 6
a 8 7 6 5 4 4 3 4 4 5 6 6
r 9 8 7 6 5 5 4 3 4 5 6 7
m 10 9 8 7 6 6 5 4 3 4 5 6
b a l a n s a r m e n s
b 1 1 1 1 1 1 1 1 1 1 1 1
a 1 2 2 2 2 2 2 2 2 2 2 2
l 1 2 3 3 3 3 3 3 3 3 3 3
a 1 2 3 4 4 4 4 4 4 4 4 4
n 1 2 3 4 5 5 5 5 5 5 5 5
c 1 2 3 4 5 5 5 5 5 5 5 5
e 1 2 3 4 5 5 5 5 5 6 6 6
1 2 3 4 5 5 5 5 5 6 6 6
a 1 2 3 4 5 5 6 6 6 6 6 6
r 1 2 3 4 5 5 6 7 7 7 7 7
m 1 2 3 4 5 5 6 7 8 8 8 8
Document Management, Department of Linguistics, Uppsala University, VT9915
Document Management, Department of Linguistics, Uppsala University, VT9916
Heaviest Common Subsequence
Approximate String Matching
Landau-Vishkin k-mismatch: based on Knuth-Morris-
Using weight functions: 1. f (xi; yj )= w(i; j; xi) if xi matches yj, otherwise f (xi; yj )= 0 2. f (xi; yj )= w(i; j; xi; yj )
Landau-Vishkin k-distance: based on dynamic programming
Pratt string matching