算法导论32.5 Exercises 答案

32.5-1

如下两表,是分别执行完COMPUTE-SUFFIX-ARRAY的第2-7行和第8行后的结果。

\(\begin{array}{ccccl} i&left\text{-}rank&right\text{-}rank&index&\text{substring}\\\hline 1&8&9&1&\texttt{hi}\\ 2&9&16&2&\texttt{ip}\\ 3&16&16&3&\texttt{pp}\\ 4&16&9&4&\texttt{pi}\\ 5&9&20&5&\texttt{it}\\ 6&20&25&6&\texttt{ty}\\ 7&25&8&7&\texttt{yh}\\ 8&8&15&8&\texttt{ho}\\ 9&15&16&9&\texttt{op}\\ 10&16&16&10&\texttt{pp}\\ 11&16&9&11&\texttt{pi}\\ 12&9&20&12&\texttt{it}\\ 13&20&25&13&\texttt{ty}\\ 14&25&0&14&\texttt{y}\\ \end{array} \qquad \begin{array}{ccccl} i&left\text{-}rank&right\text{-}rank&index&\text{substring}\\\hline 1&8&9&1&\texttt{hi}\\ 2&8&15&8&\texttt{ho}\\ 3&9&16&2&\texttt{ip}\\ 4&9&20&5&\texttt{it}\\ 5&9&20&12&\texttt{it}\\ 6&15&16&9&\texttt{op}\\ 7&16&9&4&\texttt{pi}\\ 8&16&9&11&\texttt{pi}\\ 9&16&16&3&\texttt{pp}\\ 10&16&16&10&\texttt{pp}\\ 11&20&25&6&\texttt{ty}\\ 12&20&25&13&\texttt{ty}\\ 13&25&0&14&\texttt{y}\\ 14&25&8&7&\texttt{yh}\\ \end{array}\)

\(l=2\)时,分别执行完COMPUTE-SUFFIX-ARRAY的第11行,第12-17行和第18行的结果如下三个表所示。

\(\begin{array}{ccl} i&rank&\text{substring}\\\hline 1&1&\texttt{hi}\\ 2&3&\texttt{ip}\\ 3&7&\texttt{pp}\\ 4&6&\texttt{pi}\\ 5&4&\texttt{it}\\ 6&8&\texttt{ty}\\ 7&10&\texttt{yh}\\ 8&2&\texttt{ho}\\ 9&5&\texttt{op}\\ 10&7&\texttt{pp}\\ 11&6&\texttt{pi}\\ 12&4&\texttt{it}\\ 13&8&\texttt{ty}\\ 14&9&\texttt{y}\\ \end{array}\qquad \begin{array}{ccccl} i&left\text{-}rank&right\text{-}rank&index&\text{substring}\\\hline 1&1&7&1&\texttt{hipp}\\ 2&3&6&2&\texttt{ippi}\\ 3&7&4&3&\texttt{ppit}\\ 4&6&8&4&\texttt{pity}\\ 5&4&10&5&\texttt{ityh}\\ 6&8&2&6&\texttt{tyho}\\ 7&10&5&7&\texttt{yhop}\\ 8&2&7&8&\texttt{hopp}\\ 9&5&6&9&\texttt{oppi}\\ 10&7&4&10&\texttt{ppit}\\ 11&6&8&11&\texttt{pity}\\ 12&4&9&12&\texttt{ity}\\ 13&8&0&13&\texttt{ty}\\ 14&9&0&14&\texttt{y}\\ \end{array}\qquad \begin{array}{ccccl} i&left\text{-}rank&right\text{-}rank&index&\text{substring}\\\hline 1&1&7&1&\texttt{hipp}\\ 2&2&7&8&\texttt{hopp}\\ 3&3&6&2&\texttt{ippi}\\ 4&4&9&12&\texttt{ity}\\ 5&4&10&5&\texttt{ityh}\\ 6&5&6&9&\texttt{oppi}\\ 7&6&8&4&\texttt{pity}\\ 8&6&8&11&\texttt{pity}\\ 9&7&4&3&\texttt{ppit}\\ 10&7&4&10&\texttt{ppit}\\ 11&8&0&13&\texttt{ty}\\ 12&8&2&6&\texttt{tyho}\\ 13&9&0&14&\texttt{y}\\ 14&10&5&7&\texttt{yhop}\\ \end{array}\)

\(l=4\)时,分别执行完COMPUTE-SUFFIX-ARRAY的第11行,第12-17行和第18行的结果如下三个表所示。

\(\begin{array}{ccl} i&rank&\text{substring}\\\hline 1&1&\texttt{hipp}\\ 2&3&\texttt{ippi}\\ 3&8&\texttt{ppit}\\ 4&7&\texttt{pity}\\ 5&5&\texttt{ityh}\\ 6&10&\texttt{tyho}\\ 7&12&\texttt{yhop}\\ 8&2&\texttt{hopp}\\ 9&6&\texttt{oppi}\\ 10&8&\texttt{ppit}\\ 11&7&\texttt{pity}\\ 12&4&\texttt{ity}\\ 13&9&\texttt{ty}\\ 14&11&\texttt{y}\\ \end{array}\qquad \begin{array}{ccccl} i&left\text{-}rank&right\text{-}rank&index&\text{substring}\\\hline 1&1&5&1&\texttt{hippityh}\\ 2&3&10&2&\texttt{ippityho}\\ 3&8&12&3&\texttt{ppityhop}\\ 4&7&2&4&\texttt{pityhopp}\\ 5&5&6&5&\texttt{ityhoppi}\\ 6&10&8&6&\texttt{tyhoppit}\\ 7&12&7&7&\texttt{yhoppity}\\ 8&2&4&8&\texttt{hoppity}\\ 9&6&9&9&\texttt{oppity}\\ 10&8&11&10&\texttt{ppity}\\ 11&7&0&11&\texttt{pity}\\ 12&4&0&12&\texttt{ity}\\ 13&9&0&13&\texttt{ty}\\ 14&11&0&14&\texttt{y}\\ \end{array}\qquad \begin{array}{ccccl} i&left\text{-}rank&right\text{-}rank&index&\text{substring}\\\hline 1&1&5&1&\texttt{hippityh}\\ 2&2&4&8&\texttt{hoppity}\\ 3&3&10&2&\texttt{ippityho}\\ 4&4&0&12&\texttt{ity}\\ 5&5&6&5&\texttt{ityhoppi}\\ 6&6&9&9&\texttt{oppity}\\ 7&7&0&11&\texttt{pity}\\ 8&7&2&4&\texttt{pityhopp}\\ 9&8&11&10&\texttt{ppity}\\ 10&8&12&3&\texttt{ppityhop}\\ 11&9&0&13&\texttt{ty}\\ 12&10&8&6&\texttt{tyhoppit}\\ 13&11&0&14&\texttt{y}\\ 14&12&7&7&\texttt{yhoppity}\\ \end{array}\)

\(l=8\)时,分别执行完COMPUTE-SUFFIX-ARRAY的第11行,第12-17行和第18行的结果如下三个表所示。

\(\begin{array}{ccl} i&rank&\text{substring}\\\hline 1&1&\texttt{hippityh}\\ 2&3&\texttt{ippityho}\\ 3&10&\texttt{ppityhop}\\ 4&8&\texttt{pityhopp}\\ 5&5&\texttt{ityhoppi}\\ 6&12&\texttt{tyhoppit}\\ 7&14&\texttt{yhoppity}\\ 8&2&\texttt{hoppity}\\ 9&6&\texttt{oppity}\\ 10&9&\texttt{ppity}\\ 11&7&\texttt{pity}\\ 12&4&\texttt{ity}\\ 13&11&\texttt{ty}\\ 14&13&\texttt{y}\\ \end{array}\qquad\begin{array}{ccccl} i&left\text{-}rank&right\text{-}rank&index&\text{substring}\\\hline 1&1&6&1&\texttt{hippityhoppity}\\ 2&3&9&2&\texttt{ippityhoppity}\\ 3&10&7&3&\texttt{ppityhoppity}\\ 4&8&4&4&\texttt{pityhoppity}\\ 5&5&11&5&\texttt{ityhoppity}\\ 6&12&13&6&\texttt{tyhoppity}\\ 7&14&0&7&\texttt{yhoppity}\\ 8&2&0&8&\texttt{hoppity}\\ 9&6&0&9&\texttt{oppity}\\ 10&9&0&10&\texttt{ppity}\\ 11&7&0&11&\texttt{pity}\\ 12&4&0&12&\texttt{ity}\\ 13&11&0&13&\texttt{ty}\\ 14&13&0&14&\texttt{y}\\ \end{array}\qquad\begin{array}{ccccl} i&left\text{-}rank&right\text{-}rank&index&\text{substring}\\\hline 1&1&6&1&\texttt{hippityhoppity}\\ 2&2&0&8&\texttt{hoppity}\\ 3&3&9&2&\texttt{ippityhoppity}\\ 4&4&0&12&\texttt{ity}\\ 5&5&11&5&\texttt{ityhoppity}\\ 6&6&0&9&\texttt{oppity}\\ 7&7&0&11&\texttt{pity}\\ 8&8&4&4&\texttt{pityhoppity}\\ 9&9&0&10&\texttt{ppity}\\ 10&10&7&3&\texttt{ppityhoppity}\\ 11&11&0&13&\texttt{ty}\\ 12&12&13&6&\texttt{tyhoppity}\\ 13&13&0&14&\texttt{y}\\ 14&14&0&7&\texttt{yhoppity}\\ \end{array}\)

因此,得到的\(SA\)数组为\([1,8,2,12,5,9,11,4,10,3,13,6,14,7]\)

最终,按序填入\(LCP\)数组的顺序如下表所示:

\(\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}\hline i&1&2&3&4&5&6&7&8&9&10&11&12&13&14\\\hline s&1&3&10&8&5&12&14&2&6&9&7&4&11&13\\\hline LCP[i]&0&0&5&4&3&2&1&1&0&1&0&1&0&0\\\hline \end{array}\)

因此,得到的\(LCP\)数组为\([0,1,0,1,3,0,0,4,1,5,0,2,0,1]\)

32.5-2

基本思想是,如果不需要\(\lfloor\lg n\rfloor-1\)while循环就能够区分出所有后缀的排名,那么while循环可以提前终止。

具体细节是,在MAKE-RANKS的过程中,如果最后一名的后缀的排名已经恰好达到了\(n\),那么可以终止while循环。修改狗的算法由COMPUTE-SUFFIX-ARRAY'给出。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
COMPUTE-SUFFIX-ARRAY'(T, n)
allocate arrays substr-rank[1:n], rank[1:n], and SA[1:n]
for i = 1 to n
substr-rank[i].left-rank = ord(T[i])
if i < n
substr-rank[i].right-rank = ord(T[i + 1])
else substr-rank[i].right-rank = 0
substr-rank[i].index = i
sort the array substr-rank into monotonically increasing order based on the left-rank attributes, using the right-rank attributes to break ties; if still a tie, the order does not matter
l = 2
while l < n
MAKE-RANKS(substr-rank, rank, n)
for i = 1 to n
substr-rank[i].left-rank = rank[i]
if i + l ≤ n
substr-rank[i].right-rank = rank[i + l]
else substr-rank[i].right-rank = 0
substr-rank[i].index = i
sort the array substr-rank into monotonically increasing order based on the left-rank attributes, using the right-rank attributes to break ties; if still a tie, the order does not matter
l = 2 * l
for i = 1 to n
SA[i] = substr-rank[i].index
return SA

32.5-3

使用一个不曾出现在\(T_1,T_2\)中的字符\(\texttt{@}\),并将其和\(T_1,T_2\)拼接起来,得到\(T=T_1\texttt{@}T_2\),其长度为\(n=n_1+n_2+1\)。对字符串\(T\)求出它的SA数组和LCP数组后,枚举\(T\)中每对排名相邻的后缀\(SA[i],SA[i-1]\)。如果这两个后缀来自\(T\)的不同部分(也就是其中一个来自\(T_1\),另一个来自\(T_2\))那么说明这两个后缀的最长公共前缀为\(T_1,T_2\)这两个字符串的子串之一,由于\(\texttt{@}\)不在\(T_1\)中,因此可以确保\(LCP\)不会恰好经过\(\texttt{@}\)。具体过程由程序LONGEST-COMMON-SUBSTRINGS给出。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
LONGEST-COMMON-SUBSTRINGS(T1, T2, n1, n2)
T = T1 @ T2
n = n1 + n2 + 1
SA = COMPUTE-SUFFIX-ARRAY(T, n)
LCP = COMPUTE-LCP(T, SA, n)
len = 0
S = ∅
for i = 2 to n
l = SA[i - 1]
r = SA[i]
if min{l, r} <= n1 and max(l, r) > n1 + 1
if LCP[i] > len
len = LCP[i]
S = {min{l, r}}
else if lCP[i] == len
S = S ∪ {min{l, r}}
let string-list be a new array
for l in S
INSERT(string-list, T1[l : l + len - 1])
return string-list

32.5-4

这里首先需要解释一下这个约束\(SA[i-1]=n'-SA[i]-LCP[i]+2\)是怎么来的:

首先,\(T\)中有一个子串\([l,r]\)是回文串,那么在\(T'\)这个字符串中,其在后半段对应的位置中是\([l',r']\)。这意味着\(r-l+1=r'-l'+1=\text{len}\),由于\(T'\)的最后\(n\)个字符是由\(T\)反转而来,并且字符\(\texttt{@}\)处在下标\(\dfrac{n'+1}{2}\)中,因此有\(\dfrac{n'+1}{2}-r=l'-\dfrac{n'+1}{2}\)。最终联立上面两个式子可以得到\(n'+1=l'+l+\text{len}-1\),更进一步可以得到\(l'=n'-l-\text{len}+2\),和上面给定的约束很像。

因此,这个求解最长回文子串的算法的错误在于,它认为后缀\(T[l:]\)\(T[l':]\)\(SA\)数组中是相邻的,然而并非如此。

考虑字符串\(T=\texttt{abbcabb}\),那么有\(T'=\texttt{abbcabb@bbacbba}\)。等算法结束后,我们可以得到关于这个字符串的表格:

\(\begin{array}{cclc} i&SA[i]&\text{substring}&LCP[i]&SA[i-1]=n'-SA[i]-LCP[i]+2\\\hline 1&8&\texttt{@bbacbba}&0&\text{No}\\ 2&15&\texttt{a}&0&\text{No}\\ 3&5&\texttt{abb@bbacbba}&1&\text{No}\\ 4&1&\texttt{abbcabb@bbacbba}&3&\text{No}\\ 5&11&\texttt{acbba}&1&\text{No}\\ 6&7&\texttt{b@bbacbba}&0&\text{No}\\ 7&14&\texttt{ba}&1&\text{No}\\ 8&10&\texttt{bacbba}&2&\text{No}\\ 9&6&\texttt{bb@bbacbba}&1&\text{Yes}\\ 10&13&\texttt{bba}&2&\text{No}\\ 11&9&\texttt{bbacbba}&3&\text{No}\\ 12&2&\texttt{bbcabb@bbacbba}&2&\text{No}\\ 13&3&\texttt{bcabb@bbacbba}&1&\text{No}\\ 14&4&\texttt{cabb@bbacbba}&0&\text{No}\\ 15&12&\texttt{cbba}&1&\text{Yes}\\ \end{array}\)

由表格可知这个算法的输出结果为\(1\),但是实际上答案为\(2\)\(T\)一共有两个子串\(\texttt{bb}\)满足答案。其中一对\(\texttt{bb}\)及其在后半段的反转分别是\(T'[2:3],T'[13:14]\),但是在\(SA\)数组中,\(2\)\(13\)不相邻。同样的,另一对\(\texttt{bb}\)则及其反转分别是在\(T'[6:7],T'[9:10]\),但是在\(SA\)数组中,\(6\)\(10\)不相邻。

最终是因为约束\(SA[i-1]=n'-SA[i]-LCP[i]+2\)认为后缀\(T[l:]\)\(T[l':]\)\(SA\)数组中是相邻的,导致了错误。因此这个约束不正确。