2016年11月3日星期四

The number of distinct substrings

reference 
https://www.quora.com/Given-a-string-how-do-I-find-the-number-of-distinct-substrings-of-the-string

If you look through the prefixes of each suffix of a string, you have covered all substrings of that string.

分两步:
1.找到所有suffix
2.find number of distinct substrings  based on LCP(longest common prefix)

例子:BANANA

Suffixes are:
0) BANANA
1) ANANA
2) NANA
3) ANA
4) NA
5) A

因此go through every suffixes的prefix就构成了所有substrings。也就是对以ith位置作为substring头构成的所有substrings。对于每个suffix而言,能构成以ith位置为头的substring的个数就是这个suffix的长度

因此我们需要算的就是这些suffix中的重复量。我们不需要算出重复量,我们可以跳过重复计算不重复部分的长度

It would be a lot easier to go through the prefixes if we sort the above set of suffixes, as we can skip the repeated prefixes easily.

Sorted set of suffixes:
5) A
3) ANA
1) ANANA
0) BANANA
4) NA
2) NANA

LCP = Longest Common Prefix of 2 strings.

初始化
ans = length(first suffix) = length("A") = 1.

The consecutive pairs of suffixes, i.e, [A, ANA], [ANA, ANANA], [ANANA, BANANA], etc. from the above set of sorted suffixes.

We can see that,
LCP("A", "ANA") = "A".
因此对于"ANA"而言,能构成不和前面重复的prefix substring就是相当于suffix为NA

So we have, 
  1. ans += length("ANA") - LCP("A", "ANA")
  2. ans = ans + 3 - 1 = ans + 2 = 3

Do the same for the next pair of consecutive suffixes: ["ANA", "ANANA"]
  1. LCP("ANA", "ANANA") = "ANA".
  2. ans += length("ANANA") - length(LCP)
  3. => ans = ans + 5 - 3
  4. => ans = 3 + 2 = 5.

没有评论:

发表评论