The Knuth–Morris–Pratt (KMP) algorithm achieves linear time string matching by eliminating redundant comparisons. After a mismatch, it uses the Longest Prefix Suffix (LPS) array to skip ahead instead of re-checking characters that are known to match.
p. LPS[j] = length of the longest proper prefix of p[0..j] that is also a suffix.j > 0, set j = LPS[j-1] (i doesn’t change); with j = 0, advance i.Combined cost: O(m + n).
The preprocessing step computes the LPS (Longest Proper Prefix-Suffix) array, which encodes the structure of the pattern itself. For each position j, LPS[j] tells us how many characters can be skipped if a mismatch happens after position j.
Example: Suppose the pattern is ABABC. Here’s how the LPS array is derived:
p[0] = 'A' → no proper prefix/suffix → LPS[0] = 0p[1] = 'B' → prefixes/suffixes don’t match → LPS[1] = 0p[2] = 'A' → prefix A matches suffix A → LPS[2] = 1p[3] = 'B' → prefix AB matches suffix AB → LPS[3] = 2p[4] = 'C' → mismatch, fallback to LPS[3-1] = LPS[1] = 0 → LPS[4] = 0So, LPS = [0, 0, 1, 2, 0]. It means if a mismatch happens after matching up to p[3], we can skip directly to p[LPS[3]] = p[2] without re-checking the previous characters.
Naïve search may revisit the same text positions many times. KMP never moves i backward and only moves j backward along precomputed borders. Each character of the text is processed at most once; each character of the pattern participates in at most one decreasing sequence of fallbacks.
Define a potential function Φ = i + j. Each iteration changes Φ by at most 1 and Φ is bounded by n + m. Therefore, the total number of iterations is O(n + m). Intuitively: matches consume text; mismatches recycle partial knowledge via LPS without re-reading text.
Before each comparison, p[0..j-1] equals the suffix of t[0..i-1]. After a mismatch, setting j = LPS[j-1] preserves this invariant and avoids redundant work.
compute_lps() to kmp_search()compute_lps is a special case of KMP: it runs the same idea with the pattern as both text and pattern. That’s why its complexity is O(m) and why it seamlessly powers the O(n) search phase.
Jump to: Compute LPS · KMP Search
| Phase | Operation | Time |
|---|---|---|
| Preprocessing | Compute LPS array | O(m) |
| Search | Scan text once using LPS | O(n) |
| Total | No redundant rescans | O(m + n) |