KMP String Match Deep Dive — Why the runtime is O(m + n)

Amortized view: i never moves backward; fallbacks only move j via LPS.

The Theory Behind KMP’s O(m + n) Time Complexity

The Knuth–Morris–Pratt (KMP) algorithm achieves linear time string matching by eliminating redundant comparisons. After a mismatch, it uses the Longest Prefix Suffix (LPS) array to skip ahead instead of re-checking characters that are known to match.

1) Two Phases

Combined cost: O(m + n).

1.1) Understanding the Preprocessing Step

The preprocessing step computes the LPS (Longest Proper Prefix-Suffix) array, which encodes the structure of the pattern itself. For each position j, LPS[j] tells us how many characters can be skipped if a mismatch happens after position j.

Example: Suppose the pattern is ABABC. Here’s how the LPS array is derived:

So, LPS = [0, 0, 1, 2, 0]. It means if a mismatch happens after matching up to p[3], we can skip directly to p[LPS[3]] = p[2] without re-checking the previous characters.

2) Why Not O(m·n)?

Naïve search may revisit the same text positions many times. KMP never moves i backward and only moves j backward along precomputed borders. Each character of the text is processed at most once; each character of the pattern participates in at most one decreasing sequence of fallbacks.

3) Amortized Proof Sketch

Define a potential function Φ = i + j. Each iteration changes Φ by at most 1 and Φ is bounded by n + m. Therefore, the total number of iterations is O(n + m). Intuitively: matches consume text; mismatches recycle partial knowledge via LPS without re-reading text.

4) Invariant

Before each comparison, p[0..j-1] equals the suffix of t[0..i-1]. After a mismatch, setting j = LPS[j-1] preserves this invariant and avoids redundant work.

5) Relationship of compute_lps() to kmp_search()

compute_lps is a special case of KMP: it runs the same idea with the pattern as both text and pattern. That’s why its complexity is O(m) and why it seamlessly powers the O(n) search phase.

Jump to: Compute LPS · KMP Search

6) Summary

PhaseOperationTime
PreprocessingCompute LPS arrayO(m)
SearchScan text once using LPSO(n)
TotalNo redundant rescansO(m + n)