Master the art of finding patterns in text! From simple search to advanced algorithms like KMP and Boyer-Moore, learn how search engines, DNA analyzers, and text editors find what they're looking for.
Pattern matching is everywhere in modern technology! Let's see where you encounter it daily:
When you press Ctrl+F to find text, or when your IDE highlights syntax errors, that's pattern matching in action! VS Code uses Boyer-Moore for fast text search.
Scientists use pattern matching to find specific gene sequences in DNA strands. A single human genome has 3 billion base pairs - imagine finding a specific sequence without efficient algorithms!
Google processes billions of searches daily. They use advanced pattern matching to find your query in trillions of web pages. Without efficient algorithms, searches would take forever!
Antivirus software scans files for known virus signatures using pattern matching. They need to check millions of patterns quickly without slowing down your computer.
Tools like Turnitin use sophisticated pattern matching to compare submitted work against billions of documents to detect potential plagiarism.
Imagine you're reading a book and looking for every occurrence of the word "magic". Pattern matching is teaching a computer to do exactly that - find a specific pattern (like "magic") in a larger text (like your book).
See how the simplest algorithm checks every position:
Why it's wrong: If text="ABC" and pattern="BC", we need to check position 1, but range(3-2) only gives us position 0!
KMP (Knuth-Morris-Pratt) is like having a smart bookmark that remembers partial matches!
See how KMP builds its "failure function" to skip unnecessary comparisons:
Instead of comparing characters, Rabin-Karp uses math! It calculates a "fingerprint" (hash) for the pattern and checks if any part of the text has the same fingerprint.
See how the hash "rolls" as we slide the window:
Boyer-Moore reads the pattern backwards! When it finds a mismatch, it can skip many positions at once.
Write a function that counts how many times a pattern appears in a text (including overlapping occurrences).
If text = "AAAA" and pattern = "AA", how many occurrences are there?
Hint: Don't skip ahead after finding a match!
Find all starting positions in text where an anagram of the pattern exists.
An anagram has the same characters but in any order. How can you check this efficiently?
Find the longest substring that appears at least twice in the text.
You'll need to check all possible substrings. How can you optimize this?
The Z-algorithm creates an array where Z[i] tells you the length of the longest substring starting from position i that matches a prefix of the string.
When you need to find multiple patterns at once (like virus signatures or banned words), Aho-Corasick builds a trie of all patterns and searches them simultaneously!
Algorithm | Time Complexity | When to Use | Plain English Speed |
---|---|---|---|
Naive |
O(n × m) | Short patterns, simple cases | 🐢 Slow but simple |
KMP |
O(n + m) | Need guaranteed linear time | ⚡ Fast and consistent |
Rabin-Karp |
O(n + m) average | Multiple pattern search | 🎲 Usually fast |
Boyer-Moore |
O(n/m) best case | Long patterns, large alphabet | 🚀 Super fast for long patterns |
Z-Algorithm |
O(n + m) | Need prefix matches | ⚡ Linear and elegant |
Aho-Corasick |
O(n + m + z) | Multiple patterns at once | 💫 Find everything in one pass |
Always double-check matches character by character when hashes match!
Remember: range(len(text) - len(pattern) + 1)
not just - len(pattern)
Always check if pattern is empty before searching!