Pattern Matching - Interactive Tutorial

🤔 Why Should You Care About Pattern Matching?

Pattern matching is everywhere in modern technology! Let's see where you encounter it daily:

📝

Text Editors & IDEs

When you press Ctrl+F to find text, or when your IDE highlights syntax errors, that's pattern matching in action! VS Code uses Boyer-Moore for fast text search.

🧬

DNA Sequencing

Scientists use pattern matching to find specific gene sequences in DNA strands. A single human genome has 3 billion base pairs - imagine finding a specific sequence without efficient algorithms!

🔎

Search Engines

Google processes billions of searches daily. They use advanced pattern matching to find your query in trillions of web pages. Without efficient algorithms, searches would take forever!

🛡️

Security & Antivirus

Antivirus software scans files for known virus signatures using pattern matching. They need to check millions of patterns quickly without slowing down your computer.

📚

Plagiarism Detection

Tools like Turnitin use sophisticated pattern matching to compare submitted work against billions of documents to detect potential plagiarism.

🌟 The Basics - Let's Start Simple!

What is Pattern Matching?

Imagine you're reading a book and looking for every occurrence of the word "magic". Pattern matching is teaching a computer to do exactly that - find a specific pattern (like "magic") in a larger text (like your book).

Try It: Naive Pattern Matching

See how the simplest algorithm checks every position:

Your First Pattern Matching Algorithm 🎯

# The Naive Approach - Simple but it works!
def find_pattern_simple(text, pattern):
    # Think of it like sliding a ruler along a page
    matches = []
    
    # Try every starting position in the text
    for start in range(len(text) - len(pattern) + 1):
        # Check if pattern matches at this position
        found = True
        for i in range(len(pattern)):
            if text[start + i] != pattern[i]:
                found = False
                break  # No match, try next position
        
        if found:
            matches.append(start)  # Found it! 🎉
    
    return matches

# Let's try it!
text = "The cat in the hat sat on the mat"
pattern = "at"
positions = find_pattern_simple(text, pattern)
print(f"Found '{pattern}' at positions: {positions}")
# Output: Found 'at' at positions: [5, 16, 24, 34]

How It Works - Step by Step

1

Start at the beginning: Place the pattern at position 0 of the text

2

Compare characters: Check each character of the pattern with the text

3

Match or Move: If all match, record position. Either way, slide pattern one position right

4

Repeat: Continue until the pattern can't fit in remaining text

⚠️ Common Mistake: Off-by-One Error

# ❌ WRONG - This will miss the last possible position!
for start in range(len(text) - len(pattern)):  # Missing + 1
    # ... rest of code

Why it's wrong: If text="ABC" and pattern="BC", we need to check position 1, but range(3-2) only gives us position 0!

✅ The Right Way

# ✅ CORRECT - Includes all valid positions
for start in range(len(text) - len(pattern) + 1):
    # Now we check all positions where pattern can fit

🎨 Common Pattern Matching Algorithms

Intermediate

KMP Algorithm - The Smart Skipper

KMP (Knuth-Morris-Pratt) is like having a smart bookmark that remembers partial matches!

🎮 Interactive KMP Visualizer

See how KMP builds its "failure function" to skip unnecessary comparisons:

KMP Algorithm - Skip the Redundant Checks! 🏃‍♂️

# KMP - It's like having a smart assistant that remembers!
def build_lps_array(pattern):
    """LPS = Longest Proper Prefix which is also Suffix
    It tells us: 'If we fail at position i, where should we restart?'"""
    
    lps = [0] * len(pattern)
    length = 0  # Length of previous longest prefix suffix
    i = 1
    
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            # We found a match! Extend the prefix
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                # Try shorter prefix
                length = lps[length - 1]
            else:
                # No prefix matches
                lps[i] = 0
                i += 1
    
    return lps

def kmp_search(text, pattern):
    """KMP - Smart pattern matching that never goes backward!"""
    lps = build_lps_array(pattern)
    matches = []
    
    text_idx = 0
    pattern_idx = 0
    
    while text_idx < len(text):
        if text[text_idx] == pattern[pattern_idx]:
            # Characters match! Move both forward
            text_idx += 1
            pattern_idx += 1
        
        if pattern_idx == len(pattern):
            # Found complete match! 🎉
            matches.append(text_idx - pattern_idx)
            pattern_idx = lps[pattern_idx - 1]
        elif text_idx < len(text) and text[text_idx] != pattern[pattern_idx]:
            # Mismatch! Use LPS to skip smartly
            if pattern_idx != 0:
                pattern_idx = lps[pattern_idx - 1]
            else:
                text_idx += 1
    
    return matches

# Example: Finding "ABAB" in text
text = "ABABDABACDABABCABAB"
pattern = "ABAB"
print(f"KMP found matches at: {kmp_search(text, pattern)}")

Intermediate

Rabin-Karp - The Hash Detective 🔍

Instead of comparing characters, Rabin-Karp uses math! It calculates a "fingerprint" (hash) for the pattern and checks if any part of the text has the same fingerprint.

🎮 Rolling Hash Calculator

See how the hash "rolls" as we slide the window:

Rabin-Karp - Pattern Matching with Math! 🔢

# Rabin-Karp - Like comparing fingerprints instead of faces!
def rabin_karp(text, pattern, prime=101):
    """Uses rolling hash for fast pattern matching"""
    m = len(pattern)
    n = len(text)
    d = 256  # Number of characters in alphabet
    pattern_hash = 0
    text_hash = 0
    h = 1
    matches = []
    
    # Calculate h = d^(m-1) % prime
    for i in range(m - 1):
        h = (h * d) % prime
    
    # Calculate initial hash values
    for i in range(m):
        pattern_hash = (d * pattern_hash + ord(pattern[i])) % prime
        text_hash = (d * text_hash + ord(text[i])) % prime
    
    # Slide the pattern over text
    for i in range(n - m + 1):
        # If hash values match, double-check character by character
        if pattern_hash == text_hash:
            if text[i:i+m] == pattern:
                matches.append(i)  # Found it! 🎯
        
        # Roll the hash to next window
        if i < n - m:
            # Remove leading character, add trailing character
            text_hash = (d * (text_hash - ord(text[i]) * h) + 
                        ord(text[i + m])) % prime
            
            # Handle negative hash value
            if text_hash < 0:
                text_hash += prime
    
    return matches

# Pro tip: Great for finding multiple patterns!
text = "The quick brown fox jumps over the lazy dog"
pattern = "the"
print(f"Found '{pattern}' at: {rabin_karp(text.lower(), pattern)}")

Advanced

Boyer-Moore - The Backwards Genius 🔄

Boyer-Moore reads the pattern backwards! When it finds a mismatch, it can skip many positions at once.

Boyer-Moore - Skip Like a Pro! ⚡

# Boyer-Moore - Why check every character when you can skip?
def boyer_moore_simplified(text, pattern):
    """Simplified Boyer-Moore with bad character rule"""
    
    # Build bad character table
    # It tells us: "If we see character X, how far can we skip?"
    bad_char = {}
    for i in range(len(pattern)):
        bad_char[pattern[i]] = i
    
    matches = []
    shift = 0
    
    while shift <= len(text) - len(pattern):
        j = len(pattern) - 1  # Start from the end!
        
        # Compare backwards
        while j >= 0 and pattern[j] == text[shift + j]:
            j -= 1
        
        if j < 0:
            # Found a match! 🎉
            matches.append(shift)
            shift += len(pattern)
        else:
            # Mismatch - skip smartly!
            bad_char_shift = j - bad_char.get(text[shift + j], -1)
            shift += max(1, bad_char_shift)
    
    return matches

# Boyer-Moore shines with long patterns!
text = "HERE IS A SIMPLE EXAMPLE"
pattern = "EXAMPLE"
print(f"Boyer-Moore found: {boyer_moore_simplified(text, pattern)}")

💪 Practice Problems

Beginner

Problem 1: Count Pattern Occurrences

Write a function that counts how many times a pattern appears in a text (including overlapping occurrences).

Think About It First! 🤔

If text = "AAAA" and pattern = "AA", how many occurrences are there?

Hint: Don't skip ahead after finding a match!

Show Solution

def count_overlapping(text, pattern):
    count = 0
    for i in range(len(text) - len(pattern) + 1):
        if text[i:i+len(pattern)] == pattern:
            count += 1
    return count

# Test: "AAAA" has 3 overlapping "AA"s
print(count_overlapping("AAAA", "AA"))  # Output: 3

Intermediate

Problem 2: Find All Anagrams

Find all starting positions in text where an anagram of the pattern exists.

Think About It First! 🤔

An anagram has the same characters but in any order. How can you check this efficiently?

Show Solution

from collections import Counter

def find_anagrams(text, pattern):
    result = []
    pattern_count = Counter(pattern)
    window_count = Counter(text[:len(pattern)])
    
    # Check first window
    if window_count == pattern_count:
        result.append(0)
    
    # Slide the window
    for i in range(len(pattern), len(text)):
        # Add new character
        window_count[text[i]] += 1
        
        # Remove old character
        old_char = text[i - len(pattern)]
        window_count[old_char] -= 1
        if window_count[old_char] == 0:
            del window_count[old_char]
        
        # Check if current window is anagram
        if window_count == pattern_count:
            result.append(i - len(pattern) + 1)
    
    return result

# Test: Find anagrams of "abc"
print(find_anagrams("cbaebabacd", "abc"))  # Output: [0, 6]

Advanced

Problem 3: Longest Repeating Substring

Find the longest substring that appears at least twice in the text.

Think About It First! 🤔

You'll need to check all possible substrings. How can you optimize this?

Show Solution

def longest_repeating_substring(text):
    n = len(text)
    longest = ""
    
    # Try all possible lengths, starting from longest
    for length in range(n - 1, 0, -1):
        seen = set()
        for i in range(n - length + 1):
            substring = text[i:i + length]
            if substring in seen:
                return substring  # Found repeating substring!
            seen.add(substring)
    
    return longest

# Test
print(longest_repeating_substring("banana"))  # Output: "ana"
print(longest_repeating_substring("abcdefg"))  # Output: ""

🚀 Advanced Algorithms

Advanced

Z-Algorithm - The Prefix Master

The Z-algorithm creates an array where Z[i] tells you the length of the longest substring starting from position i that matches a prefix of the string.

Z-Algorithm - Linear Time Magic! ✨

def z_algorithm(s):
    """Build Z array in linear time"""
    n = len(s)
    z = [0] * n
    l, r = 0, 0  # Z-box boundaries
    
    for i in range(1, n):
        if i <= r:
            # We're inside a Z-box, use previous values
            z[i] = min(r - i + 1, z[i - l])
        
        # Try to extend the match
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        
        # Update Z-box if we found a longer match
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    
    return z

def z_pattern_search(text, pattern):
    """Use Z-algorithm for pattern matching"""
    # Concatenate pattern + separator + text
    combined = pattern + "$" + text
    z = z_algorithm(combined)
    
    matches = []
    pattern_len = len(pattern)
    
    # Check where Z value equals pattern length
    for i in range(pattern_len + 1, len(combined)):
        if z[i] == pattern_len:
            matches.append(i - pattern_len - 1)
    
    return matches

Advanced

Aho-Corasick - Multiple Pattern Search Champion

When you need to find multiple patterns at once (like virus signatures or banned words), Aho-Corasick builds a trie of all patterns and searches them simultaneously!

Aho-Corasick - Find All Patterns at Once! 🎯

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False
        self.patterns = []  # Patterns ending at this node

def find_multiple_patterns(text, patterns):
    """Simple version of Aho-Corasick for multiple patterns"""
    
    # Build trie from patterns
    root = TrieNode()
    for pattern in patterns:
        node = root
        for char in pattern:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True
        node.patterns.append(pattern)
    
    # Search all patterns simultaneously
    results = {pattern: [] for pattern in patterns}
    
    for start in range(len(text)):
        node = root
        for i in range(start, len(text)):
            if text[i] not in node.children:
                break
            node = node.children[text[i]]
            if node.is_end:
                for pattern in node.patterns:
                    results[pattern].append(start)
    
    return results

# Find multiple patterns in one pass!
text = "ushers"
patterns = ["he", "she", "hers"]
matches = find_multiple_patterns(text, patterns)
print(f"Found: {matches}")
# Output: {'he': [2], 'she': [1], 'hers': [2]}

📖 Quick Reference

Algorithm Comparison Chart

Algorithm	Time Complexity	When to Use	Plain English Speed
`Naive`	O(n × m)	Short patterns, simple cases	🐢 Slow but simple
`KMP`	O(n + m)	Need guaranteed linear time	⚡ Fast and consistent
`Rabin-Karp`	O(n + m) average	Multiple pattern search	🎲 Usually fast
`Boyer-Moore`	O(n/m) best case	Long patterns, large alphabet	🚀 Super fast for long patterns
`Z-Algorithm`	O(n + m)	Need prefix matches	⚡ Linear and elegant
`Aho-Corasick`	O(n + m + z)	Multiple patterns at once	💫 Find everything in one pass

Use Naive When...

Pattern is very short (< 10 chars)
Text is small
Simplicity matters most
One-time search

Use KMP When...

Pattern has repetitive parts
Need guaranteed O(n) time
Streaming data
Can't go backwards

Use Boyer-Moore When...

Pattern is long
Large alphabet (like English)
Speed is critical
Text is very long

Use Rabin-Karp When...

Searching multiple patterns
Pattern matching in 2D
Plagiarism detection
Rolling hash useful

Common Pitfalls to Avoid

❌ Pitfall #1: Hash Collisions in Rabin-Karp

Always double-check matches character by character when hashes match!

❌ Pitfall #2: Off-by-One in Loop Bounds

Remember: range(len(text) - len(pattern) + 1) not just - len(pattern)

❌ Pitfall #3: Not Handling Empty Patterns

Always check if pattern is empty before searching!

🔍 Pattern Matching Algorithms

🤔 Why Should You Care About Pattern Matching?

Text Editors & IDEs

DNA Sequencing

Search Engines

Security & Antivirus

Plagiarism Detection

🌟 The Basics - Let's Start Simple!

What is Pattern Matching?

Try It: Naive Pattern Matching

How It Works - Step by Step

⚠️ Common Mistake: Off-by-One Error

✅ The Right Way

🎨 Common Pattern Matching Algorithms

KMP Algorithm - The Smart Skipper

🎮 Interactive KMP Visualizer

Rabin-Karp - The Hash Detective 🔍

🎮 Rolling Hash Calculator

Boyer-Moore - The Backwards Genius 🔄

💪 Practice Problems

Problem 1: Count Pattern Occurrences

Think About It First! 🤔

Problem 2: Find All Anagrams

Think About It First! 🤔

Problem 3: Longest Repeating Substring

Think About It First! 🤔

🚀 Advanced Algorithms

Z-Algorithm - The Prefix Master

Aho-Corasick - Multiple Pattern Search Champion

📖 Quick Reference

Algorithm Comparison Chart

Use Naive When...

Use KMP When...

Use Boyer-Moore When...

Use Rabin-Karp When...

Common Pitfalls to Avoid

❌ Pitfall #1: Hash Collisions in Rabin-Karp

❌ Pitfall #2: Off-by-One in Loop Bounds

❌ Pitfall #3: Not Handling Empty Patterns