Design a Real-Time Fraud Detection System

Hard 30 min read

Problem Statement & Requirements

Why Fraud Detection Matters

Global payment fraud exceeds $30 billion per year. Every major payment processor (Stripe, PayPal, Visa) runs real-time fraud detection on every transaction. The system must make a block/allow decision in milliseconds while maintaining an extremely low false positive rate — blocking legitimate transactions costs revenue and customer trust.

Think of fraud detection like airport security with multiple screening layers. The first layer is a quick metal detector (rules engine). The second is an X-ray machine (ML model). Suspicious items get additional manual inspection (human review). Each layer catches different threats, and together they provide defense in depth.

Functional Requirements

Non-Functional Requirements

Back-of-Envelope Estimation

Parameter                        Estimate
Transactions per second          10,000 (peak: 50,000)
Fraud rate                       0.1% (1 in 1,000 transactions)
Features per transaction         200-500
Feature computation budget       <30ms
ML inference budget              <20ms
Total latency budget             <100ms
Historical transactions stored   2 years (~600B transactions)
Model retraining frequency       Daily (full) + hourly (incremental)
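
The storage and fraud-volume figures follow directly from the average transaction rate. A quick check, assuming the 10,000 TPS average holds around the clock (the constant names are illustrative):

```python
AVG_TPS = 10_000          # average transactions per second (from the table)
FRAUD_RATE = 0.001        # 0.1% of transactions
SECONDS_PER_DAY = 86_400
RETENTION_DAYS = 2 * 365  # two years of history

txns_per_day = AVG_TPS * SECONDS_PER_DAY
txns_stored = txns_per_day * RETENTION_DAYS
fraud_per_day = txns_per_day * FRAUD_RATE

print(f"{txns_per_day:,} transactions/day")
print(f"{txns_stored / 1e9:.0f}B transactions stored")
print(f"{fraud_per_day:,.0f} fraudulent transactions/day")
```

This comes to roughly 630 billion stored transactions, which the table rounds to ~600B, and on the order of 864,000 fraudulent transactions per day.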

System API Design

Fraud Detection APIs
# Score a transaction (called by payment gateway)
POST /api/v1/transactions/score
{
  "transaction_id": "txn_abc123",
  "amount": 499.99,
  "currency": "USD",
  "merchant_id": "merch_456",
  "user_id": "user_789",
  "card_hash": "sha256_xxx",
  "ip_address": "203.0.113.42",
  "device_fingerprint": "fp_xyz",
  "timestamp": "2024-01-15T10:30:00Z"
}
# Response (must return in <100ms)
# "decision" is one of: allow, block, review
{
  "decision": "allow",
  "risk_score": 0.12,
  "triggered_rules": [],
  "model_version": "v4.2"
}

# Submit fraud/legitimate label (feedback loop)
POST /api/v1/transactions/label
{
  "transaction_id": "txn_abc123",
  "label": "fraud",
  "source": "chargeback"
}

# Manage rules
POST /api/v1/rules
{
  "name": "high_velocity_check",
  "condition": "txn_count_1h > 10 AND amount > 500",
  "action": "block"
}

Data Model

Core Schema
CREATE TABLE transactions (
    txn_id        VARCHAR NOT NULL,
    user_id       VARCHAR,
    merchant_id   VARCHAR,
    amount        DECIMAL(12,2),
    currency      VARCHAR(3),
    risk_score    FLOAT,
    decision      VARCHAR,  -- allow, block, review
    label         VARCHAR,  -- fraud, legitimate, null (unknown)
    features      JSONB,
    timestamp     TIMESTAMP NOT NULL,
    -- PostgreSQL requires the partition key to be part of the primary key
    PRIMARY KEY (txn_id, timestamp)
) PARTITION BY RANGE (timestamp);

CREATE TABLE rules (
    rule_id       VARCHAR PRIMARY KEY,
    name          TEXT,
    condition     TEXT,   -- expression DSL
    action        VARCHAR, -- block, review, score_boost
    enabled       BOOLEAN,
    priority      INT
);

CREATE TABLE alerts (
    alert_id      VARCHAR PRIMARY KEY,
    txn_id        VARCHAR,
    status        VARCHAR,  -- open, investigating, resolved
    assigned_to   VARCHAR,
    resolution    VARCHAR,  -- confirmed_fraud, false_positive
    created_at    TIMESTAMP
);

High-Level Architecture

The system has two paths: a real-time scoring path (<100ms) and an offline learning path (hours-days).

Event Ingestion

Transaction events arrive via Kafka. Each event triggers the scoring pipeline. Events are also stored for offline analysis and model retraining.

Feature Engine

Computes 200-500 features in real time, such as user velocity (transactions in the last hour), merchant risk, device reputation, geo-anomaly, and amount deviation. It pulls pre-computed features from the feature store and computes session-level features on the fly.

Rule Engine

Evaluates deterministic rules first (blocklists, velocity limits, impossible travel). Rules are fast and interpretable, and they can be updated instantly without model retraining.
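
A minimal sketch of how a condition like the one in the rules API might be evaluated. The grammar assumed here (clauses of the form `field op number` joined by AND), the helper names, and the convention that a lower `priority` number is checked first are all hypothetical; a production DSL would need a real parser and would parse each rule once rather than per transaction.

```python
import operator

# Comparison operators supported by this toy DSL (an assumption, not a spec).
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "==": operator.eq}

def parse_condition(condition):
    """Parse 'a > 1 AND b < 2' into a list of (field, op_fn, value) clauses."""
    clauses = []
    for clause in condition.split(" AND "):
        field, op, value = clause.split()
        clauses.append((field, OPS[op], float(value)))
    return clauses

def evaluate_rules(rules, features):
    """Return (action, triggered rule names), checking rules in priority order."""
    triggered = []
    action = "allow"
    for rule in sorted(rules, key=lambda r: r["priority"]):
        clauses = parse_condition(rule["condition"])
        if all(op(features[field], value) for field, op, value in clauses):
            triggered.append(rule["name"])
            if action == "allow":  # the first matching rule decides the action
                action = rule["action"]
    return action, triggered

rules = [{"name": "high_velocity_check", "priority": 1, "action": "block",
          "condition": "txn_count_1h > 10 AND amount > 500"}]
print(evaluate_rules(rules, {"txn_count_1h": 12, "amount": 650.0}))
print(evaluate_rules(rules, {"txn_count_1h": 2, "amount": 650.0}))
```

The first call triggers the rule and returns a block action; the second fails the velocity clause and falls through to allow.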

ML Scoring

Runs the fraud model on the feature vector and outputs a risk score between 0 and 1. The model is an ensemble of gradient-boosted trees (fast) and a neural network (accurate). The score is then combined with the rule engine's output to reach the final decision.

Decision Engine

Combines rule and ML scores. Applies business logic: score <0.3 = allow, 0.3-0.7 = review, >0.7 = block. Thresholds tuned per merchant category and risk appetite.
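
The threshold logic can be sketched as follows. The text does not say exactly how rule outcomes and model scores are combined, so this sketch assumes a rule's block or review action overrides a lower model score. The `THRESHOLDS` table and the `digital_goods` values are made up for illustration; only the default 0.3/0.7 pair comes from the text.

```python
# Thresholds per merchant category as (allow_below, block_above).
# "default" comes from the text; the "digital_goods" values are hypothetical.
THRESHOLDS = {
    "default": (0.3, 0.7),
    "digital_goods": (0.2, 0.6),
}

def decide(risk_score, rule_action="allow", merchant_category="default"):
    """Combine the rule outcome and the model score into a final decision."""
    allow_below, block_above = THRESHOLDS.get(
        merchant_category, THRESHOLDS["default"]
    )
    if rule_action == "block" or risk_score > block_above:
        return "block"
    if rule_action == "review" or risk_score >= allow_below:
        return "review"
    return "allow"

print(decide(0.12))                                      # allow
print(decide(0.45))                                      # review
print(decide(0.25, merchant_category="digital_goods"))   # review
print(decide(0.12, rule_action="block"))                 # block
```

Keeping thresholds in data rather than code lets risk teams tune them per category without a deploy.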

Deep Dive: Core Components

Real-Time Feature Engineering

Streaming Feature Computation
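class FraudFeatureEngine:
    def __init__(self, redis_client):
        # redis_client is assumed to be a thin wrapper exposing
        # get_sliding_window(); this is not a built-in Redis command.
        self.redis = redis_client

    def compute_features(self, txn, feature_store):
        user_id = txn["user_id"]
        # Pre-computed features from feature store (<5ms)
        stored = feature_store.get_online(
            entity="user", id=user_id,
            features=["avg_txn_30d", "account_age",
                      "device_count", "country_count_7d",
                      "known_devices", "home_country"]
        )
        # Real-time windowed aggregations (<10ms)
        velocity = self.redis.get_sliding_window(
            f"velocity:{user_id}", window="1h"
        )
        # New users may have no 30-day average yet
        avg_txn = stored.get("avg_txn_30d") or 0.0
        # "country" is not part of the scoring request; it is assumed to be
        # resolved upstream from ip_address (geo-IP lookup)
        country = txn.get("country")
        # Pre-computed features go first so derived features win on a name clash
        features = {
            **stored,
            "amount_deviation": (txn["amount"] - avg_txn) / max(avg_txn, 1),
            "txn_count_1h": velocity["count"],
            "txn_sum_1h": velocity["sum"],
            "is_new_device": txn["device_fingerprint"]
                not in (stored.get("known_devices") or []),
            "is_new_country": country is not None
                and country != stored.get("home_country"),
        }
        return features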

Rule Engine + ML Hybrid

Why Both Rules AND ML?

Rules are fast, interpretable, and instantly updatable. Use them for known fraud patterns (stolen card lists, impossible travel, velocity limits). ML models catch novel patterns that rules miss. The hybrid approach provides defense in depth: rules for known threats, ML for unknown threats.

Handling Class Imbalance

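Only 0.1% of transactions are fraudulent, so a model trained on raw data can reach 99.9% accuracy by predicting "legitimate" for every transaction while catching no fraud. Evaluate with precision and recall rather than accuracy. Solutions include cost-sensitive learning (weighting fraud examples more heavily in the loss) and oversampling the fraud class.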
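
To make the cost-sensitive idea concrete, here is a small sketch of class weighting, assuming the 1-in-1,000 fraud rate from the estimation table. The function names are illustrative and the loss is plain weighted binary cross-entropy, not any particular library's implementation.

```python
import math

def positive_class_weight(n_legit, n_fraud):
    """Weight for fraud examples so both classes contribute equally to the loss."""
    return n_legit / n_fraud

def weighted_log_loss(labels, probs, pos_weight):
    """Binary cross-entropy where each fraud example counts pos_weight times."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum

w = positive_class_weight(n_legit=999, n_fraud=1)
labels = [1] + [0] * 999
always_legit = [0.001] * 1000  # a model that scores everything as low risk

print(w)
print(weighted_log_loss(labels, always_legit, pos_weight=1.0))
print(weighted_log_loss(labels, always_legit, pos_weight=w))
```

Without weighting, the "always legitimate" model has a very small loss; with weighting, missing the single fraud case dominates the loss, so training is pushed toward catching it. Gradient-boosting libraries expose the same idea as a parameter, for example `scale_pos_weight` in XGBoost.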

Graph-Based Fraud Detection

Fraud rings involve coordinated accounts. Build a transaction graph: nodes are users, merchants, devices, IPs. Edges connect related entities. Use graph algorithms (community detection, PageRank) to identify suspicious clusters sharing devices or addresses.
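
As a simplified illustration, the sketch below groups users that are linked through shared devices or IPs using connected components (union-find). This is a deliberately simpler stand-in for the community detection and PageRank algorithms mentioned above. The edge format, the `device:`/`ip:` ID prefixes, and the `min_users` cutoff are assumptions made for the example.

```python
from collections import defaultdict

def find(parent, x):
    """Find the cluster root of x, compressing the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def shared_entity_clusters(edges, min_users=3):
    """Group users connected through shared devices or IPs.

    edges: iterable of (user_id, entity_id) pairs, e.g. ("u1", "device:fp_a").
    Returns the sets of users whose cluster has at least min_users members.
    """
    edges = list(edges)
    parent = {}
    for user, entity in edges:
        parent.setdefault(user, user)
        parent.setdefault(entity, entity)
        root_u, root_e = find(parent, user), find(parent, entity)
        if root_u != root_e:
            parent[root_u] = root_e  # merge the two clusters
    clusters = defaultdict(set)
    for user in {u for u, _ in edges}:
        clusters[find(parent, user)].add(user)
    return [c for c in clusters.values() if len(c) >= min_users]

edges = [("u1", "device:a"), ("u2", "device:a"),
         ("u2", "ip:1"), ("u3", "ip:1"),
         ("u4", "device:b")]
for cluster in shared_entity_clusters(edges):
    print(sorted(cluster))
```

Here u1 and u2 share a device and u2 and u3 share an IP, so all three land in one cluster, while u4 stays isolated and is filtered out.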

Concept Drift

Fraudsters Adapt

Fraud patterns change constantly. A model trained on last month's data may miss this month's attack vectors. Monitor for concept drift by tracking: (1) feature distribution shifts, (2) model score distribution changes, (3) rising false negative rate. Retrain daily and deploy new models via canary rollout.
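
One common way to quantify a shift in the model score distribution is the Population Stability Index (PSI). The sketch below is one possible monitoring metric, not a method prescribed by this design; the bin count and the floor used for empty buckets are assumptions.

```python
import math

def population_stability_index(baseline, current, bins=10):
    """PSI between two samples of model scores in [0, 1]."""
    def bucket_shares(scores):
        counts = [0] * bins
        for s in scores:
            counts[min(int(s * bins), bins - 1)] += 1
        # A small floor avoids division by zero and log(0) for empty buckets.
        return [max(c / len(scores), 1e-6) for c in counts]

    expected = bucket_shares(baseline)
    actual = bucket_shares(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

baseline = [i / 1000 for i in range(1000)]         # last month's scores
shifted = [min(s + 0.3, 0.999) for s in baseline]  # scores drifting upward

print(population_stability_index(baseline, baseline))
print(population_stability_index(baseline, shifted))
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift worth investigating, though the right alert threshold depends on the model and traffic.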

Scaling & Optimization

Stream Processing Architecture

Use Kafka for event ingestion and Flink for real-time feature computation. Flink maintains sliding window state for velocity features. Back-pressure handling prevents queue buildup during traffic spikes.
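
The sliding window state behind a velocity feature can be illustrated with a plain in-memory Python sketch. This is not Flink code; it only shows the kind of per-user state a stream processor would maintain. The class name is hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import deque

class SlidingWindowVelocity:
    """Per-user count and sum of transaction amounts over a trailing window."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self.events = {}  # user_id -> deque of (timestamp, amount)

    def add(self, user_id, ts, amount):
        """Record a transaction; assumes timestamps are non-decreasing per user."""
        self.events.setdefault(user_id, deque()).append((ts, amount))

    def get(self, user_id, now):
        """Evict expired events, then return the current aggregates."""
        window = self.events.get(user_id, deque())
        while window and window[0][0] <= now - self.window_seconds:
            window.popleft()
        return {"count": len(window), "sum": sum(a for _, a in window)}

velocity = SlidingWindowVelocity(window_seconds=3600)
velocity.add("user_789", ts=0, amount=20.0)
velocity.add("user_789", ts=1800, amount=30.0)
velocity.add("user_789", ts=3500, amount=50.0)

print(velocity.get("user_789", now=3599))  # all three events in the window
print(velocity.get("user_789", now=4000))  # the event at ts=0 has expired
```

The returned dictionary has the same `count` and `sum` keys that the feature engine reads for `txn_count_1h` and `txn_sum_1h`.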

Low-Latency Model Serving

Feedback Loop Latency

Label Source                Delay                 Volume
Manual review               Minutes-hours         ~1% of transactions
Chargebacks                 30-90 days            ~0.1% of transactions
User reports                Hours-days            ~0.05%
Auto-confirmed legitimate   7 days (no dispute)   ~99%

Practice Problems

Practice 1: New Merchant Onboarding

A new merchant joins your platform with zero transaction history. Your ML model has no merchant-level features. Design a cold-start strategy that provides fraud protection without excessive false positives.

Practice 2: Coordinated Attack

You detect 500 small transactions ($1-5) from different accounts hitting the same merchant within 10 minutes — a card testing attack. Design a detection mechanism that catches this pattern in real-time.

Practice 3: Regional Compliance

EU regulations (PSD2/SCA) require different fraud thresholds than US markets. Design a system that applies region-specific rules and models while sharing global fraud signals.

Quick Reference

Component           Technology              Purpose
Event Streaming     Kafka                   Transaction ingestion
Stream Processing   Flink / Kafka Streams   Real-time feature computation
Feature Store       Redis / Feast           Low-latency feature serving
Rule Engine         Drools / Custom DSL     Deterministic fraud rules
ML Model            XGBoost / LightGBM      Fraud scoring (<5ms inference)
Graph Analysis      Neo4j / TigerGraph      Fraud ring detection
Case Management     Custom / SaaS           Human review workflow

Key Takeaways

  • Use a layered approach: rules for known patterns, ML for novel fraud
  • Real-time features (velocity, device, geo) are the strongest fraud signals
  • Handle class imbalance with cost-sensitive learning or oversampling
  • Monitor for concept drift and retrain models daily
  • Design for <100ms latency — use CPU-optimized models, not GPU
  • Graph analysis catches coordinated fraud that individual scoring misses