Problem Statement & Requirements
Why Fraud Detection Matters
Global payment fraud exceeds $30 billion per year. Every major payment processor (Stripe, PayPal, Visa) runs real-time fraud detection on every transaction. The system must make a block/allow decision in milliseconds while maintaining an extremely low false positive rate — blocking legitimate transactions costs revenue and customer trust.
Think of fraud detection like airport security with multiple screening layers. The first layer is a quick metal detector (rules engine). The second is an X-ray machine (ML model). Suspicious items get additional manual inspection (human review). Each layer catches different threats, and together they provide defense in depth.
Functional Requirements
- Real-time scoring — Score every transaction before authorization
- Rule engine — Configurable rules (velocity checks, blocklists, thresholds)
- ML model inference — Run trained fraud models on transaction features
- Alerting — Flag high-risk transactions for human review
- Feedback loop — Incorporate confirmed fraud/legitimate labels for retraining
- Case management — Investigation workflow for flagged transactions
Non-Functional Requirements
- Decision latency — <100ms for block/allow decision
- False positive rate — <0.1% (1 in 1,000 legitimate transactions incorrectly blocked)
- Fraud detection rate — >95% of fraudulent transactions caught
- Availability — 99.99% (downtime = all transactions auto-approved)
Back-of-Envelope Estimation
| Parameter | Estimate |
|---|---|
| Transactions per second | 10,000 (peak: 50,000) |
| Fraud rate | 0.1% (1 in 1,000 transactions) |
| Features per transaction | 200-500 |
| Feature computation budget | <30ms |
| ML inference budget | <20ms |
| Total latency budget | <100ms |
| Historical transactions stored | 2 years (~600B transactions) |
| Model retraining frequency | Daily (full) + hourly (incremental) |
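The arithmetic behind these estimates, and what the non-functional targets imply at this volume, can be checked with a short calculation. This is a sketch that assumes the average rate of 10,000 TPS holds around the clock; the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400

avg_tps = 10_000
fraud_rate = 0.001           # 0.1% of transactions are fraudulent
false_positive_rate = 0.001  # target ceiling: 0.1% of legitimate transactions blocked
detection_rate = 0.95        # target floor: 95% of fraud caught

txns_per_day = avg_tps * SECONDS_PER_DAY
txns_two_years = txns_per_day * 365 * 2

fraud_per_day = txns_per_day * fraud_rate
legit_per_day = txns_per_day - fraud_per_day

caught_fraud_per_day = fraud_per_day * detection_rate
false_blocks_per_day = legit_per_day * false_positive_rate

# Of everything the system blocks, what fraction is actually fraud?
precision = caught_fraud_per_day / (caught_fraud_per_day + false_blocks_per_day)

print(f"transactions/day:     {txns_per_day:,}")
print(f"transactions/2 years: {txns_two_years:,}")
print(f"fraud caught/day:     {caught_fraud_per_day:,.0f}")
print(f"false blocks/day:     {false_blocks_per_day:,.0f}")
print(f"precision of a block: {precision:.1%}")
```

Even when both targets are met exactly, only about half of the blocked transactions are fraudulent, because legitimate traffic outnumbers fraud a thousand to one. That is why the false positive target has to be so strict.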
System API Design
# Score a transaction (called by payment gateway)
POST /api/v1/transactions/score
{
  "transaction_id": "txn_abc123",
  "amount": 499.99,
  "currency": "USD",
  "merchant_id": "merch_456",
  "user_id": "user_789",
  "card_hash": "sha256_xxx",
  "ip_address": "203.0.113.42",
  "device_fingerprint": "fp_xyz",
  "timestamp": "2024-01-15T10:30:00Z"
}

# Response (must return in <100ms)
{
  "decision": "allow", // allow, block, review
  "risk_score": 0.12,
  "triggered_rules": [],
  "model_version": "v4.2"
}

# Submit fraud/legitimate label (feedback loop)
POST /api/v1/transactions/label
{
  "transaction_id": "txn_abc123",
  "label": "fraud",
  "source": "chargeback"
}

# Manage rules
POST /api/v1/rules
{
  "name": "high_velocity_check",
  "condition": "txn_count_1h > 10 AND amount > 500",
  "action": "block"
}
Data Model
CREATE TABLE transactions (
  txn_id VARCHAR,
  user_id VARCHAR,
  merchant_id VARCHAR,
  amount DECIMAL(12,2),
  currency VARCHAR(3),
  risk_score FLOAT,
  decision VARCHAR,
  label VARCHAR, -- fraud, legitimate, null (unknown)
  features JSONB,
  timestamp TIMESTAMP,
  -- In PostgreSQL, a partitioned table's primary key must include the partition key
  PRIMARY KEY (txn_id, timestamp)
) PARTITION BY RANGE (timestamp);
CREATE TABLE rules (
  rule_id VARCHAR PRIMARY KEY,
  name TEXT,
  condition TEXT, -- expression DSL
  action VARCHAR, -- block, review, score_boost
  enabled BOOLEAN,
  priority INT
);

CREATE TABLE alerts (
  alert_id VARCHAR PRIMARY KEY,
  txn_id VARCHAR,
  status VARCHAR, -- open, investigating, resolved
  assigned_to VARCHAR,
  resolution VARCHAR, -- confirmed_fraud, false_positive
  created_at TIMESTAMP
);
High-Level Architecture
The system has two paths: a real-time scoring path (<100ms) and an offline learning path (hours-days).
Event Ingestion
Transaction events arrive via Kafka. Each event triggers the scoring pipeline. Events are also stored for offline analysis and model retraining.
Feature Engine
Computes 200-500 features in real time: user velocity (transactions in the last hour), merchant risk, device reputation, geo-anomaly, and amount deviation. It pulls pre-computed features from the feature store and computes session-level features on the fly.
Rule Engine
Evaluates deterministic rules first (blocklists, velocity limits, impossible travel). Rules are fast and interpretable, and they can be updated instantly without model retraining.
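As a minimal sketch of how a rule condition such as `txn_count_1h > 10 AND amount > 500` might be evaluated, the example below assumes a deliberately tiny grammar: `field operator number` clauses joined by `AND`. A production DSL would be richer. The names `Rule`, `parse_condition`, and `evaluate_rules` are hypothetical, not part of any specific rule engine.

```python
import operator
from dataclasses import dataclass

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "==": operator.eq}

@dataclass
class Rule:
    name: str
    condition: str
    action: str
    priority: int = 0
    enabled: bool = True

def parse_condition(condition):
    """Parse 'a > 1 AND b <= 2' into [(field_name, op_fn, value), ...]."""
    clauses = []
    for clause in condition.split(" AND "):
        field_name, op, value = clause.split()
        clauses.append((field_name, OPS[op], float(value)))
    return clauses

def matches(rule, features):
    # A clause on a missing feature never matches, so the rule does not fire.
    return all(
        name in features and op(features[name], value)
        for name, op, value in parse_condition(rule.condition)
    )

def evaluate_rules(rules, features):
    """Return (name, action) for each enabled rule that fires, highest priority first."""
    active = sorted((r for r in rules if r.enabled), key=lambda r: -r.priority)
    return [(r.name, r.action) for r in active if matches(r, features)]

rules = [
    Rule("high_velocity_check", "txn_count_1h > 10 AND amount > 500", "block", priority=10),
    Rule("large_amount", "amount >= 5000", "review", priority=5),
]
triggered = evaluate_rules(rules, {"txn_count_1h": 12, "amount": 799.0})
print(triggered)
```

Because conditions are plain strings parsed at evaluation time, a rule can be added or disabled without redeploying code, which is what makes rules instantly updatable. A real engine would parse each condition once and cache the result.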
ML Scoring
Runs the fraud model on the feature vector and outputs a risk score between 0 and 1. The model is an ensemble of gradient boosted trees (fast) and a neural network (accurate). Its score is combined with the rule engine's output to reach the final decision.
Decision Engine
Combines rule and ML scores, then applies business logic: score <0.3 = allow, 0.3-0.7 = review, >0.7 = block. Thresholds are tuned per merchant category and risk appetite.
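The threshold logic can be sketched as a small function. Two things here are assumptions rather than part of the design above: a rule with a `block` or `review` action takes precedence over a lower ML score, and the per-category override values are placeholders.

```python
DEFAULT_THRESHOLDS = {"review": 0.3, "block": 0.7}

# Hypothetical per-category overrides; the values are placeholders.
THRESHOLDS_BY_CATEGORY = {
    "digital_goods": {"review": 0.2, "block": 0.6},
}

def decide(ml_score, rule_actions, merchant_category=None):
    """Combine rule outcomes and the ML score into allow / review / block."""
    thresholds = THRESHOLDS_BY_CATEGORY.get(merchant_category, DEFAULT_THRESHOLDS)
    # Assumption: a deterministic rule can only escalate a decision, never relax it.
    if "block" in rule_actions or ml_score > thresholds["block"]:
        return "block"
    if "review" in rule_actions or ml_score >= thresholds["review"]:
        return "review"
    return "allow"

print(decide(0.12, []))                   # allow
print(decide(0.50, []))                   # review
print(decide(0.10, ["block"]))            # block
print(decide(0.25, [], "digital_goods"))  # review
```

Keeping thresholds in data rather than code lets risk teams tune them per merchant category without a deployment.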
Deep Dive: Core Components
Real-Time Feature Engineering
class FraudFeatureEngine:
    def __init__(self, redis_client):
        # Assumed wrapper around Redis that exposes get_sliding_window();
        # this is not a built-in Redis client method.
        self.redis = redis_client

    def compute_features(self, txn, feature_store):
        user_id = txn["user_id"]
        # Pre-computed features from feature store (<5ms)
        stored = feature_store.get_online(
            entity="user", id=user_id,
            features=["avg_txn_30d", "account_age",
                      "device_count", "country_count_7d",
                      "known_devices", "home_country"]
        )
        # Real-time windowed aggregations (<10ms)
        velocity = self.redis.get_sliding_window(
            f"velocity:{user_id}", window="1h"
        )
        # Derived features
        features = {
            **stored,  # Include all pre-computed features
            "amount_deviation": (
                txn["amount"] - stored["avg_txn_30d"]
            ) / max(stored["avg_txn_30d"], 1),
            "txn_count_1h": velocity["count"],
            "txn_sum_1h": velocity["sum"],
            "is_new_device": txn["device_fingerprint"]
                not in stored.get("known_devices", []),
            # "country" is assumed to be resolved from ip_address upstream
            "is_new_country": txn.get("country")
                != stored.get("home_country"),
        }
        return features
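The sliding-window lookup used for velocity features is not spelled out above. As an in-memory sketch of the state such a helper might maintain per user, the hypothetical `SlidingWindow` class below keeps a running count and sum over the last hour. It assumes timestamps (in seconds) arrive in non-decreasing order; a production system would hold this state in Redis or in the stream processor instead.

```python
from collections import deque

class SlidingWindow:
    """Count and sum of transaction amounts within the last `window_s` seconds."""

    def __init__(self, window_s=3600):
        self.window_s = window_s
        self.events = deque()  # (timestamp, amount), oldest first
        self.total = 0.0

    def _evict(self, now):
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, amount = self.events.popleft()
            self.total -= amount

    def add(self, ts, amount):
        self._evict(ts)
        self.events.append((ts, amount))
        self.total += amount

    def snapshot(self, now):
        self._evict(now)
        return {"count": len(self.events), "sum": self.total}

window = SlidingWindow(window_s=3600)
window.add(0, 20.0)
window.add(1800, 30.0)
window.add(3700, 50.0)  # the event at t=0 has aged out by now
print(window.snapshot(3700))  # {'count': 2, 'sum': 80.0}
```

Each event is appended once and evicted once, so the amortized cost per transaction is constant, which matters when the whole feature budget is under 30ms.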
Rule Engine + ML Hybrid
Why Both Rules AND ML?
Rules are fast, interpretable, and instantly updatable. Use them for known fraud patterns (stolen card lists, impossible travel, velocity limits). ML models catch novel patterns that rules miss. The hybrid approach provides defense in depth: rules for known threats, ML for unknown threats.
Handling Class Imbalance
Only 0.1% of transactions are fraudulent, so a model trained on raw data can reach 99.9% accuracy by predicting "legitimate" for every transaction while catching no fraud at all. Solutions:
- Oversampling: SMOTE generates synthetic fraud examples
- Undersampling: Randomly reduce legitimate examples to 10:1 ratio
- Cost-sensitive learning: Weight fraud examples 100x higher in loss function
- Anomaly detection: Train on legitimate transactions only, flag outliers
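Two of the options above, undersampling and cost-sensitive weighting, can be sketched in plain Python. The row format (dicts with a `label` key) and the function names are assumptions for illustration; many gradient boosting libraries accept per-sample weights like the ones produced here.

```python
import random

def undersample(rows, ratio=10, seed=0):
    """Keep all fraud rows and at most `ratio` legitimate rows per fraud row."""
    fraud = [r for r in rows if r["label"] == "fraud"]
    legit = [r for r in rows if r["label"] == "legitimate"]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    keep = min(len(legit), ratio * len(fraud))
    return fraud + rng.sample(legit, keep)

def sample_weights(rows, fraud_weight=100.0):
    """Per-row loss weights for cost-sensitive learning."""
    return [fraud_weight if r["label"] == "fraud" else 1.0 for r in rows]

# 5 fraud rows in 5,000 matches the 0.1% base rate.
rows = [{"label": "fraud"}] * 5 + [{"label": "legitimate"}] * 4995

balanced = undersample(rows)
weights = sample_weights(rows)

print(len(balanced))  # 55: 5 fraud + 50 legitimate
print(sum(weights))   # 5495.0: fraud rows carry 500 of the total weight
```

Undersampling discards data but makes training cheaper. Weighting keeps every row and instead changes how much each mistake costs.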
Graph-Based Fraud Detection
Fraud rings involve coordinated accounts. Build a transaction graph: nodes are users, merchants, devices, IPs. Edges connect related entities. Use graph algorithms (community detection, PageRank) to identify suspicious clusters sharing devices or addresses.
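As a minimal sketch, the simplest form of community detection is finding connected components: accounts that are linked, directly or transitively, through a shared device or IP. The example uses union-find; the `user:` / `device:` / `ip:` node prefixes and the `min_users` cutoff are assumptions for illustration.

```python
from collections import defaultdict

def find_clusters(edges, min_users=3):
    """Group entities into connected components and return the user sets
    of components containing at least `min_users` distinct users."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    components = defaultdict(set)
    for node in list(parent):
        components[find(node)].add(node)

    clusters = []
    for members in components.values():
        users = {m for m in members if m.startswith("user:")}
        if len(users) >= min_users:
            clusters.append(users)
    return clusters

edges = [
    ("user:a", "device:d1"), ("user:b", "device:d1"),   # a and b share a device
    ("user:b", "ip:10.0.0.7"), ("user:c", "ip:10.0.0.7"),  # b and c share an IP
    ("user:z", "device:d9"),                            # unrelated account
]
clusters = find_clusters(edges)
print(clusters)
```

Here users a, b, and c end up in one cluster even though a and c share nothing directly, which is the kind of link that scoring each transaction in isolation cannot see.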
Concept Drift
Fraudsters Adapt
Fraud patterns change constantly. A model trained on last month's data may miss this month's attack vectors. Monitor for concept drift by tracking: (1) feature distribution shifts, (2) model score distribution changes, (3) rising false negative rate. Retrain daily and deploy new models via canary rollout.
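One common way to quantify a shift in a feature or score distribution is the Population Stability Index (PSI). The sketch below is one possible monitor, not the only option; it assumes values lie in [0, 1] and uses equal-width bins. The `psi` function name and the sample data are illustrative.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two samples of values in [0, 1]."""
    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = min(int(v * bins), bins - 1)  # a value of 1.0 falls in the last bin
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Baseline: scores spread evenly. Current: every score shifted up by 0.3.
baseline = [(i + 0.5) / 100 for i in range(100)]
shifted = [min(x + 0.3, 0.995) for x in baseline]

print(psi(baseline, baseline))  # 0.0 for identical distributions
print(psi(baseline, shifted) > 0.25)
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but these cutoffs are conventions and should be calibrated against the system's own history.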
Scaling & Optimization
Stream Processing Architecture
Use Kafka for event ingestion and Flink for real-time feature computation. Flink maintains sliding window state for velocity features. During traffic spikes, backpressure slows consumption so that operators are not overwhelmed, while Kafka retains the backlog until the pipeline catches up.
Low-Latency Model Serving
- Pre-compile models: Convert to ONNX (or TensorRT when serving on NVIDIA GPUs) for faster inference; speedups of 2-5x are possible, depending on the model and hardware
- CPU-optimized models: Use XGBoost/LightGBM for <5ms inference (no GPU needed)
- Model caching: Keep hot models in memory, cold models on disk
- Parallel scoring: Run rule engine and ML model concurrently, combine results
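Parallel scoring can be sketched with a thread pool: submit both tasks, then wait on the model only up to its latency budget. This is an illustration under assumptions. `rule_fn` and `model_fn` are hypothetical callables, and returning no score on timeout (so the decision falls back to rules alone) is one possible policy.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def score_transaction(features, rule_fn, model_fn, executor, model_timeout_s=0.02):
    """Run rules and model concurrently; return no ML score if the model is too slow."""
    rules_future = executor.submit(rule_fn, features)
    model_future = executor.submit(model_fn, features)

    triggered = rules_future.result()
    try:
        ml_score = model_future.result(timeout=model_timeout_s)
    except FutureTimeout:
        ml_score = None  # caller decides using rules only
    return {"triggered_rules": triggered, "risk_score": ml_score}

def demo_rules(features):
    return ["high_velocity_check"] if features["txn_count_1h"] > 10 else []

def demo_model(features):
    return 0.82  # stand-in for a real model's predict call

with ThreadPoolExecutor(max_workers=4) as pool:
    # Generous timeout so the demo is not timing-sensitive.
    result = score_transaction({"txn_count_1h": 12}, demo_rules, demo_model,
                               pool, model_timeout_s=0.5)
print(result)
```

Threads suit this pattern when the model call is I/O-bound, for example a request to a separate model server. Total latency then approaches the slower of the two calls rather than their sum.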
Feedback Loop Latency
| Label Source | Delay | Volume |
|---|---|---|
| Manual review | Minutes-hours | ~1% of transactions |
| Chargebacks | 30-90 days | ~0.1% of transactions |
| User reports | Hours-days | ~0.05% of transactions |
| Auto-confirmed legitimate | 7 days (no dispute) | ~99% of transactions |
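Because these label sources arrive at very different times, the training pipeline has to decide which label to trust at any given moment. The sketch below shows one possible policy. It assumes that an explicit fraud label from any source overrides everything else, and that a transaction with no dispute after 7 days is treated as provisionally legitimate. The `training_label` function is hypothetical.

```python
from datetime import datetime, timedelta

AUTO_CONFIRM_AFTER = timedelta(days=7)

def training_label(txn_time, labels, now):
    """Pick the label to train on, or None if the transaction is unresolved.

    `labels` is a list of (source, label) pairs received so far.
    """
    received = [label for _source, label in labels]
    if "fraud" in received:
        return "fraud"
    if "legitimate" in received:
        return "legitimate"
    if now - txn_time >= AUTO_CONFIRM_AFTER:
        # Provisional: a chargeback can still arrive 30-90 days later.
        return "legitimate"
    return None

t0 = datetime(2024, 1, 15, 10, 30)
print(training_label(t0, [], t0 + timedelta(days=1)))   # None
print(training_label(t0, [], t0 + timedelta(days=8)))   # legitimate
print(training_label(t0, [("chargeback", "fraud")],
                     t0 + timedelta(days=45)))          # fraud
```

The last call shows why auto-confirmed labels are only provisional: the same transaction is labeled legitimate on day 8 and fraud on day 45, so models retrained in between learn from a label that later flips.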
Practice Problems
Practice 1: New Merchant Onboarding
A new merchant joins your platform with zero transaction history. Your ML model has no merchant-level features. Design a cold-start strategy that provides fraud protection without excessive false positives.
Practice 2: Coordinated Attack
You detect 500 small transactions ($1-5) from different accounts hitting the same merchant within 10 minutes — a card testing attack. Design a detection mechanism that catches this pattern in real-time.
Practice 3: Regional Compliance
EU regulations (PSD2/SCA) require different fraud thresholds than US markets. Design a system that applies region-specific rules and models while sharing global fraud signals.
Quick Reference
| Component | Technology | Purpose |
|---|---|---|
| Event Streaming | Kafka | Transaction ingestion |
| Stream Processing | Flink / Kafka Streams | Real-time feature computation |
| Feature Store | Redis / Feast | Low-latency feature serving |
| Rule Engine | Drools / Custom DSL | Deterministic fraud rules |
| ML Model | XGBoost / LightGBM | Fraud scoring (<5ms inference) |
| Graph Analysis | Neo4j / TigerGraph | Fraud ring detection |
| Case Management | Custom / SaaS | Human review workflow |
Key Takeaways
- Use a layered approach: rules for known patterns, ML for novel fraud
- Real-time features (velocity, device, geo) are the strongest fraud signals
- Handle class imbalance with cost-sensitive learning or oversampling
- Monitor for concept drift and retrain models daily
- Design for <100ms latency — use CPU-optimized models, not GPU
- Graph analysis catches coordinated fraud that individual scoring misses