Executive summary — the honest brief before you risk one euro
This chapter synthesizes the core research into a single honest brief: what the math actually shows about retail FX, where systematic methods help, and the realistic performance bands to anchor your expectations.
TL;DR (read this if nothing else)
- "Bulletproof" doesn't exist. 68–89% of retail FX accounts lose money in any 12-month window (ESMA-disclosed numbers from real brokers). Average account lifespan: ~4 months. Only ~1% sustain profit over 5+ years.
- LLMs are not good at price prediction. JP Morgan shut down their deep-learning FX execution model in Oct 2023 — if their quant team can't make it work with prime-broker spreads and tick data, retail "AI signal" services are noise. The 2025 "Alpha Illusion" paper formally debunks the LLM-trading-agent hype cycle.
- Where AI/ML genuinely helps retail is execution, risk, regime detection, news/sentiment ingestion, and journaling — not generating alpha from price charts.
- Honest performance target for a serious 12-month build: Sharpe 0.5–0.8 net of costs, 15–25% annual return, 15–25% drawdown. Anything much better in a backtest should be treated as overfitting until forward-testing says otherwise.
- Capital reality: sub-€10k = hobby (costs > realistic edge). €25–100k = supplementary. €250k+ = where 15–25% returns produce livable money without ruinous leverage.
- Recommendation: Yes, build it — but as a research + risk cockpit + paper-trading harness first, with live capital deferred 6+ months until forward-test results justify it. Treat it as a 12–18 month project, not a 6-week sprint.
The Hard Truths (synthesized across all five reports)
1. The math of edge decay
- Carry trade: academic Sharpe 0.7–1.0 collapses to 0.3–0.5 when crisis periods are included, then retail swap markups eat another 20–40% of that.
- Time-series momentum (Moskowitz/Ooi/Pedersen) is the strongest published FX edge, but the famous 1.2+ Sharpe is diversified across 58 instruments. Single-pair retail TSMOM: 0.3–0.6 net of costs.
- Any single-strategy single-pair retail edge that survives realistic costs is small. Real systems stack 3–6 weak edges across multiple pairs/timeframes.
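The arithmetic behind stacking weak edges can be sketched as follows. This is a minimal illustration, assuming every strategy has the same standalone Sharpe, is run at equal risk, and shares one average pairwise correlation; the function name `combined_sharpe` and the example numbers are hypothetical, not taken from the underlying reports.

```python
import math


def combined_sharpe(single_sharpe: float, n: int, avg_corr: float) -> float:
    """Sharpe of an equal-risk blend of n strategies.

    Assumes identical standalone Sharpe and a single average pairwise
    correlation (a simplification; real edge stacks are messier).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= avg_corr <= 1.0:
        raise ValueError("avg_corr must be between 0 and 1 in this sketch")
    # Blend mean scales with n; blend volatility with sqrt(n * (1 + (n - 1) * rho)).
    return single_sharpe * math.sqrt(n / (1.0 + (n - 1) * avg_corr))


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6):
        print(f"rho={rho:.1f}  4 edges at 0.4 each -> {combined_sharpe(0.4, 4, rho):.2f}")
```

The benefit of stacking shrinks quickly as correlation rises, which is why the edges need to differ in pair, timeframe, or logic rather than being four variants of the same signal.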
2. Win-rate ≠ profitability
- Top systematic FX/CTA traders run 35–48% win rates with 2:1–3:1 R-multiples. High win-rate strategies (70%+) hide left-tail blowups.
- FundedNext data: 41% of paid CFD traders have win rates under 50%. Edge comes from R, not from being "right."
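The relationship between win rate and R-multiple reduces to a one-line expectancy formula. The sketch below assumes every loser costs exactly 1R and every winner pays a fixed multiple of R, and it ignores spreads, swaps, and slippage; the function names are illustrative.

```python
def expectancy_r(win_rate: float, reward_risk: float) -> float:
    """Average result per trade, in units of R (the amount risked)."""
    return win_rate * reward_risk - (1.0 - win_rate)


def breakeven_win_rate(reward_risk: float) -> float:
    """Win rate at which expectancy is exactly zero, before costs."""
    return 1.0 / (1.0 + reward_risk)


if __name__ == "__main__":
    # A 40% win rate at 2.5R is profitable; a 70% win rate at 0.3R is not.
    print(f"40% wins at 2.5R: {expectancy_r(0.40, 2.5):+.2f}R per trade")
    print(f"70% wins at 0.3R: {expectancy_r(0.70, 0.3):+.2f}R per trade")
    print(f"Break-even win rate at 2R: {breakeven_win_rate(2.0):.1%}")
```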
3. Drawdown is the silent killer
- A 50% drawdown requires +100% to recover. A 30% DD needs +43%.
- Even Sharpe-1.0 systematic traders see 25–45% drawdowns on multi-year samples. If a backtest shows 80% return / 5% DD: it's wrong, not magic.
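The recovery figures above follow from simple compounding: after losing a fraction d, the account must gain d / (1 − d) to get back to its peak. A short sketch, with a hypothetical helper for how long that takes at a steady annual return:

```python
import math


def required_recovery(drawdown: float) -> float:
    """Gain needed to return to the prior peak after a fractional drawdown."""
    if not 0.0 <= drawdown < 1.0:
        raise ValueError("drawdown must be in [0, 1)")
    return drawdown / (1.0 - drawdown)


def years_to_recover(drawdown: float, annual_return: float) -> float:
    """Years of steady compounding at annual_return needed to recover."""
    return math.log(1.0 / (1.0 - drawdown)) / math.log(1.0 + annual_return)


if __name__ == "__main__":
    for dd in (0.10, 0.30, 0.50):
        print(
            f"{dd:.0%} drawdown -> needs {required_recovery(dd):+.0%}, "
            f"about {years_to_recover(dd, 0.20):.1f} years at 20%/yr"
        )
```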
4. Sharpe ratios lie at small sample sizes
- A backtest Sharpe of 1.5 over 2 years has a 95% CI of roughly 0.1–2.9 under the standard error in Lo (2002) for independent returns. Almost meaningless.
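- The Deflated Sharpe Ratio (Bailey & López de Prado, 2014) formalizes the multiple-testing problem: try about 45 independent variants on five years of data and the best one is expected to show a Sharpe near 1 by chance alone. Most retail backtests are this.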
- Minimum track record length at typical retail Sharpe is ~3 years of live data — not tradition, math.
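These three statistical claims can be checked with a few lines of standard-library Python. The sketch assumes independent, roughly normal returns (so the skew and kurtosis corrections in the original papers drop out) and a zero benchmark Sharpe. Exact bounds shift with sampling frequency and autocorrelation, so treat the outputs as order-of-magnitude checks. All function names are illustrative.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def sharpe_confidence_interval(annual_sharpe, years, periods_per_year=252, z=1.96):
    """Approximate CI for an annualised Sharpe using Lo's iid standard error."""
    n_obs = years * periods_per_year
    sr_period = annual_sharpe / math.sqrt(periods_per_year)
    se_period = math.sqrt((1.0 + 0.5 * sr_period ** 2) / n_obs)
    se_annual = se_period * math.sqrt(periods_per_year)
    return annual_sharpe - z * se_annual, annual_sharpe + z * se_annual


def expected_max_sharpe(n_trials, years):
    """Expected best annualised Sharpe among n_trials independent zero-skill backtests.

    Uses the expected-maximum approximation from the Deflated Sharpe
    literature, with the Sharpe estimator's std taken as 1/sqrt(years).
    """
    if n_trials < 2:
        raise ValueError("need at least 2 trials")
    nd = NormalDist()
    z_max = (1.0 - EULER_GAMMA) * nd.inv_cdf(1.0 - 1.0 / n_trials) \
        + EULER_GAMMA * nd.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return z_max / math.sqrt(years)


def min_track_record_years(annual_sharpe, periods_per_year=252, confidence=0.95):
    """Years of data needed to reject 'true Sharpe <= 0' at the given confidence."""
    sr_period = annual_sharpe / math.sqrt(periods_per_year)
    z = NormalDist().inv_cdf(confidence)
    n_obs = 1.0 + (1.0 + 0.5 * sr_period ** 2) * (z / sr_period) ** 2
    return n_obs / periods_per_year


if __name__ == "__main__":
    low, high = sharpe_confidence_interval(1.5, years=2)
    print(f"95% CI for Sharpe 1.5 over 2 years: {low:.2f} to {high:.2f}")
    print(f"Best of 45 zero-skill trials, 5 years: {expected_max_sharpe(45, 5):.2f}")
    print(f"Min track record at Sharpe 1.0: {min_track_record_years(1.0):.1f} years")
```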
5. Black swans are real and uncompensated
- CHF unpeg 15-Jan-2015: FXCM took $225M in negative client balances; Alpari UK went insolvent. EU-regulated brokers must now give retail clients negative-balance protection; many offshore brokers don't.
- GBP flash crash 7-Oct-2016, 00:07 BST (23:07 GMT on 6-Oct): 6% move in 2 minutes in thin early-Asia trading. Stops slipped 100–200 pips.
- Weekend gaps, peg breaks, and broker-specific quote dislocations are the tail-risk you cannot hedge inside a single FX account.
What's Actually Worth Building
Based on cross-referencing all five docs, the system that's defensibly +EV for a retail trader looks like this:
┌─────────────────────────────────────────────────────────────┐
│ LAYER 1 — RESEARCH ENGINE (LLMs shine here) │
│ • Daily macro brief: CB calendars, COT, DXY, yield spreads │
│ • Central bank speech parser (GPT Fedspeak-style) │
│ • News sentiment per-pair │
│ • Regime detector (vol clustering, correlation breakdown) │
└─────────────────────────────────────────────────────────────┘
↓ feeds
┌─────────────────────────────────────────────────────────────┐
│ LAYER 2 — STRATEGY LIBRARY (rules-based, ML-augmented) │
│ • TSMOM on majors + crosses, multiple lookbacks │
│ • Mean-reversion on Asian-session ranges │
│ • Carry-trade overlay (small allocation) │
│ • Event/news avoidance filter │
│ → ML role: feature engineering, vol forecasting, sizing │
│ → NOT: predicting next bar's close │
└─────────────────────────────────────────────────────────────┘
↓ sized by
┌─────────────────────────────────────────────────────────────┐
│ LAYER 3 — RISK COCKPIT (the part that saves you) │
│ • Half-Kelly position sizing with per-pair pip-value calc │
│ • Correlation budget (n_eff cap of ~3 effective trades) │
│ • Daily/weekly/monthly loss limits → kill-switch │
│ • Pre-news/pre-weekend auto-flatten rules │
│ • Equity-curve SMA filter (stop trading when underwater) │
└─────────────────────────────────────────────────────────────┘
↓ executed via
┌─────────────────────────────────────────────────────────────┐
│ LAYER 4 — EXECUTION (broker-native API) │
│ • OANDA v20 REST for primary execution │
│ • IBKR as multi-asset insurance │
│ • Order audit log → journal │
└─────────────────────────────────────────────────────────────┘
↓ measured by
┌─────────────────────────────────────────────────────────────┐
│ LAYER 5 — JOURNAL + REVIEW (LLMs shine again) │
│ • Every trade logged with regime + reasoning │
│ • Weekly LLM review for behavioral leaks │
│ • Deflated Sharpe / PSR computed continuously │
│ • Walk-forward retrain schedule │
└─────────────────────────────────────────────────────────────┘
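Two of the Layer 3 calculations are small enough to sketch directly. This is one possible reading, not a specification: half-Kelly is computed for a simple win/lose bet with a fixed R-multiple, and n_eff is defined here as n² divided by the sum of the trade-P&L correlation matrix, which assumes equal risk per trade. Both definitions and all names are assumptions made for illustration.

```python
def half_kelly_fraction(win_rate: float, reward_risk: float) -> float:
    """Half of the Kelly fraction for a binary bet; never negative."""
    full_kelly = win_rate - (1.0 - win_rate) / reward_risk
    return max(0.0, 0.5 * full_kelly)


def effective_trades(corr: list[list[float]]) -> float:
    """Effective number of independent bets for equally sized open trades.

    corr is the correlation matrix of trade P&L, so a long and a short in
    the same pair should carry a negative entry. Assumed definition.
    """
    n = len(corr)
    total = sum(sum(row) for row in corr)
    if total <= 0.0:
        raise ValueError("book is fully hedged; n_eff is undefined here")
    return n * n / total


if __name__ == "__main__":
    # Three open trades with 0.5 pairwise correlation behave like 1.5 bets.
    book = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
    print(f"n_eff = {effective_trades(book):.2f}")
    # Half-Kelly is an upper bound; the 1% per-trade cap still applies.
    raw = half_kelly_fraction(win_rate=0.40, reward_risk=2.5)
    print(f"half-Kelly = {raw:.1%}, risk used = {min(raw, 0.01):.1%}")
```

Half-Kelly computed from backtest statistics usually overstates the safe size because the inputs are themselves noisy estimates, so taking the smaller of half-Kelly and a hard per-trade cap is the conservative choice.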
Recommended stack
| Component | Pick | Why |
|---|---|---|
| Primary broker | OANDA v20 API | Cleanest REST/streaming API, identical paper/live, EU-regulated, $0 min |
| Secondary broker | Interactive Brokers (Ireland) | Multi-asset insurance, bank-grade safety |
| Backtesting framework | nautilus_trader (primary) + vectorbt (sweeps) | Nautilus = nanosecond, production-realistic; vectorbt = fast parameter exploration |
| Historical data | Dukascopy ticks (free) + OANDA API (broker-native cross-check) | Cross-venue validation catches the "your data ≠ your broker's quotes" bug |
| Language | Python | Every framework worth using is Python-native; MT4/MT5 is maintenance-mode |
| News/macro | ForexFactory calendar + central bank RSS + an automated macro digest | Free tier covers 95% of need |
| Cloud | Local for dev, cheap VPS (Hetzner/Vultr in AMS/FRA) for live | Latency to OANDA Frankfurt POP matters once live |
Hard "no" list
- MT4/MT5 as the trading core (use only for visual chart cross-checks)
- Any broker registered in SVG, Vanuatu, Belize, Marshall Islands, Seychelles, Mauritius
- Copy-trade / signal-service subscriptions
- "AI trading bot" SaaS — every one tested in the Alpha Illusion paper failed
- Going live with <6 months of forward-test data
- Risking >1% per trade until 100+ live trades logged
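The 1% rule only means something once it is converted into a unit count. A minimal sketch of fixed-fractional sizing follows, assuming the stop distance is known in pips and that `quote_per_account` is the current price of one unit of account currency in the pair's quote currency (for a EUR account trading EUR/USD, simply the EUR/USD rate). Parameter names are hypothetical, and spread and slippage are ignored.

```python
import math


def pip_value_per_unit(pip_size: float, quote_per_account: float) -> float:
    """Value of a one-pip move on one unit of the pair, in account currency."""
    return pip_size / quote_per_account


def position_size_units(
    equity: float,
    risk_fraction: float,
    stop_pips: float,
    pip_size: float,
    quote_per_account: float,
) -> int:
    """Units to trade so that hitting the stop loses equity * risk_fraction."""
    if stop_pips <= 0:
        raise ValueError("stop_pips must be positive")
    risk_amount = equity * risk_fraction
    loss_per_unit = stop_pips * pip_value_per_unit(pip_size, quote_per_account)
    # Round down so the realised risk never exceeds the budget.
    return math.floor(risk_amount / loss_per_unit)


if __name__ == "__main__":
    # EUR 10,000 account, 1% risk, 40-pip stop on EUR/USD at 1.10.
    units = position_size_units(10_000, 0.01, 40, 0.0001, 1.10)
    print(f"Trade size: {units} units")
```

For JPY-quoted pairs the pip size would be 0.01 rather than 0.0001.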
Dutch-Specific Watch-Outs
- Tax (from doc 02): Active leveraged FX trading is exactly the profile the Belastingdienst reclassifies from Box 3 (wealth, ~36% tax on a deemed yield) to Box 1 (income, up to 49.5%). No bright-line threshold. Keep clean per-trade exports from the broker API from day one.
- AFM oversight comes via MiFID passporting: OANDA Europe (Malta), IBKR Ireland, and Pepperstone (Cyprus) are all legitimate options.
- Negative balance protection has been mandatory for EU-regulated brokers serving retail clients since the 2018 ESMA rules. Don't trade with anyone who doesn't have it.
Realistic Targets & Milestones
If you commit to this seriously, here's what good looks like:
| Phase | Duration | Goal | Capital at risk |
|---|---|---|---|
| 0. Foundation | 4–6 weeks | Build research engine + risk cockpit + backtest harness. No strategies yet. | €0 |
| 1. Strategy R&D | 8–12 weeks | Code 3–5 rules-based strategies. Backtest with walk-forward + DSR. Discard most. | €0 |
| 2. Paper trade | 3–6 months | Forward-test surviving strategies on live demo. Track slippage vs backtest. | €0 |
| 3. Live micro | 3 months | Trade live with 0.1× target size. Validate execution + psychology. | €1–5k |
| 4. Scale-up | 3 months | Step to 0.25× → 0.5× → 1.0× target size as DSR confirms edge. | Scale with proven Sharpe |
| 5. Steady state | ongoing | Monitor for edge decay, retrain quarterly, journal weekly. | Full allocation |
Total elapsed before full-size live trading: ~12–18 months. Anyone selling you a shortcut is selling you survivorship bias.
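The walk-forward backtesting called for in Phase 1 comes down to how the data is split: fit on one window, evaluate on the next unseen window, then roll forward. A minimal sketch of rolling, non-overlapping test windows over bar indices follows; the window sizes and the function name are illustrative, and real harnesses often add an embargo gap between train and test.

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_obs: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs that roll forward in time.

    Each test window starts right after its train window, and the start
    advances by test_size so test windows never overlap.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("window sizes must be positive")
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


if __name__ == "__main__":
    # 10 bars, fit on 4, evaluate on the next 2, then roll forward by 2.
    for train, test in walk_forward_windows(10, train_size=4, test_size=2):
        print(f"train {train.start}-{train.stop - 1}  test {test.start}-{test.stop - 1}")
```

Only the concatenated test-window results should feed the Sharpe and DSR calculations; in-sample performance from the train windows says nothing about edge.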
Honest go/no-go questions
Before committing real capital — or weeks of build time — answer these:
- Capital available for this — and capital you can afford to lose entirely? (Determines whether this is rational at all.)
- Time horizon — are you willing to spend 12+ months before scaling to real size, or are you hoping for income in 3 months? (If the latter: stop now.)
- Discretionary or fully systematic? Fully systematic is harder to build but psychologically easier to run; discretionary trading is the opposite.
- Pairs/timeframe preference — major pairs on H1–H4 is where the edges are documented. Scalping EUR/USD on 1m is a retail-killer.
- Existing trading experience — if zero, the right first step is 3 months of paper trading a simple rules-based system before anything more complex gets bolted on.
The recommendation
Treat the system as a tool for disciplined trading, not a money printer. The realistic outcome of doing this right is a process that:
- Removes the behavioral leaks that kill most retail traders
- Gives you a documented edge stack worth 15–25% annual at 15–25% DD
- Pays for itself once capital is €50k+
- Becomes a durable asset (the process + the track record) regardless of whether you trade it forever
The realistic outcome of doing it wrong (skipping the forward test, overfitting backtests, chasing "signal" services) is a 4-month account lifespan and a tuition payment.