arbitrage-engine/docs/ai/01-architecture-map.md
fanziqi 22787b3e0a docs: add AI documentation suite and comprehensive code review report
- Generate full AI-consumable docs (docs/ai/): system overview, architecture,
  module cheatsheet, API contracts, data model, build guide, decision log,
  glossary, and open questions (deep tier coverage)
- Add PROBLEM_REPORT.md: categorized bug/risk summary
- Add DETAILED_CODE_REVIEW.md: full line-by-line review of all 15 backend
  files, documenting 4 fatal issues, 5 critical deployment bugs, 4 security
  vulnerabilities, and 6 architecture defects with prioritized fix plan
2026-03-03 19:01:18 +08:00

8.3 KiB
Raw Blame History

generated_by version created last_updated source_commit coverage
repo-insight 1 2026-03-03 2026-03-03 0d9dffa standard

01 — Architecture Map

Purpose

Describes the architecture style, component relationships, data flow, and runtime execution topology of the arbitrage engine.

TL;DR

  • Multi-process architecture: each concern is a separate PM2 process; they communicate exclusively through PostgreSQL (tables + NOTIFY/LISTEN).
  • No message broker: PostgreSQL serves as both the data store and the inter-process message bus (NOTIFY new_signal).
  • Dual-database write: every PG write in signal_engine.py and agg_trades_collector.py attempts a secondary write to Cloud SQL (GCP) for durability.
  • FastAPI is read-only at runtime: it proxies Binance REST for rates/history and reads the PG tables written by workers; it does not control the signal engine.
  • Signal pipeline: raw aggTrades → in-memory rolling windows (CVD/VWAP/ATR) → scored signal → PG write + NOTIFY → live_executor executes Binance order.
  • Frontend polling: React SPA polls /api/rates every 2 s (public) and slow endpoints every 120 s (auth required).
  • Risk guard is a separate process: polls every 5 s, can block new orders (circuit-break) by writing a flag to live_config; live_executor reads that flag before each trade.

Canonical Facts

Architecture Style

Shared-DB multi-process monolith. No microservices; no message broker. All processes run on a single GCP VM.

Component Diagram (text)

Binance WS (aggTrades)
  └─► agg_trades_collector.py ──────────────────► agg_trades (partitioned table)
                                                         │
Binance WS (market data)                                 ▼
  └─► market_data_collector.py ──────────────► market_indicators table
                                                         │
Binance WS (liquidations)                                ▼
  └─► liquidation_collector.py ──────────────► liquidation tables
                                                         │
                                                  signal_engine.py
                                                  (15 s loop, reads agg_trades +
                                                   market_indicators)
                                                         │
                                               ┌─────────┴──────────────┐
                                               │                        │
                                      signal_indicators          paper_trades
                                      signal_indicators_1m       (paper mode)
                                      signal_trades
                                      NOTIFY new_signal
                                               │
                                         live_executor.py
                                         (LISTEN new_signal →
                                          Binance Futures API)
                                               │
                                         live_trades table
                                               │
                                         risk_guard.py (5 s)
                                         monitors live_trades,
                                         writes live_config flags
                                               │
                                       signal_pusher.py
                                       (reads signal_indicators →
                                        Discord webhook)
                                               │
                                    FastAPI main.py (read/proxy)
                                    + rate_snapshots (2 s write)
                                               │
                                         Next.js Frontend
                                         (polling SPA)

Data Flow — Signal Pipeline

  1. agg_trades_collector.py: streams aggTrade WS events for all symbols, batch-inserts into agg_trades partitioned table (partitioned by month on time_ms).
  2. signal_engine.py (15 s loop per symbol):
    • Cold-start: reads last N rows from agg_trades to warm up TradeWindow (CVD, VWAP) and ATRCalculator deques.
    • Fetches new trades since last_agg_id.
    • Feeds trades into three TradeWindow instances (30 m, 4 h, 24 h) and one ATRCalculator (5 m candles, 14-period).
    • Reads market_indicators for long-short ratio, OI, coinbase premium, funding rate, liquidations.
    • Scores signal using JSON strategy config weights (score 0100, threshold 75).
    • Writes to signal_indicators (15 s cadence) and signal_indicators_1m (1 m cadence).
    • If score ≥ threshold: opens paper trade (if enabled), emits NOTIFY new_signal (if live enabled).
  3. live_executor.py: LISTEN new_signal on PG; deserializes payload; calls Binance Futures REST to place market order; writes to live_trades.
  4. risk_guard.py: every 5 s checks daily loss, consecutive losses, unrealized PnL, balance, data freshness, hold timeout; sets live_config.circuit_break flag to block/resume new orders.

Strategy Scoring (V5.x)

Two JSON configs in backend/strategies/:

Config Version Threshold Signals
v51_baseline.json 5.1 75 cvd, p99, accel, ls_ratio, oi, coinbase_premium
v52_8signals.json 5.2 75 cvd, p99, accel, ls_ratio, oi, coinbase_premium, funding_rate, liquidation

Score categories: direction (CVD), crowding (P99 large trades), environment (ATR/VWAP), confirmation (LS ratio, OI), auxiliary (coinbase premium), funding_rate, liquidation.

TP/SL Configuration

  • V5.1: SL=1.4×ATR, TP1=1.05×ATR, TP2=2.1×ATR
  • V5.2: SL=2.1×ATR, TP1=1.4×ATR, TP2=3.15×ATR
  • Signal cooldown: 10 minutes per symbol per direction.

Risk Guard Circuit-Break Rules

Rule Threshold Action
Daily loss -5R Full close + shutdown
Consecutive losses 5 Pause 60 min
API disconnect >30 s Pause new orders
Balance too low < risk×2 Reject new orders
Data stale >30 s Block new orders
Hold timeout yellow 45 min Alert
Hold timeout auto-close 70 min Force close

Frontend Architecture

  • Next.js App Router (frontend/app/): page-per-route, all pages are client components ("use client").
  • Auth: JWT stored in localStorage; lib/auth.tsx provides useAuth() hook + authFetch() helper with auto-refresh.
  • API client: lib/api.ts — typed wrapper, distinguishes public (/api/rates, /api/health) from protected (all other) endpoints.
  • Polling strategy: rates every 2 s, slow data (stats, history, signals) every 120 s; kline charts re-render every 30 s.

Interfaces / Dependencies

  • PG NOTIFY channel name: new_signal
  • live_config table keys: risk_per_trade_usd, max_positions, circuit_break (inferred)
  • market_indicators populated by market_data_collector.py with types: long_short_ratio, top_trader_position, open_interest_hist, coinbase_premium, funding_rate

Unknowns & Risks

  • [inference] PM2 config (ecosystem.dev.config.js) not read; exact restart/watch/env-file settings unknown.
  • [inference] signal_pusher.py exact Discord webhook configuration (env var name, rate limit handling) not confirmed.
  • [unknown] Cloud SQL write failure does not block signal_engine but may create data divergence between local PG and Cloud SQL.
  • [risk] Hardcoded testnet credentials in source code (arb_engine_2026); production requires explicit env var override.

Source Refs

  • backend/signal_engine.py:1-16 — architecture docstring
  • backend/live_executor.py:1-10 — executor architecture comment
  • backend/risk_guard.py:1-12, 55-73 — risk rules and config
  • backend/signal_engine.py:170-245TradeWindow, ATRCalculator classes
  • backend/signal_engine.py:44-67 — strategy config loading
  • backend/strategies/v51_baseline.json, backend/strategies/v52_8signals.json
  • backend/main.py:61-83 — background snapshot loop
  • frontend/lib/api.ts:103-116 — API client methods
  • frontend/app/page.tsx:149-154 — polling intervals