arbitrage-engine/docs/ai/01-architecture-map.md
fanziqi 22787b3e0a docs: add AI documentation suite and comprehensive code review report
- Generate full AI-consumable docs (docs/ai/): system overview, architecture,
  module cheatsheet, API contracts, data model, build guide, decision log,
  glossary, and open questions (deep tier coverage)
- Add PROBLEM_REPORT.md: categorized bug/risk summary
- Add DETAILED_CODE_REVIEW.md: full line-by-line review of all 15 backend
  files, documenting 4 fatal issues, 5 critical deployment bugs, 4 security
  vulnerabilities, and 6 architecture defects with prioritized fix plan
2026-03-03 19:01:18 +08:00

138 lines
8.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
generated_by: repo-insight
version: 1
created: 2026-03-03
last_updated: 2026-03-03
source_commit: 0d9dffa
coverage: standard
---
# 01 — Architecture Map
## Purpose
Describes the architecture style, component relationships, data flow, and runtime execution topology of the arbitrage engine.
## TL;DR
- **Multi-process architecture**: each concern is a separate PM2 process; they communicate exclusively through PostgreSQL (tables + NOTIFY/LISTEN).
- **No message broker**: PostgreSQL serves as both the data store and the inter-process message bus (`NOTIFY new_signal`).
- **Dual-database write**: every PG write in `signal_engine.py` and `agg_trades_collector.py` attempts a secondary write to Cloud SQL (GCP) for durability.
- **FastAPI is read-only at runtime**: it proxies Binance REST for rates/history and reads the PG tables written by workers; it does not control the signal engine.
- **Signal pipeline**: raw aggTrades → in-memory rolling windows (CVD/VWAP/ATR) → scored signal → PG write + `NOTIFY` → live_executor executes Binance order.
- **Frontend polling**: React SPA polls `/api/rates` every 2 s (public) and slow endpoints every 120 s (auth required).
- **Risk guard is a separate process**: polls every 5 s, can block new orders (circuit-break) by writing a flag to `live_config`; live_executor reads that flag before each trade.
## Canonical Facts
### Architecture Style
Shared-DB multi-process monolith. No microservices; no message broker. All processes run on a single GCP VM.
### Component Diagram (text)
```
Binance WS (aggTrades)
└─► agg_trades_collector.py ──────────────────► agg_trades (partitioned table)
Binance WS (market data) ▼
└─► market_data_collector.py ──────────────► market_indicators table
Binance WS (liquidations) ▼
└─► liquidation_collector.py ──────────────► liquidation tables
signal_engine.py
(15 s loop, reads agg_trades +
market_indicators)
┌─────────┴──────────────┐
│ │
signal_indicators paper_trades
signal_indicators_1m (paper mode)
signal_trades
NOTIFY new_signal
live_executor.py
(LISTEN new_signal →
Binance Futures API)
live_trades table
risk_guard.py (5 s)
monitors live_trades,
writes live_config flags
signal_pusher.py
(reads signal_indicators →
Discord webhook)
FastAPI main.py (read/proxy)
+ rate_snapshots (2 s write)
Next.js Frontend
(polling SPA)
```
### Data Flow — Signal Pipeline
1. `agg_trades_collector.py`: streams `aggTrade` WS events for all symbols, batch-inserts into `agg_trades` partitioned table (partitioned by month on `time_ms`).
2. `signal_engine.py` (15 s loop per symbol):
- Cold-start: reads last N rows from `agg_trades` to warm up `TradeWindow` (CVD, VWAP) and `ATRCalculator` deques.
- Fetches new trades since `last_agg_id`.
- Feeds trades into three `TradeWindow` instances (30 m, 4 h, 24 h) and one `ATRCalculator` (5 m candles, 14-period).
- Reads `market_indicators` for long-short ratio, OI, coinbase premium, funding rate, liquidations.
- Scores signal using JSON strategy config weights (score 0100, threshold 75).
- Writes to `signal_indicators` (15 s cadence) and `signal_indicators_1m` (1 m cadence).
- If score ≥ threshold: opens paper trade (if enabled), emits `NOTIFY new_signal` (if live enabled).
3. `live_executor.py`: `LISTEN new_signal` on PG; deserializes payload; calls Binance Futures REST to place market order; writes to `live_trades`.
4. `risk_guard.py`: every 5 s checks daily loss, consecutive losses, unrealized PnL, balance, data freshness, hold timeout; sets `live_config.circuit_break` flag to block/resume new orders.
### Strategy Scoring (V5.x)
Two JSON configs in `backend/strategies/`:
| Config | Version | Threshold | Signals |
|--------|---------|-----------|---------|
| `v51_baseline.json` | 5.1 | 75 | cvd, p99, accel, ls_ratio, oi, coinbase_premium |
| `v52_8signals.json` | 5.2 | 75 | cvd, p99, accel, ls_ratio, oi, coinbase_premium, funding_rate, liquidation |
Score categories: `direction` (CVD), `crowding` (P99 large trades), `environment` (ATR/VWAP), `confirmation` (LS ratio, OI), `auxiliary` (coinbase premium), `funding_rate`, `liquidation`.
### TP/SL Configuration
- V5.1: SL=1.4×ATR, TP1=1.05×ATR, TP2=2.1×ATR
- V5.2: SL=2.1×ATR, TP1=1.4×ATR, TP2=3.15×ATR
- Signal cooldown: 10 minutes per symbol per direction.
### Risk Guard Circuit-Break Rules
| Rule | Threshold | Action |
|------|-----------|--------|
| Daily loss | -5R | Full close + shutdown |
| Consecutive losses | 5 | Pause 60 min |
| API disconnect | >30 s | Pause new orders |
| Balance too low | < risk×2 | Reject new orders |
| Data stale | >30 s | Block new orders |
| Hold timeout yellow | 45 min | Alert |
| Hold timeout auto-close | 70 min | Force close |
### Frontend Architecture
- **Next.js App Router** (`frontend/app/`): page-per-route, all pages are client components (`"use client"`).
- **Auth**: JWT stored in `localStorage`; `lib/auth.tsx` provides `useAuth()` hook + `authFetch()` helper with auto-refresh.
- **API client**: `lib/api.ts` — typed wrapper, distinguishes public (`/api/rates`, `/api/health`) from protected (all other) endpoints.
- **Polling strategy**: rates every 2 s, slow data (stats, history, signals) every 120 s; kline charts re-render every 30 s.
## Interfaces / Dependencies
- PG NOTIFY channel name: `new_signal`
- `live_config` table keys: `risk_per_trade_usd`, `max_positions`, `circuit_break` (inferred)
- `market_indicators` populated by `market_data_collector.py` with types: `long_short_ratio`, `top_trader_position`, `open_interest_hist`, `coinbase_premium`, `funding_rate`
## Unknowns & Risks
- [inference] PM2 config (`ecosystem.dev.config.js`) not read; exact restart/watch/env-file settings unknown.
- [inference] `signal_pusher.py` exact Discord webhook configuration (env var name, rate limit handling) not confirmed.
- [unknown] Cloud SQL write failure does not block signal_engine but may create data divergence between local PG and Cloud SQL.
- [risk] Hardcoded testnet credentials in source code (`arb_engine_2026`); production requires explicit env var override.
## Source Refs
- `backend/signal_engine.py:1-16` — architecture docstring
- `backend/live_executor.py:1-10` — executor architecture comment
- `backend/risk_guard.py:1-12, 55-73` — risk rules and config
- `backend/signal_engine.py:170-245``TradeWindow`, `ATRCalculator` classes
- `backend/signal_engine.py:44-67` — strategy config loading
- `backend/strategies/v51_baseline.json`, `backend/strategies/v52_8signals.json`
- `backend/main.py:61-83` — background snapshot loop
- `frontend/lib/api.ts:103-116` — API client methods
- `frontend/app/page.tsx:149-154` — polling intervals