arbitrage-engine/docs/ai/04-data-model.md
fanziqi 22787b3e0a docs: add AI documentation suite and comprehensive code review report
- Generate full AI-consumable docs (docs/ai/): system overview, architecture,
  module cheatsheet, API contracts, data model, build guide, decision log,
  glossary, and open questions (deep tier coverage)
- Add PROBLEM_REPORT.md: categorized bug/risk summary
- Add DETAILED_CODE_REVIEW.md: full line-by-line review of all 15 backend
  files, documenting 4 fatal issues, 5 critical deployment bugs, 4 security
  vulnerabilities, and 6 architecture defects with prioritized fix plan
2026-03-03 19:01:18 +08:00

11 KiB
Raw Permalink Blame History

generated_by version created last_updated source_commit coverage
repo-insight 1 2026-03-03 2026-03-03 0d9dffa standard

04 — Data Model

Purpose

Documents all PostgreSQL tables, columns, relations, constraints, storage design, and partitioning strategy.

TL;DR

  • Single PostgreSQL database arb_engine; 15+ tables defined in db.py SCHEMA_SQL + auth.py AUTH_SCHEMA.
  • agg_trades is a range-partitioned table (by time_ms in milliseconds); monthly partitions auto-created by ensure_partitions().
  • Dual-write: local PG is primary; Cloud SQL at 10.106.0.3 receives same writes via a secondary psycopg2 pool (non-fatal if down).
  • All timestamps: ts columns are Unix seconds (integer); time_ms columns are Unix milliseconds (bigint); created_at columns are PG TIMESTAMP.
  • JSONB used for score_factors in paper_trades/live_trades, detail in live_events, value in market_indicators.
  • Auth tokens stored in DB: refresh tokens in refresh_tokens table (revocable); no session table.

Canonical Facts

Tables

rate_snapshots — Funding Rate Snapshots

Populated every 2 s by background_snapshot_loop() in main.py.

Column Type Description
id BIGSERIAL PK
ts BIGINT NOT NULL Unix seconds
btc_rate DOUBLE PRECISION BTC funding rate (decimal)
eth_rate DOUBLE PRECISION ETH funding rate
btc_price DOUBLE PRECISION BTC mark price USD
eth_price DOUBLE PRECISION ETH mark price USD
btc_index_price DOUBLE PRECISION BTC index price
eth_index_price DOUBLE PRECISION ETH index price

Index: idx_rate_snapshots_ts on ts.


agg_trades — Aggregate Trades (Partitioned)

Partitioned by RANGE(time_ms); monthly child tables named agg_trades_YYYYMM.

Column Type Description
agg_id BIGINT NOT NULL Binance aggTrade ID
symbol TEXT NOT NULL e.g. BTCUSDT
price DOUBLE PRECISION Trade price
qty DOUBLE PRECISION Trade quantity (BTC/ETH/etc.)
time_ms BIGINT NOT NULL Trade timestamp ms
is_buyer_maker SMALLINT 0=taker buy, 1=taker sell

PK: (time_ms, symbol, agg_id). Indexes: idx_agg_trades_sym_time on (symbol, time_ms DESC), idx_agg_trades_sym_agg on (symbol, agg_id).

Partitions auto-created for current + next 2 months. Named agg_trades_YYYYMM.


agg_trades_meta — Collection State

Column Type Description
symbol TEXT PK e.g. BTCUSDT
last_agg_id BIGINT Last processed aggTrade ID
last_time_ms BIGINT Timestamp of last trade
earliest_agg_id BIGINT Oldest buffered ID
earliest_time_ms BIGINT Oldest buffered timestamp
updated_at TEXT Human-readable update time

signal_indicators — Signal Engine Output (15 s cadence)

Column Type Description
id BIGSERIAL PK
ts BIGINT Unix seconds
symbol TEXT
cvd_fast DOUBLE PRECISION CVD 30 m window
cvd_mid DOUBLE PRECISION CVD 4 h window
cvd_day DOUBLE PRECISION CVD UTC day
cvd_fast_slope DOUBLE PRECISION CVD momentum
atr_5m DOUBLE PRECISION ATR (5 m candles, 14 periods)
atr_percentile DOUBLE PRECISION ATR rank in 24 h history
vwap_30m DOUBLE PRECISION VWAP 30 m
price DOUBLE PRECISION Current mark price
p95_qty DOUBLE PRECISION P95 large-order threshold
p99_qty DOUBLE PRECISION P99 large-order threshold
buy_vol_1m DOUBLE PRECISION 1 m buy volume
sell_vol_1m DOUBLE PRECISION 1 m sell volume
score INTEGER Signal score 0100
signal TEXT LONG, SHORT, or null

Indexes: idx_si_ts, idx_si_sym_ts.


signal_indicators_1m — 1-Minute Signal Snapshot

Subset of signal_indicators columns; written at 1 m cadence for lightweight chart queries.


signal_trades — Signal Engine Trade Tracking

Column Type Description
id BIGSERIAL PK
ts_open BIGINT Open timestamp (Unix s)
ts_close BIGINT Close timestamp
symbol TEXT
direction TEXT LONG / SHORT
entry_price DOUBLE PRECISION
exit_price DOUBLE PRECISION
qty DOUBLE PRECISION
score INTEGER Signal score at entry
pnl DOUBLE PRECISION Realized PnL
sl_price DOUBLE PRECISION Stop-loss level
tp1_price DOUBLE PRECISION Take-profit 1 level
tp2_price DOUBLE PRECISION Take-profit 2 level
status TEXT DEFAULT open open, closed, stopped

paper_trades — Paper Trading Records

Column Type Description
id BIGSERIAL PK
symbol TEXT
direction TEXT LONG/SHORT
score INT Signal score
tier TEXT light/standard/heavy
entry_price DOUBLE PRECISION
entry_ts BIGINT Unix ms
exit_price DOUBLE PRECISION
exit_ts BIGINT
tp1_price DOUBLE PRECISION
tp2_price DOUBLE PRECISION
sl_price DOUBLE PRECISION
tp1_hit BOOLEAN DEFAULT FALSE
status TEXT DEFAULT active active, tp1, tp2, sl, timeout
pnl_r DOUBLE PRECISION PnL in R units
atr_at_entry DOUBLE PRECISION ATR snapshot at entry
score_factors JSONB Breakdown of signal score components
strategy VARCHAR(32) DEFAULT v51_baseline Strategy name
created_at TIMESTAMP

live_trades — Live Trading Records

Column Type Description
id BIGSERIAL PK
symbol TEXT
strategy TEXT
direction TEXT LONG/SHORT
status TEXT DEFAULT active
entry_price / exit_price DOUBLE PRECISION
entry_ts / exit_ts BIGINT Unix ms
sl_price, tp1_price, tp2_price DOUBLE PRECISION
tp1_hit BOOLEAN
score DOUBLE PRECISION
tier TEXT
pnl_r DOUBLE PRECISION
fee_usdt DOUBLE PRECISION Exchange fees
funding_fee_usdt DOUBLE PRECISION Funding fees paid while holding
risk_distance DOUBLE PRECISION Entry to SL distance
atr_at_entry DOUBLE PRECISION
score_factors JSONB
signal_id BIGINT FK → signal_indicators.id
binance_order_id TEXT Binance order ID
fill_price DOUBLE PRECISION Actual fill price
slippage_bps DOUBLE PRECISION Slippage in basis points
protection_gap_ms BIGINT Time between SL order and fill
signal_to_order_ms BIGINT Latency: signal → order placed
order_to_fill_ms BIGINT Latency: order → fill
qty DOUBLE PRECISION
created_at TIMESTAMP

live_config — Runtime Configuration KV Store

Column Type Description
key TEXT PK Config key
value TEXT Config value (string)
label TEXT Human label
updated_at TIMESTAMP

Known keys: risk_per_trade_usd, max_positions, circuit_break (inferred).


live_events — Trade Event Log

Column Type Description
id BIGSERIAL PK
ts BIGINT Unix ms (default: NOW())
level TEXT info/warning/error
category TEXT Event category
symbol TEXT
message TEXT
detail JSONB Structured event data

signal_logs — Legacy Signal Log

Kept for backwards compatibility with the original funding-rate signal system.

Column Type
id BIGSERIAL PK
symbol TEXT
rate DOUBLE PRECISION
annualized DOUBLE PRECISION
sent_at TEXT
message TEXT

Auth Tables (defined in auth.py AUTH_SCHEMA)

users

Column Type
id BIGSERIAL PK
email TEXT UNIQUE NOT NULL
password_hash TEXT NOT NULL
discord_id TEXT
role TEXT DEFAULT user
banned INTEGER DEFAULT 0
created_at TEXT

subscriptions

Column Type
user_id BIGINT PK → users
tier TEXT DEFAULT free
expires_at TEXT

invite_codes

Column Type
id BIGSERIAL PK
code TEXT UNIQUE
created_by INTEGER
max_uses INTEGER DEFAULT 1
used_count INTEGER DEFAULT 0
status TEXT DEFAULT active
expires_at TEXT

invite_usage

Column Type
id BIGSERIAL PK
code_id BIGINT → invite_codes
user_id BIGINT → users
used_at TEXT

refresh_tokens

Column Type
id BIGSERIAL PK
user_id BIGINT → users
token TEXT UNIQUE
expires_at TEXT
revoked INTEGER DEFAULT 0

market_indicators — Market Indicator JSONB Store

Populated by market_data_collector.py.

Column Type Description
symbol TEXT
indicator_type TEXT long_short_ratio, top_trader_position, open_interest_hist, coinbase_premium, funding_rate
timestamp_ms BIGINT
value JSONB Raw indicator payload

Query pattern: WHERE symbol=? AND indicator_type=? ORDER BY timestamp_ms DESC LIMIT 1.

Storage Design Decisions

  • Partitioning: agg_trades partitioned by month to avoid table bloat; partition maintenance is automated.
  • Dual-write: Cloud SQL secondary is best-effort (errors logged, never fatal).
  • JSONB score_factors: allows schema-free storage of per-strategy signal breakdowns without migrations.
  • Timestamps: mix of Unix seconds (ts), Unix ms (time_ms, timestamp_ms, entry_ts), ISO strings (created_at TEXT in auth tables), and PG TIMESTAMP; be careful when querying across tables.

Interfaces / Dependencies

  • db.py:init_schema() — creates all tables in SCHEMA_SQL
  • auth.py:ensure_tables() — creates auth tables from AUTH_SCHEMA
  • db.py:ensure_partitions() — auto-creates monthly agg_trades_YYYYMM partitions

Unknowns & Risks

  • [unknown] market_indicators table schema not in SCHEMA_SQL; likely created by market_data_collector.py separately — verify before querying.
  • [risk] Timestamp inconsistency: some tables use TEXT for timestamps (auth tables), others use BIGINT, others use PG TIMESTAMP — cross-table JOINs on time fields require explicit casting.
  • [inference] live_config circuit-break key name not confirmed from source; inferred from risk_guard.py behavior.
  • [risk] users table defined in both SCHEMA_SQL (db.py) and AUTH_SCHEMA (auth.py); duplicate CREATE TABLE IF NOT EXISTS; actual schema diverges between the two definitions (db.py version lacks discord_id, banned).

Source Refs

  • backend/db.py:166-356SCHEMA_SQL with all table definitions
  • backend/auth.py:28-71AUTH_SCHEMA auth tables
  • backend/db.py:360-414ensure_partitions(), init_schema()
  • backend/signal_engine.py:123-158market_indicators query pattern