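# FULL AUDIT V3 (review/full-audit-v3)

Review date: 2026-03-02
Branch: `review/full-audit-v3`
Scope: backend + frontend (full coverage of your checklist)

## Review Notes

- The previous report you mentioned, `docs/LIVE_TRADING_REVIEW.md`, was not found on the current branch, so this round is a full re-review of the code as it stands, checked where possible against the "already fixed" items you described.
- Several high-priority fixes from the previous round are confirmed to still be in place (for example: admin checks on the `/api/live/*` emergency endpoints, `reduceOnly` emergency close, and not advancing state when re-placing the SL after TP1 fails).
- Only issues that **still exist or were newly found in this round** are listed below.

---

## P0 (Fund Safety / Core Correctness)
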
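### P0-1 Live trading still opens positions when the risk-guard process is unreachable (fail-open)

- File:line: `backend/live_executor.py:321-334`
- Description: When `execute_entry()` fails to read `/tmp/risk_guard_state.json`, the current logic either does `pass` or "warns and continues trading". When `risk_guard` has crashed, has not started, or its state file is corrupted, the system therefore keeps opening new positions **with no risk-control linkage**.
- Risk assessment: High probability of creating a window of uncontrolled position opening; a direct risk to funds.
- Suggested fix:
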
```python
# backend/live_executor.py
# Requires: import json, os, time
state_path = "/tmp/risk_guard_state.json"
try:
    st = os.stat(state_path)
    # Treat risk_guard as unreachable if the file was not updated within 15 seconds
    if time.time() - st.st_mtime > 15:
        logger.error(f"[{symbol}] Risk state file is stale; refusing to open position")
        return None

    with open(state_path) as f:
        risk_state = json.load(f)

    if risk_state.get("block_new_entries") or risk_state.get("reduce_only"):
        logger.warning(f"[{symbol}] Entry blocked by risk guard: {risk_state.get('circuit_break_reason', 'N/A')}")
        return None

except FileNotFoundError:
    logger.error(f"[{symbol}] Risk state file does not exist; refusing to open position")
    return None
except Exception as e:
    logger.error(f"[{symbol}] Failed to read risk state ({e}); refusing to open position")
    return None
```
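
### P0-2 The 1R baseline is inconsistent across the execution, reconciliation, and risk-control layers, distorting the risk budget

- File:line:
  - `backend/live_executor.py:67-80` (dynamically refreshes `RISK_PER_TRADE_USD` from `live_config`)
  - `backend/position_sync.py:54,469-483` (uses a fixed environment-variable value for fee/funding conversion)
  - `backend/risk_guard.py:61,232,551` (uses a fixed environment-variable value for R conversion and daily-limit checks)
- Description: The live config page can change `risk_per_trade_usd`, but `position_sync` and `risk_guard` still compute with the environment-variable value read at startup. As a result, the same trade has different `R` values in different modules.
- Risk assessment: When the actual 1R is smaller than a module's stale value, that module **underestimates the loss in R**, which can delay the circuit breaker; a fund-safety risk.
- Suggested fix:
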
```python
# Shared by backend/position_sync.py and backend/risk_guard.py

def load_live_risk_usd(conn, default=2.0):
    try:
        cur = conn.cursor()
        cur.execute("SELECT value FROM live_config WHERE key='risk_per_trade_usd'")
        row = cur.fetchone()
        return float(row[0]) if row and row[0] else default
    except Exception:
        return default

# Refresh at the start of every main-loop iteration
risk_usd = load_live_risk_usd(conn)

# Use risk_usd everywhere in place of the static RISK_PER_TRADE_USD
fee_r = actual_fee_usdt / risk_usd if risk_usd > 0 else 0
unrealized_r = total_unrealized / risk_usd if risk_usd > 0 else 0
threshold = risk_usd * MIN_BALANCE_MULTIPLE
```
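
To make the underestimation concrete, here is a small arithmetic sketch with made-up numbers: a $6 realized loss, a stale 1R of $5 still held by a module, and a configured 1R of $2. The helper name `loss_in_r` and the 3R daily limit are illustrative assumptions, not values taken from the codebase.

```python
def loss_in_r(loss_usd: float, risk_usd: float) -> float:
    """Convert a USD loss into R multiples for a given 1R size."""
    if risk_usd <= 0:
        raise ValueError("risk_usd must be positive")
    return loss_usd / risk_usd


loss_usd = 6.0
stale_risk_usd = 5.0    # hypothetical: env value captured at process start
actual_risk_usd = 2.0   # hypothetical: value currently set in live_config
daily_limit_r = 3.0     # hypothetical daily loss limit, in R

stale_r = loss_in_r(loss_usd, stale_risk_usd)    # what the stale module sees
actual_r = loss_in_r(loss_usd, actual_risk_usd)  # what the executor sized for

stale_trips = stale_r >= daily_limit_r
actual_trips = actual_r >= daily_limit_r

print(f"stale view:  {stale_r:.2f}R, breaker trips: {stale_trips}")
print(f"actual view: {actual_r:.2f}R, breaker trips: {actual_trips}")
```

With these numbers the stale module reads the loss as well under the limit while the true figure has already reached it, which is the delayed-breaker scenario described above.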
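
### P0-3 Circuit-breaker `close_all` does not verify close results, so the system may be "tripped but not flat"

- File:line: `backend/risk_guard.py:343-357`
- Description: `trigger_circuit_break(action="close_all")` does not check the return value of `POST /fapi/v1/order`. The log records "emergency close" directly, with no retry on failure and no follow-up position check.
- Risk assessment: Under network jitter, rate limiting, precision errors, and similar conditions, positions can be left open and exposed; a direct risk to funds.
- Suggested fix:
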
```python
# backend/risk_guard.py
close_resp, close_status = await binance_request(session, "POST", "/fapi/v1/order", {
    "symbol": symbol,
    "side": close_side,
    "type": "MARKET",
    "quantity": qty_str,
    "reduceOnly": "true",
})

if close_status != 200:
    logger.error(f"[{symbol}] Emergency close failed: {close_resp}")
    _log_event(conn, "critical", "risk", f"Emergency close failed {symbol}", symbol,
               {"response": str(close_resp)})
    continue

# Re-check the position after the close order
verify, v_status = await binance_request(session, "GET", "/fapi/v2/positionRisk", {"symbol": symbol})
if v_status == 200 and any(abs(float(p.get("positionAmt", 0))) > 0 and p.get("symbol") == symbol for p in verify):
    logger.error(f"[{symbol}] Position still open after emergency close; escalating to manual alert")
```
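
The snippet above detects a failed close but does not retry. One possible shape for a bounded retry is sketched below: re-read the position, send a `reduceOnly` market order for whatever remains, and report success only once the position reads zero. The `request` callable, `close_until_flat`, and `FakeExchange` are hypothetical stand-ins (the real `binance_request` also takes a session), and quantity formatting is deliberately simplified.

```python
import asyncio
from typing import Any, Awaitable, Callable, Tuple

# Hypothetical signature: (method, path, params) -> (json_body, http_status)
Request = Callable[[str, str, dict], Awaitable[Tuple[Any, int]]]


async def close_until_flat(request: Request, symbol: str,
                           max_attempts: int = 3, delay_sec: float = 0.5) -> bool:
    """Return True only after the exchange reports a zero position."""
    for attempt in range(max_attempts + 1):
        positions, status = await request("GET", "/fapi/v2/positionRisk", {"symbol": symbol})
        if status == 200:
            amt = sum(float(p.get("positionAmt", 0))
                      for p in positions if p.get("symbol") == symbol)
            if amt == 0:
                return True
            if attempt < max_attempts:
                # Real code must format the quantity to the symbol's step size.
                await request("POST", "/fapi/v1/order", {
                    "symbol": symbol,
                    "side": "SELL" if amt > 0 else "BUY",
                    "type": "MARKET",
                    "quantity": str(abs(amt)),
                    "reduceOnly": "true",
                })
        if attempt < max_attempts:
            await asyncio.sleep(delay_sec)
    return False  # caller should raise a critical alert for manual handling


class FakeExchange:
    """In-memory stand-in whose first `fail_first` close orders are rejected."""

    def __init__(self, amt: float, fail_first: int):
        self.amt, self.fail_first, self.posts = amt, fail_first, 0

    async def request(self, method: str, path: str, params: dict):
        if method == "GET":
            return [{"symbol": params["symbol"], "positionAmt": str(self.amt)}], 200
        self.posts += 1
        if self.posts <= self.fail_first:
            return {"msg": "simulated rejection"}, 429
        self.amt = 0.0
        return {"status": "FILLED"}, 200


flaky = FakeExchange(amt=0.5, fail_first=1)
recovered = asyncio.run(close_until_flat(flaky.request, "BTCUSDT", delay_sec=0))
dead = FakeExchange(amt=0.5, fail_first=99)
gave_up = asyncio.run(close_until_flat(dead.request, "BTCUSDT", delay_sec=0))
print(recovered, flaky.posts, gave_up, dead.posts)
```

Sizing each retry from the freshly read position avoids resubmitting a stale quantity after a partial fill; a `False` result is the signal to escalate rather than to log "closed".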
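
### P0-4 V5 auxiliary layer: Coinbase Premium units are inconsistent, so the score is systematically inflated

- File:line:
  - `backend/market_data_collector.py:126` (`premium_pct = ... * 100`, stored as a "percent value")
  - `backend/signal_engine.py:576-579` (compares against a `0.0005` threshold, whose semantics are a "decimal ratio")
- Description: The stored value is a percentage (e.g. `0.05` means 0.05%), but the scoring threshold treats it as a ratio (0.0005 = 0.05%), so most tiny premiums are classified as strong auxiliary signals.
- Risk assessment: The signal scoring system is distorted (the "scoring correctness" you asked to be checked closely), which significantly changes entry frequency and direction.
- Suggested fix:
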
```python
# backend/market_data_collector.py
premium_ratio = (coinbase_price - binance_price) / binance_price
payload = {
    "premium_ratio": premium_ratio,        # 0.0005 = 0.05%
    "premium_pct": premium_ratio * 100.0,  # display only
    # ... other fields unchanged
}

# backend/signal_engine.py
coinbase_premium = self.market_indicators.get("coinbase_premium")
premium_ratio = (
    to_float(coinbase_premium.get("premium_ratio"))
    if isinstance(coinbase_premium, dict) else None
)

if premium_ratio is None:
    aux_score = 2
elif (direction == "LONG" and premium_ratio > 0.0005) or (direction == "SHORT" and premium_ratio < -0.0005):
    aux_score = 5
elif abs(premium_ratio) <= 0.0005:
    aux_score = 2
else:
    aux_score = 0
```
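
The effect of the unit mismatch can be reproduced in isolation. The sketch below wraps the scoring branches from the fix in a standalone function (the name `aux_score` as a function and the sample premium are illustrative assumptions) and feeds it the same premium once as a percent value and once as a ratio.

```python
from typing import Optional

THRESHOLD_RATIO = 0.0005  # 0.05% expressed as a decimal ratio


def aux_score(direction: str, premium_ratio: Optional[float]) -> int:
    """Auxiliary score; expects the premium as a decimal ratio."""
    if premium_ratio is None:
        return 2
    if (direction == "LONG" and premium_ratio > THRESHOLD_RATIO) or \
       (direction == "SHORT" and premium_ratio < -THRESHOLD_RATIO):
        return 5
    if abs(premium_ratio) <= THRESHOLD_RATIO:
        return 2
    return 0


true_ratio = 0.00002            # hypothetical premium of 0.002%
stored_pct = true_ratio * 100   # what the collector currently stores

score_with_pct = aux_score("LONG", stored_pct)     # unit bug: percent fed as ratio
score_with_ratio = aux_score("LONG", true_ratio)   # intended input

print(score_with_pct, score_with_ratio)
```

A premium 25 times smaller than the intended threshold still lands in the top bucket when the percent value is passed through, which matches the systematic inflation described in this item.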
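
---

## P1 (Logic Correctness / Concurrency and Robustness)

### P1-1 No self-healing after the LISTEN connection drops; the signal pipeline can be interrupted permanently

- File:line: `backend/live_executor.py:636-664, 644-655`
- Description: Only `work_conn` has `ensure_db_conn()`; `listen_conn` has no reconnect logic. After `listen_conn.poll()` raises, control enters the outer exception branch, and the next iteration keeps using the broken connection.
- Risk assessment: Signal listening fails and the entry pipeline is interrupted.
- Suggested fix:
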
```python
def ensure_listen_conn(conn):
    try:
        conn.poll()
        return conn
    except Exception:
        try:
            conn.close()
        except Exception:
            pass
        new_conn = get_db_connection()
        # Assumes psycopg2: LISTEN only takes effect once committed,
        # so the listening connection should run in autocommit mode.
        new_conn.autocommit = True
        cur = new_conn.cursor()
        cur.execute("LISTEN new_signal;")
        logger.warning("LISTEN connection re-established")
        return new_conn

# In the main loop
listen_conn = ensure_listen_conn(listen_conn)
```
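
### P1-2 Cold start loads only 4 hours of history, but the strategy depends on 24-hour window statistics

- File:line: `backend/signal_engine.py:110-113, 971`
- Description: `win_day` feeds `p95/p99` and the large-order crowding check, but `load_historical(state, WINDOW_MID)` backfills only 4h, so statistics are distorted for the first 20h after a restart.
- Risk assessment: Signal quality is unstable after a restart and scores drift.
- Suggested fix:
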
```python
# backend/signal_engine.py
for sym, state in states.items():
    # Load at least the DAY window so p95/p99 and day-cvd are valid
    load_historical(state, WINDOW_DAY)
```
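
### P1-3 Funding-rate income is not included in net PnL, skewing net-value statistics

- File:line: `backend/position_sync.py:480-483`
- Description: `funding_r` is currently deducted only when `funding_usdt < 0`; positive funding income is ignored.
- Risk assessment: `net_pnl_r` is underestimated, which affects the consistency of performance reporting and risk-threshold checks.
- Suggested fix:
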
```python
risk_usd = load_live_risk_usd(conn)
funding_r = (funding_usdt / risk_usd) if risk_usd > 0 else 0
# Positive funding should increase net PnL; negative funding should reduce it
pnl_r = gross_pnl_r - fee_r + funding_r
```
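
As a quick sanity check of the signed formula, the sketch below evaluates it for funding received and funding paid. The function name `net_pnl_r` and all numbers are illustrative assumptions; the sign convention (positive `funding_usdt` means funding received) follows the description above.

```python
def net_pnl_r(gross_pnl_r: float, fee_usdt: float,
              funding_usdt: float, risk_usd: float) -> float:
    """Net PnL in R: gross minus fees plus signed funding."""
    if risk_usd <= 0:
        raise ValueError("risk_usd must be positive")
    fee_r = fee_usdt / risk_usd
    funding_r = funding_usdt / risk_usd  # > 0 received, < 0 paid
    return gross_pnl_r - fee_r + funding_r


# Hypothetical trade: +1.0R gross, 0.2 USDT fees, 1R = 2 USDT
received = net_pnl_r(1.0, 0.2, 0.4, 2.0)   # 1.0 - 0.1 + 0.2
paid = net_pnl_r(1.0, 0.2, -0.4, 2.0)      # 1.0 - 0.1 - 0.2

print(f"funding received: {received:.2f}R, funding paid: {paid:.2f}R")
```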
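
### P1-4 The "data freshness" risk rule is not actually implemented

- File:line: `backend/risk_guard.py:70-71, 98-99, 248-257`
- Description: `MARKET_DATA_STALE_SEC` / `ACCOUNT_UPDATE_STALE_SEC` and `last_market_data` / `last_account_update` are defined, but they are never updated and never feed into the circuit-breaker decision.
- Risk assessment: The intended protection cannot trigger when market or account data is stale.
- Suggested fix:
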
```python
def check_data_freshness(conn):
    now = time.time()
    issues = []

    # API health
    api_gap = now - risk_state.last_api_success
    if api_gap > API_DISCONNECT_THRESHOLD_SEC:
        issues.append(f"API unresponsive for {api_gap:.0f}s")

    # Market data freshness (based on the latest ts in signal_indicators)
    cur = conn.cursor()
    cur.execute("SELECT MAX(ts) FROM signal_indicators")
    row = cur.fetchone()
    if row and row[0]:
        market_age = now - (row[0] / 1000)
        if market_age > MARKET_DATA_STALE_SEC:
            issues.append(f"Market data delayed by {market_age:.1f}s")

    return issues
```
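
### P1-5 Refresh token rotation is not atomic, leaving a concurrent replay window

- File:line: `backend/auth.py:313-325`
- Description: The code runs a `SELECT` first and then `UPDATE revoked=1`. Concurrent requests can both pass validation, so the same refresh token can be exchanged for new tokens more than once.
- Risk assessment: The session security boundary is weakened.
- Suggested fix:
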
```python
@router.post("/auth/refresh")
def refresh_token(body: RefreshReq):
    now_iso = datetime.utcnow().isoformat()
    row = _fetchone(
        "UPDATE refresh_tokens "
        "SET revoked = 1 "
        "WHERE token = %s AND revoked = 0 AND expires_at > %s "
        "RETURNING user_id",
        (body.refresh_token, now_iso),
    )
    if not row:
        raise HTTPException(status_code=401, detail="invalid refresh token")

    user = _fetchone("SELECT * FROM users WHERE id = %s", (row["user_id"],))
    ...
```
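
The key property of the fix is that the validity check and the revocation happen in one statement, so only one caller can ever "win" a given token. The sketch below demonstrates that with an in-memory `sqlite3` database as a stand-in for PostgreSQL; the table shape, integer timestamps, and the helper name `consume_refresh_token` are assumptions for illustration, and it checks `rowcount` rather than using `RETURNING`.

```python
import sqlite3


def consume_refresh_token(conn: sqlite3.Connection, token: str, now: int) -> bool:
    """Atomically revoke a valid token; True only for the caller that revoked it."""
    cur = conn.execute(
        "UPDATE refresh_tokens SET revoked = 1 "
        "WHERE token = ? AND revoked = 0 AND expires_at > ?",
        (token, now),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE refresh_tokens ("
    "token TEXT PRIMARY KEY, user_id INTEGER, "
    "revoked INTEGER DEFAULT 0, expires_at INTEGER)"
)
conn.execute("INSERT INTO refresh_tokens VALUES ('t1', 7, 0, 2000)")
conn.execute("INSERT INTO refresh_tokens VALUES ('old', 7, 0, 500)")
conn.commit()

first_use = consume_refresh_token(conn, "t1", now=1000)    # valid, gets revoked
replay = consume_refresh_token(conn, "t1", now=1000)       # already revoked
expired = consume_refresh_token(conn, "old", now=1000)     # past expires_at

print(first_use, replay, expired)
```

A replayed or expired token matches zero rows, so the second request has nothing to exchange regardless of how the two requests interleave.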
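
### P1-6 `init_schema()` does not include the live table definitions; deployment depends on implicit external DDL

- File:line: `backend/db.py:166-306, 345-363`
- Description: `SCHEMA_SQL` has no `live_trades` / `live_config` / `live_events`, yet the runtime depends on these tables extensively.
- Risk assessment: Startup fails outright in a new environment or during disaster recovery.
- Suggested fix:
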
```sql
-- Append to SCHEMA_SQL in backend/db.py
CREATE TABLE IF NOT EXISTS live_config (
    key TEXT PRIMARY KEY,
    value TEXT NOT NULL,
    label TEXT,
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS live_events (
    id BIGSERIAL PRIMARY KEY,
    ts BIGINT DEFAULT (EXTRACT(EPOCH FROM NOW()) * 1000)::BIGINT,
    level TEXT,
    category TEXT,
    symbol TEXT,
    message TEXT,
    detail JSONB
);

CREATE TABLE IF NOT EXISTS live_trades (
    id BIGSERIAL PRIMARY KEY,
    signal_id BIGINT,  -- referenced by the fetch_pending_signals() query in P3-2
    symbol TEXT NOT NULL,
    strategy TEXT NOT NULL,
    direction TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'active',
    entry_price DOUBLE PRECISION,
    exit_price DOUBLE PRECISION,
    entry_ts BIGINT,
    exit_ts BIGINT,
    pnl_r DOUBLE PRECISION,
    risk_distance DOUBLE PRECISION,
    tp1_hit BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP DEFAULT NOW()
);
```
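
---

## P2 (State Consistency / Frontend-Backend Consistency / Performance)

### P2-1 `_get_risk_usd()` has no cache and is called repeatedly inside loops

- File:line: `backend/main.py:1283, 1356, 1476-1482`
- Description: The function comment says "cached for 60 seconds", but every call actually queries the DB, and it is called repeatedly inside list loops.
- Risk assessment: Unnecessary DB load amplification, plus the risk of inconsistent values within a single response.
- Suggested fix:
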
```python
# Requires: import time
_risk_cache = {"v": 2.0, "ts": 0.0}

async def _get_risk_usd() -> float:
    now = time.time()
    if now - _risk_cache["ts"] < 60:
        return _risk_cache["v"]
    try:
        row = await async_fetchrow("SELECT value FROM live_config WHERE key = $1", "risk_per_trade_usd")
        v = float(row["value"]) if row else 2.0
    except Exception:
        v = 2.0
    _risk_cache.update({"v": v, "ts": now})
    return v

# On the calling side, fetch once per endpoint
risk_usd = await _get_risk_usd()
```
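
### P2-2 The live page header hard-codes "testnet", which may not match the backend environment

- File:line: `frontend/app/live/page.tsx:740`
- Description: The page always displays "testnet" and does not change even when the backend switches to production.
- Risk assessment: Operators misperceive the environment and can easily misjudge operational risk.
- Suggested fix:
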
```tsx
const [tradeEnv, setTradeEnv] = useState("unknown");
useEffect(() => {
  (async () => {
    const r = await authFetch("/api/live/config");
    if (r.ok) {
      const cfg = await r.json();
      setTradeEnv(cfg?.trade_env?.value || "unknown");
    }
  })();
}, []);

{/* Do not fall back to "Testnet" while the environment is still unknown */}
<p className="text-[10px] text-slate-500">
  V5.2 strategy · Binance USDT perpetual futures · {tradeEnv === "production" ? "Production" : tradeEnv === "unknown" ? "Environment unknown" : "Testnet"}
</p>
```
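
### P2-3 The paper-trading frontend still hard-codes `1R=$200`, inconsistent with the config page

- File:line:
  - `frontend/app/paper/page.tsx:234`
  - `frontend/app/paper-v52/page.tsx:297`
- Description: Unrealized PnL in USDT uses a fixed `unrealR * 200` conversion, which is inconsistent with `initial_balance*risk_per_trade` from `paper_config`.
- Risk assessment: The frontend display is misleading and skews strategy evaluation.
- Suggested fix:
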
```tsx
const [paper1R, setPaper1R] = useState(200);
useEffect(() => {
  (async () => {
    const r = await authFetch("/api/paper/config");
    if (r.ok) {
      const cfg = await r.json();
      setPaper1R((cfg.initial_balance || 10000) * (cfg.risk_per_trade || 0.02));
    }
  })();
}, []);

const unrealUsdt = unrealR * paper1R;
```

### P2-4 XRP/SOL raise an exception on every round of Coinbase Premium collection

- File:line: `backend/market_data_collector.py:112-117, 160-165`
- Description: `collect_symbol()` calls `collect_coinbase_premium()` for all 4 symbols, but `pair_map` contains only BTC/ETH, so XRP/SOL raise `KeyError`.
- Risk assessment: Log noise, polluted error monitoring, and pointless exception retries.
- Suggested fix:

```python
pair_map = {
    "BTCUSDT": "BTC-USD",
    "ETHUSDT": "ETH-USD",
    "XRPUSDT": "XRP-USD",
    "SOLUSDT": "SOL-USD",
}
coinbase_pair = pair_map.get(symbol)
if not coinbase_pair:
    logger.info("[%s] coinbase_premium skipped (no mapping)", symbol)
    return
```

---

## P3 (Code Quality / Security Practices)

### P3-1 Default database password retained in multiple places

- File:line:
  - `backend/db.py:19, 28`
  - `backend/market_data_collector.py:19`
- Description: The default password `arb_engine_2026` is still used as a fallback.
- Risk assessment: Credential management does not meet the production security baseline.
- Suggested fix:

```python
PG_PASS = os.getenv("PG_PASS")
if not PG_PASS:
    raise RuntimeError("PG_PASS is required")

CLOUD_PG_PASS = os.getenv("CLOUD_PG_PASS")
if CLOUD_PG_ENABLED and not CLOUD_PG_PASS:
    raise RuntimeError("CLOUD_PG_PASS is required when CLOUD_PG_ENABLED=true")
```
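
### P3-2 `fetch_pending_signals()` builds SQL by string concatenation

- File:line: `backend/live_executor.py:528-542`
- Description: `strategies_str` is injected into the SQL via an f-string; both maintainability and security are poor.
- Risk assessment: A strategy name containing special characters causes a SQL error, and the code does not follow parameterization conventions.
- Suggested fix:
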
```python
cur.execute(
    """
    SELECT si.id, si.symbol, si.signal, si.score, si.ts, si.factors, si.strategy, si.price
    FROM signal_indicators si
    WHERE si.signal IS NOT NULL
      AND si.signal != ''
      AND si.strategy = ANY(%s)
      AND si.ts > extract(epoch from now()) * 1000 - 60000
      AND NOT EXISTS (
        SELECT 1 FROM live_trades lt
        WHERE lt.signal_id = si.id AND lt.strategy = si.strategy
      )
    ORDER BY si.ts DESC
    """,
    # psycopg2 adapts a list to a SQL ARRAY, which ANY() needs; a tuple would not work
    (list(ENABLED_STRATEGIES),)
)
```
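
### P3-3 Reconciliation logs contain garbled text (encoding corruption)

- File:line: `backend/position_sync.py:447-448`
- Description: Comments and log messages contain mojibake (`ä¸...`), which hurts readability during troubleshooting.
- Risk assessment: Low risk, but it slows down diagnosis of urgent problems.
- Suggested fix:
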
```python
# fallback: no definite closing fill found; defer settlement for this round
logger.warning(f"[{symbol}] No definite closing fill found; deferring settlement")
```
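
The `ä¸` prefix is what UTF-8 encoded Chinese text commonly looks like after being decoded as Latin-1. Assuming that is what happened here (the actual cause in `position_sync.py` has not been confirmed), the round trip can be reproduced and reversed as follows; the helper names and the sample string are illustrative only.

```python
def to_mojibake(text: str) -> str:
    """Simulate the corruption: UTF-8 bytes wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled: str) -> str:
    """Reverse it; raises UnicodeError if the text was not corrupted this way."""
    return garbled.encode("latin-1").decode("utf-8")


original = "不明确"                 # sample text; "不" is E4 B8 8D in UTF-8
garbled = to_mojibake(original)

# U+00E4 U+00B8 is the "ä¸" prefix quoted in the description above
starts_like_report = garbled.startswith("\u00e4\u00b8")
restored = repair_mojibake(garbled)

print(repr(garbled), restored)
```

If the round trip restores readable text in the real file, the source was most likely saved or rewritten through a Latin-1 path at some point, and re-saving it as UTF-8 prevents the problem from recurring.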

### P3-4 `ensure_partitions()` is called repeatedly on the high-frequency flush path

- File:line: `backend/agg_trades_collector.py:77`
- Description: Every `flush_buffer()` call runs the partition DDL check.
- Risk assessment: Unnecessary DB load that affects throughput stability.
- Suggested fix:

```python
# Maintain partitions at startup and in a periodic task, not on the flush hot path
# Remove the ensure_partitions() call from flush_buffer()

async def ensure_partitions_loop():
    while True:
        try:
            ensure_partitions()
        except Exception as e:
            logger.warning(f"ensure_partitions_loop error: {e}")
        await asyncio.sleep(3600)
```
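
---

## Is It Ready for Live Trading?

**Conclusion: going live with the current code is not recommended.**

### Blocking items that must be fixed first

1. `P0-1` Risk-guard unreachable, fail-open (`live_executor` can still open positions).
2. `P0-2` 1R baseline inconsistent across modules (skews risk thresholds).
3. `P0-3` `close_all` does not verify close results (possible "tripped but not flat").
4. `P0-4` Unit error in the scoring formula (the Coinbase Premium layer is systematically inflated).
5. `P1-1` No reconnect on the LISTEN path (can interrupt signal execution).
6. `P1-6` Missing initialization DDL for the live tables (a new environment cannot be brought up).

Once the blocking items above are fixed and regression-tested, run one more round of:

- End-to-end circuit-breaker drill (including rate-limit and timeout injection)
- Restart-recovery drill (signal_engine + live_executor + position_sync)
- Scoring-consistency replay (V5.1/V5.2 samples)

After these pass, proceed to a gradual live rollout.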