White paper · public edition · v0.1 draft · 2026

Orchestrated Hierarchical Multi-Agent System (HMAS)

A supervisor-worker agent swarm for systematic investing, refereed by a deterministic statistical gauntlet it cannot game.

Draft in progress. This is the skeleton of the public edition; sections will be filled in over coming revisions. A deeper, data-rich edition is available on request via the private area. The crown-jewel parameters (live thresholds, capital, keys) are never published.

Abstract

Most attempts to automate investment research fail in one of two ways: they trust a single model's judgment (which fabricates), or they optimize a single metric like the Sharpe ratio (which overfits). This system does neither. It is an orchestrated hierarchical multi-agent system (HMAS) in which a coordinator delegates strategy-discovery work to tiers of reasoning agents — cheap local models for bulk exploration, escalating to top-tier models only for decisions that change state — and every candidate must survive a deterministic evaluation gauntlet: a conjunction of independent statistical gates (permutation test, deflated Sharpe ratio, bootstrap robustness, forward-degradation) rather than any one number. The agents propose; an incorruptible referee disposes. This document describes the architecture, the mathematics of the gauntlet, the regime and risk layers, and the engineering that runs it autonomously.

I · Thesis & first principles

1.1 The question that set the direction: systematic investing is the proven edge; trading is R&D.
1.2 Three problems that force a multi-agent design — search space too large for one mind; the cost of always-on top-tier thinking; and the honesty problem (models fabricate at the reasoning layer).
1.3 Design invariants: an evidence bar; a conjunction gate, not a Sharpe ratio; trust-via-harness; spend proportional to decision value; the same evidence bar across risk mandates.
1.4 What the system is not: not a black box, not a single-metric optimizer, not auto-trading real money.

II · The HMAS architecture

2.1 Topology: three layered hierarchies — cognitive/compute, research-control, and evidentiary — and how they interlock.
2.2 The chain of command: local infantry → cloud NCOs → a top-tier "general," each rank told what to do by the rank above. Cost distribution.
2.3 The control plane: one routing endpoint, budgets, and bring-your-own-key economics.
2.4 Orchestration: coordinator + executor, an idea tree, git-worktree isolation, held-out merge discipline.
2.5 Trust-via-harness: sandbox + deterministic honesty gate + top-tier review. Why the referee must be code, not a model.
2.6 The local model farm: hardware inventory, role assignment, and the empirical "crucible" that decides which model to trust at which task.
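
As an illustration of the chain of command in 2.2 and the "spend proportional to decision value" invariant, here is a minimal routing sketch. The names `Tier`, `Task`, `route`, and the `max_local_attempts` cutoff are hypothetical and do not describe the system's actual control plane; the only rule taken from the text is that state-changing decisions go to the top tier.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOCAL = "local"   # cheap local models: bulk exploration
    CLOUD = "cloud"   # mid-tier cloud models
    TOP = "top"       # top-tier model: decisions that change state


@dataclass(frozen=True)
class Task:
    description: str
    changes_state: bool             # e.g. a merge into the held-out branch
    local_attempts_failed: int = 0  # assumed escalation signal (hypothetical)


def route(task: Task, max_local_attempts: int = 2) -> Tier:
    """Pick the cheapest tier that is allowed to handle the task."""
    if task.changes_state:
        return Tier.TOP
    # Assumption: repeated local failures justify paying for a cloud model.
    if task.local_attempts_failed >= max_local_attempts:
        return Tier.CLOUD
    return Tier.LOCAL
```

Under these assumptions, exploratory work stays on local models by default, and cost rises only when a task either fails repeatedly or is about to change state.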

III · The research engine

3.1 Where strategy ideas come from.
3.2 The executor loop: propose → isolate → backtest → evaluate → merge or prune.
3.3 The world-event overlay: free macro/geopolitical signals injected at the "observe" step — the same signal, opposite action across mandates.
3.4 Multi-mandate forks: a low-risk / maximum-yield core and a ring-fenced, ruin-bounded high-risk sleeve — same evidence bar, different drawdown budget.
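
The executor loop in 3.2 (propose → isolate → backtest → evaluate → merge or prune) can be sketched as below. This is a simplified stand-in, not the system's implementation: `Candidate`, `run_executor_loop`, and the three callables are hypothetical names, and a temporary directory stands in for the git-worktree isolation mentioned in 2.4.

```python
import tempfile
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Candidate:
    name: str
    metrics: Dict[str, float] = field(default_factory=dict)


def run_executor_loop(
    propose: Callable[[], Iterable[Candidate]],
    backtest: Callable[[Candidate, str], Dict[str, float]],
    evaluate: Callable[[Dict[str, float]], bool],
) -> Tuple[List[Candidate], List[Candidate]]:
    """Run every proposed candidate once; return (merged, pruned)."""
    merged: List[Candidate] = []
    pruned: List[Candidate] = []
    for candidate in propose():
        # Isolate: each candidate gets a private scratch directory that is
        # deleted afterwards (a stand-in for a dedicated git worktree).
        with tempfile.TemporaryDirectory(prefix="candidate-") as workdir:
            candidate.metrics = backtest(candidate, workdir)
        # Evaluate: the verdict comes from deterministic code, not a model.
        if evaluate(candidate.metrics):
            merged.append(candidate)
        else:
            pruned.append(candidate)
    return merged, pruned
```

The point of the shape is that the agents only influence `propose`; `evaluate` is a plain function of the measured metrics.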

IV · The evaluation gauntlet: the math

4.1 Why a conjunction gate — the multiple-comparisons problem in strategy search.
4.2 Monte Carlo Permutation Test (MCPT): the null of no signal.
4.3 Deflated Sharpe Ratio (DSR): correcting for selection across many trials.
4.4 Monte Carlo robustness: bootstrap drawdown-tail and skip-trade resilience.
4.5 Forward-degradation: detecting backtest-beats-forward decay.
4.6 Survivorship-free universe construction.
4.7 Walk-forward and regime-conditioned validation.
4.8 A worked example: one candidate's full trip through the gauntlet, with data.
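
To make the conjunction gate concrete, here is a minimal sketch of two of the gates (4.2 and 4.3) and the final conjunction. It is an illustration under stated assumptions, not the system's gauntlet: the permutation test shuffles returns against fixed positions (one simple form of MCPT), the DSR follows the Bailey and López de Prado formulation with a per-period (non-annualized) Sharpe ratio and non-excess kurtosis, and every function name is hypothetical. No thresholds are encoded, since the live ones are not published.

```python
import math
import random
from statistics import NormalDist
from typing import Dict, Sequence

EULER_GAMMA = 0.5772156649015329


def mcpt_p_value(positions: Sequence[float], returns: Sequence[float],
                 n_perm: int = 999, seed: int = 0) -> float:
    """One-sided permutation p-value for mean(position * return).

    Shuffling the returns breaks any link between positions and outcomes,
    which is the "no signal" null hypothesis.
    """
    rng = random.Random(seed)

    def mean_pnl(rets: Sequence[float]) -> float:
        return sum(p * r for p, r in zip(positions, rets)) / len(rets)

    observed = mean_pnl(returns)
    shuffled = list(returns)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_pnl(shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed ordering as one of the permutations.
    return (1 + at_least_as_good) / (1 + n_perm)


def expected_max_sharpe(sr_variance: float, n_trials: int) -> float:
    """Expected best Sharpe ratio among n_trials (>= 2) skill-less trials."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    nd = NormalDist()
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * nd.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * nd.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr: float, n_obs: int, skew: float, kurt: float,
                    sr_variance: float, n_trials: int) -> float:
    """Probability that the true Sharpe ratio exceeds the selection benchmark."""
    sr0 = expected_max_sharpe(sr_variance, n_trials)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return NormalDist().cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)


def passes_gauntlet(gate_results: Dict[str, bool]) -> bool:
    """Conjunction gate: every gate must pass; an empty result set fails."""
    return bool(gate_results) and all(gate_results.values())
```

A caller would compute each gate separately, compare it with its own threshold, and pass the resulting booleans to `passes_gauntlet`; a strong result on one gate cannot compensate for a failure on another.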

V · Risk, regime & capital

5.1 Regime detection: a market "thermometer" and a √det(correlation) generalized-variance statistic — with the honest caveat that it is coincident, not predictive.
5.2 The finding that diversification beats regime-timing.
5.3 Position sizing: Kelly today, calibrated hit-rates tomorrow.
5.4 Capital discipline: tranches, per-venue allocation, ring-fencing the high-risk sleeve, ruin bounds.
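
The √det(correlation) statistic in 5.1 and the Kelly sizing in 5.3 can be sketched as follows. This is a standard-library illustration under assumptions: the function names are hypothetical, the correlation is plain Pearson over equal-length return series (none of them constant), and the Kelly formula is the textbook binary-bet form f* = p − (1 − p)/b, which may differ from the sizing rule the system uses.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def determinant(matrix: Sequence[Sequence[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    m: List[List[float]] = [list(row) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot_row][col]) < 1e-12:
            return 0.0  # singular: some series are perfectly dependent
        if pivot_row != col:
            m[col], m[pivot_row] = m[pivot_row], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def sqrt_det_correlation(series: Sequence[Sequence[float]]) -> float:
    """1.0 when assets are uncorrelated, toward 0.0 as they move together."""
    corr = [[pearson(a, b) for b in series] for a in series]
    return math.sqrt(max(determinant(corr), 0.0))  # clamp rounding noise


def kelly_fraction(hit_rate: float, payoff_ratio: float) -> float:
    """Binary-bet Kelly fraction, floored at zero (no bet without an edge)."""
    return max(hit_rate - (1.0 - hit_rate) / payoff_ratio, 0.0)
```

A falling value of `sqrt_det_correlation` indicates that correlations are rising and diversification is shrinking. Consistent with the caveat in 5.1, it describes the current window and makes no forecast.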

VI · Execution & operations

6.1 Venues and the signal → bridge → executor path.
6.2 The circuit breaker (and an honest accounting of a fail-open bug found in review).
6.3 Observability: live dashboards, cross-strategy overlays, audit trails.
6.4 Automation and alerting: scheduled timers, push on every action and failure.
6.5 Data infrastructure: a free-source shim that replaced a paid market-data subscription.
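
For readers unfamiliar with the term in 6.2, a fail-open breaker is one that permits trading when its own health check breaks. The sketch below shows the opposite, fail-closed behavior in generic form. It is not a reconstruction of the bug found in review, whose details are not given here; `trading_allowed`, `read_drawdown`, and `max_drawdown` are hypothetical names.

```python
import math
from typing import Callable


def trading_allowed(read_drawdown: Callable[[], float],
                    max_drawdown: float) -> bool:
    """Fail-closed circuit breaker: any doubt about the reading blocks trading."""
    try:
        drawdown = read_drawdown()
    except Exception:
        # A fail-open breaker would return True here; fail-closed blocks.
        return False
    if not isinstance(drawdown, (int, float)) or math.isnan(drawdown):
        # Missing or unparseable readings are treated as a tripped breaker.
        return False
    return drawdown < max_drawdown
```

The design choice is that only one path returns `True`: a valid reading strictly inside the limit. Every error path returns `False`.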

VII · Validation philosophy & intellectual honesty

7.1 The 30-day forward gate; why backtest ≠ forward.
7.2 A postmortem on a data bug that overwrote backtest baselines — documented as a feature, not hidden.
7.3 Why trading is currently paused: honesty as an engineering value.
7.4 The crucible: empirically proving which models earn trust.

VIII · Roadmap

8.1 The retool: an orchestration engine driving the existing gauntlet, multi-mandate forks, the event overlay.
8.2 Open decisions and known limitations.
8.3 Where the engineering goes next.

Appendices

A. Glossary.
B. Mathematical reference — every formula with reputable citations.
C. Model-tier & cost-economics table.
D. Hardware / farm inventory.
E. Data-source catalog.
F. System-evolution timeline.