Midas Quant Systems

Midas

Compound Engineering Framework for Alpha Feature Research
Every iteration makes the next one cheaper.

Overview

Two loops. One compounding knowledge base.

Midas is a dual-loop system for discovering and maintaining alpha features across diverse asset classes. The Offline Loop searches for new predictive signals using an LLM as a quant researcher; the Online Loop monitors live features, diagnoses degradation, and fires kill signals. Both loops write structured Learning Documents to a shared knowledge base, so every failure and every success permanently improves future research.

The compound principle: each unit of work makes subsequent work easier. A feature that fails in the offline loop produces a learning that the next session reads before generating candidates. A feature that degrades in production produces an online learning the offline loop reads before its next plan.

System Architecture

Master pipeline

[Architecture diagram, summarised]

OFFLINE LOOP (discovery): Plan (LLM research plan) → Write (DSL expression generation) → Assess (6-agent evaluation, refine if failing) → Learn (write to KB), with accepted candidates saved to features/candidates/.

ONLINE LOOP (improvement): promote → Deploy (features/deployed/) → Monitor (rolling IC, decay) → Diagnose (LLM root-cause) → Learn (write to KB) → Kill / Fix (on_kill callback, fix and redeploy).

KNOWLEDGE BASE (shared compound learning): offline learnings (learnings/offline/YYYY-MM-DD-*.md), online learnings (learnings/online/YYYY-MM-DD-*.md), skills (skills/: DSL, patterns), regimes (regimes/: market docs), thresholds (thresholds.json), and reports (reports/daily/).

Offline Loop

Discovery Pipeline

Plan → Write → Assess → Learn

Each session begins by reading the knowledge base — past failures, past successes, regime docs — before generating a research plan. The LLM then proposes 3 diverse DSL expressions. Each is validated, computed against historical data, and scored by six evaluation agents running in parallel threads. Failing expressions are refined; passing ones are saved as candidates.

[Discovery pipeline diagram, summarised] Load KB (learnings, skills, thresholds) → Plan (LLM: hypothesis, risks, target horizon) → Generate 3 candidate DSL expressions (LLM) → DSLValidator (syntax, depth; invalid expressions return to Generate) → 6-agent evaluation in parallel threads (predictive, decay, cost, diversify, overfit, regime; failing expressions are refined by the LLM) → Learn (LearningDocument written to the KB, passing expressions saved to candidates/).
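The session flow above can be sketched as a single pass. This is a minimal illustration with hypothetical callables (`propose`, `evaluate`, `refine`) standing in for the LLM and evaluator hooks; the real `loops.py` API may differ.

```python
def run_session(kb_learnings, propose, evaluate, refine, max_refines=2):
    """One offline session: Plan/Write -> Assess -> (Refine) -> Learn."""
    candidates = propose(kb_learnings)     # Plan + Write: LLM proposes expressions
    accepted, learnings = [], []
    for expr in candidates:
        result = evaluate(expr)            # Assess: deterministic 6-agent scoring
        tries = 0
        while not result["passed"] and tries < max_refines:
            expr = refine(expr, result["suggestions"])   # LLM refine step
            result = evaluate(expr)
            tries += 1
        if result["passed"]:
            accepted.append(expr)          # goes to features/candidates/
        # Learn: every attempt produces a learning for the KB
        learnings.append({"expr": expr, "passed": result["passed"]})
    return accepted, learnings
```

The key property is that the learnings list is written regardless of outcome, so a failed expression still pays for itself in future sessions.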

Evaluator

Static · Deterministic · Parallel

Six evaluation agents

MultiAgentEvaluator runs six specialised agents in parallel threads via ThreadPoolExecutor. All metric computation is vectorised numpy/pandas — no LLM calls in the inner loop. Each agent returns an AgentVerdict with a pass/fail, a 0–1 score, and concrete improvement suggestions fed back to the LLM refine prompt.

📈 predictive_power ⏱ decay_analysis 💸 trading_cost 🔀 diversification 🔍 overfit_detection 🌊 regime_robustness
[Evaluator diagram, summarised] A feature pd.Series plus forward returns feed AlphaEvaluator, which computes ic, rank_ic, ir, decay, turnover, correlation, marginal_ic, regime, oos, and composite metrics (numpy-vectorised). The six agents (predictive_power, decay_analysis, trading_cost, diversification, overfit_detection, regime_robustness) run under a ThreadPoolExecutor, and their verdicts are synthesised into a MultiAgentResult: overall_pass, blocking_issues, improvement_suggestions, verdicts[6], composite_score.
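The fan-out/synthesis step can be sketched as follows. The `AgentVerdict` fields mirror the description above, but this is a simplified stand-in, not the real dataclass or evaluator API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class AgentVerdict:
    agent: str
    passed: bool
    score: float                          # 0-1 quality score
    suggestions: list = field(default_factory=list)

def run_agents(agents, feature, fwd_returns):
    """Run all agents concurrently and synthesise an overall verdict."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = [pool.submit(fn, feature, fwd_returns)
                   for fn in agents.values()]
        verdicts = [f.result() for f in futures]
    return {
        "overall_pass": all(v.passed for v in verdicts),
        "composite_score": sum(v.score for v in verdicts) / len(verdicts),
        # suggestions from every agent are injected into the refine prompt
        "improvement_suggestions": [s for v in verdicts for s in v.suggestions],
        "verdicts": verdicts,
    }
```

Because the agents only read shared inputs and return independent verdicts, threads are safe here and the fan-out cost is bounded by the slowest agent.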

Online Loop

Live Production Monitoring

Deploy → Monitor → Diagnose → Learn

OnlineMonitor.process_update() is called on every completed bar (typically hourly). It maintains rolling buffers for up to 30 days, computes ic_5d / ic_30d ratios, realised turnover, and PnL attribution. When thresholds are breached, AlertEngine fires an Alert. Critical alerts trigger an async LLM diagnosis that writes a structured learning to the knowledge base — the core compound-learning hook.

[Online loop diagram, summarised] A bar feed supplies feature_values and forward_return to MonitorEngine, which keeps 30-day rolling buffers (ic_1d / ic_5d / ic_30d, half_life, turnover, pnl_bps, slippage) and emits FeatureMetrics[]. AlertEngine applies threshold rules: ic_ratio < 0.7 → warn, ic_ratio < 0.5 → crit, slippage > 2x → crit, pnl < −50 bps → warn. Warnings go to the on_alert callback; critical alerts go to the async LLM DiagnoseAgent (root_cause, evidence list, fix_proposal DSL, kill_signal: bool), which writes to learnings/online/, reports/diagnoses/, and reports/daily/, and fires on_kill(feature) when warranted.
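The IC-ratio degradation check can be sketched as a rolling buffer. This is a minimal illustration assuming one IC observation per bar and the warn/crit levels from the description above; the real MonitorEngine/AlertEngine split is richer.

```python
from collections import deque

class RollingICAlert:
    """Compare short-window mean IC against long-window mean IC."""

    def __init__(self, short=5, long=30, warn=0.7, crit=0.5):
        self.buf = deque(maxlen=long)
        self.short, self.warn, self.crit = short, warn, crit

    def update(self, ic):
        self.buf.append(ic)
        if len(self.buf) < self.buf.maxlen:
            return None                        # not enough history yet
        ic_long = sum(self.buf) / len(self.buf)
        ic_short = sum(list(self.buf)[-self.short:]) / self.short
        ratio = ic_short / ic_long if ic_long > 0 else 0.0
        if ratio < self.crit:
            return "critical"                  # would trigger async LLM diagnosis
        if ratio < self.warn:
            return "warning"                   # would hit the on_alert callback
        return None
```

A ratio near 1.0 means recent IC matches the long-run level; a collapsing short-window IC drives the ratio toward zero and escalates through warning to critical.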

Feature Lifecycle

Candidate → Deployed → Archived

[Lifecycle diagram, summarised] candidates/ (accepted by the offline loop, awaiting promotion) → promote() → deployed/ (live in production, monitored hourly) → demote() → archived/ (retired features, failure analysis attached). From the CLI: python -m midas promote <name> and python -m midas demote <name>.
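Since the lifecycle is just directories of markdown files, the transitions reduce to file moves. A minimal sketch, assuming the directory layout shown in the Knowledge Base section; the real FeaturePromoter also rewrites status fields inside the markdown.

```python
import shutil
from pathlib import Path

FEATURES = Path("midas-kb/knowledge/features")   # assumed KB root

def _move(name, src, dst):
    (FEATURES / dst).mkdir(parents=True, exist_ok=True)
    shutil.move(str(FEATURES / src / f"{name}.md"),
                str(FEATURES / dst / f"{name}.md"))

def promote(name):
    """candidates/ -> deployed/: the feature goes live."""
    _move(name, "candidates", "deployed")

def demote(name):
    """deployed/ -> archived/: the feature is retired."""
    _move(name, "deployed", "archived")
```

Because the state is the file's location, `git log` on the KB doubles as a full audit trail of every promotion and retirement.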

Modules

What's in the package

File Responsibility
models.py All shared dataclasses — EvaluationResult, MultiAgentResult, Alert, FeatureMetrics, LearningDocument, DailyReport, DiagnoseResult. No external imports beyond stdlib.
kb.py Filesystem abstraction for the knowledge base. All reads and writes go through typed helpers. Seeds default Midas DSL skill, prompt templates, and threshold JSON on first run.
evaluator.py AlphaEvaluator computes all EvaluationResult fields (vectorised). MultiAgentEvaluator runs six specialised agents in ThreadPoolExecutor and synthesises verdicts.
proposer.py DSLValidator: fast syntax, depth, and lookback checks with no LLM calls. ExpressionProposer handles the Plan, Generate, and Refine phases through the configured provider (OpenAI or Anthropic).
loops.py The offline compound loop. Orchestrates Plan → Write → Assess → Learn, handles refinement, and writes a LearningDocument plus candidate markdown to the knowledge base every run.
monitor.py MonitorEngine (rolling buffers), AlertEngine (threshold rules), DiagnoseAgent (async LLM diagnosis + kill signal), OnlineMonitor (top-level orchestrator).
promoter.py FeaturePromoter manages candidate → deployed → archived transitions through markdown state updates. Exposes pipeline_summary() and print_pipeline().
factory.py create_midas() bootstraps the package. The Midas container wires all components together. The CLI entry point (python -m midas) supports status, promote, demote, reject, learnings, and demo commands.

Knowledge Base

Directory structure

midas-kb/
|-- skills/
| |-- midas-dsl.md # full DSL operator reference - seeded automatically
| `-- factor-patterns.md # common alpha pattern catalogue
|-- knowledge/
| |-- features/
| | |-- deployed/ # live in production - monitored every bar
| | |-- candidates/ # passed backtest, awaiting promote()
| | `-- archived/ # retired, failure analysis attached
| |-- learnings/
| | |-- offline/ # YYYY-MM-DD-<pattern>.md
| | `-- online/ # YYYY-MM-DD-<feature>-<alert_type>.md
| |-- regimes/
| `-- thresholds.json # pass/fail thresholds for all agents
|-- proposer/prompts/
| |-- plan.md # seeded on first run, fully editable
| |-- generate.md
| `-- refine.md
`-- reports/
|-- daily/ # YYYY-MM-DD.md - generated each session
`-- diagnoses/ # per-alert LLM diagnosis reports

This repository intentionally includes demo_artifacts/ and example midas-kb/ outputs so readers can inspect the files the framework writes.

Quickstart

Bootstrap in three lines

from midas import create_midas
import pandas as pd

# One-line bootstrap - creates the KB directory tree, seeds skills and prompts
midas = create_midas(
    kb_path  = "./midas-kb",
    provider = "openai",             # or "anthropic"
    api_key  = "...",                # or use environment variables
)

# Offline loop
def my_data_fn():
    # Return: (compute_fn, forward_returns_df, regime_series)
    return engine.compute_feature, engine.get_fwd_returns(), engine.get_regimes()

learning = midas.offline.run(
    research_goal    = "Short-term mean-reversion on VWAP deviation",
    existing_factors = midas.promoter.list_deployed(),
    data_fn          = my_data_fn,
    regime           = "HIGH_VOL",
)
print(learning.pattern_identified)   # saved to the knowledge base for future sessions

# Promote a winning candidate
midas.promoter.promote("vwap_zscore_mean_rev")

# Online loop
monitor = midas.build_online(
    feature_names = midas.promoter.list_deployed(),
    on_kill       = lambda name: engine.disable_feature(name),
)

async for bar in engine.live_feed():
    await monitor.process_update(
        timestamp      = bar.ts,
        feature_values = bar.signals,
        forward_return = bar.ret_1h,
        regime         = bar.regime,
        market_context = {"btc_vol": bar.btc_vol},
    )

CLI commands

# View the current feature pipeline
python -m midas status --kb ./midas-kb

# Promote a candidate to deployed
python -m midas promote vwap_zscore_mean_rev --kb ./midas-kb

# Retire a live feature with a reason
python -m midas demote vwap_zscore_mean_rev --kb ./midas-kb --reason "IC consistently < 0 in LOW_VOL regime"

# Reject a candidate back to archived
python -m midas reject noisy_ob_imbalance --kb ./midas-kb --reason "overfit ratio 2.4"

# Print the 5 most recent learning documents
python -m midas learnings --kb ./midas-kb --n 5

# Run the bundled demo
python -m midas demo

Design Decisions

No LLM in the inner loop

All metric computation is vectorised numpy/pandas. The LLM is called only for Plan, Generate, Refine, Learn, and Diagnose — never for per-expression evaluation. Hundreds of expressions can be scored per second.
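The inner-loop metric is this cheap to vectorise. A sketch of daily Spearman rank IC using pandas alone; `rank_ic` is a hypothetical helper, not the real AlphaEvaluator API.

```python
import pandas as pd

def rank_ic(feature: pd.Series, fwd_returns: pd.Series) -> float:
    """Spearman IC: Pearson correlation of the two rank-transformed series."""
    return feature.rank().corr(fwd_returns.rank())
```

Scoring a candidate is then one vectorised pass over aligned series, which is why hundreds of expressions per second is feasible without any LLM in the loop.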

🔒

DSL validation before compute

DSLValidator catches unknown operators, negative lookbacks, unbalanced parens, and excessive nesting depth before the expression ever reaches the feature engine — avoiding wasted compute and preventing crashes.
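The checks are cheap string-level passes. A minimal sketch, assuming a small hypothetical operator whitelist; the real DSLValidator's grammar and limits may differ.

```python
import re

KNOWN_OPS = {"ts_mean", "ts_std", "zscore", "delay", "rank", "vwap"}  # assumed
MAX_DEPTH = 6

def validate(expr: str):
    """Return a list of problems; an empty list means the expression is valid."""
    errors = []
    depth = cur = 0
    for ch in expr:                       # paren balance and nesting depth
        if ch == "(":
            cur += 1
            depth = max(depth, cur)
        elif ch == ")":
            cur -= 1
            if cur < 0:
                errors.append("unbalanced parentheses")
                break
    if cur > 0:
        errors.append("unbalanced parentheses")
    if depth > MAX_DEPTH:
        errors.append(f"nesting depth {depth} exceeds {MAX_DEPTH}")
    for op in re.findall(r"[A-Za-z_]\w*(?=\()", expr):   # operator whitelist
        if op not in KNOWN_OPS:
            errors.append(f"unknown operator: {op}")
    for lb in re.findall(r"[,(]\s*(-\d+)", expr):        # negative lookbacks
        errors.append(f"negative lookback: {lb}")
    return errors
```

Rejecting at this layer means only well-formed expressions ever hit historical data, so a hallucinated operator costs microseconds instead of a full compute pass.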

🧠

Structured learning, not chat history

Each loop run produces a markdown LearningDocument with typed fields: why it worked/failed, pattern identified, suggestions. These are loaded verbatim into future prompts — no summarisation loss.
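A typed record with a deterministic markdown rendering can be sketched like this. The field names here are illustrative; the real LearningDocument schema may differ.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class LearningDocument:
    feature: str
    outcome: str                  # "passed" | "failed"
    pattern_identified: str
    suggestions: list = field(default_factory=list)
    day: date = field(default_factory=date.today)

    def to_markdown(self) -> str:
        """Render the typed fields as a markdown file for the KB."""
        lines = [
            f"# Learning: {self.feature} ({self.day.isoformat()})",
            f"- outcome: {self.outcome}",
            f"- pattern: {self.pattern_identified}",
            "- suggestions:",
        ] + [f"  - {s}" for s in self.suggestions]
        return "\n".join(lines)
```

Because the rendering is mechanical and field-for-field, the document can be re-read verbatim into a future prompt with no summarisation step in between.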

🔀

Parallel agent evaluation

Six specialised agents each evaluate one dimension of quality and run concurrently via ThreadPoolExecutor. Each agent returns concrete improvement suggestions that are directly injected into the Refine prompt.

📁

Filesystem as source of truth

No database. The knowledge base is plain markdown files in a predictable directory tree — git-trackable, human-readable, easy to inspect and edit. KnowledgeBase provides a typed API so no code touches paths directly.

🌊

Regime-aware throughout

Regime labels thread through every stage: evaluation scores IC per-regime, the plan prompt loads current regime, the diagnose agent correlates degradation against regime shifts. No regime-blindness.
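Per-regime IC scoring reduces to a groupby over aligned series. A sketch with a hypothetical helper name, assuming feature, forward-return, and regime series that share an index.

```python
import pandas as pd

def regime_ic(feature: pd.Series, fwd: pd.Series, regime: pd.Series) -> pd.Series:
    """Spearman rank IC computed separately within each regime label."""
    df = pd.DataFrame({"f": feature, "r": fwd})
    out = {}
    for name, g in df.groupby(regime):     # regime series aligns by index
        out[name] = g["f"].rank().corr(g["r"].rank())
    return pd.Series(out)
```

A feature with a strong pooled IC can still show a negative IC in one regime; surfacing the per-regime breakdown is what lets the evaluator and the diagnose agent catch that.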

Default Thresholds

Configurable in knowledge/thresholds.json

{
  "min_rank_ic":       0.02,   // Spearman IC must exceed this
  "min_ir":            0.50,   // IC / IC_std ratio
  "max_turnover":      0.80,   // mean abs daily position change
  "max_correlation":   0.70,   // max correlation with any deployed factor
  "max_overfit_ratio": 1.50,   // IS_IC / OOS_IC must stay below this
  "min_oos_ic":        0.01,   // out-of-sample IC floor
  "min_composite":     0.30    // weighted composite score floor
}
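The min_/max_ naming convention makes the gate mechanical. A sketch of how these thresholds could be applied to a flat metrics dict; `check` is a hypothetical helper, not the real evaluator hook.

```python
THRESHOLDS = {
    "min_rank_ic": 0.02, "min_ir": 0.50, "max_turnover": 0.80,
    "max_correlation": 0.70, "max_overfit_ratio": 1.50,
    "min_oos_ic": 0.01, "min_composite": 0.30,
}

def check(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return the names of all violated thresholds; empty list means pass."""
    failures = []
    for key, limit in thresholds.items():
        metric = key.split("_", 1)[1]      # strip the min_/max_ prefix
        value = metrics[metric]
        if key.startswith("min_") and value < limit:
            failures.append(key)
        if key.startswith("max_") and value > limit:
            failures.append(key)
    return failures
```

Returning the violated keys rather than a bare boolean is what lets failures flow back to the LLM refine prompt as named, actionable constraints.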