Midas Quant Systems

Midas

Compound Engineering Framework for Alpha Feature Research
Every iteration makes the next one cheaper.

Overview

Two loops. One compounding knowledge base.

Midas is a dual-loop system for discovering and maintaining alpha features across diverse asset classes. The Offline Loop searches for new predictive signals using an LLM as a quant researcher; the Online Loop monitors live features, diagnoses degradation, and fires kill signals. Both loops write structured Learning Documents to a shared knowledge base, so every failure and every success permanently improves future research.

The compound principle: each unit of work makes subsequent work easier. A feature that fails in the offline loop produces a learning that the next session reads before generating candidates. A feature that degrades in production produces an online learning the offline loop reads before its next plan.

System Architecture

Master pipeline

[Architecture diagram, summarised]

OFFLINE LOOP (discovery): Plan (LLM research plan) → Write (DSL expression generation) → Assess (6-agent evaluation, refine if failing) → Learn (write to KB), with accepted candidates saved to features/candidates/.

ONLINE LOOP (improvement): promote → Deploy (features/deployed/) → Monitor (rolling IC, decay) → Diagnose (LLM root-cause) → Learn (write to KB) → Kill / Fix (on_kill callback, fix and redeploy).

KNOWLEDGE BASE (shared compound learning): offline learnings (learnings/offline/YYYY-MM-DD-*.md), online learnings (learnings/online/YYYY-MM-DD-*.md), skills (skills/: DSL, patterns), regimes (regimes/: market docs), thresholds (thresholds.json), and reports (reports/daily/).

Offline Loop

Discovery Pipeline

Plan → Write → Assess → Learn

Each session begins by reading the knowledge base — past failures, past successes, regime docs — before generating a research plan. The LLM then proposes 3 diverse DSL expressions. Each is validated, computed against historical data, and scored by six evaluation agents running in parallel threads. Failing expressions are refined; passing ones are saved as candidates.

[Discovery pipeline diagram, summarised] Load KB (learnings, skills, thresholds) → Plan (LLM: hypothesis, risks, target horizon) → Generate 3 candidate DSL expressions (LLM) → DSLValidator (syntax, depth; invalid expressions return to Generate) → 6-agent evaluation in parallel threads (predictive, decay, cost, diversify, overfit, regime; failing expressions are refined by the LLM) → Learn (LearningDocument written to the KB, passing expressions saved to candidates/).
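The session flow above can be sketched as a single pass. This is a minimal illustration with hypothetical callables (`propose`, `evaluate`, `refine`) standing in for the LLM and evaluator hooks; the real `loops.py` API may differ.

```python
def run_session(kb_learnings, propose, evaluate, refine, max_refines=2):
    """One offline session: Plan/Write -> Assess -> (Refine) -> Learn."""
    candidates = propose(kb_learnings)     # Plan + Write: LLM proposes expressions
    accepted, learnings = [], []
    for expr in candidates:
        result = evaluate(expr)            # Assess: deterministic 6-agent scoring
        tries = 0
        while not result["passed"] and tries < max_refines:
            expr = refine(expr, result["suggestions"])   # LLM refine step
            result = evaluate(expr)
            tries += 1
        if result["passed"]:
            accepted.append(expr)          # goes to features/candidates/
        # Learn: every attempt produces a learning for the KB
        learnings.append({"expr": expr, "passed": result["passed"]})
    return accepted, learnings
```

The key property is that the learnings list is written regardless of outcome, so a failed expression still pays for itself in future sessions.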

Evaluator

Static · Deterministic · Parallel

Six evaluation agents

MultiAgentEvaluator runs six specialised agents in parallel threads via ThreadPoolExecutor. All metric computation is vectorised numpy/pandas — no LLM calls in the inner loop. Each agent returns an AgentVerdict with a pass/fail, a 0–1 score, and concrete improvement suggestions fed back to the LLM refine prompt.

📈 predictive_power ⏱ decay_analysis 💸 trading_cost 🔀 diversification 🔍 overfit_detection 🌊 regime_robustness
[Evaluator diagram, summarised] A feature pd.Series plus forward returns feed AlphaEvaluator, which computes ic, rank_ic, ir, decay, turnover, correlation, marginal_ic, regime, oos, and composite metrics (numpy-vectorised). The six agents (predictive_power, decay_analysis, trading_cost, diversification, overfit_detection, regime_robustness) run under a ThreadPoolExecutor, and their verdicts are synthesised into a MultiAgentResult: overall_pass, blocking_issues, improvement_suggestions, verdicts[6], composite_score.
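The fan-out/synthesis step can be sketched as follows. The `AgentVerdict` fields mirror the description above, but this is a simplified stand-in, not the real dataclass or evaluator API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class AgentVerdict:
    agent: str
    passed: bool
    score: float                          # 0-1 quality score
    suggestions: list = field(default_factory=list)

def run_agents(agents, feature, fwd_returns):
    """Run all agents concurrently and synthesise an overall verdict."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = [pool.submit(fn, feature, fwd_returns)
                   for fn in agents.values()]
        verdicts = [f.result() for f in futures]
    return {
        "overall_pass": all(v.passed for v in verdicts),
        "composite_score": sum(v.score for v in verdicts) / len(verdicts),
        # suggestions from every agent are injected into the refine prompt
        "improvement_suggestions": [s for v in verdicts for s in v.suggestions],
        "verdicts": verdicts,
    }
```

Because the agents only read shared inputs and return independent verdicts, threads are safe here and the fan-out cost is bounded by the slowest agent.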

Online Loop

Live Production Monitoring

Deploy → Monitor → Diagnose → Learn

OnlineMonitor.process_update() is called on every completed bar (typically hourly). It maintains rolling buffers for up to 30 days, computes ic_5d / ic_30d ratios, realised turnover, and PnL attribution. When thresholds are breached, AlertEngine fires an Alert. Critical alerts trigger an async LLM diagnosis that writes a structured learning to the knowledge base — the core compound-learning hook.

[Online loop diagram, summarised] A bar feed supplies feature_values and forward_return to MonitorEngine, which keeps 30-day rolling buffers (ic_1d / ic_5d / ic_30d, half_life, turnover, pnl_bps, slippage) and emits FeatureMetrics[]. AlertEngine applies threshold rules: ic_ratio < 0.7 → warn, ic_ratio < 0.5 → crit, slippage > 2x → crit, pnl < −50 bps → warn. Warnings go to the on_alert callback; critical alerts go to the async LLM DiagnoseAgent (root_cause, evidence list, fix_proposal DSL, kill_signal: bool), which writes to learnings/online/, reports/diagnoses/, and reports/daily/, and fires on_kill(feature) when warranted.
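The IC-ratio degradation check can be sketched as a rolling buffer. This is a minimal illustration assuming one IC observation per bar and the warn/crit levels from the description above; the real MonitorEngine/AlertEngine split is richer.

```python
from collections import deque

class RollingICAlert:
    """Compare short-window mean IC against long-window mean IC."""

    def __init__(self, short=5, long=30, warn=0.7, crit=0.5):
        self.buf = deque(maxlen=long)
        self.short, self.warn, self.crit = short, warn, crit

    def update(self, ic):
        self.buf.append(ic)
        if len(self.buf) < self.buf.maxlen:
            return None                        # not enough history yet
        ic_long = sum(self.buf) / len(self.buf)
        ic_short = sum(list(self.buf)[-self.short:]) / self.short
        ratio = ic_short / ic_long if ic_long > 0 else 0.0
        if ratio < self.crit:
            return "critical"                  # would trigger async LLM diagnosis
        if ratio < self.warn:
            return "warning"                   # would hit the on_alert callback
        return None
```

A ratio near 1.0 means recent IC matches the long-run level; a collapsing short-window IC drives the ratio toward zero and escalates through warning to critical.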

Feature Lifecycle

Candidate → Deployed → Archived

[Lifecycle diagram, summarised] candidates/ (accepted by the offline loop, awaiting promotion) → promote() → deployed/ (live in production, monitored hourly) → demote() → archived/ (retired features, failure analysis attached). From the CLI: python -m midas promote <name> and python -m midas demote <name>.
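Since the lifecycle is just directories of markdown files, the transitions reduce to file moves. A minimal sketch, assuming the directory layout shown in the Knowledge Base section; the real FeaturePromoter also rewrites status fields inside the markdown.

```python
import shutil
from pathlib import Path

FEATURES = Path("midas-kb/knowledge/features")   # assumed KB root

def _move(name, src, dst):
    (FEATURES / dst).mkdir(parents=True, exist_ok=True)
    shutil.move(str(FEATURES / src / f"{name}.md"),
                str(FEATURES / dst / f"{name}.md"))

def promote(name):
    """candidates/ -> deployed/: the feature goes live."""
    _move(name, "candidates", "deployed")

def demote(name):
    """deployed/ -> archived/: the feature is retired."""
    _move(name, "deployed", "archived")
```

Because the state is the file's location, `git log` on the KB doubles as a full audit trail of every promotion and retirement.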

Modules

What's in the package

File Responsibility
models.py All shared dataclasses — EvaluationResult, MultiAgentResult, Alert, FeatureMetrics, LearningDocument, DailyReport, DiagnoseResult. No external imports beyond stdlib.
kb.py Filesystem abstraction for the knowledge base. All reads and writes go through typed helpers. Seeds default Midas DSL skill, prompt templates, and threshold JSON on first run.
evaluator.py AlphaEvaluator computes all EvaluationResult fields (vectorised). MultiAgentEvaluator runs six specialised agents in ThreadPoolExecutor and synthesises verdicts.
proposer.py DSLValidator: fast syntax, depth, and lookback checks with no LLM calls. ExpressionProposer handles the Plan, Generate, and Refine phases through the configured provider (OpenAI or Anthropic).
loops.py The offline compound loop. Orchestrates Plan → Write → Assess → Learn, handles refinement, and writes a LearningDocument plus candidate markdown to the knowledge base every run.
monitor.py MonitorEngine (rolling buffers), AlertEngine (threshold rules), DiagnoseAgent (async LLM diagnosis + kill signal), OnlineMonitor (top-level orchestrator).
promoter.py FeaturePromoter manages candidate → deployed → archived transitions through markdown state updates. Exposes pipeline_summary() and print_pipeline().
factory.py create_midas() bootstraps the package. The Midas container wires all components together. The CLI entry point (python -m midas) supports status, promote, demote, reject, learnings, and demo commands.

Knowledge Base

Directory structure

midas-kb/
|-- skills/
| |-- midas-dsl.md # full DSL operator reference - seeded automatically
| `-- factor-patterns.md # common alpha pattern catalogue
|-- knowledge/
| |-- features/
| | |-- deployed/ # live in production - monitored every bar
| | |-- candidates/ # passed backtest, awaiting promote()
| | `-- archived/ # retired, failure analysis attached
| |-- learnings/
| | |-- offline/ # YYYY-MM-DD-<pattern>.md
| | `-- online/ # YYYY-MM-DD-<feature>-<alert_type>.md
| |-- regimes/
| `-- thresholds.json # pass/fail thresholds for all agents
|-- proposer/prompts/
| |-- plan.md # seeded on first run, fully editable
| |-- generate.md
| `-- refine.md
`-- reports/
|-- daily/ # YYYY-MM-DD.md - generated each session
`-- diagnoses/ # per-alert LLM diagnosis reports

This repository intentionally includes demo_artifacts/ and example midas-kb/ outputs so readers can inspect the files the framework writes.

Quickstart

Bootstrap in three lines

from midas import create_midas
import pandas as pd

# One-line bootstrap - creates the KB directory tree, seeds skills and prompts
midas = create_midas(
    kb_path  = "./midas-kb",
    provider = "openai",             # or "anthropic"
    api_key  = "...",                # or use environment variables
)

# Offline loop
def my_data_fn():
    # Return: (compute_fn, forward_returns_df, regime_series)
    return engine.compute_feature, engine.get_fwd_returns(), engine.get_regimes()

learning = midas.offline.run(
    research_goal    = "Short-term mean-reversion on VWAP deviation",
    existing_factors = midas.promoter.list_deployed(),
    data_fn          = my_data_fn,
    regime           = "HIGH_VOL",
)
print(learning.pattern_identified)   # saved to the knowledge base for future sessions

# Promote a winning candidate
midas.promoter.promote("vwap_zscore_mean_rev")

# Online loop
monitor = midas.build_online(
    feature_names = midas.promoter.list_deployed(),
    on_kill       = lambda name: engine.disable_feature(name),
)

async for bar in engine.live_feed():
    await monitor.process_update(
        timestamp      = bar.ts,
        feature_values = bar.signals,
        forward_return = bar.ret_1h,
        regime         = bar.regime,
        market_context = {"btc_vol": bar.btc_vol},
    )

CLI commands

# View the current feature pipeline
python -m midas status --kb ./midas-kb

# Promote a candidate to deployed
python -m midas promote vwap_zscore_mean_rev --kb ./midas-kb

# Retire a live feature with a reason
python -m midas demote vwap_zscore_mean_rev --kb ./midas-kb --reason "IC consistently < 0 in LOW_VOL regime"

# Reject a candidate back to archived
python -m midas reject noisy_ob_imbalance --kb ./midas-kb --reason "overfit ratio 2.4"

# Print the 5 most recent learning documents
python -m midas learnings --kb ./midas-kb --n 5

# Run the bundled demo
python -m midas demo

Design Decisions

No LLM in the inner loop

All metric computation is vectorised numpy/pandas. The LLM is called only for Plan, Generate, Refine, Learn, and Diagnose — never for per-expression evaluation. Hundreds of expressions can be scored per second.
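The inner-loop metric is this cheap to vectorise. A sketch of daily Spearman rank IC using pandas alone; `rank_ic` is a hypothetical helper, not the real AlphaEvaluator API.

```python
import pandas as pd

def rank_ic(feature: pd.Series, fwd_returns: pd.Series) -> float:
    """Spearman IC: Pearson correlation of the two rank-transformed series."""
    return feature.rank().corr(fwd_returns.rank())
```

Scoring a candidate is then one vectorised pass over aligned series, which is why hundreds of expressions per second is feasible without any LLM in the loop.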

🔒

DSL validation before compute

DSLValidator catches unknown operators, negative lookbacks, unbalanced parens, and excessive nesting depth before the expression ever reaches the feature engine — avoiding wasted compute and preventing crashes.
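The checks are cheap string-level passes. A minimal sketch, assuming a small hypothetical operator whitelist; the real DSLValidator's grammar and limits may differ.

```python
import re

KNOWN_OPS = {"ts_mean", "ts_std", "zscore", "delay", "rank", "vwap"}  # assumed
MAX_DEPTH = 6

def validate(expr: str):
    """Return a list of problems; an empty list means the expression is valid."""
    errors = []
    depth = cur = 0
    for ch in expr:                       # paren balance and nesting depth
        if ch == "(":
            cur += 1
            depth = max(depth, cur)
        elif ch == ")":
            cur -= 1
            if cur < 0:
                errors.append("unbalanced parentheses")
                break
    if cur > 0:
        errors.append("unbalanced parentheses")
    if depth > MAX_DEPTH:
        errors.append(f"nesting depth {depth} exceeds {MAX_DEPTH}")
    for op in re.findall(r"[A-Za-z_]\w*(?=\()", expr):   # operator whitelist
        if op not in KNOWN_OPS:
            errors.append(f"unknown operator: {op}")
    for lb in re.findall(r"[,(]\s*(-\d+)", expr):        # negative lookbacks
        errors.append(f"negative lookback: {lb}")
    return errors
```

Rejecting at this layer means only well-formed expressions ever hit historical data, so a hallucinated operator costs microseconds instead of a full compute pass.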

🧠

Structured learning, not chat history

Each loop run produces a markdown LearningDocument with typed fields: why it worked/failed, pattern identified, suggestions. These are loaded verbatim into future prompts — no summarisation loss.
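A typed record with a deterministic markdown rendering can be sketched like this. The field names here are illustrative; the real LearningDocument schema may differ.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class LearningDocument:
    feature: str
    outcome: str                  # "passed" | "failed"
    pattern_identified: str
    suggestions: list = field(default_factory=list)
    day: date = field(default_factory=date.today)

    def to_markdown(self) -> str:
        """Render the typed fields as a markdown file for the KB."""
        lines = [
            f"# Learning: {self.feature} ({self.day.isoformat()})",
            f"- outcome: {self.outcome}",
            f"- pattern: {self.pattern_identified}",
            "- suggestions:",
        ] + [f"  - {s}" for s in self.suggestions]
        return "\n".join(lines)
```

Because the rendering is mechanical and field-for-field, the document can be re-read verbatim into a future prompt with no summarisation step in between.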

🔀

Parallel agent evaluation

Six specialised agents each evaluate one dimension of quality and run concurrently via ThreadPoolExecutor. Each agent returns concrete improvement suggestions that are directly injected into the Refine prompt.

📁

Filesystem as source of truth

No database. The knowledge base is plain markdown files in a predictable directory tree — git-trackable, human-readable, easy to inspect and edit. KnowledgeBase provides a typed API so no code touches paths directly.

🌊

Regime-aware throughout

Regime labels thread through every stage: evaluation scores IC per-regime, the plan prompt loads current regime, the diagnose agent correlates degradation against regime shifts. No regime-blindness.
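Per-regime IC scoring reduces to a groupby over aligned series. A sketch with a hypothetical helper name, assuming feature, forward-return, and regime series that share an index.

```python
import pandas as pd

def regime_ic(feature: pd.Series, fwd: pd.Series, regime: pd.Series) -> pd.Series:
    """Spearman rank IC computed separately within each regime label."""
    df = pd.DataFrame({"f": feature, "r": fwd})
    out = {}
    for name, g in df.groupby(regime):     # regime series aligns by index
        out[name] = g["f"].rank().corr(g["r"].rank())
    return pd.Series(out)
```

A feature with a strong pooled IC can still show a negative IC in one regime; surfacing the per-regime breakdown is what lets the evaluator and the diagnose agent catch that.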

Default Thresholds

Configurable in knowledge/thresholds.json

{
  "min_rank_ic":       0.02,   // Spearman IC must exceed this
  "min_ir":            0.50,   // IC / IC_std ratio
  "max_turnover":      0.80,   // mean abs daily position change
  "max_correlation":   0.70,   // max correlation with any deployed factor
  "max_overfit_ratio": 1.50,   // IS_IC / OOS_IC must stay below this
  "min_oos_ic":        0.01,   // out-of-sample IC floor
  "min_composite":     0.30    // weighted composite score floor
}
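The min_/max_ naming convention makes the gate mechanical. A sketch of how these thresholds could be applied to a flat metrics dict; `check` is a hypothetical helper, not the real evaluator hook.

```python
THRESHOLDS = {
    "min_rank_ic": 0.02, "min_ir": 0.50, "max_turnover": 0.80,
    "max_correlation": 0.70, "max_overfit_ratio": 1.50,
    "min_oos_ic": 0.01, "min_composite": 0.30,
}

def check(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return the names of all violated thresholds; empty list means pass."""
    failures = []
    for key, limit in thresholds.items():
        metric = key.split("_", 1)[1]      # strip the min_/max_ prefix
        value = metrics[metric]
        if key.startswith("min_") and value < limit:
            failures.append(key)
        if key.startswith("max_") and value > limit:
            failures.append(key)
    return failures
```

Returning the violated keys rather than a bare boolean is what lets failures flow back to the LLM refine prompt as named, actionable constraints.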