Overview
Two loops. One compounding knowledge base.
Midas is a dual-loop system for discovering and maintaining alpha features in diverse asset classes. The Offline Loop searches for new predictive signals using an LLM as a quant researcher; the Online Loop monitors live features, diagnoses degradation, and fires kill signals. Both loops write structured Learning Documents to a shared knowledge base, so that every failure and every success permanently improves future research.
System Architecture
Master pipeline
Offline Loop
Discovery Pipeline
Plan → Write → Assess → Learn
Each session begins by reading the knowledge base — past failures, past successes, regime docs — before generating a research plan. The LLM then proposes 3 diverse DSL expressions. Each is validated, computed against historical data, and scored by six evaluation agents running in parallel threads. Failing expressions are refined; passing ones are saved as candidates.
Evaluator
Static · Deterministic · Parallel
Six evaluation agents
MultiAgentEvaluator runs six specialised agents in parallel threads via
ThreadPoolExecutor. All metric computation is vectorised numpy/pandas —
no LLM calls in the inner loop. Each agent returns an AgentVerdict
with a pass/fail, a 0–1 score, and concrete improvement suggestions fed back
to the LLM refine prompt.
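The fan-out can be sketched as follows. The agent logic and the AgentVerdict fields here are illustrative stand-ins, not the actual Midas API:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class AgentVerdict:
    agent: str
    passed: bool
    score: float                                   # 0-1 quality score
    suggestions: list = field(default_factory=list)

def run_agents(metrics, agents):
    """Fan the same metrics dict out to every agent in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = [pool.submit(fn, metrics) for fn in agents]
        return [f.result() for f in futures]

# Two toy agents standing in for the six real ones
def ic_agent(m):
    ok = m["rank_ic"] >= 0.02
    return AgentVerdict("ic", ok, min(m["rank_ic"] / 0.05, 1.0),
                        [] if ok else ["lengthen the signal horizon"])

def turnover_agent(m):
    ok = m["turnover"] <= 0.80
    return AgentVerdict("turnover", ok, 1.0 - m["turnover"],
                        [] if ok else ["smooth with ts_mean"])

verdicts = run_agents({"rank_ic": 0.03, "turnover": 0.40},
                      [ic_agent, turnover_agent])
```

Because each agent is pure metric arithmetic on an already-computed dict, the threads never contend on I/O or an LLM call, so the fan-out cost is negligible.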
Online Loop
Live Production Monitoring
Deploy → Monitor → Diagnose → Learn
OnlineMonitor.process_update() is called on every completed bar (typically hourly).
It maintains rolling buffers for up to 30 days, computes ic_5d / ic_30d ratios,
realised turnover, and PnL attribution. When thresholds are breached, AlertEngine
fires an Alert. Critical alerts trigger an async LLM diagnosis that writes a
structured learning to the knowledge base — the core compound-learning hook.
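The degradation check can be illustrated as a short- vs long-window rank-IC comparison. The window lengths (5 and 30 days of hourly bars) and the synthetic data below are stand-ins for the real rolling buffers and thresholds:

```python
import numpy as np
import pandas as pd

def ic_ratio(feature: pd.Series, fwd_ret: pd.Series,
             short: int = 5 * 24, long: int = 30 * 24):
    """Spearman IC over the last `short` bars vs the last `long` bars.
    A ratio well below 1 suggests the edge is decaying."""
    ic_s = feature.tail(short).corr(fwd_ret.tail(short), method="spearman")
    ic_l = feature.tail(long).corr(fwd_ret.tail(long), method="spearman")
    return ic_s, ic_l, (ic_s / ic_l if ic_l else float("nan"))

# Synthetic hourly history: a feature that is still weakly predictive
rng = np.random.default_rng(0)
feat = pd.Series(rng.normal(size=1000))
ret = 0.2 * feat + pd.Series(rng.normal(size=1000))
ic_s, ic_l, ratio = ic_ratio(feat, ret)
```

In the real system the equivalent numbers come out of the rolling buffers on every `process_update()` call, and a breach is what makes AlertEngine fire.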
Feature Lifecycle
Candidate → Deployed → Archived
Modules
What's in the package
| File | Responsibility |
|---|---|
| models.py | All shared dataclasses — EvaluationResult, MultiAgentResult, Alert, FeatureMetrics, LearningDocument, DailyReport, DiagnoseResult. No external imports beyond stdlib. |
| kb.py | Filesystem abstraction for the knowledge base. All reads and writes go through typed helpers. Seeds default Midas DSL skill, prompt templates, and threshold JSON on first run. |
| evaluator.py | AlphaEvaluator computes all EvaluationResult fields (vectorised). MultiAgentEvaluator runs six specialised agents in ThreadPoolExecutor and synthesises verdicts. |
| proposer.py | DSLValidator — fast syntax, depth, and lookback checks with no LLM calls. ExpressionProposer handles the Plan, Generate, and Refine phases through the configured LLM provider (OpenAI or Anthropic). |
| loops.py | The offline compound loop. Orchestrates Plan → Write → Assess → Learn, handles refinement, and writes a LearningDocument plus candidate markdown to the knowledge base every run. |
| monitor.py | MonitorEngine (rolling buffers), AlertEngine (threshold rules), DiagnoseAgent (async LLM diagnosis + kill signal), OnlineMonitor (top-level orchestrator). |
| promoter.py | FeaturePromoter manages candidate → deployed → archived transitions through markdown state updates. Exposes pipeline_summary() and print_pipeline(). |
| factory.py | create_midas() bootstraps the package. The Midas container wires all components together. The CLI entry point (python -m midas) supports status, promote, demote, reject, learnings, and demo commands. |
Knowledge Base
Directory structure
|-- skills/
| |-- midas-dsl.md # full DSL operator reference - seeded automatically
| `-- factor-patterns.md # common alpha pattern catalogue
|-- knowledge/
| |-- features/
| | |-- deployed/ # live in production - monitored every bar
| | |-- candidates/ # passed backtest, awaiting promote()
| | `-- archived/ # retired, failure analysis attached
| |-- learnings/
| | |-- offline/ # YYYY-MM-DD-<pattern>.md
| | `-- online/ # YYYY-MM-DD-<feature>-<alert_type>.md
| |-- regimes/
| `-- thresholds.json # pass/fail thresholds for all agents
|-- proposer/prompts/
| |-- plan.md # seeded on first run, fully editable
| |-- generate.md
| `-- refine.md
`-- reports/
|-- daily/ # YYYY-MM-DD.md - generated each session
`-- diagnoses/ # per-alert LLM diagnosis reports
This repository intentionally keeps the demo_artifacts/ and example midas-kb/ outputs in
the tree so readers can inspect the files the framework writes.
Quickstart
Bootstrap in three lines
from midas import create_midas
import pandas as pd

# One-line bootstrap - creates the KB directory tree, seeds skills and prompts
midas = create_midas(
    kb_path = "./midas-kb",
    provider = "openai",  # or "anthropic"
    api_key = "...",      # or use environment variables
)

# Offline loop ("engine" below stands for your own feature/data engine)
def my_data_fn():
    # Return: (compute_fn, forward_returns_df, regime_series)
    return engine.compute_feature, engine.get_fwd_returns(), engine.get_regimes()

learning = midas.offline.run(
    research_goal = "Short-term mean-reversion on VWAP deviation",
    existing_factors = midas.promoter.list_deployed(),
    data_fn = my_data_fn,
    regime = "HIGH_VOL",
)
print(learning.pattern_identified)  # saved to the knowledge base for future sessions

# Promote a winning candidate
midas.promoter.promote("vwap_zscore_mean_rev")

# Online loop
monitor = midas.build_online(
    feature_names = midas.promoter.list_deployed(),
    on_kill = lambda name: engine.disable_feature(name),
)

async for bar in engine.live_feed():
    await monitor.process_update(
        timestamp = bar.ts,
        feature_values = bar.signals,
        forward_return = bar.ret_1h,
        regime = bar.regime,
        market_context = {"btc_vol": bar.btc_vol},
    )
CLI commands
# View the current feature pipeline
python -m midas status --kb ./midas-kb
# Promote a candidate to deployed
python -m midas promote vwap_zscore_mean_rev --kb ./midas-kb
# Retire a live feature with a reason
python -m midas demote vwap_zscore_mean_rev --kb ./midas-kb --reason "IC consistently < 0 in LOW_VOL regime"
# Reject a candidate back to archived
python -m midas reject noisy_ob_imbalance --kb ./midas-kb --reason "overfit ratio 2.4"
# Print the 5 most recent learning documents
python -m midas learnings --kb ./midas-kb --n 5
# Run the bundled demo
python -m midas demo
Design Decisions
No LLM in the inner loop
All metric computation is vectorised numpy/pandas. The LLM is called only for Plan, Generate, Refine, Learn, and Diagnose — never for per-expression evaluation. Hundreds of expressions can be scored per second.
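As a sketch of what "vectorised, no LLM" means in practice, a mean per-date rank IC takes a couple of pandas calls. The wide layout (rows = dates, columns = assets) and the toy data are assumptions, not the real AlphaEvaluator internals:

```python
import numpy as np
import pandas as pd

def rank_ic(feature: pd.DataFrame, fwd_ret: pd.DataFrame) -> float:
    """Mean per-date Spearman IC: correlate cross-sectional ranks row by
    row. Pure pandas, no LLM anywhere in the scoring path."""
    daily = feature.rank(axis=1).corrwith(fwd_ret.rank(axis=1), axis=1)
    return float(daily.mean())

# 250 dates x 50 assets of synthetic data; feature is weakly predictive
rng = np.random.default_rng(1)
ret = pd.DataFrame(rng.normal(size=(250, 50)))
feat = 0.2 * ret + rng.normal(size=(250, 50))
```

Scoring one expression is a handful of array passes, which is what makes hundreds of evaluations per second realistic.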
DSL validation before compute
DSLValidator catches unknown operators, negative lookbacks, unbalanced parens, and excessive nesting depth before the expression ever reaches the feature engine — saving latency and preventing crashes.
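A minimal stand-in for this kind of pre-compute validation, with a made-up operator set and limits rather than the real Midas DSL:

```python
import re

KNOWN_OPS = {"ts_mean", "ts_std", "rank", "delta", "vwap_dev"}
MAX_DEPTH = 4

def validate(expr: str) -> list:
    """Return a list of error strings; an empty list means the expression
    is safe to hand to the feature engine."""
    errors = []
    depth = max_depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        errors.append("unbalanced parens")
    if max_depth > MAX_DEPTH:
        errors.append(f"nesting depth {max_depth} exceeds {MAX_DEPTH}")
    for op in re.findall(r"[a-z_]+(?=\()", expr):      # names followed by "("
        if op not in KNOWN_OPS:
            errors.append(f"unknown operator: {op}")
    for lb in re.findall(r",\s*(-?\d+)\s*\)", expr):   # trailing integer args
        if int(lb) < 0:
            errors.append(f"negative lookback: {lb}")
    return errors
```

Rejecting a malformed expression here costs microseconds, versus a full (and possibly crashing) pass through the feature engine.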
Structured learning, not chat history
Each loop run produces a markdown LearningDocument with typed fields: why it worked/failed, pattern identified, suggestions. These are loaded verbatim into future prompts — no summarisation loss.
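A hypothetical shape for such a document and its markdown rendering; the real field names live in models.py and may differ:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class LearningDocument:
    pattern_identified: str
    why: str
    suggestions: list

    def to_markdown(self) -> str:
        bullets = "\n".join(f"- {s}" for s in self.suggestions)
        return (
            f"# Learning {date.today().isoformat()}\n\n"
            f"**Pattern:** {self.pattern_identified}\n\n"
            f"**Why it worked/failed:** {self.why}\n\n"
            f"**Suggestions:**\n{bullets}\n"
        )

doc = LearningDocument(
    pattern_identified="short-horizon mean reversion decays in LOW_VOL",
    why="turnover cost dominated the residual edge",
    suggestions=["add ts_mean smoothing", "gate the signal on realised vol"],
)
```

Because the fields are typed and the rendering is deterministic, the same document can be pasted verbatim into a future Plan prompt with no lossy summarisation step.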
Parallel agent evaluation
Six specialised agents each evaluate one dimension of quality and run concurrently via ThreadPoolExecutor. Each agent returns concrete improvement suggestions that are directly injected into the Refine prompt.
Filesystem as source of truth
No database. The knowledge base is plain markdown files in a predictable directory tree — git-trackable, human-readable, easy to inspect and edit. KnowledgeBase provides a typed API so no code touches paths directly.
Regime-aware throughout
Regime labels thread through every stage: evaluation scores IC per-regime, the plan prompt loads current regime, the diagnose agent correlates degradation against regime shifts. No regime-blindness.
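Per-regime scoring can be sketched as a group-by over a regime label series aligned to the data; the labels and coefficients below are synthetic:

```python
import numpy as np
import pandas as pd

# Synthetic history: the feature only has an edge in HIGH_VOL
rng = np.random.default_rng(2)
n = 600
feature = pd.Series(rng.normal(size=n))
regime = pd.Series(["HIGH_VOL"] * 300 + ["LOW_VOL"] * 300)
beta = np.where(regime == "HIGH_VOL", 0.5, 0.0)
fwd_ret = beta * feature + pd.Series(rng.normal(size=n))

# Spearman IC computed separately inside each regime bucket
df = pd.DataFrame({"f": feature, "r": fwd_ret, "regime": regime})
ic_by_regime = {
    name: g["f"].corr(g["r"], method="spearman")
    for name, g in df.groupby("regime")
}
```

A single pooled IC would average the two buckets together and hide exactly the regime dependence the evaluator and diagnose agent are looking for.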
Default Thresholds
Configurable in knowledge/thresholds.json
{
  "min_rank_ic": 0.02,        // Spearman IC must exceed this
  "min_ir": 0.50,             // IC / IC_std ratio
  "max_turnover": 0.80,       // mean abs daily position change
  "max_correlation": 0.70,    // max correlation with any deployed factor
  "max_overfit_ratio": 1.50,  // IS_IC / OOS_IC must stay below this
  "min_oos_ic": 0.01,         // out-of-sample IC floor
  "min_composite": 0.30       // weighted composite score floor
}
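A minimal sketch of how such a file could be applied to a metrics dict. The gating logic is a simplification of the real agent verdicts, and the metric key names are assumptions:

```python
# Same keys and values as the thresholds.json above
thresholds = {
    "min_rank_ic": 0.02, "min_ir": 0.50, "max_turnover": 0.80,
    "max_correlation": 0.70, "max_overfit_ratio": 1.50,
    "min_oos_ic": 0.01, "min_composite": 0.30,
}

def violations(m: dict, t: dict) -> list:
    """Names of every breached threshold; an empty list means pass."""
    checks = {
        "min_rank_ic":       m["rank_ic"]       >= t["min_rank_ic"],
        "min_ir":            m["ir"]            >= t["min_ir"],
        "max_turnover":      m["turnover"]      <= t["max_turnover"],
        "max_correlation":   m["correlation"]   <= t["max_correlation"],
        "max_overfit_ratio": m["overfit_ratio"] <= t["max_overfit_ratio"],
        "min_oos_ic":        m["oos_ic"]        >= t["min_oos_ic"],
        "min_composite":     m["composite"]     >= t["min_composite"],
    }
    return [name for name, ok in checks.items() if not ok]

# Example: a candidate that fails only on turnover
metrics = {"rank_ic": 0.03, "ir": 0.6, "turnover": 0.95, "correlation": 0.2,
           "overfit_ratio": 1.2, "oos_ic": 0.02, "composite": 0.4}
```

Keeping the thresholds in editable JSON rather than code means tightening a gate is a one-line change that every agent picks up on the next run.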