Leaderboard

A public, versioned leaderboard for interventional / counterfactual time-series estimation on the DoTime suites. Anyone can submit via a reproducible path; entries are seeded with the reference baselines below.

Metrics

Lower RMSE / NMSE / MAE is better; higher direction accuracy (dir_acc) and R² are better. Oracle (the true SCM counterfactual) is the upper bound on the synthetic suites; intervention-unaware baselines (Zero, Mean, AR1, VAR-OLS) are the naive lower references.

Reference results (preliminary)

Pooled RMSE / direction accuracy. Preliminary — regenerate at full scale with scripts/build_release.py before camera-ready; numbers below are from a reduced build for illustration.

RMSE (direction accuracy in parentheses); lower RMSE / higher dir-acc is better.

Baseline	Identifiability	RegimeSwitch	Continuous	Generic
Oracle (upper bound)	0.00 (1.00)	0.00 (1.00)	0.00 (1.00)	0.00 (1.00)
Zero	0.65 (0.00)	1.81 (0.00)	1.55 (0.00)	34.00 (0.00)
Mean (TrajMean)	0.63 (0.64)	1.82 (0.55)	1.54 (0.54)	34.30 (0.65)
AR1	0.67 (0.60)	2.33 (0.45)	4.66 (0.53)	33.99 (0.63)
VAR-OLS	0.65 (0.62)	1.93 (0.51)	2.64 (0.57)	51.93 (0.66)
DoOverTimePFN	pending checkpoint mapping (Phase 8)

Generic RMSE is on the raw (un-normalized) scale, hence the larger magnitudes. Numbers come from a reduced build (n ≈ 200–2000/suite); regenerate at release scale for camera-ready.

Per-tier and per-structure breakdowns are available in each submission JSON.

Submitting

A submission is a JSON file conforming to the ctp-submission/1 schema, produced by the reproducible evaluator:

# a registered baseline
python scripts/eval_submission.py --suite dot-Identifiability-v1 \
    --baseline VAR-OLS --out submission.json

# your own model (anything with predict(episode) -> Tensor)
python scripts/eval_submission.py --suite dot-Identifiability-v1 \
    --model mypkg.models:MyModel --name MyModel --out submission.json

Submission schema (`ctp-submission/1`)

{
  "schema": "ctp-submission/1",
  "suite": "dot-Identifiability-v1",
  "model": "MyModel",
  "package_version": "0.1.0",
  "n_episodes": 10800,
  "n_queries": 10800,
  "pooled":        {"rmse": 0.0, "mae": 0.0, "nmse": 0.0, "r2": 0.0, "dir_acc": 0.0},
  "per_structure": {"back_door": {"rmse": 0.0, "...": 0.0}, "...": {}}
}

Open a pull request adding your submission.json under leaderboard/<suite>/, or submit through the project’s leaderboard space. Since the D&B track is single-blind, submissions may use real names.

Leaderboard

Metrics

Reference results (preliminary)

Submitting

Submission schema (ctp-submission/1)

Submission schema (`ctp-submission/1`)