benchmarks

Frozen benchmark suites for DoTime.

This module exposes the consumer side of the released benchmarks: loading a versioned, immutable suite (downloaded from Zenodo and cached on first use) and iterating over its episodes for evaluation.

Public surface

Episode — one trajectory: obs/int data, intervention, ground truth.
BenchmarkSuite — a named, versioned collection of episodes.
load_benchmark() — fetch a suite by name (cached under ~/.cache/dotime by default).
available_suites() — list the registered suite names.
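
As a rough illustration of how these pieces fit together in an evaluation loop, the sketch below scores a predictor per structure group. It is not part of the package: mae_by_structure and predict are hypothetical names, and the toy suite is a stand-in that only mimics the by_structure() method and the y_true field documented on this page.

```python
from statistics import fmean
from types import SimpleNamespace


def mae_by_structure(suite, predict):
    """Mean absolute error per structure group.

    `suite` only needs a by_structure() method; `predict` is a hypothetical
    callable that maps an episode to one prediction per query.
    """
    scores = {}
    for structure_name, episodes in suite.by_structure():
        errors = []
        for episode in episodes:
            predictions = predict(episode)
            # float() accepts plain numbers as well as 0-d tensors.
            for y_hat, y in zip(predictions, episode.y_true):
                errors.append(abs(float(y_hat) - float(y)))
        scores[structure_name] = fmean(errors) if errors else float("nan")
    return scores


# Stand-in suite for illustration only; real suites come from load_benchmark().
toy_episodes = [
    SimpleNamespace(y_true=[1.0, 2.0], structure="back_door"),
    SimpleNamespace(y_true=[0.5], structure="back_door"),
]
toy_suite = SimpleNamespace(
    by_structure=lambda: iter([("back_door", toy_episodes)])
)

# A trivial baseline that predicts zero for every query.
scores = mae_by_structure(toy_suite, lambda ep: [0.0] * len(ep.y_true))
```

With the package installed, the same function could presumably be called on the result of load_benchmark("dot-Identifiability-v1") together with your own predictor.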

Notes for implementers

The download-and-parse path is stubbed wherever it touches real artifacts (marked TODO(release)). The frozen on-disk format is a per-suite directory containing a manifest.json plus one or more Parquet shards in the tidy schema produced by scripts/build_release.py. Wire _parse_suite_dir() to that schema.
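
The directory layout described above can be located with the standard library alone. The following is a minimal sketch under stated assumptions: discover_suite_dir is a hypothetical helper (not _parse_suite_dir() itself), the shards are assumed to sit directly in the suite directory with a .parquet suffix, and the manifest key used in the demonstration is a placeholder. It does not decode the shards or interpret the manifest schema.

```python
import json
import tempfile
from pathlib import Path


def discover_suite_dir(suite_dir):
    """Return (manifest, shard_paths) for a per-suite directory.

    Hypothetical helper: it only locates files and does not decode shards.
    """
    suite_dir = Path(suite_dir)
    manifest_path = suite_dir / "manifest.json"
    if not manifest_path.is_file():
        raise FileNotFoundError(f"missing manifest: {manifest_path}")
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))

    # Assumption: shards live directly in the suite directory as *.parquet.
    shards = sorted(suite_dir.glob("*.parquet"))
    if not shards:
        raise FileNotFoundError(f"no parquet shards in {suite_dir}")
    return manifest, shards


# Demonstration on a throwaway directory with empty placeholder shards.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # The "name" key is a placeholder, not the released manifest schema.
    (root / "manifest.json").write_text(json.dumps({"name": "toy"}), encoding="utf-8")
    (root / "part-1.parquet").touch()
    (root / "part-0.parquet").touch()
    manifest, shards = discover_suite_dir(root)
    shard_names = [p.name for p in shards]
```

Sorting the shard paths keeps the episode order deterministic across platforms, which matters for a frozen benchmark.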

class dotime.benchmarks.BenchmarkSuite(meta, episodes)[source]

Bases: object

A named, versioned, immutable collection of Episode objects.

Parameters:

meta (SuiteMetadata)
episodes (list[Episode])

by_structure()[source]

Yield (structure_name, episodes) groups.

Episodes whose structure is None are grouped under "_all".

Return type: Iterator[tuple[str, list[Episode]]]
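
The grouping rule can be sketched as follows. This is an illustrative reimplementation, not the library's code: ToyEpisode is a stand-in that carries only the structure field, and the first-seen ordering of groups is an assumption.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class ToyEpisode:
    """Stand-in carrying only the field that grouping needs."""
    structure: Optional[str] = None


def by_structure(episodes) -> Iterator[Tuple[str, List[ToyEpisode]]]:
    groups = {}
    for episode in episodes:
        # Episodes without a structure label fall into the "_all" group.
        key = episode.structure if episode.structure is not None else "_all"
        groups.setdefault(key, []).append(episode)
    # Assumption: groups are yielded in first-seen order.
    yield from groups.items()


episodes = [ToyEpisode("back_door"), ToyEpisode(), ToyEpisode("back_door")]
grouped = {name: len(group) for name, group in by_structure(episodes)}
```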

filter(structure)[source]

Return a sub-suite containing only the episodes whose structure label equals structure.

Parameters: structure (str)
Return type: BenchmarkSuite
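
A minimal sketch of the filtering behaviour, assuming that the sub-suite is a new object and the original suite is left untouched (which is what "immutable" suggests). ToySuite and ToyEpisode are stand-ins, the toy constructor does not mirror BenchmarkSuite(meta, episodes), and "other_label" is a placeholder rather than a real structure name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyEpisode:
    structure: Optional[str] = None


@dataclass(frozen=True)
class ToySuite:
    name: str
    episodes: Tuple[ToyEpisode, ...]

    def filter(self, structure: str) -> "ToySuite":
        kept = tuple(ep for ep in self.episodes if ep.structure == structure)
        # Build a new object so the original suite stays untouched.
        return ToySuite(self.name, kept)


suite = ToySuite(
    "toy",
    (ToyEpisode("back_door"), ToyEpisode("other_label"), ToyEpisode()),
)
sub = suite.filter("back_door")
```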

class dotime.benchmarks.Episode(x_obs, x_int, intervention, y_true, query_target, query_time, structure=None, scm_id=None, metadata=<factory>)[source]

Bases: object

A single benchmark trajectory and its associated queries.

Variables:

x_obs – Observational trajectory, shape (T, N).
x_int – Trajectory under the applied intervention (see intervention), shape (T, N).
intervention – The applied intervention specification.
y_true – Ground-truth interventional outcome(s) for the query/queries, shape (n_queries,).
query_target – Index of the queried variable per query, shape (n_queries,).
query_time – Time of each query (a float in [0, 1] for continuous suites, or an integer step), shape (n_queries,).
structure – Identification structure label ("back_door", …), if applicable.
scm_id – Stable id of the generating SCM within the suite.
metadata – Free-form per-episode metadata (effect magnitude, regime count, …).

Parameters:

x_obs (Tensor)
x_int (Tensor)
intervention (InterventionSpec)
y_true (Tensor)
query_target (Tensor)
query_time (Tensor)
structure (str | None)
scm_id (int | None)
metadata (dict)

x_obs: Tensor

x_int: Tensor

intervention: InterventionSpec

y_true: Tensor

query_target: Tensor

query_time: Tensor

structure: str | None = None

scm_id: int | None = None

metadata: dict

property n_vars: int

property length: int
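
The shape conventions listed for Episode can be checked before an episode is used. The sketch below is not part of the package: check_episode_shapes is a hypothetical helper, nested lists stand in for tensors, and the bound on query_target (each index lies in [0, N)) is an assumption inferred from its description as a variable index.

```python
def check_episode_shapes(x_obs, x_int, y_true, query_target, query_time):
    """Validate the documented shapes; return (T, N, n_queries).

    Nested lists stand in for tensors. Raises ValueError on a mismatch.
    """
    def shape2d(rows, label):
        widths = {len(row) for row in rows}
        if len(widths) != 1:
            raise ValueError(f"{label} must be a non-empty rectangular (T, N) array")
        return len(rows), widths.pop()

    obs_shape = shape2d(x_obs, "x_obs")
    int_shape = shape2d(x_int, "x_int")
    if obs_shape != int_shape:
        raise ValueError(f"x_obs {obs_shape} and x_int {int_shape} differ")

    n_queries = len(y_true)
    if len(query_target) != n_queries or len(query_time) != n_queries:
        raise ValueError("y_true, query_target and query_time must share n_queries")

    n_vars = obs_shape[1]
    # Assumption: query_target indexes variables, so it must lie in [0, N).
    if any(not 0 <= int(t) < n_vars for t in query_target):
        raise ValueError("query_target holds an index outside [0, N)")
    return obs_shape[0], n_vars, n_queries


# Three time steps, two variables, one query on variable 1 at time 0.5.
x = [[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]]
dims = check_episode_shapes(x, x, [0.7], [1], [0.5])
```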

class dotime.benchmarks.SuiteMetadata(name, version, zenodo_record_id, doi, description, n_episodes, structures=(), license='CC-BY-4.0', hf_repo_id='')[source]

Bases: object

Static metadata for a released benchmark suite.

Parameters:

name (str)
version (str)
zenodo_record_id (str)
doi (str)
description (str)
n_episodes (int)
structures (tuple[str, ...])
license (str)
hf_repo_id (str)

name: str

version: str

zenodo_record_id: str

doi: str

description: str

n_episodes: int

structures: tuple[str, ...] = ()

license: str = 'CC-BY-4.0'

hf_repo_id: str = ''

property zenodo_files_url: str

dotime.benchmarks.available_suites()[source]

Return the names of all registered benchmark suites.

Return type: list[str]

dotime.benchmarks.load_benchmark(name, version='latest', *, force_download=False, cache_dir=None)[source]

Load a frozen benchmark suite by name.

On first use the suite is downloaded from Zenodo into the cache directory (~/.cache/dotime by default, override with $DOTIME_CACHE or the cache_dir argument). Subsequent calls read from the cache.

Parameters:

name (str) – Suite name, e.g. "dot-Identifiability-v1". See available_suites().
version (str) – Suite version. "latest" resolves to the registered version.
force_download (bool) – Re-download even if a cached copy exists.
cache_dir (str | PathLike[str] | None) – Override the cache root.

Return type: BenchmarkSuite

Returns: The loaded BenchmarkSuite.
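
The cache location rules described for load_benchmark() can be sketched as a small resolver. This is an illustration only: resolve_cache_root is a hypothetical name, and the precedence shown (the cache_dir argument wins over $DOTIME_CACHE, which wins over the default) is an assumption, since the documentation does not state which override takes priority.

```python
import os
from pathlib import Path


def resolve_cache_root(cache_dir=None, environ=None):
    """Pick the cache root: argument, then $DOTIME_CACHE, then the default."""
    environ = os.environ if environ is None else environ
    if cache_dir is not None:
        return Path(cache_dir).expanduser()
    env_value = environ.get("DOTIME_CACHE")
    if env_value:
        return Path(env_value).expanduser()
    # Documented default location.
    return Path.home() / ".cache" / "dotime"


# The environment is passed explicitly here so the example is deterministic.
from_argument = resolve_cache_root("/tmp/a", {"DOTIME_CACHE": "/tmp/b"})
from_environment = resolve_cache_root(None, {"DOTIME_CACHE": "/tmp/b"})
from_default = resolve_cache_root(None, {})
```

Passing the environment as a parameter keeps the resolver easy to test without mutating os.environ.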