benchmarks

Frozen benchmark suites for DoTime.

This module exposes the consumer side of the released benchmarks: loading a versioned, immutable suite (downloaded from Zenodo and cached on first use) and iterating over its episodes for evaluation.

Public surface

BenchmarkSuite, Episode, SuiteMetadata, available_suites(), load_benchmark().

Notes for implementers

The download-and-parse path is stubbed wherever it touches real artifacts; those places are marked TODO(release). The frozen on-disk format is one directory per suite, containing a manifest.json plus one or more parquet shards in the tidy schema produced by scripts/build_release.py. Wire _parse_suite_dir() to that schema.
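As a rough illustration of the directory layout described above, the sketch below locates the manifest and the shards of one suite directory using only the standard library. The helper name list_suite_files, the assumption that shards sit next to the manifest, and the ".parquet" suffix are all hypothetical; the actual manifest keys and the decoding of shards into Episode objects are defined by scripts/build_release.py and are not shown.

```python
import json
from pathlib import Path


def list_suite_files(suite_dir):
    """Return (manifest, shard_paths) for one per-suite directory.

    Hypothetical helper, not part of dotime. It only locates the files;
    turning parquet rows into Episode objects is deliberately left out.
    """
    suite_dir = Path(suite_dir)
    manifest_path = suite_dir / "manifest.json"
    if not manifest_path.is_file():
        raise FileNotFoundError(f"missing manifest: {manifest_path}")
    with manifest_path.open(encoding="utf-8") as fh:
        manifest = json.load(fh)

    # Assumption: shards live beside the manifest and end in ".parquet".
    # Sorting makes the shard order deterministic across platforms.
    shards = sorted(suite_dir.glob("*.parquet"))
    if not shards:
        raise FileNotFoundError(f"no parquet shards in {suite_dir}")
    return manifest, shards
```

Failing early on a missing manifest or an empty shard list keeps a partially downloaded cache entry from being mistaken for a valid suite.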

class dotime.benchmarks.BenchmarkSuite(meta, episodes)[source]

Bases: object

A named, versioned, immutable collection of Episode objects.

Parameters:
  • meta

  • episodes

by_structure()[source]

Yield (structure_name, episodes) groups.

Episodes whose structure is None are grouped under "_all".

Return type:

Iterator[tuple[str, list[Episode]]]

filter(structure)[source]

Return a sub-suite containing only the episodes with the given structure label.

Return type:

BenchmarkSuite

Parameters:

structure (str)
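The grouping and filtering semantics of by_structure() and filter() can be sketched on plain Python objects. ToyEpisode, group_by_structure and filter_by_structure are hypothetical stand-ins, not the library's implementation; the sketch assumes that groups come out in first-seen order and that filtering is an exact match on the label. The "front_door" label is illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyEpisode:
    """Stand-in for Episode that keeps only the fields used for grouping."""
    scm_id: int
    structure: Optional[str] = None


def group_by_structure(episodes):
    """Yield (structure_name, episodes) pairs; None is filed under "_all"."""
    groups = defaultdict(list)
    for ep in episodes:
        key = ep.structure if ep.structure is not None else "_all"
        groups[key].append(ep)
    # Dicts preserve insertion order, so groups appear in first-seen order.
    yield from groups.items()


def filter_by_structure(episodes, structure):
    """Keep only the episodes whose structure label equals `structure`."""
    return [ep for ep in episodes if ep.structure == structure]


episodes = [
    ToyEpisode(0, "back_door"),
    ToyEpisode(1),                 # no structure label
    ToyEpisode(2, "front_door"),   # illustrative label
    ToyEpisode(3, "back_door"),
]
for name, group in group_by_structure(episodes):
    print(name, [ep.scm_id for ep in group])
```

With the real classes, the same pattern would typically be used to report metrics per identification structure rather than over the whole suite.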

class dotime.benchmarks.Episode(x_obs, x_int, intervention, y_true, query_target, query_time, structure=None, scm_id=None, metadata=<factory>)[source]

Bases: object

A single benchmark trajectory and its associated queries.

Variables:
  • x_obs – Observational trajectory, shape (T, N).

  • x_int – Interventional trajectory under intervention, shape (T, N).

  • intervention – The applied intervention specification.

  • y_true – Ground-truth interventional outcome(s) for the query/queries, shape (n_queries,).

  • query_target – Index of the queried variable per query, shape (n_queries,).

  • query_time – Query time (float in [0, 1] for continuous suites, or int step), shape (n_queries,).

  • structure – Identification structure label ("back_door", …), if applicable.

  • scm_id – Stable id of the generating SCM within the suite.

  • metadata – Free-form per-episode metadata (effect magnitude, regime count, …).

Parameters:
x_obs: Tensor
x_int: Tensor
intervention: InterventionSpec
y_true: Tensor
query_target: Tensor
query_time: Tensor
structure: str | None = None
scm_id: int | None = None
metadata: dict
property n_vars: int
property length: int
class dotime.benchmarks.SuiteMetadata(name, version, zenodo_record_id, doi, description, n_episodes, structures=(), license='CC-BY-4.0', hf_repo_id='')[source]

Bases: object

Static metadata for a released benchmark suite.

Parameters:
name: str
version: str
zenodo_record_id: str
doi: str
description: str
n_episodes: int
structures: tuple[str, ...] = ()
license: str = 'CC-BY-4.0'
hf_repo_id: str = ''
property zenodo_files_url: str
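The page does not say how zenodo_files_url is built. One plausible reading, shown here purely as an assumption, is that it is derived from zenodo_record_id and points at Zenodo's REST endpoint for a record's file listing. ToyMetadata is a hypothetical stand-in carrying only the fields needed for the sketch, and the record id is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMetadata:
    """Stand-in for SuiteMetadata with only the fields used below."""
    name: str
    version: str
    zenodo_record_id: str

    @property
    def zenodo_files_url(self) -> str:
        # Assumption: the URL targets the record's file listing on Zenodo.
        # The real property may use a different endpoint.
        return f"https://zenodo.org/api/records/{self.zenodo_record_id}/files"


meta = ToyMetadata(name="demo-suite", version="1.0", zenodo_record_id="0000000")
print(meta.zenodo_files_url)
```

Deriving the URL from the record id, rather than storing it, keeps the metadata record minimal and avoids the two fields drifting apart.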
dotime.benchmarks.available_suites()[source]

Return the names of all registered benchmark suites.

Return type:

list[str]

dotime.benchmarks.load_benchmark(name, version='latest', *, force_download=False, cache_dir=None)[source]

Load a frozen benchmark suite by name.

On first use the suite is downloaded from Zenodo into the cache directory (~/.cache/dotime by default, override with $DOTIME_CACHE or the cache_dir argument). Subsequent calls read from the cache.

Parameters:
  • name (str) – Suite name, e.g. "dot-Identifiability-v1". See available_suites().

  • version (str) – Suite version. "latest" resolves to the registered version.

  • force_download (bool) – Re-download even if a cached copy exists.

  • cache_dir (str | PathLike[str] | None) – Override the cache root.

Return type:

BenchmarkSuite

Returns:

BenchmarkSuite
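The cache lookup described for load_benchmark() can be sketched as a small resolver. The precedence used here (explicit cache_dir argument first, then $DOTIME_CACHE, then ~/.cache/dotime) is an assumption: the text only says that both overrides exist, not which one wins. resolve_cache_root is a hypothetical helper name, and the environ parameter exists only to make the function easy to test.

```python
import os
from pathlib import Path


def resolve_cache_root(cache_dir=None, environ=None):
    """Pick the cache root for downloaded suites.

    Hypothetical helper, not part of dotime. Assumed precedence:
    cache_dir argument, then $DOTIME_CACHE, then ~/.cache/dotime.
    """
    environ = os.environ if environ is None else environ
    if cache_dir is not None:
        return Path(cache_dir).expanduser()
    env_value = environ.get("DOTIME_CACHE")
    if env_value:  # treat an empty variable as unset
        return Path(env_value).expanduser()
    return Path.home() / ".cache" / "dotime"


print(resolve_cache_root())                      # default or $DOTIME_CACHE
print(resolve_cache_root(cache_dir="~/suites"))  # explicit override
```

Under this reading, force_download=True would bypass whatever already sits under the resolved root and fetch the suite again.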