Using Benchmarks
This guide covers everything you need to work with existing benchmarks in DecisionFocusedLearningBenchmarks.jl: generating datasets, assembling DFL pipeline components, applying algorithms, and evaluating results.
What is a benchmark?
A benchmark bundles a problem family (an instance generator, a combinatorial solver, and a statistical model architecture) into a single object. It provides everything needed to run a Decision-Focused Learning experiment out of the box, without having to create each component from scratch. Three abstract types cover the main settings:
- AbstractStaticBenchmark: static problems (one instance, one decision)
- AbstractStochasticBenchmark{exogenous}: stochastic problems (type parameter indicates whether uncertainty is exogenous)
- AbstractDynamicBenchmark{exogenous}: sequential / multi-stage problems
The sections below explain what changes between these settings. For most purposes, start with a static benchmark to understand the core workflow.
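To illustrate what a Bool type parameter such as {exogenous} buys you, here is a minimal, package-free sketch. Every name in it (ToyStochasticBenchmark, ExogenousToy, EndogenousToy, is_exogenous, describe) is hypothetical and only mirrors the idea; it is not the package's actual type hierarchy.

```julia
# Toy hierarchy with a Bool type parameter (hypothetical names, not the package's types).
abstract type ToyStochasticBenchmark{exogenous} end

struct ExogenousToy <: ToyStochasticBenchmark{true} end
struct EndogenousToy <: ToyStochasticBenchmark{false} end

# Read the flag back from the type parameter.
is_exogenous(::ToyStochasticBenchmark{E}) where {E} = E

# Dispatch on the flag: one method per setting.
describe(::ToyStochasticBenchmark{true}) = "uncertainty does not depend on decisions"
describe(::ToyStochasticBenchmark{false}) = "uncertainty may depend on decisions"

flag = is_exogenous(ExogenousToy())
label = describe(EndogenousToy())
```

Encoding the flag in the type lets generic code select the right method through dispatch instead of branching on a runtime field.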
Core workflow
Every benchmark exposes three key methods. For any static benchmark:
bench = ArgmaxBenchmark()
model = generate_statistical_model(bench; seed=0) # Flux model
maximizer = generate_maximizer(bench) # combinatorial oracle
dataset = generate_dataset(bench, 100; seed=0) # Vector{DataSample}
- generate_statistical_model: returns an untrained neural network that maps input features x to cost parameters θ.
- generate_maximizer: returns a callable (θ; context...) -> y that solves the combinatorial problem given cost parameters.
- generate_dataset: returns labeled training data as a Vector{DataSample}.
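To make the roles of the three components concrete, the following package-free sketch builds toy stand-ins for each of them. It assumes a random linear map in place of the Flux model, a one-hot argmax in place of the combinatorial oracle, and named tuples in place of DataSample. The names toy_model, toy_maximizer and toy_dataset are hypothetical and do not exist in the package.

```julia
using Random

# Stand-in for the statistical model: a fixed random linear map x -> θ (untrained).
function toy_model(; seed=0, n_features=5, n_items=4)
    W = randn(MersenneTwister(seed), n_items, n_features)
    return x -> W * x
end

# Stand-in for the maximizer: one-hot encoding of the largest entry of θ.
# Extra keyword arguments are accepted and ignored, like solver context.
function toy_maximizer(θ; kwargs...)
    y = zeros(Int, length(θ))
    y[argmax(θ)] = 1
    return y
end

# Stand-in for the dataset: features x labeled through a hidden ground-truth map.
function toy_dataset(n; seed=0, n_features=5, n_items=4)
    rng = MersenneTwister(seed)
    W_true = randn(rng, n_items, n_features)
    return map(1:n) do i
        x = randn(rng, n_features)
        (x=x, y=toy_maximizer(W_true * x))
    end
end

model = toy_model(; seed=1)
dataset = toy_dataset(3; seed=0)
θ = model(dataset[1].x)  # predicted cost parameters for the first sample
```

Each piece can be inspected on its own: the model output θ has one entry per item, and every label in the dataset is a one-hot vector.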
At inference time, the model and the maximizer compose naturally into an end-to-end policy:
θ = model(sample.x) # predict cost parameters
y = maximizer(θ; sample.context...) # solve the optimization problem
DataSample at a glance
All data in the package is represented as DataSample objects.
| Field | Type | Description |
|---|---|---|
| x | any | Input features (fed to the statistical model) |
| θ | any | Intermediate cost parameters |
| y | any | Output decision / solution |
| context | NamedTuple | Solver kwargs spread into maximizer(θ; sample.context...) |
| extra | NamedTuple | Non-solver data (scenario, reward, step, …), never passed to the solver |
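The split between context and extra in the table can be illustrated with plain Julia named tuples. This is a sketch under assumptions: the sample below is an ordinary NamedTuple standing in for a DataSample, and top_k_maximizer with its keyword k is a hypothetical solver invented for the example.

```julia
# Plain NamedTuple standing in for a DataSample (not the package's type).
sample = (
    x = [0.2, 0.7, 0.1],
    θ = [1.0, 3.0, 2.0],
    context = (k = 2,),                  # solver keyword arguments
    extra = (scenario = [0.5, 0.5],),    # side data, never passed to the solver
)

# Hypothetical solver: select the k largest entries of θ.
function top_k_maximizer(θ; k=1)
    y = zeros(Int, length(θ))
    y[partialsortperm(θ, 1:k; rev=true)] .= 1
    return y
end

# Splatting the context turns its entries into keyword arguments: here k = 2.
y = top_k_maximizer(sample.θ; sample.context...)
```

Because only context is splatted, anything stored in extra cannot accidentally reach the solver as an unexpected keyword.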
Not all fields are populated in every sample, depending on the setting. For convenience, named entries inside context and extra can be accessed directly on the sample via property forwarding:
sample.instance # looks up :instance in context first, then in extra
sample.scenario # looks up :scenario in context first, then in extra
Benchmark type specifics
Static benchmarks
For static benchmarks (<:AbstractStaticBenchmark), generate_dataset may compute a default ground-truth label y if the benchmark implements it:
bench = ArgmaxBenchmark()
dataset = generate_dataset(bench, 100; seed=0) # Vector{DataSample} with x, y, context
You can override the labels by providing a target_policy:
my_policy = sample -> DataSample(; sample.context..., x=sample.x, y=my_algorithm(sample.instance))
dataset = generate_dataset(bench, 100; seed=0, target_policy=my_policy)
Stochastic benchmarks (exogenous)
For AbstractStochasticBenchmark{true} benchmarks, the default call returns unlabeled samples. Each sample carries one scenario in sample.extra.scenario:
bench = StochasticVehicleSchedulingBenchmark()
dataset = generate_dataset(bench, 20; seed=0) # y = nothing
Request multiple scenarios per instance with nb_scenarios:
dataset = generate_dataset(bench, 20; seed=0, nb_scenarios=5)
# returns 20 × 5 = 100 samples
To compute labels, wrap your algorithm as a target_policy:
anticipative = generate_anticipative_solver(bench) # (scenario; kwargs...) -> y
policy = (sample, scenarios) -> [
    DataSample(; sample.context..., x=sample.x,
        y=anticipative(ξ; sample.context...))
    for ξ in scenarios
]
labeled = generate_dataset(bench, 20; seed=0, nb_scenarios=5, target_policy=policy)
Dynamic benchmarks
Dynamic benchmarks use a two-step workflow:
bench = DynamicVehicleSchedulingBenchmark()
# Step 1: create environments (reusable across experiments)
envs = generate_environments(bench, 10; seed=0)
# Step 2: roll out a policy to collect training trajectories
policy = generate_baseline_policies(bench)[1] # e.g. lazy policy
dataset = generate_dataset(bench, envs; target_policy=policy)
# dataset is a flat Vector{DataSample} of all steps across all trajectories
target_policy is required to create datasets for dynamic benchmarks (there is no default label). It must be a callable (env) -> Vector{DataSample} that performs a full episode rollout and returns the resulting trajectory.
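The shape of such a rollout policy, and the way per-episode trajectories flatten into a single dataset, can be sketched without the package. Everything below is hypothetical: ToyEnv, is_done and toy_rollout are invented for illustration, and named tuples stand in for DataSample.

```julia
# Toy environment: a step counter that terminates after `horizon` steps.
mutable struct ToyEnv
    state::Int
    horizon::Int
end

is_done(env::ToyEnv) = env.state >= env.horizon

# A policy in the (env) -> Vector form: runs one full episode and
# returns one sample per step.
function toy_rollout(env::ToyEnv)
    trajectory = NamedTuple[]
    while !is_done(env)
        x = Float64(env.state)   # features observed at this step
        y = iseven(env.state)    # toy decision taken at this step
        push!(trajectory, (x=x, y=y, step=env.state))
        env.state += 1           # advance the environment
    end
    return trajectory
end

envs = [ToyEnv(0, h) for h in (3, 5)]

# Concatenate all trajectories into one flat vector of samples.
flat_dataset = reduce(vcat, toy_rollout.(envs))
```

Here the two episodes contribute 3 and 5 steps, so the flat dataset holds 8 samples, and the step entry records where each one came from within its episode.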
Seed / RNG control
All generate_dataset and generate_environments calls accept either seed (creates an internal MersenneTwister) or rng for full control:
using Random
rng = MersenneTwister(42)
dataset = generate_dataset(bench, 50; rng=rng)
Evaluation
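As background for the evaluation helpers, here is a rough, package-free illustration of what an average relative optimality gap measures. It assumes a maximization problem with a linear objective and uses one common definition, (optimal - achieved) / |optimal|; the package's compute_gap may be defined differently. The names one_hot_argmax and relative_gap are hypothetical.

```julia
using LinearAlgebra: dot
using Statistics: mean

# Toy maximizer: one-hot encoding of the largest entry of θ.
function one_hot_argmax(θ)
    y = zeros(Int, length(θ))
    y[argmax(θ)] = 1
    return y
end

# Relative gap of decision y against the optimum under the true costs θ_true.
function relative_gap(θ_true, y)
    best = dot(θ_true, one_hot_argmax(θ_true))
    return (best - dot(θ_true, y)) / abs(best)
end

θ_true = [[1.0, 4.0, 2.0], [3.0, 1.0, 2.0]]   # ground-truth cost parameters
θ_pred = [[0.5, 1.0, 2.0], [2.0, 0.1, 0.3]]   # predicted cost parameters

# Decide with the predictions, score with the truth, then average.
gaps = [relative_gap(t, one_hot_argmax(p)) for (t, p) in zip(θ_true, θ_pred)]
average_gap = mean(gaps)
```

In this toy data the first prediction picks an item worth 2.0 instead of 4.0 (gap 0.5) and the second picks the optimal item (gap 0), giving an average of 0.25.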
# Average relative optimality gap across a dataset
gap = compute_gap(bench, dataset, model, maximizer)
Objective value for a single decision:
obj = objective_value(bench, sample, y)
Baseline policies
generate_baseline_policies returns a collection of named callables that can serve as reference points or as target_policy arguments:
policies = generate_baseline_policies(bench)
pol = policies[1] # e.g. greedy, lazy, or anticipative policy
- Static / stochastic: pol(sample) -> DataSample
- Dynamic: pol(env) -> Vector{DataSample} (full episode trajectory)
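One way to picture a "named callable" is a small wrapper that stores a name next to a function and forwards calls to it. The sketch below is an assumption about the general pattern, not the package's actual policy type: NamedPolicy and greedy are hypothetical, and a named tuple stands in for DataSample.

```julia
# Hypothetical wrapper: a policy is a function plus a human-readable name.
struct NamedPolicy{F}
    name::String
    f::F
end

# Calling the wrapper forwards all arguments to the wrapped function.
(p::NamedPolicy)(args...; kwargs...) = p.f(args...; kwargs...)

# Static-style policy: takes one sample and returns a labeled copy of it.
greedy = NamedPolicy("greedy", sample -> merge(sample, (y = argmax(sample.θ),)))

sample = (x = [1.0, 2.0], θ = [0.3, 0.9])
labeled = greedy(sample)
```

A dynamic-style policy would follow the same pattern with a function of the environment that returns a vector of samples.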
For dynamic benchmarks you can evaluate a policy over multiple episodes:
rewards, samples = evaluate_policy!(pol, envs, n_episodes)
Visualization
Plots is an optional dependency: load it with using Plots to unlock the plot functions. Not all benchmarks support visualization; call has_visualization(bench) to check.
using Plots
bench = Argmax2DBenchmark()
dataset = generate_dataset(bench, 10)
sample = dataset[1]
has_visualization(bench) # true
plot_instance(bench, sample) # problem geometry only
plot_solution(bench, sample) # sample.y overlaid on the instance
plot_solution(bench, sample, y) # convenience 3-arg form: override y before plotting
# Dynamic benchmarks only: from here on, bench is assumed to be a dynamic benchmark
# and env one of its environments (e.g. an element of generate_environments(bench, n))
traj = generate_anticipative_solver(bench)(env)
plot_trajectory(bench, traj) # grid of epoch subplots
anim = animate_trajectory(bench, traj; fps=2)
gif(anim, "episode.gif")
- has_visualization(bench): returns true for benchmarks that implement plot support (if Plots is loaded).
- plot_instance(bench, sample; kwargs...): renders the problem geometry without any solution.
- plot_solution(bench, sample; kwargs...): renders sample.y overlaid on the instance.
- plot_solution(bench, sample, y; kwargs...): 3-arg convenience form that overrides y before plotting.
- plot_trajectory(bench, traj; kwargs...): dynamic benchmarks only; produces a grid of per-epoch subplots.
- animate_trajectory(bench, traj; kwargs...): dynamic benchmarks only; returns a Plots.Animation that can be saved with gif(anim, "file.gif").
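Since not every benchmark supports visualization, a script that loops over several benchmarks may want to guard its plot calls. The following is a minimal sketch of that guard pattern using toy types only; ToyBenchmark, PlottableToy, PlainToy, toy_has_visualization, toy_plot_instance and try_plot are all hypothetical, and the "plot" is just a string so the example runs without Plots.

```julia
# Toy benchmark types (hypothetical, not the package's).
abstract type ToyBenchmark end
struct PlottableToy <: ToyBenchmark end
struct PlainToy <: ToyBenchmark end

# Default answer is false; a benchmark opts in by adding a method.
toy_has_visualization(::ToyBenchmark) = false
toy_has_visualization(::PlottableToy) = true

# Only the plottable benchmark defines a plotting method.
toy_plot_instance(::PlottableToy) = "plot of a PlottableToy instance"

# Guard: check support first, and skip benchmarks without it.
function try_plot(bench::ToyBenchmark)
    toy_has_visualization(bench) || return nothing
    return toy_plot_instance(bench)
end

results = [try_plot(b) for b in (PlottableToy(), PlainToy())]
```

Without the guard, calling the plot function on PlainToy would raise a MethodError, which is the situation a has_visualization-style check lets you avoid.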