Warcraft

Find the cheapest path on a 12×12 terrain map: cell travel costs are unknown and must be inferred from the RGB terrain image using a neural network.

using DecisionFocusedLearningBenchmarks
using Plots

b = WarcraftBenchmark()

WarcraftBenchmark()

Observable input

At inference time the decision-maker observes only the terrain image x (not the costs θ):

sample = generate_dataset(b, 1)[1]
plot_context(b, sample)

A training sample

Each sample is a labeled triple (x, θ, y):

x: terrain image (12×12×3 RGB array; observable at train and test time)
θ: true cell travel costs (training supervision only, hidden at test time)
y: optimal path indicator (y[i,j] = 1 if cell (i,j) is on the path)

Left: terrain image. Middle: true costs θ. Right: optimal path y:

plot_sample(b, sample)

Untrained policy

A DFL policy chains two components: a CNN predicting cell travel costs from the terrain image:

model = generate_statistical_model(b)     # ResNet18 CNN: terrain image → 12×12 cost map

Chain(
  Conv((7, 7), 3 => 64, pad=3, stride=2, bias=false),  # 9_408 parameters
  BatchNorm(64, relu),                  # 128 parameters, plus 128
  MaxPool((3, 3), pad=1, stride=2),
  Parallel(
    PartialFunction(
      "",
      Metalhead.addact,
      (NNlib.relu,),
      NamedTuple(),
    ),
    identity,
    Chain(
      Conv((3, 3), 64 => 64, pad=1, bias=false),  # 36_864 parameters
      BatchNorm(64),                    # 128 parameters, plus 128
      NNlib.relu,
      Conv((3, 3), 64 => 64, pad=1, bias=false),  # 36_864 parameters
      BatchNorm(64),                    # 128 parameters, plus 128
    ),
  ),
  AdaptiveMaxPool((12, 12)),
  DecisionFocusedLearningBenchmarks.Utils.average_tensor,
  DecisionFocusedLearningBenchmarks.Utils.neg_tensor,
  DecisionFocusedLearningBenchmarks.Utils.squeeze_last_dims,
)         # Total: 9 trainable arrays, 83_520 parameters,
          # plus 6 non-trainable, 384 parameters, summarysize 328.945 KiB.

and a maximizer finding the shortest path given those costs:

maximizer = generate_maximizer(b)         # Dijkstra shortest path on the 12×12 grid

dijkstra_maximizer (generic function with 1 method)

An untrained CNN produces a near-uniform cost map, yielding a near-straight path:

θ_pred = model(sample.x)
y_pred = maximizer(θ_pred)
plot_sample(b, DataSample(sample; θ=θ_pred, y=y_pred))

Optimality gap on this sample (lower is better):

compute_gap(b, [sample], model, maximizer)

Float16(0.1515)

In the Warcraft benchmark, each instance is a 12×12 grid representing a Warcraft terrain map. Each cell has an unknown travel cost depending on its terrain type (forest, mountain, water, etc.). The task is to find the path from the top-left to the bottom-right corner that minimizes total travel cost.

Formally, let $\theta_{ij}$ be the (unknown) cost of cell $(i,j)$ and $y_{ij} \in \{0,1\}$ indicate whether cell $(i,j)$ is on the path. The objective is:

\[y^* = \mathop{\mathrm{argmin}}\limits_{y \in \mathcal{P}} \sum_{(i,j)} \theta_{ij} \, y_{ij}\]

where $\mathcal{P}$ is the set of valid grid paths (4-connected, source to sink).

The dataset contains 10 000 labeled terrain images from the Warcraft II tileset. It is downloaded automatically on first use via DataDeps.jl.

Key Components

WarcraftBenchmark has no parameters.

Method	Description
`generate_dataset(b, n)`	Downloads and loads `n` terrain images with true costs and paths
`generate_statistical_model(b)`	ResNet18 CNN (first 5 layers + adaptive maxpool + neg)
`generate_maximizer(b; dijkstra=true)`	Dijkstra or Bellman-Ford shortest path

DFL Policy

\[\xrightarrow[\text{Terrain image}]{x \in \mathbb{R}^{12 \times 12 \times 3}} \fbox{ResNet18 CNN} \xrightarrow[\text{Cell costs}]{\theta \in \mathbb{R}^{12 \times 12}} \fbox{Dijkstra} \xrightarrow[\text{Path}]{y \in \{0,1\}^{12 \times 12}}\]

The CNN maps terrain pixel values to predicted cell costs, which are then passed to a shortest-path solver. Training end-to-end with InferOpt.jl teaches the network to produce costs that lead to good paths, not just accurate cost estimates.

Tip

See the Warcraft tutorial for a complete end-to-end training example using PerturbedMultiplicative and FenchelYoungLoss.

Reference

Vlastelica et al. (2020), Differentiation of Blackbox Combinatorial Solvers, ICLR 2020.

This page was generated using Literate.jl.

Warcraft

Observable input

A training sample

Untrained policy

Problem Description

Key Components

DFL Policy