API Reference

Types

InferOpt.AbstractLayerType
AbstractLayer

Supertype for all the layers defined in InferOpt.

All of these layers are callable and differentiable with any ChainRules-compatible autodiff backend.

Interface

  • (layer::AbstractLayer)(args...; kwargs...)
source
InferOpt.AbstractLossLayerType
AbstractLossLayer <: AbstractLayer

Supertype for all the loss layers defined in InferOpt.

Depending on the specific loss, the arguments to the layer may vary.

Interface

  • (layer::AbstractLossLayer)(θ; kwargs...) or
  • (layer::AbstractLossLayer)(θ, θ_true; kwargs...) or
  • (layer::AbstractLossLayer)(θ, y_true; kwargs...) or
  • (layer::AbstractLossLayer)(θ, (; θ_true, y_true); kwargs...)
source
InferOpt.AbstractOptimizationLayerType
AbstractOptimizationLayer <: AbstractLayer

Supertype for all the optimization layers defined in InferOpt.

Interface

  • (layer::AbstractOptimizationLayer)(θ; kwargs...)
  • compute_probability_distribution(layer, θ; kwargs...) (only if the layer is probabilistic)
source
InferOpt.AbstractPerturbedType
AbstractPerturbed{F,parallel} <: AbstractOptimizationLayer

Differentiable perturbation of a black box optimizer of type F.

The type parameter parallel is a boolean value indicating whether the perturbations are run in parallel. This is particularly useful when your black-box optimizer has a long running time.

Available implementations:

  • PerturbedAdditive
  • PerturbedMultiplicative
  • PerturbedOracle

These three subtypes share the following fields:

  • oracle: black box (optimizer)
  • perturbation::P: perturbation distribution of the input θ
  • grad_logdensity::G: gradient of the log density perturbation w.r.t. input θ
  • nb_samples::Int: number of perturbation samples drawn at each forward pass
  • seed::Union{Nothing,Int}: seed of the perturbation. It is reset each time the forward pass is called, making it deterministic by always drawing the same perturbations. If you do not want this behaviour, set this field to nothing.
  • rng::AbstractRNG: random number generator using the seed.
Warning

The perturbation field does not mean the same thing for a PerturbedOracle as for a PerturbedAdditive/PerturbedMultiplicative. See their respective docs.

source
InferOpt.AbstractRegularizedType
AbstractRegularized <: AbstractOptimizationLayer

Convex regularization of a black-box linear optimizer

ŷ(θ) = argmax_{y ∈ C} {θᵀy - Ω(y)}

Interface

  • (regularized::AbstractRegularized)(θ; kwargs...): return ŷ(θ)
  • compute_regularization(regularized, y): return Ω(y)

Available implementations

  • SoftArgmax
  • SparseArgmax
  • SoftRank
  • RegularizedFrankWolfe

source
InferOpt.AbstractRegularizedGeneralizedMaximizerType
AbstractRegularizedGeneralizedMaximizer <: AbstractRegularized

Convex regularization of a black-box generalized optimizer

ŷ(θ) = argmax_{y ∈ C} {θᵀg(y) + h(y) - Ω(y)}
where g and h are functions of y.

Interface

  • (regularized::AbstractRegularized)(θ; kwargs...): return ŷ(θ)
  • compute_regularization(regularized, y): return Ω(y)
  • get_maximizer(regularized): return the associated GeneralizedMaximizer optimizer
source
InferOpt.FenchelYoungLossType
FenchelYoungLoss <: AbstractLossLayer

Fenchel-Young loss associated with a given optimization layer.

L(θ, y_true) = (Ω(y_true) - θᵀy_true) - (Ω(ŷ) - θᵀŷ)

Reference: https://arxiv.org/abs/1901.02324

Fields

  • optimization_layer::AbstractOptimizationLayer: optimization layer that can be formulated as ŷ(θ) = argmax {θᵀy - Ω(y)} (either regularized or perturbed)
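
Example

A minimal sketch (not part of the API): it wraps a PerturbedAdditive layer, documented below, in a FenchelYoungLoss and differentiates the loss with Zygote; the toy maximizer and the parameter values are illustrative assumptions.

    using InferOpt, Zygote

    # Toy linear maximizer: one-hot vector of the argmax of θ (illustrative only).
    maximizer(θ; kwargs...) = float.(θ .== maximum(θ))

    layer = PerturbedAdditive(maximizer; ε=1.0, nb_samples=10)
    loss = FenchelYoungLoss(layer)

    θ = [0.2, 0.8, 0.5]
    y_true = [0.0, 1.0, 0.0]

    value = loss(θ, y_true)                              # scalar loss
    grad = Zygote.gradient(θ -> loss(θ, y_true), θ)[1]   # gradient w.r.t. θ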
source
InferOpt.FixedAtomsProbabilityDistributionType
FixedAtomsProbabilityDistribution{A,W}

Encodes a probability distribution with finite support and fixed atoms.

See compute_expectation to understand the name of this struct.

Fields

  • atoms::Vector{A}: elements of the support
  • weights::Vector{W}: probability values for each atom (must sum to 1)
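
Example

A minimal sketch, assuming the positional constructor (atoms, weights) implied by the fields above; the atoms and weights are illustrative.

    using InferOpt

    # Two candidate solutions (atoms) with weights summing to 1.
    dist = FixedAtomsProbabilityDistribution([[1.0, 0.0], [0.0, 1.0]], [0.3, 0.7])

    compute_expectation(dist)   # 0.3 * [1, 0] + 0.7 * [0, 1] = [0.3, 0.7]
    rand(dist)                  # draws one atom according to the weights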
source
InferOpt.ImitationLossType
ImitationLoss <: AbstractLossLayer

Generic imitation loss of the form

L(θ, t_true) = max_y {δ(y, t_true) + α θᵀ(y - y_true) - (Ω(y) - Ω(y_true))}

Note: by default, t_true is a named tuple with field y_true, but it can be any data structure for which the get_y_true method is implemented.

Fields

  • aux_loss_maximizer: function of (θ, t_true, α) that computes the argmax in the problem above
  • δ: base loss function
  • Ω: regularization function
  • α::Float64: hyperparameter with a default value of 1.0
source
InferOpt.InterpolationType
Interpolation <: AbstractOptimizationLayer

Piecewise-linear interpolation of a black-box optimizer.

Fields

  • maximizer: underlying argmax function
  • λ::Float64: smoothing parameter (smaller = more faithful approximation, larger = more informative gradients)

Reference: https://arxiv.org/abs/1912.02175

source
InferOpt.PerturbedAdditiveMethod
PerturbedAdditive(oracle[; ε, nb_samples, seed, is_parallel, perturbation, grad_logdensity, rng])

PerturbedAdditive constructor.

Arguments

  • oracle: the black-box oracle we want to differentiate through. It should be a linear maximizer if you want to use it inside a FenchelYoungLoss.

Keyword arguments (optional)

  • ε=1.0: size of the perturbation.
  • nb_samples::Int=1: number of perturbation samples drawn at each forward pass.
  • perturbation=nothing: nothing by default. If you want to use a distribution other than a Normal for the perturbation z, give it here as a distribution-like object implementing the rand method. It should also implement logdensityof if grad_logdensity is not given.
  • grad_logdensity=nothing: gradient function of perturbation w.r.t. θ. If set to nothing (default), it's computed using automatic differentiation.
  • seed::Union{Nothing,Int}=nothing: seed of the perturbation. It is reset each time the forward pass is called, making it deterministic by always drawing the same perturbations. If you do not want this behaviour, set this field to nothing.
  • rng::AbstractRNG=MersenneTwister(0): random number generator using the seed.
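
Example

A minimal usage sketch; the toy maximizer and parameter values are illustrative assumptions, not part of the API.

    using InferOpt

    # Toy black-box maximizer: one-hot vector of the argmax of θ (illustrative only).
    maximizer(θ; kwargs...) = float.(θ .== maximum(θ))

    layer = PerturbedAdditive(maximizer; ε=0.5, nb_samples=10, seed=0)

    θ = [1.0, 3.0, 2.0]
    ŷ = layer(θ)   # average of the maximizer outputs over the perturbed inputs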
source
InferOpt.PerturbedMultiplicativeType
PerturbedMultiplicative{P,G,O,R,S,parallel} <: AbstractPerturbed{parallel}

Differentiable multiplicative perturbation of a black-box oracle: the input undergoes θ -> θ ⊙ exp[εZ - ε²/2] where Z ∼ perturbation.

This AbstractOptimizationLayer is compatible with FenchelYoungLoss, provided the oracle is a maximizer with a linear objective.

Reference: https://arxiv.org/abs/2207.13513

See AbstractPerturbed for more details.

Specific field

  • ε::Float64: size of the perturbation
source
InferOpt.PerturbedMultiplicativeMethod
PerturbedMultiplicative(oracle[; ε, nb_samples, seed, is_parallel, perturbation, grad_logdensity, rng])

PerturbedMultiplicative constructor.

Arguments

  • oracle: the black-box oracle we want to differentiate through. It should be a linear maximizer if you want to use it inside a FenchelYoungLoss.

Keyword arguments (optional)

  • ε=1.0: size of the perturbation.
  • nb_samples::Int=1: number of perturbation samples drawn at each forward pass.
  • perturbation=nothing: nothing by default. If you want to use a distribution other than a Normal for the perturbation z, give it here as a distribution-like object implementing the rand method. It should also implement logdensityof if grad_logdensity is not given.
  • grad_logdensity=nothing: gradient function of perturbation w.r.t. θ. If set to nothing (default), it's computed using automatic differentiation.
  • seed::Union{Nothing,Int}=nothing: seed of the perturbation. It is reset each time the forward pass is called, making it deterministic by always drawing the same perturbations. If you do not want this behaviour, set this field to nothing.
  • rng::AbstractRNG=MersenneTwister(0): random number generator using the seed.
source
InferOpt.PerturbedOracleType
PerturbedOracle{P,G,O,R,S,parallel} <: AbstractPerturbed{parallel}

Differentiable perturbed black-box oracle. The oracle input θ is perturbed as η ∼ perturbation(⋅|θ). PerturbedAdditive is a special case of PerturbedOracle with perturbation(θ) = MvNormal(θ, ε * I). PerturbedMultiplicative is also a special case of PerturbedOracle.

See AbstractPerturbed for more details about its fields.

source
InferOpt.PerturbedOracleMethod
PerturbedOracle(perturbation, oracle[; grad_logdensity, rng, seed, is_parallel, nb_samples])

PerturbedOracle constructor.

Arguments

  • oracle: the black-box oracle we want to differentiate through
  • perturbation: should be a callable such that perturbation(θ) is a distribution-like object that can be sampled with rand. It should also implement logdensityof if grad_logdensity is not given.

Keyword arguments (optional)

  • grad_logdensity=nothing: gradient function of perturbation w.r.t. θ. If set to nothing (default), it's computed using automatic differentiation.
  • nb_samples::Int=1: number of perturbation samples drawn at each forward pass
  • seed::Union{Nothing,Int}=nothing: seed of the perturbation. It is reset each time the forward pass is called, making it deterministic by always drawing the same perturbations. If you do not want this behaviour, set this field to nothing.
  • rng::AbstractRNG=MersenneTwister(0): random number generator using the seed.
Info

If you have access to an analytical expression for grad_logdensity, it is recommended to provide it, as it will be computationally faster.
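
Example

A minimal sketch, assuming Distributions.jl is available for the perturbation distribution; the Gaussian perturbation below mimics a PerturbedAdditive with ε = 0.5 and is purely illustrative.

    using Distributions, InferOpt, LinearAlgebra

    # Toy black-box maximizer (illustrative only).
    maximizer(θ; kwargs...) = float.(θ .== maximum(θ))

    # Distribution-like callable: perturbation(θ) supports rand and logdensityof.
    perturbation(θ) = MvNormal(θ, 0.5^2 * I)

    layer = PerturbedOracle(perturbation, maximizer; nb_samples=10, seed=0)
    layer([1.0, 3.0, 2.0])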

source
InferOpt.PushforwardType
Pushforward <: AbstractLayer

Differentiable pushforward of a probabilistic optimization layer with an arbitrary post-processing function.

Pushforward can be used for direct regret minimization (aka learning by experience) when the post-processing returns a cost.

Fields

  • optimization_layer::AbstractOptimizationLayer: probabilistic optimization layer
  • post_processing: callable

See also: FixedAtomsProbabilityDistribution.
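
Example

A minimal sketch of learning by experience, assuming the positional constructor Pushforward(optimization_layer, post_processing) implied by the fields above; the cost vector and maximizer are illustrative.

    using InferOpt, LinearAlgebra

    true_costs = [1.0, 2.0, 0.5]
    maximizer(θ; kwargs...) = float.(θ .== maximum(θ))   # toy maximizer
    layer = PerturbedAdditive(maximizer; ε=1.0, nb_samples=10)

    cost(y) = dot(true_costs, y)            # post-processing returning a scalar cost
    pushforward = Pushforward(layer, cost)

    θ = [0.1, 0.4, 0.2]
    pushforward(θ)   # expected cost, differentiable w.r.t. θ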

source
InferOpt.RegularizedFrankWolfeType
RegularizedFrankWolfe <: AbstractRegularized

Regularized optimization layer which relies on the Frank-Wolfe algorithm to define a probability distribution while solving

ŷ(θ) = argmax_{y ∈ C} {θᵀy - Ω(y)}
Warning

Since this is a conditional dependency, you need to have loaded the package DifferentiableFrankWolfe.jl before using RegularizedFrankWolfe.

Fields

  • linear_maximizer: linear maximization oracle θ -> argmax_{x ∈ C} θᵀx, implicitly defines the polytope C
  • Ω: regularization function Ω(y)
  • Ω_grad: gradient function of the regularization function ∇Ω(y)
  • frank_wolfe_kwargs: named tuple of keyword arguments passed to the Frank-Wolfe algorithm

Frank-Wolfe parameters

Some values you can tune:

  • epsilon::Float64: precision target
  • max_iteration::Integer: max number of iterations
  • timeout::Float64: max runtime in seconds
  • lazy::Bool: caching strategy
  • away_steps::Bool: avoid zig-zagging
  • line_search::FrankWolfe.LineSearchMethod: step size selection
  • verbose::Bool: console output

See the documentation of FrankWolfe.jl for details.
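
Example

A construction sketch; the keyword-based constructor below is an assumption inferred from the fields listed above, and the toy maximizer and quadratic regularization are placeholders.

    using DifferentiableFrankWolfe, FrankWolfe, InferOpt

    linear_maximizer(θ; kwargs...) = float.(θ .== maximum(θ))   # toy linear oracle
    Ω(y) = 0.5 * sum(abs2, y)                                   # quadratic regularization
    Ω_grad(y) = y

    layer = RegularizedFrankWolfe(
        linear_maximizer;
        Ω=Ω,
        Ω_grad=Ω_grad,
        frank_wolfe_kwargs=(; max_iteration=100, epsilon=1e-4),
    )
    ŷ = layer([1.0, 3.0, 2.0])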

source
InferOpt.RegularizedFrankWolfeMethod
(regularized::RegularizedFrankWolfe)(θ; kwargs...)

Apply compute_probability_distribution(regularized, θ; kwargs...) and return the expectation.

source
InferOpt.SPOPlusLossType
SPOPlusLoss <: AbstractLossLayer

Convex surrogate of the Smart "Predict-then-Optimize" loss.

Fields

  • maximizer: linear maximizer function of the form θ -> ŷ(θ) = argmax θᵀy
  • α::Float64: convexification parameter, default = 2.0

Reference: https://arxiv.org/abs/1710.08005
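
Example

A minimal sketch (the maximizer and the vectors are illustrative); per the AbstractLossLayer interface, the loss is called on (θ, θ_true).

    using InferOpt

    maximizer(θ; kwargs...) = float.(θ .== maximum(θ))   # toy linear maximizer
    loss = SPOPlusLoss(maximizer)                        # default α = 2.0

    θ = [0.5, 2.0, 1.0]       # predicted weights
    θ_true = [1.0, 0.0, 3.0]  # true weights
    loss(θ, θ_true)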

source
InferOpt.SoftArgmaxType
SoftArgmax <: AbstractRegularized

Soft argmax activation function s(z) = (e^zᵢ / ∑ e^zⱼ)ᵢ.

Corresponds to regularized prediction on the probability simplex with entropic penalty.
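
Example

A minimal sketch:

    using InferOpt

    soft = SoftArgmax()
    soft([1.0, 2.0, 3.0])   # strictly positive entries summing to 1 (≈ softmax)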

source
InferOpt.SoftRankType
SoftRank{is_l2_regularized} <: AbstractRegularized

Fast differentiable ranking layer. It uses an L2 regularization if is_l2_regularized is true, otherwise an entropic (KL) regularization.

As an AbstractRegularized layer, it can also be used for supervised learning with a FenchelYoungLoss.

Fields

  • ε::Float64: size of the regularization
  • rev::Bool: rank in ascending order if false

Reference: https://arxiv.org/abs/2002.08871

source
InferOpt.SoftRankMethod
SoftRank(; ε::Float64=1.0, rev::Bool=false, is_l2_regularized::Bool=true)

Constructor for SoftRank.

Arguments

  • ε::Float64=1.0: size of the regularization
  • rev::Bool=false: rank in ascending order if false
  • is_l2_regularized::Bool=true: use L2 regularization if true, else KL regularization
source
InferOpt.SoftSortType
SoftSort{is_l2_regularized} <: AbstractOptimizationLayer

Fast differentiable sorting optimization layer. It uses an L2 regularization if is_l2_regularized is true, otherwise an entropic (KL) regularization.

Reference: https://arxiv.org/abs/2002.08871

Fields

  • ε::Float64: size of the regularization
  • rev::Bool: sort in ascending order if false
source
InferOpt.SoftSortMethod
SoftSort(; ε::Float64=1.0, rev::Bool=false, is_l2_regularized::Bool=true)

Constructor for SoftSort.

Arguments

  • ε::Float64=1.0: size of the regularization
  • rev::Bool=false: sort in ascending order if false
  • is_l2_regularized::Bool=true: use L2 regularization if true, else KL regularization
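
Example

A minimal sketch:

    using InferOpt

    layer = SoftSort(; ε=0.1, rev=false)
    layer([3.0, 1.0, 2.0])   # ≈ [1.0, 2.0, 3.0] for small ε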
source
InferOpt.SparseArgmaxType
SparseArgmax <: AbstractRegularized

Compute the Euclidean projection of the vector z onto the probability simplex.

Corresponds to regularized prediction on the probability simplex with squared norm penalty.
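
Example

A minimal sketch:

    using InferOpt

    sparse_argmax = SparseArgmax()
    sparse_argmax([1.0, 2.0, 3.0])   # projection onto the simplex; may contain exact zeros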

source

Functions

Base.randMethod
rand([rng,] probadist)

Sample from the atoms of probadist according to their weights.

source
InferOpt.apply_on_atomsMethod
apply_on_atoms(post_processing, probadist)

Create a new distribution by applying the function post_processing to each atom of probadist (the weights remain the same).

source
InferOpt.compute_expectationFunction
compute_expectation(probadist[, post_processing=identity])

Compute the expectation of post_processing(X) where X is a random variable distributed according to probadist.

This operation is made differentiable thanks to a custom reverse rule, even when post_processing itself is not a differentiable function.

Warning

Derivatives are computed with respect to probadist.weights only, assuming that probadist.atoms doesn't change (hence the name FixedAtomsProbabilityDistribution).

source
InferOpt.compute_probability_distributionMethod
compute_probability_distribution(perturbed::AbstractPerturbed, θ; kwargs...)

Turn random perturbations of θ into a distribution on polytope vertices.

Keyword arguments are passed to the underlying linear maximizer.
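
Example

A minimal sketch with an illustrative toy maximizer:

    using InferOpt

    maximizer(θ; kwargs...) = float.(θ .== maximum(θ))
    layer = PerturbedAdditive(maximizer; ε=1.0, nb_samples=10)

    probadist = compute_probability_distribution(layer, [1.0, 3.0, 2.0])
    probadist.atoms     # solutions returned by the maximizer on perturbed inputs
    probadist.weights   # associated probabilities (sum to 1)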

source
InferOpt.compute_probability_distributionMethod
compute_probability_distribution(pushforward, θ)

Output the distribution of pushforward.post_processing(X), where X follows the distribution defined by pushforward.optimization_layer applied to θ.

This function is not differentiable if pushforward.post_processing isn't.

See also: apply_on_atoms.

source
InferOpt.get_y_trueFunction
get_y_true(t_true::Any)

Retrieve y_true from t_true.

This method should be implemented when using a custom data structure for t_true other than a NamedTuple.

source
InferOpt.get_y_trueMethod
get_y_true(t_true::NamedTuple)

Retrieve y_true from t_true. t_true must contain a y_true field.

source
InferOpt.objective_valueMethod
objective_value(f, θ, y, kwargs...)

Compute the objective value of a given GeneralizedMaximizer f for weights θ and solution y.

source
InferOpt.perturbation_grad_logdensityFunction
perturbation_grad_logdensity(
    ::RuleConfig,
    ::AbstractPerturbed,
    θ::AbstractArray,
    sample::AbstractArray,
)

Compute the gradient w.r.t. the input θ of the log-density of the perturbed input distribution, evaluated at the observed perturbation sample η.

source
InferOpt.rankingMethod
ranking(θ[; rev])

Compute the vector r such that rᵢ is the rank of θᵢ in θ.

source
InferOpt.sample_perturbationsFunction
sample_perturbations(perturbed::AbstractPerturbed, θ::AbstractArray)

Draw nb_samples random perturbations from the perturbation distribution.

source
InferOpt.soft_rankMethod
soft_rank(θ::AbstractVector; ε=1.0, rev::Bool=false, regularization=:l2)

Fast differentiable ranking of vector θ.

Arguments

  • θ: vector to rank

Keyword (optional) arguments

  • ε::Float64=1.0: size of the regularization
  • rev::Bool=false: rank in ascending order if false
  • regularization=:l2: use l2 regularization if :l2, and kl regularization if :kl

See also soft_rank_l2 and soft_rank_kl.

source
InferOpt.soft_sortMethod
soft_sort(θ::AbstractVector; ε=1.0, rev::Bool=false, regularization=:l2)

Fast differentiable sort of vector θ.

Arguments

  • θ: vector to sort

Keyword (optional) arguments

  • ε::Float64=1.0: size of the regularization
  • rev::Bool=false: sort in ascending order if false
  • regularization=:l2: use l2 regularization if :l2, and kl regularization if :kl

See also soft_sort_l2 and soft_sort_kl.
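
Example

A minimal sketch of both functions:

    using InferOpt

    θ = [3.0, 1.0, 2.0]
    soft_sort(θ; ε=0.1)   # ≈ [1.0, 2.0, 3.0] (ascending since rev=false)
    soft_rank(θ; ε=0.1)   # ≈ [3.0, 1.0, 2.0] (rank of each entry of θ)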

source
InferOpt.zero_one_lossMethod
zero_one_loss(y, y_true)

0-1 loss for multiclass classification: δ(y, y_true) = 0 if y = y_true, and 1 otherwise.

source
