API Reference
InferOpt — Module
InferOptA toolbox for using combinatorial optimization algorithms within machine learning pipelines.
See our preprint https://arxiv.org/abs/2207.13513
Types
InferOpt.AbstractLayer — Type
AbstractLayerSupertype for all the layers defined in InferOpt.
All of these layers are callable, and differentiable with any ChainRules-compatible autodiff backend.
Interface
(layer::AbstractLayer)(args...; kwargs...)
InferOpt.AbstractLossLayer — Type
AbstractLossLayer <: AbstractLayerSupertype for all the loss layers defined in InferOpt.
Depending on the precise loss, the arguments to the layer might vary
Interface
(layer::AbstractLossLayer)(θ; kwargs...)or(layer::AbstractLossLayer)(θ, θ_true; kwargs...)or(layer::AbstractLossLayer)(θ, y_true; kwargs...)or(layer::AbstractLossLayer)(θ, (; θ_true, y_true); kwargs...)
InferOpt.AbstractOptimizationLayer — Type
AbstractOptimizationLayer <: AbstractLayerSupertype for all the optimization layers defined in InferOpt.
Interface
(layer::AbstractOptimizationLayer)(θ::AbstractArray; kwargs...)compute_probability_distribution(layer, θ; kwargs...)(only if the layer is probabilistic)
InferOpt.AbstractPerturbation — Type
abstract type AbstractPerturbation <: Distributions.Distribution{Distributions.Univariate, Distributions.Continuous}Abstract type for a perturbation. It's a function that takes a parameter θ and returns a perturbed parameter by a distribution perturbation_dist.
All subtypes should implement a perturbation_dist field, which is a ContinuousUnivariateDistribution.
Existing implementations
InferOpt.AbstractRegularized — Type
AbstractRegularized <: AbstractOptimizationLayerConvex regularization perturbation of a black box linear (in θ) optimizer
ŷ(θ) = argmax_{y ∈ C} {θᵀg(y) + h(y) - Ω(y)}with g and h functions of y.
Interface
(regularized::AbstractRegularized)(θ; kwargs...): returnŷ(θ)compute_regularization(regularized, y): return `Ω(y)get_maximizer(regularized): return the associated optimizer
Available implementations
InferOpt.AdditivePerturbation — Type
struct AdditivePerturbation{F}Additive perturbation: θ ↦ θ + εZ, where Z is a random variable following perturbation_dist.
Fields
perturbation_dist::Any: base distribution for the perturbationε::Float64: perturbation size
InferOpt.AdditivePerturbation — Method
Apply the additive perturbation to the parameter θ.
InferOpt.ExponentialOf — Type
Data structure modeling the exponential of a continuous univariate random variable.
Random.rand and Distributions.logpdf are defined for the ExponentialOf distribution.
InferOpt.FenchelYoungLoss — Type
struct FenchelYoungLoss{O<:InferOpt.AbstractOptimizationLayer} <: InferOpt.AbstractLossLayerFenchel-Young loss associated with a given optimization layer.
L(θ, y_true) = (Ω(y_true) - θᵀy_true) - (Ω(ŷ) - θᵀŷ)Reference: https://arxiv.org/abs/1901.02324
Fields
optimization_layer::AbstractOptimizationLayer: optimization layer that can be formulated asŷ(θ) = argmax {θᵀy - Ω(y)}(either regularized or perturbed)
Compatibility
This loss is compatible with:
LinearMaximizer-based layers.PerturbedOraclelayers, with additive or multiplicative perturbations (generic perturbations are not supported).- any
AbstractRegularizedlayer.
InferOpt.FenchelYoungLoss — Method
Compute L(θ, y_true).
InferOpt.Fix1Kwargs — Type
struct Fix1Kwargs{F, K, T} <: FunctionCallable struct that fixes the first argument of f to x, and the keyword arguments to kwargs....
Fields
f::Any: functionx::Any: fixed first argumentkwargs::Any: fixed keyword arguments
InferOpt.FixFirst — Type
struct FixFirst{F, T}Callable struct that fixes the first argument of f to x. Compared to Base.Fix1, works on functions with more than two arguments.
InferOpt.FixKwargs — Type
struct FixKwargs{F, K}Callable struct that fixes the keyword arguments of f to kwargs..., and only accepts positional arguments.
Fields
f::Any: functionkwargs::Any: fixed keyword arguments
InferOpt.IdentityRelaxation — Type
IdentityRelaxation <: AbstractOptimizationLayerNaive relaxation of a black-box optimizer where constraints are simply forgotten.
Consider (centering and) normalizing θ before applying it.
Fields
maximizer: underlying argmax function
Reference: https://arxiv.org/abs/2205.15213
InferOpt.ImitationLoss — Type
ImitationLoss <: AbstractLossLayerGeneric imitation loss of the form
L(θ, t_true) = max_y {δ(y, t_true) + α θᵀ(y - y_true) - (Ω(y) - Ω(y_true))}- When
δis zero, this is equivalent to aFenchelYoungLoss. - When
Ωis zero, this is equivalent to aStructuredSVMLoss.
Note: by default, t_true is a named tuple with field y_true, but it can be any data structure for which the get_y_true method is implemented.
Fields
aux_loss_maximizer: function of(θ, t_true, α)that computes the argmax in the problem aboveδ: base loss functionΩ: regularization functionα::Float64: hyperparameter with a default value of 1.0
InferOpt.ImitationLoss — Method
(il::ImitationLoss)(θ, t_true; kwargs...)InferOpt.ImitationLoss — Method
ImitationLoss(; aux_loss_maximizer, δ, Ω, α=1.0)Explicit constructor with keyword arguments.
InferOpt.Interpolation — Type
Interpolation <: AbstractOptimizationLayerPiecewise-linear interpolation of a black-box optimizer.
Fields
maximizer: underlying argmax functionλ::Float64: smoothing parameter (smaller = more faithful approximation, larger = more informative gradients)
Reference: https://arxiv.org/abs/1912.02175
InferOpt.LinearMaximizer — Type
struct LinearMaximizer{F, G, H}Wrapper for generic minear maximizers of the form argmax_y θᵀg(y) + h(y). It is compatible with the following layers
PerturbedAdditive(with or without aFenchelYoungLoss)PerturbedMultiplicative(with or without aFenchelYoungLoss)SPOPlusLoss
Fields
maximizer::Any: function θ ⟼ argmax_y θᵀg(y) + h(y)g::Any: function g(y) used in the objectiveh::Any: function h(y) used in the objective
InferOpt.LinearMaximizer — Method
Calls the wrapped maximizer.
InferOpt.LinearMaximizer — Method
LinearMaximizer(
maximizer;
g,
h
) -> LinearMaximizer{_A, typeof(InferOpt.identity_kw), ComposedFunction{typeof(zero), typeof(InferOpt.eltype_kw)}} where _A
Constructor for LinearMaximizer.
InferOpt.MultiplicativePerturbation — Type
MultiplicativePerturbation(
perturbation_dist,
ε
) -> InferOpt.MultiplicativePerturbation
MultiplicativePerturbation(
perturbation_dist,
ε,
shift
) -> InferOpt.MultiplicativePerturbation
Constructor for MultiplicativePerturbation.
InferOpt.MultiplicativePerturbation — Type
struct MultiplicativePerturbation{F}Multiplicative perturbation: θ ↦ θ ⊙ exp(εZ - shift)
Fields
perturbation_dist::Any: base distribution for the perturbationε::Float64: perturbation sizeshift::Float64: optional shift to have 0 mean, default value is ε²/2
InferOpt.MultiplicativePerturbation — Method
Apply the multiplicative perturbation to the parameter θ.
InferOpt.NormalAdditiveGradLogdensity — Type
struct NormalAdditiveGradLogdensityMethod with parameters to compute the gradient of the logdensity of η = θ + εZ w.r.t. θ., with Z ∼ N(0, 1).
Fields
ε::Float64: perturbation size
InferOpt.NormalAdditiveGradLogdensity — Method
Compute the gradient of the logdensity of η = θ + εZ w.r.t. θ., with Z ∼ N(0, 1).
InferOpt.NormalMultiplicativeGradLogdensity — Type
struct NormalMultiplicativeGradLogdensityMethod with parameters to compute the gradient of the logdensity of η = θ ⊙ exp(εZ - shift) w.r.t. θ., with Z ∼ N(0, 1).
Fields
ε::Float64: perturbation sizeshift::Float64: optional shift to have 0 mean
InferOpt.NormalMultiplicativeGradLogdensity — Method
Compute the gradient of the logdensity of η = θ ⊙ exp(εZ - shift) w.r.t. θ., with Z ∼ N(0, 1).
InferOpt.PerturbedOracle — Type
struct PerturbedOracle{D, F, t, variance_reduction, G, R, S} <: InferOpt.AbstractOptimizationLayerDifferentiable perturbation of a black box optimizer of type F, with perturbation of type D.
This struct is as wrapper around Reinforce from DifferentiableExpectations.jl.
There are three different available constructors that behave differently in the package:
InferOpt.PerturbedOracle — Method
Forward pass of the perturbed optimizer.
InferOpt.PerturbedOracle — Method
PerturbedOracle(
maximizer,
dist_constructor;
dist_logdensity_grad,
nb_samples,
variance_reduction,
threaded,
seed,
rng,
kwargs...
) -> PerturbedOracle{_A, _B, _C, _D, Nothing, Random.TaskLocalRNG, Nothing} where {_A, _B, _C, _D}
Constructor for PerturbedOracle.
InferOpt.Pushforward — Type
struct Pushforward{O<:InferOpt.AbstractOptimizationLayer, P} <: InferOpt.AbstractLayerDifferentiable pushforward of a probabilistic optimization layer with an arbitrary function post-processing function.
Pushforward can be used for direct regret minimization (aka learning by experience) when the post-processing returns a cost.
Fields
optimization_layer::InferOpt.AbstractOptimizationLayer: probabilistic optimization layerpost_processing::Any: callable
InferOpt.Pushforward — Method
Output the expectation of pushforward.post_processing(X), where X follows the distribution defined by pushforward.optimization_layer applied to θ.
This function is differentiable, even if pushforward.post_processing isn't.
InferOpt.RegularizedFrankWolfe — Type
RegularizedFrankWolfe <: AbstractRegularizedRegularized optimization layer which relies on the Frank-Wolfe algorithm to define a probability distribution while solving
ŷ(θ) = argmax_{y ∈ C} {θᵀy - Ω(y)}Since this is a conditional dependency, you need to have loaded the following packages before using RegularizedFrankWolfe:
DifferentiableFrankWolfe.jlFrankWolfe.jlImplicitDifferentiation.jl
Fields
linear_maximizer: linear maximization oracleθ -> argmax_{x ∈ C} θᵀx, implicitly defines the polytopeCΩ: regularization functionΩ(y)Ω_grad: gradient function of the regularization function∇Ω(y)frank_wolfe_kwargs: named tuple of keyword arguments passed to the Frank-Wolfe algorithmimplicit_kwargs: named tuple of keyword arguments passed to the implicit differentiation algorithm (in particular, the needed linear solver)
Frank-Wolfe parameters
Some values you can tune:
epsilon::Float64: precision targetmax_iteration::Integer: max number of iterationstimeout::Float64: max runtime in secondslazy::Bool: caching strategyaway_steps::Bool: avoid zig-zaggingline_search::FrankWolfe.LineSearchMethod: step size selectionverbose::Bool: console output
See the documentation of FrankWolfe.jl for details.
InferOpt.RegularizedFrankWolfe — Method
(regularized::RegularizedFrankWolfe)(θ; kwargs...)Apply compute_probability_distribution(regularized, θ; kwargs...) and return the expectation.
InferOpt.SPOPlusLoss — Type
struct SPOPlusLoss{F} <: InferOpt.AbstractLossLayerConvex surrogate of the Smart "Predict-then-Optimize" loss.
Fields
maximizer::Any: linear maximizer function of the formθ -> ŷ(θ) = argmax θᵀyα::Float64: convexification parameter, default = 2.0
Reference: https://arxiv.org/abs/1710.08005
InferOpt.SPOPlusLoss — Method
Forward pass of the SPO+ loss with given target θ_true and y_true. The third argument y_true is optional, as it can be computed from θ_true. However, providing it directly can save computation time.
InferOpt.SPOPlusLoss — Method
Forward pass of the SPO+ loss with given target θ_true. For better performance, you can also provide y_true directly as a third argument.
InferOpt.SPOPlusLoss — Method
SPOPlusLoss(maximizer; α) -> SPOPlusLoss
Constructor for SPOPlusLoss.
InferOpt.SoftArgmax — Type
SoftArgmax <: RegularizedSoft argmax activation function s(z) = (e^zᵢ / ∑ e^zⱼ)ᵢ.
Corresponds to regularized prediction on the probability simplex with entropic penalty.
InferOpt.SoftRank — Type
SoftRank{is_l2_regularized} <: AbstractRegularizedFast differentiable ranking regularized layer. It uses an L2 regularization if is_l2_regularized is true, else it uses an entropic (kl) regularization.
As an AbstractRegularized layer, it can also be used for supervised learning with a FenchelYoungLoss.
Fields
ε::Float64: size of the regularizationrev::Bool: rank in ascending order if false
Reference: https://arxiv.org/abs/2002.08871
InferOpt.SoftRank — Method
SoftRank(; ε::Float64=1.0, rev::Bool=false, is_l2_regularized::Bool=true)Constructor for SoftRank.
Arguments
ε::Float64=1.0: size of the regularizationrev::Bool=false: rank in ascending order if false- `regularization="l2": used regularization, either "l2" or "kl"
InferOpt.SoftSort — Type
SoftSort{is_l2_regularized} <: AbstractOptimizationLayerFast differentiable sorting optimization layer. It uses an L2 regularization if is_l2_regularized is true, else it uses an entropic (kl) regularization.
Reference https://arxiv.org/abs/2002.08871
Fields
ε::Float64: size of the regularizationrev::Bool: sort in ascending order if false
InferOpt.SoftSort — Method
SoftSort(; ε::Float64=1.0, rev::Bool=false, is_l2_regularized::Bool=true)Constructor for SoftSort.
Arguments
ε::Float64=1.0: size of the regularizationrev::Bool=false: sort in ascending order if falseis_l2_regularized::Bool=true: use l2 regularization if true, else kl regularization
InferOpt.SparseArgmax — Type
SparseArgmax <: AbstractRegularizedCompute the Euclidean projection of the vector z onto the probability simplex.
Corresponds to regularized prediction on the probability simplex with square norm penalty.
InferOpt.StructuredSVMLoss — Type
StructuredSVMLoss <: AbstractLossLayerLoss associated with the Structured Support Vector Machine, defined by
L(θ, y_true) = max_y {δ(y, y_true) + α θᵀ(y - y_true)}Reference: http://www.nowozin.net/sebastian/papers/nowozin2011structured-tutorial.pdf (Chapter 6)
Fields
aux_loss_maximizer::M: function of(θ, y_true, α)that computes the argmax in the problem aboveδ::L: base loss functionα::Float64: hyperparameter with a default value of 1.0
InferOpt.StructuredSVMLoss — Method
(ssvml::StructuredSVMLoss)(θ, y_true; kwargs...)InferOpt.StructuredSVMLoss — Method
StructuredSVMLoss(; aux_loss_maximizer, δ, α=1.0)Functions
Distributions.logpdf — Method
logpdf(d::InferOpt.ExponentialOf, x::Real) -> Any
Return the log-density of the ExponentialOf distribution at x. It is equal to $logpdf(d, log(x)) - log(x)$.
InferOpt.PerturbedAdditive — Method
PerturbedAdditive(
maximizer;
ε,
perturbation_dist,
nb_samples,
variance_reduction,
seed,
threaded,
rng,
dist_logdensity_grad
) -> Union{PerturbedOracle{InferOpt.AdditivePerturbation{Distributions.Normal{Float64}}, _A, _B, _C, Nothing, Random.TaskLocalRNG, Nothing} where {_A, _B, _C}, PerturbedOracle{InferOpt.AdditivePerturbation{Distributions.Normal{Float64}}, _A, _B, _C, InferOpt.NormalAdditiveGradLogdensity, Random.TaskLocalRNG, Nothing} where {_A, _B, _C}}
Constructor for PerturbedOracle with an additive perturbation.
InferOpt.PerturbedMultiplicative — Method
PerturbedMultiplicative(
maximizer;
ε,
perturbation_dist,
nb_samples,
variance_reduction,
seed,
threaded,
rng,
dist_logdensity_grad
) -> Union{PerturbedOracle{InferOpt.MultiplicativePerturbation{Distributions.Normal{Float64}}, _A, _B, _C, Nothing, Random.TaskLocalRNG, Nothing} where {_A, _B, _C}, PerturbedOracle{InferOpt.MultiplicativePerturbation{Distributions.Normal{Float64}}, _A, _B, _C, InferOpt.NormalMultiplicativeGradLogdensity, Random.TaskLocalRNG, Nothing} where {_A, _B, _C}}
Constructor for PerturbedOracle with a multiplicative perturbation.
InferOpt.ZeroOneImitationLoss — Function
ZeroOneStructuredSVMLoss(α)Implementation of the ImitationLoss based on a 0-1 loss for multiclass classification with no regularization.
InferOpt.ZeroOneStructuredSVMLoss — Function
ZeroOneStructuredSVMLossImplementation of the StructuredSVMLoss based on a 0-1 loss for multiclass classification.
InferOpt.apply_g — Method
apply_g(f::LinearMaximizer, y; kwargs...) -> Any
Applies the function g of the LinearMaximizer f to y.
InferOpt.compute_probability_distribution — Function
compute_probability_distribution(layer, θ; kwargs...)Apply a probabilistic optimization layer to an objective direction θ in order to generate a FixedAtomsProbabilityDistribution on the vertices of a polytope.
InferOpt.compute_regularization — Function
compute_regularization(regularized::AbstractRegularized, y)Return the convex penalty Ω(y) associated with an AbstractRegularized layer.
InferOpt.get_y_true — Function
get_y_true(t_true::Any)Retrieve y_true from t_true.
This method should be implemented when using a custom data structure for t_true other than a NamedTuple.
InferOpt.get_y_true — Method
get_y_true(t_true::NamedTuple)Retrieve y_true from t_true. t_true must contain an y_true field.
InferOpt.half_square_norm — Method
half_square_norm(x::AbstractArray) -> Any
Compute the squared Euclidean norm of x and divide it by 2.
InferOpt.isproba — Method
isproba(x::Real) -> Any
Check whether x ∈ [0,1].
InferOpt.isprobadist — Method
isprobadist(p::AbstractArray{R<:Real, 1}) -> Any
Check whether the elements of p are nonnegative and sum to 1.
InferOpt.objective_value — Method
objective_value(f::LinearMaximizer, θ, y; kwargs...) -> Any
Computes the objective value of given LinearMaximizer f, knowing weights θ and solution y. i.e. θᵀg(y) + h(y)
InferOpt.one_hot_argmax — Method
one_hot_argmax(
z::AbstractArray{R<:Real, 1};
kwargs...
) -> Any
One-hot encoding of the argmax function.
InferOpt.positive_part — Method
positive_part(x) -> Any
Compute max(x, 0).
InferOpt.ranking — Method
ranking(θ::AbstractVector; rev, kwargs...) -> Any
Compute the vector r such that rᵢ is the rank of θᵢ in θ.
InferOpt.shannon_entropy — Method
shannon_entropy(p::AbstractArray{R<:Real, 1}) -> Any
Compute the Shannon entropy of a probability distribution: H(p) = -∑ pᵢlog(pᵢ).
InferOpt.simplex_projection_and_support — Method
simplex_projection_and_support(z)Compute the Euclidean projection p of z on the probability simplex (also called sparse_argmax), and the indicators s of its support.
Reference: https://arxiv.org/abs/1602.02068.
InferOpt.soft_rank — Method
soft_rank(θ::AbstractVector; ε=1.0, rev::Bool=false)Fast differentiable ranking of vector θ.
Arguments
θ: vector to sort
Keyword (optional) arguments
ε::Float64=1.0: size of the regularizationrev::Bool=false: sort in ascending order if falseregularization=:l2: use l2 regularization if :l2, and kl regularization if :kl
See also soft_rank_l2 and soft_rank_kl.
InferOpt.soft_rank_kl — Method
soft_rank_kl(θ::AbstractVector; ε=1.0, rev::Bool=false)Rank vector θ with kl regularization.
InferOpt.soft_rank_l2 — Method
soft_rank_l2(θ::AbstractVector; ε=1.0, rev::Bool=false)Rank vector θ with l2 regularization.
InferOpt.soft_sort — Method
soft_sort(θ::AbstractVector; ε=1.0, rev::Bool=false, regularization=:l2)Fast differentiable sort of vector θ.
Arguments
θ: vector to sort
Keyword (optional) arguments
ε::Float64=1.0: size of the regularizationrev::Bool=false: sort in ascending order if falseregularization=:l2: use l2 regularization if :l2, and kl regularization if :kl
See also soft_sort_l2 and soft_sort_kl.
InferOpt.soft_sort_kl — Method
soft_sort_kl(θ::AbstractVector; ε=1.0, rev::Bool=false)Sort vector θ with kl regularization.
InferOpt.soft_sort_l2 — Method
soft_sort_l2(θ::AbstractVector; ε=1.0, rev::Bool=false)Sort vector θ with l2 regularization.
InferOpt.zero_one_loss — Method
zero_one_loss(y, y_true)0-1 loss for multiclass classification: δ(y, y_true) = 0 if y = y_true, and 1 otherwise.
InferOpt.zero_one_loss_maximizer — Method
zero_one_loss_maximizer(y, y_true; α)For δ = zero_one_loss, compute
argmax_y {δ(y, y_true) + α θᵀ(y - y_true)}Index
InferOpt.AbstractLayerInferOpt.AbstractLossLayerInferOpt.AbstractOptimizationLayerInferOpt.AbstractPerturbationInferOpt.AbstractRegularizedInferOpt.AdditivePerturbationInferOpt.AdditivePerturbationInferOpt.ExponentialOfInferOpt.FenchelYoungLossInferOpt.FenchelYoungLossInferOpt.Fix1KwargsInferOpt.FixFirstInferOpt.FixKwargsInferOpt.IdentityRelaxationInferOpt.ImitationLossInferOpt.ImitationLossInferOpt.ImitationLossInferOpt.InterpolationInferOpt.LinearMaximizerInferOpt.LinearMaximizerInferOpt.LinearMaximizerInferOpt.MultiplicativePerturbationInferOpt.MultiplicativePerturbationInferOpt.MultiplicativePerturbationInferOpt.NormalAdditiveGradLogdensityInferOpt.NormalAdditiveGradLogdensityInferOpt.NormalMultiplicativeGradLogdensityInferOpt.NormalMultiplicativeGradLogdensityInferOpt.PerturbedOracleInferOpt.PerturbedOracleInferOpt.PerturbedOracleInferOpt.PushforwardInferOpt.PushforwardInferOpt.RegularizedFrankWolfeInferOpt.RegularizedFrankWolfeInferOpt.SPOPlusLossInferOpt.SPOPlusLossInferOpt.SPOPlusLossInferOpt.SPOPlusLossInferOpt.SoftArgmaxInferOpt.SoftRankInferOpt.SoftRankInferOpt.SoftSortInferOpt.SoftSortInferOpt.SparseArgmaxInferOpt.StructuredSVMLossInferOpt.StructuredSVMLossInferOpt.StructuredSVMLossInferOpt.PerturbedAdditiveInferOpt.PerturbedMultiplicativeInferOpt.ZeroOneImitationLossInferOpt.ZeroOneStructuredSVMLossInferOpt.apply_gInferOpt.compute_probability_distributionInferOpt.compute_regularizationInferOpt.get_y_trueInferOpt.get_y_trueInferOpt.half_square_normInferOpt.isprobaInferOpt.isprobadistInferOpt.objective_valueInferOpt.one_hot_argmaxInferOpt.positive_partInferOpt.rankingInferOpt.shannon_entropyInferOpt.simplex_projection_and_supportInferOpt.soft_rankInferOpt.soft_rank_klInferOpt.soft_rank_l2InferOpt.soft_sortInferOpt.soft_sort_klInferOpt.soft_sort_l2InferOpt.zero_one_lossInferOpt.zero_one_loss_maximizer