API Reference
InferOpt — Module
InferOpt
A toolbox for using combinatorial optimization algorithms within machine learning pipelines.
See our preprint https://arxiv.org/abs/2207.13513
Types
InferOpt.AbstractLayer — Type
AbstractLayer
Supertype for all the layers defined in InferOpt.
All of these layers are callable, and differentiable with any ChainRules-compatible autodiff backend.
Interface
(layer::AbstractLayer)(args...; kwargs...)
InferOpt.AbstractLossLayer — Type
AbstractLossLayer <: AbstractLayer
Supertype for all the loss layers defined in InferOpt.
Depending on the precise loss, the arguments to the layer might vary.
Interface
(layer::AbstractLossLayer)(θ; kwargs...)
or (layer::AbstractLossLayer)(θ, θ_true; kwargs...)
or (layer::AbstractLossLayer)(θ, y_true; kwargs...)
or (layer::AbstractLossLayer)(θ, (; θ_true, y_true); kwargs...)
InferOpt.AbstractOptimizationLayer — Type
AbstractOptimizationLayer <: AbstractLayer
Supertype for all the optimization layers defined in InferOpt.
Interface
(layer::AbstractOptimizationLayer)(θ::AbstractArray; kwargs...)
compute_probability_distribution(layer, θ; kwargs...) (only if the layer is probabilistic)
InferOpt.AbstractPerturbation — Type
abstract type AbstractPerturbation <: Distributions.Distribution{Distributions.Univariate, Distributions.Continuous}
Abstract type for a perturbation: a callable that takes a parameter θ and returns a parameter perturbed by a distribution perturbation_dist.
All subtypes should implement a perturbation_dist field, which is a ContinuousUnivariateDistribution.
Existing implementations
InferOpt.AbstractRegularized — Type
AbstractRegularized <: AbstractOptimizationLayer
Convex regularization perturbation of a black box linear (in θ) optimizer:
ŷ(θ) = argmax_{y ∈ C} {θᵀg(y) + h(y) - Ω(y)}
with g and h functions of y.
Interface
(regularized::AbstractRegularized)(θ; kwargs...): return ŷ(θ)
compute_regularization(regularized, y): return Ω(y)
get_maximizer(regularized): return the associated optimizer
Available implementations
InferOpt.AdditivePerturbation — Type
struct AdditivePerturbation{F}
Additive perturbation: θ ↦ θ + εZ, where Z is a random variable following perturbation_dist.
Fields
perturbation_dist::Any: base distribution for the perturbation
ε::Float64: perturbation size
InferOpt.AdditivePerturbation — Method
Apply the additive perturbation to the parameter θ.
InferOpt.ExponentialOf — Type
Data structure modeling the exponential of a continuous univariate random variable.
Random.rand and Distributions.logpdf are defined for the ExponentialOf distribution.
InferOpt.FenchelYoungLoss — Type
struct FenchelYoungLoss{O<:InferOpt.AbstractOptimizationLayer} <: InferOpt.AbstractLossLayer
Fenchel-Young loss associated with a given optimization layer.
L(θ, y_true) = (Ω(y_true) - θᵀy_true) - (Ω(ŷ) - θᵀŷ)
Reference: https://arxiv.org/abs/1901.02324
Fields
optimization_layer::AbstractOptimizationLayer: optimization layer that can be formulated as ŷ(θ) = argmax {θᵀy - Ω(y)} (either regularized or perturbed)
Compatibility
This loss is compatible with:
- LinearMaximizer-based layers.
- PerturbedOracle layers, with additive or multiplicative perturbations (generic perturbations are not supported).
- any AbstractRegularized layer.
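As an illustration, here is a minimal training-style sketch (not taken from the package documentation) pairing a FenchelYoungLoss with a PerturbedAdditive layer around one_hot_argmax; the hyperparameter values are arbitrary and the gradient is taken with Zygote.
using InferOpt, Zygote
# Differentiable surrogate of one_hot_argmax via additive Gaussian perturbation
layer = PerturbedAdditive(InferOpt.one_hot_argmax; ε=0.1, nb_samples=10)
fyl = FenchelYoungLoss(layer)
θ = randn(5)                                    # predicted objective direction
y_true = InferOpt.one_hot_argmax(randn(5))      # target vertex of the simplex
l = fyl(θ, y_true)                              # scalar loss value
g = Zygote.gradient(θ -> fyl(θ, y_true), θ)[1]  # gradient with respect to θ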
InferOpt.FenchelYoungLoss — Method
Compute L(θ, y_true).
InferOpt.Fix1Kwargs — Type
struct Fix1Kwargs{F, K, T} <: Function
Callable struct that fixes the first argument of f to x, and the keyword arguments to kwargs....
Fields
f::Any: function
x::Any: fixed first argument
kwargs::Any: fixed keyword arguments
InferOpt.FixFirst — Type
struct FixFirst{F, T}
Callable struct that fixes the first argument of f to x. Compared to Base.Fix1, works on functions with more than two arguments.
InferOpt.FixKwargs — Type
struct FixKwargs{F, K}
Callable struct that fixes the keyword arguments of f to kwargs..., and only accepts positional arguments.
Fields
f::Any: function
kwargs::Any: fixed keyword arguments
InferOpt.IdentityRelaxation — Type
IdentityRelaxation <: AbstractOptimizationLayer
Naive relaxation of a black-box optimizer where constraints are simply forgotten.
Consider (centering and) normalizing θ before applying it.
Fields
maximizer: underlying argmax function
Reference: https://arxiv.org/abs/2205.15213
InferOpt.ImitationLoss — Type
ImitationLoss <: AbstractLossLayer
Generic imitation loss of the form
L(θ, t_true) = max_y {δ(y, t_true) + α θᵀ(y - y_true) - (Ω(y) - Ω(y_true))}
- When δ is zero, this is equivalent to a FenchelYoungLoss.
- When Ω is zero, this is equivalent to a StructuredSVMLoss.
Note: by default, t_true is a named tuple with field y_true, but it can be any data structure for which the get_y_true method is implemented.
Fields
aux_loss_maximizer: function of (θ, t_true, α) that computes the argmax in the problem above
δ: base loss function
Ω: regularization function
α::Float64: hyperparameter with a default value of 1.0
InferOpt.ImitationLoss — Method
(il::ImitationLoss)(θ, t_true; kwargs...)
InferOpt.ImitationLoss — Method
ImitationLoss(; aux_loss_maximizer, δ, Ω, α=1.0)
Explicit constructor with keyword arguments.
InferOpt.Interpolation — Type
Interpolation <: AbstractOptimizationLayer
Piecewise-linear interpolation of a black-box optimizer.
Fields
maximizer: underlying argmax function
λ::Float64: smoothing parameter (smaller = more faithful approximation, larger = more informative gradients)
Reference: https://arxiv.org/abs/1912.02175
InferOpt.LinearMaximizer — Type
struct LinearMaximizer{F, G, H}
Wrapper for generic linear maximizers of the form argmax_y θᵀg(y) + h(y). It is compatible with the following layers:
- PerturbedAdditive (with or without a FenchelYoungLoss)
- PerturbedMultiplicative (with or without a FenchelYoungLoss)
- SPOPlusLoss
Fields
maximizer::Any: function θ ⟼ argmax_y θᵀg(y) + h(y)
g::Any: function g(y) used in the objective
h::Any: function h(y) used in the objective
InferOpt.LinearMaximizer — Method
Calls the wrapped maximizer.
InferOpt.LinearMaximizer — Method
LinearMaximizer(
maximizer;
g,
h
) -> LinearMaximizer{_A, typeof(InferOpt.identity_kw), ComposedFunction{typeof(zero), typeof(InferOpt.eltype_kw)}} where _A
Constructor for LinearMaximizer.
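A hypothetical sketch of wrapping a maximizer whose objective decomposes as θᵀg(y) + h(y); the functions g, h and the toy maximizer below are illustrative choices, not part of the package.
using InferOpt
# Objective θᵀ(2y) + sum(y) over one-hot vectors, i.e. g(y) = 2y and h(y) = sum(y)
g(y; kwargs...) = 2y
h(y; kwargs...) = sum(y)
maximizer(θ; kwargs...) = InferOpt.one_hot_argmax(2θ .+ 1)  # exact argmax of θᵀg(y) + h(y) over simplex vertices
lm = LinearMaximizer(maximizer; g, h)
θ = randn(5)
y = lm(θ)                               # calls the wrapped maximizer
v = InferOpt.objective_value(lm, θ, y)  # θᵀg(y) + h(y)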
InferOpt.MultiplicativePerturbation — Type
MultiplicativePerturbation(
perturbation_dist,
ε
) -> InferOpt.MultiplicativePerturbation
MultiplicativePerturbation(
perturbation_dist,
ε,
shift
) -> InferOpt.MultiplicativePerturbation
Constructor for MultiplicativePerturbation.
InferOpt.MultiplicativePerturbation — Type
struct MultiplicativePerturbation{F}
Multiplicative perturbation: θ ↦ θ ⊙ exp(εZ - shift)
Fields
perturbation_dist::Any: base distribution for the perturbation
ε::Float64: perturbation size
shift::Float64: optional shift to have 0 mean, default value is ε²/2
InferOpt.MultiplicativePerturbation — Method
Apply the multiplicative perturbation to the parameter θ.
InferOpt.NormalAdditiveGradLogdensity — Type
struct NormalAdditiveGradLogdensity
Method with parameters to compute the gradient of the logdensity of η = θ + εZ w.r.t. θ, with Z ∼ N(0, 1).
Fields
ε::Float64: perturbation size
InferOpt.NormalAdditiveGradLogdensity — Method
Compute the gradient of the logdensity of η = θ + εZ w.r.t. θ, with Z ∼ N(0, 1).
InferOpt.NormalMultiplicativeGradLogdensity — Type
struct NormalMultiplicativeGradLogdensity
Method with parameters to compute the gradient of the logdensity of η = θ ⊙ exp(εZ - shift) w.r.t. θ, with Z ∼ N(0, 1).
Fields
ε::Float64: perturbation size
shift::Float64: optional shift to have 0 mean
InferOpt.NormalMultiplicativeGradLogdensity — Method
Compute the gradient of the logdensity of η = θ ⊙ exp(εZ - shift) w.r.t. θ, with Z ∼ N(0, 1).
InferOpt.PerturbedOracle — Type
struct PerturbedOracle{D, F, t, variance_reduction, G, R, S} <: InferOpt.AbstractOptimizationLayer
Differentiable perturbation of a black box optimizer of type F, with perturbation of type D.
This struct is a wrapper around Reinforce from DifferentiableExpectations.jl.
There are three available constructors that behave differently in the package: PerturbedOracle, PerturbedAdditive, and PerturbedMultiplicative.
InferOpt.PerturbedOracle — Method
Forward pass of the perturbed optimizer.
InferOpt.PerturbedOracle — Method
PerturbedOracle(
maximizer,
dist_constructor;
dist_logdensity_grad,
nb_samples,
variance_reduction,
threaded,
seed,
rng,
kwargs...
) -> PerturbedOracle{_A, _B, _C, _D, Nothing, Random.TaskLocalRNG, Nothing} where {_A, _B, _C, _D}
Constructor for PerturbedOracle.
InferOpt.Pushforward — Type
struct Pushforward{O<:InferOpt.AbstractOptimizationLayer, P} <: InferOpt.AbstractLayer
Differentiable pushforward of a probabilistic optimization layer with an arbitrary post-processing function.
Pushforward can be used for direct regret minimization (aka learning by experience) when the post-processing returns a cost.
Fields
optimization_layer::InferOpt.AbstractOptimizationLayer: probabilistic optimization layer
post_processing::Any: callable
InferOpt.Pushforward — Method
Output the expectation of pushforward.post_processing(X), where X follows the distribution defined by pushforward.optimization_layer applied to θ.
This function is differentiable, even if pushforward.post_processing isn't.
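For instance, a Pushforward can turn a perturbed optimizer into a smooth expected cost; the cost vector below is a made-up example, not from the original docs.
using InferOpt, Zygote
layer = PerturbedAdditive(InferOpt.one_hot_argmax; ε=0.1, nb_samples=10)
cost(y; kwargs...) = -sum(y .* [1.0, 3.0, 2.0])  # hypothetical cost of a solution y
pf = Pushforward(layer, cost)
θ = randn(3)
expected_cost = pf(θ)          # expectation of the cost under the perturbed distribution
g = Zygote.gradient(pf, θ)[1]  # well-defined even though cost ∘ argmax is piecewise constant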
InferOpt.RegularizedFrankWolfe — Type
RegularizedFrankWolfe <: AbstractRegularized
Regularized optimization layer which relies on the Frank-Wolfe algorithm to define a probability distribution while solving
ŷ(θ) = argmax_{y ∈ C} {θᵀy - Ω(y)}
Since this is a conditional dependency, you need to have loaded the following packages before using RegularizedFrankWolfe:
- DifferentiableFrankWolfe.jl
- FrankWolfe.jl
- ImplicitDifferentiation.jl
Fields
linear_maximizer: linear maximization oracle θ -> argmax_{x ∈ C} θᵀx, implicitly defines the polytope C
Ω: regularization function Ω(y)
Ω_grad: gradient function of the regularization function ∇Ω(y)
frank_wolfe_kwargs: named tuple of keyword arguments passed to the Frank-Wolfe algorithm
implicit_kwargs: named tuple of keyword arguments passed to the implicit differentiation algorithm (in particular, the needed linear solver)
Frank-Wolfe parameters
Some values you can tune:
epsilon::Float64: precision target
max_iteration::Integer: max number of iterations
timeout::Float64: max runtime in seconds
lazy::Bool: caching strategy
away_steps::Bool: avoid zig-zagging
line_search::FrankWolfe.LineSearchMethod: step size selection
verbose::Bool: console output
See the documentation of FrankWolfe.jl for details.
InferOpt.RegularizedFrankWolfe — Method
(regularized::RegularizedFrankWolfe)(θ; kwargs...)
Apply compute_probability_distribution(regularized, θ; kwargs...) and return the expectation.
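A sketch of how such a layer could be set up, assuming a keyword constructor that mirrors the field names above; the quadratic regularization Ω(y) = ½‖y‖² and the simplex oracle are illustrative choices, not prescribed by the package.
using FrankWolfe, DifferentiableFrankWolfe, ImplicitDifferentiation  # conditional dependencies
using InferOpt
# ŷ(θ) = argmax_{y ∈ simplex} {θᵀy - ½‖y‖²}, i.e. a sparse argmax
regularized = RegularizedFrankWolfe(
    InferOpt.one_hot_argmax;                  # linear maximization oracle over the simplex vertices
    Ω=InferOpt.half_square_norm,
    Ω_grad=identity,                          # ∇(½‖y‖²) = y
    frank_wolfe_kwargs=(; max_iteration=100, epsilon=1e-4),
)
θ = randn(5)
ŷ = regularized(θ)  # expectation of the Frank-Wolfe probability distribution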
InferOpt.SPOPlusLoss — Type
struct SPOPlusLoss{F} <: InferOpt.AbstractLossLayer
Convex surrogate of the Smart "Predict-then-Optimize" loss.
Fields
maximizer::Any: linear maximizer function of the form θ -> ŷ(θ) = argmax θᵀy
α::Float64: convexification parameter, default = 2.0
Reference: https://arxiv.org/abs/1710.08005
InferOpt.SPOPlusLoss — Method
Forward pass of the SPO+ loss with given targets θ_true and y_true. The third argument y_true is optional, as it can be computed from θ_true. However, providing it directly can save computation time.
InferOpt.SPOPlusLoss — Method
Forward pass of the SPO+ loss with given target θ_true. For better performance, you can also provide y_true directly as a third argument.
InferOpt.SPOPlusLoss — Method
SPOPlusLoss(maximizer; α) -> SPOPlusLoss
Constructor for SPOPlusLoss.
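A minimal sketch of the SPO+ loss on a toy argmax problem; the maximizer and the random targets are illustrative, not from the original docs.
using InferOpt, Zygote
maximizer = InferOpt.one_hot_argmax
spo = SPOPlusLoss(maximizer; α=2.0)
θ = randn(5)                # predicted objective
θ_true = randn(5)           # true objective
y_true = maximizer(θ_true)  # true optimal solution
l = spo(θ, θ_true)          # y_true recomputed internally
l = spo(θ, θ_true, y_true)  # same value, cheaper if y_true is already known
g = Zygote.gradient(θ -> spo(θ, θ_true, y_true), θ)[1]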
InferOpt.SoftArgmax — Type
SoftArgmax <: Regularized
Soft argmax activation function s(z) = (e^zᵢ / ∑ e^zⱼ)ᵢ.
Corresponds to regularized prediction on the probability simplex with entropic penalty.
InferOpt.SoftRank — Type
SoftRank{is_l2_regularized} <: AbstractRegularized
Fast differentiable ranking regularized layer. It uses an L2 regularization if is_l2_regularized is true, else it uses an entropic (kl) regularization.
As an AbstractRegularized layer, it can also be used for supervised learning with a FenchelYoungLoss.
Fields
ε::Float64: size of the regularization
rev::Bool: rank in ascending order if false
Reference: https://arxiv.org/abs/2002.08871
InferOpt.SoftRank — Method
SoftRank(; ε::Float64=1.0, rev::Bool=false, is_l2_regularized::Bool=true)
Constructor for SoftRank.
Arguments
ε::Float64=1.0: size of the regularization
rev::Bool=false: rank in ascending order if false
is_l2_regularized::Bool=true: use l2 regularization if true, else kl regularization
InferOpt.SoftSort — Type
SoftSort{is_l2_regularized} <: AbstractOptimizationLayer
Fast differentiable sorting optimization layer. It uses an L2 regularization if is_l2_regularized is true, else it uses an entropic (kl) regularization.
Reference: https://arxiv.org/abs/2002.08871
Fields
ε::Float64: size of the regularization
rev::Bool: sort in ascending order if false
InferOpt.SoftSort — Method
SoftSort(; ε::Float64=1.0, rev::Bool=false, is_l2_regularized::Bool=true)
Constructor for SoftSort.
Arguments
ε::Float64=1.0: size of the regularization
rev::Bool=false: sort in ascending order if false
is_l2_regularized::Bool=true: use l2 regularization if true, else kl regularization
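A small sketch of both layers on a toy vector (values are arbitrary); smaller ε stays closer to the hard sort/rank, larger ε smooths the output.
using InferOpt
θ = [3.0, 1.0, 2.0]
SoftSort(; ε=0.1)(θ)   # close to the hard sort of θ for small ε
SoftRank(; ε=0.1)(θ)   # close to the hard ranks of θ for small ε
SoftSort(; ε=10.0)(θ)  # larger ε yields a smoother, more uniform output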
InferOpt.SparseArgmax — Type
SparseArgmax <: AbstractRegularized
Compute the Euclidean projection of the vector z onto the probability simplex.
Corresponds to regularized prediction on the probability simplex with square norm penalty.
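A quick comparison of the two simplex layers on an arbitrary score vector; both outputs sum to 1, but SparseArgmax typically zeroes out some coordinates.
using InferOpt
z = [1.0, 2.0, 3.0]
p_soft = SoftArgmax()(z)      # dense probability vector (softmax of z)
p_sparse = SparseArgmax()(z)  # Euclidean projection of z onto the simplex, usually sparse
sum(p_soft), sum(p_sparse)    # both ≈ 1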
InferOpt.StructuredSVMLoss — Type
StructuredSVMLoss <: AbstractLossLayer
Loss associated with the Structured Support Vector Machine, defined by
L(θ, y_true) = max_y {δ(y, y_true) + α θᵀ(y - y_true)}
Reference: http://www.nowozin.net/sebastian/papers/nowozin2011structured-tutorial.pdf (Chapter 6)
Fields
aux_loss_maximizer::M: function of (θ, y_true, α) that computes the argmax in the problem above
δ::L: base loss function
α::Float64: hyperparameter with a default value of 1.0
InferOpt.StructuredSVMLoss — Method
(ssvml::StructuredSVMLoss)(θ, y_true; kwargs...)
InferOpt.StructuredSVMLoss — Method
StructuredSVMLoss(; aux_loss_maximizer, δ, α=1.0)
Functions
Base.rand — Method
rand(
rng::Random.AbstractRNG,
perturbation::InferOpt.AbstractPerturbation
) -> Any
Base.rand — Method
rand(
rng::Random.AbstractRNG,
d::InferOpt.ExponentialOf
) -> Any
Distributions.logpdf — Method
logpdf(d::InferOpt.ExponentialOf, x::Real) -> Any
Return the log-density of the ExponentialOf distribution at x. It is equal to logpdf(d, log(x)) - log(x).
InferOpt.PerturbedAdditive — Method
PerturbedAdditive(
maximizer;
ε,
perturbation_dist,
nb_samples,
variance_reduction,
seed,
threaded,
rng,
dist_logdensity_grad
) -> Union{PerturbedOracle{InferOpt.AdditivePerturbation{Distributions.Normal{Float64}}, _A, _B, _C, Nothing, Random.TaskLocalRNG, Nothing} where {_A, _B, _C}, PerturbedOracle{InferOpt.AdditivePerturbation{Distributions.Normal{Float64}}, _A, _B, _C, InferOpt.NormalAdditiveGradLogdensity, Random.TaskLocalRNG, Nothing} where {_A, _B, _C}}
Constructor for PerturbedOracle with an additive perturbation.
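A minimal sketch (hyperparameters arbitrary): perturbing one_hot_argmax yields a smooth layer whose Jacobian is nonzero, unlike the piecewise-constant argmax.
using InferOpt, Zygote
perturbed = PerturbedAdditive(InferOpt.one_hot_argmax; ε=0.5, nb_samples=20)
θ = randn(5)
ŷ = perturbed(θ)                      # Monte-Carlo average of perturbed argmax outputs
J = Zygote.jacobian(perturbed, θ)[1]  # smooth Jacobian estimate with respect to θ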
InferOpt.PerturbedMultiplicative — Method
PerturbedMultiplicative(
maximizer;
ε,
perturbation_dist,
nb_samples,
variance_reduction,
seed,
threaded,
rng,
dist_logdensity_grad
) -> Union{PerturbedOracle{InferOpt.MultiplicativePerturbation{Distributions.Normal{Float64}}, _A, _B, _C, Nothing, Random.TaskLocalRNG, Nothing} where {_A, _B, _C}, PerturbedOracle{InferOpt.MultiplicativePerturbation{Distributions.Normal{Float64}}, _A, _B, _C, InferOpt.NormalMultiplicativeGradLogdensity, Random.TaskLocalRNG, Nothing} where {_A, _B, _C}}
Constructor for PerturbedOracle with a multiplicative perturbation.
InferOpt.ZeroOneImitationLoss — Function
ZeroOneImitationLoss(α)
Implementation of the ImitationLoss based on a 0-1 loss for multiclass classification with no regularization.
InferOpt.ZeroOneStructuredSVMLoss — Function
ZeroOneStructuredSVMLoss
Implementation of the StructuredSVMLoss based on a 0-1 loss for multiclass classification.
InferOpt.apply_g — Method
apply_g(f::LinearMaximizer, y; kwargs...) -> Any
Applies the function g of the LinearMaximizer f to y.
InferOpt.compute_probability_distribution — Function
compute_probability_distribution(layer, θ; kwargs...)
Apply a probabilistic optimization layer to an objective direction θ in order to generate a FixedAtomsProbabilityDistribution on the vertices of a polytope.
InferOpt.compute_regularization — Function
compute_regularization(regularized::AbstractRegularized, y)
Return the convex penalty Ω(y) associated with an AbstractRegularized layer.
InferOpt.get_y_true — Function
get_y_true(t_true::Any)
Retrieve y_true from t_true.
This method should be implemented when using a custom data structure for t_true other than a NamedTuple.
InferOpt.get_y_true — Method
get_y_true(t_true::NamedTuple)
Retrieve y_true from t_true. t_true must contain a y_true field.
InferOpt.half_square_norm — Method
half_square_norm(x::AbstractArray) -> Any
Compute the squared Euclidean norm of x and divide it by 2.
InferOpt.isproba — Method
isproba(x::Real) -> Any
Check whether x ∈ [0,1].
InferOpt.isprobadist — Method
isprobadist(p::AbstractArray{R<:Real, 1}) -> Any
Check whether the elements of p are nonnegative and sum to 1.
InferOpt.objective_value — Method
objective_value(f::LinearMaximizer, θ, y; kwargs...) -> Any
Compute the objective value of the given LinearMaximizer f for weights θ and solution y, i.e. θᵀg(y) + h(y).
InferOpt.one_hot_argmax — Method
one_hot_argmax(
z::AbstractArray{R<:Real, 1};
kwargs...
) -> Any
One-hot encoding of the argmax function.
InferOpt.positive_part — Method
positive_part(x) -> Any
Compute max(x, 0).
InferOpt.ranking — Method
ranking(θ::AbstractVector; rev, kwargs...) -> Any
Compute the vector r such that rᵢ is the rank of θᵢ in θ.
InferOpt.shannon_entropy — Method
shannon_entropy(p::AbstractArray{R<:Real, 1}) -> Any
Compute the Shannon entropy of a probability distribution: H(p) = -∑ pᵢ log(pᵢ).
InferOpt.simplex_projection_and_support — Method
simplex_projection_and_support(z)
Compute the Euclidean projection p of z on the probability simplex (also called sparse_argmax), and the indicators s of its support.
Reference: https://arxiv.org/abs/1602.02068.
InferOpt.soft_rank — Method
soft_rank(θ::AbstractVector; ε=1.0, rev::Bool=false)
Fast differentiable ranking of vector θ.
Arguments
θ: vector to rank
Keyword (optional) arguments
ε::Float64=1.0: size of the regularization
rev::Bool=false: rank in ascending order if false
regularization=:l2: use l2 regularization if :l2, and kl regularization if :kl
See also soft_rank_l2 and soft_rank_kl.
InferOpt.soft_rank_kl — Method
soft_rank_kl(θ::AbstractVector; ε=1.0, rev::Bool=false)
Rank vector θ with kl regularization.
InferOpt.soft_rank_l2 — Method
soft_rank_l2(θ::AbstractVector; ε=1.0, rev::Bool=false)
Rank vector θ with l2 regularization.
InferOpt.soft_sort — Method
soft_sort(θ::AbstractVector; ε=1.0, rev::Bool=false, regularization=:l2)
Fast differentiable sort of vector θ.
Arguments
θ: vector to sort
Keyword (optional) arguments
ε::Float64=1.0: size of the regularization
rev::Bool=false: sort in ascending order if false
regularization=:l2: use l2 regularization if :l2, and kl regularization if :kl
See also soft_sort_l2 and soft_sort_kl.
InferOpt.soft_sort_kl — Method
soft_sort_kl(θ::AbstractVector; ε=1.0, rev::Bool=false)
Sort vector θ with kl regularization.
InferOpt.soft_sort_l2 — Method
soft_sort_l2(θ::AbstractVector; ε=1.0, rev::Bool=false)
Sort vector θ with l2 regularization.
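A short usage sketch on an arbitrary vector, following the keyword arguments listed above.
using InferOpt
θ = [3.0, 1.0, 2.0]
InferOpt.soft_sort(θ; ε=0.1)                      # close to sort(θ) for small ε
InferOpt.soft_rank(θ; ε=0.1)                      # close to the ranks of θ for small ε
InferOpt.soft_sort(θ; ε=1.0, regularization=:kl)  # entropic (kl) regularization instead of l2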
InferOpt.zero_one_loss — Method
zero_one_loss(y, y_true)
0-1 loss for multiclass classification: δ(y, y_true) = 0 if y = y_true, and 1 otherwise.
InferOpt.zero_one_loss_maximizer — Method
zero_one_loss_maximizer(y, y_true; α)
For δ = zero_one_loss, compute
argmax_y {δ(y, y_true) + α θᵀ(y - y_true)}