Optimization Tutorial

This tutorial describes the optimization strategies and parameters from the Hyperactive and Gradient-Free-Optimizers python-packages. All optimizer- and parameter-names correspond to version 0.X.X of Gradient-Free-Optimizers.

Overview • Optimizers • Parameters

Overview

The following table shows the expected results for each optimization strategy for the given type of problems:

- Convex function with fast evaluation time (<0.1s)
- Non-convex function with fast evaluation time (<0.1s)
- Machine learning model hyperparameter optimization
- Deep learning model hyperparameter optimization

Those recomendations are just estimated based on personal experience and can heavily change dependend on optimization parameters, exact type of problem and number of iterations.

Optimizer Classes and default Parameters

HillClimbingOptimizer

Hill climbing is a very basic optimization technique, that explores the search space only localy. It starts at an initial point, which is often chosen randomly and continues to move to positions with a better solution. It has no method against getting stuck in local optima.

Available parameters:

epsilon=0.05
distribution="normal"
n_neighbours=3
rand_rest_p=0.03

Use case/properties:

Never as a first method of optimization
When you have a very good initial point to start from
If the search space is very simple and has few local optima or saddle points

RepulsingHillClimbingOptimizer

Available parameters:

epsilon=0.05
distribution="normal"
n_neighbours=3
rand_rest_p=0.03
repulsion_factor=3

Use case/properties:

When you have a good initial point to start from

SimulatedAnnealingOptimizer

Simulated annealing chooses its next possible position similar to hill climbing, but it accepts worse results with a probability that decreases with time:

It simulates a temperature that decreases with each iteration, similar to a material cooling down. The following normalization is used to calculate the probability independent of the metric:

Available parameters:

epsilon=0.05
distribution="normal"
n_neighbours=3
rand_rest_p=0.03
annealing_rate=0.975
start_temp=1

Use case/properties:

When you have a good initial point to start from, but expect the surrounding search space to be very complex
Good as a second method of optimization

RandomSearchOptimizer

The random search explores by choosing a new position at random after each iteration. Some random search implementations choose a new position within a large hypersphere around the current position. The implementation in hyperactive is purely random across the search space in each step.

Use case/properties:

Very good as a first method of optimization or to start exploring the search space
For a short optimization run to get an acceptable solution

RandomRestartHillClimbingOptimizer

Random restart hill climbing works by starting a hill climbing search and jumping to a random new position after a number of iterations.

Available parameters:

epsilon=0.05
distribution="normal"
n_neighbours=3
rand_rest_p=0.03
n_iter_restart=10

Use case/properties:

Good as a first method of optimization
For a short optimization run to get an acceptable solution

RandomAnnealingOptimizer

An algorithm that chooses a new position within a large hypersphere around the current position. This hypersphere gets smaller over time.

Available parameters:

epsilon=0.05
distribution="normal"
n_neighbours=3
rand_rest_p=0.03
annealing_rate=0.975
start_temp=1

Use case/properties:

Disclaimer: I have not seen this algorithm before, but invented it myself. It seems to be a good alternative to the other random algorithms
Good as a first method of optimization
For a short optimization run to get an acceptable solution

ParallelTemperingOptimizer

Parallel Tempering initializes multiple simulated annealing searches with different temperatures and chooses to swap those temperatures with the following probability:

Available parameters:

population=10
n_iter_swap=10
rand_rest_p=0.03

Use case/properties:

Not as dependend of a good initial position as simulated annealing
If you have enough time for many model evaluations

ParticleSwarmOptimizer

Particle swarm optimization works by initializing a number of positions at the same time and moving all of those closer to the best one after each iteration.

Available parameters:

population=10
inertia=0.5
cognitive_weight=0.5
social_weight=0.5
rand_rest_p=0.03

Use case/properties:

If the search space is complex and large
If you have enough time for many model evaluations

EvolutionStrategyOptimizer

Evolution strategy mutates and combines the best individuals of a population across a number of generations without transforming them into an array of bits (like genetic algorithms) but uses the real values of the positions.

Available parameters:

population=10
mutation_rate=0.7
crossover_rate=0.3
rand_rest_p=0.03

Use case/properties:

If the search space is very complex and large
If you have enough time for many model evaluations

BayesianOptimizer

Bayesian optimization chooses new positions by calculating the expected improvement of every position in the search space based on a gaussian process that trains on already evaluated positions.

Available parameters:

gpr=gaussian_process["gp_nonlinear"]
xi=0.03
warm_start_smbo=None
rand_rest_p=0.03

Use case/properties:

If model evaluations take a long time
If you do not want to do many iterations
If your search space is not to big

TreeStructuredParzenEstimators

Tree of Parzen Estimators also chooses new positions by calculating the expected improvement. It does so by calculating the ratio of probability being among the best positions and the worst positions. Those probabilities are determined with a kernel density estimator, that is trained on alrady evaluated positions.

Available parameters:

gamma_tpe=0.5
warm_start_smbo=None
rand_rest_p=0.03

Use case/properties:

If model evaluations take a long time
If you do not want to do many iterations
If your search space is not to big

DecisionTreeOptimizer

Available parameters:

tree_regressor="extra_tree"
xi=0.01
warm_start_smbo=None
rand_rest_p=0.03

Optimizer Parameters

epsilon

When climbing to new positions epsilon determines how far the hill climbing based algorithm jumps from one position to the next points. Higher epsilon leads to longer jumps.

available values: float

Used by:

HillClimbingOptimizer
RepulsingHillClimbingOptimizer
SimulatedAnnealingOptimizer
RandomRestartHillClimbingOptimizer
RandomAnnealingOptimizer
ParallelTemperingOptimizer
ParticleSwarmOptimizer
EvolutionStrategyOptimizer

distribution

The mathematical distribution the algorithm draws samples from.

available values: str; "normal", "laplace", "logistic", "gumbel"

Used by:

HillClimbingOptimizer
RepulsingHillClimbingOptimizer
SimulatedAnnealingOptimizer
RandomRestartHillClimbingOptimizer
RandomAnnealingOptimizer
ParallelTemperingOptimizer
ParticleSwarmOptimizer
EvolutionStrategyOptimizer

n_neighbours

The number of positions the algorithm explores from its current postion before jumping to the best one.

available values: int

Used by:

HillClimbingOptimizer
RepulsingHillClimbingOptimizer
SimulatedAnnealingOptimizer
RandomRestartHillClimbingOptimizer
RandomAnnealingOptimizer
ParallelTemperingOptimizer
ParticleSwarmOptimizer
EvolutionStrategyOptimizer

rand_rest_p

Probability for the optimization algorithm to jump to a random position in an iteration step.

available values: float; [0.0, ... ,0.5, ... ,1]

Used by:

HillClimbingOptimizer
RepulsingHillClimbingOptimizer
SimulatedAnnealingOptimizer
RandomRestartHillClimbingOptimizer
RandomAnnealingOptimizer
ParallelTemperingOptimizer
ParticleSwarmOptimizer
EvolutionStrategyOptimizer
BayesianOptimizer
TreeStructuredParzenEstimators
DecisionTreeOptimizer

repulsion_factor

If the algorithm does not find a better position the repulsion factor increases epsilon for the next jump.

available values: float

Used by:

RepulsingHillClimbingOptimizer

annealing_rate

Rate at which the temperatur-value of the algorithm decreases. An annealing rate above 1 increases the temperature over time.

available values: float; [0.0, ... ,0.5, ... ,1]

Used by:

SimulatedAnnealingOptimizer
RandomAnnealingOptimizer
ParallelTemperingOptimizer

start_temp

The start temperatur determines the probability for the algorithm to jump to a worse position.

available values: float

Used by:

SimulatedAnnealingOptimizer
RandomAnnealingOptimizer
ParallelTemperingOptimizer

n_iter_restart

The number of iterations the algorithm performs before jumping to a random position.

available values: int

Used by:

RandomRestartHillClimbingOptimizer

n_iter_swap

The number of iterations the algorithm performs before switching temperatures of the individual optimizers in the population.

available values: int

Used by:

ParallelTemperingOptimizer

population

Size of the population for population-based optimization algorithms.

available values: float

Used by:

ParallelTemperingOptimizer
ParticleSwarmOptimizer
EvolutionStrategyOptimizer

inertia

The inertia of the movement of the individual optimizers in the population.

available values: float

Used by:

ParticleSwarmOptimizer

cognitive_weight

A factor of the movement towards the personal best position of the individual optimizers in the population.

available values: float

Used by:

ParticleSwarmOptimizer

social_weight

A factor of the movement towards the global best position of the individual optimizers in the population.

available values: float

Used by:

ParticleSwarmOptimizer

mutation_rate

Probability of an individual in the population to perform an hill climbing step.

available values: float

Used by:

EvolutionStrategyOptimizer

crossover_rate

Probability of an individual to perform a crossover with the best individual in the population.

available values: float

Used by:

EvolutionStrategyOptimizer

gpr

The access to the surrogate model class. Example surrogate model classes can be found in a separate repository.

available values: class

Used by:

BayesianOptimizer

xi

Parameter for the expected uncertainty of the estimation.

available values: float

Used by:

BayesianOptimizer
DecisionTreeOptimizer

warm_start_smbo

Dataframe that contains the search data of a previous optimization run.

available values: dataframe

Used by:

BayesianOptimizer
TreeStructuredParzenEstimators
DecisionTreeOptimizer

gamma_tpe

Separates the explored positions into good and bad.

available values: float; [0.0, ... ,0.5, ... ,1]

Used by:

TreeStructuredParzenEstimators

tree_regressor

The access to the surrogate model class. Example surrogate model classes can be found in a separate repository.

available values: class

Used by:

DecisionTreeOptimizer

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
docs		docs
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Optimization Tutorial

Overview • Optimizers • Parameters

Overview

Optimizer Classes and default Parameters

Optimizer Parameters

About

Uh oh!

Releases

Packages

License

arshgoyal/optimization-tutorial

Folders and files

Latest commit

History

Repository files navigation

Optimization Tutorial

Overview • Optimizers • Parameters

Overview

Optimizer Classes and default Parameters

Optimizer Parameters

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages