Jevgenij Gamper
TIA Lab Meeting 10/11/2017
Slides available at jgamper.github.io/GettingFeetWetInBayesianOptimisation
For all the links and useful resources, click below.
Video lectures, in order of decreasing awesomeness:
Bayesian Optimisation Software (Python)
Interesting literature:
"Civilisation advances by the number of important operations which we can perform without thinking of them" (Alfred North Whitehead)
We are interested in smarter automation!
[Figure: Baltz et al., "Achievement of Sustained Net Plasma Heating in a Fusion Experiment with the Optometrist Algorithm."]
[Figure: Snoek, Larochelle, and Adams, "Practical Bayesian Optimization of Machine Learning Algorithms."]
$$ x^{*} = \underset{x}{\arg\max} \ f(x)$$
$$ y = f(x) + \epsilon$$
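To make the setting concrete, a minimal Python sketch of such a black box (the function `f_true` and the noise level are hypothetical, purely for illustration):

```python
import numpy as np

def f_true(x):
    # A hypothetical multi-modal objective we pretend not to know.
    return np.sin(8.0 * x) - (x - 0.3) ** 2

def observe(x, noise_std=0.1):
    # All the optimiser ever sees: a noisy, gradient-free,
    # possibly expensive evaluation y = f(x) + eps.
    return f_true(x) + np.random.normal(0.0, noise_std)
```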
How do people usually optimise?
$p(\begin{bmatrix} \mathbf{y} \\ \mathbf{f}^{*} \end{bmatrix} | \sigma^{2}) = \mathcal{N}(0, \begin{bmatrix} \mathbf{C} + \sigma^{2}\mathbf{I} & \mathbf{R}\\\mathbf{R}^{T} & \mathbf{C}^{*}\end{bmatrix}),$
$p(\mathbf{f}^{*} | \mathbf{y}, \sigma^{2}) = \mathcal{N}(\boldsymbol{\mu}^{*}, \boldsymbol{\Sigma}^{*}),$
$\boldsymbol{\mu}^{*} = \mathbf{R}^{T} (\mathbf{C} + \sigma^{2}\mathbf{I})^{-1} \mathbf{y}$
$\boldsymbol{\Sigma}^{*} = \mathbf{C}^{*} -\mathbf{R}^{T} (\mathbf{C} + \sigma^{2}\mathbf{I})^{-1} \mathbf{R}$
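These predictive equations translate directly into a few lines of numpy. A minimal sketch, assuming an RBF kernel (the kernel choice, lengthscale and noise level below are illustrative placeholders, not from the slides):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between input matrices A (n, d) and B (m, d).
    sq_dists = (np.sum(A ** 2, axis=1)[:, None]
                + np.sum(B ** 2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def gp_posterior(X, y, X_star, noise_var=1e-2):
    # C = K(X, X), R = K(X, X*), C* = K(X*, X*), matching the slide notation.
    C = rbf_kernel(X, X)
    R = rbf_kernel(X, X_star)
    C_star = rbf_kernel(X_star, X_star)
    K = C + noise_var * np.eye(len(X))
    mu_star = R.T @ np.linalg.solve(K, y)              # posterior mean
    Sigma_star = C_star - R.T @ np.linalg.solve(K, R)  # posterior covariance
    return mu_star, Sigma_star
```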
Bold move - switch to GPSS slides!
Uncertainty quantification: Making informed decisions
Utility should represent our study design goal:
$$r_{N} = \sum_{n=1}^{N} f(x_{n}) - N f(x_{M})$$
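As a worked example, cumulative regret is one line of numpy once $f(x_{M})$ is known (the value at the true minimiser, which in practice is only available for benchmark test functions):

```python
import numpy as np

def cumulative_regret(f_evals, f_min):
    # r_N = sum_n f(x_n) - N * f(x_M), minimisation convention:
    # the total excess cost of our N evaluations over always querying x_M.
    return np.sum(f_evals) - len(f_evals) * f_min
```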
(1) does a lot of exploration, whereas (2) encourages exploitation around the minimum of the surrogate $f$
GP upper (lower) confidence bound
Direct balance between exploration and exploitation:
$$\alpha_{LCB}(\mathbf{x}; \theta, \mathcal{D}) = - \mu(\mathbf{x}; \theta, \mathcal{D}) + \beta_{t} \sigma(\mathbf{x}; \theta, \mathcal{D})$$
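In code this is a one-liner over the surrogate's posterior moments; the trade-off constant `beta_t = 2.0` below is an illustrative default, not a prescribed value:

```python
def alpha_lcb(mu, sigma, beta_t=2.0):
    # -mu favours points with low posterior mean (exploitation, minimisation);
    # beta_t * sigma favours points we are still uncertain about (exploration).
    return -mu + beta_t * sigma
```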
Bayesian Optimisation - a methodology for the global optimisation of multi-modal black-box functions
Iterate between steps 2 and 4 until the budget is exhausted
Acquisition function $u(x)$ guides the optimisation by determining which $x_{t+1}$ to observe next
Choice of $u(x)$ heavily affects optimisation results
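Putting the steps together, a minimal one-dimensional sketch of the loop, reusing the hypothetical `gp_posterior` and `alpha_lcb` sketches above (a random candidate set stands in for a proper inner optimiser of $u(x)$):

```python
import numpy as np

def bayes_opt(f, lo, hi, n_init=3, budget=20, beta_t=2.0, noise_var=1e-2):
    # Step 1: a small random initial design.
    X = np.random.uniform(lo, hi, size=(n_init, 1))
    y = np.array([f(xi) for xi in X.ravel()])
    for _ in range(budget):
        # Step 2: condition the GP surrogate on all data seen so far.
        X_cand = np.random.uniform(lo, hi, size=(512, 1))
        mu, Sigma = gp_posterior(X, y, X_cand, noise_var)
        sigma = np.sqrt(np.clip(np.diag(Sigma), 1e-12, None))
        # Step 3: maximise the acquisition u(x) to choose x_{t+1}.
        x_next = float(X_cand[np.argmax(alpha_lcb(mu, sigma, beta_t)), 0])
        # Step 4: evaluate the black box and augment the data.
        X = np.vstack([X, [[x_next]]])
        y = np.append(y, f(x_next))
    best = np.argmin(y)  # minimisation convention, matching the LCB slide
    return X[best], y[best]
```

With the `observe` sketch from earlier, `bayes_opt(observe, 0.0, 1.0)` runs the whole pipeline end to end.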
Scalability in the number of parameters
Up to ~10 parameters is okay...
But there are solutions!
Swersky, Snoek, and Adams, “Freeze-Thaw Bayesian Optimization.”
Snoek, Larochelle, and Adams, “Practical Bayesian Optimization of Machine Learning Algorithms.”