Crash course in Bayesian inversion

Mauro Werder

Bayesian what?

Bayesian inversion

Bayesian inversion is a statistical method for estimating unknown model parameters including their uncertainty.

Using Bayes’ theorem, it fits a forward model to observations whilst also taking into account prior information.

Examples: Brinkerhoff and others (2021)

Looks at coupled dynamics of ice flow and subglacial drainage

  • first study inverting a coupled ice-flow and (latest generation) drainage-system model
  • uses a surrogate forward model
  • Bayesian inversion based around MCMC (Markov chain Monte Carlo)

Fig: Model region in West Greenland

Examples: Brinkerhoff and others (2021)

  • inversion estimates eight parameters
  • and their distributions

[explain what that plot shows]

Examples: Werder and others (2020)

Invert for ice thickness given radar and flow speed observations:

  • uses the Farinotti and others (2009) ice thickness model as forward model
  • fits observations of thickness and flow speed by tuning various parameters
  • this is what got me started with Bayesian inference / inversions

Fig: inversion for Unteraargletscher

Examples: Werder and others (2020)

Fig: inversion for Unteraargletscher: comparison to data

Examples: Pohle and others (2022)

Invert for parameters of R-channel model given our measurements

  • forward models: two simple R-channel models
  • one is fitted, for the other errors are just forward propagated
  • fit to channel size S
  • fit with temperature gradient and initial S
  • comparison between the predicted temperature gradients

Disclaimer

  • I am not a specialist on this, let alone a statistician, merely a user
  • this presentation is a bit less clear than I was hoping… good luck!

Terminology

Terminology: Uncertainty vs. Errors

  • Error: discrepancy between model / measurement and reality
    • typically we don’t actually know the error (as the truth is unknown)
    • examples:
      • data error: instrument precision limits, noise, etc
      • numerical error: difference between numerical model result and true solution
      • others…
  • Uncertainty: lack of complete knowledge about a system or its parameters
    • typically given as a probability distribution
    • errors can be conceptualized as random draws from this distribution

A probabilistic framework quantifies uncertainty, enabling us to model errors probabilistically.

Terminology: Types of Uncertainty

Aleatoric Uncertainty

  • stochastic variability inherent to a process
  • also called irreducible uncertainty
  • Examples:
    • atmospheric variability
    • sensor noise
    • quantum fluctuations

Epistemic Uncertainty

  • due to lack of knowledge
  • also called systematic uncertainty
  • can be reduced with better data or models
  • examples:
    • poorly constrained model parameters
    • sparse glacier field measurements

Ideally we want to quantify both types…

Terminology: forward vs inverse models

A forward model is a model as we know and love it:

  • feed it input and it spits out a result
  • typically this is the kind of model we create when setting out to model a process

An inverse model is the combination of:

  • a forward model
  • and a way to fit that model’s output to observations / data

Forward uncertainty propagation

When is forward uncertainty propagation enough?

Forward propagation is the only possibility when:

  • model input parameters and their uncertainty are known or can be estimated (guesses, expert judgement, etc.)
  • there are no observations/data to fit the outputs of the forward model to

In this case, the focus is on propagating known input uncertainty through the forward model to quantify uncertainty in the outputs.

Classical Error Propagation

  • Assumes small, uncorrelated uncertainties and linear relationships.
  • Uses partial derivatives to propagate input uncertainty through the model.

1D Example:

For a function \(y = f(x)\), where \(\sigma_x\) is the uncertainty in \(x\):

\(\sigma_y(x_0) = \frac{\partial f}{\partial x} \Big|_{x=x_0} \cdot \sigma_x(x_0)\) 

Example: If \(y = x^2\) and \(\sigma_x = 0.1\):

\(\sigma_y = 2x \cdot \sigma_x = 0.2 x\)

Note: this linearises the function and is thus only valid for linear functions or close to the point of evaluation.
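This linearisation caveat can be checked numerically. A minimal sketch (base Julia only, variable names are mine) comparing classical propagation with a Monte Carlo estimate for \(y = x^2\):

```julia
using Random, Statistics

Random.seed!(0)
x0, sigma_x = 5.0, 0.1
f(x) = x^2

# classical propagation: sigma_y = |f'(x0)| * sigma_x = 2*x0*sigma_x
sigma_y_lin = 2 * x0 * sigma_x

# Monte Carlo check: sample x ~ N(x0, sigma_x), push through f
xs = x0 .+ sigma_x .* randn(100_000)
sigma_y_mc = std(f.(xs))   # close to sigma_y_lin because sigma_x is small
```

For larger \(\sigma_x\) the two estimates diverge, which is exactly where the linearisation breaks down.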

Monte Carlo Uncertainty Propagation

Monte Carlo (MC) methods propagate uncertainty by creating many input parameter sets by sampling the input’s distributions and running the forward model for each set.


Key Steps:

  1. define uncertainty distributions for each input parameter
    • example: \(\mathcal{N}(\mu, \sigma)\), \(\text{Uniform}(a, b)\)
  2. randomly sample input parameters from these distributions
  3. run the forward model to calculate the corresponding outputs
  4. analyze the distribution of the outputs for uncertainty in results

Pros:

  • handles nonlinear models and correlated inputs
  • any form of input distribution (not limited to Gaussian)

Julia Code Example: Monte Carlo Uncertainty Propagation

A glacier mass-balance model calculates the annual mass balance at a location on the glacier, \(MB\). In general it looks something like:

\(MB(P, T, m, ...) = f(P, T, m, ...)\)

where \(f\) is some, maybe complicated, function.


For now we take this simple model:

\(MB = P - T \cdot m\)

where:

  • \(P\): solid precipitation (uncertain, modeled as \(\mathcal{N}(1000, 100)\)).
  • \(T\): mean annual temperature (uncertain, \(\mathcal{N}(2, 0.5)\)).
  • \(m\): melt factor (\(200.0\), fixed).

Julia implementation

using Distributions, Random, Statistics, CairoMakie # for stats and plotting

# Define the forward model
glacier_mb(P, T, m) = P - T*m

# Define input distributions
P_dist = Normal(1000, 100)  # Snow precipitation distribution (mean=1000mm, std=100mm)
T_dist = Normal(2, 0.5)     # Annual temperature distribution (mean=2°C, std=0.5°C)
# Fixed melt factor
m = 200.0

# Monte Carlo sampling
num_samples = 10000
P_samples = rand(P_dist, num_samples)  # Sample precipitation
T_samples = rand(T_dist, num_samples)  # Sample temperature

# Compute mb for each sample
mb_samples_MC = glacier_mb.(P_samples, T_samples, m)

# Analyze results
mean_mb, std_mb  = mean(mb_samples_MC), std(mb_samples_MC)
# Print results
println("\nMean annual mass balance: $(mean_mb) mm")
println("Standard deviation of MB: $(std_mb) mm")

Mean annual mass balance: 599.4495332173688 mm
Standard deviation of MB: 143.1536843819519 mm

Output plot

# Plot results
fig = Figure()
ax = Axis(fig[1, 1], title="Monte Carlo Mass Balance Distribution", xlabel="Mass balance (mm)", ylabel="Frequency")
hist!(ax, mb_samples_MC, bins=50, color=:blue, strokewidth=0.0,); fig
Fig: distribution of simulated mass balance (a Gaussian-shaped histogram)

Inversions

When do we need inversions?

Essentially, when our model produces output for which we have measurements and we want to use that to constrain the model parameters.

  • “parameters” is used here in the sense of any model input
    • for example: ice density, DEM, numerical stuff, magic constants, etc
    • note that in the scheme presented here, any parameters which have uncertainty need to be fitted; i.e. no “straight” MC propagation of errors
  • inversions are in general ill-posed and need regularisation. This is done in Bayesian inversion with so-called priors (but they can be used for other things too)


Mass balance example: if we got measurements of the mass balance we can fit the model to those.

Prior knowledge, aka “Priors”

Before we start to run our inversion we typically already have some knowledge of the model parameters:

  • direct measurements of a parameter, e.g. the DEM and its uncertainty
  • inversions of a parameter from a previous study
  • physics and other fundamental constraints
  • expert opinion…


Note that this is what we used in the uncertainty forward propagation example: just our prior knowledge of the parameters.

Prior: \(p(\theta)\)

Formally, priors are written as

\(p(\theta)\)

where \(p\) is a probability density function (PDF) and \(\theta\) is the vector of parameters.


So for our previous example:

\(\theta = [P, T]\),

remember \(m\) was fixed, with \(P\sim N(1000,100)\) and \(T\sim N(2,0.5)\).

And as our normal priors on \(P\) and \(T\) are uncorrelated the PDF of the prior is:

\(p(\theta) = \frac{1}{\sqrt{2\pi100^2}} e^{-\frac{(\theta_P-1000)^2}{2\cdot 100^2}} \cdot \frac{1}{\sqrt{2\pi0.5^2}} e^{-\frac{(\theta_T-2)^2}{2 \cdot 0.5^2}}\).
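This product of normals is easy to code up directly. A small sketch (`normpdf` is my helper; `pdf(Normal(μ, σ), x)` from Distributions.jl would do the same):

```julia
# normal PDF, written out as in the formula above
normpdf(x, mu, sigma) = exp(-(x - mu)^2 / (2sigma^2)) / sqrt(2pi * sigma^2)

# prior for theta = [P, T], two independent normals
prior(P, T) = normpdf(P, 1000, 100) * normpdf(T, 2, 0.5)

prior(1000, 2)   # density at the prior means, the maximum of p(theta)
```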

Forward model and how to test it with data, aka the likelihood

Going back to our mass balance model: it has the form

\(MB(T,P,m,...) = f(T,P,m,...)\)

(where above \(f\) was super simple. However \(f\) may well be more complicated, say taking DEMs, wind fields, etc as inputs.)

Now we have data for \(MB\) from direct glaciological observations: how do we incorporate that data?

–> Fit the model to the data.

  • For this we need a cost-function (aka objective-function, etc.). In Bayesian inversion this is the likelihood.

Likelihood: \(p(d|\theta)\)

To compare the data to the forward model output, we need an assumption on how the errors are distributed. This is captured in the likelihood \(p(d|\theta)\), which reads something like:

“the likelihood of measuring the data \(d\) given our forward model run with parameters \(\theta\) and a stochastic model of the errors”

Usually, for lack of better knowledge, a normal distribution is assumed for the errors:

\(p(d | \theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma_i^2}} e^{-\frac{(F(\theta)_i - d_i)^2}{2\sigma_i^2}}\)

for the case of \(n\) uncorrelated errors, where \(F(\theta)\) is the forward model.


For our example this would be

\(p(d | \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(MB(\theta) - d)^2}{2\sigma^2}}\)

with a single observation of the mass balance \(d\).
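Coding this likelihood up for our mass-balance model is equally short (a sketch; `normpdf` and the error standard deviation of 100 mm are my choices):

```julia
normpdf(x, mu, sigma) = exp(-(x - mu)^2 / (2sigma^2)) / sqrt(2pi * sigma^2)

glacier_mb(P, T, m) = P - T*m    # forward model from before
m = 200.0                        # fixed melt factor

# likelihood of one observation d given parameters (P, T), error std 100 mm
likelihood(d, P, T) = normpdf(d, glacier_mb(P, T, m), 100.0)
```

Parameters whose forward-model output hits the observation exactly maximise the likelihood: `likelihood(400.0, 1000.0, 3.0)` (where \(MB = 400\)) is larger than `likelihood(400.0, 1000.0, 2.0)` (where \(MB = 600\)).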

Bayes’ theorem

Combining the prior with knowledge gained from modelling

We now want to combine the prior \(p(\theta)\) and the likelihood \(p(d|\theta)\)… Bayes’ theorem to the rescue!

Stated in terms of PDFs:

\(p(\theta|d) = \frac{p(d|\theta) \cdot p(\theta)}{p(d)}\)

where you will spot the prior and likelihood as well as the marginal likelihood \(p(d)\) and the posterior \(p(\theta|d)\).


The posterior is what we are after, it tells us the probability of a set of parameters being appropriate given our prior knowledge, the forward model and an error model.

The marginal likelihood is of little relevance here, as the methods used to evaluate the posterior work without knowing it.

Worked example

We will now continue with our simple mass balance example from before (this uses a package for more compact code, we’ll do it from scratch this afternoon).

using Turing # using a MCMC package, and plotting

@model function bayesian_glacier_model(mb_measured, mb_uncertainty)
    # Priors
    P ~ Normal(1000, 100)  # Precipitation
    T ~ Normal(2, 0.5)     # Temperature
    # Fixed melt factor
    m = 200.0
    
    # Likelihood
    mb = P - T * m
    mb_measured ~ Normal(mb, mb_uncertainty)
end

# Define model and draw samples from the posterior
model = bayesian_glacier_model(400.0, 100.0)
chain = sample(model, NUTS(0.65), 1000, progress=false);
summarystats(chain)
┌ Info: Found initial step size
└   ϵ = 1.6
Summary Statistics
  parameters       mean       std      mcse   ess_bulk   ess_tail      rhat    ⋯
      Symbol    Float64   Float64   Float64    Float64    Float64   Float64    ⋯

           P   932.8217   80.6855    2.9857   728.2007   598.7228    1.0000    ⋯
           T     2.3136    0.3919    0.0144   736.6765   732.3138    1.0035    ⋯
                                                                1 column omitted

Prior (green) vs posterior (blue) distribution of P (left) and T (right)

The histograms show how our prior knowledge of precipitation and temperature is updated by the fitting to the measured mass balance.
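With only two parameters, the posterior can also be evaluated by brute force on a grid, which makes for a handy cross-check of the MCMC result (a sketch; the grid ranges are my choice, and the normalisation step shows why \(p(d)\) is never needed):

```julia
normpdf(x, mu, sigma) = exp(-(x - mu)^2 / (2sigma^2)) / sqrt(2pi * sigma^2)

d, sigma_d, m = 400.0, 100.0, 200.0
# unnormalised posterior: likelihood times prior
post(P, T) = normpdf(d, P - T*m, sigma_d) * normpdf(P, 1000, 100) * normpdf(T, 2, 0.5)

Ps, Ts = 500:2.0:1400, 0.0:0.02:4.5
w = [post(P, T) for P in Ps, T in Ts]
w ./= sum(w)   # normalise numerically, so p(d) drops out

P_mean = sum(w .* [P for P in Ps, T in Ts])
T_mean = sum(w .* [T for P in Ps, T in Ts])
```

This gives posterior means of roughly 933 for \(P\) and 2.33 for \(T\), consistent with the MCMC summary statistics. Grid evaluation scales terribly with the number of parameters, which is why sampling methods are used instead.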

Forecasts: aka the uncertainty in the model output

We can also look at what the model then “forecasts” for the actual value of the mass balance, taking prior knowledge into account as well.

Prior (green) vs posterior (blue) distribution of mass balance

Joint probabilities

The correlations are as expected: positive between P and mb and between T and P, and negative between T and mb.

Universality and flexibility

The method

  • is a pretty universal approach to uncertainty quantification and can be applied to most situations and models
  • takes equifinality into account by showing the distribution of parameters
  • makes the incorporation of prior knowledge easy and rigorous

It is also flexible: adding another parameter to the inversion is usually straightforward.

Our example with fitting \(m\) as well:

@model function bayesian_glacier_model(mb_measured, mb_uncertainty)
    # Priors
    P ~ Normal(1000, 100)  # Precipitation
    T ~ Normal(2, 0.5)     # Temperature
    # Melt factor, now with a prior instead of being fixed
    m ~ Normal(200.0, 20)
    
    # Likelihood
    mb = P - T * m
    mb_measured ~ Normal(mb, mb_uncertainty)
end

Methods for doing Bayesian inversions

Sampling, why sampling?

  • a sampler takes a PDF and returns samples from it; the rand() command is an example, sampling from Uniform(0,1)
  • good to characterize high dimensional probability distributions via histograms and joint-probability scatter plots
  • integration, which is needed for expectation values, etc., is just a summation away
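The "integration is just a summation away" point in a one-liner (base Julia; estimating \(E[x^2] = 1\) under a standard normal):

```julia
using Random, Statistics
Random.seed!(0)

xs = randn(1_000_000)   # samples from N(0, 1)
mean(xs.^2)             # Monte Carlo estimate of E[x^2] = ∫ x^2 p(x) dx ≈ 1
```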

Maximum posterior estimates

  • instead of sampling the posterior to get the full distribution, just finding the maximum can also be done
  • uses gradient ascent or similar and is probably faster than sampling, as fewer model evaluations are needed
  • this is largely equivalent to other types of inversions where a cost function (likelihood) with some regularisation (prior) is minimized
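For our example this maximum-posterior (MAP) estimate can be found with a few lines of hand-coded gradient ascent on the log posterior (a sketch with step sizes tuned by hand; in practice a package like Optim.jl, possibly with automatic differentiation, would be used):

```julia
d, sigma_d, m = 400.0, 100.0, 200.0
# log posterior up to a constant: log-likelihood + log-prior
logpost(P, T) = -(d - (P - T*m))^2/(2sigma_d^2) - (P - 1000)^2/(2*100^2) - (T - 2)^2/(2*0.5^2)

function map_estimate(; P=1000.0, T=2.0, iters=20_000)
    for _ in 1:iters
        r  = (d - (P - T*m)) / sigma_d^2   # scaled data residual
        gP = r - (P - 1000)/100^2          # d(logpost)/dP
        gT = -m*r - (T - 2)/0.5^2          # d(logpost)/dT
        P += 1e2  * gP                     # per-parameter step sizes
        T += 1e-2 * gT
    end
    P, T
end

P_map, T_map = map_estimate()   # location of the posterior maximum
```

For this linear-Gaussian model the MAP coincides with the posterior mean, roughly \(P \approx 933\), \(T \approx 2.33\).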

What is that MCMC stuff?

Sampling is important not least because it can be done very efficiently via Markov chain Monte Carlo (MCMC) methods. Thus MCMC and Bayesian inversion are often used almost interchangeably; but MCMC is really just the method used to sample.

Metropolis-Hastings is the original MCMC sampler

  • in the example above, this “Turing”-thing uses an MCMC sampler
  • we will look at MH this afternoon
  • nowadays more modern samplers are often used, in particular Hamiltonian Monte Carlo methods, which however require the gradient of the posterior
  • we will use an affine invariant sampler, which is what I used in most of my work
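A minimal Metropolis-Hastings sampler for the mass-balance posterior fits in a few lines (a sketch; the proposal step sizes and chain length are my choices):

```julia
using Random, Statistics
Random.seed!(1)

d, sigma_d, m = 400.0, 100.0, 200.0
# unnormalised log posterior of the mass-balance example
function logpost(theta)
    P, T = theta
    -(d - (P - T*m))^2/(2sigma_d^2) - (P - 1000)^2/(2*100^2) - (T - 2)^2/(2*0.5^2)
end

# Metropolis-Hastings with a Gaussian random-walk proposal
function metropolis(logpost, theta0, nsamples; step=[50.0, 0.25])
    theta, lp = theta0, logpost(theta0)
    chain = [theta0]
    for _ in 2:nsamples
        prop = theta .+ step .* randn(2)   # propose a move
        lp_prop = logpost(prop)
        if log(rand()) < lp_prop - lp      # accept with prob min(1, p_prop/p)
            theta, lp = prop, lp_prop
        end
        push!(chain, theta)                # on rejection the old point repeats
    end
    chain
end

chain = metropolis(logpost, [1000.0, 2.0], 100_000)
P_mean, T_mean = mean(first.(chain)), mean(last.(chain))
```

The resulting means agree with the Turing run (roughly 933 and 2.33). Note that only posterior ratios enter the accept step, so the marginal likelihood \(p(d)\) is never needed.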

Steps to Bayesianism

  • have / make a forward model
  • have some data for model parameters and outputs
  • make priors on model parameters \(p(\theta)\)
  • make likelihood combining forward model and stochastic error model \(p(d|\theta)\)
  • evaluate posterior \(p(\theta|d)\propto p(d|\theta) p(\theta)\)
  • make forecasts with ensemble of \(\theta\)s

Limitations of Bayesian inversion

Pros:

  • principled way to go about quantifying uncertainties
  • can provide both best fit and uncertainties of parameters and model forecasts

Cons:

  • quite a few pieces need to come together to make things work
  • needs more forward model evaluations than some other inversion strategies
  • statistics is quite difficult to get into

Things not covered

  • the construction of the likelihood and priors is not always trivial, but I get away with not being good at it
    • correlations should be accounted for, e.g. spatial, to ensure smoothness, avoid overfitting and not overestimate measurement contributions
    • non informative priors: how to choose a prior such that it has no impact on the results
  • testing convergence of the MCMC iterations
  • there are packages out there to do Bayesian inference; I find them impenetrable because they use so much stats jargon…
  • yes, stats-jargon! Can’t help you there. If you get dizzy because of it, focus on your forward model and wait for it to pass.
  • probably other things which I’m not even aware of

Summary

Brinkerhoff et al., 2022
  • taking uncertainties into account is important
  • forward propagation of uncertainties is often good enough (i.e. no need to fit a model)
    • use MC scheme
  • Bayesian inference is a nice inversion scheme
    • an MCMC-based approach will give the best fit as well as uncertainties
    • however, maximum posterior evaluation can also be done; this is then pretty equivalent to other inversions
  • the plan for this afternoon is to code such an MCMC-based inversion scheme

Thanks!

References

Brinkerhoff D, Aschwanden A and Fahnestock M (2021) Constraining subglacial processes from surface velocity observations using surrogate-based Bayesian inference. Journal of Glaciology 67(263), 385–403. doi:10.1017/jog.2020.112.
Farinotti D, Huss M, Bauder A, Funk M and Truffer M (2009) A method to estimate ice volume and ice thickness distribution of alpine glaciers. Journal of Glaciology 55(151), 422–430.
Pohle A, Werder MA, Gräff D and Farinotti D (2022) Characterising englacial R-channels using artificial moulins. Journal of Glaciology 68(271), 879–890. doi:10.1017/jog.2022.4.
Werder MA, Huss M, Paul F, Dehecq A and Farinotti D (2020) A Bayesian ice thickness estimation model for large-scale applications. Journal of Glaciology 66(255), 137–152. doi:10.1017/jog.2019.93.