Bounding H2MM#

See also

This can also be viewed as a Jupyter Notebook Download H2MM_Bounds_Tutorial-OOP.ipynb

The data file can be downloaded here: sample_data_3det.txt

Let’s get our obligatory imports in order, this time we’ll start with the 3 detector data.

import os
import numpy as np
from matplotlib import pyplot as plt

import H2MM_C as hm

# load the data
def load_txtdata(filename):
    color = list()
    times = list()
    with open(filename,'r') as f:
        for i, line in enumerate(f):
            if i % 2 == 0:
                times.append(np.array([int(x) for x in line.split()],dtype=np.int64))
            else:
                color.append(np.array([int(x) for x in line.split()],dtype=np.uint8))
    return color, times

color3, times3 = load_txtdata('sample_data_3det.txt')

Bounding Parameter Values#

Sometimes we may want to restrain the range of possible values in an H2MM model (the h2mm_model). For instance, we may want to keep transition rates (h2mm_model.trans) within values reasonable for the duration of the experiment, or because you want restrain the values in the emission probability matrix (h2mm_model.obs) based on how we know the system should behave.

Bounds are defined using h2mm_limits objects. This object is passed to h2mm_model.optimize() (or EM_H2MM_C() in a functional approach) through the bounds keyword argument. When this is specified, it is also necessary to specify another keyword argument, bounds_func.

So let’s see an example

alt_period = 4000 # a fake alternation period
us_bounds = hm.h2mm_limits(max_trans = 1/(alt_period))

prior = np.array([1/4, 1/4, 1/4, 1/4])
trans = np.array([[1-3e-6, 1e-6, 1e-6, 1e-6],
                  [1e-6, 1-3e-6, 1e-6, 1e-6],
                  [1e-6, 1e-6, 1-3e-6, 1e-6],
                  [1e-6, 1e-6, 1e-6, 1-3e-6]])
obs = np.array([[0.4,0.4,0.2],
                [0.3,0.1,0.6],
                [0.2,0.4,0.4],
                [0.1,0.1,0.8]])

us_opt_model4 = hm.h2mm_model(prior, trans, obs)

us_opt_model4.optimize(color3, times3, bounds_func='revert', bounds=us_bounds)
us_opt_model4
The model converged after 631 iterations
nstate: 4, ndet: 3, nphot: 436084, niter: 631, loglik: -408203.01780807425 converged state: 0x27
prior:
0.19742522045704233, 0.5611254558625469, 0.24144932368041058, 7.251074733815892e-42
trans:
0.9999562426518485, 2.620839826060098e-05, 1.818962272427181e-06, 1.5729987618497615e-05
7.049720131795914e-06, 0.9999698856343252, 6.991342045371946e-06, 1.607330349755956e-05
1.2716807355060168e-06, 1.7388217608512278e-05, 0.9999781791003083, 3.1610013477364933e-06
1.7301823234579738e-05, 0.00011452568669777342, 8.076641015599715e-06, 0.999860095849052
obs:
0.849528664181505, 0.07564782657329712, 0.074823509245198
0.47168581743329263, 0.09134399902467148, 0.4369701835420359
0.14909987819343531, 0.31276918990273284, 0.5381309319038319
0.15084679777173912, 0.07681315977150306, 0.7723400424567578

So, what did we just to? The h2mm_limits object us_bounds prevents any value (except on-diagonal values) of the transition probability matrix (h2mm_model.trans) from ever being larger (i.e. faster transition rate) than 1/4000

Bounds Process#

When you use a bounds method, each iteration goes through the following steps:

  1. Calculate the loglikelihood and a new model

  2. Check if the optimization has converged

  3. Analyze the new model, and correct if necessary.

    1. Check if any values are smaller or larger than the limits set in the bounds argument.

    2. If values are out of bounds, apply a correction, based on the method defined by the argument passed to bounds_func

  4. Repeat optimization (return to step 1)

When creating a h2mm_limits object, all arguments are passed as keyword arguments. They come in the form of min/max_[name] where [name] is prior, trans, or obs (the core arrays of h2mm_model objects). These specify the minimum/maximum values in the respective array. If no value is specified for a given min/max array, then the values of that array can be as small or as large as possible for an unbounded model.

In all cases, these values can be specified as either a float or an array. - If specified as a float, then the value is universal for all values in the given array (this is most useful for the h2mm_model.trans matrix). This means less granularity, but then the h2mm_limits object can be used for any number of states/detectors - If specified as an array, then the values in the array are matched with the corresponding array in h2mm_model. This means greater granularity, but then you are locked into using the h2mm_limits object only for optimizations with a specific number of states /detectors.

As mentioned in the above outline, you also need to specify the bounds_func argument when using bounds. There are 3 options for this:

  1. 'minmax': The shallowest correction, sets the out of bounds value to its minimum or maximum

  2. 'revert': The preferred method, sets the out of bounds value to the value it was in the previous model

  3. 'revert_old': A more extreme form of 'revert' which goes to the model before the last in the optimization, and sets the out of bounds value to that “older” value.

Using factory_h2mm_model() with Bounds#

You will note that in the previous example, we specified the h2mm_model explicity, instead of using the factory_h2mm_model() function. This is because it is possible that the factory_h2mm_model() could return a h2mm_model whose values are already out of bounds based for the h2mm_limits object. This could create odd behavior during optimization.

There is a way around this: you can call factory_h2mm_model() with the h2mm_limits object passed through the bounds keyword argument, and the function will automatically return a h2mm_model object that is within the bound provided.

Note

See the documentation of factory_h2mm_model() for a fill list of options for customizing the function’s output.

us_bounds = hm.h2mm_limits(max_trans = 1/4000)
# make factory_h2mm_model make a model within bounds
us_model = hm.factory_h2mm_model(3,3, bounds=us_bounds)
us_model.optimize(color3, times3, bounds=us_bounds, bounds_func='revert')
The model converged after 198 iterations

Custom Bounds#

There is a final option for specifying bounds: write your own function and hand it to bounds_func.

Note

This feature was designed to allow the user to handle things/circumstances that the writers of H2MM_C had not anticipated. Therefore this example is very simple, and does not show a useful method.

So how does it work?

The function must take 4 input arguments, which will be handed to it ever iteration by EM_H2MM_C(), these are (in order):

  1. new_model: The model that will be optimized next, this is the one that should be bounded

  2. current_model: The model whose loglikelihood was just calculated, this is the model that 'revert' uses to reset an out of bounds value

  3. old_model: The model that was calculated before the current one, this is the model that 'revert_old' uses to reste an out of bounds value

  4. bound: The argument passed to the bound keyword argument, if not specified, it will be None, and can be anything (so long as you are passing a function to bounds_func

Finally, the function must return a single h2mm_model object, which is the function that will actually be calculated next- this is the equivalent of a “corrected” model based on the bounds.

Warning

Make sure the model you return makes sense, otherwise the optimization will behave very strangely. Think of this as the “gloves off” appraoch, you might have a very powerful new method, or you might get something meaningless depending on how you code it. That’s your responsibility.

So let’s see this in action

def sample_bounds(new_model,current_model,old_model,bound):
    # it's usually best to just keep the function signature the same
    # grab the obs matrix
    obs = new_model.obs
    # set first row of obs matrix to bound
    obs[0,:] = bound
    # change the obs matrix of the new model
    new_model.obs = obs
    # return the adjusted model
    return new_model
bnd = np.array([0.09,0.01,0.9])
prior = np.array([1/4, 1/4, 1/4, 1/4])
trans = np.array([[1-3e-6, 1e-6, 1e-6, 1e-6],
                  [1e-6, 1-3e-6, 1e-6, 1e-6],
                  [1e-6, 1e-6, 1-3e-6, 1e-6],
                  [1e-6, 1e-6, 1e-6, 1-3e-6]])
obs = np.array([bnd,
                [0.3,0.1,0.6],
                [0.2,0.4,0.4],
                [0.1,0.1,0.8]])

us_opt_model4 = hm.h2mm_model(prior, trans, obs)
us_opt_model4.optimize(color3, times3, bounds_func=sample_bounds, bounds=bnd)
The model converged after 712 iterations