H2MM_C Secondary Control Features#
See also
This can also be viewed as a Jupyter Notebook
Download H2MM_Control_Optimization.ipynb
Download the data file here: sample_data_3det.txt
First, our obligatory imports and loading of 3 detector data:
import os
import numpy as np
from matplotlib import pyplot as plt
import H2MM_C as hm
# load the data
def load_txtdata(filename):
color = list()
times = list()
with open(filename,'r') as f:
for i, line in enumerate(f):
if i % 2 == 0:
times.append(np.array([int(x) for x in line.split()],dtype=np.int64))
else:
color.append(np.array([int(x) for x in line.split()],dtype=np.uint8))
return color, times
color3, times3 = load_txtdata('sample_data_3det.txt')
Optimization Control#
Sometimes you want to control when optimizations stop, or how many cores the optimization uses. Basically the course of the optimization is the same, you’re just changing the thresholds for when to stop optimizing.
There are 4 of these “limits”:
num_coresThe number of threads to use when optimizing/calculating a model or state-path.max_iterThe maximum number of iterations to optimize a model until automatically quittingconverged_minThe threshold of improvement required to continue optimizing, i.e. if the new model improves the loglikelihood by less than this value, the optimization will stop.max_timeThe maximum duration to conduct optimization, after which optimization will automatically stop. Uses inaccurate clock, by default is infinite, and recommended not to be changed
Note
The timer used in max_time uses the inaccurate C clock, which tends to run fast, so
youroptimizations will often end earlier than the time entered. Thus the use of max_time is
generally discouraged.
Setting by Keword Arguments#
These can be adjusted by passing these as keyword arguments to EM_H2MM_C() and h2mm_model.optimize().
num_cores also works in H2MM_arr(), h2mm_model.evaluate(), viterbi_path(), viterbi_sort(), in these there are no limits/thresholds that apply to these since they are not optimizations, however, they can be parallelized, and thus num_cores is applicable.
Heres a quick example, where the number of optimizations is increased to 7200 iterations:
>>> model_5s3d = hm.EM_H2MM_C(hm.factory_h2mm_model(4,3), color3, times3, max_iter=7200)
Optimization reached maximum number of iterations
Setting Universal Defaults#
The defaults of these are stored in the module variable H2MM_C.optimization_limits.
Note
This variable functions similarly to rcParams in matplotlib.
It’s purpose is to make it easy to set the default value, instead of having to repeatebly input the same keyword arguments every time.
Values in H2MM_C.optimization_limits can be accessed and set like both dictionary keys and attributes.
The default values are:
H2MM_C.optimization_limits.num_cores = os.cpu_count() // 2This value is set onimport H2MM_Cthis sets the number of C threads (which can run on different cores at the same time, making them like python processes in that regard, but they can share memory) the algorithms in H2MM_C will use. Since most of these algorithms are cpu intensive, they will generally not benefit from multi-threading. Sinceos.cpu_count()actually returns the number of threads, and most CPUs are multi-threaded,os.cpu_count()generally returns twice the number of CPUs than the machine actually has. Therefore the choice to setnum_cores = os.cpu_count() //2. If your machine is not multi-threaded or has some other oddity, consider setting this to a more reasonable value.H2MM_C.optimization_limits.max_iter = 3600This is perhaps the most arbitrary parameter, set high enough that you are confident the model is good. 3600 was simply set because that is the number of seconds there are in an hour.H2MM_C.optimization_limits.converged_min = 1e-14This value is very small, near the floating point error for most optimizations, in fact it is often smaller than the floating point error. For especially large data sets, (roughly >10,000 trajectories with >75 photons each) the floating point error is even larger, and so it would be recommended to set this to a larger value like1e-7since when differences are less than that, changes in the value are less than the amount of error in the calculation itself.H2MM_C.optimization_limits.max_time = np.infThe timer used in H2MM_C is the basic C-level clock, it tends to be inaccurate (and often runs fast), but it doesn’t slow down the optimization much when checking the time each round. Therefore it is generally recommended to keep it at infinite, so that an optimization doesn’t terminate at a random pont.
So lets see an example of setting these values with H2MM_C.optimization_limits.
These settings will apply to all latter calls to H2MM_C functions/methods, unless a value is explicitly specified as a keyword argument in the function/method call.
hm.optimization_limits['num_cores'] = 2
hm.optimization_limits['max_iter'] = 1000
hm.optimization_limits['converged_min'] = 1e-7
model_5s3d = hm.EM_H2MM_C(hm.factory_h2mm_model(4,3), color3, times3)
This is equivalent to:
hm.optimization_limits.num_cores = 2
hm.optimization_limits.max_iter = 1000
hm.optimization_limits.converged_min = 1e-7
model_5s3d = hm.EM_H2MM_C(hm.factory_h2mm_model(4,3), color3, times3)
You can also view these values:
hm.optimization_limits.num_cores
Or as a whole:
hm.optimization limits
Note
The parameters formatter` and outstream are disucssed in Advanced options: Formatter
New v2.0#
Frozen models#
Version 2.0 introduced limited abilities to hash and use h2mm_model
objects as keys in dictionaries.
By default, most models cannot be used in this way,
rather a model must first be put into a canonical form.
To generate such a model, call the method h2mm_model.sort_states()
on any existing model. The returned model will be hashable.
sorted_model_5s3d = model_5s3d.sort_states()
hash(sorted_model_5s3d)
Along with the hash function, h2mm_model s can also be compared
to one another with the equality operator. When two models have
identical cannonical forms, they will evaluate to True in
equality comparisons.
Note
This comparison is very strict, ie it is not an almost equal comparison. This is so that using hashable models as dictionary keys will still function. The floating point numbers in two models must be identical.
if sorted_model_5s3d == model_5s3d:
print("sorteted model are equivalent to unsroted models")
else:
print("sorteted model are not equivalent to unsroted models")
model_dummy = hm.factory_h2mm_model(4,3)
if model_dummy == model_5s3d:
print("oops, unsorted models are comparable, and identical")
else:
print("unsoreted models are comparable, but not identical")
Note
Once a model is sorted so it can be hashed, it is fixed.
It can no longer be assigned new prior / trans / obs
values, and optimizations must be performed with inplace=False
convcode bitmask#
The h2mm_model.conv_code property is a number that informs on the nature of the model.
While is is more common to interpret this with the h2mm_model.conv_str,
A full list is in the Constants section of the documentation, but we will provide a simple example of the usage here.
To test if a given model has been sorted and frozen, we can & (bitwise and) the constant H2MM_C.convcode.frozen with a model:
bool(model_5s3d.conv_code & hm.convcode.frozen), bool(sorted_model_5s3d.conv_code & hm.convcode.frozen)
Model Serialization#
Version 2.0.6 introduced the h2mm_model.tobytes() and h2mm_model.frombytes()
method and classmethod respectively.
These generate and interpret a bytes objects that represent all values of
an h2mm_model object.
This is useful if you want to save your data for later:
model_bytes = model_5s3d.tobytes()
print(f'tobytes creates a {type(model_bytes)}')
model_loaded = hm.h2mm_model.frombytes(model_bytes)
print(f'frombytes creates a {type(model_loaded)}')
print("Original model:")
print(repr(model_5s3d))
print('--------------------------------')
print("loaded model")
print(repr(model_loaded))