Documentation#
Data Analysis Functions#
- H2MM_C.EM_H2MM_C(h2mm_model h_mod, indexes, times, max_iter=None, print_func=u'iter', print_args=None, bounds_func=None, bounds=None, max_time=np.inf, converged_min=None, num_cores=None, reset_niter=True, gamma=False, opt_array=False)#
Calculate the most likely model that explains the given set of data. The input model is used as the start of the optimization.
- Parameters:
h_model (h2mm_model) – An initial guess for the H2MM model, just give a general guess, the algorithm will optimize, and generally the algorithm will converge even when the initial guess is very far off
indexes (list of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list
times (list of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list
print_func (None, str or callable, optional) –
Specifies how the results of each iteration will be displayed, several strings specify built-in functions. Default is ‘iter’ Acceptable inputs: str, Callable or None
None:
causes no printout anywhere of the results of each iteration
Str: ‘console’, ‘all’, ‘diff’, ‘comp’, ‘iter’
’console’:
the results will be printed to the terminal/console window this is useful to still be able to see the results, but not clutter up the output of your Jupyter Notebook, this is the default
’all’:
prints out the full
h2mm_model
just evaluated, this is very verbose, and not generally recommended unless you really want to know every step of the optimization’diff’:
prints the loglik of the iteration, and the difference between the current and old loglik, this is the same format that the ‘console’ option used, the difference is whether the print function used is the C printf, or Cython print, which changes the destination from the console to the Jupyter notebook
’diff_time’:
same as ‘diff’ but adds the time of the last iteration and the total optimization time
’comp’:
prints out the current and old loglik
’comp_time’:
same as ‘comp’ but with times, like in ‘diff_time’
’iter’:
prints out only the current iteration
- Callable: user defined function
A python function for printing out a custom output, the function must accept the input of (int,
h2mm_model
,h2mm_model
,h2mm_model
, float, float) as the function will be handed (niter, new, current, old, t_iter, t_total) from a special Cython wrapper. t_iter and t_total are the times of the iteration and total time respectively, in seconds, based on the C level clock function, which is notably inaccurate, often reporting larger than actual values.
print_args (2-tuple/list (int, bool) or None, optional) – Arguments to further customize the printing options. The format is (int bool) where int is how many iterations before updating the display and the bool is True if the printout will concatenate, and False if the display will be kept to one line, The default is None, which is changed into (1, False). If only an int or a bool is specified, then the default value for the other will be used. If a custom printing function is given then this argument will be passed to the function as *args. Default is None
bounds_func (str, callable or None, optional) –
function to be evaluated after every iteration of the H2MM algorithm its primary function is to place bounds on
h2mm_model
Note
bounding the
h2mm_model
causes the guarantee of improvement on each iteration until convergence to no longer apply, therefore the results are no longer guaranteed to be the optimal model within bounds.Default is None
Acceptable inputs:
- C level limits: ‘minmax’ ‘revert’ ‘revert_old’
prevents the model from having values outside of those defined by
h2mm_limits
class given to bounds, if an iteration produces a new model for loglik calculation in the next iteration with values that are out of those bounds, the 3 function differ in how they correct the model when a new model has a value out of bounds, they are as follows:’minmax’:
sets the out of bounds value to the min or max value closer to the original value
’revert’:
sets the out of bounds value to the value from the last model for which the loglik was calculated
’revert_old’:
similar to revert, but instead reverts to the value from the model one before the last calculated model
- Callable: python function that takes 4 inputs and returns
h2mm_model
object Warning
The user takes full responsibility for errors in the results.
must be a python function that takes 4 arguments, the first 3 are the
h2mm_model
objects, in this order: new, current, old, the fourth is the argument supplied to the bounds keyword argument, the function must return a singleh2mm_limits
object, the prior, trans and obs fields of which will be optimized next, other values are ignored.
bounds (h2mm_limits, str, 4th input to callable, or None, optional) – The argument to be passed to the bounds_func function. If bounds_func is None, then bounds will be ignored. If bounds_func is ‘minmax’, ‘revert’ or ‘revert_old’ (calling C-level bounding functions), then bounds must be a
h2mm_limits
object, if bounds_func is a callable, bounds must be an acceptable input as the fourth argument to the function passed to bounds_func Default is Nonemax_iter (int or None, optional) – the maximum number of iterations to conduct before returning the current
h2mm_model
, if None (default) use value from optimization_limits. Default is Nonemax_time (float or None, optional) – The maximum time (in seconds) before returning current model NOTE this uses the C clock, which has issues, often the time assessed by C, which is usually longer than the actual time. If None (default) use value from optimization_limits. Default is None
converged_min (float or None, optional) – The difference between new and current
h2mm_model
to consider the model converged, the default setting is close to floating point error, it is recommended to only increase the size. If None (default) use value from optimization_limits. Default is Nonenum_cores (int or None, optional) –
the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.
Note
optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()
If None (default) use value from optimization_limits. Default is None
reset_niter (bool, optional) – Tells the algorithm whether or not to reset the iteration counter of the model, True means that even if the model was previously optimized, the iteration counter will start at 0, so calling
EM_H2MM_C()
on a model that reached its maximum iteration number will proceed another max_iter iterations. On the other hand, if set to False, then the new iterations will be added to the old. If set to False, you likely will have to increase max_iter, as the optimization will count the previous iterations towards the max_iter threshold. Default is Falsegamma (bool, optional) –
Whether or not to return the gamma array, which gives the probabilities of each photon being in a given state.
Note
If opt_array is True, then only the gamma arrays for the ideal model are returned.
Default is False
opt_array (bool) –
Defines how the result is returned, if False (default) then return only the ideal model, if True, then a numpy array is returned containing all models during the optimization.
Note
The last model in the array is the model after the convergence criterion have been met. Therefore check the
h2mm_model.conv_code
, if it is 7, then the second to last model is the ideal model. Still, usually the last model will be so close to ideal that it will not make any significant differenceDefault is False
- Returns:
out (h2mm_model) – The optimized
h2mm_model
. will return after one of the following conditions are met: model has converged (according to converged_min, default 1e-14), maximum iterations reached, maximum time has passed, or an error has occurred (usually the result of a nan from a floating point precision error)gamma (list, tuple or np.ndarray (optional)) – If gamma = True, this variable is returned, which contains (as numpy arrays) the probabilities of each photon to be in each state, for each burst. The arrays are returned in the same order and type as the input indexes, individual data points organized as [photon, state] within each data sequence.
- H2MM_C.factory_h2mm_model(nstate, ndet, bounds=None, trans_scale=1e-5, prior_dist='equal', trans_dist='equal', obs_dist='equal')#
Function for generating a model of an arbitrary number of states/streams
- Parameters:
nstate (int) – Number of states in the return model
ndet (int, optional) – Number of photons streams in the return model
bounds (h2mm_limits or None, optional) – The limits expected to be used when optimizing the model. For limits are given, the model will always be within those limits, distribution of states within the model will depend on how The default is None.
trans_scale (float) – Only used when no
h2mm_limits
object is supplied to bounds, sets the minimum (slowest) transition rates in the trans arrayprior_dist ('even' or 'random', optional) – Define how to distribute prior states. options are ‘equal’ and ‘random’ The default is ‘equal’.
trans_dist ('even' or 'random', optional) – Define how to distribute trans states. options are ‘equal’ and ‘random’ The default is ‘equal’.
obs_dist ('even' or 'random', optional) – Define how to distribute obs states. options are ‘equal’ and ‘random’ The default is ‘equal’.
- Returns:
model – An
h2mm_model
object ready for optimization- Return type:
h2MM_model
- H2MM_C.H2MM_arr(h_mod, indexes, times, num_cores=None, gamma=False)#
Calculate the logliklihood of every model in a list/array given a set of data
- Parameters:
h_model (list, tuple, or numpy.ndarray of h2mm_model objects) – All of the
h2mm_model
object for which the loglik will be calculatedindexes (list or tuple of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list
times (list or tuple of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list
gamma (bool, optional) –
Whether or not to return the gamma array, which gives the probabilities of each photon being in a given state. the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.
Note
optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()
If None (default) use value from optimization_limits. Default is None
- Returns:
out (list tuple, or numpy.ndarray of h2mm_model) – A list, tuple, or numpy array of the
h2mm_model
with the loglik computed The converged state is automatically set to 0, and nphot is set in accordance with the number of photons in the data set. The datatype returned is the same as the datatype of h_modelgamma_arr (list, tuple or np.ndarray (optional)) – If gamma = True, this variable is returned, which contains (as numpy arrays) the probabilities of each photon to be in each state, for each burst and model. The arrays are returned in the same order and type as the input models and indexes, individual data points organized as [photon, state] within each data sequence.
- H2MM_C.viterbi_path(h2mm_model h_mod, indexes, times, num_cores=None)#
Calculate the most likely state path through a set of data given a H2MM model
- Parameters:
h_model (h2mm_model) – An
h2mm_model
, should be optimized for the given data set (result ofEM_H2MM_C()
orh2mm_model.optimize()
) to ensure results correspond to give the most likely pathindexes (list or tuple of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list
times (list or tuple of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list
num_cores (int or None, optional) – the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations. Note that os.cpu_count() returns number of threads, but it is ideal to take the number of physical cores. Therefore, unless reset by user, optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. If None (default) use value from optimization_limits. Default is None
- Returns:
path (list of NUMPY 1D int arrays) – The most likely state path for each photon
scale (list of NUMPY 1D float arrays) – The posterior probability for each photon
ll (NUMPY 1D float array) – loglikelihood of each burst
icl (float) – Integrated complete likelihood, essentially the BIC for the Viterbi path
- H2MM_C.viterbi_sort(h2mm_model hmod, indexes, times, num_cores=None)#
An all inclusive viterbi processing algorithm. Returns the ICL, the most likely state path, posterior probabilities, and a host of information sorted by bursts and dwells
- Parameters:
h_model (h2mm_model) – An
h2mm_model
, should be optimized for the given data set (result ofEM_H2MM_C()
orh2mm_model.optimize()
) to ensure results correspond to give the most likely pathindexes (list or tuple of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list
times (list or tuple of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list
num_cores (int or None, optional) –
the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.
Note
optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()
If None (default) use value from optimization_limits. Default is None
- Returns:
icl (float) – Integrated complete likelihood, essentially the BIC for the Viterbi path
path (list of NUMPY 1D int arrays) – The most likely state path for each photon
scale (list of NUMPY 1D float arrays) – The posterior probability for each photon
ll (NUMPY 1D float array) – loglikelihood of each burst
burst_type (NUMPY 1D int array) – Identifies which states are present in the burst, identified in a binary format, ie if states 0 and 2 are present in a burst, that element will be 0b101 = 5
dwell_mid (square list of lists of NUMPY 1D int arrays) – Gives the dwell times of all dwells with full residence time in the data set, sorted by the state of the dwell (Top level list), and the successor state (Lower level list), each element of the numpy array is the duration of the dwell in the clock rate of the data set.
dwell_beg (square list of lists of NUMPY 1D int arrays) – Gives the dwell times of all dwells that start at the beginning of a burst, sorted by the state of the dwell (Top level list), and the successor state (Lower level list), each element of the numpy array is the duration of the dwell in the clock rate of the data set.
dwell_end (square list of lists of NUMPY 1D int arrays) – Gives the dwell times of all dwells that start at the end of a burst, sorted by the state of the dwell (Top level list), and the preceding state (Lower level list), each element of the numpy array is the duration of the dwell in the clock rate of the data set.
dwell_burst (list of NUMPY 1D int arrays) – List of bursts that were only in one state, giving their durations.
ph_counts (list of NUMPY 2D int arrays) – Counts of photons in each dwell, sorted by index and state of the dwell The index of the list identifies the state of the burst, the 1 index identifies the index of photon, and the 0 index are individual dwells.
ph_mid (square list of NUMPY 2D arrays) – Gives the photon counts of dwells in the middle of bursts, sorted by state of the dwell (top level of the list) and the state of the next dwell in the burst (lower level burst), each row in an array is a single dwell, the columns correspond to the counts of each photon stream.
ph_beg (square list of NUMPY 2D arrays) – Gives the photon counts of dwells at the beginning of bursts, sorted by state of the dwell (top level of the list) and the state of the next dwell in the burst (lower level list), each row in an array is a single dwell, the columns correspond to the counts of each photon stream.
ph_end (square list of NUMPY 2D arrays) – Gives the photon counts of dwells at the ends of bursts, sorted by state of the dwell (top level of the list) and the state of the previous dwell in the burst, each row in an array is a single dwell, the columns correspond to the counts of each photon stream.
ph_burst (list of NUMPY 2D arrays) – Gives the counts of photons in bursts with only a single state. Sorted according to the state of the burst. Each row in an array is a single dwell, the columns correspond to the counts of each photon stream.
- H2MM_C.path_loglik(h2mm_model model, indexes, times, state_path, num_cores=None, BIC=True, total_loglik=False, loglikarray=False)#
Function for calculating the statistical parameters of a specific state path and data set given a model (ln(P(XY|lambda))). By default returns the BIC of the entire data set, can also return the base logliklihood in total and/or per burst. Which return values selected are defined in keyword arguments.
- Parameters:
model (h2mm_model) – DESCRIPTION.
indexes (list[numpy.ndarray], tuple[numpy.ndarray] or numpy.ndarray[numpy.ndarray]) – DESCRIPTION.
times (list, tuple, or numpy.ndarray of integer numpy.ndarray) – DESCRIPTION.
state_path (list, tuple, or numpy.ndarray of integer numpy.ndarray) – DESCRIPTION.
num_cores (int, optional) – Number of cores to use in calculation, if None, then use value in optimization_limits. The default is None.
BIC (bool, optional) – Whether to return the Bayes Information Criterion of the calculation. The default is True.
total_loglik (bool, optional) – Whether to return the loglikelihood of the entire data set as a single number. The default is False.
loglikarray (bool, optional) – Whether to return the loglikelihoods of each burst as a numpy array. The default is False.
- Raises:
TypeError – One or more inputs did not contain the proper data types.
ValueError – One or more values out of the acceptable ranges given the input model
RuntimeError – Deeper issue, raise issue on github.
- Returns:
bic (float) – Bayes information criterion of entire data set and path.
loglik (float) – Log likelihood of entire dataset
loglik_array (np.ndarray) – Loglikelihood of each burst separately
Simulation Functions#
- H2MM_C.sim_phtraj_from_times(h2mm_model hmod, ndarray times, seed=None)#
Generate a state path and photon trajectory for a given set of times
- Parameters:
hmod (h2mm_model) – An
h2mm_model
to build the path and stream trajectories fromtimes (numpy.ndarray 1D int) – An unsigned integer of monotonically increasing times for the burst
seed (positive int, optional) – The initial random seed, use if you want to be able to replicate results. If None, then uses the current time as the seed. The default is None.
- Raises:
TypeError – tiems array must be 1D
ValueError – times were not monotonically increasing
RuntimeError – Unknown error, raise issue on github
- Returns:
path (numpy.ndarray 1D int) – State trajectory generated for the input times
dets (numpy.ndarray 1D int) – stream trajectory generate for the input times, derived from path
- H2MM_C.sim_phtraj_from_state(h2mm_model hmod, ndarray states, seed=None)#
Generate a photon trajectory from a h2mm model and state trajectory
- Parameters:
hmod (h2mm_model) – An
h2mm_model
to build the stream trajectory fromstates (numpy ndarray 1D int) –
seed (positive int, optional) – The initial random seed, use if you want to be able to replicate results. If None, then uses the current time as the seed. The default is None.
- Raises:
TypeError – The state array was not 1D
RuntimeError – Unknown error, please raise issue on github
- Returns:
dets – The random photon stream indexes based on model and statepath
- Return type:
numpy ndarray 1D int
- H2MM_C.sim_sparsestatepath(h2mm_model hmod, ndarray times, seed=None)#
Generate a state path from a model and a sparse set of arrival times
- Parameters:
hmod (h2mm_model) – An
h2mm_model
to build the state path fromtimes (numpy ndarray 1D int) – An unsigned integer of monotonically increasing times for the burst
seed (positive int, optional) – The initial random seed, use if you want to be able to replicate results. If None, then uses the current time as the seed. The default is None.
- Raises:
TypeError – Your array was not 1D
ValueError – Your time were not monotonically increasing
RuntimeError – Unknown error, please raise issue on github
- Returns:
path – A randomly generated statepath based on the input model and times
- Return type:
numpy ndarray 1D long int
- H2MM_C.sim_statepath(h2mm_model hmod, int lenp, seed=None)#
A function to generate a dense statepath (HMM as opposed to H2MM) of length lenp
- Parameters:
hmod (h2mm_model) – An
h2mm_model
to build the state path fromlenp (int) – The length of the path you want to generate
seed (positive int, optional) – The initial random seed, use if you want to be able to replicate results. If None, then uses the current time as the seed. The default is None.
- Raises:
RuntimeError – A problem with the C code, this is for future proofing
- Returns:
path_ret – The random dense state-path
- Return type:
numpy ndarray, positive integer
Classes#
- class H2MM_C.h2mm_model(prior, trans, obs, loglik=- np.inf, niter=0, nphot=0, is_conv=False)#
The base class for storing objects representing the H2MM model, stores the relevant matrices, loglikelihood and information about status of optimization.
- Parameters:
prior (numpy.ndarray) – The initial probability matrix, 1D, size nstate
trans (numpy.ndarray) – The transition probability matrix, 2D, shape [nstate, nstate]
obs (numpy.ndarray) – The emission probability matrix, 2D, shape [nstate, ndet]
Parameters (Optional) –
------------------- –
note:: (..) – These are generally only used when trying to re-create a model from, for instance a previous optimization of a closed notebook.
loglik (float (optional)) – The loglikelihood, always set with nphot otherwise BIC will not be calculated correctly. Usually should be set by evaluation against data. Default is -inf
niter (int) – The number of iterations, used for recreating previously optimized model. Usually should be set by evaluation against data. Default is 0
nphot (int (optional)) – The number of photons in data set used for optimization, should only be set when recreating model, Usually should be set by evaluation against data. Default is 0
is_conv (bool (optional)) – Whether or not the model has met a convergence criterion, should only be set when recreating model, otherwise allow evaluation functions to handle this operation. Default is False
- bic#
Bayes Information Criterion of model
- conv_code#
The convergence code of model, an int
- conv_str#
String description of how model optimization/calculation ended
- copy(self)#
Make a duplicate model in new memory
- evaluate(self, indexes, times, gamma=False, num_cores=None, inplace=True)#
Calculate the loglikelihood of the model given a set of data. The loglikelihood is stored with the model. NOTE: this method calls the H2MM_arr function, and has essentially the same syntax
- Parameters:
indexes (list or tuple of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to11 correspondence to the times list
times (list or tuple of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list
gamma (bool (optional)) – Whether or not to return the gamma array. (The default is False)
num_cores (int or None, optional) –
the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.
Note
optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()
If None (default) use value from optimization_limits. Default is None
inplace (bool, optional) – Whether or not to store the evaluated model in the current model object. Default is True
- Returns:
out (h2mm_model) – The evaluated model
gamma_arr (list, tuple or np.ndarray (optional)) – If gamma = True, this variable is returned, which contains (as numpy arrays) the probabilities of each photon to be in each state, for each burst and model. The arrays are returned in the same order and type as the indexes, individual data points organized as [photon, state] within each data sequence.
- is_calc#
If model has been optimized/eveluated, vs being an initial model
- is_conv#
Whether or not the optimization reached convergence rather than exceeding the maximum iteration/time of optimization
- is_opt#
Whether or not the model has undergone optimization, as opposed to evaluation or being an initial model
- k#
Number of free parameters in model
- loglik#
Loglikelihood of model
- ndet#
Nubmer of detectors/photon streams in model
- niter#
Number of iterations optimization took
- normalize(self)#
For internal use, ensures all model arrays are row stochastic
- nphot#
Number of photons in data set used to optimize model
- nstate#
Number of states in model
- obs#
Emission probability matrix, 2D shape nstate x ndet
- optimize(self, indexes, times, max_iter=None, print_func='iter', print_args=None, bounds=None, bounds_func=None, max_time=None, converged_min=None, num_cores=None, reset_niter=False, gamma=False, opt_array=False, inplace=True)#
Optimize the H2MM model for the given set of data.
Note
This method calls the
EM_H2MM_C()
function.- Parameters:
indexes (list of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list
times (list of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list
print_func (None, str or callable, optional) –
Specifies how the results of each iteration will be displayed, several strings specify built-in functions. Default is ‘iter’ Acceptable inputs: str, Callable or None
None:
causes no printout anywhere of the results of each iteration
Str: ‘console’, ‘all’, ‘diff’, ‘comp’, ‘iter’
’console’:
the results will be printed to the terminal/console window this is useful to still be able to see the results, but not clutter up the output of your Jupyter Notebook, this is the default
’all’:
prints out the full h2mm_model just evaluated, this is very verbose, and not generally recommended unless you really want to know every step of the optimization
’diff’:
prints the loglik of the iteration, and the difference between the current and old loglik, this is the same format that the ‘console’ option used, the difference is whether the print function used is the C printf, or Cython print, which changes the destination from the console to the Jupyter notebook
’diff_time’:
same as ‘diff’ but adds the time of the last iteration and the total optimization time
’comp’:
prints out the current and old loglik
’comp_time’:
same as ‘comp’ but with times, like in ‘diff_time’
’iter’:
prints out only the current iteration
- Callable: user defined function
A python function for printing out a custom output, the function must accept the input of (int,
h2mm_model
,h2mm_model
,h2mm_model
,float, float) as the function will be handed (niter, new, current, old, t_iter, t_total) from a special Cython wrapper. t_iter and t_total are the times of the iteration and total time respectively, in seconds, based on the C level clock function, which is notably inaccurate, often reporting larger than actual values.
print_args (2-tuple/list (int, bool) or None, optional) – Arguments to further customize the printing options. The format is (int bool) where int is how many iterations before updating the display and the bool is True if the printout will concatenate, and False if the display will be kept to one line, The default is None, which is changed into (1, False). If only an int or a bool is specified, then the default value for the other will be used. If a custom printing function is given then this argument will be passed to the function as *args. Default is None
bounds_func (str, callable or None, optional) –
function to be evaluated after every iteration of the H2MM algorithm its primary function is to place bounds on
h2mm_model
Note
bounding the
h2mm_model
causes the guarantee of improvement on each iteration until convergence to no longer apply, therefore the results are no longer guaranteed to be the optimal model within bounds.Default is None Acceptable inputs:
- C level limits: ‘minmax’ ‘revert’ ‘revert_old’
prevents the model from having values outside of those defined by
h2mm_limits
class given to bounds, if an iteration produces a new model for loglik calculation in the next iteration with values that are out of those bounds, the 3 function differ in how they correct the model when a new model has a value out of bounds, they are as follows:’minmax’:
sets the out of bounds value to the min or max value closer to the original value
’revert’:
sets the out of bounds value to the value from the last model for which the loglik was calculated
’revert_old’:
similar to revert, but instead reverts to the value from the model one before the last calculated model
- Callable: python function that takes 4 inputs and returns
h2mm_model
object Warning
The user takes full responsibility for errors in the results.
must be a python function that takes 4 arguments, the first 3 are the
h2mm_model
objects, in this order: new, current, old, the fourth is the argument supplied to the bounds keyword argument, the function must return a singleh2mm_limits
object, the prior, trans and obs fields of which will be optimized next, other values are ignored.
bounds (h2mm_limits, str, 4th input to callable, or None, optional) – The argument to be passed to the bounds_func function. If bounds_func is None, then bounds will be ignored. If bounds_func is ‘minmax’, ‘revert’ or ‘revert_old’ (calling C-level bounding functions), then bounds must be a
h2mm_limits
object, if bounds_func is a callable, bounds must be an acceptable input as the fourth argument to the function passed to bounds_func Default is Nonemax_iter (int or None, optional) – the maximum number of iterations to conduct before returning the current
h2mm_model
, if None (default) use value from optimization_limits. Default is Nonemax_time (float or None, optional) –
The maximum time (in seconds) before returning current model
Note
this uses the C clock, which has issues, often the time assessed by C, which is usually longer than the actual time.
If None (default) use value from optimization_limits. Default is None
converged_min (float or None, optional) – The difference between new and current
h2mm_model
to consider the model converged, the default setting is close to floating point error, it is recommended to only increase the size. If None (default) use value from optimization_limits. Default is Nonenum_cores (int or None, optional) –
the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.
Note
optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()
If None (default) use value from optimization_limits. Default is None
reset_niter (bool, optional) – Tells the algorithm whether or not to reset the iteration counter of the model, True means that even if the model was previously optimized, the iteration counter will start at 0, so calling
EM_H2MM_C()
on a model that reached its maximum iteration number will proceed another max_iter iterations. On the other hand, if set to False, then the new iterations will be added to the old. If set to False, you likely will have to increase max_iter, as the optimization will count the previous iterations towards the max_iter threshold. Default is Falsegamma (bool, optional) –
Whether or not to return the gamma array, which gives the probabilities of each photon being in a given state.
Note
If opt_array is True, then only the gamma arrays for the ideal model are returned.
Default is False
opt_array (bool) –
Defines how the result is returned, if False (default) then return only the ideal model, if True, then a numpy array is returned containing all models during the optimization.
Note
The last model in the array is the model after the convergence criterion have been met. Therefore check the
h2mm_model.conv_code
, if it is 7, then the second to last model is the ideal model. Still, usually the last model will be so close to ideal that it will not make any significant difference.Default is False
inplace (bool, optional) –
Whether or not to store the optimized model in the current model object.
Note
When opt_array is True, then the ideal model is copied into the current model object, while the return value contains all models
Default is True
- Returns:
out (h2mm_model) – The optimized
h2mm_model
. will return after one of the following conditions are met: model has converged (according to converged_min, default 1e-14), maximum iterations reached, maximum time has passed, or an error has occurred (usually the result of a nan from a floating point precision error)gamma (list, tuple or np.ndarray (optional)) – If gamma = True, this variable is returned, which contains (as numpy arrays) the probabilities of each photon to be in each state, for each burst. The arrays are returned in the same order and type as the input indexes, individual data points organized as [photon, state] within each data sequence.
- prior#
Prior probability matrix, 1D, size nstate
- set_converged(self, converged)#
- trans#
Transition probability matrix, square, dimenstions nstate x nstate
- class H2MM_C.h2mm_limits(model=None, min_prior=None, max_prior=None, min_trans=None, max_trans=None, min_obs=None, max_obs=None, nstate=0, ndet=0)#
Special class for setting limits on the h2mm_model, as min and max values
- Parameters:
model (h2mm_model, optional) – An
h2mm_model
to base the limits off of, if None. The main purpose is to allow the user to check that the limits are valid for the model. Specifying this model will also lock the states/streams of the model, while if None, the limits is more flexible If None, none of these checks will be in place The default is None.min_prior (float or 1D numpy array, optional) – The minimum value(s) of the prior array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. If None, no limits are set (all values min is 0.0) The default is None.
max_prior (float or 1D numpy array, optional) – The maximum value(s) of the prior array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. If None, no limits are set (all values min is 1.0) The default is None.
min_trans (float or 2D square numpy array, optional) – The minimum value(s) of the trans array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. Values on diagonal set to 0.0 If None, no limits are set (all values min is 0.0) The default is None.
max_trans (float or 2D square numpy array, optional) – The maximum value(s) of the trans array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. Values on diagonal set to 1.0 If None, no limits are set (all values min is 1.0) The default is None.
min_obs (float or 2D numpy array, optional) – The minimum value(s) of the obs array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. If None, no limits are set (all values min is 0.0) The default is None.
max_obs (float or 2D numpy array, optional) – The maximum value(s) of the obs array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. If None, no limits are set (all values min is 1.0) The default is None.
nstate (int, optional) – Number of states in the model to be optimized, if set to 0, and not specified elsewhere, number of states is flexible. The default is 0.
ndet (int, optional) – Number of streams in the model to be optimized, if set to 0, and not specified elsewhere, number of states is flexible. The default is 0.
- make_model(self, model, warning=True)#
Method for checking the limits arrays generated from the input model
- Parameters:
model (h2mm_model) –
h2mm_model
for which the limits are to be specified. Also, values are checked and a warning is raised if the model has values that are out of the range of the limitswarning (bool optional) – Whether or not to check of the initial model is within the bounds generated by make_model Default is True
- Raises:
Exception – When user has reset a specific field so the field is no longer valid.
ValueError – Limits cannot be made for the given model.
- Returns:
model (h2mm_model) – Model to be optimized with specified limits.
min_prior (1D numpy float array) – Minimum values for each element of the prior array
max_prior (1D numpy float array) – Maximum values for each element of the prior array
min_trans (2D square numpy float array) – Minimum values of each element of the trans array
max_trans (2D square numpy float array) – Maximum values of each element of the trans array
min_obs (2D numpy float array) – Minimum values of each element of the obs array
max_obs (2D numpy float array) – Maximum values of each element of the obs array