Documentation#
Data Analysis Functions#
- H2MM_C.EM_H2MM_C(model, indexes, times, max_iter=None, bounds_func=None, bounds=None, bounds_kwargs=None, print_func='iter', print_freq=1, print_args=None, print_kwargs=None, print_stream=None, print_formatter=None, print_fmt_args=None, print_fmt_kwargs=None, max_time=inf, converged_min=None, num_cores=None, reset_niter=True, gamma=False, opt_array=False)#
Calculate the most likely model that explains the given set of data. The input model is used as the start of the optimization.
- Parameters:
model (h2mm_model) – An initial guess for the H2MM model, just give a general guess, the algorithm will optimize, and generally the algorithm will converge even when the initial guess is very far off
indexes (list of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list
times (list of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list
max_iter (int or None, optional) – the maximum number of iterations to conduct before returning the current
h2mm_model, if None (default) use value from optimization_limits. Default is Nonebounds_func (str, callable or None, optional) –
function to be evaluated after every iteration of the H2MM algorithm its primary function is to place bounds on
h2mm_modelNote
bounding the
h2mm_modelcauses the guarantee of improvement on each iteration until convergence to no longer apply, therefore the results are no longer guaranteed to be the optimal model within bounds.Default is None
Acceptable inputs:
- C level limits: ‘minmax’ ‘revert’ ‘revert_old’
prevents the model from having values outside of those defined by
h2mm_limitsclass given to bounds, if an iteration produces a new model for loglik calculation in the next iteration with values that are out of those bounds, the 3 function differ in how they correct the model when a new model has a value out of bounds, they are as follows:’minmax’:
sets the out of bounds value to the min or max value closer to the original value
’revert’:
sets the out of bounds value to the value from the last model for which the loglik was calculated
’revert_old’:
similar to revert, but instead reverts to the value from the model one before the last calculated model
- Callable: python function that takes 4 inputs and returns
h2mm_modelobject Warning
The user takes full responsibility for errors in the results.
must be a python function that takes the signature
bounds_func(new, current, old, *bounds, **bounds_kwargs)new,current, andoldareh2mm_modelobjects, withnewbeing the model whose loglik has yet to be calculated resulting from the latest iteration,currentbeing the model whose loglik was just calculated, andoldbeing the result of the previous iteration. Be one of the following:h2mm_model,bool,int, or a 2-tuple ofh2mm_modeland eitherboolorint. In all cases,intvalues must 0, 1, or 2.h2mm_modelobjects will be used as the next model to compute the loglik of.boolandintobjects determine if optimization will continue. IfTrue, then optimization will continue, ifFalse, then optimization will cease, and theoldmodel will be returned as the ideal model. If0, then optimization will continue, if1, then returnoldmodel as optimal model, if2, then returncurrentmodel as optimizal model.
bounds (h2mm_limits, tuple, or None, optional) –
Argument(s) pass to bounds_func function. If bounds_func is a string, then bounds must be specified, and must be a
h2mm_limitsobject.If
bounds_funcis callable, and bounds is not None or a tuple, bounds is passed as the 4th argument to bounds_func, iebounds_func(new, current, old, bounds, **bounds_kwargs), if bounds is a tuple, then passed as *args, iebounds_func(new, current, old, *bounds, **bounds_kwargs)Default is Nonebounds_kwargs (dict, or None, optional) – Only used when
bounds_funcis callable, passed as **kwargs tobounds_funcie iebounds_func(new, current, old, *bounds, **bounds_kwargs). IfNoneThe default is Noneprint_func (None, str or callable, optional) –
Specifies how the results of each iteration will be displayed, several strings specify built-in functions. Default is ‘iter’ Acceptable inputs: str, Callable or None
None:
causes no printout anywhere of the results of each iteration
Str: ‘all’, ‘diff’, ‘comp’, ‘iter’
’all’:
prints out the full
h2mm_modeljust evaluated, this is very verbose, and not generally recommended unless you really want to know every step of the optimization’diff’:
prints the loglik of the iteration, and the difference between the current and old loglik, this is the same format that the ‘console’ option used, the difference is whether the print function used is the C printf, or Cython print, which changes the destination from the console to the Jupyter notebook
’diff_time’:
same as ‘diff’ but adds the time of the last iteration and the total optimization time
’comp’:
prints out the current and old loglik
’comp_time’:
same as ‘comp’ but with times, like in ‘diff_time’
’iter’:
prints out only the current iteration
- Callable: user defined function
A python function for printing out a custom output, the function must accept the input of (int,
h2mm_model,h2mm_model,h2mm_model, float, float) as the function will be handed(niter, new, current, old, iter_time, total_time)from a special Cython wrapper. iter_time and total_time are the times of the latest iteration and total time of optimization, respectively. Note that iter_time and total_time are based on the fast, but inaccurate C clock function.
print_freq (int, optional) – Number of iterations between updating display. Default is 1
print_args (tuple or None, optional) – Only used when
print_funcis callable. Passed as final argument toprint_funcif notNoneortuple, ieprint_func(niter, new, current, old, iter_time, total_time, print_args, **print_kwargs)iftuple, then passed as *args, ieprint_func(niter, new, current, old, iter_time, total_time, *print_args, **print_kwargs)IfNone, then ignored. The default isNone.print_kwargs (dict or None, optional) – Only used when
print_funcis callable, passed as **kwargs toprint_funcieprint_func(niter, new, current, old, iter_time, time_total, *print_args, **print_kwargs). IfNone, then ignored. The default isNone.print_stream (OutStream, optional) – Typically OutStream, the stream where the output will be displayed. Passed as first argument to print_formatter, if
None, then use stream specified in optimization_limites. Default is Noneprint_formatter (Printer, optional) – Class object (usually subclass of
Printer) that formats text for output toprint_stream. Instance ofprint_formatterclass will be created withfmtr = print_formatter(print_stream, *print_fmt_args, **print_fmt_kwargs)to use for printing. Each print will callfmtr.update(text)wheretextis output ofprint_funccall. At end of optimization,fmtr.close()will be called. Ifprint_formatterisNonethen will use the formatter specified inoptimization_limits. The default isNone.print_fmt_args (tuple, Any or None, optional) – Additional arguments passed to
print_formatter, if None, ignored, if not tuple, then treated as single argument. The default isNone.print_fmt_kwargs (dict or None, optional) – Additional keyword arguments passed to
print_formatter, if None, ignored. The default isNone.max_time (float or None, optional) – The maximum time (in seconds) before returning current model NOTE this uses the inaccurate C clock, which is often longer than the actual time. If
None(default) use value from optimization_limits. Default isNone.converged_min (float or None, optional) – The difference between new and current
h2mm_modelto consider the model converged, the default setting is close to floating point error, it is recommended to only increase the size. If None (default) use value from optimization_limits. Default is Nonenum_cores (int or None, optional) –
the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.
Note
optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()
If None (default) use value from optimization_limits. Default is None
reset_niter (bool, optional) – Tells the algorithm whether or not to reset the iteration counter of the model, True means that even if the model was previously optimized, the iteration counter will start at 0, so calling
EM_H2MM_C()on a model that reached its maximum iteration number will proceed another max_iter iterations. On the other hand, if set to False, then the new iterations will be added to the old. If set to False, you likely will have to increase max_iter, as the optimization will count the previous iterations towards the max_iter threshold. Default is Falsegamma (bool, optional) –
Whether or not to return the gamma array, which gives the probabilities of each photon being in a given state.
Note
If opt_array is True, then only the gamma arrays for the ideal model are returned.
Default is False
opt_array (bool | int, optional) – Defines how the result is returned, if False (default) (or 0) then return only the ideal model, if True (or 1), then a numpy array is returned containing all models during the optimization, with the last model being the optimial model. If 2 then return all models, including models after converged. Default is False
- Returns:
out (h2mm_model) – The optimized
h2mm_model. will return after one of the following conditions are met: model has converged (according to converged_min, default 1e-14), maximum iterations reached, maximum time has passed, or an error has occurred (usually the result of a nan from a floating point precision error)gamma (numpy.ndarray (optional)) – If gamma = True, this variable is returned, which contains (as numpy arrays) the probabilities of each photon to be in each state, for each burst. The arrays are returned in the same order and type as the input indexes, individual data points organized as [photon, state] within each data sequence.
- H2MM_C.factory_h2mm_model(nstate, ndet, bounds=None, trans_scale=1e-05, prior_dist='equal', trans_dist='equal', obs_dist='equal')#
Function for generating a model of an arbitrary number of states/streams
- Parameters:
nstate (int) – Number of states in the return model
ndet (int, optional) – Number of photons streams in the return model
bounds (h2mm_limits or None, optional) – The limits expected to be used when optimizing the model. For limits are given, the model will always be within those limits, distribution of states within the model will depend on how The default is None.
trans_scale (float) – Only used when no
h2mm_limitsobject is supplied to bounds, sets the minimum (slowest) transition rates in the trans arrayprior_dist ('even' or 'random', optional) – Define how to distribute prior states. options are ‘equal’ and ‘random’ The default is ‘equal’.
trans_dist ('even' or 'random', optional) – Define how to distribute trans states. options are ‘equal’ and ‘random’ The default is ‘equal’.
obs_dist ('even' or 'random', optional) – Define how to distribute obs states. options are ‘equal’ and ‘random’ The default is ‘equal’.
- Returns:
model – An
h2mm_modelobject ready for optimization- Return type:
h2MM_model
- H2MM_C.H2MM_arr(models, indexes, times, num_cores=None, gamma=False)#
Calculate the logliklihood of every model in a list/array given a set of data
- Parameters:
models (list, tuple, or numpy.ndarray of h2mm_model objects) – All of the
h2mm_modelobject for which the loglik will be calculatedindexes (list or tuple of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list
times (list or tuple of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list
gamma (bool, optional) –
Whether or not to return the gamma array, which gives the probabilities of each photon being in a given state. the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.
Note
optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()
If None (default) use value from optimization_limits. Default is None
- Returns:
out (list tuple, or numpy.ndarray of h2mm_model) – A list, tuple, or numpy array of the
h2mm_modelwith the loglik computed The converged state is automatically set to 0, and nphot is set in accordance with the number of photons in the data set. The datatype returned is the same as the datatype of h_modelgamma_arr (list, tuple or np.ndarray (optional)) – If gamma = True, this variable is returned, which contains (as numpy arrays) the probabilities of each photon to be in each state, for each burst and model. The arrays are returned in the same order and type as the input models and indexes, individual data points organized as [photon, state] within each data sequence.
- H2MM_C.viterbi_path(h_mod, indexes, times, num_cores=None)#
Calculate the most likely state path through a set of data given a H2MM model
- Parameters:
h_model (h2mm_model) – An
h2mm_model, should be optimized for the given data set (result ofEM_H2MM_C()orh2mm_model.optimize()) to ensure results correspond to give the most likely pathindexes (list or tuple of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list
times (list or tuple of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list
num_cores (int or None, optional) – the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations. Note that os.cpu_count() returns number of threads, but it is ideal to take the number of physical cores. Therefore, unless reset by user, optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. If None (default) use value from optimization_limits. Default is None
- Returns:
path (list of NUMPY 1D int arrays) – The most likely state path for each photon
scale (list of NUMPY 1D float arrays) – The posterior probability for each photon
ll (NUMPY 1D float array) – loglikelihood of each burst
icl (float) – Integrated complete likelihood, essentially the BIC for the Viterbi path
- H2MM_C.viterbi_sort(hmod, indexes, times, num_cores=None)#
An all inclusive viterbi processing algorithm. Returns the ICL, the most likely state path, posterior probabilities, and a host of information sorted by bursts and dwells
- Parameters:
h_model (h2mm_model) – An
h2mm_model, should be optimized for the given data set (result ofEM_H2MM_C()orh2mm_model.optimize()) to ensure results correspond to give the most likely pathindexes (list or tuple of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list
times (list or tuple of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list
num_cores (int or None, optional) –
the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.
Note
optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()
If None (default) use value from optimization_limits. Default is None
- Returns:
icl (float) – Integrated complete likelihood, essentially the BIC for the Viterbi path
path (list of NUMPY 1D int arrays) – The most likely state path for each photon
scale (list of NUMPY 1D float arrays) – The posterior probability for each photon
ll (NUMPY 1D float array) – loglikelihood of each burst
burst_type (NUMPY 1D int array) – Identifies which states are present in the burst, identified in a binary format, ie if states 0 and 2 are present in a burst, that element will be 0b101 = 5
dwell_mid (square list of lists of NUMPY 1D int arrays) – Gives the dwell times of all dwells with full residence time in the data set, sorted by the state of the dwell (Top level list), and the successor state (Lower level list), each element of the numpy array is the duration of the dwell in the clock rate of the data set.
dwell_beg (square list of lists of NUMPY 1D int arrays) – Gives the dwell times of all dwells that start at the beginning of a burst, sorted by the state of the dwell (Top level list), and the successor state (Lower level list), each element of the numpy array is the duration of the dwell in the clock rate of the data set.
dwell_end (square list of lists of NUMPY 1D int arrays) – Gives the dwell times of all dwells that start at the end of a burst, sorted by the state of the dwell (Top level list), and the preceding state (Lower level list), each element of the numpy array is the duration of the dwell in the clock rate of the data set.
dwell_burst (list of NUMPY 1D int arrays) – List of bursts that were only in one state, giving their durations.
ph_counts (list of NUMPY 2D int arrays) – Counts of photons in each dwell, sorted by index and state of the dwell The index of the list identifies the state of the burst, the 1 index identifies the index of photon, and the 0 index are individual dwells.
ph_mid (square list of NUMPY 2D arrays) – Gives the photon counts of dwells in the middle of bursts, sorted by state of the dwell (top level of the list) and the state of the next dwell in the burst (lower level burst), each row in an array is a single dwell, the columns correspond to the counts of each photon stream.
ph_beg (square list of NUMPY 2D arrays) – Gives the photon counts of dwells at the beginning of bursts, sorted by state of the dwell (top level of the list) and the state of the next dwell in the burst (lower level list), each row in an array is a single dwell, the columns correspond to the counts of each photon stream.
ph_end (square list of NUMPY 2D arrays) – Gives the photon counts of dwells at the ends of bursts, sorted by state of the dwell (top level of the list) and the state of the previous dwell in the burst, each row in an array is a single dwell, the columns correspond to the counts of each photon stream.
ph_burst (list of NUMPY 2D arrays) – Gives the counts of photons in bursts with only a single state. Sorted according to the state of the burst. Each row in an array is a single dwell, the columns correspond to the counts of each photon stream.
- H2MM_C.path_loglik(model, indexes, times, state_path, num_cores=None, BIC=True, total_loglik=False, loglikarray=False, loglikpath=False)#
Function for calculating the statistical parameters of a specific state path and data set given a model (ln(P(XY|lambda))). By default returns the BIC of the entire data set, can also return the base logliklihood in total and/or per burst. Which return values selected are defined in keyword arguments.
- Parameters:
model (h2mm_model) –
h2mm_modelfor which to calculate the loglik of the path.indexes (list[numpy.ndarray], tuple[numpy.ndarray] or numpy.ndarray[numpy.ndarray]) – Indexes of photons in bursts in data (simulated or real).
times (list, tuple, or numpy.ndarray of integer numpy.ndarray) – Arrival times of photons in bursts in (simulated or real).
state_path (list, tuple, or numpy.ndarray of integer numpy.ndarray) – state for each photon in bursts in data (must be infered through viterbi or other algorithm).
num_cores (int, optional) – Number of cores to use in calculation, if None, then use value in optimization_limits. The default is None.
BIC (bool, optional) – Whether to return the Bayes Information Criterion of the calculation. The default is True.
total_loglik (bool, optional) – Whether to return the loglikelihood of the entire data set as a single number. The default is False.
loglikarray (bool, optional) – Whether to return the loglikelihoods of each burst as a numpy array. The default is False.
loglikpath (bool, optional) – Whether to return the loglikelihoods of each photon in each burst as a numpy array of numpy arrays. The default is False
- Raises:
TypeError – One or more inputs did not contain the proper data types.
ValueError – One or more values out of the acceptable ranges given the input model
RuntimeError – Deeper issue, raise issue on github.
- Returns:
bic (float) – Bayes information criterion of entire data set and path.
loglik (float) – Log likelihood of entire dataset
loglik_array (numpy.ndarray) – Loglikelihood of each burst separately
loglikpath (numpy.ndarray) – Loglikelihood of each photon of each burst array.
Simulation Functions#
- H2MM_C.sim_phtraj_from_times(hmod, times, seed=None)#
Generate a state path and photon trajectory for a given set of times
- Parameters:
hmod (h2mm_model) – An
h2mm_modelto build the path and stream trajectories fromtimes (numpy.ndarray 1D int) – An unsigned integer of monotonically increasing times for the burst
seed (positive int, optional) – The initial random seed, use if you want to be able to replicate results. If None, then uses the current time as the seed. The default is None.
- Raises:
TypeError – tiems array must be 1D
ValueError – times were not monotonically increasing
RuntimeError – Unknown error, raise issue on github
- Returns:
path (numpy.ndarray 1D int) – State trajectory generated for the input times
dets (numpy.ndarray 1D int) – stream trajectory generate for the input times, derived from path
- H2MM_C.sim_phtraj_from_state(hmod, states, seed=None)#
Generate a photon trajectory from a h2mm model and state trajectory
- Parameters:
hmod (h2mm_model) – An
h2mm_modelto build the stream trajectory fromstates (numpy ndarray 1D int) –
seed (positive int, optional) – The initial random seed, use if you want to be able to replicate results. If None, then uses the current time as the seed. The default is None.
- Raises:
TypeError – The state array was not 1D
RuntimeError – Unknown error, please raise issue on github
- Returns:
dets – The random photon stream indexes based on model and statepath
- Return type:
numpy ndarray 1D int
- H2MM_C.sim_sparsestatepath(hmod, times, seed=None)#
Generate a state path from a model and a sparse set of arrival times
- Parameters:
hmod (h2mm_model) – An
h2mm_modelto build the state path fromtimes (numpy ndarray 1D int) – An unsigned integer of monotonically increasing times for the burst
seed (positive int, optional) – The initial random seed, use if you want to be able to replicate results. If None, then uses the current time as the seed. The default is None.
- Raises:
TypeError – Your array was not 1D
ValueError – Your time were not monotonically increasing
RuntimeError – Unknown error, please raise issue on github
- Returns:
path – A randomly generated statepath based on the input model and times
- Return type:
numpy ndarray 1D long int
- H2MM_C.sim_statepath(hmod, lenp, seed=None)#
A function to generate a dense statepath (HMM as opposed to H2MM) of length lenp
- Parameters:
hmod (h2mm_model) – An
h2mm_modelto build the state path fromlenp (int) – The length of the path you want to generate
seed (positive int, optional) – The initial random seed, use if you want to be able to replicate results. If None, then uses the current time as the seed. The default is None.
- Raises:
RuntimeError – A problem with the C code, this is for future proofing
- Returns:
path_ret – The random dense state-path
- Return type:
numpy ndarray, positive integer
Classes#
- class H2MM_C.h2mm_model#
The base class for storing objects representing the H2MM model, stores the relevant matrices, loglikelihood and information about status of optimization.
- Parameters:
prior (numpy.ndarray) – The initial probability matrix, 1D, size nstate
trans (numpy.ndarray) – The transition probability matrix, 2D, shape [nstate, nstate]
obs (numpy.ndarray) – The emission probability matrix, 2D, shape [nstate, ndet]
Parameters (Optional) –
------------------- –
note:: (..) – These are generally only specified when trying to re-create a model from, for instance a previous optimization of a closed notebook.
loglik (float (optional)) – The loglikelihood, always set with nphot otherwise BIC will not be calculated correctly. Usually should be set by evaluation against data. Default is -inf
niter (int) – The number of iterations, used for recreating previously optimized model. Usually should be set by evaluation against data. Default is 0
nphot (int (optional)) – The number of photons in data set used for optimization, should only be set when recreating model, Usually should be set by evaluation against data. Default is 0
is_conv (bool (optional)) – Whether or not the model has met a convergence criterion, should only be set when recreating model, otherwise allow evaluation functions to handle this operation. Default is False
- bic#
Bayes Information Criterion of model
- conv_code#
The convergence code of model, an int
- conv_str#
String description of how model optimization/calculation ended
- copy()#
Make a duplicate model in new memory
- evaluate(indexes, times, gamma=False, num_cores=None, inplace=True)#
Calculate the loglikelihood of the model given a set of data. The loglikelihood is stored with the model. NOTE: this method calls the H2MM_arr function, and has essentially the same syntax
- Parameters:
indexes (list or tuple of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to11 correspondence to the times list
times (list or tuple of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list
gamma (bool (optional)) – Whether or not to return the gamma array. (The default is False)
num_cores (int or None, optional) –
the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.
Note
optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()
If None (default) use value from optimization_limits. Default is None
inplace (bool, optional) – Whether or not to store the evaluated model in the current model object. Default is True
- Returns:
out (h2mm_model) – The evaluated model
gamma_arr (list, tuple or np.ndarray (optional)) – If gamma = True, this variable is returned, which contains (as numpy arrays) the probabilities of each photon to be in each state, for each burst and model. The arrays are returned in the same order and type as the indexes, individual data points organized as [photon, state] within each data sequence.
- classmethod frombytes(buffer)#
Create
h2mm_modelfrom a bytes object generated byh2mm_model.toobytes().- Parameters:
buffer (bytes) – Bytes representation of
h2mm_modelobject. Should have originated fromh2mm_model.tobytes().- Raises:
ValueError – Invalid bytes represenation of
h2mm_model, data either was never ah2mm_modelor was corrupted.- Returns:
Data converted to
h2mm_model- Return type:
- is_calc#
If model has been optimized/eveluated, vs being an initial model
- is_conv#
Whether or not the optimization reached convergence rather than exceeding the maximum iteration/time of optimization
- is_opt#
Whether or not the model has undergone optimization, as opposed to evaluation or being an initial model
- k#
Number of free parameters in model
- loglik#
Loglikelihood of model
- ndet#
Nubmer of detectors/photon streams in model
- niter#
Number of iterations optimization took
- normalize()#
For internal use, ensures all model arrays are row stochastic
- nphot#
Number of photons in data set used to optimize model
- nstate#
Number of states in model
- obs#
Emission probability matrix, 2D shape nstate x ndet
- optimize(indexes, times, max_iter=None, bounds_func=None, bounds=None, bounds_kwargs=None, print_func='iter', print_freq=1, print_args=None, print_kwargs=None, print_stream=None, print_formatter=None, print_fmt_args=None, print_fmt_kwargs=None, max_time=inf, converged_min=None, num_cores=None, reset_niter=True, gamma=False, opt_array=False, inplace=False)#
Optimize the H2MM model for the given set of data.
Note
This method calls the
EM_H2MM_C()function.- Parameters:
model (h2mm_model) – An initial guess for the H2MM model, just give a general guess, the algorithm will optimize, and generally the algorithm will converge even when the initial guess is very far off
indexes (list of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list
times (list of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list
max_iter (int or None, optional) – the maximum number of iterations to conduct before returning the current
h2mm_model, if None (default) use value from optimization_limits. Default is Nonebounds_func (str, callable or None, optional) –
function to be evaluated after every iteration of the H2MM algorithm its primary function is to place bounds on
h2mm_modelNote
bounding the
h2mm_modelcauses the guarantee of improvement on each iteration until convergence to no longer apply, therefore the results are no longer guaranteed to be the optimal model within bounds.Default is None
Acceptable inputs:
- C level limits: ‘minmax’ ‘revert’ ‘revert_old’
prevents the model from having values outside of those defined by
h2mm_limitsclass given to bounds, if an iteration produces a new model for loglik calculation in the next iteration with values that are out of those bounds, the 3 function differ in how they correct the model when a new model has a value out of bounds, they are as follows:’minmax’:
sets the out of bounds value to the min or max value closer to the original value
’revert’:
sets the out of bounds value to the value from the last model for which the loglik was calculated
’revert_old’:
similar to revert, but instead reverts to the value from the model one before the last calculated model
- Callable: python function that takes 4 inputs and returns
h2mm_modelobject Warning
The user takes full responsibility for errors in the results.
must be a python function that takes the signature
bounds_func(new, current, old, *bounds, **bounds_kwargs)new,current, andoldareh2mm_modelobjects, withnewbeing the model whose loglik has yet to be calculated resulting from the latest iteration,currentbeing the model whose loglik was just calculated, andoldbeing the result of the previous iteration. Be one of the following:h2mm_model,bool,int, or a 2-tuple ofh2mm_modeland eitherboolorint. In all cases,intvalues must 0, 1, or 2.h2mm_modelobjects will be used as the next model to compute the loglik of.boolandintobjects determine if optimization will continue. IfTrue, then optimization will continue, ifFalse, then optimization will cease, and theoldmodel will be returned as the ideal model. If0, then optimization will continue, if1, then returnoldmodel as optimal model, if2, then returncurrentmodel as optimizal model.
bounds (h2mm_limits, tuple, or None, optional) –
Argument(s) pass to bounds_func function. If bounds_func is a string, then bounds must be specified, and must be a
h2mm_limitsobject.If
bounds_funcis callable, and bounds is not None or a tuple, bounds is passed as the 4th argument to bounds_func, iebounds_func(new, current, old, bounds, **bounds_kwargs), if bounds is a tuple, then passed as *args, iebounds_func(new, current, old, *bounds, **bounds_kwargs)Default is Nonebounds_kwargs (dict, or None, optional) – Only used when
bounds_funcis callable, passed as **kwargs tobounds_funcie iebounds_func(new, current, old, *bounds, **bounds_kwargs). IfNoneThe default is Noneprint_func (None, str or callable, optional) –
Specifies how the results of each iteration will be displayed, several strings specify built-in functions. Default is ‘iter’ Acceptable inputs: str, Callable or None
None:
causes no printout anywhere of the results of each iteration
Str: ‘all’, ‘diff’, ‘comp’, ‘iter’
’all’:
prints out the full
h2mm_modeljust evaluated, this is very verbose, and not generally recommended unless you really want to know every step of the optimization’diff’:
prints the loglik of the iteration, and the difference between the current and old loglik, this is the same format that the ‘console’ option used, the difference is whether the print function used is the C printf, or Cython print, which changes the destination from the console to the Jupyter notebook
’diff_time’:
same as ‘diff’ but adds the time of the last iteration and the total optimization time
’comp’:
prints out the current and old loglik
’comp_time’:
same as ‘comp’ but with times, like in ‘diff_time’
’iter’:
prints out only the current iteration
- Callable: user defined function
A python function for printing out a custom output, the function must accept the input of (int,
h2mm_model,h2mm_model,h2mm_model, float, float) as the function will be handed(niter, new, current, old, iter_time, total_time)from a special Cython wrapper. iter_time and total_time are the times of the latest iteration and total time of optimization, respectively. Note that iter_time and total_time are based on the fast, but inaccurate C clock function.
print_freq (int, optional) – Number of iterations between updating display. Default is 1
print_args (tuple or None, optional) – Only used when
print_funcis callable. Passed as final argument toprint_funcif notNoneortuple, ieprint_func(niter, new, current, old, iter_time, total_time, print_args, **print_kwargs)iftuple, then passed as *args, ieprint_func(niter, new, current, old, iter_time, total_time, *print_args, **print_kwargs)IfNone, then ignored. The default isNone.print_kwargs (dict or None, optional) – Only used when
print_funcis callable, passed as **kwargs toprint_funcieprint_func(niter, new, current, old, iter_time, time_total, *print_args, **print_kwargs). IfNone, then ignored. The default isNone.print_stream (OutStream, optional) – Typically OutStream, the stream where the output will be displayed. Passed as first argument to print_formatter, if
None, then use stream specified in optimization_limites. Default is Noneprint_formatter (Printer, optional) – Class object (usually subclass of
Printer) that formats text for output toprint_stream. Instance ofprint_formatterclass will be created withfmtr = print_formatter(print_stream, *print_fmt_args, **print_fmt_kwargs)to use for printing. Each print will callfmtr.update(text)wheretextis output ofprint_funccall. At end of optimization,fmtr.close()will be called. Ifprint_formatterisNonethen will use the formatter specified inoptimization_limits. The default isNone.print_fmt_args (tuple, Any or None, optional) – Additional arguments passed to
print_formatter, if None, ignored, if not tuple, then treated as single argument. The default isNone.print_fmt_kwargs (dict or None, optional) – Additional keyword arguments passed to
print_formatter, if None, ignored. The default isNone.max_time (float or None, optional) – The maximum time (in seconds) before returning current model NOTE this uses the inaccurate C clock, which is often longer than the actual time. If
None(default) use value from optimization_limits. Default isNone.converged_min (float or None, optional) – The difference between new and current
h2mm_modelto consider the model converged, the default setting is close to floating point error, it is recommended to only increase the size. If None (default) use value from optimization_limits. Default is Nonenum_cores (int or None, optional) –
the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.
Note
optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()
If None (default) use value from optimization_limits. Default is None
reset_niter (bool, optional) – Tells the algorithm whether or not to reset the iteration counter of the model, True means that even if the model was previously optimized, the iteration counter will start at 0, so calling
EM_H2MM_C()on a model that reached its maximum iteration number will proceed another max_iter iterations. On the other hand, if set to False, then the new iterations will be added to the old. If set to False, you likely will have to increase max_iter, as the optimization will count the previous iterations towards the max_iter threshold. Default is Falsegamma (bool, optional) –
Whether or not to return the gamma array, which gives the probabilities of each photon being in a given state.
Note
If opt_array is True, then only the gamma arrays for the ideal model are returned.
Default is False
opt_array (bool | int, optional) – Defines how the result is returned, if False (default) (or 0) then return only the ideal model, if True (or 1), then a numpy array is returned containing all models during the optimization, with the last model being the optimial model. If 2 then return all models, including models after converged. Default is False
inplace (bool, optional) –
Whether or not to store the optimized model in the current model object.
Note
When opt_array is True, then the ideal model is copied into the current model object, while the return value contains all models
Default is False
- Returns:
out (h2mm_model) – The optimized
h2mm_model. will return after one of the following conditions are met: model has converged (according to converged_min, default 1e-14), maximum iterations reached, maximum time has passed, or an error has occurred (usually the result of a nan from a floating point precision error)gamma (list, tuple or np.ndarray (optional)) – If gamma = True, this variable is returned, which contains (as numpy arrays) the probabilities of each photon to be in each state, for each burst. The arrays are returned in the same order and type as the input indexes, individual data points organized as [photon, state] within each data sequence.
- prior#
Prior probability matrix, 1D, size nstate
- set_converged(converged=True)#
Modify model to mark as converged or not
- sort_states()#
Return model with states sorted by values in prior, and set as ‘hashable’
- tobytes()#
Serialize
h2mm_modelinto bytes, which can be read back intoh2mm_modelwithh2mm_model.frombytes().- Returns:
Serialized bytes represenation of
h2mm_model. Includes all information contained within model.- Return type:
bytes
- trans#
Transition probability matrix, square, dimenstions nstate x nstate
- class H2MM_C.h2mm_limits(model=None, min_prior=None, max_prior=None, min_trans=None, max_trans=None, min_obs=None, max_obs=None, nstate=0, ndet=0)#
Special class for setting limits on the h2mm_model, as min and max values
- Parameters:
model (h2mm_model, optional) – An
h2mm_modelto base the limits off of, if None. The main purpose is to allow the user to check that the limits are valid for the model. Specifying this model will also lock the states/streams of the model, while if None, the limits is more flexible If None, none of these checks will be in place The default is None.min_prior (float or 1D numpy array, optional) – The minimum value(s) of the prior array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. If None, no limits are set (all values min is 0.0) The default is None.
max_prior (float or 1D numpy array, optional) – The maximum value(s) of the prior array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. If None, no limits are set (all values min is 1.0) The default is None.
min_trans (float or 2D square numpy array, optional) – The minimum value(s) of the trans array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. Values on diagonal set to 0.0 If None, no limits are set (all values min is 0.0) The default is None.
max_trans (float or 2D square numpy array, optional) – The maximum value(s) of the trans array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. Values on diagonal set to 1.0 If None, no limits are set (all values min is 1.0) The default is None.
min_obs (float or 2D numpy array, optional) – The minimum value(s) of the obs array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. If None, no limits are set (all values min is 0.0) The default is None.
max_obs (float or 2D numpy array, optional) – The maximum value(s) of the obs array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. If None, no limits are set (all values min is 1.0) The default is None.
nstate (int, optional) – Number of states in the model to be optimized, if set to 0, and not specified elsewhere, number of states is flexible. The default is 0.
ndet (int, optional) – Number of streams in the model to be optimized, if set to 0, and not specified elsewhere, number of states is flexible. The default is 0.
- make_model(model, warning=True)#
Method for checking the limits arrays generated from the input model
- Parameters:
model (h2mm_model) –
h2mm_modelfor which the limits are to be specified. Also, values are checked and a warning is raised if the model has values that are out of the range of the limitswarning (bool optional) – Whether or not to check of the initial model is within the bounds generated by make_model Default is True
- Raises:
Exception – When user has reset a specific field so the field is no longer valid.
ValueError – Limits cannot be made for the given model.
- Returns:
model (h2mm_model) – Model to be optimized with specified limits.
min_prior (1D numpy float array) – Minimum values for each element of the prior array
max_prior (1D numpy float array) – Maximum values for each element of the prior array
min_trans (2D square numpy float array) – Minimum values of each element of the trans array
max_trans (2D square numpy float array) – Maximum values of each element of the trans array
min_obs (2D numpy float array) – Minimum values of each element of the obs array
max_obs (2D numpy float array) – Maximum values of each element of the obs array
- class H2MM_C.Printer#
Abstract base class for formatting progress display of optimizations.
- abstract close()#
Method called at end of
H2MM_C.EM_H2MM_C(), used to terminate the output, for instance adding a newline character to output.
- abstract update(text)#
Method called to update output each iteration. Accepts output of print_funcion as only argument.
- class H2MM_C.StdPrinter(buffer, keep=False, single_line=True)#
- close()#
Method called at end of
H2MM_C.EM_H2MM_C(), used to terminate the output, for instance adding a newline character to output.
- update(text)#
Method called to update output each iteration. Accepts output of print_funcion as only argument.
- class H2MM_C.IPyPrinter(buffer, keep=False)#
Class for prining to
IPython.display.DisplayHandleobject.- Parameters:
buffer (IPython.display.DisplayHandle) – DisplayHandle where output is displayed
keep (bool, optional) – Whether to keep old output or replace with new. The default is False
- close()#
Method called at end of
H2MM_C.EM_H2MM_C(), used to terminate the output, for instance adding a newline character to output.
- update(text)#
Update display with text in text
- Parameters:
text (str) – Text to display
- Return type:
None
Constants#
The following are all constants that serve as a bitmask for h2mm_model.conv_code if
& with a model and the given constant is non-zero, then the given condition is met
- int H2MM_C.convcode.ll_computed
Model has had loglik computed against some data (either during optimization or singly
- int H2MM_C.convcode.from_opt
Model was created during optimization
- int H2MM_C.convcode.output
Model is best model during optimization, by some criterion marked by other flags
- int H2MM_C.convcode.converged
Model has best logliklihood within converged_min threshold
- int H2MM_C.convcode.max_iter
Optimization reached maximum number of iterations (only set for models furing terminal iteration)
- int H2MM_C.convcode.max_time
Optimization reach maximum time (only set for models furing terminal iteration)
- int H2MM_C.convcode.error
Error occured during optimization
- int H2MM_C.convcode.post_opt
Model is either the model with poorer loglikelihood between current and old, or is new and loglik is not calculated
- int H2MM_C.convcode.frozen
Model cannot be modified in place