Documentation#

Data Analysis Functions#

H2MM_C.EM_H2MM_C(model, indexes, times, max_iter=None, bounds_func=None, bounds=None, bounds_kwargs=None, print_func='iter', print_freq=1, print_args=None, print_kwargs=None, print_stream=None, print_formatter=None, print_fmt_args=None, print_fmt_kwargs=None, max_time=inf, converged_min=None, num_cores=None, reset_niter=True, gamma=False, opt_array=False)#

Calculate the most likely model that explains the given set of data. The input model is used as the start of the optimization.

Parameters:
  • model (h2mm_model) – An initial guess for the H2MM model, just give a general guess, the algorithm will optimize, and generally the algorithm will converge even when the initial guess is very far off

  • indexes (list of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list

  • times (list of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list

  • max_iter (int or None, optional) – the maximum number of iterations to conduct before returning the current h2mm_model, if None (default) use value from optimization_limits. Default is None

  • bounds_func (str, callable or None, optional) –

    function to be evaluated after every iteration of the H2MM algorithm its primary function is to place bounds on h2mm_model

    Note

    bounding the h2mm_model causes the guarantee of improvement on each iteration until convergence to no longer apply, therefore the results are no longer guaranteed to be the optimal model within bounds.

    Default is None

    Acceptable inputs:

    C level limits: ‘minmax’ ‘revert’ ‘revert_old’

    prevents the model from having values outside of those defined by h2mm_limits class given to bounds, if an iteration produces a new model for loglik calculation in the next iteration with values that are out of those bounds, the 3 function differ in how they correct the model when a new model has a value out of bounds, they are as follows:

    ’minmax’:

    sets the out of bounds value to the min or max value closer to the original value

    ’revert’:

    sets the out of bounds value to the value from the last model for which the loglik was calculated

    ’revert_old’:

    similar to revert, but instead reverts to the value from the model one before the last calculated model

    Callable: python function that takes 4 inputs and returns h2mm_model object

    Warning

    The user takes full responsibility for errors in the results.

    must be a python function that takes the signature bounds_func(new, current, old, *bounds, **bounds_kwargs) new, current, and old are h2mm_model objects, with new being the model whose loglik has yet to be calculated resulting from the latest iteration, current being the model whose loglik was just calculated, and old being the result of the previous iteration. Be one of the following: h2mm_model, bool, int, or a 2-tuple of h2mm_model and either bool or int. In all cases, int values must 0, 1, or 2. h2mm_model objects will be used as the next model to compute the loglik of. bool and int objects determine if optimization will continue. If True, then optimization will continue, if False, then optimization will cease, and the old model will be returned as the ideal model. If 0, then optimization will continue, if 1, then return old model as optimal model, if 2, then return current model as optimizal model.

  • bounds (h2mm_limits, tuple, or None, optional) –

    Argument(s) pass to bounds_func function. If bounds_func is a string, then bounds must be specified, and must be a h2mm_limits object.

    If bounds_func is callable, and bounds is not None or a tuple, bounds is passed as the 4th argument to bounds_func, ie bounds_func(new, current, old, bounds, **bounds_kwargs), if bounds is a tuple, then passed as *args, ie bounds_func(new, current, old, *bounds, **bounds_kwargs) Default is None

  • bounds_kwargs (dict, or None, optional) – Only used when bounds_func is callable, passed as **kwargs to bounds_func ie ie bounds_func(new, current, old, *bounds, **bounds_kwargs). If None The default is None

  • print_func (None, str or callable, optional) –

    Specifies how the results of each iteration will be displayed, several strings specify built-in functions. Default is ‘iter’ Acceptable inputs: str, Callable or None

    None:

    causes no printout anywhere of the results of each iteration

    Str: ‘all’, ‘diff’, ‘comp’, ‘iter’

    ’all’:

    prints out the full h2mm_model just evaluated, this is very verbose, and not generally recommended unless you really want to know every step of the optimization

    ’diff’:

    prints the loglik of the iteration, and the difference between the current and old loglik, this is the same format that the ‘console’ option used, the difference is whether the print function used is the C printf, or Cython print, which changes the destination from the console to the Jupyter notebook

    ’diff_time’:

    same as ‘diff’ but adds the time of the last iteration and the total optimization time

    ’comp’:

    prints out the current and old loglik

    ’comp_time’:

    same as ‘comp’ but with times, like in ‘diff_time’

    ’iter’:

    prints out only the current iteration

    Callable: user defined function

    A python function for printing out a custom output, the function must accept the input of (int, h2mm_model, h2mm_model, h2mm_model, float, float) as the function will be handed (niter, new, current, old, iter_time, total_time) from a special Cython wrapper. iter_time and total_time are the times of the latest iteration and total time of optimization, respectively. Note that iter_time and total_time are based on the fast, but inaccurate C clock function.

  • print_freq (int, optional) – Number of iterations between updating display. Default is 1

  • print_args (tuple or None, optional) – Only used when print_func is callable. Passed as final argument to print_func if not None or tuple, ie print_func(niter, new, current, old, iter_time, total_time, print_args, **print_kwargs) if tuple, then passed as *args, ie print_func(niter, new, current, old, iter_time, total_time, *print_args, **print_kwargs) If None, then ignored. The default is None.

  • print_kwargs (dict or None, optional) – Only used when print_func is callable, passed as **kwargs to print_func ie print_func(niter, new, current, old, iter_time, time_total, *print_args, **print_kwargs). If None, then ignored. The default is None.

  • print_stream (OutStream, optional) – Typically OutStream, the stream where the output will be displayed. Passed as first argument to print_formatter, if None, then use stream specified in optimization_limites. Default is None

  • print_formatter (Printer, optional) – Class object (usually subclass of Printer) that formats text for output to print_stream. Instance of print_formatter class will be created with fmtr = print_formatter(print_stream, *print_fmt_args, **print_fmt_kwargs) to use for printing. Each print will call fmtr.update(text) where text is output of print_func call. At end of optimization, fmtr.close() will be called. If print_formatter is None then will use the formatter specified in optimization_limits. The default is None.

  • print_fmt_args (tuple, Any or None, optional) – Additional arguments passed to print_formatter, if None, ignored, if not tuple, then treated as single argument. The default is None.

  • print_fmt_kwargs (dict or None, optional) – Additional keyword arguments passed to print_formatter, if None, ignored. The default is None.

  • max_time (float or None, optional) – The maximum time (in seconds) before returning current model NOTE this uses the inaccurate C clock, which is often longer than the actual time. If None (default) use value from optimization_limits. Default is None.

  • converged_min (float or None, optional) – The difference between new and current h2mm_model to consider the model converged, the default setting is close to floating point error, it is recommended to only increase the size. If None (default) use value from optimization_limits. Default is None

  • num_cores (int or None, optional) –

    the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.

    Note

    optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()

    If None (default) use value from optimization_limits. Default is None

  • reset_niter (bool, optional) – Tells the algorithm whether or not to reset the iteration counter of the model, True means that even if the model was previously optimized, the iteration counter will start at 0, so calling EM_H2MM_C() on a model that reached its maximum iteration number will proceed another max_iter iterations. On the other hand, if set to False, then the new iterations will be added to the old. If set to False, you likely will have to increase max_iter, as the optimization will count the previous iterations towards the max_iter threshold. Default is False

  • gamma (bool, optional) –

    Whether or not to return the gamma array, which gives the probabilities of each photon being in a given state.

    Note

    If opt_array is True, then only the gamma arrays for the ideal model are returned.

    Default is False

  • opt_array (bool | int, optional) – Defines how the result is returned, if False (default) (or 0) then return only the ideal model, if True (or 1), then a numpy array is returned containing all models during the optimization, with the last model being the optimial model. If 2 then return all models, including models after converged. Default is False

Returns:

  • out (h2mm_model) – The optimized h2mm_model. will return after one of the following conditions are met: model has converged (according to converged_min, default 1e-14), maximum iterations reached, maximum time has passed, or an error has occurred (usually the result of a nan from a floating point precision error)

  • gamma (numpy.ndarray (optional)) – If gamma = True, this variable is returned, which contains (as numpy arrays) the probabilities of each photon to be in each state, for each burst. The arrays are returned in the same order and type as the input indexes, individual data points organized as [photon, state] within each data sequence.

H2MM_C.factory_h2mm_model(nstate, ndet, bounds=None, trans_scale=1e-05, prior_dist='equal', trans_dist='equal', obs_dist='equal')#

Function for generating a model of an arbitrary number of states/streams

Parameters:
  • nstate (int) – Number of states in the return model

  • ndet (int, optional) – Number of photons streams in the return model

  • bounds (h2mm_limits or None, optional) – The limits expected to be used when optimizing the model. For limits are given, the model will always be within those limits, distribution of states within the model will depend on how The default is None.

  • trans_scale (float) – Only used when no h2mm_limits object is supplied to bounds, sets the minimum (slowest) transition rates in the trans array

  • prior_dist ('even' or 'random', optional) – Define how to distribute prior states. options are ‘equal’ and ‘random’ The default is ‘equal’.

  • trans_dist ('even' or 'random', optional) – Define how to distribute trans states. options are ‘equal’ and ‘random’ The default is ‘equal’.

  • obs_dist ('even' or 'random', optional) – Define how to distribute obs states. options are ‘equal’ and ‘random’ The default is ‘equal’.

Returns:

model – An h2mm_model object ready for optimization

Return type:

h2MM_model

H2MM_C.H2MM_arr(models, indexes, times, num_cores=None, gamma=False)#

Calculate the logliklihood of every model in a list/array given a set of data

Parameters:
  • models (list, tuple, or numpy.ndarray of h2mm_model objects) – All of the h2mm_model object for which the loglik will be calculated

  • indexes (list or tuple of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list

  • times (list or tuple of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list

  • gamma (bool, optional) –

    Whether or not to return the gamma array, which gives the probabilities of each photon being in a given state. the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.

    Note

    optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()

    If None (default) use value from optimization_limits. Default is None

Returns:

  • out (list tuple, or numpy.ndarray of h2mm_model) – A list, tuple, or numpy array of the h2mm_model with the loglik computed The converged state is automatically set to 0, and nphot is set in accordance with the number of photons in the data set. The datatype returned is the same as the datatype of h_model

  • gamma_arr (list, tuple or np.ndarray (optional)) – If gamma = True, this variable is returned, which contains (as numpy arrays) the probabilities of each photon to be in each state, for each burst and model. The arrays are returned in the same order and type as the input models and indexes, individual data points organized as [photon, state] within each data sequence.

H2MM_C.viterbi_path(h_mod, indexes, times, num_cores=None)#

Calculate the most likely state path through a set of data given a H2MM model

Parameters:
  • h_model (h2mm_model) – An h2mm_model, should be optimized for the given data set (result of EM_H2MM_C() or h2mm_model.optimize()) to ensure results correspond to give the most likely path

  • indexes (list or tuple of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list

  • times (list or tuple of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list

  • num_cores (int or None, optional) – the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations. Note that os.cpu_count() returns number of threads, but it is ideal to take the number of physical cores. Therefore, unless reset by user, optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. If None (default) use value from optimization_limits. Default is None

Returns:

  • path (list of NUMPY 1D int arrays) – The most likely state path for each photon

  • scale (list of NUMPY 1D float arrays) – The posterior probability for each photon

  • ll (NUMPY 1D float array) – loglikelihood of each burst

  • icl (float) – Integrated complete likelihood, essentially the BIC for the Viterbi path

H2MM_C.viterbi_sort(hmod, indexes, times, num_cores=None)#

An all inclusive viterbi processing algorithm. Returns the ICL, the most likely state path, posterior probabilities, and a host of information sorted by bursts and dwells

Parameters:
  • h_model (h2mm_model) – An h2mm_model, should be optimized for the given data set (result of EM_H2MM_C() or h2mm_model.optimize()) to ensure results correspond to give the most likely path

  • indexes (list or tuple of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list

  • times (list or tuple of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list

  • num_cores (int or None, optional) –

    the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.

    Note

    optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()

    If None (default) use value from optimization_limits. Default is None

Returns:

  • icl (float) – Integrated complete likelihood, essentially the BIC for the Viterbi path

  • path (list of NUMPY 1D int arrays) – The most likely state path for each photon

  • scale (list of NUMPY 1D float arrays) – The posterior probability for each photon

  • ll (NUMPY 1D float array) – loglikelihood of each burst

  • burst_type (NUMPY 1D int array) – Identifies which states are present in the burst, identified in a binary format, ie if states 0 and 2 are present in a burst, that element will be 0b101 = 5

  • dwell_mid (square list of lists of NUMPY 1D int arrays) – Gives the dwell times of all dwells with full residence time in the data set, sorted by the state of the dwell (Top level list), and the successor state (Lower level list), each element of the numpy array is the duration of the dwell in the clock rate of the data set.

  • dwell_beg (square list of lists of NUMPY 1D int arrays) – Gives the dwell times of all dwells that start at the beginning of a burst, sorted by the state of the dwell (Top level list), and the successor state (Lower level list), each element of the numpy array is the duration of the dwell in the clock rate of the data set.

  • dwell_end (square list of lists of NUMPY 1D int arrays) – Gives the dwell times of all dwells that start at the end of a burst, sorted by the state of the dwell (Top level list), and the preceding state (Lower level list), each element of the numpy array is the duration of the dwell in the clock rate of the data set.

  • dwell_burst (list of NUMPY 1D int arrays) – List of bursts that were only in one state, giving their durations.

  • ph_counts (list of NUMPY 2D int arrays) – Counts of photons in each dwell, sorted by index and state of the dwell The index of the list identifies the state of the burst, the 1 index identifies the index of photon, and the 0 index are individual dwells.

  • ph_mid (square list of NUMPY 2D arrays) – Gives the photon counts of dwells in the middle of bursts, sorted by state of the dwell (top level of the list) and the state of the next dwell in the burst (lower level burst), each row in an array is a single dwell, the columns correspond to the counts of each photon stream.

  • ph_beg (square list of NUMPY 2D arrays) – Gives the photon counts of dwells at the beginning of bursts, sorted by state of the dwell (top level of the list) and the state of the next dwell in the burst (lower level list), each row in an array is a single dwell, the columns correspond to the counts of each photon stream.

  • ph_end (square list of NUMPY 2D arrays) – Gives the photon counts of dwells at the ends of bursts, sorted by state of the dwell (top level of the list) and the state of the previous dwell in the burst, each row in an array is a single dwell, the columns correspond to the counts of each photon stream.

  • ph_burst (list of NUMPY 2D arrays) – Gives the counts of photons in bursts with only a single state. Sorted according to the state of the burst. Each row in an array is a single dwell, the columns correspond to the counts of each photon stream.

H2MM_C.path_loglik(model, indexes, times, state_path, num_cores=None, BIC=True, total_loglik=False, loglikarray=False, loglikpath=False)#

Function for calculating the statistical parameters of a specific state path and data set given a model (ln(P(XY|lambda))). By default returns the BIC of the entire data set, can also return the base logliklihood in total and/or per burst. Which return values selected are defined in keyword arguments.

Parameters:
  • model (h2mm_model) – h2mm_model for which to calculate the loglik of the path.

  • indexes (list[numpy.ndarray], tuple[numpy.ndarray] or numpy.ndarray[numpy.ndarray]) – Indexes of photons in bursts in data (simulated or real).

  • times (list, tuple, or numpy.ndarray of integer numpy.ndarray) – Arrival times of photons in bursts in (simulated or real).

  • state_path (list, tuple, or numpy.ndarray of integer numpy.ndarray) – state for each photon in bursts in data (must be infered through viterbi or other algorithm).

  • num_cores (int, optional) – Number of cores to use in calculation, if None, then use value in optimization_limits. The default is None.

  • BIC (bool, optional) – Whether to return the Bayes Information Criterion of the calculation. The default is True.

  • total_loglik (bool, optional) – Whether to return the loglikelihood of the entire data set as a single number. The default is False.

  • loglikarray (bool, optional) – Whether to return the loglikelihoods of each burst as a numpy array. The default is False.

  • loglikpath (bool, optional) – Whether to return the loglikelihoods of each photon in each burst as a numpy array of numpy arrays. The default is False

Raises:
  • TypeError – One or more inputs did not contain the proper data types.

  • ValueError – One or more values out of the acceptable ranges given the input model

  • RuntimeError – Deeper issue, raise issue on github.

Returns:

  • bic (float) – Bayes information criterion of entire data set and path.

  • loglik (float) – Log likelihood of entire dataset

  • loglik_array (numpy.ndarray) – Loglikelihood of each burst separately

  • loglikpath (numpy.ndarray) – Loglikelihood of each photon of each burst array.

Simulation Functions#

H2MM_C.sim_phtraj_from_times(hmod, times, seed=None)#

Generate a state path and photon trajectory for a given set of times

Parameters:
  • hmod (h2mm_model) – An h2mm_model to build the path and stream trajectories from

  • times (numpy.ndarray 1D int) – An unsigned integer of monotonically increasing times for the burst

  • seed (positive int, optional) – The initial random seed, use if you want to be able to replicate results. If None, then uses the current time as the seed. The default is None.

Raises:
  • TypeError – tiems array must be 1D

  • ValueError – times were not monotonically increasing

  • RuntimeError – Unknown error, raise issue on github

Returns:

  • path (numpy.ndarray 1D int) – State trajectory generated for the input times

  • dets (numpy.ndarray 1D int) – stream trajectory generate for the input times, derived from path

H2MM_C.sim_phtraj_from_state(hmod, states, seed=None)#

Generate a photon trajectory from a h2mm model and state trajectory

Parameters:
  • hmod (h2mm_model) – An h2mm_model to build the stream trajectory from

  • states (numpy ndarray 1D int) –

  • seed (positive int, optional) – The initial random seed, use if you want to be able to replicate results. If None, then uses the current time as the seed. The default is None.

Raises:
  • TypeError – The state array was not 1D

  • RuntimeError – Unknown error, please raise issue on github

Returns:

dets – The random photon stream indexes based on model and statepath

Return type:

numpy ndarray 1D int

H2MM_C.sim_sparsestatepath(hmod, times, seed=None)#

Generate a state path from a model and a sparse set of arrival times

Parameters:
  • hmod (h2mm_model) – An h2mm_model to build the state path from

  • times (numpy ndarray 1D int) – An unsigned integer of monotonically increasing times for the burst

  • seed (positive int, optional) – The initial random seed, use if you want to be able to replicate results. If None, then uses the current time as the seed. The default is None.

Raises:
  • TypeError – Your array was not 1D

  • ValueError – Your time were not monotonically increasing

  • RuntimeError – Unknown error, please raise issue on github

Returns:

path – A randomly generated statepath based on the input model and times

Return type:

numpy ndarray 1D long int

H2MM_C.sim_statepath(hmod, lenp, seed=None)#

A function to generate a dense statepath (HMM as opposed to H2MM) of length lenp

Parameters:
  • hmod (h2mm_model) – An h2mm_model to build the state path from

  • lenp (int) – The length of the path you want to generate

  • seed (positive int, optional) – The initial random seed, use if you want to be able to replicate results. If None, then uses the current time as the seed. The default is None.

Raises:

RuntimeError – A problem with the C code, this is for future proofing

Returns:

path_ret – The random dense state-path

Return type:

numpy ndarray, positive integer

Classes#

class H2MM_C.h2mm_model#

The base class for storing objects representing the H2MM model, stores the relevant matrices, loglikelihood and information about status of optimization.

Parameters:
  • prior (numpy.ndarray) – The initial probability matrix, 1D, size nstate

  • trans (numpy.ndarray) – The transition probability matrix, 2D, shape [nstate, nstate]

  • obs (numpy.ndarray) – The emission probability matrix, 2D, shape [nstate, ndet]

  • Parameters (Optional) –

  • -------------------

  • note:: (..) – These are generally only specified when trying to re-create a model from, for instance a previous optimization of a closed notebook.

  • loglik (float (optional)) – The loglikelihood, always set with nphot otherwise BIC will not be calculated correctly. Usually should be set by evaluation against data. Default is -inf

  • niter (int) – The number of iterations, used for recreating previously optimized model. Usually should be set by evaluation against data. Default is 0

  • nphot (int (optional)) – The number of photons in data set used for optimization, should only be set when recreating model, Usually should be set by evaluation against data. Default is 0

  • is_conv (bool (optional)) – Whether or not the model has met a convergence criterion, should only be set when recreating model, otherwise allow evaluation functions to handle this operation. Default is False

bic#

Bayes Information Criterion of model

conv_code#

The convergence code of model, an int

conv_str#

String description of how model optimization/calculation ended

copy()#

Make a duplicate model in new memory

evaluate(indexes, times, gamma=False, num_cores=None, inplace=True)#

Calculate the loglikelihood of the model given a set of data. The loglikelihood is stored with the model. NOTE: this method calls the H2MM_arr function, and has essentially the same syntax

Parameters:
  • indexes (list or tuple of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to11 correspondence to the times list

  • times (list or tuple of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list

  • gamma (bool (optional)) – Whether or not to return the gamma array. (The default is False)

  • num_cores (int or None, optional) –

    the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.

    Note

    optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()

    If None (default) use value from optimization_limits. Default is None

  • inplace (bool, optional) – Whether or not to store the evaluated model in the current model object. Default is True

Returns:

  • out (h2mm_model) – The evaluated model

  • gamma_arr (list, tuple or np.ndarray (optional)) – If gamma = True, this variable is returned, which contains (as numpy arrays) the probabilities of each photon to be in each state, for each burst and model. The arrays are returned in the same order and type as the indexes, individual data points organized as [photon, state] within each data sequence.

classmethod frombytes(buffer)#

Create h2mm_model from a bytes object generated by h2mm_model.toobytes().

Parameters:

buffer (bytes) – Bytes representation of h2mm_model object. Should have originated from h2mm_model.tobytes().

Raises:

ValueError – Invalid bytes represenation of h2mm_model, data either was never a h2mm_model or was corrupted.

Returns:

Data converted to h2mm_model

Return type:

h2mm_model

is_calc#

If model has been optimized/eveluated, vs being an initial model

is_conv#

Whether or not the optimization reached convergence rather than exceeding the maximum iteration/time of optimization

is_opt#

Whether or not the model has undergone optimization, as opposed to evaluation or being an initial model

k#

Number of free parameters in model

loglik#

Loglikelihood of model

ndet#

Nubmer of detectors/photon streams in model

niter#

Number of iterations optimization took

normalize()#

For internal use, ensures all model arrays are row stochastic

nphot#

Number of photons in data set used to optimize model

nstate#

Number of states in model

obs#

Emission probability matrix, 2D shape nstate x ndet

optimize(indexes, times, max_iter=None, bounds_func=None, bounds=None, bounds_kwargs=None, print_func='iter', print_freq=1, print_args=None, print_kwargs=None, print_stream=None, print_formatter=None, print_fmt_args=None, print_fmt_kwargs=None, max_time=inf, converged_min=None, num_cores=None, reset_niter=True, gamma=False, opt_array=False, inplace=False)#

Optimize the H2MM model for the given set of data.

Note

This method calls the EM_H2MM_C() function.

Parameters:
  • model (h2mm_model) – An initial guess for the H2MM model, just give a general guess, the algorithm will optimize, and generally the algorithm will converge even when the initial guess is very far off

  • indexes (list of NUMPY 1D int arrays) – A list of the arrival indexes for each photon in each burst. Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The indexes list must maintain 1-to-1 correspondence to the times list

  • times (list of NUMPY 1D int arrays) – A list of the arrival times for each photon in each burst Each element of the list (a numpy array) corresponds to a burst, and each element of the array is a singular photon. The times list must maintain 1-to-1 correspondence to the indexes list

  • max_iter (int or None, optional) – the maximum number of iterations to conduct before returning the current h2mm_model, if None (default) use value from optimization_limits. Default is None

  • bounds_func (str, callable or None, optional) –

    function to be evaluated after every iteration of the H2MM algorithm its primary function is to place bounds on h2mm_model

    Note

    bounding the h2mm_model causes the guarantee of improvement on each iteration until convergence to no longer apply, therefore the results are no longer guaranteed to be the optimal model within bounds.

    Default is None

    Acceptable inputs:

    C level limits: ‘minmax’ ‘revert’ ‘revert_old’

    prevents the model from having values outside of those defined by h2mm_limits class given to bounds, if an iteration produces a new model for loglik calculation in the next iteration with values that are out of those bounds, the 3 function differ in how they correct the model when a new model has a value out of bounds, they are as follows:

    ’minmax’:

    sets the out of bounds value to the min or max value closer to the original value

    ’revert’:

    sets the out of bounds value to the value from the last model for which the loglik was calculated

    ’revert_old’:

    similar to revert, but instead reverts to the value from the model one before the last calculated model

    Callable: python function that takes 4 inputs and returns h2mm_model object

    Warning

    The user takes full responsibility for errors in the results.

    must be a python function that takes the signature bounds_func(new, current, old, *bounds, **bounds_kwargs) new, current, and old are h2mm_model objects, with new being the model whose loglik has yet to be calculated resulting from the latest iteration, current being the model whose loglik was just calculated, and old being the result of the previous iteration. Be one of the following: h2mm_model, bool, int, or a 2-tuple of h2mm_model and either bool or int. In all cases, int values must 0, 1, or 2. h2mm_model objects will be used as the next model to compute the loglik of. bool and int objects determine if optimization will continue. If True, then optimization will continue, if False, then optimization will cease, and the old model will be returned as the ideal model. If 0, then optimization will continue, if 1, then return old model as optimal model, if 2, then return current model as optimizal model.

  • bounds (h2mm_limits, tuple, or None, optional) –

    Argument(s) pass to bounds_func function. If bounds_func is a string, then bounds must be specified, and must be a h2mm_limits object.

    If bounds_func is callable, and bounds is not None or a tuple, bounds is passed as the 4th argument to bounds_func, ie bounds_func(new, current, old, bounds, **bounds_kwargs), if bounds is a tuple, then passed as *args, ie bounds_func(new, current, old, *bounds, **bounds_kwargs) Default is None

  • bounds_kwargs (dict, or None, optional) – Only used when bounds_func is callable, passed as **kwargs to bounds_func ie ie bounds_func(new, current, old, *bounds, **bounds_kwargs). If None The default is None

  • print_func (None, str or callable, optional) –

    Specifies how the results of each iteration will be displayed, several strings specify built-in functions. Default is ‘iter’ Acceptable inputs: str, Callable or None

    None:

    causes no printout anywhere of the results of each iteration

    Str: ‘all’, ‘diff’, ‘comp’, ‘iter’

    ’all’:

    prints out the full h2mm_model just evaluated, this is very verbose, and not generally recommended unless you really want to know every step of the optimization

    ’diff’:

    prints the loglik of the iteration, and the difference between the current and old loglik, this is the same format that the ‘console’ option used, the difference is whether the print function used is the C printf, or Cython print, which changes the destination from the console to the Jupyter notebook

    ’diff_time’:

    same as ‘diff’ but adds the time of the last iteration and the total optimization time

    ’comp’:

    prints out the current and old loglik

    ’comp_time’:

    same as ‘comp’ but with times, like in ‘diff_time’

    ’iter’:

    prints out only the current iteration

    Callable: user defined function

    A python function for printing out a custom output, the function must accept the input of (int, h2mm_model, h2mm_model, h2mm_model, float, float) as the function will be handed (niter, new, current, old, iter_time, total_time) from a special Cython wrapper. iter_time and total_time are the times of the latest iteration and total time of optimization, respectively. Note that iter_time and total_time are based on the fast, but inaccurate C clock function.

  • print_freq (int, optional) – Number of iterations between updating display. Default is 1

  • print_args (tuple or None, optional) – Only used when print_func is callable. Passed as final argument to print_func if not None or tuple, ie print_func(niter, new, current, old, iter_time, total_time, print_args, **print_kwargs) if tuple, then passed as *args, ie print_func(niter, new, current, old, iter_time, total_time, *print_args, **print_kwargs) If None, then ignored. The default is None.

  • print_kwargs (dict or None, optional) – Only used when print_func is callable, passed as **kwargs to print_func ie print_func(niter, new, current, old, iter_time, time_total, *print_args, **print_kwargs). If None, then ignored. The default is None.

  • print_stream (OutStream, optional) – Typically OutStream, the stream where the output will be displayed. Passed as first argument to print_formatter, if None, then use stream specified in optimization_limites. Default is None

  • print_formatter (Printer, optional) – Class object (usually subclass of Printer) that formats text for output to print_stream. Instance of print_formatter class will be created with fmtr = print_formatter(print_stream, *print_fmt_args, **print_fmt_kwargs) to use for printing. Each print will call fmtr.update(text) where text is output of print_func call. At end of optimization, fmtr.close() will be called. If print_formatter is None then will use the formatter specified in optimization_limits. The default is None.

  • print_fmt_args (tuple, Any or None, optional) – Additional arguments passed to print_formatter, if None, ignored, if not tuple, then treated as single argument. The default is None.

  • print_fmt_kwargs (dict or None, optional) – Additional keyword arguments passed to print_formatter, if None, ignored. The default is None.

  • max_time (float or None, optional) – The maximum time (in seconds) before returning current model NOTE this uses the inaccurate C clock, which is often longer than the actual time. If None (default) use value from optimization_limits. Default is None.

  • converged_min (float or None, optional) – The difference between new and current h2mm_model to consider the model converged, the default setting is close to floating point error, it is recommended to only increase the size. If None (default) use value from optimization_limits. Default is None

  • num_cores (int or None, optional) –

    the number of C threads (which ignore the gil, thus functioning more like python processes), to use when calculating iterations.

    Note

    optimization_limtis sets this to be os.cpu_count() // 2, as most machines are multithreaded. Because os.cpu_count() returns the number of threads, not necessarily the number of cpus. For most machines, being multithreaded, this actually returns twice the number of physical cores. Hence the default to set at os.cpu_count() // 2. If your machine is not multithreaded, it is best to set optimization_limits.num_cores = os.cpu_count()

    If None (default) use value from optimization_limits. Default is None

  • reset_niter (bool, optional) – Tells the algorithm whether or not to reset the iteration counter of the model, True means that even if the model was previously optimized, the iteration counter will start at 0, so calling EM_H2MM_C() on a model that reached its maximum iteration number will proceed another max_iter iterations. On the other hand, if set to False, then the new iterations will be added to the old. If set to False, you likely will have to increase max_iter, as the optimization will count the previous iterations towards the max_iter threshold. Default is False

  • gamma (bool, optional) –

    Whether or not to return the gamma array, which gives the probabilities of each photon being in a given state.

    Note

    If opt_array is True, then only the gamma arrays for the ideal model are returned.

    Default is False

  • opt_array (bool | int, optional) – Defines how the result is returned, if False (default) (or 0) then return only the ideal model, if True (or 1), then a numpy array is returned containing all models during the optimization, with the last model being the optimial model. If 2 then return all models, including models after converged. Default is False

  • inplace (bool, optional) –

    Whether or not to store the optimized model in the current model object.

    Note

    When opt_array is True, then the ideal model is copied into the current model object, while the return value contains all models

    Default is False

Returns:

  • out (h2mm_model) – The optimized h2mm_model. will return after one of the following conditions are met: model has converged (according to converged_min, default 1e-14), maximum iterations reached, maximum time has passed, or an error has occurred (usually the result of a nan from a floating point precision error)

  • gamma (list, tuple or np.ndarray (optional)) – If gamma = True, this variable is returned, which contains (as numpy arrays) the probabilities of each photon to be in each state, for each burst. The arrays are returned in the same order and type as the input indexes, individual data points organized as [photon, state] within each data sequence.

prior#

Prior probability matrix, 1D, size nstate

set_converged(converged=True)#

Modify model to mark as converged or not

sort_states()#

Return model with states sorted by values in prior, and set as ‘hashable’

tobytes()#

Serialize h2mm_model into bytes, which can be read back into h2mm_model with h2mm_model.frombytes().

Returns:

Serialized bytes represenation of h2mm_model. Includes all information contained within model.

Return type:

bytes

trans#

Transition probability matrix, square, dimenstions nstate x nstate

class H2MM_C.h2mm_limits(model=None, min_prior=None, max_prior=None, min_trans=None, max_trans=None, min_obs=None, max_obs=None, nstate=0, ndet=0)#

Special class for setting limits on the h2mm_model, as min and max values

Parameters:
  • model (h2mm_model, optional) – An h2mm_model to base the limits off of, if None. The main purpose is to allow the user to check that the limits are valid for the model. Specifying this model will also lock the states/streams of the model, while if None, the limits is more flexible If None, none of these checks will be in place The default is None.

  • min_prior (float or 1D numpy array, optional) – The minimum value(s) of the prior array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. If None, no limits are set (all values min is 0.0) The default is None.

  • max_prior (float or 1D numpy array, optional) – The maximum value(s) of the prior array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. If None, no limits are set (all values min is 1.0) The default is None.

  • min_trans (float or 2D square numpy array, optional) – The minimum value(s) of the trans array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. Values on diagonal set to 0.0 If None, no limits are set (all values min is 0.0) The default is None.

  • max_trans (float or 2D square numpy array, optional) – The maximum value(s) of the trans array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. Values on diagonal set to 1.0 If None, no limits are set (all values min is 1.0) The default is None.

  • min_obs (float or 2D numpy array, optional) – The minimum value(s) of the obs array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. If None, no limits are set (all values min is 0.0) The default is None.

  • max_obs (float or 2D numpy array, optional) – The maximum value(s) of the obs array. If float, all values are set the same, but unless fixed elsewhere, the number of states is flexible. If None, no limits are set (all values min is 1.0) The default is None.

  • nstate (int, optional) – Number of states in the model to be optimized, if set to 0, and not specified elsewhere, number of states is flexible. The default is 0.

  • ndet (int, optional) – Number of streams in the model to be optimized, if set to 0, and not specified elsewhere, number of states is flexible. The default is 0.

make_model(model, warning=True)#

Method for checking the limits arrays generated from the input model

Parameters:
  • model (h2mm_model) – h2mm_model for which the limits are to be specified. Also, values are checked and a warning is raised if the model has values that are out of the range of the limits

  • warning (bool optional) – Whether or not to check of the initial model is within the bounds generated by make_model Default is True

Raises:
  • Exception – When user has reset a specific field so the field is no longer valid.

  • ValueError – Limits cannot be made for the given model.

Returns:

  • model (h2mm_model) – Model to be optimized with specified limits.

  • min_prior (1D numpy float array) – Minimum values for each element of the prior array

  • max_prior (1D numpy float array) – Maximum values for each element of the prior array

  • min_trans (2D square numpy float array) – Minimum values of each element of the trans array

  • max_trans (2D square numpy float array) – Maximum values of each element of the trans array

  • min_obs (2D numpy float array) – Minimum values of each element of the obs array

  • max_obs (2D numpy float array) – Maximum values of each element of the obs array

class H2MM_C.Printer#

Abstract base class for formatting progress display of optimizations.

abstract close()#

Method called at end of H2MM_C.EM_H2MM_C(), used to terminate the output, for instance adding a newline character to output.

abstract update(text)#

Method called to update output each iteration. Accepts output of print_funcion as only argument.

class H2MM_C.StdPrinter(buffer, keep=False, single_line=True)#
close()#

Method called at end of H2MM_C.EM_H2MM_C(), used to terminate the output, for instance adding a newline character to output.

update(text)#

Method called to update output each iteration. Accepts output of print_funcion as only argument.

class H2MM_C.IPyPrinter(buffer, keep=False)#

Class for prining to IPython.display.DisplayHandle object.

Parameters:
  • buffer (IPython.display.DisplayHandle) – DisplayHandle where output is displayed

  • keep (bool, optional) – Whether to keep old output or replace with new. The default is False

close()#

Method called at end of H2MM_C.EM_H2MM_C(), used to terminate the output, for instance adding a newline character to output.

update(text)#

Update display with text in text

Parameters:

text (str) – Text to display

Return type:

None

Constants#

The following are all constants that serve as a bitmask for h2mm_model.conv_code if & with a model and the given constant is non-zero, then the given condition is met

int H2MM_C.convcode.ll_computed

Model has had loglik computed against some data (either during optimization or singly

int H2MM_C.convcode.from_opt

Model was created during optimization

int H2MM_C.convcode.output

Model is best model during optimization, by some criterion marked by other flags

int H2MM_C.convcode.converged

Model has best logliklihood within converged_min threshold

int H2MM_C.convcode.max_iter

Optimization reached maximum number of iterations (only set for models furing terminal iteration)

int H2MM_C.convcode.max_time

Optimization reach maximum time (only set for models furing terminal iteration)

int H2MM_C.convcode.error

Error occured during optimization

int H2MM_C.convcode.post_opt

Model is either the model with poorer loglikelihood between current and old, or is new and loglik is not calculated

int H2MM_C.convcode.frozen

Model cannot be modified in place