Simulations Tutorial ==================== .. currentmodule:: H2MM_C .. seealso:: This can also be viewed as a Jupyter Notebook Download :download:`H2MM_Simulation_Tutorial.ipynb ` Download data file here: :download:`sample_data_3det.txt ` The other side of H2MM_C is a set of functions for producing simulated data. The simulation functions of H2MM_C are starte with :code:`sim_`. Within |H2MM| there are several "levels" of data: - **times**, treated as fixed points, part of normal data - **state** of each data point, these can be simulated from a :class:`h2mm_model` and a set of **times** - **index** of each data point- the observable of |H2MM|, these can be simulated with **times**, a :class:`h2mm_model` and (optionally) **states** .. note:: These simulates are based purely on hidden Markov modeling. They are **NOT** molecular, fluorescence or other sort of simulation. Use another package if you want to simulate something while explicitly handling such complexities in the simulation First, let's get our imports out of the way. .. code-block:: import os import numpy as np from matplotlib import pyplot as plt import H2MM_C as hm Basic Simulation ---------------- The most likely simulation function you will use is :func:`sim_phtraj_from_times`. This function takes a :class:`h2mm_model` object and an array of arrival times (equivalent to 1 element of the :code:`times` list given to :func:`EM_H2MM_C`/:meth:`h2mm_model.optimize`, and nearly all the other non-simulation functions/methods, and returns a set of **states** and **indices** (the latter equivalent to one element of the :code:`indexes` list. So, first lets generate a random distribution of times: .. code-block:: time = np.cumsum(np.random.exponential(100, size=50).astype(int)) Then a model (we'll make a rough approximation of teh model from the 3 detector setup from :doc:`FOptimizationTutorial` .. code-block:: # define the arrays prior = np.array([0.63, 0.03, 0.19, 0.15]) trans = np.array([[0.9997, 0.0001, 0.0001, 0.0001], [2e-5, 1-3.2e-5, 1e-5, 2e-6], [5e-6, 7e-6, 1-2.2e-5, 1e-5], [3e-6, 3e-6, 4e-5, 1-4.6e-5]]) obs = np.array([[0.62, 0.37, 0.01], [0.14, 0.29, 0.57], [0.44, 0.09, 0.47], [0.84, 0.08, 0.08]]) # make the model sim_model = hm.h2mm_model(prior, trans, obs) And finally, simulate the data to get an array of **states** and **indices**. These are simulations producing the resutls of *Viterbi* and the data point indices respectively. .. code-block:: simstate, simcolor = hm.sim_phtraj_from_times(sim_model, time) simstate, simcolor | (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | 0, 0, 0, 0, 0, 0], dtype=uint32), | array([1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, | 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, | 1, 1, 0, 1, 1, 1], dtype=uint32)) Recreating Data: Multiple Arrays -------------------------------- Now, if we want to simulate a full data set, you will want to repeat the previous process many times, which means using a loop. .. code-block:: # initiate lists simtimes = list() simstates = list() simcolors = list() # loop to create each set for _ in range(1000): # generate new time array simtime = np.cumsum(np.random.exponential(100, size=np.random.randint(50,150)).astype(int)) # simulate data simstate, simcolor = hm.sim_phtraj_from_times(sim_model, simtime) # append arrays to lists simtimes.append(simtime) simstates.append(simstate) simcolors.append(simcolor) Using Existing Times -------------------- Another strategy, which can be a way to check the reasonableness of a model, is to use the actual times of an experiment, and the resulting :class:`h2mm_model`, and then compare the real data to the simulated data. .. note:: Markov processes are inherently random, you will need to come up with some metric along which to compare. You will certainly not get the same indices in the simulated data and real data. The ratios of the indices should be similar however- use you knowledge about the system to figure out a legitimate way to compare. So let's load the times from the 3 detector data, and simulate the data from that: .. code-block:: ############################################################## # The code here is just for loading the data # load the data # color3 = list() # to save memory, we will not load this times3 = list() i = 0 with open('sample_data_3det.txt','r') as f: for line in f: if i % 2 == 0: times3.append(np.array([int(x) for x in line.split()],dtype='Q')) # No need to load the color, so comment it out # else: # color3.append(np.array([int(x) for x in line.split()],dtype='L')) i += 1 # End of data loading segment ############################################################## # initiate arrays simstates3 = list() simcolors3 = list() # loop over each time array and simulate each set for tm3 in times3: # conduct the simulation simstate3, simcolor3 = hm.sim_phtraj_from_times(sim_model, tm3) # append arrays to lists simstates3.append(simstate3) simcolors3.append(simcolor3) Simulating from Components -------------------------- It is also possible to simulate first **states**, and then with a separate function, simulate the times. If you only want the state path, you can just do the first step, and then save the memory, computational time, and code complexity, or maybe you already have the state path, so just to the second half. This is done with the :func:`sim_sparsestatepath` and :func:`sim_phtraj_from_state` functions. .. note:: :func:`sim_sparsestatepath` uses only the :attr:`h2mm_model.prior` and :attr:`h2mm_model.trans` arrays to calculate the states, because indices are not calcualted, the :attr:`h2mm_model.obs` array will have no influence on the results. Conversely, :func:`sim_phtraj_from_state` already has the states from input, and therefore the only important array for its simulation is the :attr:`h2mm_model.obs` array. **Simulating a State Path** .. code-block:: simtime = np.cumsum(np.random.exponential(100, size=50).astype(int)) statepath = hm.sim_sparsestatepath(sim_model, simtime) statepath | array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, | 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, | 2, 2, 2, 2, 2, 2], dtype=uint32) **Simulating Data from States** .. code-block:: color = hm.sim_phtraj_from_state(sim_model, statepath) color | array([0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 2, | 1, 2, 0, 0, 2, 1, 1, 2, 2, 2, 0, 0, 2, 2, 0, 2, 1, 0, 0, 2, 2, 0, | 0, 2, 2, 2, 0, 0], dtype=uint32) Setting the Random Seed for Reproducibility ------------------------------------------- Since these simulations are based on a random number generator, results will be different each time. However, if repeatability is desired, the seed of the random number generator can be set with the keyword argument :code:`seed`. The same syntax is used across all three simulation functions. .. note:: The random seed is persitent, so it should only be set once. Each time the seed is set, the counter on the random number generator is reset. .. code-block:: simpath, simcolor = hm.sim_phtraj_from_times(sim_model, time, seed=100) simpath, simcolor | (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | 0, 0, 0, 0, 0, 0], dtype=uint32), | array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, | 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, | 0, 0, 0, 0, 1, 1], dtype=uint32)) .. |H2MM| replace:: H\ :sup:`2`\ MM