Simulations Tutorial#
See also
This can also be viewed as a Jupyter Notebook
Download H2MM_Simulation_Tutorial.ipynb
Download data file here: sample_data_3det.txt
The other side of H2MM_C is a set of functions for producing simulated data.
The simulation functions of H2MM_C are starte with sim_
.
Within H2MM there are several “levels” of data:
times, treated as fixed points, part of normal data
state of each data point, these can be simulated from a
h2mm_model
and a set of timesindex of each data point- the observable of H2MM, these can be simulated with times, a
h2mm_model
and (optionally) states
Note
These simulates are based purely on hidden Markov modeling. They are NOT molecular, fluorescence or other sort of simulation. Use another package if you want to simulate something while explicitly handling such complexities in the simulation
First, let’s get our imports out of the way.
import os
import numpy as np
from matplotlib import pyplot as plt
import H2MM_C as hm
Basic Simulation#
The most likely simulation function you will use is sim_phtraj_from_times()
.
This function takes a h2mm_model
object and an array of arrival times (equivalent to 1 element of the times
list given to EM_H2MM_C()
/h2mm_model.optimize()
, and nearly all the other non-simulation functions/methods, and returns a set of states and indices (the latter equivalent to one element of the indexes
list.
So, first lets generate a random distribution of times:
time = np.cumsum(np.random.exponential(100, size=50).astype(int))
Then a model (we’ll make a rough approximation of teh model from the 3 detector setup from Basic Optimizations with H2MM_C
# define the arrays
prior = np.array([0.63, 0.03, 0.19, 0.15])
trans = np.array([[0.9997, 0.0001, 0.0001, 0.0001],
[2e-5, 1-3.2e-5, 1e-5, 2e-6],
[5e-6, 7e-6, 1-2.2e-5, 1e-5],
[3e-6, 3e-6, 4e-5, 1-4.6e-5]])
obs = np.array([[0.62, 0.37, 0.01],
[0.14, 0.29, 0.57],
[0.44, 0.09, 0.47],
[0.84, 0.08, 0.08]])
# make the model
sim_model = hm.h2mm_model(prior, trans, obs)
And finally, simulate the data to get an array of states and indices. These are simulations producing the resutls of Viterbi and the data point indices respectively.
simstate, simcolor = hm.sim_phtraj_from_times(sim_model, time)
simstate, simcolor
Recreating Data: Multiple Arrays#
Now, if we want to simulate a full data set, you will want to repeat the previous process many times, which means using a loop.
# initiate lists
simtimes = list()
simstates = list()
simcolors = list()
# loop to create each set
for _ in range(1000):
# generate new time array
simtime = np.cumsum(np.random.exponential(100, size=np.random.randint(50,150)).astype(int))
# simulate data
simstate, simcolor = hm.sim_phtraj_from_times(sim_model, simtime)
# append arrays to lists
simtimes.append(simtime)
simstates.append(simstate)
simcolors.append(simcolor)
Using Existing Times#
Another strategy, which can be a way to check the reasonableness of a model, is to use the actual times of an experiment, and the resulting h2mm_model
, and then compare the real data to the simulated data.
Note
Markov processes are inherently random, you will need to come up with some metric along which to compare. You will certainly not get the same indices in the simulated data and real data. The ratios of the indices should be similar however- use you knowledge about the system to figure out a legitimate way to compare.
So let’s load the times from the 3 detector data, and simulate the data from that:
##############################################################
# The code here is just for loading the data
# load the data
# color3 = list() # to save memory, we will not load this
times3 = list()
i = 0
with open('sample_data_3det.txt','r') as f:
for line in f:
if i % 2 == 0:
times3.append(np.array([int(x) for x in line.split()],dtype='Q'))
# No need to load the color, so comment it out
# else:
# color3.append(np.array([int(x) for x in line.split()],dtype='L'))
i += 1
# End of data loading segment
##############################################################
# initiate arrays
simstates3 = list()
simcolors3 = list()
# loop over each time array and simulate each set
for tm3 in times3:
# conduct the simulation
simstate3, simcolor3 = hm.sim_phtraj_from_times(sim_model, tm3)
# append arrays to lists
simstates3.append(simstate3)
simcolors3.append(simcolor3)
Simulating from Components#
It is also possible to simulate first states, and then with a separate function, simulate the times. If you only want the state path, you can just do the first step, and then save the memory, computational time, and code complexity, or maybe you already have the state path, so just to the second half.
This is done with the sim_sparsestatepath()
and sim_phtraj_from_state()
functions.
Note
sim_sparsestatepath()
uses only the h2mm_model.prior
and h2mm_model.trans
arrays to calculate the states, because indices are not calcualted, the h2mm_model.obs
array will have no influence on the results.
Conversely, sim_phtraj_from_state()
already has the states from input, and therefore the only important array for its simulation is the h2mm_model.obs
array.
Simulating a State Path
simtime = np.cumsum(np.random.exponential(100, size=50).astype(int))
statepath = hm.sim_sparsestatepath(sim_model, simtime)
statepath
Simulating Data from States
color = hm.sim_phtraj_from_state(sim_model, statepath)
color
Setting the Random Seed for Reproducibility#
Since these simulations are based on a random number generator, results will be different each time.
However, if repeatability is desired, the seed of the random number generator can be set with the keyword argument seed
.
The same syntax is used across all three simulation functions.
Note
The random seed is persitent, so it should only be set once. Each time the seed is set, the counter on the random number generator is reset.
simpath, simcolor = hm.sim_phtraj_from_times(sim_model, time, seed=100)
simpath, simcolor