The Annals of Applied Statistics
2019, Vol. 13, No. 3, 1397–1429
https://doi.org/10.1214/19-AOAS1240
© Institute of Mathematical Statistics, 2019
A HIDDEN MARKOV MODEL APPROACH TO CHARACTERIZING
THE PHOTO-SWITCHING BEHAVIOR OF FLUOROPHORES
BY LEKHA PATEL∗,1, NILS GUSTAFSSON†,2, YU LIN‡, RAIMUND OBER§,¶,3,
RICARDO HENRIQUES†,∥,4 AND EDWARD COHEN∗,5
Imperial College London∗, University College London†, European Molecular
Biology Laboratory Heidelberg‡, Texas A&M§, University of Southampton¶
and Francis Crick Institute∥
Fluorescing molecules (ﬂuorophores) that stochastically switch between
photon-emitting and dark states underpin some of the most celebrated ad-
vancements in super-resolution microscopy. While this stochastic behavior
has been heavily exploited, full characterization of the underlying models can
potentially drive forward further imaging methodologies. Under the assump-
tion that ﬂuorophores move between ﬂuorescing and dark states as continu-
ous time Markov processes, the goal is to use a sequence of images to select
a model and estimate the transition rates. We use a hidden Markov model to
relate the observed discrete time signal to the hidden continuous time pro-
cess. With imaging involving several repeat exposures of the ﬂuorophore, we
show the observed signal depends on both the current and past states of the
hidden process, producing emission probabilities that depend on the transi-
tion rate parameters to be estimated. To tackle this unusual coupling of the
transition and emission probabilities, we conceive transmission (transition-
emission) matrices that capture all dependencies of the model. We provide
a scheme of computing these matrices and adapt the forward-backward al-
gorithm to compute a likelihood which is readily optimized to provide rate
estimates. When confronted with several model proposals, combining this
procedure with the Bayesian Information Criterion provides accurate model
selection.
1. Introduction.
Fluorescence microscopy is a collection of techniques that
utilize the photon emitting properties of ﬂuorescing molecules, called ﬂuo-
rophores, to perform optical imaging, particularly in cell biology and biomedical
applications. Recent years have seen the advent of a number of super-resolution
Received January 2018; revised September 2018.
1Supported by the Imperial College London President’s Scholarship.
2Supported by the Engineering and Physical Sciences Research Council (EP/L504889/1).
3Supported by the National Institute of Health (R01 GM085575).
4Supported by the Medical Research Council (MR/K015826/1) and Biotechnology and Biological
Sciences Research Council (BB/M022374/1).
5Corresponding author.
Key words and phrases. Hidden Markov models, Markov processes, rate estimation, forward-
backward algorithm, super-resolution microscopy.
1397

----!@#$NewPage!@#$----
1398
L. PATEL ET AL.
microscopy techniques that have bypassed the classical resolution limits of ﬂu-
orescence microscopy (Huang, Bates and Zhuang (2009)). Speciﬁcally, single
molecule localization microscopy (SMLM) approaches, such as photoactivated
localization microscopy (PALM) (Betzig et al. (2006), Hess, Girirajan and Mason
(2006)) and stochastic optical reconstruction microscopy (STORM) (Rust, Bates
and Zhuang (2006), Heilemann et al. (2008)), rely on the ability exhibited by some
ﬂuorophores to photoswitch stochastically between a photon emitting On state
and nonemitting dark states (Van de Linde and Sauer (2014), Ha and Tinnefeld
(2012)). A specimen decorated with a spatially dense number of photon emitting
ﬂuorophores prevents accurate identiﬁcation of individual ﬂuorophores and res-
olution of structures smaller than the diffraction limit—see Figure 1(a). Using a
ﬂuorophore with stochastic photo-switching properties can provide an imaging en-
vironment where the majority of ﬂuorophores are in a dark state, while a sparse
number have stochastically switched into a transient photon emitting On state.
This results in the visible ﬂuorophores being sparse and well separated in space;
with the use of a high-performance camera the individual ﬂuorophores in the On
state can be identiﬁed and localized with nanometer scale precision by ﬁtting point
spread functions (Ober et al. (2015), Sage et al. (2015))—see Figure 1(b). Through
the acquisition across time of a large sequence of images (typically thousands)—
see Figure 1(a)—many more photo-switching ﬂuorophores can be isolated in time
and precisely localized in space. When aggregated and plotted, these localizations
provide an accurate and detailed map of ﬂuorophore positions giving rise to a
super-resolved image.
Lateral resolutions of 10–30 nanometers (nm) are possible in biological sam-
ples using SMLM, however, the resolution and image quality is strongly depen-
dent on the photo-switching properties of the ﬂuorophore used. While longer
On states provide a greater number of photons being recorded by the camera,
FIG. 1.
(a) Illustration of the SMLM imaging process. When all ﬂuorophores simultaneously stay in
a photon emitting On state, diffraction renders structures unresolvable. Stochastically photo-switch-
ing ﬂuorophores imaged over time across several frames give rise to a sequence of sparsely populated
images where each ﬂuorophore can be isolated and localized with high precision. Aggregating these
frames gives rise to a super-resolved image. Data from Sage et al. (2015). (b) Isolated ﬂuorophores
are localized by ﬁtting the point spread function (PSF) to the diffraction limited spot.

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1399
which in turn leads to greater precision in localizing spatially isolated ﬂuorophores
(Ober, Ram and Ward (2004), Ram, Ward and Ober (2013), Thompson, Larson
and Webb (2002), Rieger and Stallinga (2014)), the increased random occurrence
of ﬂuorophores simultaneously occupying the On state within a diffraction lim-
ited spot can lead to signiﬁcant imprecision, missed events and unwanted arti-
facts (Van de Linde et al. (2010), Nieuwenhuizen et al. (2015)). Thus, a careful
choice of ﬂuorophore and the environment used to promote photo-switching—
controlled by the buffer solution and illumination intensity—must be made for the
intended application. This is particularly important in live-cell applications when
considerations must be made for temporal resolution and reduced laser intensi-
ties.
To inform the choice of ﬂuorophore with its environment, and aid the develop-
ment of novel ﬂuorophores, accurate characterization of the photo-kinetic model
of the ﬂuorophore, together with estimation of photo-switching rates (the rate at
which ﬂuorophores transition between On and dark states) is required (Dempsey
et al. (2011), Lehmann et al. (2016)). Further, accurate knowledge of the photo-
switching characteristics could be employed to maximize resolutions achieved us-
ing advanced analytical methods, for example, 3B analysis (Cox et al. (2011))
and DeconSTORM (Mukamel, Babcock and Zhuang (2012)) and improve the
performance of molecular counting techniques (Rollins et al. (2014), Lee et al.
(2012)).
Several attempts have been made to model the kinetic schemes of ﬂuorophore
photo-switching and estimate the corresponding photo-switching rates. These ki-
netic schemes, as is common across single molecule biophysics, are characterized
by Markovian transitions between a ﬁnite set of discrete states and are therefore
ideally suited to being modeled as continuous time Markov processes. In Figure 2
are four models for photo-switching ﬂuorophores. The ﬁrst, Figure 2(a), depicts
a typical kinetic model, accompanied by the state-space diagram we will adopt
in this paper. This model contains a photon emitting On state 1 (involving rapid
transitions between excited state S1 and ground state S0 via the absorption and
emission of a photon), two temporary dark states 0 and 01 (the triplet state, T1
and the redox states, F+ and F−) and an absorption 2 (BF/BT0/BS0) which in this
application is known as the photobleached state. Then in Figures 2(b)–2(d) are
three further common state space models. Figure 2(b) portrays a photo-switching
model with a simple two state {On(1) Dark(0)} structure. Models of this type
are suitable for super-resolution methods including point accumulation for imag-
ing in nanoscale topography (PAINT) and DNA-PAINT (Jungmann et al. (2010),
Sharonov and Hochstrasser (2006)). Figure 2(c) depicts a model that incorporates
an absorbing state 2. This form of photo-switching followed by absorption de-
scribes a ﬁrst approximation to the behavior that occurs spontaneously in a num-
ber of organic ﬂuorophores and post-activation of photoactivatable proteins (Ha
and Tinnefeld (2012), Van de Linde and Sauer (2014), Vogelsang et al. (2010)).
Figure 2(d) considers a model in which three distinct dark states are hypothesized

----!@#$NewPage!@#$----
1400
L. PATEL ET AL.
FIG. 2.
Common models used to describe the continuous time photo-switching dynamics of a ﬂuo-
rophore with homogeneous transition rates. See text for details.
which in some cases is a necessary extension to model (c), for instance when very
rapid imaging is used (Lin et al. (2015)).
The challenge comes in selecting the correct model and estimating the transition
rates of the continuous time Markov process {X(t) : t ∈ R≥0} from an observed
discrete-time random process {Yn : n ∈ Z≥0}. Here, R≥0 and Z≥0 denote the non-
negative reals and integers, respectively. Typically, {Yn} is derived from a sequence
of images (frames) with Yn corresponding to the observed state of the molecule in
the nth frame. This is formed by an exposure of the continuous time process {X(t)}
over the time-interval [n�,(n + 1)�), where � is the frame length. Process {Yn}
can either be a sequence of photon ﬂuxes associated with that molecule for each
frame (Figure 3(a)), or a simple sequence of 1’s and 0’s indicating if the molecule
was detected in the frame or not (Figure 3(b)). In all cases, the observations are
subject to the effects of noise and instrument limitations. Essential to the subse-
quent analysis, therefore, is the ability to account for missed state transition events
due to noise and the temporal resolution of the data acquisition, as well as the de-
tection threshold used to determine the state of the system (Figure 3(c)). Similar
problems occur in other areas of biophysics where estimating transition rates of
an underlying continuous time Markov process must be inferred from an observed
discrete time signal. In particular, ion-channels have formed the focus of much

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1401
FIG. 3.
(a) A simulated intensity signal of a ﬂuorophore across time. Each measurement corre-
sponds to the intensity in a frame. 7500 frames were recorded over 250 seconds at a rate of 30 frames
per second. (b) Close up of the signal over the time window of 35 s to 55 s. In red is the observed
signal {Yn} indicating if the ﬂuorophore was detected in a particular frame. (c) A further close up of
the signal showing intensity read-outs for independent frames. The true, hidden photon emitting On
state of the molecule is also indicated, demonstrating how sub-frame length photon emitting events
can be missed due to noise or the temporal resolution of the data acquisition.
work (Colquhoun and Hawkes (1981), Qin and Li (2004), Rief et al. (2000)), in-
cluding methods that attempt to account for missed events (Qin, Auerbach and
Sachs (1996), Colquhoun, Hawkes and Srodzinski (1996), Hawkes, Jalali and
Colquhoun (1990), Hawkes, Jalali and Colquhoun (1992), Epstein, Calderhead,
Girolami and Sivilotti (2016)). However, the mechanism by which the observed
signal is obtained and processed from the raw signal is fundamentally different to
that of ﬂuorescence microscopy imaging.
Up until now, methods for estimating photo-switching transition rates in ﬂuo-
rescence microscopy are limited. The method in Lin et al. (2015) involves deﬁning
{Yn} to be the sequence of 1’s and 0’s and extracting the dwell times, namely

----!@#$NewPage!@#$----
1402
L. PATEL ET AL.
the durations when Yn is in the On state and when it is in its dark states. As-
suming these dwell times to be exponentially distributed (or equal in distribution
to a sum of exponentially distributed random variables in the case of multiple
dark states), maximum likelihood estimates of the transition times are then com-
puted. This method, termed here as exponential ﬁtting and given a detailed discus-
sion in Supplementary Materials Section S5 (Patel et al. (2019)), has two ﬂaws.
First, it does not correctly account for the effect of the imaging procedure on the
stochastic structure of the discrete time process. Second, it does not allow for the
absorbing (photobleached) state, which must be identiﬁed and accounted for by
truncation of the data to the last observed On state. This is especially trouble-
some as, to an observer, it is indistinguishable from a temporary dark state. This
method therefore results in the absence of estimates for the absorption rate and can
lead to signiﬁcantly biased estimates of the transition rates between On and dark
states.
Hidden Markov models (HMMs) are used widely across scientiﬁc and engineer-
ing disciplines to relate a sequence of observations, called emissions, to the states
of an unobserved (hidden) Markov process, the target of inference. Their use is par-
ticularly prevalent in image processing where the observations are a sequence of
images in time and it is commonly assumed that each image is dependent only on
the state of the hidden process at the time at which it is observed. Such an approach
has been proposed for this problem in Greenfeld et al. (2015), where the hidden
process is a discretized version of {X(t)}. Here, they let {Yn} be the sequence of
photon-ﬂuxes such that it is a standard (ﬁrst-order) HMM with Poisson emissions.
They then implement the Baum–Welch algorithm (Baum and Petrie (1966), Baum
and Eagon (1967), Baum and Sell (1968), Baum et al. (1970)) to estimate the tran-
sition probabilities of the discretized process and use an approximation to obtain
the transition rates of the continuous time process {X(t)}. In doing so, they ac-
knowledge that missed events will heavily bias rate estimates. Furthermore, their
model is also unable to deal with the absorbing state.
In this paper we provide two important contributions. First, in Section 2, by
considering a general model for {X(t)} that includes multiple dark states and
an absorbing state, we rigorously formulate the discrete time stochastic process
{Yn} that indicates whether a molecule is detected in each frame. A crucial part
of this formulation is recognizing that an image is not formed from an instanta-
neous sampling of the true state, as is usually assumed in image processing, but
is instead formed by exposing a camera sensor over a time interval of length �.
That is to say, Yn is not dependent on just X(n�), but instead on the integral
(i.e., all values) of {X(t)} over the interval [n�,(n + 1)�). Taking consideration
of noise and instrument sensitivity, we fully account for missed events and give
important results on the stochastic structure of {Yn}, including showing it is non-
Markovian.

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1403
The second contribution of this paper is to propose novel methodology for esti-
mating the state transition rates of {X(t)} under this correct treatment of the imag-
ing procedure. In Section 3, we develop an HMM for {Yn} where we ﬁrst imple-
ment a time discretization scheme on the hidden Markov process {X(t)}. Crucially,
as discussed above, correct understanding of the imaging procedure dictates two
key properties. First, Yn depends on both the current (end of frame) and previous
(beginning of frame) hidden states, X((n + 1)�) and X(n�), respectively. Sec-
ond, this HMM possesses emission probabilities that are dependent on the static
parameters of the hidden process state transitions that we ultimately wish to esti-
mate. This coupled behavior renders traditional expectation maximization (EM)-
type methods (e.g., Baum et al. (1970)) of parameter estimation inappropriate. We
therefore make the novel step of introducing what we call transmission (transition-
emission) matrices that incorporate this coupling between transition and emission
probabilities by capturing all the dependencies in the model. For a given photo-
switching kinetic model, we provide both a scheme for computing these matrices
and an adaptation of the forward-backward algorithm to compute the likelihood of
observations. Through numerical optimization we are able to compute maximum
likelihood estimates of the transition rate parameters for the continuous time pro-
cess {X(t)} that we wish to draw inference on. A bootstrapping scheme is also
presented for computing conﬁdence intervals. In the case of an unknown kinetic
model, we propose the use of the Bayesian information criterion (BIC) for select-
ing the best suited model from a set of proposals, thus also providing a powerful
tool for chemists wishing to infer the number of quantum states a particular ﬂuo-
rophore can exist in.
In Section 4, we provide extensive empirical analysis of the proposed method.
We begin this section with a simulation study that compares this new estimation
scheme to the exponential ﬁtting method on a range of photo-switching models,
demonstrating signiﬁcant improvements in both the bias and the variance of our
rate estimates. We further show the BIC performs accurate model selection when
presented with a range of model proposals. We then proceed with a discussion
on identiﬁability and consistency, providing empirical evidence that our model is
identiﬁable and estimators consistent under normal experimental conditions. We
further demonstrate that the bootstrapping scheme proposed in Section 3 for com-
puting conﬁdence intervals has approximately the correct coverage, and conclude
with a discussion on length biased sampling. In Section 5, the estimation scheme
presented in this paper is applied to the Alexa Fluor 647 data originally analyzed
by the exponential ﬁtting method in Lin et al. (2015), consistently selecting the
hypothesized three temporary off-state model (Figure 2(d)) and revealing clear
dependence between laser intensity and key transition rate parameters. In the ac-
companying Supplementary Materials, as well as key mathematical details, we
include an extensive simulations section where we report a signiﬁcant improve-
ment on rate estimates across a range of models and relevant experimental condi-
tions.

----!@#$NewPage!@#$----
1404
L. PATEL ET AL.
2. Modeling photo-switching behavior.
The true photo-switching behavior
of the ﬂuorophore is a continuous time stochastic phenomenon. However, an ex-
perimenter can only ever observe a discretized manifestation of this by imag-
ing the ﬂuorophore in a sequence of frames. These frames are regarded as a set
of sequential exposures of the ﬂuorophore and the observed discrete time sig-
nal indicates whether the ﬂuorophore has been observed in a particular frame.
It is the continuous time process on which we wish to draw inference based on
the observed discrete-time process indicating whether the ﬂuorophore was ob-
served in a frame. In this section we ﬁrst present the continuous time Markov
model of the true (hidden) photo-switching behavior, and then derive the ob-
served discrete time signal, together with key results on its statistical proper-
ties.
2.1. Continuous time.
We model the true photo-switching effect of the ﬂuo-
rophore as a continuous time Markov process, {X(t) : t ∈ R≥0} with discrete state
space SX.
In this paper we consider a general model for {X(t)} that can accommodate the
numerous mechanisms of photo-switching utilized in standard SMLM approaches
such as (F)PALM and (d)STORM. Speciﬁcally, this model consists of a pho-
ton emitting (On) state 1, m + 1 non photon emitting (dark/temporary off) states
00,01,...,0m, where m ∈ Z≥0, and a photobleached (absorbing/permanently off)
state 2. We denote the state 00 ≡ 0 for the m = 0 case of a single dark state. The
model, illustrated in Figure 4, allows for transitions from state 1 to the multi-
ple dark states (from a photochemical perspective, these can include triplet, redox
and quenched states). These dark states are typically accessed via the ﬁrst dark
state 0 (reached as a result of inter-system crossing of the excited S1 electron to
the triplet T1 state; see Figure 2(a)). Further dark states 0i+1, i = 0,...,m − 1,
are accessible by previous dark states 0i (by, e.g., the successive additions of
electrons forming radical anions (Van de Linde et al. (2010))). We allow the
On state 1 to be accessible by any dark state and we consider the most general
model in which the absorption state 2 is accessible from any combination of other
states (Ha and Tinnefeld (2012), Van de Linde and Sauer (2014), Vogelsang et al.
(2010)).
FIG. 4.
General m + 3 state (m ∈ Z≥0) model of a ﬂuorophore.

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1405
The state space of {X(t)} is SX = {0,01,...,0m,1,2} and is of cardinality
m + 3. We denote λij to be the transition rate between states i and j and μi to
be the absorbing rate from state i to 2, where i,j ∈ ¯SX := SX \ {2}.
The generator matrix for {X(t)} is therefore given as
(2.1)
G =
⎛
⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝
−σ0
λ001
0
0
0
0
...
λ01
μ0
0
−σ01
λ0102
0
0
0
...
λ011
μ01
0
0
−σ02
λ0203
0
0
...
λ021
μ02
...
...
...
...
...
...
...
...
...
0
0
0
0
0
...
−σ0m
λ0m1
μ0m
λ10
0
0
0
0
0
...
−σ1
μ1
0
0
0
0
0
0
...
0
0
⎞
⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠
,
where σ0m = λ0m1 + μ0m, σ1 = λ10 + μ1 and when m > 0, σ0i = λ0i0i+1 + λ0i1 +
μ0i, for i = 0,...,m − 1. For full characterization, we deﬁne its initial probability
mass νX := (ν0 ν01 ... ν0m ν1 ν2)⊤ with �
j∈SX νj = 1. Typically, all ﬂuorophores
receive an initial excitation to the photon-emitting state, thus the most commonly
occurring probability mass vector in practice has ν1 = 1. Moreover, although the
case when 0 < ν2 < 1 may give rise to ﬂuorophores that are never observed, for
inference purposes, we discard all traces containing no observations (1’s) of ﬂuo-
rophores and set ν2 = 0.
In this paper, we will refer to speciﬁc models (from that shown in Figure 4) in
the form Mm
A . Here, m is the number of multiple dark states beyond the 00 state
that is present in all models, and A ⊆ ¯SX denotes the set of states from which
the absorption state 2 is accessible. For the three classical models presented in
Figure 2: model (b) is M0
∅: the m = 0 case where μ0 = μ1 = 0, model (c) is
M0
{0,1}: the m = 0 case where μ0,μ1 > 0, and model (d) is M2
∅: the m = 2 case
where μ0 = μ01 = μ1 = 0.
2.2. Discrete time observation process.
Having presented the continuous time
model for the true photo-switching behavior, we will now introduce the model for
the observed discrete time process and show how the transition rates given in (2.1)
are not amenable to direct estimation.
The imaging procedure requires taking a series of successive frames. Frame n is
formed by an exposure over the time interval [n�,(n +1)�), where n ∈ Z≥0. The
constant � corresponds to the exposure time for a single frame, also known as the
frame length. We deﬁne the discrete time observed process {Yn : n ∈ Z≥0}, with
state space SY = {0,1}, as Yn = 1 if the ﬂuorophore (characterized by {X(t)}) is
observed in frame n and equal to 0 otherwise. For the ﬂuorophore to be observed in
the time interval [n�,(n+1)�) it must be in the On state 1 for a minimum time of
δ ∈ [0,�). The value of δ is unknown and is a result of background noise and the
imaging system’s limited sensitivity. We note that if {X(t)} exhibits multiple jumps
to state 1 within a frame, then a sufﬁcient condition for observing the ﬂuorophore

----!@#$NewPage!@#$----
1406
L. PATEL ET AL.
FIG. 5.
Illustration of how the states for Yn derive from the process X(t).
is that the total time spent in the On state exceeds δ. The δ = 0 case is the idealistic
scenario of a noiseless system and perfect sensitivity such that the ﬂuorophore
is detected if it enters the On state for any nonzero amount of time during the
exposure time �.
We formally deﬁne the observed process as
(2.2)
Yn = 1[δ,�)
�� (n+1)�
n�
1{1}
�X(t)
�dt
�
,
where 1A(·) is the indicator function such that 1A(x) = 1 if x ∈ A and is zero
otherwise. Figure 5 illustrates the manifestation of the discrete time signal {Yn}
from the continuous time signal {X(t)}.
2.3. The inference problem.
The inference problem is two-fold. First, for a
given model, the aim is to estimate the (4m + 8) unknown parameters
θ =
�
λ001
...
λ0m−10m
λ01
...
λ0m1
λ10
μ0
...
μ0m
μ1
νX
δ
�⊤
from a ﬁnite length realization of {Yn}. Crucially, it is shown in Supplementary
Materials Section S1 (Patel et al. (2019)) that {Yn} does not exhibit the Markov
property (of any order) for any m ∈ Z≥0, and for any � and δ such that � > δ ≥ 0.
The non-Markovianity excludes classical inference methods and motivates the use
of a Hidden Markov Model (HMM), with a likelihood based approach for estimat-
ing θ.
Beyond this, it may be the case that the true model (characterized by its num-
ber of dark states) is unknown and may need to be selected in addition to esti-
mating the unknown parameters. We tackle both of these problems in the next
section.
3. Characterizing photo-switching behavior.
Hidden Markov models, ﬁrst
presented in Baum and Petrie (1966), relate a sequence of observations to the
states of an unobserved or hidden Markov chain. The aim of building a hidden
Markov model (HMM) is to allow inference on the hidden process using these
observations. In its simplest form, an HMM assumes the propagation of both

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1407
state and observed sequences to be in discrete time, and a general ﬁrst order
HMM assumes that the observation process {Yn : n ∈ Z≥0} is related to a hid-
den ﬁrst order Markov Chain {Xn : n ∈ Z≥0} via an emission probability distribu-
tion B := (B)i,j = P(Yn = j|Xn = i), considered to be fully independent of the
static parameters that characterize the probability distribution of state transitions
P := (P)i,j = P(Xn = j|Xn−1 = i). In this setting we say B and P are decou-
pled. For a sequence y0,y1,...,yN of observations from this model, the Baum–
Welch re-estimation algorithm (Baum and Petrie (1966), Baum and Eagon (1967),
Baum and Sell (1968), Baum et al. (1970)) is an EM type method that utilizes
the forward-backward algorithm (see Levinson et al. (1983), for details) to opti-
mize the likelihood function and compute maximum likelihood estimates of νX
(the probability mass of X0), B and P . This in turn can be used to estimate param-
eters of the emission and state transition probabilities. When the hidden Markov
process and/or the observation process are of higher order, the HMM can be trans-
formed to a general ﬁrst order process (Du Preez (1998), Lee and Lee (2006),
Ching, Fung and Ng (2003)) and Baum–Welch can be applied in the usual way.
Readers are directed to MacDonald and Zucchini (1997) for a comprehensive re-
view.
While standard, ﬁrst (or higher) order HMMs have been extensively studied and
are most frequently used in applications, the rigid framework of being in discrete
time with emission probabilities decoupled from state transition probabilities is
not always suitable, as we will now show is the case for images formed by expo-
sures over a time interval. We take time to carefully formulate the HMM suitable
for this application, presenting what we call transmission (transmission-emission)
matrices to capture the dependencies in the model. We then go on to provide a
novel adaptation of the forward-backward algorithm to estimate θ, the unknown
parameters of our HMM in the case of a known state-space SX of the hidden pro-
cess. We will then show how the Bayesian information criterion (BIC) can be used
for model selection and parameter estimation in the case of an unknown state-
space.
3.1. Photo-switching hidden Markov model.
In this section we build an HMM
for our observation process {Yn}, which we call the Photo-switching hidden
Markov model (PSHMM). The ﬁrst immediate reason as to why the standard
set-up outlined above is inappropriate for this application is because the hidden
Markov process {X(t)} evolves in continuous time. To deal with this, we need to
adopt a time-discretization scheme for the hidden process. Analogously to Liu et
al. (2015), we state that {X(t)} propagates in �-separated discrete time steps ac-
cording to the transition probability matrix P� = eG�, where G is given in (2.1).
Our hidden process is therefore now represented by the discrete time Markov chain
{X(n�) : n ∈ Z≥0}.

----!@#$NewPage!@#$----
1408
L. PATEL ET AL.
FIG. 6.
Illustration of the HMM setup. (a) Traditional HMM where observed state is dependent on
current hidden. (b) Our HMM where observed state depends on both the current and past hidden
states.
When Yn depends solely on X(n�) (see Figure 6(a)) and the correspond-
ing emission matrix B is decoupled from P , a continuous time EM algorithm
(Liu et al. (2015)) analogous to the Baum–Welch can be used to estimate νX,
B and P . However, this will be inappropriate in our setting for two related rea-
sons. First, we have shown in Section 2, speciﬁcally equation (2.2), that expos-
ing images over a nonzero length of time means Yn depends on the full path
of {X(t)} within the interval [n�,(n + 1)�). To correctly deal with this it is
necessary to construct the HMM to consider dependence between Yn and both
X(n�) and X((n + 1)�) (see Figure 6(b)). Second, this construction of {Yn}
in (2.2) means the emission probabilities are clearly dependent on the static pa-
rameters θ of the hidden process and are therefore coupled with P . The EM
procedures highlighted above require decoupled B and P so that at each step
the quasi-likelihood can be optimized separately. To the best of our knowledge,
methods for dealing with coupled systems have not been dealt with in the lit-
erature. While an EM algorithm could be used for a coupled system, analytic
forms for the update steps would in general be intractable, leading to numeri-
cal maximization procedures at each iteration, thereby increasing computational
complexity. We will now formally characterize the PSHMM and provide a novel
method for estimating the unknown static parameters in the case of a coupled sys-
tem.
3.1.1. Formal characterization of the PSHMM.
Formally, we characterize our
PSHMM with:
1. An initial probability vector νX = (ν0 ν01 ... ν0m ν1 ν2)⊤ where νi :=
P(X(0) = i) for i ∈ SX;

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1409
2. Transmission matrices
(3.1)
B(l)
� =
⎛
⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝
b(l)
00,�
b(l)
001,�
...
b(l)
00m,�
b(l)
01,�
b(l)
02,�
b(l)
010,�
b(l)
0101,�
...
b(l)
010m,�
b(l)
011,�
b(l)
012,�
...
...
...
...
...
...
b(l)
0m0,�
b(l)
0m01,�
...
b(l)
0m0m,�
b(l)
0m1,�
b(l)
0m2,�
b(l)
10,�
b(l)
101,�
...
b(l)
10m,�
b(l)
11,�
b(l)
12,�
0
0
0
0
...
b(l)
22,�
⎞
⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠
,
where
b(l)
ij,� := P
�Yn = l,X
�(n + 1)�
� = j|X(n�) = i
�
= P
�Y0 = l,X(�) = j|X(0) = i
�
i,j ∈ SX,l ∈ SY ,
b(l)
22,� = 1{0}(l).
These transmission matrices combine the transition and emission probabilities,
thereby allowing us to account for a coupled system. The full mathematical formu-
lation for deriving their forms involves conditioning on the number of jumps from
all m + 1 dark states within the interval [0,�). From this, we use Laplace trans-
forms and the distributions of state holding times to iteratively compute matrices
that converge to our set of transmission matrices. A more detailed explanation of
this methodology, along with full derivations and expressions is presented in Sup-
plementary Materials Section S2 (Patel et al. (2019)). Furthermore, an algorithm
(Algorithm 1) detailing all computational steps to evaluate these matrices suitable
for any m ∈ Z≥0 (any number of multiple dark states) can be found in Supplemen-
tary Materials Section S3 (Patel et al. (2019)).
3.2. Estimating unknown parameters of the PSHMM.
We now provide an al-
gorithm for estimating the unknown parameters θ of the PSHMM, which utilizes
a suitable adaptation of the forward-backward dynamic programming algorithm
(Rabiner (1989)), making use of the transmission matrices in (3.1).
Let y = (y0 y1 ... yNF −1)⊤ be the sequence of observations across NF frames
for a single photo-switching ﬂuorophore. We deﬁne the forward-backward proba-
bilities as
αn,i = P
�Y0 = y0,...,Yn−1 = yn−1,X(n�) = i
�
n = 1,...,NF ,
βn,i = P
�Yn = yn,...,YNF −1 = yNF −1|X(n�) = i
�
n = 0,...,NF − 1.
For each such n, we deﬁne the forward-backward vectors as
αn =
�αn,0
...
αn,0m
αn,1
αn,2
�⊤ ,
βn =
�βn,0
...
βn,0m
βn,1
βn,2
�⊤ .

----!@#$NewPage!@#$----
1410
L. PATEL ET AL.
Using this notation, we can show that α⊤
n = α⊤
n−1B(yn−1)
�
for n = 2,...,NF and
α⊤
1 = ν⊤
XB(y0)
�
when n = 1. This yields the following recursion formula
α0 = νX,
α⊤
n = α⊤
n−1B(yn−1)
�
n = 1,...,NF ,
βNF = 1m+3,
βn = B(yn)
�
βn+1
n = 0,...,NF − 1,
(3.2)
where 1m+3 is the (m + 3) × 1 vector of ones. It now follows that the likeli-
hood of observation vector y given parameter vector θ is L(y;θ) = α⊤
n βn for
all n = 0,...,NF . In particular, we have L(y;θ) = α⊤
NF 1m+3, which can be read-
ily computed using the transmission matrices together with recursive computation
for α⊤
n as indicated in (3.2). In the situation where we have NE ≥ 1 independent
photo-switching ﬂuorophores, the log-likelihood is given by
ℓ(Y;θ) =
NE
�
k=1
log
�α⊤
NF ,k1m+3
�,
(3.3)
where Y = (y1 y2 ... yNE) and αNF ,k is the forward probability vector for emitter
k = 1,...,NE. Maximizing (3.3) with respect to θ can be done either through
numerically approximating derivatives or by using derivative-free optimization,
for example with the Nelder–Mead algorithm. A discussion on multimodality and
choosing a starting point for optimization can be found in Supplementary Materials
Section S6 (Patel et al. (2019)).
3.2.1. Accounting for false positive observations.
Occasionally, random peaks
in the background noise may exceed the threshold value used to determine a ﬂu-
orophore in the On state, resulting in a false positive identiﬁcation of the ﬂuo-
rophore. For experiments conducted over a large enough number of frames, this
false positive rate may become signiﬁcant in the observed process {Yn}.
Speciﬁcally if ω ∈ [0,1] denotes the probability of falsely observing a ﬂuo-
rophore, assumed independent of the general observation process, then we may
use the updated transmission matrices
B∗(0)
�
= (1 − ω)B(0)
� ,
B∗(1)
�
= B(1)
� + ωB(0)
� ,
in the evaluation of the log-likelihood ℓ(Y;θ∗) in (3.3). This would thus involve
estimating θ∗ = [θ⊤ ω]⊤ from the observations Y.
3.3. Bootstrapping.
When only one experiment is conducted to produce an
NF × NE dataset Y, a single prediction ˆθ is obtained. In this circumstance, a
bootstrapping scheme can be used to gain approximate conﬁdence intervals for
each component of θ.

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1411
In the same manner as is presented in Efron and Tibshirani (1993), we gen-
erate R (typically large) bootstrap datasets Y∗1,Y∗2,...,Y∗R each consisting of
re-sampled (with replacement) columns of Y. From each dataset, we acquire boot-
strap replicated parameter estimates ˆθ
∗1, ˆθ
∗2,..., ˆθ
∗R using the same PSHMM
maximum likelihood procedure used to obtain ˆθ. For 0.5 < p < 1, letting ˆθ∗
i,(p)
and ˆθ∗
i,(1−p) be the 100 · pth and 100 · (1 − p)th empirical percentiles of the ith
component of θ obtained from ˆθ∗1
i , ˆθ∗2
i ,..., ˆθ∗R
i
, a percentile bootstrap interval of
length 1 − 2p is given by (see Efron and Tibshirani (1993))
[ ˆθi,%lo, ˆθi,%up] ≈ [ ˆθ∗
i,(p), ˆθ∗
i,(1−p)].
3.4. Model selection.
To determine the unknown number of multiple m dark
states, we have chosen to use the Bayesian information criterion (BIC) to deter-
mine the most likely model given data Y. Although similar to the Akaike informa-
tion criterion (AIC), the BIC offers greater protection against over-ﬁtting when the
number of data-points is large, as is the case in this setting.
The BIC is deﬁned in our context as q log(NENF ) − 2ℓ(Y; ˆθ), where q denotes
the number of unknown parameters estimated in θ and ℓ(Y; ˆθ) denotes the max-
imized log-likelihood using the maximum likelihood estimates ˆθ. This criterion
can be computed among all suitable models, with the most preferred being chosen
as that with the smallest BIC value.
4. Simulations and analysis.
Simulation studies have been conducted to as-
sess and analyze the performance of the PSHMM method as detailed in Section 3.
To make the results applicable, we restrict ourselves to realistic parameter values
that typically occur in an experimental setting.
4.1. Performance on images and comparison with exponential ﬁtting.
To test
the performance of parameter estimation against the exponential ﬁtting method
of Lin et al. (2015), synthetic imaging data of photo-switching ﬂuorophores was
simulated. We begin our focus on the model M0
{1}, since for many practical appli-
cations the life-times of further dark (in particular the triplet (T1)) states is short
relative to �. As such, this dark state has been considered as part of the meta-stable
On state (Ha and Tinnefeld (2012), Vogelsang et al. (2010)). Since the predomi-
nant pathway to absorption is via the triplet state, a simpliﬁed model can be used
in which the absorption state 2 is only accessible from state 1. Given the popu-
larity of this model and its ease of analysis, we have derived the exact solution of
the corresponding transmission matrices (see Supplementary Materials Section S2
(Patel et al. (2019))).
Details on the image simulation method and how the discretized state sequences
were extracted can be found in Supplementary Materials Section S7 (Patel et al.
(2019)). Global parameter values are also noted. The extracted state sequences

----!@#$NewPage!@#$----
1412
L. PATEL ET AL.
were analyzed using an implementation of Algorithm 1 (see Supplementary Ma-
terial Section S3 (Patel et al. (2019))). The resulting parameter estimates were
compared to estimates derived from the exponential ﬁtting method, which was ex-
tended in this study to allow the calculation of absorption rates (see Supplementary
Materials Section S5 (Patel et al. (2019))).
Table 2 (see the Appendix) shows estimated parameter statistics over 16 im-
age simulation studies with 100 replicates (datasets) per study. Rate parameters
θ, were chosen to cover a range of observed behaviors of organic ﬂuorophores
and ﬂuorescent proteins (Dempsey et al. (2011)) with NE = 100 ﬂuorophores per
study. The number of frames NF in each study was adjusted to standardize the
average number of transitions predicted from θ. Scatterplots of these rate esti-
mates are presented in Figure 7. It is evident that the PSHHM yields estimates
with much lower bias and root mean squared errors (RMSE) when compared
to the exponential ﬁtting method, although they have a tendency to increase as
transition and absorption rates are increased. The reported empirical (2.5,97.5)
percentile intervals contain the true parameter values across all studies for the
PSHMM method and further highlight the bias in estimates obtained from expo-
nential ﬁtting.
For experimenters, the effect of imaging parameters on the performance of the
estimators is of particular interest and importance. Further simulation studies car-
ried out under model M0
{1} highlight the consistency in both accuracy and precision
of the PSHMM estimator across a range of different experimental conditions. Fig-
ure 8 compares the PSHMM with exponential ﬁtting rate estimates when we vary
the emission intensity of the ﬂuorophores (measured in the mean number of pho-
tons each emits when in the On state for time �). Further investigation of other
parameters, including the frame length (�), the number of frames (NF ) and the
FIG. 7.
Estimates of log10(λ01) and log10(λ10) simulated from model M0
{1} using both exponential
ﬁtting (a) and PSHMM ﬁtting (b) are plotted in dark yellow and pink respectively. True rates are
plotted as black crosses. Estimates for the absorption rate μ1, along with means, RMSEs and 2.5
and 97.5 empirical percentiles are given in Table 2 (see the Appendix).

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1413
FIG. 8.
Top Left: Examples of single simulated frames at the indicated number of photons per
frame (Supplementary Materials Section S7 (Patel et al. (2019))). Box-plots showing quantiles from
estimates of λ01, λ10 and μ1 from both exponential ﬁtting (black) and PSHMM ﬁtting (gray) are
plotted against increasing photons per frame. NF = 9872 for all simulations. True rates given by the
blue line.
detection threshold (proportional to δ) under this model, are provided in Supple-
mentary Materials Section S7 (Patel et al. (2019)). Across the full range of relevant
parameters tested, the PSHMM estimator performs signiﬁcantly better than expo-
nential ﬁtting.
To assess the accuracy of parameter estimates for the extended models m = 1
and m = 2 over fast, medium and slow switching scenarios, additional simulations
were performed by directly sampling the continuous time processes {X(t)} and
extracting the observation sequences Y as in (2.2), using ﬁxed values of θ. Results
from the analyses of these simulations are shown in Tables 3 and 4 in the Appendix.
While it is evident that the estimates for λ0m0m+1 and λ0m+11 incur greater bias as m
increases, the empirical (2.5,97.5) percentile intervals predominantly cover true
parameter values, albeit over a larger area due to the increase in the RMSEs. As
is seen when m = 0, the exponential ﬁtting method performs less well, yielding
much higher bias and RMSEs for particular parameter values.
4.2. Model selection.
Using these simulated datasets, the BIC was used in
model selection from the set of proposals {M0
{1},M1
{1},M2
{1}} (i.e., under the as-
sumption that the absorption state was known to only be accessible by the On
state). Applying model selection to the M0
{1} dataset used to estimate parameters
in Table 2 results in the true state model being chosen in all (100%) cases. 100

----!@#$NewPage!@#$----
1414
L. PATEL ET AL.
TABLE 1
Confusion table showing the empirical percentage of models predicted from three candidates: M0
{1},
M1
{1} and M2
{1} under simulation studies 16, 19 and 20 (see Tables 2, 3 and 4 in the Appendix), with
NE = 300, δ =
1
100 and � = 1
50 s. 100 datasets from each study were generated and the BIC used
to select the best ﬁtted model
Predicted →
M0
{1}
M1
{1}
M2
{1}
True ↓
M0
{1}
100
0
0
M1
{1}
0
98
2
M2
{1}
0
1
99
datasets, each for m = 0,1,2 were generated for studies 2,17 and 20 with � = 1
50s
and NE = 300. These results presented in Table 1 demonstrate the accuracy of se-
lecting the correct model.
4.3. Identiﬁability and consistency.
We give a detailed discussion on the inter-
related issues of identiﬁability and consistency in Supplementary Materials Sec-
tion S4 (Patel et al. (2019)). Here, we summarize these results. The parameter
values for “fast,” “medium” and “slow” transition rates can be found in Table S1,
Supplementary Materials Section S4 (Patel et al. (2019)).
The formal deﬁnition of model identiﬁability is there exists a bijective map-
ping from the parameter space to the space of distributions for the data; or equiv-
alently for any θ, there exists no other θ∗ in the parameter space such that
ℓ(Y;θ) = ℓ(Y;θ∗) almost everywhere. Obtaining such a result for the PSHMM
is highly nontrivial, if not intractable. We therefore explore the issue of identi-
ﬁability through empirical studies. To do so, we begin by exploring local iden-
tiﬁability. Parameter vector θ is said to be locally identiﬁable if there exists a
neighborhood around it such that there is no other θ∗ in that neighborhood for
which ℓ(Y;θ) = ℓ(Y;θ∗) almost everywhere (Little, Heidenreich and Li (2010)).
It can be shown that θ is locally identiﬁable if and only if the Fisher information
matrix I(θ) is nonsingular (Rothenberg (1971)), and this becomes our object of
interest. Again, due to the complexity of the model, the Fisher information ma-
trix can not be computed, however we can study local identiﬁability via the ob-
served Fisher information matrix (the Hessian matrix of the log-likelihood func-
tion) J (ˆθ) = −∇∇⊤ℓ(Y;θ)|θ=ˆθ evaluated at the maximum likelihood estimate ˆθ
of θ (Colquhoun, Hatton and Hawkes (2003)). This is averaged over several re-
peated simulations of data set Y. In particular, if J (ˆθ) is singular then ℓ(Y; ˆθ) is
not a unique (local) maximum, typically due to a ﬂat ridge in one or more direc-
tions, and θ is unidentiﬁable.

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1415
Summarizing the ﬁndings presented in Supplementary Materials Section S4
(Patel et al. (2019)), in the sets of parameters studied, the observed Fisher infor-
mation matrix is shown to be nonsingular in almost all circumstances, providing
strong empirical evidence of structural identiﬁability over the parameter values
most likely encountered in practice. It is obvious that the form of the observed
Fisher information matrix will be dependent on the relative values of the known
and unknown parameters. Broadly speaking, we ﬁnd that as δ/� → 1, that is, the
time a ﬂuorophore needs to stay in the On state to be detected tends towards the
frame rate, there appears to be a breakdown in identiﬁability. This is indicated by
the observed correlation matrix, as derived from the observed covariance matrix
J (ˆθ)−1, showing strong correlation between ˆδ and ˆλ01, and hence the existence
of a ridge on the likelihood surface, albeit still with curvature. While still tech-
nically identiﬁable, the correlation between these two parameter estimators indi-
cates they cannot be independently identiﬁed and pose difﬁculties to numerical
optimization methods (Jacquez and Greif (1985)). This effect is more pronounced
for faster transition rates due to an increased chance of transitions into the On
state not being observed. This trend as δ/� increases is expected as the model will
be completely unidentiﬁable when δ = � as no ﬂuorophore will be observed in
the On state. However, for low values of δ/� (<0.2) as is typically encountered
in practice, correlation between all elements of the estimator ˆθ is low, providing
clear empirical evidence of locally identiﬁability for all parameter values stud-
ied.
We also see a breakdown in identiﬁability if the frame length is too large in
comparison to the switching rates. For example, for a frame length of � = 1/30 s
(30 frames per second), under slow and medium switching, the models appear
to be identiﬁable, however under fast switching this breaks down. Increasing the
frame rate to �−1 = 100 s−1 appears, from empirical evidence, to be sufﬁcient
for making the model identiﬁable. This intuitively seems correct as large frame
lengths will fail to capture the nuanced photo-kinetic behavior of fast switching
ﬂuorophores.
To get a handle on global identiﬁability we assess whether the likelihood sur-
face has a unique global maximum or if there are further modes the optimization
method is determining to be the maximum. For the 3 state case (m = 0), an ap-
proximation scheme is used to ﬁnd a suitable starting point for the Nelder–Mead
simplex (see Supplementary Materials Section S6 (Patel et al. (2019))) and we en-
sure we locate the correct mode. In the 4 and 5 state cases (m = 1 and m = 2,
respectively), a stochastic search method is deployed that trials multiple starting
points. Unimodal histograms for the parameter estimates would indicate a single
global maximum, whereas a multi modal histogram would indicate further domi-
nant modes being located instead. Analysis presented in Supplementary Materials
Section S4 (Patel et al. (2019)) suggests a single global maximum in all cases with
one exception; there appears to be two different modes being located for the λ0102

----!@#$NewPage!@#$----
1416
L. PATEL ET AL.
parameter in the m = 2 model, although this disappears as the number of frames
NF increases.
Studies on consistency of the PSHMM maximum likelihood estimator cor-
roborate our ﬁndings on identiﬁability. A break down in identiﬁability will re-
sult in the estimator becoming inconsistent. In Supplementary Materials S4 (Patel
et al. (2019)), empirical evidence suggests that the mean squared error tends to
zero (as the number of emitters NE increases) when δ/� is within the normal
range (<0.2). However, as δ/� increases towards 1, consistency of the estimator
breaks down with it becoming more biased (although with a reduction in vari-
ance).
4.4. Length biased sampling.
Length biased sampling is an issue that could
appear in practice. In our setting, this would occur if there are ﬂuorophores whose
traces we do not include when estimating the unknown parameters due to them
never being observed. We note that for this to happen the ﬂuorophore would have
to never be in the On state for longer than δ in any frame. As soon as it is, we have
observed it and can populate its trace up to that point with zeros. In the 3 state
(m = 0) case, even under the extreme situation of δ = 0.9�, the probability of not
observing any given ﬂuorophore in all 10,000 frames is 1.3 × 10−2 for fast transi-
tion rates and 1.0 × 10−3 for slow transition rates. Under a more realistic setting
of δ = 0.1�, this reduces to 1.1×10−3 and 1.0×10−4 for fast and slow transition
rates, respectively. Therefore, for a single experiment with 100 ﬂuorophores, the
probability all of them are observed is 0.90 when under fast switching, climbing
to 0.99 for slow switching.
4.5. Bootstrap interval coverage.
Simulations were performed to verify the
coverage of the bootstrapped conﬁdence intervals presented in Section 3.3. As
an example, for the 3 state (m = 0) model under slow switching (see parame-
ter values in row 1 of Table S1 in Supplementary Materials Section S4 (Patel
et al. (2019))), the coverage of the 95% bootstrapped conﬁdence intervals are
92.8% (λ01), 94.6% (λ10) and 94.6% (μ1). These results were obtained from
500 simulated bootstrap intervals with each interval being formed from 100 es-
timates.
5. Application to Alexa Fluor 647 data.
In this section we apply the method
presented in this paper to the data analyzed with the exponential ﬁtting method
in Lin et al. (2015). The details, including experimental methods, can be found in
this reference. In summary, antibodies labeled with Alexa Fluor 647 at a ratio of
0.13–0.3 dye molecules per antibody were sparsely absorbed to a cover slip and
imaged by Total Internal Fluorescence microscopy to investigate the effect of eight
different laser intensities on the photo-switching behavior of Alexa Fluor 647. The
study contains 27 experiments with differing combinations of laser intensity and

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1417
frame rate. These values, together with the number of emitters detected and the
number of frames over which they were imaged is summarized in Table 5 of the
Appendix. For each photo-switchable molecule detected, the discrete observation
trace indicating if the emitter was observed in each frame, was extracted (see Sup-
plementary Materials Section S7 (Patel et al. (2019))). In all experiments, the true
model and its associated parameters were unknown. Subsequently, we will show
comparisons between estimates from both the PSHMM and modiﬁed exponential
ﬁtting methods.6
Initially, the BIC model selection criterion as outlined in Section 3.4 was
used to select the most suitable model for the data from the range of models
M0
∅, M0
{0}, M0
{1}, M1
∅, M1
{0}, M1
{01}, M1
{1}, M2
∅, M2
{0}, M2
{01}, M2
{02} and M2
{1},
with the model M2
{1} being selected on all (100%) occasions. This supports
Lin et al. (2015), who hypothesize this, with bleaching model and assume the
M2
∅ (without bleaching) model for rate estimates gained from exponential ﬁt-
ting. PSHMM maximum likelihood estimates were then computed for the esti-
mation of θ∗ = (λ001 λ01 λ0102 λ011 λ021 λ10 μ1 νX δ ω)⊤ for each of the 27
datasets. Associated with these, 95% bootstrapped intervals were computed us-
ing the method in Section 3.3 (R = 100 due to computational intensity). The re-
sults are shown in Figure 9. Comparisons with exponential ﬁtting bootstrapped
re-estimates (where νX, δ and ω are not estimable in this setting) are also
shown.
The results indicate that the exponential ﬁtting predicts a much slower switch-
ing scenario for the Alexa Fluor 647 antibodies, with many estimates shown to be
several orders of magnitude below those predicted by the PSHMM. This resembles
the conclusions reached from the results of the simulation studies as described in
Section 4 and are thought to occur as a result of the exponential ﬁtting method
missing events within frames. Incidentally, the higher variance of predictions from
both methods are shown to be reported at higher laser intensities, where faster
switching of ﬂuorophores is promoted. This is especially pronounced in some par-
ticularly large simulated conﬁdence sets for the exponential ﬁtting estimates of
λ0102 and λ021 (see Figure 9).
6. Summary and discussion.
Accurate measurement of ﬂuorophore photo-
switching rates has the potential to enable tailored design of single molecule
localization microscopy experiments to speciﬁc requirements. For example, one
may wish to select a ﬂuorophore and photo-switching environment to achieve the
rapid photo-switching at low laser intensities required for live-cell samples. Al-
ternatively, one may wish to promote long off times required for densely packed
6We modiﬁed the exponential ﬁtting algorithm used by Lin et al. (2015) to allow for the absorption
parameter (see Supplementary Materials Section S5 (Patel et al. (2019)) for more details).

----!@#$NewPage!@#$----
1418
L. PATEL ET AL.
FIG. 9.
Rate predictions and associated 95% bootstrap conﬁdence sets are shown for λ01, λ10,
μ1, λ011, λ0102, λ011 and λ021, for eight different values of laser intensity (see Table 5 in the Ap-
pendix for exact experimental parameters). Intervals in black correspond to those from exponential
ﬁtting and those in gray correspond to those gained from the PSHMM. Point estimates from each of
the 27 datasets are given by the diamond (PSHMM) or square (exponential). While in some cases
PSHMM produces much wider intervals, it also yields less biased estimates than the exponential
ﬁtting method; see text for discussion.
samples. Furthermore, precise estimates of photo-switching rates has the poten-
tial to advance data processing methods used in single molecule localization mi-
croscopy imaging, enabling more accurate image reconstruction and aiding proper
quantitative analysis. For this purpose, we have presented a method for char-
acterizing the photo-switching kinetics of ﬂuorophores from a sequence of im-
ages.
For the most general continuous time photo-switching model, we have carefully
deﬁned the observation process and linked it to the hidden continuous time photo-
switching behavior that we wish to infer upon. From this, we have formulated
a hidden Markov model to link the observations to the continuous time photo-
switching model. Importantly, images being formed by exposing the camera over
a nonzero time interval violates the traditional assumption placed on HMMs that

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1419
the emission and transition probabilities are decoupled. To tackle this, we have
introduced transmission matrices that capture all the dependencies present in the
model and provided a detailed scheme for computing them for any continuous
time photo-switching model. A modiﬁcation of the forward-backward algorithm
tailored for these coupled HMMs has been presented and numerical maximiza-
tion of the computed likelihood was performed to generate accurate estimates of
the true photo-switching rates. Through a detailed simulation study, these were
compared to estimates from an existing exponential ﬁtting method. We found
that our proposed method of parameter estimation is highly robust to a range of
simulated experimental parameters including low signal-to-noise ratios and fast
frame rates, frequently outperforming estimates from exponential ﬁtting. We fur-
ther found that by using the BIC, it is possible to perform accurate model selection
from a range of model proposals, thus providing a powerful new tool for chemists
wishing to infer the number of quantum states a particular ﬂuorophore can ex-
ist in. Empirical analysis provided strong evidence that the PSHMM is identiﬁ-
able and the estimators approximately consistent in the normal parameter range
encountered in experiments. Although, experimenters should ensure fast switch-
ing ﬂuorophores are imaged with greater frame rates to ensure model identiﬁabil-
ity.
The model selection and estimation method presented in this paper was then
applied to real data collected from the study of Lin et al. (2015). We provide strong
evidence of a relationship between laser intensity and photo-switching rates and
support the hypothesis that Alexa Fluor 647 has three off-states in addition to a
photo-bleached state.
While this paper focuses on single molecule localization microscopy, the type of
kinetic models discussed in this paper are unlikely to be unique to photo-switching
ﬂuorophores and super-resolution applications. Certainly, stochastic processes in
which the observed signal depends on both the current and past states of a hidden
process are likely to be a general feature of digital, discretized measurements of
stochastic signals. This is particularly true in image processing where images are
inevitably formed by exposing the camera’s sensor over a nonzero length time win-
dow. The coupling between the emission and transition probabilities of the HMM
is a direct consequence of this exposure time, and therefore it is likely that the
presented methodology for dealing with this will ﬁnd use in imaging applications
that are beyond the scope of this paper.
Further theoretical discussions and a comprehensive simulations and methods
section, can be found in the Supplementary Materials.
APPENDIX: RATE ESTIMATES
See Tables 2–5.

----!@#$NewPage!@#$----
1420
L. PATEL ET AL.
TABLE 2
Simulation results showing mean, bias, root mean squared error (RMSE) and the 2.5 and 97.5 empirical percentiles of the estimates of
θ = (λ01 λ10 μ1)⊤ under model M0
{1} for both the PSHMM and exponential ﬁtting (Exp) methods across 100 repeat experiments. � = 1
30 s, δ,ω > 0
(unknown), the number of emitters NE = 100. NF indicates the number of frames simulated. For both methods, log−log scatterplots of λ01 and λ10 are
shown in Figure 7
PSHMM
PSHMM
Exp
Exp
PSHMM
PSHMM
RMSE
(2.5%,97.5%)
Exp
Exp
RMSE
(2.5%,97.5%)
Study
NF
θ
Mean
Bias
(×10−2)
percentiles
Mean
Bias
(×10−2)
percentiles
1
16,800
0.32
0.32
0.00
0.97
(0.30,0.34)
0.29
−0.029
3.29
(0.26,0.32)
0.32
0.32
−0.001
0.76
(0.30,0.33)
0.31
−0.007
0.89
(0.30,0.32)
0.01
0.01
0.001
0.21
(0.01,0.02)
0.01
0.001
0.11
(0.01,0.01)
2
11,151
0.32
0.32
−0.001
0.66
(0.30,0.33)
0.30
−0.012
1.47
(0.29,0.32)
1
1.00
0.003
1.91
(0.96,1.04)
0.95
−0.053
5.95
(0.89,0.99)
0.03
0.03
0.001
0.35
(0.03,0.04)
0.03
0.001
0.33
(0.03,0.04)
3
9364
0.32
0.31
−0.004
0.68
(0.30,0.32)
0.30
−0.017
1.90
(0.28,0.32)
3.16
3.16
0.002
6.56
(3.05,3.28)
2.45
−0.712
77.85
(1.78,3.92)
0.11
0.11
0.001
1.02
(0.09,0.13)
0.09
−0.017
2.12
(0.06,0.12)
4
8799
0.32
0.30
−0.013
1.40
(0.29,0.31)
0.28
−0.032
3.35
(0.27,0.30)
10
9.96
−0.042
23.03
(9.52,10.42)
3.19
−6.809
690.87
(1.52,5.91)
0.33
0.35
0.014
3.79
(0.29,0.42)
0.12
−0.210
21.49
(0.06,0.25)
5
10,962
1
1.00
−0.002
1.86
(0.96,1.04)
0.90
−0.104
12.43
(0.74,1.01)
0.32
0.32
0.000
0.72
(0.30,0.33)
0.30
−0.013
1.47
(0.29,0.32)
0.01
0.01
0.000
0.10
(0.01,0.01)
0.01
0.001
0.11
(0.01,0.01)
6
5312
1
1.00
−0.004
1.81
(0.96,1.03)
0.95
−0.054
6.44
(0.87,1.01)
1
1.00
0.001
1.76
(0.96,1.04)
0.93
−0.066
6.88
(0.89,0.97)
0.03
0.03
0.001
0.29
(0.03,0.04)
0.03
0.001
0.28
(0.03,0.04)

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1421
TABLE 2
(Continued)
PSHMM
PSHMM
Exp
Exp
PSHMM
PSHMM
RMSE
(2.5%,97.5%)
Exp
Exp
RMSE
(2.5%,97.5%)
Study
NF
θ
Mean
Bias
(×10−2)
percentiles
Mean
Bias
(×10−2)
percentiles
7
3526
1
0.99
−0.015
2.32
(0.95,1.02)
0.95
−0.053
5.78
(0.91,0.99)
3.16
3.17
0.003
7.33
(3.01,3.30)
2.71
−0.451
46.16
(2.50,3.89)
0.11
0.11
0.001
1.06
(0.08,0.13)
0.10
−0.004
0.99
(0.08,0.12)
8
2961
1
0.97
−0.033
3.91
(0.93,1.00)
0.91
−0.095
9.75
(0.85,0.95)
10
9.94
0.003
27.21
(9.46,10.47)
5.02
0.003
504.34
(3.66,6.52)
0.33
0.35
0.017
3.94
(0.28,0.42)
0.20
−0.133
13.77
(0.14,0.27)
9
9116
3.16
3.15
−0.008
6.88
(3.04,3.29)
2.31
−0.855
95.19
(1.64,3.04)
0.32
0.31
−0.002
1.53
(0.28,0.34)
0.28
−0.037
3.74
(0.27,0.29)
0.01
0.01
0.000
0.11
(0.01,0.01)
0.01
0.001
0.13
(0.01,0.01)
10
3466
3.16
3.13
−0.035
7.47
(3.01,3.28)
2.83
−0.335
37.76
(2.49,3.06)
1
1.00
0.004
4.04
(0.90,1.07)
0.87
−0.129
13.00
(0.84,0.90)
0.03
0.03
0.001
0.37
(0.03,0.04)
0.04
0.002
0.36
(0.03,0.04)
11
1680
3.16
3.11
−0.052
9.49
(2.98,3.31)
2.92
−0.245
25.63
(2.75,3.08)
3.16
3.18
0.015
9.12
(2.99,3.37)
2.60
−0.567
56.96
(2.48,3.70)
0.11
0.11
0.002
1.21
(0.09,0.13)
0.11
−0.000
0.98
(0.09,0.13)
12
1115
3.16
3.03
−0.135
14.73
(2.92,3.15)
2.79
−0.377
38.19
(2.66,3.92)
10
9.99
−0.008
24.86
(9.54,10.48)
6.35
−3.648
365.97
(5.72,6.93)
0.33
0.35
0.015
3.92
(0.29,0.44)
0.27
−0.061
6.79
(0.22,0.33)

----!@#$NewPage!@#$----
1422
L. PATEL ET AL.
TABLE 2
(Continued)
PSHMM
PSHMM
Exp
Exp
PSHMM
PSHMM
RMSE
(2.5%,97.5%)
Exp
Exp
RMSE
(2.5%,97.5%)
Study
NF
θ
Mean
Bias
(×10−2)
percentiles
Mean
Bias
(×10−2)
percentiles
13
8532
10
9.93
−0.069
25.64
(9.47,10.47)
5.33
−4.666
506.70
(1.98,8.60)
0.32
0.32
0.000
4.42
(0.24,0.37)
0.22
−0.099
9.95
(0.21,0.23)
0.01
0.01
0.000
0.09
(0.01,0.01)
0.01
0.001
0.10
(0.01,0.01)
14
2882
10
9.86
−0.142
29.52
(9.44,10.39)
7.75
−2.246
241.85
(5.53,8.71)
1
1.03
0.026
10.53
(0.78,1.16)
0.68
−0.323
32.37
(0.64,0.71)
0.03
0.03
0.001
0.36
(0.03,0.04)
0.04
0.001
0.37
(0.03,0.04)
15
1096
10
9.73
−0.266
38.40
(9.22,10.34)
8.19
−1.814
184.84
(7.38,8.66)
3.16
3.21
0.049
20.73
(2.45,3.47)
2.05
−1.108
110.97
(1.97,3.16)
0.11
0.11
0.004
1.22
(0.09,0.13)
0.11
0.004
1.04
(0.09,0.13)
16
531
10
9.50
−0.501
55.72
(9.10,9.96)
7.93
−2.072
207.90
(7.55,8.22)
10
9.91
−0.095
54.47
(9.02,10.88)
5.63
−4.368
436.96
(5.40,5.89)
0.33
0.34
0.007
4.51
(0.26,0.43)
0.30
−0.029
4.06
(0.26,0.36)

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1423
TABLE 3
Simulation results showing mean, bias, root mean squared error (RMSE) and the 2.5 and 97.5 empirical percentiles of the estimates of
θ = (λ001 λ01 λ 011 λ10 μ1)⊤ under model M1
{1} for both the PSHMM and exponential ﬁtting (Exp) methods across 100 repeat experiments. � = 1
30 s,
δ = 0.01,ω = 0 (unknown), the number of emitters NE = 100. NF indicates the number of frames simulated
PSHMM
PSHMM
Exp
Exp
PSHMM
PSHMM
RMSE
(2.5%,97.5%)
Exp
Exp
RMSE
(2.5%,97.5%)
Study
NF
θ
Mean
Bias
(×10−2)
percentiles
Mean
Bias
(×10−2)
percentiles
17
11,151
0.15
0.15
0.002
1.69
(0.12,0.19)
0.15
−0.004
1.66
(0.11,0.19)
0.3
0.30
0.001
0.85
(0.28,0.32)
0.30
−0.002
0.84
(0.28,0.31)
0.1
0.10
0.000
0.43
(0.09,0.11)
0.10
0.002
0.45
(0.10,0.11)
0.80
0.80
−0.001
1.28
(0.78,0.82)
0.76
−0.039
4.12
(0.74,0.79)
0.01
0.01
0.000
0.15
(0.01,0.01)
0.02
0.010
0.97
(0.02,0.02)
18
9364
0.35
0.36
0.005
5.44
(0.24,0.43)
0.33
−0.022
5.32
(0.24,0.43)
1
1.00
0.003
3.68
(0.94,1.07)
0.95
−0.049
5.83
(0.90,1.01)
0.3
0.30
−0.002
2.01
(0.26,0.34)
0.29
−0.008
2.12
(0.25,0.33)
2.30
2.30
−0.003
5.01
(2.21,2.39)
2.04
−0.262
26.48
(1.95,2.11)
0.10
0.10
0.002
0.98
(0.09,0.12)
0.10
−0.005
1.03
(0.08,0.11)
19
7000
2
2.03
0.033
18.14
(1.75,2.45)
2.16
0.156
21.25
(1.89,3.50)
10
9.78
−0.218
54.49
(8.55,10.53)
6.94
−3.061
306.69
(6.59,7.34)
0.7
0.71
0.011
4.88
(0.64,0.83)
0.67
−0.031
4.85
(0.60,0.76)
10
10.00
0.002
63.62
(9.22,11.65)
4.97
−5.030
503.14
(4.75,5.17)
0.33
0.34
0.005
7.29
(0.20,0.56)
0.27
−0.068
7.30
(0.22,0.32)

----!@#$NewPage!@#$----
1424
L. PATEL ET AL.
TABLE 4
Simulation results showing mean, bias, root mean squared error (RMSE) and the 2.5 and 97.5 empirical percentiles of the estimates of
θ = (λ001 λ01 λ0102 λ011 λ021 λ10 μ1)⊤ under model M2
{1} for both the PSHMM and exponential ﬁtting (Exp) methods across 100 repeat experiments.
� = 1
30 s, δ = 0.01,ω = 0 (unknown), the number of emitters NE = 100. NF indicates the number of frames simulated
PSHMM
PSHMM
Exp
Exp
PSHMM
PSHMM
RMSE
(2.5%,97.5%)
Exp
Exp
RMSE
(2.5%,97.5%)
Study
NF
θ
Mean
Bias
(×10−2)
percentiles
Mean
Bias
(×10−2)
percentiles
20
7000
2
2.03
0.032
3.14
(1.75,2.37)
2.05
0.054
2.12
(1.79,3.31)
10
9.85
−0.153
13.42
(9.20,10.49)
7.04
−2.958
878.82
(6.68,7.48)
0.2
0.21
0.009
0.10
(0.16,0.27)
0.18
−0.024
0.11
(0.13,0.21)
0.7
0.69
−0.012
0.37
(0.59,0.83)
0.66
−0.037
0.35
(0.57,0.75)
0.01
0.01
−0.001
0.00
(0.01,0.01)
0.01
0.005
0.02
(0.01,0.02)
10
9.63
−0.368
35.57
(8.73,10.53)
4.91
−5.087
2588.85
(4.67,5.16)
0.33
0.32
−0.009
0.32
(0.24,0.45)
0.32
−0.013
0.12
(0.26,0.38)

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1425
TABLE 5
A description of the Alexa Fluor 647 datasets with reference to the laser intensities in kW/cm2 and frames sampled per second (or �−1) measured in
s−1 used to characterize each of the 27 experiments. The NF × NE size of each dataset is also included
Dataset
Laser intensity
�−1
NE
NF
Dataset
Laser intensity
�−1
NE
NF
Dataset
Laser intensity
�−1
NE
NF
1
1.0
200
275
49,796
10
16
200
292
39,703
19
62
800
443
29,107
2
1.9
200
259
49,533
11
16
800
305
29,074
20
62
800
425
29,551
3
3.9
200
335
49,815
12
16
800
290
29,145
21
62
800
425
29,426
4
3.9
200
393
39,758
13
31
800
617
29,059
22
62
800
398
28,989
5
7.8
200
340
39,721
14
31
800
534
29,778
23
97
800
454
29,191
6
7.8
800
244
29,418
15
31
800
515
29,179
24
97
800
440
29,198
7
7.8
800
230
29,257
16
31
800
493
29,400
25
97
800
436
29,270
8
7.8
800
230
29,438
17
31
800
456
29,071
26
97
800
422
29,295
9
16
800
437
29,467
18
62
800
554
29,327
27
97
800
414
29,218

----!@#$NewPage!@#$----
1426
L. PATEL ET AL.
Acknowledgments.
The authors would like to thank Prof Joerg Bewersdorf,
Department of Cell Biology, Yale University, for his help in making the Alexa
Fluor 647 data available, and are most grateful to the Editor, Associate Editor and
referees for their insightful comments and discussion.
SUPPLEMENTARY MATERIAL
Supplementary materials (DOI: 10.1214/19-AOAS1240SUPPA; .pdf). The
Supplementary Materials supporting this paper contains detailed proofs and
derivations regarding our method, discussions on its implementation and further
simulation studies, including exact details on the image analysis.
Code and data (DOI: 10.1214/19-AOAS1240SUPPB; .zip). MATLAB code
and imaging data sets used for the algorithms presented in this paper, can be found
at https://github.com/eakcohen/photoswitching.
REFERENCES
BAUM, L. E. and EAGON, J. A. (1967). An inequality with applications to statistical estimation for
probabilistic functions of Markov processes and to a model for ecology. Bull. Amer. Math. Soc.
73 360–363. MR0210217
BAUM, L. E. and PETRIE, T. (1966). Statistical inference for probabilistic functions of ﬁnite state
Markov chains. Ann. Math. Stat. 37 1554–1563. MR0202264
BAUM, L. E. and SELL, G. R. (1968). Growth transformations for functions on manifolds. Paciﬁc
J. Math. 27 211–227. MR0234494
BAUM, L. E., PETRIE, T., SOULES, G. and WEISS, N. (1970). A maximization technique occurring
in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 41 164–
171. MR0287613
BETZIG, E., PATTERSON, G. H., SOUGRAT, R., LINDWASSER, O. W., OLENYCH, S., BONIFA-
CINO, J. S., DAVIDSON, M. W., LIPPINCOTT-SCHWARTZ, J. and HESS, H. F. (2006). Imaging
intracellular ﬂuorescent proteins at nanometer resolution. Science 313 1642–1645.
CHING, W., FUNG, E. and NG, M. (2003). Higher-order hidden Markov models with applications to
DNA sequences. In Intelligent Data Engineering and Automated Learning. IDEAL 2003 (J. Liu,
Y. Cheung and H. Yin, eds.). Lecture Notes in Computer Science 2690 535–539. Springer, Berlin,
Heidelberg.
COLQUHOUN, D., HATTON, C. J. and HAWKES, A. G. (2003). The quality of maximum likelihood
estimates of ion channel rate constants. J. Physiol. 547 699–728.
COLQUHOUN, D. and HAWKES, A. G. (1981). On the stochastic properties of single ion channels.
Proc. R. Soc. Lond., B Biol. Sci. 211 205–235.
COLQUHOUN, D., HAWKES, A. G. and SRODZINSKI, K. (1996). Joint distributions of apparent
open and shut times of single-ion channels and maximum likelihood ﬁtting of mechanisms. Phi-
los. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 354 2555–2590.
COX, S., ROSTEN, E., MONYPENNY, J., JOVANOVIC-TALISMAN, T., BURNETTE, D. T.,
LIPPINCOTT-SCHWARTZ, J., JONES, G. E. and HEINTZMANN, R. (2011). Bayesian localiza-
tion microscopy reveals nanoscale podosome dynamics. Nat. Methods 9 195–200.
DEMPSEY, G. T., VAUGHAN, J. C., CHEN, K. H., BATES, M. and ZHUANG, X. (2011). Evalua-
tion of ﬂuorophores for optimal performance in localization-based super-resolution imaging. Nat.
Methods 8 1027–1036.

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1427
DU PREEZ, J. (1998). Efﬁcient training of high-order hidden Markov models using ﬁrst-order rep-
resentations. Comput. Speech Lang. 12 23–39.
EFRON, B. and TIBSHIRANI, R. J. (1993). An Introduction to the Bootstrap. Monographs on Statis-
tics and Applied Probability 57. CRC Press, New York. MR1270903
EPSTEIN, M., CALDERHEAD, B., GIROLAMI, M. A. and SIVILOTTI, L. (2016). Bayesian statistical
inference in ion-channel models with exact missed event correction. Biophys. J. 111 333–348.
GREENFELD, M., PAVLICHIN, D. S., MABUCHI, H. and HERSCHLAG, D. (2015). Single molecule
analysis research tool (SMART): An integrated approach for analysing single molecule data.
PLoS ONE 7 e30024.
HA, T. and TINNEFELD, P. (2012). Photophysics of ﬂuorescent probes for single-molecule bio-
physics and super-resolution imaging. Annu. Rev. Phys. Chem. 63 595–617.
HAWKES, A. G., JALALI, A. and COLQUHOUN, D. (1990). The distributions of the apparent open
times and shut times in a single channel record when brief events cannot be detected. Philos.
Trans. R. Soc. Lond. Ser. A 332 511–538. MR1084721
HAWKES, A. G., JALALI, A. and COLQUHOUN, D. (1992). Asymptotic distributions of apparent
open times and shut times in a single channel record allowing for the omission of brief events.
Philos. Trans. R. Soc. Lond. B, Biol. Sci. 337 383–404.
HEILEMANN, M., VAN DE LINDE, S., SCHÜTTPELZ, M., KASPER, R., SEEFELDT, B., MUKHER-
JEE, A., TINNEFELD, P. and SAUER, M. (2008). Subdiffraction—resolution ﬂuorescence imag-
ing with conventional ﬂuorescent probes. Angew. Chem. Int. Ed. 47 6172–6176.
HESS, S. T., GIRIRAJAN, T. P. K. and MASON, M. D. (2006). Ultra-high resolution imaging by
ﬂuorescence photoactivation localization microscopy. Biophys. J. 91 4258–4272.
HUANG, B., BATES, M. and ZHUANG, X. (2009). Super-resolution ﬂuorescence microscopy. Annu.
Rev. Biochem. 78 993–1016.
JACQUEZ, J. A. and GREIF, P. (1985). Numerical parameter identiﬁability and estimability: Inte-
grating identiﬁability, estimability, and optimal sampling design. Math. Biosci. 77 201–227.
JUNGMANN, R., STEINHAUER, C., SCHEIBLE, M., KUZYK, A., TINNEFELD, P. and SIMMEL,
F. C. (2010). Single-molecule kinetics and super-resolution microscopy by ﬂuorescence imaging
of transient binding on DNA origami. Nano Lett. 10 4756–4761.
LEE, L.-M. and LEE, J.-C. (2006). A study on high-order hidden Markov models and applications
to speech recognition. In Advances in Applied Artiﬁcial Intelligence. IEA/AIE 2006 (M. Ali and R.
Dapoigny, eds.). Lecture Notes in Computer Science 4031 682–690. Springer, Berlin, Heidelberg.
LEE, S. H., SHIN, J. Y., LEE, A. and BUSTAMANTE, C. (2012). Counting single photoactivatable
ﬂuorescent molecules by photoactivated localization microscopy (PALM). Proc. Natl. Acad. Sci.
USA 109 17436–17441.
LEHMANN, M., LICHTNER, G., KLENZ, H. and SCHMORANZER, J. (2016). Novel organic dyes
for multicolor localization-based super-resolution microscopy. J. Biophotonics 9 161–170.
LEVINSON, S. E., RABINER, L. R. and SONDHI, M. M. (1983). An introduction to the application
of the theory of probabilistic functions of a Markov process to automatic speech recognition. Bell
Syst. Tech. J. 62 1035–1074. MR0702893
LIN, Y., LONG, J. J., HUANG, F., DUIM, W. C., KIRSCHBAUM, S., ZHANG, Y., SCHROEDER,
L. K., REBANE, A. A., VELASCO, M. G. M. et al. (2015). Quantifying and optimizing single-
molecule switching nanoscopy at high speeds. PLoS ONE 10 e0128135.
LITTLE, M. P., HEIDENREICH, W. F. and LI, G. (2010). Quantifying and optimizing single-
molecule switching nanoscopy at high speeds. PLoS ONE 5 e8915.
LIU, Y.-Y., LI, S., LI, F., SONG, L. and REHG, J. (2015). Efﬁcient learning of continuous-time
hidden Markov models for disease progression. In NIPS Proceedings 3600–3608.
MACDONALD, I. L. and ZUCCHINI, W. (1997). Hidden Markov and Other Models for Discrete-
Valued Time Series. Monographs on Statistics and Applied Probability 70. CRC Press, London.
MR1692202

----!@#$NewPage!@#$----
1428
L. PATEL ET AL.
MUKAMEL, E., BABCOCK, H. and ZHUANG, X. (2012). Statistical deconvolution for superresolu-
tion ﬂuorescence microscopy. Biophys. J. 102 2391–2400.
NIEUWENHUIZEN, R. P. J., BATES, M., SZYMBORSKA, A., LIDKE, K. A., RIEGER, B. and
STALLINGA, S. (2015). Quantitative localization microscopy: Effects of photophysics and la-
beling stoichiometry. PLoS ONE 10 e0127989.
OBER, R. J., RAM, S. and WARD, E. S. (2004). Localization accuracy in single-molecule mi-
croscopy. Biophys. J. 87 1185–1200.
OBER, R., TAHMASBI, A., RAM, S., LIN, Z. and WARD, E. (2015). Quantitative aspects of single-
molecule microscopy: Information-theoretic analysis of single-molecule data. IEEE Signal Pro-
cess. Mag. 32 58–69.
PATEL, L., GUSTAFSSON, N., LIN, Y., OBER, R., HENRIQUES, R. and COHEN, E. (2019). Sup-
plement to “A hidden Markov model approach to characterizing the photo-switching behavior of
ﬂuorophores.” DOI:10.1214/19-AOAS1240SUPPA, DOI:10.1214/19-AOAS1240SUPPB.
QIN, F., AUERBACH, A. and SACHS, F. (1996). Estimating single-channel kinetic parameters from
idealized patch-clamp data containing missed events. Biophys. J. 70 264–280.
QIN, F. and LI, L. (2004). Model-based ﬁtting of single-channel dwell-time distributions. Biophys.
J. 87 1657–1671.
RABINER, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech
recognition. Proc. IEEE 77 257–286.
RAM, S., WARD, E. S. and OBER, R. J. (2013). A stochastic analysis of distance estimation ap-
proaches in single molecule microscopy: Quantifying the resolution limits of photon-limited
imaging systems. Multidimens. Syst. Signal Process. 24 503–542. MR3041619
RIEF, M., ROCK, R. S., MEHTA, A. D., MOOSEKER, M. S., CHENEY, R. E. and SPUDICH, J. A.
(2000). Myosin-V stepping kinetics: A molecular model for processivity. Proc. Natl. Acad. Sci.
USA 97 9482–9486.
RIEGER, B. and STALLINGA, S. (2014). The lateral and axial localization uncertainty in super-
resolution light microscopy. ChemPhysChem 15 664–670.
ROLLINS, G. C., SHIN, J. Y., BUSTAMANTE, C. and PRESSÉ, S. (2014). Stochastic approach to
the molecular counting problem in superresolution microscopy. Proc. Natl. Acad. Sci. USA 112
110–118.
ROTHENBERG, T. J. (1971). Identiﬁcation in parametric models. Econometrica 39 577–591.
MR0436944
RUST, M. J., BATES, M. and ZHUANG, X. (2006). Sub-diffraction-limit imaging by stochastic op-
tical reconstruction microscopy (STORM). Nat. Methods 793–795.
SAGE, D., KIRSHNER, H., PENGO, T., STUURMAN, N., MIN, J., MANLEY, S. and USHER, M.
(2015). Quantitative evaluation of software packages for single-molecule localization microscopy.
Nat. Methods 12 717–724.
SHARONOV, A. and HOCHSTRASSER, R. M. (2006). Wide-ﬁeld subdiffraction imaging by accumu-
lated binding of diffusing probes. Proc. Natl. Acad. Sci. USA 103 18911–18916.
THOMPSON, R. E., LARSON, D. R. and WEBB, W. W. (2002). Precise nanometer localization
analysis for individual ﬂuorescent probes. Biophys. J. 82 2775–2783.
VAN DE LINDE, S. and SAUER, M. (2014). How to switch a ﬂuorophore: From undesired blinking
to controlled photoswitching. Chem. Soc. Rev. 43 1076–1087.
VAN DE LINDE, S., WOLTER, S., HEILEMANN, M. and SAUER, M. (2010). The effect of photo-
switching kinetics and labeling densities on super-resolution ﬂuorescence imaging. J. Biotechnol.
149 260–266.
VOGELSANG, J., STEINHAUER, C., FORTHMANN, C., STEIN, I. H., PERSON-SKEGRO, B.,
CORDES, T. and TINNEFELD, P. (2010). Make them blink: Probes for super-resolution mi-
croscopy. ChemPhysChem 11 2475–2490.

----!@#$NewPage!@#$----
CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES
1429
L. PATEL
E. COHEN
DEPARTMENT OF MATHEMATICS
IMPERIAL COLLEGE LONDON
LONDON, SW7 2AZ
UNITED KINGDOM
E-MAIL: lp1611@ic.ac.uk
e.cohen@imperial.ac.uk
N. GUSTAFSSON
MRC LABORATORY FOR
MOLECULAR CELL BIOLOGY
UNIVERSITY COLLEGE LONDON
LONDON, WC1E 6BT
UNITED KINGDOM
E-MAIL: nils.gustafsson.13@alumni.ucl.ac.uk
Y. LIN
CELL BIOLOGY AND BIOPHYSICS
EUROPEAN MOLECULAR BIOLOGY LABORATORY
69117 HEIDELBERG
GERMANY
E-MAIL: yu.lin@embl.de
R. OBER
CENTRE FOR CANCER IMMUNOLOGY
FACULTY OF MEDICINE
UNIVERSITY OF SOUTHAMPTON
SOUTHAMPTON
UNITED KINGDOM
AND
DEPARTMENT OF BIOMEDICAL ENGINEERING
TEXAS A&M UNIVERSITY
COLLEGE STATION, TEXAS 77843
USA
E-MAIL: raimund.ober@tamu.edu
R. HENRIQUES
MRC LABORATORY FOR
MOLECULAR CELL BIOLOGY
UNIVERSITY COLLEGE LONDON
LONDON, WC1E 6BT
UNITED KINGDOM
AND
FRANCIS CRICK INSTITUTE
LONDON, NW1 1AT
UNITED KINGDOM
E-MAIL: r.henriques@ucl.ac.uk