The Annals of Applied Statistics 2019, Vol. 13, No. 3, 1397–1429 https://doi.org/10.1214/19-AOAS1240 © Institute of Mathematical Statistics, 2019 A HIDDEN MARKOV MODEL APPROACH TO CHARACTERIZING THE PHOTO-SWITCHING BEHAVIOR OF FLUOROPHORES BY LEKHA PATEL∗,1, NILS GUSTAFSSON†,2, YU LIN‡, RAIMUND OBER§,¶,3, RICARDO HENRIQUES†,∥,4 AND EDWARD COHEN∗,5 Imperial College London∗, University College London†, European Molecular Biology Laboratory Heidelberg‡, Texas A&M§, University of Southampton¶ and Francis Crick Institute∥ Fluorescing molecules (fluorophores) that stochastically switch between photon-emitting and dark states underpin some of the most celebrated ad- vancements in super-resolution microscopy. While this stochastic behavior has been heavily exploited, full characterization of the underlying models can potentially drive forward further imaging methodologies. Under the assump- tion that fluorophores move between fluorescing and dark states as continu- ous time Markov processes, the goal is to use a sequence of images to select a model and estimate the transition rates. We use a hidden Markov model to relate the observed discrete time signal to the hidden continuous time pro- cess. With imaging involving several repeat exposures of the fluorophore, we show the observed signal depends on both the current and past states of the hidden process, producing emission probabilities that depend on the transi- tion rate parameters to be estimated. To tackle this unusual coupling of the transition and emission probabilities, we conceive transmission (transition- emission) matrices that capture all dependencies of the model. We provide a scheme of computing these matrices and adapt the forward-backward al- gorithm to compute a likelihood which is readily optimized to provide rate estimates. When confronted with several model proposals, combining this procedure with the Bayesian Information Criterion provides accurate model selection. 1. Introduction. Fluorescence microscopy is a collection of techniques that utilize the photon emitting properties of fluorescing molecules, called fluo- rophores, to perform optical imaging, particularly in cell biology and biomedical applications. Recent years have seen the advent of a number of super-resolution Received January 2018; revised September 2018. 1Supported by the Imperial College London President’s Scholarship. 2Supported by the Engineering and Physical Sciences Research Council (EP/L504889/1). 3Supported by the National Institute of Health (R01 GM085575). 4Supported by the Medical Research Council (MR/K015826/1) and Biotechnology and Biological Sciences Research Council (BB/M022374/1). 5Corresponding author. Key words and phrases. Hidden Markov models, Markov processes, rate estimation, forward- backward algorithm, super-resolution microscopy. 1397 ----!@#$NewPage!@#$---- 1398 L. PATEL ET AL. microscopy techniques that have bypassed the classical resolution limits of flu- orescence microscopy (Huang, Bates and Zhuang (2009)). Specifically, single molecule localization microscopy (SMLM) approaches, such as photoactivated localization microscopy (PALM) (Betzig et al. (2006), Hess, Girirajan and Mason (2006)) and stochastic optical reconstruction microscopy (STORM) (Rust, Bates and Zhuang (2006), Heilemann et al. (2008)), rely on the ability exhibited by some fluorophores to photoswitch stochastically between a photon emitting On state and nonemitting dark states (Van de Linde and Sauer (2014), Ha and Tinnefeld (2012)). A specimen decorated with a spatially dense number of photon emitting fluorophores prevents accurate identification of individual fluorophores and res- olution of structures smaller than the diffraction limit—see Figure 1(a). Using a fluorophore with stochastic photo-switching properties can provide an imaging en- vironment where the majority of fluorophores are in a dark state, while a sparse number have stochastically switched into a transient photon emitting On state. This results in the visible fluorophores being sparse and well separated in space; with the use of a high-performance camera the individual fluorophores in the On state can be identified and localized with nanometer scale precision by fitting point spread functions (Ober et al. (2015), Sage et al. (2015))—see Figure 1(b). Through the acquisition across time of a large sequence of images (typically thousands)— see Figure 1(a)—many more photo-switching fluorophores can be isolated in time and precisely localized in space. When aggregated and plotted, these localizations provide an accurate and detailed map of fluorophore positions giving rise to a super-resolved image. Lateral resolutions of 10–30 nanometers (nm) are possible in biological sam- ples using SMLM, however, the resolution and image quality is strongly depen- dent on the photo-switching properties of the fluorophore used. While longer On states provide a greater number of photons being recorded by the camera, FIG. 1. (a) Illustration of the SMLM imaging process. When all fluorophores simultaneously stay in a photon emitting On state, diffraction renders structures unresolvable. Stochastically photo-switch- ing fluorophores imaged over time across several frames give rise to a sequence of sparsely populated images where each fluorophore can be isolated and localized with high precision. Aggregating these frames gives rise to a super-resolved image. Data from Sage et al. (2015). (b) Isolated fluorophores are localized by fitting the point spread function (PSF) to the diffraction limited spot. ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1399 which in turn leads to greater precision in localizing spatially isolated fluorophores (Ober, Ram and Ward (2004), Ram, Ward and Ober (2013), Thompson, Larson and Webb (2002), Rieger and Stallinga (2014)), the increased random occurrence of fluorophores simultaneously occupying the On state within a diffraction lim- ited spot can lead to significant imprecision, missed events and unwanted arti- facts (Van de Linde et al. (2010), Nieuwenhuizen et al. (2015)). Thus, a careful choice of fluorophore and the environment used to promote photo-switching— controlled by the buffer solution and illumination intensity—must be made for the intended application. This is particularly important in live-cell applications when considerations must be made for temporal resolution and reduced laser intensi- ties. To inform the choice of fluorophore with its environment, and aid the develop- ment of novel fluorophores, accurate characterization of the photo-kinetic model of the fluorophore, together with estimation of photo-switching rates (the rate at which fluorophores transition between On and dark states) is required (Dempsey et al. (2011), Lehmann et al. (2016)). Further, accurate knowledge of the photo- switching characteristics could be employed to maximize resolutions achieved us- ing advanced analytical methods, for example, 3B analysis (Cox et al. (2011)) and DeconSTORM (Mukamel, Babcock and Zhuang (2012)) and improve the performance of molecular counting techniques (Rollins et al. (2014), Lee et al. (2012)). Several attempts have been made to model the kinetic schemes of fluorophore photo-switching and estimate the corresponding photo-switching rates. These ki- netic schemes, as is common across single molecule biophysics, are characterized by Markovian transitions between a finite set of discrete states and are therefore ideally suited to being modeled as continuous time Markov processes. In Figure 2 are four models for photo-switching fluorophores. The first, Figure 2(a), depicts a typical kinetic model, accompanied by the state-space diagram we will adopt in this paper. This model contains a photon emitting On state 1 (involving rapid transitions between excited state S1 and ground state S0 via the absorption and emission of a photon), two temporary dark states 0 and 01 (the triplet state, T1 and the redox states, F+ and F−) and an absorption 2 (BF/BT0/BS0) which in this application is known as the photobleached state. Then in Figures 2(b)–2(d) are three further common state space models. Figure 2(b) portrays a photo-switching model with a simple two state {On(1) Dark(0)} structure. Models of this type are suitable for super-resolution methods including point accumulation for imag- ing in nanoscale topography (PAINT) and DNA-PAINT (Jungmann et al. (2010), Sharonov and Hochstrasser (2006)). Figure 2(c) depicts a model that incorporates an absorbing state 2. This form of photo-switching followed by absorption de- scribes a first approximation to the behavior that occurs spontaneously in a num- ber of organic fluorophores and post-activation of photoactivatable proteins (Ha and Tinnefeld (2012), Van de Linde and Sauer (2014), Vogelsang et al. (2010)). Figure 2(d) considers a model in which three distinct dark states are hypothesized ----!@#$NewPage!@#$---- 1400 L. PATEL ET AL. FIG. 2. Common models used to describe the continuous time photo-switching dynamics of a fluo- rophore with homogeneous transition rates. See text for details. which in some cases is a necessary extension to model (c), for instance when very rapid imaging is used (Lin et al. (2015)). The challenge comes in selecting the correct model and estimating the transition rates of the continuous time Markov process {X(t) : t ∈ R≥0} from an observed discrete-time random process {Yn : n ∈ Z≥0}. Here, R≥0 and Z≥0 denote the non- negative reals and integers, respectively. Typically, {Yn} is derived from a sequence of images (frames) with Yn corresponding to the observed state of the molecule in the nth frame. This is formed by an exposure of the continuous time process {X(t)} over the time-interval [n�,(n + 1)�), where � is the frame length. Process {Yn} can either be a sequence of photon fluxes associated with that molecule for each frame (Figure 3(a)), or a simple sequence of 1’s and 0’s indicating if the molecule was detected in the frame or not (Figure 3(b)). In all cases, the observations are subject to the effects of noise and instrument limitations. Essential to the subse- quent analysis, therefore, is the ability to account for missed state transition events due to noise and the temporal resolution of the data acquisition, as well as the de- tection threshold used to determine the state of the system (Figure 3(c)). Similar problems occur in other areas of biophysics where estimating transition rates of an underlying continuous time Markov process must be inferred from an observed discrete time signal. In particular, ion-channels have formed the focus of much ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1401 FIG. 3. (a) A simulated intensity signal of a fluorophore across time. Each measurement corre- sponds to the intensity in a frame. 7500 frames were recorded over 250 seconds at a rate of 30 frames per second. (b) Close up of the signal over the time window of 35 s to 55 s. In red is the observed signal {Yn} indicating if the fluorophore was detected in a particular frame. (c) A further close up of the signal showing intensity read-outs for independent frames. The true, hidden photon emitting On state of the molecule is also indicated, demonstrating how sub-frame length photon emitting events can be missed due to noise or the temporal resolution of the data acquisition. work (Colquhoun and Hawkes (1981), Qin and Li (2004), Rief et al. (2000)), in- cluding methods that attempt to account for missed events (Qin, Auerbach and Sachs (1996), Colquhoun, Hawkes and Srodzinski (1996), Hawkes, Jalali and Colquhoun (1990), Hawkes, Jalali and Colquhoun (1992), Epstein, Calderhead, Girolami and Sivilotti (2016)). However, the mechanism by which the observed signal is obtained and processed from the raw signal is fundamentally different to that of fluorescence microscopy imaging. Up until now, methods for estimating photo-switching transition rates in fluo- rescence microscopy are limited. The method in Lin et al. (2015) involves defining {Yn} to be the sequence of 1’s and 0’s and extracting the dwell times, namely ----!@#$NewPage!@#$---- 1402 L. PATEL ET AL. the durations when Yn is in the On state and when it is in its dark states. As- suming these dwell times to be exponentially distributed (or equal in distribution to a sum of exponentially distributed random variables in the case of multiple dark states), maximum likelihood estimates of the transition times are then com- puted. This method, termed here as exponential fitting and given a detailed discus- sion in Supplementary Materials Section S5 (Patel et al. (2019)), has two flaws. First, it does not correctly account for the effect of the imaging procedure on the stochastic structure of the discrete time process. Second, it does not allow for the absorbing (photobleached) state, which must be identified and accounted for by truncation of the data to the last observed On state. This is especially trouble- some as, to an observer, it is indistinguishable from a temporary dark state. This method therefore results in the absence of estimates for the absorption rate and can lead to significantly biased estimates of the transition rates between On and dark states. Hidden Markov models (HMMs) are used widely across scientific and engineer- ing disciplines to relate a sequence of observations, called emissions, to the states of an unobserved (hidden) Markov process, the target of inference. Their use is par- ticularly prevalent in image processing where the observations are a sequence of images in time and it is commonly assumed that each image is dependent only on the state of the hidden process at the time at which it is observed. Such an approach has been proposed for this problem in Greenfeld et al. (2015), where the hidden process is a discretized version of {X(t)}. Here, they let {Yn} be the sequence of photon-fluxes such that it is a standard (first-order) HMM with Poisson emissions. They then implement the Baum–Welch algorithm (Baum and Petrie (1966), Baum and Eagon (1967), Baum and Sell (1968), Baum et al. (1970)) to estimate the tran- sition probabilities of the discretized process and use an approximation to obtain the transition rates of the continuous time process {X(t)}. In doing so, they ac- knowledge that missed events will heavily bias rate estimates. Furthermore, their model is also unable to deal with the absorbing state. In this paper we provide two important contributions. First, in Section 2, by considering a general model for {X(t)} that includes multiple dark states and an absorbing state, we rigorously formulate the discrete time stochastic process {Yn} that indicates whether a molecule is detected in each frame. A crucial part of this formulation is recognizing that an image is not formed from an instanta- neous sampling of the true state, as is usually assumed in image processing, but is instead formed by exposing a camera sensor over a time interval of length �. That is to say, Yn is not dependent on just X(n�), but instead on the integral (i.e., all values) of {X(t)} over the interval [n�,(n + 1)�). Taking consideration of noise and instrument sensitivity, we fully account for missed events and give important results on the stochastic structure of {Yn}, including showing it is non- Markovian. ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1403 The second contribution of this paper is to propose novel methodology for esti- mating the state transition rates of {X(t)} under this correct treatment of the imag- ing procedure. In Section 3, we develop an HMM for {Yn} where we first imple- ment a time discretization scheme on the hidden Markov process {X(t)}. Crucially, as discussed above, correct understanding of the imaging procedure dictates two key properties. First, Yn depends on both the current (end of frame) and previous (beginning of frame) hidden states, X((n + 1)�) and X(n�), respectively. Sec- ond, this HMM possesses emission probabilities that are dependent on the static parameters of the hidden process state transitions that we ultimately wish to esti- mate. This coupled behavior renders traditional expectation maximization (EM)- type methods (e.g., Baum et al. (1970)) of parameter estimation inappropriate. We therefore make the novel step of introducing what we call transmission (transition- emission) matrices that incorporate this coupling between transition and emission probabilities by capturing all the dependencies in the model. For a given photo- switching kinetic model, we provide both a scheme for computing these matrices and an adaptation of the forward-backward algorithm to compute the likelihood of observations. Through numerical optimization we are able to compute maximum likelihood estimates of the transition rate parameters for the continuous time pro- cess {X(t)} that we wish to draw inference on. A bootstrapping scheme is also presented for computing confidence intervals. In the case of an unknown kinetic model, we propose the use of the Bayesian information criterion (BIC) for select- ing the best suited model from a set of proposals, thus also providing a powerful tool for chemists wishing to infer the number of quantum states a particular fluo- rophore can exist in. In Section 4, we provide extensive empirical analysis of the proposed method. We begin this section with a simulation study that compares this new estimation scheme to the exponential fitting method on a range of photo-switching models, demonstrating significant improvements in both the bias and the variance of our rate estimates. We further show the BIC performs accurate model selection when presented with a range of model proposals. We then proceed with a discussion on identifiability and consistency, providing empirical evidence that our model is identifiable and estimators consistent under normal experimental conditions. We further demonstrate that the bootstrapping scheme proposed in Section 3 for com- puting confidence intervals has approximately the correct coverage, and conclude with a discussion on length biased sampling. In Section 5, the estimation scheme presented in this paper is applied to the Alexa Fluor 647 data originally analyzed by the exponential fitting method in Lin et al. (2015), consistently selecting the hypothesized three temporary off-state model (Figure 2(d)) and revealing clear dependence between laser intensity and key transition rate parameters. In the ac- companying Supplementary Materials, as well as key mathematical details, we include an extensive simulations section where we report a significant improve- ment on rate estimates across a range of models and relevant experimental condi- tions. ----!@#$NewPage!@#$---- 1404 L. PATEL ET AL. 2. Modeling photo-switching behavior. The true photo-switching behavior of the fluorophore is a continuous time stochastic phenomenon. However, an ex- perimenter can only ever observe a discretized manifestation of this by imag- ing the fluorophore in a sequence of frames. These frames are regarded as a set of sequential exposures of the fluorophore and the observed discrete time sig- nal indicates whether the fluorophore has been observed in a particular frame. It is the continuous time process on which we wish to draw inference based on the observed discrete-time process indicating whether the fluorophore was ob- served in a frame. In this section we first present the continuous time Markov model of the true (hidden) photo-switching behavior, and then derive the ob- served discrete time signal, together with key results on its statistical proper- ties. 2.1. Continuous time. We model the true photo-switching effect of the fluo- rophore as a continuous time Markov process, {X(t) : t ∈ R≥0} with discrete state space SX. In this paper we consider a general model for {X(t)} that can accommodate the numerous mechanisms of photo-switching utilized in standard SMLM approaches such as (F)PALM and (d)STORM. Specifically, this model consists of a pho- ton emitting (On) state 1, m + 1 non photon emitting (dark/temporary off) states 00,01,...,0m, where m ∈ Z≥0, and a photobleached (absorbing/permanently off) state 2. We denote the state 00 ≡ 0 for the m = 0 case of a single dark state. The model, illustrated in Figure 4, allows for transitions from state 1 to the multi- ple dark states (from a photochemical perspective, these can include triplet, redox and quenched states). These dark states are typically accessed via the first dark state 0 (reached as a result of inter-system crossing of the excited S1 electron to the triplet T1 state; see Figure 2(a)). Further dark states 0i+1, i = 0,...,m − 1, are accessible by previous dark states 0i (by, e.g., the successive additions of electrons forming radical anions (Van de Linde et al. (2010))). We allow the On state 1 to be accessible by any dark state and we consider the most general model in which the absorption state 2 is accessible from any combination of other states (Ha and Tinnefeld (2012), Van de Linde and Sauer (2014), Vogelsang et al. (2010)). FIG. 4. General m + 3 state (m ∈ Z≥0) model of a fluorophore. ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1405 The state space of {X(t)} is SX = {0,01,...,0m,1,2} and is of cardinality m + 3. We denote λij to be the transition rate between states i and j and μi to be the absorbing rate from state i to 2, where i,j ∈ ¯SX := SX \ {2}. The generator matrix for {X(t)} is therefore given as (2.1) G = ⎛ ⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝ −σ0 λ001 0 0 0 0 ... λ01 μ0 0 −σ01 λ0102 0 0 0 ... λ011 μ01 0 0 −σ02 λ0203 0 0 ... λ021 μ02 ... ... ... ... ... ... ... ... ... 0 0 0 0 0 ... −σ0m λ0m1 μ0m λ10 0 0 0 0 0 ... −σ1 μ1 0 0 0 0 0 0 ... 0 0 ⎞ ⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠ , where σ0m = λ0m1 + μ0m, σ1 = λ10 + μ1 and when m > 0, σ0i = λ0i0i+1 + λ0i1 + μ0i, for i = 0,...,m − 1. For full characterization, we define its initial probability mass νX := (ν0 ν01 ... ν0m ν1 ν2)⊤ with � j∈SX νj = 1. Typically, all fluorophores receive an initial excitation to the photon-emitting state, thus the most commonly occurring probability mass vector in practice has ν1 = 1. Moreover, although the case when 0 < ν2 < 1 may give rise to fluorophores that are never observed, for inference purposes, we discard all traces containing no observations (1’s) of fluo- rophores and set ν2 = 0. In this paper, we will refer to specific models (from that shown in Figure 4) in the form Mm A . Here, m is the number of multiple dark states beyond the 00 state that is present in all models, and A ⊆ ¯SX denotes the set of states from which the absorption state 2 is accessible. For the three classical models presented in Figure 2: model (b) is M0 ∅: the m = 0 case where μ0 = μ1 = 0, model (c) is M0 {0,1}: the m = 0 case where μ0,μ1 > 0, and model (d) is M2 ∅: the m = 2 case where μ0 = μ01 = μ1 = 0. 2.2. Discrete time observation process. Having presented the continuous time model for the true photo-switching behavior, we will now introduce the model for the observed discrete time process and show how the transition rates given in (2.1) are not amenable to direct estimation. The imaging procedure requires taking a series of successive frames. Frame n is formed by an exposure over the time interval [n�,(n +1)�), where n ∈ Z≥0. The constant � corresponds to the exposure time for a single frame, also known as the frame length. We define the discrete time observed process {Yn : n ∈ Z≥0}, with state space SY = {0,1}, as Yn = 1 if the fluorophore (characterized by {X(t)}) is observed in frame n and equal to 0 otherwise. For the fluorophore to be observed in the time interval [n�,(n+1)�) it must be in the On state 1 for a minimum time of δ ∈ [0,�). The value of δ is unknown and is a result of background noise and the imaging system’s limited sensitivity. We note that if {X(t)} exhibits multiple jumps to state 1 within a frame, then a sufficient condition for observing the fluorophore ----!@#$NewPage!@#$---- 1406 L. PATEL ET AL. FIG. 5. Illustration of how the states for Yn derive from the process X(t). is that the total time spent in the On state exceeds δ. The δ = 0 case is the idealistic scenario of a noiseless system and perfect sensitivity such that the fluorophore is detected if it enters the On state for any nonzero amount of time during the exposure time �. We formally define the observed process as (2.2) Yn = 1[δ,�) �� (n+1)� n� 1{1} �X(t) �dt � , where 1A(·) is the indicator function such that 1A(x) = 1 if x ∈ A and is zero otherwise. Figure 5 illustrates the manifestation of the discrete time signal {Yn} from the continuous time signal {X(t)}. 2.3. The inference problem. The inference problem is two-fold. First, for a given model, the aim is to estimate the (4m + 8) unknown parameters θ = � λ001 ... λ0m−10m λ01 ... λ0m1 λ10 μ0 ... μ0m μ1 νX δ �⊤ from a finite length realization of {Yn}. Crucially, it is shown in Supplementary Materials Section S1 (Patel et al. (2019)) that {Yn} does not exhibit the Markov property (of any order) for any m ∈ Z≥0, and for any � and δ such that � > δ ≥ 0. The non-Markovianity excludes classical inference methods and motivates the use of a Hidden Markov Model (HMM), with a likelihood based approach for estimat- ing θ. Beyond this, it may be the case that the true model (characterized by its num- ber of dark states) is unknown and may need to be selected in addition to esti- mating the unknown parameters. We tackle both of these problems in the next section. 3. Characterizing photo-switching behavior. Hidden Markov models, first presented in Baum and Petrie (1966), relate a sequence of observations to the states of an unobserved or hidden Markov chain. The aim of building a hidden Markov model (HMM) is to allow inference on the hidden process using these observations. In its simplest form, an HMM assumes the propagation of both ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1407 state and observed sequences to be in discrete time, and a general first order HMM assumes that the observation process {Yn : n ∈ Z≥0} is related to a hid- den first order Markov Chain {Xn : n ∈ Z≥0} via an emission probability distribu- tion B := (B)i,j = P(Yn = j|Xn = i), considered to be fully independent of the static parameters that characterize the probability distribution of state transitions P := (P)i,j = P(Xn = j|Xn−1 = i). In this setting we say B and P are decou- pled. For a sequence y0,y1,...,yN of observations from this model, the Baum– Welch re-estimation algorithm (Baum and Petrie (1966), Baum and Eagon (1967), Baum and Sell (1968), Baum et al. (1970)) is an EM type method that utilizes the forward-backward algorithm (see Levinson et al. (1983), for details) to opti- mize the likelihood function and compute maximum likelihood estimates of νX (the probability mass of X0), B and P . This in turn can be used to estimate param- eters of the emission and state transition probabilities. When the hidden Markov process and/or the observation process are of higher order, the HMM can be trans- formed to a general first order process (Du Preez (1998), Lee and Lee (2006), Ching, Fung and Ng (2003)) and Baum–Welch can be applied in the usual way. Readers are directed to MacDonald and Zucchini (1997) for a comprehensive re- view. While standard, first (or higher) order HMMs have been extensively studied and are most frequently used in applications, the rigid framework of being in discrete time with emission probabilities decoupled from state transition probabilities is not always suitable, as we will now show is the case for images formed by expo- sures over a time interval. We take time to carefully formulate the HMM suitable for this application, presenting what we call transmission (transmission-emission) matrices to capture the dependencies in the model. We then go on to provide a novel adaptation of the forward-backward algorithm to estimate θ, the unknown parameters of our HMM in the case of a known state-space SX of the hidden pro- cess. We will then show how the Bayesian information criterion (BIC) can be used for model selection and parameter estimation in the case of an unknown state- space. 3.1. Photo-switching hidden Markov model. In this section we build an HMM for our observation process {Yn}, which we call the Photo-switching hidden Markov model (PSHMM). The first immediate reason as to why the standard set-up outlined above is inappropriate for this application is because the hidden Markov process {X(t)} evolves in continuous time. To deal with this, we need to adopt a time-discretization scheme for the hidden process. Analogously to Liu et al. (2015), we state that {X(t)} propagates in �-separated discrete time steps ac- cording to the transition probability matrix P� = eG�, where G is given in (2.1). Our hidden process is therefore now represented by the discrete time Markov chain {X(n�) : n ∈ Z≥0}. ----!@#$NewPage!@#$---- 1408 L. PATEL ET AL. FIG. 6. Illustration of the HMM setup. (a) Traditional HMM where observed state is dependent on current hidden. (b) Our HMM where observed state depends on both the current and past hidden states. When Yn depends solely on X(n�) (see Figure 6(a)) and the correspond- ing emission matrix B is decoupled from P , a continuous time EM algorithm (Liu et al. (2015)) analogous to the Baum–Welch can be used to estimate νX, B and P . However, this will be inappropriate in our setting for two related rea- sons. First, we have shown in Section 2, specifically equation (2.2), that expos- ing images over a nonzero length of time means Yn depends on the full path of {X(t)} within the interval [n�,(n + 1)�). To correctly deal with this it is necessary to construct the HMM to consider dependence between Yn and both X(n�) and X((n + 1)�) (see Figure 6(b)). Second, this construction of {Yn} in (2.2) means the emission probabilities are clearly dependent on the static pa- rameters θ of the hidden process and are therefore coupled with P . The EM procedures highlighted above require decoupled B and P so that at each step the quasi-likelihood can be optimized separately. To the best of our knowledge, methods for dealing with coupled systems have not been dealt with in the lit- erature. While an EM algorithm could be used for a coupled system, analytic forms for the update steps would in general be intractable, leading to numeri- cal maximization procedures at each iteration, thereby increasing computational complexity. We will now formally characterize the PSHMM and provide a novel method for estimating the unknown static parameters in the case of a coupled sys- tem. 3.1.1. Formal characterization of the PSHMM. Formally, we characterize our PSHMM with: 1. An initial probability vector νX = (ν0 ν01 ... ν0m ν1 ν2)⊤ where νi := P(X(0) = i) for i ∈ SX; ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1409 2. Transmission matrices (3.1) B(l) � = ⎛ ⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝ b(l) 00,� b(l) 001,� ... b(l) 00m,� b(l) 01,� b(l) 02,� b(l) 010,� b(l) 0101,� ... b(l) 010m,� b(l) 011,� b(l) 012,� ... ... ... ... ... ... b(l) 0m0,� b(l) 0m01,� ... b(l) 0m0m,� b(l) 0m1,� b(l) 0m2,� b(l) 10,� b(l) 101,� ... b(l) 10m,� b(l) 11,� b(l) 12,� 0 0 0 0 ... b(l) 22,� ⎞ ⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠ , where b(l) ij,� := P �Yn = l,X �(n + 1)� � = j|X(n�) = i � = P �Y0 = l,X(�) = j|X(0) = i � i,j ∈ SX,l ∈ SY , b(l) 22,� = 1{0}(l). These transmission matrices combine the transition and emission probabilities, thereby allowing us to account for a coupled system. The full mathematical formu- lation for deriving their forms involves conditioning on the number of jumps from all m + 1 dark states within the interval [0,�). From this, we use Laplace trans- forms and the distributions of state holding times to iteratively compute matrices that converge to our set of transmission matrices. A more detailed explanation of this methodology, along with full derivations and expressions is presented in Sup- plementary Materials Section S2 (Patel et al. (2019)). Furthermore, an algorithm (Algorithm 1) detailing all computational steps to evaluate these matrices suitable for any m ∈ Z≥0 (any number of multiple dark states) can be found in Supplemen- tary Materials Section S3 (Patel et al. (2019)). 3.2. Estimating unknown parameters of the PSHMM. We now provide an al- gorithm for estimating the unknown parameters θ of the PSHMM, which utilizes a suitable adaptation of the forward-backward dynamic programming algorithm (Rabiner (1989)), making use of the transmission matrices in (3.1). Let y = (y0 y1 ... yNF −1)⊤ be the sequence of observations across NF frames for a single photo-switching fluorophore. We define the forward-backward proba- bilities as αn,i = P �Y0 = y0,...,Yn−1 = yn−1,X(n�) = i � n = 1,...,NF , βn,i = P �Yn = yn,...,YNF −1 = yNF −1|X(n�) = i � n = 0,...,NF − 1. For each such n, we define the forward-backward vectors as αn = �αn,0 ... αn,0m αn,1 αn,2 �⊤ , βn = �βn,0 ... βn,0m βn,1 βn,2 �⊤ . ----!@#$NewPage!@#$---- 1410 L. PATEL ET AL. Using this notation, we can show that α⊤ n = α⊤ n−1B(yn−1) � for n = 2,...,NF and α⊤ 1 = ν⊤ XB(y0) � when n = 1. This yields the following recursion formula α0 = νX, α⊤ n = α⊤ n−1B(yn−1) � n = 1,...,NF , βNF = 1m+3, βn = B(yn) � βn+1 n = 0,...,NF − 1, (3.2) where 1m+3 is the (m + 3) × 1 vector of ones. It now follows that the likeli- hood of observation vector y given parameter vector θ is L(y;θ) = α⊤ n βn for all n = 0,...,NF . In particular, we have L(y;θ) = α⊤ NF 1m+3, which can be read- ily computed using the transmission matrices together with recursive computation for α⊤ n as indicated in (3.2). In the situation where we have NE ≥ 1 independent photo-switching fluorophores, the log-likelihood is given by ℓ(Y;θ) = NE � k=1 log �α⊤ NF ,k1m+3 �, (3.3) where Y = (y1 y2 ... yNE) and αNF ,k is the forward probability vector for emitter k = 1,...,NE. Maximizing (3.3) with respect to θ can be done either through numerically approximating derivatives or by using derivative-free optimization, for example with the Nelder–Mead algorithm. A discussion on multimodality and choosing a starting point for optimization can be found in Supplementary Materials Section S6 (Patel et al. (2019)). 3.2.1. Accounting for false positive observations. Occasionally, random peaks in the background noise may exceed the threshold value used to determine a flu- orophore in the On state, resulting in a false positive identification of the fluo- rophore. For experiments conducted over a large enough number of frames, this false positive rate may become significant in the observed process {Yn}. Specifically if ω ∈ [0,1] denotes the probability of falsely observing a fluo- rophore, assumed independent of the general observation process, then we may use the updated transmission matrices B∗(0) � = (1 − ω)B(0) � , B∗(1) � = B(1) � + ωB(0) � , in the evaluation of the log-likelihood ℓ(Y;θ∗) in (3.3). This would thus involve estimating θ∗ = [θ⊤ ω]⊤ from the observations Y. 3.3. Bootstrapping. When only one experiment is conducted to produce an NF × NE dataset Y, a single prediction ˆθ is obtained. In this circumstance, a bootstrapping scheme can be used to gain approximate confidence intervals for each component of θ. ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1411 In the same manner as is presented in Efron and Tibshirani (1993), we gen- erate R (typically large) bootstrap datasets Y∗1,Y∗2,...,Y∗R each consisting of re-sampled (with replacement) columns of Y. From each dataset, we acquire boot- strap replicated parameter estimates ˆθ ∗1, ˆθ ∗2,..., ˆθ ∗R using the same PSHMM maximum likelihood procedure used to obtain ˆθ. For 0.5 < p < 1, letting ˆθ∗ i,(p) and ˆθ∗ i,(1−p) be the 100 · pth and 100 · (1 − p)th empirical percentiles of the ith component of θ obtained from ˆθ∗1 i , ˆθ∗2 i ,..., ˆθ∗R i , a percentile bootstrap interval of length 1 − 2p is given by (see Efron and Tibshirani (1993)) [ ˆθi,%lo, ˆθi,%up] ≈ [ ˆθ∗ i,(p), ˆθ∗ i,(1−p)]. 3.4. Model selection. To determine the unknown number of multiple m dark states, we have chosen to use the Bayesian information criterion (BIC) to deter- mine the most likely model given data Y. Although similar to the Akaike informa- tion criterion (AIC), the BIC offers greater protection against over-fitting when the number of data-points is large, as is the case in this setting. The BIC is defined in our context as q log(NENF ) − 2ℓ(Y; ˆθ), where q denotes the number of unknown parameters estimated in θ and ℓ(Y; ˆθ) denotes the max- imized log-likelihood using the maximum likelihood estimates ˆθ. This criterion can be computed among all suitable models, with the most preferred being chosen as that with the smallest BIC value. 4. Simulations and analysis. Simulation studies have been conducted to as- sess and analyze the performance of the PSHMM method as detailed in Section 3. To make the results applicable, we restrict ourselves to realistic parameter values that typically occur in an experimental setting. 4.1. Performance on images and comparison with exponential fitting. To test the performance of parameter estimation against the exponential fitting method of Lin et al. (2015), synthetic imaging data of photo-switching fluorophores was simulated. We begin our focus on the model M0 {1}, since for many practical appli- cations the life-times of further dark (in particular the triplet (T1)) states is short relative to �. As such, this dark state has been considered as part of the meta-stable On state (Ha and Tinnefeld (2012), Vogelsang et al. (2010)). Since the predomi- nant pathway to absorption is via the triplet state, a simplified model can be used in which the absorption state 2 is only accessible from state 1. Given the popu- larity of this model and its ease of analysis, we have derived the exact solution of the corresponding transmission matrices (see Supplementary Materials Section S2 (Patel et al. (2019))). Details on the image simulation method and how the discretized state sequences were extracted can be found in Supplementary Materials Section S7 (Patel et al. (2019)). Global parameter values are also noted. The extracted state sequences ----!@#$NewPage!@#$---- 1412 L. PATEL ET AL. were analyzed using an implementation of Algorithm 1 (see Supplementary Ma- terial Section S3 (Patel et al. (2019))). The resulting parameter estimates were compared to estimates derived from the exponential fitting method, which was ex- tended in this study to allow the calculation of absorption rates (see Supplementary Materials Section S5 (Patel et al. (2019))). Table 2 (see the Appendix) shows estimated parameter statistics over 16 im- age simulation studies with 100 replicates (datasets) per study. Rate parameters θ, were chosen to cover a range of observed behaviors of organic fluorophores and fluorescent proteins (Dempsey et al. (2011)) with NE = 100 fluorophores per study. The number of frames NF in each study was adjusted to standardize the average number of transitions predicted from θ. Scatterplots of these rate esti- mates are presented in Figure 7. It is evident that the PSHHM yields estimates with much lower bias and root mean squared errors (RMSE) when compared to the exponential fitting method, although they have a tendency to increase as transition and absorption rates are increased. The reported empirical (2.5,97.5) percentile intervals contain the true parameter values across all studies for the PSHMM method and further highlight the bias in estimates obtained from expo- nential fitting. For experimenters, the effect of imaging parameters on the performance of the estimators is of particular interest and importance. Further simulation studies car- ried out under model M0 {1} highlight the consistency in both accuracy and precision of the PSHMM estimator across a range of different experimental conditions. Fig- ure 8 compares the PSHMM with exponential fitting rate estimates when we vary the emission intensity of the fluorophores (measured in the mean number of pho- tons each emits when in the On state for time �). Further investigation of other parameters, including the frame length (�), the number of frames (NF ) and the FIG. 7. Estimates of log10(λ01) and log10(λ10) simulated from model M0 {1} using both exponential fitting (a) and PSHMM fitting (b) are plotted in dark yellow and pink respectively. True rates are plotted as black crosses. Estimates for the absorption rate μ1, along with means, RMSEs and 2.5 and 97.5 empirical percentiles are given in Table 2 (see the Appendix). ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1413 FIG. 8. Top Left: Examples of single simulated frames at the indicated number of photons per frame (Supplementary Materials Section S7 (Patel et al. (2019))). Box-plots showing quantiles from estimates of λ01, λ10 and μ1 from both exponential fitting (black) and PSHMM fitting (gray) are plotted against increasing photons per frame. NF = 9872 for all simulations. True rates given by the blue line. detection threshold (proportional to δ) under this model, are provided in Supple- mentary Materials Section S7 (Patel et al. (2019)). Across the full range of relevant parameters tested, the PSHMM estimator performs significantly better than expo- nential fitting. To assess the accuracy of parameter estimates for the extended models m = 1 and m = 2 over fast, medium and slow switching scenarios, additional simulations were performed by directly sampling the continuous time processes {X(t)} and extracting the observation sequences Y as in (2.2), using fixed values of θ. Results from the analyses of these simulations are shown in Tables 3 and 4 in the Appendix. While it is evident that the estimates for λ0m0m+1 and λ0m+11 incur greater bias as m increases, the empirical (2.5,97.5) percentile intervals predominantly cover true parameter values, albeit over a larger area due to the increase in the RMSEs. As is seen when m = 0, the exponential fitting method performs less well, yielding much higher bias and RMSEs for particular parameter values. 4.2. Model selection. Using these simulated datasets, the BIC was used in model selection from the set of proposals {M0 {1},M1 {1},M2 {1}} (i.e., under the as- sumption that the absorption state was known to only be accessible by the On state). Applying model selection to the M0 {1} dataset used to estimate parameters in Table 2 results in the true state model being chosen in all (100%) cases. 100 ----!@#$NewPage!@#$---- 1414 L. PATEL ET AL. TABLE 1 Confusion table showing the empirical percentage of models predicted from three candidates: M0 {1}, M1 {1} and M2 {1} under simulation studies 16, 19 and 20 (see Tables 2, 3 and 4 in the Appendix), with NE = 300, δ = 1 100 and � = 1 50 s. 100 datasets from each study were generated and the BIC used to select the best fitted model Predicted → M0 {1} M1 {1} M2 {1} True ↓ M0 {1} 100 0 0 M1 {1} 0 98 2 M2 {1} 0 1 99 datasets, each for m = 0,1,2 were generated for studies 2,17 and 20 with � = 1 50s and NE = 300. These results presented in Table 1 demonstrate the accuracy of se- lecting the correct model. 4.3. Identifiability and consistency. We give a detailed discussion on the inter- related issues of identifiability and consistency in Supplementary Materials Sec- tion S4 (Patel et al. (2019)). Here, we summarize these results. The parameter values for “fast,” “medium” and “slow” transition rates can be found in Table S1, Supplementary Materials Section S4 (Patel et al. (2019)). The formal definition of model identifiability is there exists a bijective map- ping from the parameter space to the space of distributions for the data; or equiv- alently for any θ, there exists no other θ∗ in the parameter space such that ℓ(Y;θ) = ℓ(Y;θ∗) almost everywhere. Obtaining such a result for the PSHMM is highly nontrivial, if not intractable. We therefore explore the issue of identi- fiability through empirical studies. To do so, we begin by exploring local iden- tifiability. Parameter vector θ is said to be locally identifiable if there exists a neighborhood around it such that there is no other θ∗ in that neighborhood for which ℓ(Y;θ) = ℓ(Y;θ∗) almost everywhere (Little, Heidenreich and Li (2010)). It can be shown that θ is locally identifiable if and only if the Fisher information matrix I(θ) is nonsingular (Rothenberg (1971)), and this becomes our object of interest. Again, due to the complexity of the model, the Fisher information ma- trix can not be computed, however we can study local identifiability via the ob- served Fisher information matrix (the Hessian matrix of the log-likelihood func- tion) J (ˆθ) = −∇∇⊤ℓ(Y;θ)|θ=ˆθ evaluated at the maximum likelihood estimate ˆθ of θ (Colquhoun, Hatton and Hawkes (2003)). This is averaged over several re- peated simulations of data set Y. In particular, if J (ˆθ) is singular then ℓ(Y; ˆθ) is not a unique (local) maximum, typically due to a flat ridge in one or more direc- tions, and θ is unidentifiable. ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1415 Summarizing the findings presented in Supplementary Materials Section S4 (Patel et al. (2019)), in the sets of parameters studied, the observed Fisher infor- mation matrix is shown to be nonsingular in almost all circumstances, providing strong empirical evidence of structural identifiability over the parameter values most likely encountered in practice. It is obvious that the form of the observed Fisher information matrix will be dependent on the relative values of the known and unknown parameters. Broadly speaking, we find that as δ/� → 1, that is, the time a fluorophore needs to stay in the On state to be detected tends towards the frame rate, there appears to be a breakdown in identifiability. This is indicated by the observed correlation matrix, as derived from the observed covariance matrix J (ˆθ)−1, showing strong correlation between ˆδ and ˆλ01, and hence the existence of a ridge on the likelihood surface, albeit still with curvature. While still tech- nically identifiable, the correlation between these two parameter estimators indi- cates they cannot be independently identified and pose difficulties to numerical optimization methods (Jacquez and Greif (1985)). This effect is more pronounced for faster transition rates due to an increased chance of transitions into the On state not being observed. This trend as δ/� increases is expected as the model will be completely unidentifiable when δ = � as no fluorophore will be observed in the On state. However, for low values of δ/� (<0.2) as is typically encountered in practice, correlation between all elements of the estimator ˆθ is low, providing clear empirical evidence of locally identifiability for all parameter values stud- ied. We also see a breakdown in identifiability if the frame length is too large in comparison to the switching rates. For example, for a frame length of � = 1/30 s (30 frames per second), under slow and medium switching, the models appear to be identifiable, however under fast switching this breaks down. Increasing the frame rate to �−1 = 100 s−1 appears, from empirical evidence, to be sufficient for making the model identifiable. This intuitively seems correct as large frame lengths will fail to capture the nuanced photo-kinetic behavior of fast switching fluorophores. To get a handle on global identifiability we assess whether the likelihood sur- face has a unique global maximum or if there are further modes the optimization method is determining to be the maximum. For the 3 state case (m = 0), an ap- proximation scheme is used to find a suitable starting point for the Nelder–Mead simplex (see Supplementary Materials Section S6 (Patel et al. (2019))) and we en- sure we locate the correct mode. In the 4 and 5 state cases (m = 1 and m = 2, respectively), a stochastic search method is deployed that trials multiple starting points. Unimodal histograms for the parameter estimates would indicate a single global maximum, whereas a multi modal histogram would indicate further domi- nant modes being located instead. Analysis presented in Supplementary Materials Section S4 (Patel et al. (2019)) suggests a single global maximum in all cases with one exception; there appears to be two different modes being located for the λ0102 ----!@#$NewPage!@#$---- 1416 L. PATEL ET AL. parameter in the m = 2 model, although this disappears as the number of frames NF increases. Studies on consistency of the PSHMM maximum likelihood estimator cor- roborate our findings on identifiability. A break down in identifiability will re- sult in the estimator becoming inconsistent. In Supplementary Materials S4 (Patel et al. (2019)), empirical evidence suggests that the mean squared error tends to zero (as the number of emitters NE increases) when δ/� is within the normal range (<0.2). However, as δ/� increases towards 1, consistency of the estimator breaks down with it becoming more biased (although with a reduction in vari- ance). 4.4. Length biased sampling. Length biased sampling is an issue that could appear in practice. In our setting, this would occur if there are fluorophores whose traces we do not include when estimating the unknown parameters due to them never being observed. We note that for this to happen the fluorophore would have to never be in the On state for longer than δ in any frame. As soon as it is, we have observed it and can populate its trace up to that point with zeros. In the 3 state (m = 0) case, even under the extreme situation of δ = 0.9�, the probability of not observing any given fluorophore in all 10,000 frames is 1.3 × 10−2 for fast transi- tion rates and 1.0 × 10−3 for slow transition rates. Under a more realistic setting of δ = 0.1�, this reduces to 1.1×10−3 and 1.0×10−4 for fast and slow transition rates, respectively. Therefore, for a single experiment with 100 fluorophores, the probability all of them are observed is 0.90 when under fast switching, climbing to 0.99 for slow switching. 4.5. Bootstrap interval coverage. Simulations were performed to verify the coverage of the bootstrapped confidence intervals presented in Section 3.3. As an example, for the 3 state (m = 0) model under slow switching (see parame- ter values in row 1 of Table S1 in Supplementary Materials Section S4 (Patel et al. (2019))), the coverage of the 95% bootstrapped confidence intervals are 92.8% (λ01), 94.6% (λ10) and 94.6% (μ1). These results were obtained from 500 simulated bootstrap intervals with each interval being formed from 100 es- timates. 5. Application to Alexa Fluor 647 data. In this section we apply the method presented in this paper to the data analyzed with the exponential fitting method in Lin et al. (2015). The details, including experimental methods, can be found in this reference. In summary, antibodies labeled with Alexa Fluor 647 at a ratio of 0.13–0.3 dye molecules per antibody were sparsely absorbed to a cover slip and imaged by Total Internal Fluorescence microscopy to investigate the effect of eight different laser intensities on the photo-switching behavior of Alexa Fluor 647. The study contains 27 experiments with differing combinations of laser intensity and ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1417 frame rate. These values, together with the number of emitters detected and the number of frames over which they were imaged is summarized in Table 5 of the Appendix. For each photo-switchable molecule detected, the discrete observation trace indicating if the emitter was observed in each frame, was extracted (see Sup- plementary Materials Section S7 (Patel et al. (2019))). In all experiments, the true model and its associated parameters were unknown. Subsequently, we will show comparisons between estimates from both the PSHMM and modified exponential fitting methods.6 Initially, the BIC model selection criterion as outlined in Section 3.4 was used to select the most suitable model for the data from the range of models M0 ∅, M0 {0}, M0 {1}, M1 ∅, M1 {0}, M1 {01}, M1 {1}, M2 ∅, M2 {0}, M2 {01}, M2 {02} and M2 {1}, with the model M2 {1} being selected on all (100%) occasions. This supports Lin et al. (2015), who hypothesize this, with bleaching model and assume the M2 ∅ (without bleaching) model for rate estimates gained from exponential fit- ting. PSHMM maximum likelihood estimates were then computed for the esti- mation of θ∗ = (λ001 λ01 λ0102 λ011 λ021 λ10 μ1 νX δ ω)⊤ for each of the 27 datasets. Associated with these, 95% bootstrapped intervals were computed us- ing the method in Section 3.3 (R = 100 due to computational intensity). The re- sults are shown in Figure 9. Comparisons with exponential fitting bootstrapped re-estimates (where νX, δ and ω are not estimable in this setting) are also shown. The results indicate that the exponential fitting predicts a much slower switch- ing scenario for the Alexa Fluor 647 antibodies, with many estimates shown to be several orders of magnitude below those predicted by the PSHMM. This resembles the conclusions reached from the results of the simulation studies as described in Section 4 and are thought to occur as a result of the exponential fitting method missing events within frames. Incidentally, the higher variance of predictions from both methods are shown to be reported at higher laser intensities, where faster switching of fluorophores is promoted. This is especially pronounced in some par- ticularly large simulated confidence sets for the exponential fitting estimates of λ0102 and λ021 (see Figure 9). 6. Summary and discussion. Accurate measurement of fluorophore photo- switching rates has the potential to enable tailored design of single molecule localization microscopy experiments to specific requirements. For example, one may wish to select a fluorophore and photo-switching environment to achieve the rapid photo-switching at low laser intensities required for live-cell samples. Al- ternatively, one may wish to promote long off times required for densely packed 6We modified the exponential fitting algorithm used by Lin et al. (2015) to allow for the absorption parameter (see Supplementary Materials Section S5 (Patel et al. (2019)) for more details). ----!@#$NewPage!@#$---- 1418 L. PATEL ET AL. FIG. 9. Rate predictions and associated 95% bootstrap confidence sets are shown for λ01, λ10, μ1, λ011, λ0102, λ011 and λ021, for eight different values of laser intensity (see Table 5 in the Ap- pendix for exact experimental parameters). Intervals in black correspond to those from exponential fitting and those in gray correspond to those gained from the PSHMM. Point estimates from each of the 27 datasets are given by the diamond (PSHMM) or square (exponential). While in some cases PSHMM produces much wider intervals, it also yields less biased estimates than the exponential fitting method; see text for discussion. samples. Furthermore, precise estimates of photo-switching rates has the poten- tial to advance data processing methods used in single molecule localization mi- croscopy imaging, enabling more accurate image reconstruction and aiding proper quantitative analysis. For this purpose, we have presented a method for char- acterizing the photo-switching kinetics of fluorophores from a sequence of im- ages. For the most general continuous time photo-switching model, we have carefully defined the observation process and linked it to the hidden continuous time photo- switching behavior that we wish to infer upon. From this, we have formulated a hidden Markov model to link the observations to the continuous time photo- switching model. Importantly, images being formed by exposing the camera over a nonzero time interval violates the traditional assumption placed on HMMs that ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1419 the emission and transition probabilities are decoupled. To tackle this, we have introduced transmission matrices that capture all the dependencies present in the model and provided a detailed scheme for computing them for any continuous time photo-switching model. A modification of the forward-backward algorithm tailored for these coupled HMMs has been presented and numerical maximiza- tion of the computed likelihood was performed to generate accurate estimates of the true photo-switching rates. Through a detailed simulation study, these were compared to estimates from an existing exponential fitting method. We found that our proposed method of parameter estimation is highly robust to a range of simulated experimental parameters including low signal-to-noise ratios and fast frame rates, frequently outperforming estimates from exponential fitting. We fur- ther found that by using the BIC, it is possible to perform accurate model selection from a range of model proposals, thus providing a powerful new tool for chemists wishing to infer the number of quantum states a particular fluorophore can ex- ist in. Empirical analysis provided strong evidence that the PSHMM is identifi- able and the estimators approximately consistent in the normal parameter range encountered in experiments. Although, experimenters should ensure fast switch- ing fluorophores are imaged with greater frame rates to ensure model identifiabil- ity. The model selection and estimation method presented in this paper was then applied to real data collected from the study of Lin et al. (2015). We provide strong evidence of a relationship between laser intensity and photo-switching rates and support the hypothesis that Alexa Fluor 647 has three off-states in addition to a photo-bleached state. While this paper focuses on single molecule localization microscopy, the type of kinetic models discussed in this paper are unlikely to be unique to photo-switching fluorophores and super-resolution applications. Certainly, stochastic processes in which the observed signal depends on both the current and past states of a hidden process are likely to be a general feature of digital, discretized measurements of stochastic signals. This is particularly true in image processing where images are inevitably formed by exposing the camera’s sensor over a nonzero length time win- dow. The coupling between the emission and transition probabilities of the HMM is a direct consequence of this exposure time, and therefore it is likely that the presented methodology for dealing with this will find use in imaging applications that are beyond the scope of this paper. Further theoretical discussions and a comprehensive simulations and methods section, can be found in the Supplementary Materials. APPENDIX: RATE ESTIMATES See Tables 2–5. ----!@#$NewPage!@#$---- 1420 L. PATEL ET AL. TABLE 2 Simulation results showing mean, bias, root mean squared error (RMSE) and the 2.5 and 97.5 empirical percentiles of the estimates of θ = (λ01 λ10 μ1)⊤ under model M0 {1} for both the PSHMM and exponential fitting (Exp) methods across 100 repeat experiments. � = 1 30 s, δ,ω > 0 (unknown), the number of emitters NE = 100. NF indicates the number of frames simulated. For both methods, log−log scatterplots of λ01 and λ10 are shown in Figure 7 PSHMM PSHMM Exp Exp PSHMM PSHMM RMSE (2.5%,97.5%) Exp Exp RMSE (2.5%,97.5%) Study NF θ Mean Bias (×10−2) percentiles Mean Bias (×10−2) percentiles 1 16,800 0.32 0.32 0.00 0.97 (0.30,0.34) 0.29 −0.029 3.29 (0.26,0.32) 0.32 0.32 −0.001 0.76 (0.30,0.33) 0.31 −0.007 0.89 (0.30,0.32) 0.01 0.01 0.001 0.21 (0.01,0.02) 0.01 0.001 0.11 (0.01,0.01) 2 11,151 0.32 0.32 −0.001 0.66 (0.30,0.33) 0.30 −0.012 1.47 (0.29,0.32) 1 1.00 0.003 1.91 (0.96,1.04) 0.95 −0.053 5.95 (0.89,0.99) 0.03 0.03 0.001 0.35 (0.03,0.04) 0.03 0.001 0.33 (0.03,0.04) 3 9364 0.32 0.31 −0.004 0.68 (0.30,0.32) 0.30 −0.017 1.90 (0.28,0.32) 3.16 3.16 0.002 6.56 (3.05,3.28) 2.45 −0.712 77.85 (1.78,3.92) 0.11 0.11 0.001 1.02 (0.09,0.13) 0.09 −0.017 2.12 (0.06,0.12) 4 8799 0.32 0.30 −0.013 1.40 (0.29,0.31) 0.28 −0.032 3.35 (0.27,0.30) 10 9.96 −0.042 23.03 (9.52,10.42) 3.19 −6.809 690.87 (1.52,5.91) 0.33 0.35 0.014 3.79 (0.29,0.42) 0.12 −0.210 21.49 (0.06,0.25) 5 10,962 1 1.00 −0.002 1.86 (0.96,1.04) 0.90 −0.104 12.43 (0.74,1.01) 0.32 0.32 0.000 0.72 (0.30,0.33) 0.30 −0.013 1.47 (0.29,0.32) 0.01 0.01 0.000 0.10 (0.01,0.01) 0.01 0.001 0.11 (0.01,0.01) 6 5312 1 1.00 −0.004 1.81 (0.96,1.03) 0.95 −0.054 6.44 (0.87,1.01) 1 1.00 0.001 1.76 (0.96,1.04) 0.93 −0.066 6.88 (0.89,0.97) 0.03 0.03 0.001 0.29 (0.03,0.04) 0.03 0.001 0.28 (0.03,0.04) ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1421 TABLE 2 (Continued) PSHMM PSHMM Exp Exp PSHMM PSHMM RMSE (2.5%,97.5%) Exp Exp RMSE (2.5%,97.5%) Study NF θ Mean Bias (×10−2) percentiles Mean Bias (×10−2) percentiles 7 3526 1 0.99 −0.015 2.32 (0.95,1.02) 0.95 −0.053 5.78 (0.91,0.99) 3.16 3.17 0.003 7.33 (3.01,3.30) 2.71 −0.451 46.16 (2.50,3.89) 0.11 0.11 0.001 1.06 (0.08,0.13) 0.10 −0.004 0.99 (0.08,0.12) 8 2961 1 0.97 −0.033 3.91 (0.93,1.00) 0.91 −0.095 9.75 (0.85,0.95) 10 9.94 0.003 27.21 (9.46,10.47) 5.02 0.003 504.34 (3.66,6.52) 0.33 0.35 0.017 3.94 (0.28,0.42) 0.20 −0.133 13.77 (0.14,0.27) 9 9116 3.16 3.15 −0.008 6.88 (3.04,3.29) 2.31 −0.855 95.19 (1.64,3.04) 0.32 0.31 −0.002 1.53 (0.28,0.34) 0.28 −0.037 3.74 (0.27,0.29) 0.01 0.01 0.000 0.11 (0.01,0.01) 0.01 0.001 0.13 (0.01,0.01) 10 3466 3.16 3.13 −0.035 7.47 (3.01,3.28) 2.83 −0.335 37.76 (2.49,3.06) 1 1.00 0.004 4.04 (0.90,1.07) 0.87 −0.129 13.00 (0.84,0.90) 0.03 0.03 0.001 0.37 (0.03,0.04) 0.04 0.002 0.36 (0.03,0.04) 11 1680 3.16 3.11 −0.052 9.49 (2.98,3.31) 2.92 −0.245 25.63 (2.75,3.08) 3.16 3.18 0.015 9.12 (2.99,3.37) 2.60 −0.567 56.96 (2.48,3.70) 0.11 0.11 0.002 1.21 (0.09,0.13) 0.11 −0.000 0.98 (0.09,0.13) 12 1115 3.16 3.03 −0.135 14.73 (2.92,3.15) 2.79 −0.377 38.19 (2.66,3.92) 10 9.99 −0.008 24.86 (9.54,10.48) 6.35 −3.648 365.97 (5.72,6.93) 0.33 0.35 0.015 3.92 (0.29,0.44) 0.27 −0.061 6.79 (0.22,0.33) ----!@#$NewPage!@#$---- 1422 L. PATEL ET AL. TABLE 2 (Continued) PSHMM PSHMM Exp Exp PSHMM PSHMM RMSE (2.5%,97.5%) Exp Exp RMSE (2.5%,97.5%) Study NF θ Mean Bias (×10−2) percentiles Mean Bias (×10−2) percentiles 13 8532 10 9.93 −0.069 25.64 (9.47,10.47) 5.33 −4.666 506.70 (1.98,8.60) 0.32 0.32 0.000 4.42 (0.24,0.37) 0.22 −0.099 9.95 (0.21,0.23) 0.01 0.01 0.000 0.09 (0.01,0.01) 0.01 0.001 0.10 (0.01,0.01) 14 2882 10 9.86 −0.142 29.52 (9.44,10.39) 7.75 −2.246 241.85 (5.53,8.71) 1 1.03 0.026 10.53 (0.78,1.16) 0.68 −0.323 32.37 (0.64,0.71) 0.03 0.03 0.001 0.36 (0.03,0.04) 0.04 0.001 0.37 (0.03,0.04) 15 1096 10 9.73 −0.266 38.40 (9.22,10.34) 8.19 −1.814 184.84 (7.38,8.66) 3.16 3.21 0.049 20.73 (2.45,3.47) 2.05 −1.108 110.97 (1.97,3.16) 0.11 0.11 0.004 1.22 (0.09,0.13) 0.11 0.004 1.04 (0.09,0.13) 16 531 10 9.50 −0.501 55.72 (9.10,9.96) 7.93 −2.072 207.90 (7.55,8.22) 10 9.91 −0.095 54.47 (9.02,10.88) 5.63 −4.368 436.96 (5.40,5.89) 0.33 0.34 0.007 4.51 (0.26,0.43) 0.30 −0.029 4.06 (0.26,0.36) ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1423 TABLE 3 Simulation results showing mean, bias, root mean squared error (RMSE) and the 2.5 and 97.5 empirical percentiles of the estimates of θ = (λ001 λ01 λ 011 λ10 μ1)⊤ under model M1 {1} for both the PSHMM and exponential fitting (Exp) methods across 100 repeat experiments. � = 1 30 s, δ = 0.01,ω = 0 (unknown), the number of emitters NE = 100. NF indicates the number of frames simulated PSHMM PSHMM Exp Exp PSHMM PSHMM RMSE (2.5%,97.5%) Exp Exp RMSE (2.5%,97.5%) Study NF θ Mean Bias (×10−2) percentiles Mean Bias (×10−2) percentiles 17 11,151 0.15 0.15 0.002 1.69 (0.12,0.19) 0.15 −0.004 1.66 (0.11,0.19) 0.3 0.30 0.001 0.85 (0.28,0.32) 0.30 −0.002 0.84 (0.28,0.31) 0.1 0.10 0.000 0.43 (0.09,0.11) 0.10 0.002 0.45 (0.10,0.11) 0.80 0.80 −0.001 1.28 (0.78,0.82) 0.76 −0.039 4.12 (0.74,0.79) 0.01 0.01 0.000 0.15 (0.01,0.01) 0.02 0.010 0.97 (0.02,0.02) 18 9364 0.35 0.36 0.005 5.44 (0.24,0.43) 0.33 −0.022 5.32 (0.24,0.43) 1 1.00 0.003 3.68 (0.94,1.07) 0.95 −0.049 5.83 (0.90,1.01) 0.3 0.30 −0.002 2.01 (0.26,0.34) 0.29 −0.008 2.12 (0.25,0.33) 2.30 2.30 −0.003 5.01 (2.21,2.39) 2.04 −0.262 26.48 (1.95,2.11) 0.10 0.10 0.002 0.98 (0.09,0.12) 0.10 −0.005 1.03 (0.08,0.11) 19 7000 2 2.03 0.033 18.14 (1.75,2.45) 2.16 0.156 21.25 (1.89,3.50) 10 9.78 −0.218 54.49 (8.55,10.53) 6.94 −3.061 306.69 (6.59,7.34) 0.7 0.71 0.011 4.88 (0.64,0.83) 0.67 −0.031 4.85 (0.60,0.76) 10 10.00 0.002 63.62 (9.22,11.65) 4.97 −5.030 503.14 (4.75,5.17) 0.33 0.34 0.005 7.29 (0.20,0.56) 0.27 −0.068 7.30 (0.22,0.32) ----!@#$NewPage!@#$---- 1424 L. PATEL ET AL. TABLE 4 Simulation results showing mean, bias, root mean squared error (RMSE) and the 2.5 and 97.5 empirical percentiles of the estimates of θ = (λ001 λ01 λ0102 λ011 λ021 λ10 μ1)⊤ under model M2 {1} for both the PSHMM and exponential fitting (Exp) methods across 100 repeat experiments. � = 1 30 s, δ = 0.01,ω = 0 (unknown), the number of emitters NE = 100. NF indicates the number of frames simulated PSHMM PSHMM Exp Exp PSHMM PSHMM RMSE (2.5%,97.5%) Exp Exp RMSE (2.5%,97.5%) Study NF θ Mean Bias (×10−2) percentiles Mean Bias (×10−2) percentiles 20 7000 2 2.03 0.032 3.14 (1.75,2.37) 2.05 0.054 2.12 (1.79,3.31) 10 9.85 −0.153 13.42 (9.20,10.49) 7.04 −2.958 878.82 (6.68,7.48) 0.2 0.21 0.009 0.10 (0.16,0.27) 0.18 −0.024 0.11 (0.13,0.21) 0.7 0.69 −0.012 0.37 (0.59,0.83) 0.66 −0.037 0.35 (0.57,0.75) 0.01 0.01 −0.001 0.00 (0.01,0.01) 0.01 0.005 0.02 (0.01,0.02) 10 9.63 −0.368 35.57 (8.73,10.53) 4.91 −5.087 2588.85 (4.67,5.16) 0.33 0.32 −0.009 0.32 (0.24,0.45) 0.32 −0.013 0.12 (0.26,0.38) ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1425 TABLE 5 A description of the Alexa Fluor 647 datasets with reference to the laser intensities in kW/cm2 and frames sampled per second (or �−1) measured in s−1 used to characterize each of the 27 experiments. The NF × NE size of each dataset is also included Dataset Laser intensity �−1 NE NF Dataset Laser intensity �−1 NE NF Dataset Laser intensity �−1 NE NF 1 1.0 200 275 49,796 10 16 200 292 39,703 19 62 800 443 29,107 2 1.9 200 259 49,533 11 16 800 305 29,074 20 62 800 425 29,551 3 3.9 200 335 49,815 12 16 800 290 29,145 21 62 800 425 29,426 4 3.9 200 393 39,758 13 31 800 617 29,059 22 62 800 398 28,989 5 7.8 200 340 39,721 14 31 800 534 29,778 23 97 800 454 29,191 6 7.8 800 244 29,418 15 31 800 515 29,179 24 97 800 440 29,198 7 7.8 800 230 29,257 16 31 800 493 29,400 25 97 800 436 29,270 8 7.8 800 230 29,438 17 31 800 456 29,071 26 97 800 422 29,295 9 16 800 437 29,467 18 62 800 554 29,327 27 97 800 414 29,218 ----!@#$NewPage!@#$---- 1426 L. PATEL ET AL. Acknowledgments. The authors would like to thank Prof Joerg Bewersdorf, Department of Cell Biology, Yale University, for his help in making the Alexa Fluor 647 data available, and are most grateful to the Editor, Associate Editor and referees for their insightful comments and discussion. SUPPLEMENTARY MATERIAL Supplementary materials (DOI: 10.1214/19-AOAS1240SUPPA; .pdf). The Supplementary Materials supporting this paper contains detailed proofs and derivations regarding our method, discussions on its implementation and further simulation studies, including exact details on the image analysis. Code and data (DOI: 10.1214/19-AOAS1240SUPPB; .zip). MATLAB code and imaging data sets used for the algorithms presented in this paper, can be found at https://github.com/eakcohen/photoswitching. REFERENCES BAUM, L. E. and EAGON, J. A. (1967). An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology. Bull. Amer. Math. Soc. 73 360–363. MR0210217 BAUM, L. E. and PETRIE, T. (1966). Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 37 1554–1563. MR0202264 BAUM, L. E. and SELL, G. R. (1968). Growth transformations for functions on manifolds. Pacific J. Math. 27 211–227. MR0234494 BAUM, L. E., PETRIE, T., SOULES, G. and WEISS, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 41 164– 171. MR0287613 BETZIG, E., PATTERSON, G. H., SOUGRAT, R., LINDWASSER, O. W., OLENYCH, S., BONIFA- CINO, J. S., DAVIDSON, M. W., LIPPINCOTT-SCHWARTZ, J. and HESS, H. F. (2006). Imaging intracellular fluorescent proteins at nanometer resolution. Science 313 1642–1645. CHING, W., FUNG, E. and NG, M. (2003). Higher-order hidden Markov models with applications to DNA sequences. In Intelligent Data Engineering and Automated Learning. IDEAL 2003 (J. Liu, Y. Cheung and H. Yin, eds.). Lecture Notes in Computer Science 2690 535–539. Springer, Berlin, Heidelberg. COLQUHOUN, D., HATTON, C. J. and HAWKES, A. G. (2003). The quality of maximum likelihood estimates of ion channel rate constants. J. Physiol. 547 699–728. COLQUHOUN, D. and HAWKES, A. G. (1981). On the stochastic properties of single ion channels. Proc. R. Soc. Lond., B Biol. Sci. 211 205–235. COLQUHOUN, D., HAWKES, A. G. and SRODZINSKI, K. (1996). Joint distributions of apparent open and shut times of single-ion channels and maximum likelihood fitting of mechanisms. Phi- los. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 354 2555–2590. COX, S., ROSTEN, E., MONYPENNY, J., JOVANOVIC-TALISMAN, T., BURNETTE, D. T., LIPPINCOTT-SCHWARTZ, J., JONES, G. E. and HEINTZMANN, R. (2011). Bayesian localiza- tion microscopy reveals nanoscale podosome dynamics. Nat. Methods 9 195–200. DEMPSEY, G. T., VAUGHAN, J. C., CHEN, K. H., BATES, M. and ZHUANG, X. (2011). Evalua- tion of fluorophores for optimal performance in localization-based super-resolution imaging. Nat. Methods 8 1027–1036. ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1427 DU PREEZ, J. (1998). Efficient training of high-order hidden Markov models using first-order rep- resentations. Comput. Speech Lang. 12 23–39. EFRON, B. and TIBSHIRANI, R. J. (1993). An Introduction to the Bootstrap. Monographs on Statis- tics and Applied Probability 57. CRC Press, New York. MR1270903 EPSTEIN, M., CALDERHEAD, B., GIROLAMI, M. A. and SIVILOTTI, L. (2016). Bayesian statistical inference in ion-channel models with exact missed event correction. Biophys. J. 111 333–348. GREENFELD, M., PAVLICHIN, D. S., MABUCHI, H. and HERSCHLAG, D. (2015). Single molecule analysis research tool (SMART): An integrated approach for analysing single molecule data. PLoS ONE 7 e30024. HA, T. and TINNEFELD, P. (2012). Photophysics of fluorescent probes for single-molecule bio- physics and super-resolution imaging. Annu. Rev. Phys. Chem. 63 595–617. HAWKES, A. G., JALALI, A. and COLQUHOUN, D. (1990). The distributions of the apparent open times and shut times in a single channel record when brief events cannot be detected. Philos. Trans. R. Soc. Lond. Ser. A 332 511–538. MR1084721 HAWKES, A. G., JALALI, A. and COLQUHOUN, D. (1992). Asymptotic distributions of apparent open times and shut times in a single channel record allowing for the omission of brief events. Philos. Trans. R. Soc. Lond. B, Biol. Sci. 337 383–404. HEILEMANN, M., VAN DE LINDE, S., SCHÜTTPELZ, M., KASPER, R., SEEFELDT, B., MUKHER- JEE, A., TINNEFELD, P. and SAUER, M. (2008). Subdiffraction—resolution fluorescence imag- ing with conventional fluorescent probes. Angew. Chem. Int. Ed. 47 6172–6176. HESS, S. T., GIRIRAJAN, T. P. K. and MASON, M. D. (2006). Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. 91 4258–4272. HUANG, B., BATES, M. and ZHUANG, X. (2009). Super-resolution fluorescence microscopy. Annu. Rev. Biochem. 78 993–1016. JACQUEZ, J. A. and GREIF, P. (1985). Numerical parameter identifiability and estimability: Inte- grating identifiability, estimability, and optimal sampling design. Math. Biosci. 77 201–227. JUNGMANN, R., STEINHAUER, C., SCHEIBLE, M., KUZYK, A., TINNEFELD, P. and SIMMEL, F. C. (2010). Single-molecule kinetics and super-resolution microscopy by fluorescence imaging of transient binding on DNA origami. Nano Lett. 10 4756–4761. LEE, L.-M. and LEE, J.-C. (2006). A study on high-order hidden Markov models and applications to speech recognition. In Advances in Applied Artificial Intelligence. IEA/AIE 2006 (M. Ali and R. Dapoigny, eds.). Lecture Notes in Computer Science 4031 682–690. Springer, Berlin, Heidelberg. LEE, S. H., SHIN, J. Y., LEE, A. and BUSTAMANTE, C. (2012). Counting single photoactivatable fluorescent molecules by photoactivated localization microscopy (PALM). Proc. Natl. Acad. Sci. USA 109 17436–17441. LEHMANN, M., LICHTNER, G., KLENZ, H. and SCHMORANZER, J. (2016). Novel organic dyes for multicolor localization-based super-resolution microscopy. J. Biophotonics 9 161–170. LEVINSON, S. E., RABINER, L. R. and SONDHI, M. M. (1983). An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. Bell Syst. Tech. J. 62 1035–1074. MR0702893 LIN, Y., LONG, J. J., HUANG, F., DUIM, W. C., KIRSCHBAUM, S., ZHANG, Y., SCHROEDER, L. K., REBANE, A. A., VELASCO, M. G. M. et al. (2015). Quantifying and optimizing single- molecule switching nanoscopy at high speeds. PLoS ONE 10 e0128135. LITTLE, M. P., HEIDENREICH, W. F. and LI, G. (2010). Quantifying and optimizing single- molecule switching nanoscopy at high speeds. PLoS ONE 5 e8915. LIU, Y.-Y., LI, S., LI, F., SONG, L. and REHG, J. (2015). Efficient learning of continuous-time hidden Markov models for disease progression. In NIPS Proceedings 3600–3608. MACDONALD, I. L. and ZUCCHINI, W. (1997). Hidden Markov and Other Models for Discrete- Valued Time Series. Monographs on Statistics and Applied Probability 70. CRC Press, London. MR1692202 ----!@#$NewPage!@#$---- 1428 L. PATEL ET AL. MUKAMEL, E., BABCOCK, H. and ZHUANG, X. (2012). Statistical deconvolution for superresolu- tion fluorescence microscopy. Biophys. J. 102 2391–2400. NIEUWENHUIZEN, R. P. J., BATES, M., SZYMBORSKA, A., LIDKE, K. A., RIEGER, B. and STALLINGA, S. (2015). Quantitative localization microscopy: Effects of photophysics and la- beling stoichiometry. PLoS ONE 10 e0127989. OBER, R. J., RAM, S. and WARD, E. S. (2004). Localization accuracy in single-molecule mi- croscopy. Biophys. J. 87 1185–1200. OBER, R., TAHMASBI, A., RAM, S., LIN, Z. and WARD, E. (2015). Quantitative aspects of single- molecule microscopy: Information-theoretic analysis of single-molecule data. IEEE Signal Pro- cess. Mag. 32 58–69. PATEL, L., GUSTAFSSON, N., LIN, Y., OBER, R., HENRIQUES, R. and COHEN, E. (2019). Sup- plement to “A hidden Markov model approach to characterizing the photo-switching behavior of fluorophores.” DOI:10.1214/19-AOAS1240SUPPA, DOI:10.1214/19-AOAS1240SUPPB. QIN, F., AUERBACH, A. and SACHS, F. (1996). Estimating single-channel kinetic parameters from idealized patch-clamp data containing missed events. Biophys. J. 70 264–280. QIN, F. and LI, L. (2004). Model-based fitting of single-channel dwell-time distributions. Biophys. J. 87 1657–1671. RABINER, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77 257–286. RAM, S., WARD, E. S. and OBER, R. J. (2013). A stochastic analysis of distance estimation ap- proaches in single molecule microscopy: Quantifying the resolution limits of photon-limited imaging systems. Multidimens. Syst. Signal Process. 24 503–542. MR3041619 RIEF, M., ROCK, R. S., MEHTA, A. D., MOOSEKER, M. S., CHENEY, R. E. and SPUDICH, J. A. (2000). Myosin-V stepping kinetics: A molecular model for processivity. Proc. Natl. Acad. Sci. USA 97 9482–9486. RIEGER, B. and STALLINGA, S. (2014). The lateral and axial localization uncertainty in super- resolution light microscopy. ChemPhysChem 15 664–670. ROLLINS, G. C., SHIN, J. Y., BUSTAMANTE, C. and PRESSÉ, S. (2014). Stochastic approach to the molecular counting problem in superresolution microscopy. Proc. Natl. Acad. Sci. USA 112 110–118. ROTHENBERG, T. J. (1971). Identification in parametric models. Econometrica 39 577–591. MR0436944 RUST, M. J., BATES, M. and ZHUANG, X. (2006). Sub-diffraction-limit imaging by stochastic op- tical reconstruction microscopy (STORM). Nat. Methods 793–795. SAGE, D., KIRSHNER, H., PENGO, T., STUURMAN, N., MIN, J., MANLEY, S. and USHER, M. (2015). Quantitative evaluation of software packages for single-molecule localization microscopy. Nat. Methods 12 717–724. SHARONOV, A. and HOCHSTRASSER, R. M. (2006). Wide-field subdiffraction imaging by accumu- lated binding of diffusing probes. Proc. Natl. Acad. Sci. USA 103 18911–18916. THOMPSON, R. E., LARSON, D. R. and WEBB, W. W. (2002). Precise nanometer localization analysis for individual fluorescent probes. Biophys. J. 82 2775–2783. VAN DE LINDE, S. and SAUER, M. (2014). How to switch a fluorophore: From undesired blinking to controlled photoswitching. Chem. Soc. Rev. 43 1076–1087. VAN DE LINDE, S., WOLTER, S., HEILEMANN, M. and SAUER, M. (2010). The effect of photo- switching kinetics and labeling densities on super-resolution fluorescence imaging. J. Biotechnol. 149 260–266. VOGELSANG, J., STEINHAUER, C., FORTHMANN, C., STEIN, I. H., PERSON-SKEGRO, B., CORDES, T. and TINNEFELD, P. (2010). Make them blink: Probes for super-resolution mi- croscopy. ChemPhysChem 11 2475–2490. ----!@#$NewPage!@#$---- CHARACTERIZING PHOTO-SWITCHING FLUOROPHORES 1429 L. PATEL E. COHEN DEPARTMENT OF MATHEMATICS IMPERIAL COLLEGE LONDON LONDON, SW7 2AZ UNITED KINGDOM E-MAIL: lp1611@ic.ac.uk e.cohen@imperial.ac.uk N. GUSTAFSSON MRC LABORATORY FOR MOLECULAR CELL BIOLOGY UNIVERSITY COLLEGE LONDON LONDON, WC1E 6BT UNITED KINGDOM E-MAIL: nils.gustafsson.13@alumni.ucl.ac.uk Y. LIN CELL BIOLOGY AND BIOPHYSICS EUROPEAN MOLECULAR BIOLOGY LABORATORY 69117 HEIDELBERG GERMANY E-MAIL: yu.lin@embl.de R. OBER CENTRE FOR CANCER IMMUNOLOGY FACULTY OF MEDICINE UNIVERSITY OF SOUTHAMPTON SOUTHAMPTON UNITED KINGDOM AND DEPARTMENT OF BIOMEDICAL ENGINEERING TEXAS A&M UNIVERSITY COLLEGE STATION, TEXAS 77843 USA E-MAIL: raimund.ober@tamu.edu R. HENRIQUES MRC LABORATORY FOR MOLECULAR CELL BIOLOGY UNIVERSITY COLLEGE LONDON LONDON, WC1E 6BT UNITED KINGDOM AND FRANCIS CRICK INSTITUTE LONDON, NW1 1AT UNITED KINGDOM E-MAIL: r.henriques@ucl.ac.uk