NanoPyx: super-fast bioimage analysis powered by adaptive machine learning

Bruno M. Saraiva1,*, Inês M. Cunha1,2,*, António D. Brito1,3,*, Gautier Follain4, Raquel Portela3, Robert Haase5, Pedro M. Pereira3, Guillaume Jacquemet4,6,7,8, and Ricardo Henriques1,9

*These authors contributed equally to this work.
1Instituto Gulbenkian de Ciência, Oeiras, Portugal
2Instituto Superior Técnico, Lisboa, Portugal
3Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Oeiras, Portugal
4Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
5DFG Cluster of Excellence “Physics of Life”, TU Dresden, Dresden, Germany
6Turku Bioimaging, University of Turku and Åbo Akademi University, FI-20520 Turku, FI
7Faculty of Science and Engineering, Cell Biology, Åbo Akademi University, Turku, Finland
8InFLAMES Research Flagship Center, Åbo Akademi University, FI-20520, Turku, FI
9MRC-Laboratory for Molecular Cell Biology, University College London, London, UK

To overcome the challenges posed by large and complex microscopy datasets, we have developed NanoPyx, an adaptive bioimage analysis framework designed for high-speed processing. At the core of NanoPyx is the Liquid Engine, an agent-based machine-learning system that predicts acceleration strategies for image analysis tasks. Unlike traditional single-algorithm methods, the Liquid Engine generates multiple CPU and GPU code variations using a meta-programming system, creating a competitive environment where different algorithms are benchmarked against each other to achieve optimal performance in the user's computational environment. In initial experiments focusing on super-resolution analysis methods, the Liquid Engine demonstrated a more than 10-fold computational speed improvement by accurately predicting the ideal scenarios to switch between algorithmic implementations.
NanoPyx is accessible to users through a Python library, code-free Jupyter notebooks, and a napari plugin, making it suitable for individuals regardless of their coding proficiency. Furthermore, the optimisation principles embodied by the Liquid Engine have broader implications, extending their applicability to various high-performance computing fields.

microscopy | super-resolution | bioimage analysis | high-performance

Correspondence: rjhenriques@igc.gulbenkian.pt

[Figure 1 panel labels: Registration (Drift Correction, Channel Registration); Radial Fluctuations (SRRF, eSRRF); Quality Control (FRC, Error Map, Decorrelation).]
Fig. 1. Schematic representation of the NanoPyx framework. NanoPyx is a Python framework for super-resolution microscopy images. It uses the Liquid Engine for self-tuning high performance. Currently, NanoPyx offers methods in the Image Registration (1), Radial Fluctuations (2), and Quality Control (3) categories.

Introduction

Super-resolution microscopy has revolutionised cell biology by enabling fluorescence imaging at an unprecedented resolution (4–7). However, the data collected from super-resolution experiments require specific analytical procedures, such as drift correction, channel alignment, resolution enhancement, and quantification of data quality and resolution. Many of these procedures rely on open-source image analysis software, particularly ImageJ (8) or Fiji (9), and associated plugins such as ThunderSTORM (10), Picasso (11), FairSIM (12), Fourier Ring Correlation (FRC) (13), and Decorrelation Analysis (14). The computational performance of these methods has significant implications for processing time, which becomes especially salient given the increasing need for high-performance computing in bioimage analysis. Computational performance has emerged as a significant bottleneck with the expanding adoption of super-resolution microscopy and the consequent upscaling of datasets (number, size, and complexity).
This has highlighted the need for a shift towards a more performance-centric approach in managing increasingly extensive datasets and addressing the limitations currently experienced in (super-resolution) microscopy methodologies.

Here, we introduce NanoPyx, a high-performance and adaptive bioimage analysis framework. NanoPyx is not only a Python library but also provides code-free Jupyter notebooks (15) and a napari (16) plugin. At the core of NanoPyx is the Liquid Engine, an agent-based machine-learning system that predicts acceleration strategies for image analysis tasks. To enhance its image analysis capabilities, the Liquid Engine uses multiple variations (here referred to as implementations) of the same algorithm to perform a specific task. Although these implementations provide numerically identical output for the same input, their computational performance differs because they exploit different computational strategies. OpenMP is used for parallelising the code at the CPU level, while OpenCL (17) is used at the GPU level. The Liquid Engine then employs a specially created machine-learning system to predict the optimal combination of implementations based on the user's specific computational environment. This creates a competitive setting wherein the various algorithm implementations are benchmarked against each other to achieve the highest performance.

Henriques et al. | bioRχiv | August 13, 2023
bioRxiv preprint doi: https://doi.org/10.1101/2023.08.13.553080; this version posted August 14, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

One of the strengths of NanoPyx is its modular design, which enables users to easily include NanoPyx as part of new methods and algorithms. In its initial iteration, NanoPyx enhances and expands the super-resolution analysis methods previously included in the NanoJ library while introducing a new Python implementation of the Decorrelation Analysis method (14). NanoJ is an extensive suite of ImageJ plugins custom-built for the domain of super-resolution microscopy. Notable methods exploiting NanoJ (1) include NanoJ-SRRF (2) and NanoJ-eSRRF (18), which generate super-resolution reconstructions from diffraction-limited image sequences, and NanoJ-SQUIRREL (3), which provides image quality and resolution analysis. Bringing the adaptability of the Liquid Engine into these methods allows NanoPyx to overcome many limitations of NanoJ and other modern bioimage analysis packages. By providing a flexible framework, we can ensure that both new and old image analysis pipelines remain accessible regardless of the user's hardware, with further performance enhancement. Furthermore, we can leverage this flexibility and use it in conjunction with other Python libraries and tools. This is particularly valuable as many methods increasingly rely on Python-based deep learning techniques, making the use of current analysis frameworks (such as NanoJ (1)) prohibitive in specific scenarios.

As part of NanoPyx, users can access critical features, including drift correction (1), channel registration (1), SRRF (2) and eSRRF (18), error map calculation as per NanoJ-SQUIRREL (3), Fourier Ring Correlation (FRC) (13), and Image Decorrelation Analysis (14) (1). To make NanoPyx accessible to users with different levels of coding expertise, we offer it through three separate avenues: as a Python library for developers with the skills to create their own workflow scripts; as Jupyter Notebooks (15) that can be executed either on a local machine or on a cloud-based service like Google Colaboratory; and as a plugin for napari (16), a Python-based image viewer, for users without programming experience.
By distributing NanoPyx in this manner, we can cater to the needs of a wider audience, ensuring that users of varying coding expertise have easy access to NanoPyx and can effectively use it for their bioimage analysis needs.

Results

NanoPyx's Liquid Engine

Pure Python code often runs on a single CPU core, limiting the performance and speed of Python frameworks. Alternative solutions, such as Cython (20), PyOpenCL (21) and Numba (22), permit parallelisation on the CPU and GPU, enabling considerable computational acceleration (Supplementary Note 1). However, which implementation is fastest depends substantially on the input data and the available hardware. For instance, Figure 2 presents a case where, for smaller inputs, threaded CPU processing completes an interpolation task over 10x faster than a GPU. However, the situation reverses with increasing input size, where GPU-based processing proves to be the more efficient alternative.

[Figure 2 panels: Benchmark Laptop and Professional Workstation; recorded run times of CPU and GPU implementations for analysed input data of shapes (10, 10, 10), (10, 300, 300) and (500, 300, 300).]
Fig. 2. Comparative run times of multiple implementations of an algorithm, run on either a consumer-grade laptop or a professional workstation. The fastest (rabbit) and slowest (snail) implementations depend on the shape of the input data and the user device. The underlying task carried out is a 5x frame-wise bicubic (19) upscaling. Different implementations are represented as processing chips with different colours. Both the threaded (blue chip) and unthreaded (white chip with orange core) CPU implementations use optimised Cython code, while the GPU implementation (pink chip) is done through OpenCL.
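The dependence of the winning implementation on input shape is easy to probe with a small timing harness. The sketch below is illustrative only: `upscale_loop` and `upscale_vectorised` are hypothetical stand-ins (a 5x nearest-neighbour upscaling rather than the bicubic task of Figure 2, and not NanoPyx code) that return identical arrays, and `time_implementations` records the best-of-N wall-clock time of each so that the fastest can be read off per input shape.

```python
import time
from typing import Callable, Dict

import numpy as np


def upscale_loop(stack: np.ndarray, factor: int = 5) -> np.ndarray:
    """Hypothetical variant A: nearest-neighbour upscaling with explicit Python loops."""
    n, h, w = stack.shape
    out = np.empty((n, h * factor, w * factor), dtype=stack.dtype)
    for f in range(n):
        for y in range(h):
            for x in range(w):
                # Fill the factor x factor block that corresponds to input pixel (y, x).
                out[f, y * factor:(y + 1) * factor, x * factor:(x + 1) * factor] = stack[f, y, x]
    return out


def upscale_vectorised(stack: np.ndarray, factor: int = 5) -> np.ndarray:
    """Hypothetical variant B: the same result using np.repeat along height and width."""
    return np.repeat(np.repeat(stack, factor, axis=1), factor, axis=2)


IMPLEMENTATIONS: Dict[str, Callable[[np.ndarray], np.ndarray]] = {
    "loop": upscale_loop,
    "vectorised": upscale_vectorised,
}


def time_implementations(stack: np.ndarray, repeats: int = 3) -> Dict[str, float]:
    """Best-of-`repeats` wall-clock time, in seconds, of every registered implementation."""
    timings = {}
    for name, fn in IMPLEMENTATIONS.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(stack)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for shape in [(10, 10, 10), (10, 60, 60)]:
        data = rng.random(shape, dtype=np.float32)
        timings = time_implementations(data)
        print(shape, "fastest:", min(timings, key=timings.get))
```

With these two toy variants the vectorised one is likely to win at every size; the point of the harness is that the comparison is made empirically, per input shape and per machine, rather than assumed.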
In Supplementary Figure S1, a comprehensive analysis contrasts the execution times of OpenCL with those of the other implemented run-time methods across diverse hardware configurations, including Google Colaboratory. The results not only reaffirm the identified correlation between the fastest implementation and input data size, but also reveal the hardware-dependent breakpoints at which a given implementation surpasses the performance of its alternatives. Supplementary Figures S2, S3 and S4 further illustrate these observations by showing run times for various implementations across distinct input datasets and parameters on two contrasting hardware setups. The benchmark used features a 2D convolution with varying kernel sizes. The professional workstation results align with expectations: the OpenCL implementation was markedly faster as input image size and kernel size increased. However, this was not mirrored on a laptop device (Supplementary Figure S2). On the laptop, larger kernel sizes boosted OpenCL's relative efficiency against CPU threading, but expanding the image size beyond a certain threshold made the parallelised CPU approach faster again. This outcome is likely tied to the test laptop (MacBook Air M1) lacking a dedicated GPU, demonstrating how closely run times depend on the specific user hardware.
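The convolution benchmark can be mimicked with two implementations that return the same numbers but scale differently: a direct sum of shifted copies, whose cost grows with kernel area, and an FFT-based product, whose cost is roughly independent of kernel size. This is a minimal NumPy sketch under our own assumptions (a kernel filled with ones and a 'same'-size output with zero padding); `box_kernel`, `convolve_direct` and `convolve_fft` are hypothetical names, not the NanoPyx, Cython or OpenCL code timed in Supplementary Figures S2 to S4.

```python
import numpy as np


def box_kernel(size: int) -> np.ndarray:
    """Square kernel filled with ones, as used in the convolution benchmark."""
    return np.ones((size, size), dtype=np.float64)


def convolve_direct(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Same'-size 2D convolution as a sum of shifted, weighted copies of the zero-padded image.

    Work grows with image.size * kernel.size.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    py, px = kh // 2, kw // 2
    padded = np.pad(image.astype(np.float64), ((py, kh - 1 - py), (px, kw - 1 - px)))
    flipped = kernel[::-1, ::-1]  # convolution (not correlation) flips the kernel
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(kh):
        for dx in range(kw):
            out += flipped[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def convolve_fft(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same result via zero-padded real FFTs; work is nearly independent of kernel size."""
    h, w = image.shape
    kh, kw = kernel.shape
    fh, fw = h + kh - 1, w + kw - 1  # full linear-convolution size, so there is no wrap-around
    spectrum = np.fft.rfft2(image, s=(fh, fw)) * np.fft.rfft2(kernel, s=(fh, fw))
    full = np.fft.irfft2(spectrum, s=(fh, fw))
    oy, ox = kh - 1 - kh // 2, kw - 1 - kw // 2  # crop offset matching convolve_direct
    return full[oy:oy + h, ox:ox + w]
```

Timing both functions while increasing the kernel size should show the direct version slowing roughly in proportion to the kernel area while the FFT version barely changes; where the two curves cross depends on the machine, which is the kind of breakpoint discussed above.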
This disparity in results underlines how relying on a single implementation can prove restrictive; for instance, choosing the OpenCL implementation for smaller images could increase the run time by up to 300 times compared to CPU processing. Similarly, threaded CPU processing of larger images performed up to 10x slower than GPU processing on the professional workstation. Collectively, these findings stress the importance of an adaptable system that selects the optimal implementation based not only on the input data but also on the user's hardware configuration.

To address this, we have developed the Liquid Engine. This machine-learning-based system manages multiple tasks by exploiting various device components and selecting the most efficient implementation based on the input data (Supplementary Note 2). The Liquid Engine can significantly enhance computational speed for tasks involving input data of varying sizes. It achieves this by predicting when to switch between algorithmic implementations, as depicted in Supplementary Figures S1 and S2, showing the capacity for a 10x acceleration. The black dotted line in Supplementary Figure S2 indicates when the switch between implementations occurs. The Liquid Engine features three main components: meta-programming tools for multi-hardware implementation (called tag2tag and c2cl, see Supplementary Note 3); an automatic benchmarking system for the different implementations; and a supervisor machine-learning-based agent that determines the ideal combination of implementations to maximise performance (Figure 3).

The tag2tag tool enables developers to generate multiple implementations of the same algorithm written as C or Python code snippets. Effectively, tag2tag transcribes these snippets into single-threaded and multi-threaded versions of the code, which are then generally called through Cython (20).
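To give a feel for this meta-programming step, the toy generator below expands one annotated loop into an unthreaded and a threaded Cython variant. The `$name`/`$loop` placeholder syntax, `SNIPPET`, `VARIANTS` and `generate_variants` are hypothetical and do not reproduce the actual tag2tag format; only `cython.parallel.prange` is a real Cython construct, and the generated module would still need to be compiled with OpenMP enabled for it to run in parallel.

```python
from string import Template
from typing import Dict

# Hypothetical annotated snippet: $name and $loop mark the places that differ between variants.
SNIPPET = Template(
    "def _run_$name(float[:] src, float[:] dst):\n"
    "    cdef Py_ssize_t i\n"
    "    with nogil:\n"
    "        for i in $loop:\n"
    "            dst[i] = 2 * src[i]\n"
)

# Loop headers for each variant; prange is Cython's OpenMP-backed parallel range.
VARIANTS: Dict[str, str] = {
    "unthreaded": "range(src.shape[0])",
    "threaded": 'prange(src.shape[0], schedule="static")',
}

HEADER = "# cython: language_level=3\nfrom cython.parallel import prange\n\n"


def generate_variants(snippet: Template = SNIPPET, variants: Dict[str, str] = VARIANTS) -> str:
    """Return the text of a .pyx module containing one function per variant."""
    bodies = [snippet.substitute(name=name, loop=loop) for name, loop in variants.items()]
    return HEADER + "\n\n".join(bodies)


if __name__ == "__main__":
    print(generate_variants())
```

Keeping the algorithm body in one place and varying only the loop construct is what guarantees that the generated variants stay numerically identical.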
In addition, the c2cl tool auto-generates GPU-based implementations from C code snippets, using OpenCL (17) called via PyOpenCL (21). The Liquid Engine also supports Numba (22) as an alternative performance-boosting option for Python code snippets. The Liquid Engine implementation in NanoPyx adapts to the user's hardware, selecting the fastest implementation available to each user and ensuring optimal computational speeds. In cases where a user does not have access to one of the implementations, the Liquid Engine ignores that implementation and picks the fastest of the remaining ones, guaranteeing that users will always be able to process their images.

Liquid Engine's adaptive optimisation

NanoPyx's Liquid Engine independently identifies ideal implementation combinations for specific workflows, taking into account device-dependent performance variations (Figure 2 and Supplementary Figures S1 and S2). Through automatic benchmarking of each implementation, the Liquid Engine keeps a historical record of run times for each implementation. Whenever a workflow is scheduled to run, the supervisor agent is responsible for selecting the optimal implementation based on the previously recorded run times.

[Figure 3 panel labels: Input, Task 1, Task 2, …, Task N, Output; CPU and GPU implementations; Data Dimensions, Run time, Record, Agent, Fastest path.]
Fig. 3. NanoPyx achieves optimal performance by exploiting the Liquid Engine's self-optimisation capabilities. The image analysis workflows of NanoPyx are built on top of the Liquid Engine, which automatically benchmarks implementations of all tasks in the specific workflow. The Liquid Engine keeps a historical record of the run times of each task and the shape of the used input, allowing a machine-learning-based agent to select the fastest combination of implementations. In the case of an unexpected delay, the agent dynamically adjusts the preferred implementations to ensure optimal performance.
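The record-keeping, selection and fallback behaviour described above can be sketched as a small class. This is a hypothetical, simplified model rather than the NanoPyx API: `RuntimeRecord` and its methods are our names, availability is a plain boolean per implementation, and the most similar benchmarked input is found by a nearest-neighbour distance on log-scaled shapes, a swapped-in stand-in for the fuzzy-logic matching described in Supplementary Note 4.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

Shape = Tuple[int, ...]


class RuntimeRecord:
    """Hypothetical run-time record: per implementation, a list of run times per input shape."""

    def __init__(self, available: Dict[str, bool]):
        self.available = available  # e.g. False for OpenCL when no compatible device exists
        self.history: Dict[str, Dict[Shape, List[float]]] = defaultdict(lambda: defaultdict(list))

    def record(self, impl: str, shape: Shape, seconds: float) -> None:
        self.history[impl][shape].append(seconds)

    @staticmethod
    def distance(a: Shape, b: Shape) -> float:
        """Euclidean distance between log-scaled shapes; infinite if the ranks differ."""
        if len(a) != len(b):
            return math.inf
        return math.sqrt(sum((math.log(x) - math.log(y)) ** 2 for x, y in zip(a, b)))

    def expected_runtime(self, impl: str, shape: Shape) -> Optional[float]:
        """Mean run time recorded for the most similar benchmarked shape, if any."""
        shapes = self.history.get(impl)
        if not shapes:
            return None
        nearest = min(shapes, key=lambda s: self.distance(s, shape))
        if math.isinf(self.distance(nearest, shape)):
            return None
        return mean(shapes[nearest])

    def choose(self, shape: Shape) -> str:
        """Fastest available implementation; never-benchmarked ones are tried first."""
        candidates = [name for name, ok in self.available.items() if ok]
        if not candidates:
            raise RuntimeError("no implementation is available on this machine")

        def sort_key(name: str) -> float:
            t = self.expected_runtime(name, shape)
            return -1.0 if t is None else t

        return min(candidates, key=sort_key)
```

Unavailable implementations are never candidates, so switching one off simply hands the choice to the fastest remaining one, mirroring the fallback behaviour described above.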
The agent can adapt to unexpected delays in any implementation (Supplementary Figure S5). If a severe delay is detected, large enough that a different implementation could become faster, the agent predicts whether the optimal implementation has changed. To do so, the agent predicts the likelihood of the delay being repeated in the future and then assigns each implementation a probability that depends on an estimate of its expected run time. For instance, if the fastest implementation for a method uses OpenCL (17) and the GPU is under heavy load, resulting in an abnormally prolonged run time that is longer than that of the second-fastest implementation, the agent activates its delay management (Supplementary Figure S6). All available implementations are then assigned a probability that is a function of their expected run time, as given by the average values measured in the past. The expected run time of the delayed implementation is adjusted based on the probability that the delay persists and on the magnitude of the measured delay itself. Therefore, in this example, the probability that the agent chooses to run using OpenCL is low, especially if the delay is continuously maintained. However, the delayed implementation always retains a non-zero probability of being chosen. Owing to this probabilistic approach, the agent will still select and use the delayed implementation from time to time, which ensures that it can detect when the delay is over.
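The sampling step can be sketched from Equations 3 and 4 in Methods. The function names are hypothetical, and `p_delay`, which the agent obtains from a logistic regression model, is simply passed in as a number here rather than modelled.

```python
import random
from typing import Dict, Optional


def selection_probabilities(
    fast_avg: Dict[str, float],
    delayed: Optional[str] = None,
    delay_factor: float = 1.0,
    p_delay: float = 0.0,
) -> Dict[str, float]:
    """Selection probability of each implementation, proportional to 1 / expected_runtime**2.

    Non-delayed implementations use their fast average as the expected run time. The delayed one
    uses fast_avg * (1 - p_delay) + fast_avg * delay_factor * p_delay, as in Equation 3.
    """
    expected = dict(fast_avg)
    if delayed is not None:
        t = fast_avg[delayed]
        expected[delayed] = t * (1 - p_delay) + t * delay_factor * p_delay
    # Normalising by the smallest expected run time leaves the ratios of Equation 4 unchanged.
    t_min = min(expected.values())
    weights = {name: (t_min / t) ** 2 for name, t in expected.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def pick(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one implementation name according to `probs`."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]
```

For example, with fast averages of 0.1 s (OpenCL) and 0.2 s (threaded) and no delay, the weights are in the ratio 4:1, so OpenCL is drawn with probability 0.8. Because every weight is strictly positive, a delayed implementation keeps a small chance of being drawn.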
Once it detects that the delay is over, the agent goes back to selecting implementations based only on the fastest average run time. In the example of Supplementary Figure S6, the Liquid Engine was able to detect an artificial delay that slowed down the OpenCL implementation. During the delay, the OpenCL implementation was used less often, but stochasticity allowed the Engine to detect the end of the delay. In this example, over the course of several sequential runs of the same method, delay management improved the average run time by a factor of 1.8 for a 2D convolution and 1.5 for an eSRRF analysis (Supplementary Figure S6).

Users can also manually initiate benchmarking, prompting the Liquid Engine to profile the execution of each implementation, using either multiple automatically generated data loads or their own input, and to identify the fastest one. This benchmarking is performed per task, allowing the Liquid Engine to adapt to the user's hardware configuration and progressively optimise the chosen combination of implementations to reduce the total run time. The system analyses similar benchmarked examples from the user's past data, using fuzzy logic (23) (see Supplementary Note 4) to identify the benchmarked example with the most similar input properties, and uses it as a baseline for the expected execution time. This enables NanoPyx to make adaptive decisions immediately from an initially limited set of benchmarked examples, progressively learning and improving its performance as more data are processed.

The NanoPyx Framework

NanoPyx is a comprehensive and extensible bioimage analysis framework providing wide-ranging methods that can cover entire microscopy image analysis workflows. In Figure 4, we showcase an example workflow where NanoPyx performs channel registration followed by drift correction to correct any chromatic aberration and drift that might occur during image acquisition.
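Drift correction of this kind is commonly based on cross-correlating each frame with a reference. The following is a minimal, integer-pixel sketch of that idea using FFT-based cross-correlation in NumPy; `estimate_shift` and `correct_drift` are our own illustrative functions, not NanoPyx's implementation, and they assume purely translational drift with periodic boundaries.

```python
from typing import Tuple

import numpy as np


def estimate_shift(reference: np.ndarray, frame: np.ndarray) -> Tuple[int, int]:
    """Integer (dy, dx) such that np.roll(reference, (dy, dx), axis=(0, 1)) best matches `frame`."""
    # Cross-correlation theorem: corr[s] = sum_x reference[x] * frame[x + s] = IFFT(conj(F_ref) * F_frame).
    corr = np.fft.ifft2(np.conj(np.fft.fft2(reference)) * np.fft.fft2(frame)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks beyond the half-way point wrap around and correspond to negative shifts.
    dy, dx = (int(p) - n if p > n // 2 else int(p) for p, n in zip(peak, corr.shape))
    return dy, dx


def correct_drift(stack: np.ndarray) -> np.ndarray:
    """Align every frame of a (t, y, x) stack to the first frame by undoing its estimated shift."""
    reference = stack[0]
    aligned = np.empty_like(stack)
    for i, frame in enumerate(stack):
        dy, dx = estimate_shift(reference, frame)
        aligned[i] = np.roll(frame, (-dy, -dx), axis=(0, 1))
    return aligned
```

Sub-pixel accuracy is usually obtained by fitting the neighbourhood of the correlation peak, which this sketch leaves out.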
Once drift correction is completed, NanoPyx enables the generation of super-resolved reconstructions using the well-established Super-Resolution Radial Fluctuations (SRRF) (2) algorithm or its improved version, eSRRF (18). Owing to its focus on performance, NanoPyx performs the same eSRRF (18) processing 2.5 times faster than NanoJ for identical input parameters. To ensure the fidelity of the reconstructions, NanoPyx incorporates rigorous quality control tools. The Error Map feature of NanoJ-SQUIRREL (3), implemented within NanoPyx, quantitatively assesses errors introduced by the reconstruction algorithm. The resolution-scaled error (RSE) (3) and resolution-scaled Pearson's correlation coefficient (RSP) (3) are calculated by comparing the diffraction-limited image stack with the diffraction-limited equivalent of the reconstruction. Furthermore, NanoPyx incorporates Fourier Ring Correlation (FRC) (13) and Image Decorrelation Analysis (14) to determine image resolution. These assessment methods enable a comprehensive quantitative evaluation of the resolution improvements achieved by the super-resolved reconstruction. NanoPyx also allows users to perform channel registration on the acquired or super-resolved images.

[Figure 4 panels: Image Registration (Input Dataset; Channel Registration; Drift Correction, with Drift and Aligned time projections); Super-Resolution (eSRRF Reconstruction); Quality Control (Error Map, RSP: 0.901, RSE: 85.795; Fourier Ring Correlation, resolution 89.6 nm; Decorrelation Analysis, resolution 86.4 nm). Channels: Septin, Microtubules, Nuclei.]
Fig. 4. Microscopy image processing workflow using NanoPyx methods. NanoPyx implements several methods of super-resolution image generation and processing. Through NanoPyx, users can correct drift that occurred during image acquisition, generate a super-resolved image using enhanced radiality fluctuations (eSRRF) (2), assess the quality of the generated image using Fourier Ring Correlation (FRC) (13) or Image Decorrelation Analysis (14), perform artifact detection using the error map and then perform channel registration in multi-channel images. NanoPyx methods are made available as a Python library, a napari plugin, and Jupyter Notebooks that can be run locally or through Google Colaboratory. Scale bars: 10 µm.

Distribution to end users

NanoPyx was developed with the primary objective of ensuring accessibility and ease of use for end users. To achieve this goal, we provide three distinct interfaces through which users can interact with and use NanoPyx. First, NanoPyx is available as a Python library (Figure 4), which can be installed via PyPI (Python Package Index) for stable releases or through our GitHub repository for the latest development versions. The Python library primarily targets developers seeking to incorporate NanoPyx's methods into their workflows. Alongside the Python library, we provide template files to help developers implement their own methods using the Liquid Engine. Second, we provide Jupyter notebooks (15) through our GitHub repository (Figure 4 and Supplementary Figure S7), with a separate notebook for each individual method.
Users of these notebooks are not required to interact with any code directly: by sequentially executing cells, a graphical user interface (GUI) is generated (24, 25), enabling users to easily fine-tune the parameters for each step. Consequently, these notebooks are specifically designed for users with limited coding expertise. Lastly, for users desiring a more interactive approach, we are concurrently developing a plugin for napari (16), a Python image viewer, granting access to all currently implemented NanoPyx methods (Figure 4 and Supplementary Figure S7). By offering these three user interfaces, we ensure that NanoPyx can be readily used irrespective of coding proficiency. NanoPyx also offers a wide range of example datasets for users to test and explore its capabilities (see Supplementary Note 5).

Discussion and Future Perspectives

NanoPyx introduces a novel approach to optimising performance for bioimage analysis through its machine-learning-powered Liquid Engine, which enables dynamic switching between implementations to maximise speed based on data and hardware. In initial experiments, NanoPyx achieved over 10x faster processing by selecting the optimal implementation. This has significant implications given the rapidly expanding scale of microscopy image datasets.

The Liquid Engine's optimisation strategy diverges from traditional approaches that rely on single algorithms or implementations.
Alternative Python tools like Transonic (26) and Dask (27), which accelerate workflows through just-in-time compilation or parallelism, do not adapt implementations based on context. In contrast, the Liquid Engine continually benchmarks and collects run-time metrics to train its decision-making model. It is this tight feedback loop that empowers dynamic optimisation in NanoPyx. Furthermore, alternative libraries such as Transonic may, in the future, help generate further implementations for the Liquid Engine to optimise and explore, cross-pollinating and benefiting both projects.

Beyond raw performance, NanoPyx also provides an accessible yet extensible framework covering key analysis steps for super-resolution data. Workflows integrate essential functions like drift correction, reconstruction, and resolution assessment. NanoPyx builds upon proven ImageJ plugins while migrating implementations to Python. The modular architecture simplifies integrating components into new or existing pipelines.

Future NanoPyx development will focus on several objectives. Incorporating advanced simulation tools will generate synthetic benchmarking data to further optimise and validate algorithms. Supporting emerging techniques like AI-assisted imaging and smart microscopes is a priority, as NanoPyx's speed is critical for real-time processing during acquisition. Expanding the algorithm library will provide users with a more comprehensive toolkit. Careful benchmarking across diverse hardware will maximise performance from laptops to cloud platforms. Enhancing usability through graphical interfaces will improve accessibility. Fostering an open-source community will help drive continual innovation.

Among these objectives, expanding support for AI-assisted imaging and smart microscopes stands out. As these methods involve processing data in real time during acquisition, NanoPyx's accelerated performance becomes critical.
We plan to leverage the Liquid Engine's auto-tuning capabilities to optimise pipelines on heterogeneous hardware. This could enable real-time AI to guide data collection, processing, and analytics. Additionally, we aim to incorporate more diverse reconstruction approaches beyond current methods like SRRF and eSRRF. We are actively testing prototypes to integrate deep learning super-resolution into NanoPyx in a modular way, allowing users a choice of classic or AI-based methods. The Liquid Engine's ability to dynamically select optimised implementations of these complex neural networks will be crucial for acceptable run times. We are also pursuing integrations with other emerging reconstruction techniques from the imaging literature.

In summary, NanoPyx delivers adaptive performance optimisation to accelerate bioimage analysis while retaining a modular design and ease of adoption. The optimisation principles embodied in its Liquid Engine could extend to other scientific workloads requiring high computational efficiency. As data scales expand, NanoPyx offers researchers an actively improving platform to execute demanding microscopy workflows.

Methods

Mammalian cell culture. A549 cells (The European Collection of Authenticated Cell Cultures (ECACC)) were cultured in phenol red-free, high-glucose, L-glutamine-containing Dulbecco's modified Eagle's medium (DMEM; Thermo Fisher Scientific) supplemented with 10% (v/v) fetal bovine serum (FBS; Sigma) and 1% (v/v) penicillin/streptomycin (Thermo Fisher Scientific) at 37 °C in a 5% CO₂ incubator.

Sample preparation for microscopy. A549 cells were seeded on a glass-bottom µ-slide 8 well (ibidi) at a density of 0.05–0.1 × 10⁶ cells/cm². After 24 h incubation at 37 °C in a 5% CO₂ incubator, cells were washed once using phosphate-buffered saline (PBS) and fixed for 20 min at 23 °C using 4% paraformaldehyde (PFA, in PBS).
After fixation, cells were washed three times using PBS (5 min each time), quenched for 10 min using a solution of 300 mM glycine (in PBS), and permeabilised using a solution of 0.2% Triton-X (in PBS) for 20 min at 23 °C. After three washes (5 min each) in washing buffer (0.05% Tween-20 in PBS), cells were blocked for 30 min in blocking buffer (5% BSA, 0.05% Tween-20 in PBS). Samples were then incubated with a mix of anti-α-tubulin (1 µg/mL of clone DM1A, Sigma; 2 µg/mL of clone 10D8, Biolegend; 2 µg/mL of clone AA10, Biolegend) and anti-septin 7 (1 µg/mL of #18991, IBL) antibodies for 16 h at 4 °C in blocking buffer. After three washes (5 min each) using the washing buffer, cells were incubated with an Alexa Fluor™ 647-conjugated goat anti-mouse IgG and an Alexa Fluor™ 555-conjugated goat anti-rabbit IgG (6 µg/mL in blocking buffer) for 1 h at 23 °C. Cells were then washed three times (5 min each) in washing buffer and once in 1X PBS for 10 min. Finally, cells were mounted in GLOX-MEA buffer (50 mM Tris, 10 mM NaCl, pH 8.0, supplemented with 50 mM MEA, 10% [w/v] glucose, 0.5 mg/ml glucose oxidase, and 40 µg/ml catalase).

Image acquisition. Data acquisition was performed with the Nanoimager microscope (Oxford Nanoimaging; ONI) equipped with a 100x oil-immersion objective (Olympus 100x NA 1.45). Imaging was performed using 405-nm, 488-nm, and 640-nm lasers for Hoechst-33342, AlexaFluor555 and AlexaFluor647 excitation, respectively. Fluorescence was detected using an sCMOS camera (ORCA Flash, 16 bit).
For channel 0, a dichroic filter with bands of 498–551 nm and 576–620 nm was used; for channel 1, a 665–705 nm dichroic filter was used. Sequential multicolour acquisition was performed for AlexaFluor647, AlexaFluor555 and Hoechst-33342. Using epi-fluorescence illumination, a pulse of high laser power (90%) of the 640-nm laser was applied, and 10 000 frames were immediately acquired. The sample was then excited with the 488-nm laser (13.7% laser power) and 500 frames were acquired, followed by 405-nm laser excitation (40% laser power) and the acquisition of another 500 frames. For all acquisitions, an exposure time of 10 ms was used.

Liquid Engine's agent. Run times of methods implemented in NanoPyx through the Liquid Engine are stored locally on the user's computer and are associated with the hardware used. For OpenCL implementations, the agent also stores an identification of the device and is capable of detecting hardware changes. Whenever a method is run through the Liquid Engine, the overseeing agent splits the 50 most recent recorded run times into two halves: one with the 25 fastest run times (fast average) and one with the 25 slowest (slow average). It then calculates the average of the 25 fastest run times for each implementation and selects the implementation with the lowest average run time. Once the method finishes running, the agent checks whether there was a delay, defined as the last run time being higher than the previously recorded average run time of the fastest runs plus four times the standard deviation of the fastest runs (Equation 1). If a delay is detected (Supplementary Figure S5), the agent also calculates the delay factor (DF, Equation 2) and activates a probabilistic approach that stochastically selects which implementation to run.
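The bookkeeping described above can be sketched in a few lines. This is an illustrative reading of the text and of Equations 1 and 2, not the NanoPyx source: `Stats`, `runtime_stats`, `is_delayed` and `delay_factor` are hypothetical names, the use of the population standard deviation is our assumption, and the multiplier `k` on the standard deviation is left as a parameter (Equation 1 writes it as 2).

```python
from statistics import mean, pstdev
from typing import List, NamedTuple


class Stats(NamedTuple):
    fast_avg: float
    fast_std: float
    slow_avg: float
    slow_std: float


def runtime_stats(runtimes: List[float], window: int = 50) -> Stats:
    """Split the `window` most recent run times into a fastest and a slowest half."""
    recent = sorted(runtimes[-window:])
    if len(recent) < 2:
        raise ValueError("need at least two recorded run times")
    half = len(recent) // 2
    fast, slow = recent[:half], recent[half:]
    # Population standard deviation (assumption: the estimator is not specified in the text).
    return Stats(mean(fast), pstdev(fast), mean(slow), pstdev(slow))


def is_delayed(measured: float, stats: Stats, k: float = 2.0) -> bool:
    """Equation 1: the last run exceeds the fast average by more than k fast standard deviations."""
    return measured > stats.fast_avg + k * stats.fast_std


def delay_factor(measured: float, stats: Stats) -> float:
    """Equation 2: ratio of the measured run time to the expected (fast-average) run time."""
    return measured / stats.fast_avg
```

The statistics are computed from the runs recorded before the latest one, so the latest measurement is compared against what the agent expected rather than against itself.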
To do so, the agent uses a Logistic Regression model to calculate the probability of the delay being present on the next run, and adjusts the expected run time of the delayed implementation according to Equation 3, while still using the fast average for all non-delayed implementations. The agent then picks which implementation to use based on probabilities assigned to each implementation from the inverse of its squared normalised expected run time (Equation 4). This stochastic approach ensures that the agent still runs the delayed implementation from time to time to check whether the delay persists. The agent decides that the delay is over once the last run time becomes smaller than the slow average minus the standard deviation of the slowest runs, or higher than the fast average plus the standard deviation of the fastest runs (Equation 5). Once the delay is over, the agent goes back to selecting which implementation to use based only on the fast average of each implementation (Supplementary Figure S6).

$$\text{Delay} = \text{Measured} > \left(\text{Expected} + 2 \times \text{Std}_{\text{fast}}\right) \tag{1}$$

$$\text{DelayFactor} = \frac{\text{Measured}}{\text{Expected}} \tag{2}$$

$$\text{Adjusted} = \text{FastAvg} \times \left(1 - P_{\text{delay}}\right) + \text{FastAvg} \times \text{DF} \times P_{\text{delay}} \tag{3}$$

$$P_{\text{selecting}} = \frac{1}{\left(\text{ExpectedRuntime}\right)^{2}} \tag{4}$$

$$\text{Delay}_{\text{end}} = \left(\text{Measured} < \text{SlowAvg} - \text{Std}_{\text{slow}}\right) \lor \left(\text{Measured} > \text{FastAvg} + \text{Std}_{\text{fast}}\right) \tag{5}$$

Run times benchmark. For the laptop benchmarks, a MacBook Air M1 with 16 GB of RAM and a 512 GB SSD was used. For the professional workstation, a custom-made desktop computer was used, containing an Intel i9-13900K, an NVIDIA RTX 4090 with 24 GB of dedicated video memory, a 1 TB SSD and 128 GB of DDR5 RAM. The first benchmark (Figure 2 and Supplementary Figure S1) was a 5× upsampling of the input data using a Catmull-Rom (19) interpolator. Benchmarks were performed on three input images with shapes 10×10×10, 10×300×300 and 500×300×300 (time points, height, width).
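To make the first benchmark concrete, the sketch below shows 5× Catmull-Rom upsampling of a 1D signal in plain Python. It is an illustration under stated assumptions, not the NanoPyx implementation: the function names are hypothetical, output sample i is assumed to sit at input coordinate i / factor, and neighbours that fall outside the signal are clamped to the edge samples. The kernel is the standard Catmull-Rom cubic (Keys' cubic convolution with a = −0.5); a 2D frame would be processed by applying it separably along rows and then columns.

```python
def catmull_rom_weight(x: float) -> float:
    """Catmull-Rom (Keys cubic, a = -0.5) interpolation kernel."""
    x = abs(x)
    if x <= 1.0:
        return 1.5 * x**3 - 2.5 * x**2 + 1.0
    if x < 2.0:
        return -0.5 * x**3 + 2.5 * x**2 - 4.0 * x + 2.0
    return 0.0


def catmull_rom_upsample_1d(signal, factor):
    """Upsample a 1D sequence by an integer factor with Catmull-Rom interpolation.

    Assumptions of this sketch: output sample i is placed at input coordinate
    i / factor, and neighbour indices outside the signal are clamped to the edge.
    """
    n = len(signal)
    out = []
    for i in range(n * factor):
        pos = i / factor
        base = int(pos)  # pos >= 0, so int() acts as floor
        value = 0.0
        for k in range(base - 1, base + 3):  # the four neighbouring input samples
            idx = min(max(k, 0), n - 1)
            value += signal[idx] * catmull_rom_weight(pos - k)
        out.append(value)
    return out
```

For a linear ramp such as `[0.0, 1.0, 2.0, 3.0, 4.0]`, every fifth output sample reproduces an input sample exactly and interior samples lie on the line (output sample 7, at coordinate 1.4, evaluates to approximately 1.4), because the four Catmull-Rom weights always sum to one.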
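The decision rules of the Liquid Engine's agent (Equations 1–5 above) can be summarised in a short Python sketch. It is not the NanoPyx agent code: the function names are hypothetical, population standard deviations are assumed, and the multiplier on the standard deviation in Equation 1 is exposed as a parameter `k` (default 2, as printed in Equation 1). The delay probability `p_delay` is taken as an input, standing in for the Logistic Regression estimate.

```python
import random
import statistics


def fast_slow_stats(runtimes, window=50):
    """Split the last `window` run times into fastest and slowest halves.

    Returns (fast_avg, fast_std, slow_avg, slow_std). Needs at least two run
    times; population standard deviations are an assumption of this sketch.
    """
    recent = sorted(runtimes[-window:])
    half = len(recent) // 2
    fast, slow = recent[:half], recent[half:]
    return (statistics.mean(fast), statistics.pstdev(fast),
            statistics.mean(slow), statistics.pstdev(slow))


def choose_fastest(history):
    """Non-delayed state: pick the run type with the lowest fast average."""
    return min(history, key=lambda name: fast_slow_stats(history[name])[0])


def is_delayed(measured, fast_avg, fast_std, k=2.0):
    """Equation 1: delayed if the last run exceeds FastAvg + k * Std_fast."""
    return measured > fast_avg + k * fast_std


def delay_factor(measured, expected):
    """Equation 2: DF = Measured / Expected."""
    return measured / expected


def adjusted_expected(fast_avg, df, p_delay):
    """Equation 3: blend the normal and delayed expectations by P(delay)."""
    return fast_avg * (1.0 - p_delay) + fast_avg * df * p_delay


def selection_probabilities(expected):
    """Equation 4: weight each run type by 1 / runtime**2, normalised to sum to 1."""
    weights = {name: 1.0 / t ** 2 for name, t in expected.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def pick_run_type(expected, rng=random):
    """Stochastically choose a run type using the Equation 4 probabilities."""
    probs = selection_probabilities(expected)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


def delay_over(measured, fast_avg, fast_std, slow_avg, slow_std):
    """Equation 5 as written: below SlowAvg - Std_slow or above FastAvg + Std_fast."""
    return measured < slow_avg - slow_std or measured > fast_avg + fast_std
```

In the non-delayed state, the agent would call `choose_fastest`; once `is_delayed` fires, the delayed run type's expectation would be replaced by `adjusted_expected(...)` and `pick_run_type` would be used until `delay_over` returns True. Because the Equation 4 weights are normalised, rescaling all expected run times by a common factor leaves the selection probabilities unchanged.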
The second set of benchmarks (Supplementary Figures S2–S4) consisted of 2D convolutions using kernels filled with ones, with varying kernel sizes (1, 5, 9, 13, 17, 21), on images of varying size (100, 500, 1000, 2500, 5000, 7500, 10000, 15000 or 20000 pixels along both dimensions).

NanoPyx comparison with NanoJ. Run times of eSRRF image processing were measured using a MacBook Air M1 with 16 GB of RAM and a 512 GB SSD. The parameters used for the analysis were the same for both NanoPyx and NanoJ: magnification, 5; radius, 1.5; sensitivity, 2; number of frames for SRRF, 1. The input image was a stack of 283 × 283 pixels and 10 000 frames. For the final output image, an average reconstruction was performed.

Availability. The NanoPyx Python library and the Jupyter Notebooks can be found in our GitHub repository: https://github.com/HenriquesLab/NanoPyx. The napari plugin implementing all NanoPyx methods can be found in a separate GitHub repository: https://github.com/HenriquesLab/napari-NanoPyx.

ACKNOWLEDGEMENTS
We express our gratitude to the previous developers of the NanoJ framework, whose work inspired this study. We also thank Loic Royer and Juan Nunez-Iglesias for their invaluable feedback and guidance in preparing this work. R.H., P.M.P. and R.P. acknowledge support from the LS4FUTURE Associated Laboratory (LA/P/0087/2020). R.H., B.M.S. and I.M.C. acknowledge the support of the Gulbenkian Foundation (Fundação Calouste Gulbenkian), the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 101001332), the European Commission through the Horizon Europe program (AI4LIFE project with grant agreement 101057970-AI4LIFE, and RT-SuperES project with grant agreement 101099654-RT-SuperES), the European Molecular Biology Organization (EMBO) Installation Grant (EMBO-2020-IG-4734) and the Chan Zuckerberg Initiative Visual Proteomics Grant (vpi-0000000044 with DOI:10.37921/743590vtudfp).
In addition, A.D.B. acknowledges the FCT 2021.06849.BD fellowship. R.H. and B.M.S. also acknowledge that this project has been made possible in part by a grant from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation (Chan Zuckerberg Initiative Napari Plugin Foundations Grants Cycle 2, NP2-0000000085). P.M.P. and R.P. acknowledge support from a Fundação para a Ciência e Tecnologia (Portugal) project grant (PTDC/BIA-MIC/2422/2020) and the MOSTMICRO-ITQB R&D Unit (UIDB/04612/2020, UIDP/04612/2020). P.M.P. acknowledges support from a La Caixa Junior Leader Fellowship (LCF/BQ/PI20/11760012) financed by "la Caixa" Foundation (ID 100010434) and by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 847648, and from a Maratona da Saúde award. This study was supported by the Academy of Finland (338537 to G.J.), the Sigrid Juselius Foundation (to G.J.), the Cancer Society of Finland (Syöpäjärjestöt; to G.J.), and the Solutions for Health strategic funding to Åbo Akademi University (to G.J.). This research was supported by the InFLAMES Flagship Programme of the Academy of Finland (decision number: 337531).

EXTENDED AUTHOR INFORMATION
• Bruno M. Saraiva: 0000-0002-9151-5477; @Bruno_MSaraiva
• Inês Martins Cunha: 0000-0002-1327-9018; @inesmcunha
• António D. Brito: 0009-0001-1769-2627; @Antonio_DBrito
• Gautier Follain: 0000-0003-0495-9529; @Follain_Ga
• Raquel Portela: 0000-0002-5559-9554; @RaquelP02997757
• Robert Haase: 0000-0001-5949-2327; @haesleinhuepf
• Pedro Matos Pereira: 0000-0002-1426-9540; @MicrobeMatos
• Guillaume Jacquemet: 0000-0002-9286-920X; @guijacquemet
• Ricardo Henriques: 0000-0002-2043-5234; @HenriquesLab

AUTHOR CONTRIBUTIONS
B.M.S., P.M.P., G.J. and R.He. conceived the study in its initial form; B.M.S., I.M.C., A.D.B. and R.He. developed the NanoPyx framework with code contributions from R.Ha. and G.J.; B.M.S., I.M.C., A.D.B. and R.He. designed the Liquid Engine optimization approach; B.M.S., I.M.C. and A.D.B. implemented the Liquid Engine tools; G.F., R.P., P.M.P. and G.J. provided samples, data, critical feedback, testing and guidance; B.M.S., I.M.C., A.D.B., G.F. and G.J. performed experiments and analysis; B.M.S., P.M.P., G.J. and R.He. acquired funding; B.M.S., P.M.P., R.Ha., G.J. and R.He. supervised the work; B.M.S., I.M.C., A.D.B., G.J. and R.He. wrote the manuscript with input from all authors.

COMPETING FINANCIAL INTERESTS
The authors declare no competing interests.

Bibliography
1. Romain F. Laine, Kalina L. Tosheva, Nils Gustafsson, Robert D. M. Gray, Pedro Almada, David Albrecht, Gabriel T. Risa, Fredrik Hurtig, Ann-Christin Lindås, Buzz Baum, Jason Mercer, Christophe Leterrier, Pedro M. Pereira, Siân Culley, and Ricardo Henriques. NanoJ: a high-performance open-source super-resolution microscopy toolbox. Journal of Physics D: Applied Physics, 52:163001, 4 2019. ISSN 0022-3727. doi: 10.1088/1361-6463/ab0261.
2. Nils Gustafsson, Siân Culley, George Ashdown, Dylan M. Owen, Pedro Matos Pereira, and Ricardo Henriques. Fast live-cell conventional fluorophore nanoscopy with ImageJ through super-resolution radial fluctuations. Nature Communications, 7:12471, 11 2016. ISSN 2041-1723. doi: 10.1038/ncomms12471.
3. Siân Culley, David Albrecht, Caron Jacobs, Pedro Matos Pereira, Christophe Leterrier, Jason Mercer, and Ricardo Henriques. Quantitative mapping and minimization of super-resolution optical imaging artifacts. Nature Methods, 15:263–266, 4 2018. ISSN 1548-7091. doi: 10.1038/nmeth.4605.
4. Michael J. Rust, Mark Bates, and Xiaowei Zhuang. Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nature Methods, 3:793–796, 10 2006. ISSN 1548-7091. doi: 10.1038/nmeth929.
5. Mark Bates, Bo Huang, and Xiaowei Zhuang. Super-resolution microscopy by nanoscale localization of photo-switchable fluorescent probes. Current Opinion in Chemical Biology, 12:505–514, 10 2008. ISSN 1367-5931. doi: 10.1016/j.cbpa.2008.08.008.
6. Stefan W. Hell and Jan Wichmann. Breaking the diffraction resolution limit by stimulated emission: stimulated-emission-depletion fluorescence microscopy. Optics Letters, 19:780, 6 1994. ISSN 0146-9592. doi: 10.1364/OL.19.000780.
7. John M. Guerra. Super-resolution through illumination by diffraction-born evanescent waves. Applied Physics Letters, 66:3555–3557, 6 1995. ISSN 0003-6951. doi: 10.1063/1.113814.
8. Johannes Schindelin, Curtis T. Rueden, Mark C. Hiner, and Kevin W. Eliceiri. The ImageJ ecosystem: an open platform for biomedical image analysis. Molecular Reproduction and Development, 82:518–529, 7 2015. ISSN 1040-452X. doi: 10.1002/mrd.22489.
9. Johannes Schindelin, Ignacio Arganda-Carreras, Erwin Frise, Verena Kaynig, Mark Longair, Tobias Pietzsch, Stephan Preibisch, Curtis Rueden, Stephan Saalfeld, Benjamin Schmid, Jean-Yves Tinevez, Daniel James White, Volker Hartenstein, Kevin Eliceiri, Pavel Tomancak, and Albert Cardona. Fiji: an open-source platform for biological-image analysis. Nature Methods, 9:676–682, 7 2012. ISSN 1548-7091. doi: 10.1038/nmeth.2019.
10. Martin Ovesný, Pavel Křížek, Josef Borkovec, Zdeněk Švindrych, and Guy M. Hagen. ThunderSTORM: a comprehensive ImageJ plug-in for PALM and STORM data analysis and super-resolution imaging. Bioinformatics, 30:2389–2390, 8 2014. ISSN 1367-4803. doi: 10.1093/bioinformatics/btu202.
11. Joerg Schnitzbauer, Maximilian T. Strauss, Thomas Schlichthaerle, Florian Schueder, and Ralf Jungmann. Super-resolution microscopy with DNA-PAINT. Nature Protocols, 12:1198–1228, 6 2017. ISSN 1754-2189. doi: 10.1038/nprot.2017.024.
12. Marcel Müller, Viola Mönkemöller, Simon Hennig, Wolfgang Hübner, and Thomas Huser. Open-source image reconstruction of super-resolution structured illumination microscopy data in ImageJ. Nature Communications, 7:10980, 3 2016. ISSN 2041-1723. doi: 10.1038/ncomms10980.
13. Robert P. J. Nieuwenhuizen, Keith A. Lidke, Mark Bates, Daniela Leyton Puig, David Grünwald, Sjoerd Stallinga, and Bernd Rieger. Measuring image resolution in optical nanoscopy. Nature Methods, 10:557–562, 6 2013. ISSN 1548-7091. doi: 10.1038/nmeth.2448.
14. A. Descloux, K. S. Grußmayer, and A. Radenovic. Parameter-free image resolution estimation based on decorrelation analysis. Nature Methods, 16:918–924, 9 2019. ISSN 1548-7091. doi: 10.1038/s41592-019-0515-7.
15. T. Kluyver et al. Jupyter Notebooks: a publishing format for reproducible computational workflows. In Positioning and Power in Academic Publishing: Players, Agents and Agendas, pages 87–90. IOS Press, 2016.
16. N. Sofroniew et al. napari: a multi-dimensional image viewer for Python. 2022. doi: 10.5281/zenodo.7276432.
17. John E. Stone, David Gohara, and Guochun Shi. OpenCL: a parallel programming standard for heterogeneous computing systems. Computing in Science & Engineering, 12:66–73, 5 2010. ISSN 1521-9615. doi: 10.1109/MCSE.2010.69.
18. Romain F. Laine, Hannah S. Heil, Simao Coelho, Jonathon Nixon-Abell, Angélique Jimenez, Tommaso Galgani, Aki Stubb, Gautier Follain, Siân Culley, Guillaume Jacquemet, Bassam Hajj, Christophe Leterrier, and Ricardo Henriques. High-fidelity 3D live-cell nanoscopy through data-driven enhanced super-resolution radial fluctuation. bioRxiv, 2022. doi: 10.1101/2022.04.07.487490.
19. Edwin Catmull and Raphael Rom. A class of local interpolating splines, pages 317–326. Elsevier, 1974. doi: 10.1016/B978-0-12-079050-0.50020-5.
20. Stefan Behnel, Robert Bradshaw, Craig Citro, Lisandro Dalcin, Dag Sverre Seljebotn, and Kurt Smith. Cython: the best of both worlds. Computing in Science & Engineering, 13(2):31–39, 2011. doi: 10.1109/MCSE.2010.118.
21. A. Kloeckner et al. PyOpenCL. 2022. doi: 10.5281/zenodo.7063192.
22. Siu Kwan Lam, Antoine Pitrou, and Stanley Seibert. Numba. pages 1–6. ACM, 11 2015. ISBN 9781450340052. doi: 10.1145/2833157.2833162.
23. Vilém Novák, Irina Perfilieva, and Jiří Močkoř. Mathematical Principles of Fuzzy Logic. Springer US, 1999. ISBN 978-1-4613-7377-3. doi: 10.1007/978-1-4615-5217-8.
24. R. Haase, J. Bragantini, and O. Amsalem. haesleinhuepf/stackview: 0.6.2. 2023. doi: 10.5281/zenodo.7847336.
25. jupyter-widgets/ipywidgets. Interactive widgets for the Jupyter Notebook. https://github.com/jupyter-widgets/ipywidgets.
26. Transonic: make your Python code fly at transonic speeds! 2023. https://github.com/fluiddyn/transonic.
27. Robert Haase, Loic A. Royer, Peter Steinbach, Deborah Schmidt, Alexandr Dibrov, Uwe Schmidt, Martin Weigert, Nicola Maghelli, Pavel Tomancak, Florian Jug, and Eugene W. Myers. CLIJ: GPU-accelerated image processing for everyone. Nature Methods, 17:5–6, 1 2020. ISSN 1548-7091. doi: 10.1038/s41592-019-0650-1.

Supplementary Note 1: How computational acceleration is achieved for algorithm implementations written with Cython, PyOpenCL and Numba
Python is an interpreted, high-level programming language that allows rapid development and easy code readability. However, Python's flexibility and dynamic nature come at the cost of execution speed. Operations in pure Python are generally slower than in compiled languages such as C/C++. Python code can be accelerated by compiling it to machine code and by releasing the Global Interpreter Lock (GIL), which otherwise prevents multiple Python threads from executing bytecode in parallel. Three popular approaches are Cython, PyOpenCL and Numba:
• Cython (1) is a static compiler that converts Python code into optimised C/C++ code, which is then compiled into a Python extension module. It provides Python-like syntax while supporting calls to C functions and declarations of C types. Typed Cython code can run significantly faster than Python because it is compiled to C, can release the GIL to allow multi-threading, and benefits from low-level compiler optimisations such as loop unrolling. One limitation is that Cython requires explicit type declarations to achieve these gains, which removes some of Python's dynamism. Overall, Cython can accelerate Python code by 5–1400× (Supplementary Figure S1).
• PyOpenCL (2) allows Python code to execute parallel computations on Graphical Processing Units (GPUs) through the OpenCL framework. Computational tasks are offloaded to the GPU, which has thousands of small processing cores suited to data-parallel operations. With PyOpenCL, kernels written in OpenCL C are compiled at run time and launched from Python, where they run efficiently on GPUs. This offers massive parallelism and large speedups compared with pure Python, which is limited to single-core CPU execution. Although PyOpenCL does not require a physical GPU (OpenCL can also target CPUs), the largest and easiest performance gains require one, which can be a limiting factor for some users. Overall, PyOpenCL can accelerate some workloads up to 150× by harnessing GPU parallelism (28) (Supplementary Figure S1).
• Numba (3) is a just-in-time (JIT) compiler that converts Python functions into optimised machine code using the LLVM compiler framework. It is designed to accelerate numerical and scientific workloads that use NumPy arrays and math operations. Numba-compiled code avoids interpreter overhead and leverages vectorisation, loop unrolling and parallel execution on multicore CPUs. However, Numba incurs a compilation overhead on the first call of each function (and for each new combination of argument types). Overall, Numba can speed up math-heavy Python code by 100–400× by generating optimised machine code specialised for CPUs (Supplementary Figure S1).
In summary, all three tools can significantly accelerate Python code by avoiding interpreter overhead and using efficient compilation, parallelism, and hardware optimisation. Cython translates Python into C/C++ code that can be multi-threaded and leverage CPU efficiency. PyOpenCL taps into massively parallel GPU hardware. Numba generates optimised machine code for numerical workloads on CPUs. Typical speedups depend on the methods used, the nature of the code, the size of the input data and the hardware used.

Supplementary Note 2: The machine-learning basis of the Liquid Engine
The NanoPyx Liquid Engine uses a straightforward machine-learning approach for performance self-tuning. To do so, it logs the execution times of operations across multiple implementations, such as Python, Numba, Cython and OpenCL. These benchmarking times are used to train basic regression models that predict the occurrence of delays. If delays are detected, the trained models are used at run time to estimate the magnitude and likelihood of a delay in a specific implementation. Over time, the benchmarking data are aggregated to refine the models continuously. This allows the Liquid Engine to "learn" the optimal implementation for a given platform, device, and data shape, to maximise performance. Note that these principles do not involve neural networks; instead, they rely on iterative, data-driven optimisation.
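As an illustration of the kind of basic regression model referred to in this note, the sketch below fits a one-feature logistic regression by gradient descent and uses it to estimate the probability that the next run will be delayed, the role played by P_delay in Equation 3 of the Methods. The feature is an assumption made for this example (whether the previous run was delayed, i.e. a lag-1 model); the text does not specify which features NanoPyx uses, and the function names are hypothetical. Only the Python standard library is used.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def delay_probability(delay_flags, lr=0.5, epochs=2000):
    """Estimate P(next run is delayed) from a history of 0/1 delay flags.

    Hypothetical lag-1 model: each flag is predicted from the previous one,
    and the fitted model is then queried with the most recent flag.
    """
    if len(delay_flags) < 2:
        raise ValueError("need at least two delay flags")
    w, b = fit_logistic(delay_flags[:-1], delay_flags[1:], lr, epochs)
    return sigmoid(w * delay_flags[-1] + b)
```

With a history in which the last six runs were all delayed, such as `[0, 0, 0, 0, 1, 1, 1, 1, 1, 1]`, the estimate is close to 1; after a single early delay followed only by normal runs, it is close to 0. A persistent delay therefore pushes the adjusted expectation of Equation 3 towards FastAvg × DF, while a transient one leaves it near FastAvg.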
In summary, fundamental machine-learning principles are applied within the engine itself for auto-tuning, whereas the methods exposed to users are traditional image processing functions that use optimised implementations under the hood.

Supplementary Note 3: Meta-programming in the Liquid Engine
Meta-programming (4) is a programming technique in which a program generates or manipulates code. In NanoPyx, meta-programming is used to automatically generate optimised implementations of the same task. The Liquid Engine uses two meta-programming tools: tag2tag and c2cl.
The tag2tag tool enables developers to generate multiple implementations of the same algorithm written as C or Python code snippets. Effectively, tag2tag transcribes these snippets into single-threaded and multi-threaded versions of the code, which are then generally called from Cython. Specifically, developers write a single version of the code and delimit the part to be propagated to the other implementations (e.g., a function) with a "tag". The tag2tag tool reads the content of the file, identifies the tags, and stores them in a dictionary in which the tag name is the key and the associated code snippet is the value. After defining the tag, the developer can create, in the same script, a tag-copy that specifies which parts of the original tag code should be replaced and what to replace them with. tag2tag identifies the tag-copy placeholders, applies the specified replace commands to the associated tag code, and then replaces each placeholder in the file with the modified code. This tool is intended for use with Python (.py), Cython (.pyx), and OpenCL (.cl) files.
In practice, for most tasks implemented in the Liquid Engine, we used a single-threaded version of the code as the original tag and created multi-threaded versions by replacing range with prange in the for loops. We also added implementations that use different schedulers for the parallelisation. This allows developers to easily create as many code variations as they find necessary. The approach keeps the various implementations consistent, since a change made to one version is applied to all other relevant versions. As a result, developers can manage code updates and improvements efficiently, because these propagate automatically into the other implementations, reducing redundancy and improving code maintainability.
Another meta-programming tool used in the Liquid Engine is c2cl. c2cl is analogous to tag2tag but is specifically designed to extract C functions and propagate them into .cl files, so that the C functions can be used in OpenCL kernels. Porting these functions manually would be time-consuming and error-prone. c2cl automates the process of porting C code to OpenCL by extracting reusable code blocks from the C functions, converting the C code into valid OpenCL kernels, and inserting the modified kernels back into the OpenCL file. The Liquid Engine also supports Numba as an alternative performance-boosting option for Python code snippets. With all these implementations, NanoPyx can be used on diverse hardware configurations. Overall, the Liquid Engine makes extensive use of meta-programming techniques to avoid manual coding.
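The tag-and-replace mechanism of tag2tag can be sketched in a few lines of standard-library Python. The marker syntax used here (`# tag-start: NAME`, `# tag-end`, and `# tag-copy: NAME; replace("old", "new")`) is invented for this illustration and is not tag2tag's actual syntax; the example mirrors the range-to-prange substitution used to derive multi-threaded Cython variants.

```python
import re

# Hypothetical marker syntax, invented for this sketch.
TAG_BLOCK = re.compile(r"# tag-start: (?P<name>\w+)\n(?P<body>.*?)# tag-end\n", re.DOTALL)
TAG_COPY = re.compile(
    r'# tag-copy: (?P<name>\w+)(?P<cmds>(?:; replace\("[^"]*", "[^"]*"\))*)\n'
)
REPLACE_CMD = re.compile(r'replace\("(?P<old>[^"]*)", "(?P<new>[^"]*)"\)')


def collect_tags(source):
    """Return {tag name: code snippet} for every tag-start/tag-end block."""
    return {m["name"]: m["body"] for m in TAG_BLOCK.finditer(source)}


def expand_tag_copies(source):
    """Replace each tag-copy line with its tag's code after applying the replace commands."""
    tags = collect_tags(source)

    def expand(match):
        code = tags[match["name"]]
        for cmd in REPLACE_CMD.finditer(match["cmds"]):
            code = code.replace(cmd["old"], cmd["new"])
        return code

    return TAG_COPY.sub(expand, source)


# Single-threaded original; the tag-copy derives a prange variant. A real Cython
# prange loop would also need to release the GIL (nogil), omitted for brevity.
EXAMPLE_SOURCE = '''# tag-start: _run_unthreaded
def _run_unthreaded(double[:] out, int n):
    for i in range(n):
        out[i] = 2.0 * i
# tag-end
# tag-copy: _run_unthreaded; replace("_run_unthreaded", "_run_threaded"); replace("range(", "prange(")
'''

if __name__ == "__main__":
    print(expand_tag_copies(EXAMPLE_SOURCE))
```

Running the script prints the original tagged function followed by a generated `_run_threaded` variant whose loop uses `prange`. Keeping the single-threaded code as the only hand-written source means that a bug fix in it reaches every derived variant on the next expansion.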
Code generation and transformations are used to automatically create specialised implementations, resulting in a simple and flexible architecture.

Supplementary Note 4: Fuzzy logic in the Liquid Engine
The Liquid Engine employs fuzzy logic (5) to match a specific function call to its most similar past benchmark. The Liquid Engine's adaptive nature depends heavily on the existence of appropriate benchmarks for each implementation. As previously demonstrated, the time a particular image analysis task takes to execute is strongly influenced by the size of the input image and by the parameters chosen for the task. To address this variability, benchmarks are stored separately for each unique combination of parameters and data size. This allows the Liquid Engine to adjust its performance to the specific conditions of each task, resulting in more accurate and optimised outcomes. When the agent receives a request to execute a method using the Liquid Engine, it checks its records for historical run time data. However, when a method is executed with a new combination of parameters, no existing benchmarks are available. In this case, the agent searches for the most similar set of previously used parameters. This is possible because each set of parameters is associated with a run time score, which is saved together with the historical run time data in benchmark files. The score is determined from factors such as the dimensions of the input image and other relevant parameters. By finding the most similar parameters with known benchmark data, the agent uses this score to estimate the expected run time for the new call, even in the absence of specific benchmarks. This allows the Liquid Engine to adjust its behaviour and make informed decisions when executing methods with varying input conditions.
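One way to realise the similarity search described in this note is sketched below. Everything specific in it is an assumption for illustration: the score is taken to be the number of input pixels multiplied by the numeric parameters, similarity is a triangular fuzzy membership on the log-ratio of two scores, and the function names are hypothetical. The text does not specify NanoPyx's scoring formula or membership function. The sketch also includes a closest-score fallback for when no stored benchmark is similar enough.

```python
import math


def run_score(shape, *params):
    """Hypothetical run score: number of input pixels times the numeric parameters."""
    score = float(math.prod(shape))
    for p in params:
        score *= p
    return score


def membership(score, ref_score, width=1.0):
    """Triangular fuzzy membership on the log-ratio of two scores (1.0 = identical)."""
    distance = abs(math.log(score) - math.log(ref_score))
    return max(0.0, 1.0 - distance / width)


def most_similar_benchmark(score, benchmarks, width=1.0):
    """Return the key of the stored benchmark most similar to `score`.

    `benchmarks` maps a parameter-set key to its stored score. The highest
    membership wins; if every membership is zero, fall back to the key whose
    score is numerically closest.
    """
    best = max(benchmarks, key=lambda key: membership(score, benchmarks[key], width))
    if membership(score, benchmarks[best], width) > 0.0:
        return best
    return min(benchmarks, key=lambda key: abs(benchmarks[key] - score))
```

For example, with stored benchmarks for 10×300×300 and 10×1000×1000 inputs at magnification 5, a new 10×320×320 call is matched to the 300×300 benchmark. Working on log-ratios means that a given relative change in data size counts as equally dissimilar whether the stored benchmark is small or large, which suits run times that grow multiplicatively with input size.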
Finally, if no appropriate benchmarks exist for a specific run type, the score of the current parameters is compared with the scores of all other benchmarks, and the benchmarks with the closest score are used.

Supplementary Note 5: Example datasets in NanoPyx
NanoPyx provides users with a wide range of example datasets. These datasets draw both from our previous publications and from publicly available datasets (6). They were integrated into the NanoPyx framework to facilitate testing and development, and to give users an opportunity to gain hands-on experience and explore the capabilities of the library. They include single-molecule localization microscopy data of COS-7 cells expressing Utrophin-GFP (from (7)); U2OS cells with microtubules labelled with AF647; Jurkat T cells expressing LifeAct-GFP (from (8)); and Structured Illumination Microscopy data of VACV A4 virions (from (9)); among others.
The management and loading of datasets within NanoPyx are handled by a class specifically designed for data management. The class provides functions to list datasets and retrieve their information, enabling users to identify and choose relevant datasets. Most of the datasets are stored on Google Drive and can be downloaded automatically as zip files. Once downloaded, these zip files can be converted into NumPy arrays, a process that handles the details of image retrieval and manipulation. In Jupyter or Colab notebooks, users can access the datasets through a graphical user interface (GUI) designed to streamline this process. The interface lets users select from a diverse set of datasets, each named to provide clear context, so that datasets can be efficiently selected and integrated into analysis workflows.

Supplementary figures

[Figure S1 panels: A, Laptop; B, Professional Workstation; C, Google Colab. Colour scale: comparing run type faster / GPU faster.]
Fig. S1. Ratio between the run times of OpenCL and other implemented run types. Run times of a 5× Catmull-Rom (22) interpolation were measured across multiple input data sizes using either a MacBook Air M1 (A), a professional workstation (B) or Google Colaboratory (C). Areas within dashed lines correspond to kernel and image sizes where OpenCL is faster than the other implementations.

[Figure S2 panels: A, Laptop; B, Professional workstation. Colour scale: threaded CPU faster / GPU faster.]
Fig. S2. Ratio between the run times of a 2D convolution. Run times were measured across multiple input data sizes and kernel sizes using either a MacBook Air M1 (A) or a professional workstation (B). Areas within dashed lines correspond to kernel and image sizes where OpenCL is faster than threaded CPU.

[Figure S3 panels: A, Laptop; B, Professional workstation.]
Fig. S3. The run time of each implementation is highly dependent on the shape of the input data. A 2D convolution was performed on images of increasing size using either a MacBook Air M1 (A) or a professional workstation (B). A 21 × 21 kernel was used in all operations. On the MacBook laptop, the PyOpenCL implementation is the fastest up to 125 MB, after which the threaded Cython implementations become significantly faster. On the professional workstation, while the unthreaded implementation is virtually always the slowest, the threaded implementations are the fastest only until the data size reaches 20 MB, after which PyOpenCL becomes the fastest.

[Figure S4 panels: A, Laptop; B, Professional workstation.]
Fig. S4. Kernel size impacts which implementation is the fastest. A 2D convolution was performed on images with varying kernel sizes, ranging from 1 to 21 (in steps of 4), using either a MacBook Air M1 (A) or a professional workstation (B). A 21 × 21 kernel was used in all operations. While the unthreaded implementation is virtually always the slowest, the threaded implementations are the fastest only until the data size reaches 20 MB, after which PyOpenCL becomes the fastest. Bottom panels correspond to zoomed-in windows of the top panels, indicated by dotted boxes.

[Figure S5: flowchart. Nodes include Run, Store run time, Delay state? (ON/OFF), Check delay, Calculate delay factor and delay probability, Adjust run times, Assign probabilities, Choose run type, Is delayed still the fastest?, Delay over?, Set delay state to OFF, Run fastest, End; Equations 1–5 are referenced in the chart.]
Fig. S5. Schematic of the agent's decision making for delay management.

[Figure S6 panels: A, 2D convolution; B, eSRRF. Annotations: "Device injury", "1.8x faster", "1.5x faster".]
Fig. S6. Example of delay management by the Liquid Engine. Multiple two-dimensional convolutions (A) and eSRRF analyses (B) were run sequentially on a professional workstation. Starting from two initial benchmarks, the agent is responsible for informing the Liquid Engine of the most probable best implementation. An artificial delay was induced by overloading the GPU with superfluous calculations in a separate Python interpreter.

Fig. S7. NanoPyx is available to users regardless of their coding expertise. Besides using NanoPyx as a Python library, users also have access to Jupyter notebooks (10) (A), which can be run either locally or through Google Colaboratory, and to a napari (11) plugin (B).

Supplementary Bibliography
1. Stefan Behnel, Robert Bradshaw, Craig Citro, Lisandro Dalcin, Dag Sverre Seljebotn, and Kurt Smith. Cython: the best of both worlds. Computing in Science & Engineering, 13(2):31–39, 2011. doi: 10.1109/MCSE.2010.118.
2. A. Kloeckner et al. PyOpenCL. 2022. doi: 10.5281/zenodo.7063192.
3. Siu Kwan Lam, Antoine Pitrou, and Stanley Seibert. Numba. pages 1–6. ACM, 11 2015. ISBN 9781450340052. doi: 10.1145/2833157.2833162.
4. Krzysztof Czarnecki, Kasper Østerbye, and Markus Völter. Generative programming. pages 15–29, 01 2002.
5. Vilém Novák, Irina Perfilieva, and Jiří Močkoř. Mathematical Principles of Fuzzy Logic. Springer US, 1999. ISBN 978-1-4613-7377-3. doi: 10.1007/978-1-4615-5217-8.
6. Nicolas Oliver and Debora Keller. STORM Vectashield datasets (tubulin). 2023. doi: 10.5281/zenodo.7620025.
7. Siân Culley, Kalina L. Tosheva, Pedro Matos Pereira, and Ricardo Henriques. SRRF: universal live-cell super-resolution microscopy. The International Journal of Biochemistry & Cell Biology, 101:74–79, 8 2018. ISSN 1357-2725. doi: 10.1016/j.biocel.2018.05.014.
8. Nils Gustafsson, Siân Culley, George Ashdown, Dylan M. Owen, Pedro Matos Pereira, and Ricardo Henriques. Fast live-cell conventional fluorophore nanoscopy with ImageJ through super-resolution radial fluctuations. Nature Communications, 7:12471, 11 2016. ISSN 2041-1723. doi: 10.1038/ncomms12471.
9. Robert D. M. Gray, Corina Beerli, Pedro Matos Pereira, Kathrin Maria Scherer, Jerzy Samolej, Christopher Karl Ernst Bleck, Jason Mercer, and Ricardo Henriques. VirusMapper: open-source nanoscale mapping of viral architecture through super-resolution microscopy. Scientific Reports, 6:29132, 7 2016. ISSN 2045-2322. doi: 10.1038/srep29132.
10. T. Kluyver et al. Jupyter Notebooks: a publishing format for reproducible computational workflows. In Positioning and Power in Academic Publishing: Players, Agents and Agendas, pages 87–90. IOS Press, 2016.
11. N. Sofroniew et al. napari: a multi-dimensional image viewer for Python. 2022. doi: 10.5281/zenodo.7276432.