CoLabo at 2: looking for a postdoc to work on genetic-epi models using data from a real-life outbreak simulator

Jul 11, 2022

By Andrés Colubri

Hello everyone! Here at CoLabo, as we approach our two-year anniversary, we would like to announce that we are looking for our first postdoctoral researcher to join our lab at UMass Chan Medical School.

Launching the lab in August of 2020, in the midst of the pandemic, was challenging, but since then we have been able to build an amazing team of talented people with diverse backgrounds and skills. This diversity reflects the multi-disciplinary nature of the lab, where we aim to bridge computational modeling of infectious diseases, visualization of complex biomedical datasets, and science education and outreach.

A major focus of our research is the development of an app-based outbreak simulator called Operation Outbreak (OO). As we scale this project up, we would like to bring in expertise to create and validate new predictive models using the data we can generate from OO simulations. See the full posting for this position here and keep reading below for more context about this research and the direction we anticipate in the near future.

Developing a realistic “real-life” outbreak simulator

Operation Outbreak started as an experiential learning activity and classroom module on public health and epidemiology for middle and high schools, which we created back in 2016 with collaborators from the Broad Institute and the Inspire Project. The OO learning activity consists of an outbreak simulation where students try to stop the outbreak by playing roles such as general population, health responders, epidemiologists, and government. This simulation is enabled by a smartphone app that uses Bluetooth to spread the virtual pathogen among the participants in the simulation. We developed the first version of the OO app in early 2018, tested it with great success with middle school students at the time, and have piloted it in many different settings since then (read this recent article in Harvard Health Policy Review for more details about the history of the project).

As we turned the initial OO app prototype into a full-fledged platform for infectious disease education in 2019 and early 2020, the COVID-19 pandemic made us reconsider the applications and impact of the project. On the one hand, the pandemic highlighted the need for engaging and effective education on infectious disease, and this reinforced our commitment to fulfilling the educational potential of OO. On the other hand, we also came to the realization that OO could be the basis for a novel methodology to generate realistic synthetic “ground-truth” outbreak datasets resolved at the individual level, comprising not only close contacts between individuals in real life but also simulated viral genetic sequences and detailed epidemiological information. Such data, which largely does not exist at present, could then be applied to develop and validate individual-level models of disease transmission.

This video from a presentation at Artificial Intelligence for Pandemics (AI4PAN) goes over the history of OO and our plans to use it as an epidemiology modeling and validation platform:

Video of OO presentation at AI4PAN

Our long-term goal is to continue developing OO into a highly realistic live outbreak simulator to conduct participatory experiments where virtual microbes are spread among individuals using proximity sensing technology available on mobile devices (currently Bluetooth Low Energy or BLE, but eventually more accurate technologies). Computational analyses of recent large-scale OO simulations support the hypothesis that proximity sensing is able to capture real-world behavioral patterns and can be used to construct epidemiological models. Building on these results, we aim to implement a comprehensive modeling framework to drive OO simulations consistent with the latest genetic-epidemiological knowledge about COVID-19 and other infectious diseases. We are also looking into improving the underlying technology in the OO app, for example by using Ultra-wideband (UWB) to increase the spatial resolution of our simulations. We aim to eventually validate outbreak reconstruction methods with the synthetic ground-truth data from OO.

Validating genetic-epi models with synthetic ground-truth data

Understanding transmission dynamics is essential for control of emerging infectious diseases, yet it is difficult to observe the complete transmission process due to individual heterogeneity and inadequate sampling methods. During the pandemic, attempts to infer transmission dynamics using epidemiological and genetic data have relied on incomplete records of information. At the same time, computational advances could make it possible to apply increasingly rigorous statistical frameworks to reconstruct the transmission tree of an outbreak. Contact tracing data with high accuracy and precision is valuable to characterize the performance of these new inference methods, as well as for estimating routes of transmission, analyzing social determinants of disease transmission, and designing new mitigation measures. Attempts to collect high-resolution contact data have been limited by small sample size, low-resolution distance estimation technology, or both.

Using one of the basic tenets of an infectious respiratory disease, that it spreads by close contact, we can create a real network of synthetic disease transmission by spreading a virtual pathogen between users of an outbreak simulation app, such as OO, that records contact events. The OO project is not the first to do this; other app-based simulators have been proposed before, such as FluPhone and SafeBlues. Still, we believe that OO could represent a novel contribution to the field in three ways:

(1) Ability to simulate a complete epidemic dataset using real social network data. This is an improvement on simulated contact networks, which make assumptions about network structure that are difficult to validate.

(2) Incorporating high-resolution contact events using new technologies such as Ultra-wideband (UWB). Currently available proximity detection apps rely on Bluetooth Low Energy (BLE), which produces error-prone distance estimates. UWB is a more recent technology that produces accurate and precise distance measurements. OO's use of the proximity-sensing library Herald (the only open-source alternative to the Google/Apple Exposure Notification system) will make it easier to test out new technologies when they become available.

(3) Predicting individualized risk in real time using epidemiological and genetic data. Ultimately, our aim is to reconstruct the transmission network in real time using Bayesian inference and use forward simulations to determine individual risk of infection. Previous attempts at outbreak reconstruction have been limited to retrospective analysis due to limited data and, to our knowledge, individual-level predictions have been largely missing. Because we will use the data from OO simulations, which contain the full transmission tree, we should be able to compare predictions with the “ground truth” and thereby validate the models.
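To make the idea of spreading a virtual pathogen over recorded contact events a bit more concrete, here is a minimal Python sketch. It is not the OO implementation: the `Contact` record, the `spread_virtual_pathogen` function, the fixed per-contact transmission probability, and the 2-meter distance cutoff are all assumptions made for illustration. The point is only that replaying real contact events in time order yields both the infection times and the full "who infected whom" tree as ground truth.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    """One recorded proximity event between two participants (hypothetical schema)."""
    time: float        # when the contact was recorded
    a: str             # first participant
    b: str             # second participant
    distance_m: float  # estimated distance, e.g. from BLE or UWB


def spread_virtual_pathogen(contacts, index_case, p_transmit=0.5,
                            max_distance_m=2.0, seed=0):
    """Replay contacts in time order and return (infection times, transmission tree)."""
    rng = random.Random(seed)
    infected_at = {index_case: 0.0}  # the index case is infected at time 0
    tree = []                        # ground-truth edges: (infector, infectee, time)
    for c in sorted(contacts, key=lambda c: c.time):
        if c.distance_m > max_distance_m:
            continue  # too far apart to count as a close contact (assumed cutoff)
        for src, dst in ((c.a, c.b), (c.b, c.a)):
            already_infectious = src in infected_at and infected_at[src] < c.time
            if already_infectious and dst not in infected_at:
                if rng.random() < p_transmit:
                    infected_at[dst] = c.time
                    tree.append((src, dst, c.time))
    return infected_at, tree


if __name__ == "__main__":
    events = [
        Contact(1.0, "A", "B", 1.0),
        Contact(2.0, "B", "C", 1.5),
        Contact(3.0, "C", "D", 5.0),  # beyond the cutoff, ignored
    ]
    times, edges = spread_virtual_pathogen(events, index_case="A", p_transmit=0.8)
    print(times)
    print(edges)
```

A constant transmission probability is the simplest possible choice; in a more realistic setup it would depend on contact duration, distance, and the infector's state.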

The following diagram links all the elements described in the aims, going from the live OO simulations to risk predictions that could be validated live during the simulations.

Diagram describing the stages in our proposed modeling platform: live simulations, generation of ground-truth data, construction of predictive models, validation of models during the live simulations.

With regard to predicting individual risk of infection by integrating outbreak reconstruction with forward epidemic simulations, we are currently considering Bayesian inference approaches to reconstruct the transmission tree using partially observed genetic-epidemiologic data, like those proposed earlier by Lau et al. and Campbell et al. Unobserved events and data (exposure times and transmitted sequences) are treated as parameters that can be estimated. Once a plausible transmission tree up to the present time is determined from past data, we can use it as the starting point for individual-level stochastic simulations that calculate the time-dependent probabilities of adopting an epidemiological state (susceptible, exposed, infected, recovered) for each individual in the population of interest within a reasonable time window into the near future. This framework can be applied to a wide range of epidemic datasets, including real data collected in past outbreaks and pandemics, high-resolution data from real-life app-based simulations, or entirely synthetic data generated with agent-based models.
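The forward-simulation step described above can be sketched as a small Monte Carlo experiment. The code below is an illustration under strong simplifying assumptions, not the framework of Lau et al. or Campbell et al.: it uses a discrete-time SEIR model on a fixed contact list, and the function names (`step`, `state_probabilities`) and the per-step probabilities `beta`, `sigma`, and `gamma` are hypothetical. Starting from the states implied by a reconstructed tree, it runs many stochastic futures and reports, for each individual and time step, the fraction of runs spent in each state.

```python
import random
from collections import Counter

STATES = ("S", "E", "I", "R")


def step(states, contacts, beta, sigma, gamma, rng):
    """Advance every individual by one time step (discrete-time SEIR, assumed model)."""
    new = dict(states)
    for person, state in states.items():
        if state == "S":
            # Count currently infectious contacts of this person.
            n_inf = sum(1 for other in contacts.get(person, ()) if states[other] == "I")
            # Each infectious contact independently transmits with probability beta.
            if rng.random() < 1.0 - (1.0 - beta) ** n_inf:
                new[person] = "E"
        elif state == "E" and rng.random() < sigma:
            new[person] = "I"
        elif state == "I" and rng.random() < gamma:
            new[person] = "R"
    return new


def state_probabilities(initial, contacts, horizon, n_runs=500,
                        beta=0.2, sigma=0.3, gamma=0.1, seed=0):
    """Estimate P(state at time t) per individual from repeated forward simulations."""
    rng = random.Random(seed)
    counts = {p: [Counter() for _ in range(horizon + 1)] for p in initial}
    for _ in range(n_runs):
        states = dict(initial)
        for t in range(horizon + 1):
            for person, state in states.items():
                counts[person][t][state] += 1
            states = step(states, contacts, beta, sigma, gamma, rng)
    return {
        person: [{s: c[s] / n_runs for s in STATES} for c in per_time]
        for person, per_time in counts.items()
    }


if __name__ == "__main__":
    initial = {"A": "I", "B": "S", "C": "S"}          # e.g. from a reconstructed tree
    contacts = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    probs = state_probabilities(initial, contacts, horizon=7)
    print(probs["C"][7])  # estimated state probabilities for C one week ahead
```

Because the true states are known in an app-based simulation, probabilities like these could be scored directly against what actually happened to each participant.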

From the microbiology of host-pathogen interaction to population-level epidemiological dynamics

We also believe that our outbreak simulator gives us a framework to incorporate molecular-level features that characterize the interaction between pathogen and host cells as inputs into the models that predict individual phenotypes of the disease, such as viral load kinetics. The predictions of these models could be used to drive the dynamics of the live outbreak, for example, by determining infection and recovery rates.

To create a model of transmissibility that enables us to generate realistic ground-truth data of infection dynamics in OO simulations, we are considering using viral load as a proxy for infectiousness. In the case of COVID-19, within-host models of how viral load evolves over time, fitted to time series of RT-PCR cycle threshold (CT) values, have been proposed. Different virtual viral load profiles could be generated for each participant in the OO simulation, which in turn will determine their individual infectivity and the overall outbreak dynamics.
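As a rough illustration of how a per-participant viral load profile could drive infectivity, the sketch below uses a piecewise-linear curve for log10 viral load (linear rise to a peak, then linear decline to clearance) and a logistic function to turn viral load into a per-contact transmission probability. Both functional forms and every number in the code (peak range, timing, `midpoint`, `steepness`, `max_prob`) are placeholder assumptions, not fitted values or the model we will end up using.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ViralLoadProfile:
    """Piecewise-linear log10 viral load over days since infection (assumed shape)."""
    peak_log10: float  # log10 viral load at the peak
    t_peak: float      # days from infection to peak
    t_clear: float     # days from infection to clearance

    def log10_load(self, t):
        if t <= 0 or t >= self.t_clear:
            return 0.0
        if t <= self.t_peak:
            return self.peak_log10 * t / self.t_peak
        return self.peak_log10 * (self.t_clear - t) / (self.t_clear - self.t_peak)


def sample_profile(rng):
    """Draw a different profile for each participant (placeholder distributions)."""
    t_peak = rng.uniform(3.0, 6.0)
    return ViralLoadProfile(
        peak_log10=max(1.0, rng.gauss(7.5, 1.0)),
        t_peak=t_peak,
        t_clear=t_peak + rng.uniform(6.0, 12.0),
    )


def infectiousness(log10_load, midpoint=6.0, steepness=1.5, max_prob=0.3):
    """Logistic map from log10 viral load to a per-contact transmission probability."""
    return max_prob / (1.0 + math.exp(-steepness * (log10_load - midpoint)))


if __name__ == "__main__":
    rng = random.Random(42)
    profile = sample_profile(rng)
    for day in range(0, 15, 2):
        load = profile.log10_load(day)
        print(day, round(load, 2), round(infectiousness(load), 3))
```

In a simulation, the transmission probability for a contact at time t would then be evaluated from the infector's own profile rather than from a single population-wide constant.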

Furthermore, we will also simulate genetic sequences over a transmission network that is generated in real time during our app-based outbreak experiments. We will seed these simulations with an ancestral genome at the index case, and every time a transmission event between infecting and exposed individuals occurs, we will calculate the coalescence time between the transmitted lineage and the lineages that exist within the transmitting individual. We will use a coalescent model with variable viral population size determined by the viral load of infected individuals. By sampling viral lineages from each infected individual over the course of the simulation, we will produce a complete phylogenetic tree for the outbreak.
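To give a feel for the coalescent step, the following sketch samples when a transmitted lineage merges with one other lineage inside the transmitting host, going backward in time from the transmission event. It rests on several assumptions that are ours for illustration only: time is discretized in small steps `dt`, a pair of lineages coalesces at rate 1/N(t) in the chosen time units, the population size function `pop_size` is supplied by the caller (for example, derived from viral load), and any lineages that have not merged by the host's own infection time are forced to coalesce there, as if the infection were founded by a single lineage.

```python
import math
import random


def sample_coalescence_time(t_transmit, t_infected, pop_size, rng, dt=0.01):
    """Sample a pairwise coalescence time in [t_infected, t_transmit].

    Walks backward from the transmission time; in each small step the pair
    coalesces with probability 1 - exp(-dt / N(t)), where N(t) = pop_size(t).
    """
    t = t_transmit
    while t - dt > t_infected:
        rate = 1.0 / max(pop_size(t), 1e-9)  # pairwise rate under the assumed model
        if rng.random() < 1.0 - math.exp(-rate * dt):
            return t
        t -= dt
    # No coalescence before the host's infection: assume a single founding lineage.
    return t_infected


if __name__ == "__main__":
    rng = random.Random(7)

    def pop_size(t):
        # Hypothetical effective population size: rises, peaks at day 4, then declines.
        return 1.0 + 50.0 * math.exp(-((t - 4.0) ** 2) / 4.0)

    times = [sample_coalescence_time(6.0, 0.0, pop_size, rng) for _ in range(5)]
    print([round(x, 2) for x in times])
```

With a small population the lineages tend to merge shortly before the transmission event, while with a large population they tend to trace back toward the start of the infection, which is how viral load would shape the simulated phylogeny.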

The following figure illustrates how these different models will work within the OO framework to result in fully sampled transmission trees:

Each “player” in the OO simulation (1a) will be assigned an individual-level model of viral kinetics (1b), which will determine their infectivity during the outbreak. The transmission network (2), generated in real time during the simulation, will result in a time tree (3) that will be used to calculate coalescence times in the intra-host models of sequence evolution (4–5).

In summary, we are trying to create an app-based outbreak simulation platform that is able to simulate infectious disease at all scales: pathogen biology, intra-host immune response, and between-host transmission mediated by social interactions and individual factors. The datasets produced by this platform would be highly detailed and capable of informing both model training and validation, which of course requires that we can enroll enough participants in our simulations. In particular, this means that OO must be an engaging and trustworthy app that people want to use. That’s why our team also involves User Experience designers, app developers and software engineers :-) Ultimately, we aim to create a useful resource for the infectious disease community, including not only the simulation tools themselves, but also data and models.

The idea of “multi-scale” infectious disease modeling and simulation is arguably very compelling (but probably also the hardest to realize):

Increasing awareness of the importance to integrate within- and between-host scales has led to the development of models that explicitly link the two scales. These models, often referred to as ‘multi-scale’ models, have increased in popularity in recent years. While there have been exciting advances made in this area, most studies linking within- and between-host scales are conceptual or theoretical with mainly qualitative and little quantitative support from data. Progress towards a predictive multi-scale framework will require a more precise, quantitative understanding of how infection dynamics, pathogen load, target cell depletion, immunology, symptomatology and other clinical features combine to shape pathogen transmission fitness at the population level.

As pointed out by Handel & Rohani in their opinion piece from 2015:

Crossing the scale from within-host infection dynamics to between-host transmission fitness: a…

The progression of an infection within a host determines the ability of a pathogen to transmit to new hosts and to…

royalsocietypublishing.org

Join us at CoLabo!

I hope you enjoyed reading our ideas and plans regarding OO and using this novel tool to generate better epidemiological datasets and models. When not developing apps or running simulations, we like to take occasional day trips to enjoy nature or city attractions: