## Metadata
- Author: #people/samir-bhatt et al.
- Full Title: Unifying incidence and prevalence under a time-varying general branching process
- Document Tags: #bayesian-statistics #branching-processes #infectious-disease-modelling #renewal-equations
- DOI: [10.48550/arXiv.2107.05579](https://doi.org/10.48550/arXiv.2107.05579)

## Highlights
- We show that the equations for incidence and prevalence are consistent with the so-called back-calculation relationship. (Page 1)
- They constructed classes, called compartments, and modelled the propagation of infectious disease via interactions among these compartments. The result is the popular susceptible–infected–recovered (SIR) model, variants of which are widely used in epidemiology. SIR models provide an intuitive mechanism for understanding disease transmission, and in the original derivation of [31], they were noted to be similar to the Volterra equation [40]. The Volterra equation (of the second kind) or more commonly, the renewal equation, is another popular governing equation [10, 14, 21, 38]. (Page 2)
- majority of renewal frameworks model only incidence, and the explicit link between prevalence and incidence often requires the use of a latent process for incidence [8]. (Page 2)
- Between individual-based and governing equation models are stochastic branching processes. Branching processes are applied in the modelling of epidemics by first constructing a stochastic process where infected individuals transmit disease according to simple rules, and then deriving a governing equation for the average behaviour. For example Galton–Watson processes, where individuals infect other individuals at generations specified by a fixed time, provide a tractable and intuitive way of modelling the spread of an infectious disease [2, 25].
In 1948, Bellman and Harris [4] elegantly captured a more complex underlying infection mechanism by formulating an age-dependent branching process, where the age-dependence alludes to individuals who infect other individuals after a random interval of time. Interestingly, the expectation of the Bellman–Harris process [4] follows a renewal equation (Page 2)
- Crump, Mode [15, 16] and (independently) Jagers [30] further extended the Bellman–Harris process into a general branching process where individuals not only can infect at random times, but can do so randomly over the duration of their infection (as opposed to the Bellman–Harris process where all subsequent infections generated by each infected individual happen at a single random time). (Page 3)
- The form of this renewal equation when only considering $R_0$ is exactly what is commonly used in epidemic modelling where the incidence of infections $I(t)$ follows a renewal equation given by $I(t) = R_0 \int_0^\infty I(t - u)\, g(u)\, du$, where $g(\cdot)$ is the probability density function (PDF) of the generation interval (Page 3)
- In our novel time-varying Crump–Mode–Jagers process, we specifically allow the statistical properties of infections, i.e., “offspring”, generated by each individual to vary over time. Building on this model, we can lay down a general, stochastic process foundation for incidence, cumulative incidence and prevalence, and characterise the renewal-like integral equations they follow. We show that the equations for prevalence and incidence are consistent with the well-known back-calculation relationship [8, 17] used in infectious disease epidemiology (Page 4)
- We also outline an argument to show that the common renewal equation used ubiquitously for modelling incidence [14, 21] is in fact, under specific conditions, equivalent to the integral equation for incidence in our framework.
Additionally, we formulate a novel reproduction process where infections occur randomly over the duration of each individual’s infection according to an inhomogeneous Poisson process. The model thus eschews the common assumption that infections happen instantaneously at a random time, as in the Bellman–Harris process, but still admits analytically tractable integral equations for prevalence and incidence. (Page 4)
- In our time-varying CMJ outbreak model, the initial infection occurs at non-random time $\tau \ge 0$. All subsequent infections are “progeny” of this index case, and we shall denote the set of these infected individuals by $I^*$. (Page 4)
- $L^\tau$ is a positive random variable representing the amount of time the individual remains infected; $\chi^\tau(\cdot)$ is a stochastic process on $[0, \infty)$ which we shall call the random characteristic of the individual; and $N^\tau(\cdot)$ is a counting process on $[0, \infty)$ keeping track of the new infections, i.e., “offspring”, generated by the individual. (Page 5)
- The Bellman–Harris branching model can informally be characterised, in the context of epidemics, by the principle that each individual generates a random number of new infections which occur simultaneously at a random time. Once these new infections have occurred, the individual immediately ceases to be infectious. (Page 5)
- Epidemiologically, the Bellman–Harris process can be applied to modelling diseases where the majority of secondary infections happen at a specific time rather than over the duration of the infection (Page 5)
- realistic epidemiological model where each infected individual generates new infections randomly and one by one according to an inhomogeneous Poisson process until they cease to be infectious.
This process, with a constant rate of transmission, has been previously studied in the context of the generation time (Page 5)
- $\rho(\cdot)$ is a non-negative function that models population-level variation in transmissibility while $k(\cdot)$ is another non-negative function describing how individual-level infectiousness varies over time [45]. For example, specifying $k(t)$ to be low or zero for small $t$ can be used to incorporate an incubation period in the model. Let $\Phi(\cdot)$ be a unit-rate, homogeneous Poisson process on $[0, \infty)$, independent of $\{L^\tau\}_{\tau \ge 0}$. Then we can define this model explicitly by

  $$N^\tau(u) := \begin{cases} \Phi\left(\int_0^{u} \rho(v + \tau)\, k(v)\, dv\right), & u < L^\tau, \\ \Phi\left(\int_0^{L^\tau} \rho(v + \tau)\, k(v)\, dv\right), & u \ge L^\tau. \end{cases}$$

  (If $\rho(t) \equiv \rho$ and $k(t) \equiv k$, both constant, then new infections follow a homogeneous Poisson process with rate $\rho k$ until the individual is no longer infected) (Page 6)
- Lévy process (i.e., a process with independent and identically distributed increments) (Page 6)
- deterministic function $\rho(\cdot)$ with a stochastic process, as long as it is independent of $\Phi(\cdot)$, would be straightforward. In the Poisson case, this would turn $N^\tau(\cdot)$ into a doubly-stochastic Cox process. (Page 6)
- Bellman–Harris process of Example 1, $L^\tau$ is directly interpreted as the generation (Page 6)
- In contrast, in the inhomogeneous Poisson process model of Example 2 (and also the Lévy and Cox process models of Example 3), $L^\tau$ corresponds to how long an individual remains infected (the duration of infection). During this period, an individual can infect others with rate that depends on $\rho(\cdot)$, which describes calendar-time variation of overall infectiousness in the population, and on $k(\cdot)$, which describes how the infectiousness of each infected individual varies over the course of their infection. The individual’s infectiousness profile $k(\cdot)$ can be set as constant, i.e., variation in the individual’s infectiousness is only due to calendar-time variation in overall infectiousness.
(Page 7)
- $Z(t, \tau)$ tallies the number of infections occurred by time $t$ and for (7) the number of infected individuals at time $t$ (Page 7)
- $\mathrm{CI}(t, \tau) = 1 + \int_0^{t-\tau} \mathrm{CI}(t, u + \tau)\, R(u + \tau)\, g^\tau(u)\, du$, $\mathrm{Pr}(t, \tau) = \overline{G}^\tau(t - \tau) + \int_0^{t-\tau} \mathrm{Pr}(t, u + \tau)\, R(u + \tau)\, g^\tau(u)\, du$. (Page 9)
    - Note: Bellman–Harris equations
- $\mathrm{CI}(t, \tau) = 1 + \int_0^{t-\tau} \mathrm{CI}(t, u + \tau)\, \rho(u + \tau)\, k(u)\, \overline{G}^\tau(u)\, du$, $\mathrm{Pr}(t, \tau) = \overline{G}^\tau(t - \tau) + \int_0^{t-\tau} \mathrm{Pr}(t, u + \tau)\, \rho(u + \tau)\, k(u)\, \overline{G}^\tau(u)\, du$ (Page 10)
    - Note: Poisson process equations
- The quantity $R(t)$ in the context of the Bellman–Harris process (Examples 1 and 12) is more precisely the instantaneous reproduction number, i.e., the expected number of secondary cases arising from a primary case when those infections occur at time $t$. (Page 11)
- quantity $\rho(t)$ in the Poisson process model (Examples 2 and 13), in contrast, is a time-varying transmission rate, i.e., scaled by time, and therefore exists on a different scale (Page 11)
- alternative way of analysing $R(\cdot)$ is to use the case reproduction number $R(t)$ [27, 49], which represents the average number of secondary cases arising from a primary case infected at time $t$, i.e., transmissibility after time $t$. (Page 11)
- $R_{\mathrm{Pois}}(t) = \int_t^\infty \rho(u)\, k(u - t)\, \overline{G}^t(u - t)\, du$, $R_{\mathrm{BH}}(t) = \int_t^\infty R(u)\, g^t(u - t)\, du$ (Page 11)
- Incidence is defined as the time-derivative of cumulative incidence (Page 11)
- $\lambda^\tau(\cdot)$, which can be assumed non-negative here, given that $N^\tau(\cdot)$ is a counting process. This assumption rules out lattice-type occurrence of infections, and is satisfied with $\lambda^\tau(u) = \rho(u + \tau)\, k(u)\, \overline{G}^\tau(u)$ (Page 11)
- incidence is governed by the equation $I(t, \tau) = \delta(t - \tau) + \int_{(0, t-\tau]} I(t, u + \tau)\, \lambda^\tau(u)\, du$, $t \ge \tau \ge 0$. (Page 12)
- Back-calculation is a standard method to recover prevalence from incidence by convolving the survival function of the generation interval with incidence [8, 17].
We (Page 13)
- Our argument above shows that there is no need to model incidence as a latent function, rather one can fit $\rho(\cdot)$ or $R(\cdot)$ directly to prevalence data using the prevalence integral equation for $\mathrm{Pr}(t, \tau)$, after which $I(t, \tau)$ can be computed directly without need for a latent incidence function (Page 14)
- The inclusion of $\tau$ means that we need to work with $I(t, \tau)$, not simply $I(t)$, and also gives rise to terms outside of the integral depending on whether one is interested in incidence, cumulative incidence or prevalence. (Page 15)
- In the case of prevalence or cumulative incidence, the equivalence between the common renewal equation and our newly derived integral equations is broken. (Page 16)
- Simpler renewal equations that do not involve varying dependence on the parameter $\tau$ for prevalence or cumulative incidence are not possible. (Page 17)
- The integral equations for cumulative incidence, prevalence and incidence under the assumption (17) are all special cases of a generic equation (Page 17)
- For all data sets, an arbitrary seeding period of 10 days was used to correct for poor surveillance in the early epidemic. The seeding period was not included in the likelihood and we found our fits to be robust to different choices of seeding duration (Page 20)
- $R(t)$, under the Bellman–Harris process and the Poisson process model. The results on $R(t)$ for Influenza and SARS in both models are very similar, except that $R(t)$ for the Poisson process model is slightly lower. This difference reflects the underlying assumption that secondary infections occur over the duration of the primary infection, rather than at once as in the Bellman–Harris process. In contrast, for Measles and Smallpox, the differences in $R(t)$ are slightly more pronounced in terms of the timing of the main infection peak. These differences are due to the much longer generation interval compared to Influenza and SARS.
(Page 22)
- starting from a stochastic process, we are able to define prevalence, incidence and cumulative incidence as summary statistics (via moments) of an individual-based infection process. (Page 23)
- The common renewal equation is computationally simpler as it does not involve the time $\tau$ of the first infection and simplifies the problem from two-dimensional to one-dimensional. (Page 23)
- Given the ability of modern computers to perform these operations using parallel computation, we do not believe the computational overhead of our more general integral equations is meaningfully greater than the simple renewal equations. (Page 23)
    - Note: Only true if one can solve the matrices in parallel, and this likely needs to be GPU accelerated. Renewal equations can also be accelerated, so this does not remove the likely order-of-magnitude difference in compute time. However, most of the compute time in these models appears to come from the complex correlated posteriors and not model complexity.
- Finally, our framework, and the vast majority of previous frameworks, only consider the mean integral equation and ignore the dynamics of higher-order moments. Using our framework, we can recover these moments from our stochastic process and formulate more accurate likelihoods for model fitting (Page 24)
    - Note: Rather than mean field, can start looking at more complex variance
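
To make the Bellman–Harris equations for $\mathrm{CI}(t, \tau)$ and $\mathrm{Pr}(t, \tau)$ highlighted from page 9 more concrete, here is a minimal numerical sketch. It is not the paper's implementation. For a fixed $t$, both equations can be solved backwards in $\tau$ on a uniform grid, here with a simple right-endpoint Riemann sum. The function name `solve_bellman_harris`, the grid step `h`, and the simplification that the generation-interval density is the same for every $\tau$ are illustrative assumptions, not taken from the source.

```python
import numpy as np


def solve_bellman_harris(R, g, h):
    """Solve CI(t, tau) and Pr(t, tau) for fixed t = (n - 1) * h.

    Hypothetical helper (not from the paper). Assumptions:
      R[j] = R(j * h), the instantaneous reproduction number on the grid;
      g[i] = generation-interval density at i * h, identical for all tau.
    Returns arrays ci, pr where index j corresponds to tau = j * h.
    """
    R = np.asarray(R, dtype=float)
    g = np.asarray(g, dtype=float)
    n = len(R)
    if len(g) != n:
        raise ValueError("R and g must have the same length")

    w = h * g          # quadrature weights for the right-endpoint rule
    w[0] = 0.0         # the sum runs over u = h, 2h, ..., so u = 0 has no weight
    # Discrete survival function: surv[m] approximates P(L > m * h).
    surv = np.clip(1.0 - np.cumsum(w), 0.0, 1.0)

    ci = np.ones(n)
    pr = np.zeros(n)
    # March backwards in tau: values at tau_j only need values at later tau.
    for j in range(n - 1, -1, -1):
        m = n - 1 - j                      # number of grid steps from tau_j to t
        kernel = R[j + 1:] * w[1:m + 1]    # R(u + tau) * g(u) * du for u = h..m*h
        ci[j] = 1.0 + ci[j + 1:] @ kernel
        pr[j] = surv[m] + pr[j + 1:] @ kernel
    return ci, pr


if __name__ == "__main__":
    h = 0.5
    x = np.arange(400) * h
    g = x * np.exp(-x / 2.5)   # gamma-shaped density, chosen only for illustration
    g /= h * g.sum()           # normalise so the discrete weights sum to one
    ci, pr = solve_bellman_harris(np.full(400, 0.5), g, h)
    print(ci[0], pr[0])
```

One sanity check on such a sketch: with a constant $R = 0.5$ and a long horizon, cumulative incidence from a single index case should approach the expected total size of a subcritical branching process, $1 / (1 - R) = 2$, while prevalence decays towards zero. The Poisson process equations highlighted from page 10 have the same structure, so under the same assumptions the kernel would become $\rho(u + \tau)\, k(u)\, \overline{G}^\tau(u)$ in place of $R(u + \tau)\, g^\tau(u)$.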