Chain Binomial Model in TranStat Yang Yang Department of Biostatistics & Emerging Pathogens Institute University of Florida

New Chain Binomial Model in TranStat · 2020. 7. 14. · Chain-binomial likelihood models, e.g., Rampey et. al.(1992), Yang et. al. (2006, 2008). Transmission-network-based survival

Oct 21, 2020

  • Chain Binomial Model in TranStat

    Yang Yang

    Department of Biostatistics & Emerging Pathogens Institute, University of Florida

  • Data Hierarchy

    • Number of infections and susceptibles at the beginning and the end of the epidemic (final-size binomial models).
      • Regression models
        • Generalized Estimating Equations
        • Mixed effects logistic regression
      • Iterative binomial models
        • Constant infectious period: Longini and Koopman (1982)
        • Variable infectious period: Ball (1989), Cheryl et al. (1991).

    • Disease onset times plus disease natural history.
      • Chain-binomial likelihood models, e.g., Rampey et al. (1992), Yang et al. (2006, 2008).
      • Transmission-network-based survival models, e.g., Kenah (2010)

    • Contact structure and/or exposure history

  • Data Hierarchy (continued)

    • Laboratory confirmation of a single pathogen.
      • Chain-binomial likelihood model with EM-MCEM or Bayesian model, e.g., Cauchemez et al. (2004), Yang et al. (2008, 2012).

    • Laboratory confirmation of multiple co-circulating pathogens.
      • Bayesian transmission models, e.g., Auranen et al. (2000), Yang et al. (2010, 2019).

  • Contact-based modeling

    • A contact is specific to
      • A time unit (e.g., a day)
      • A setting of mixing (e.g., household)

    • Types of contacts (source of transmission)
      • Person-to-person (P2P) contact: close contact with specific infectious individuals in mixing groups, e.g., households, neighborhoods, schools, hospitals.
      • Common-source-to-person (C2P) contact: contact with unobserved nonspecific individuals, zoonotic sources, or an environmental reservoir.

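  • Transmission Patterns and Parameters of Interest

    [Figure: transmission diagram showing Household 1 and Household 2, each with infective and susceptible members, and a common source. Edge labels recovered from the figure: b, bθ, p1, p1θ, p1φ, p1θφ, p2, p2θ, p2φ, p2θφ.]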

  • Time of Infection

    [Figure: two timelines in days (1 to 6) showing the time of infection $t$, the incubation period, the onset time of symptoms and infectiousness $\tilde{t}$, and the infectious period, with example daily probabilities for each distribution.]

    • $f(\tilde{t} - t) = \Pr(\tilde{t} \mid t)$: probability of symptom onset on day $\tilde{t}$ given infection on day $t$, with $l_{\min} \le \tilde{t} - t \le l_{\max}$.

    • $g(t - \tilde{t}) = \Pr(t \mid \tilde{t})$: probability of the host being infective on day $t$ given symptom onset on day $\tilde{t}$.

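  • Chain binomial model (continued)

    Probability that the common source or person $j$ infects person $i$ on day $t$:

    $$\mathrm{logit}(b_{it}) = \mathrm{logit}(b) + x_{it}'\beta_b$$

    $$\mathrm{logit}(p_{ijt}) = \begin{cases} \mathrm{logit}(p_1) + x_{ijt}'\beta_{p_1}, & H_i = H_j, \\ \mathrm{logit}(p_2) + x_{ijt}'\beta_{p_2}, & H_i \neq H_j,\ N_i = N_j, \end{cases}$$

    and $p_{ijt} = 0$ otherwise.

    An example of covariate adjustment:

    • Let $r_i$ be the vaccine status of person $i$ and the only covariate.
    • $\mathrm{logit}(p_{ijt}) = \mathrm{logit}(p_k) + r_i\theta + r_j\phi + r_i r_j\psi$
    • $VE_S = 1 - e^{\theta}$, $VE_I = 1 - e^{\phi}$ and $VE_T = 1 - e^{\theta+\phi+\psi}$.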
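    The covariate adjustment above can be sketched numerically. This is a minimal illustration, not TranStat code: the function names and the parameter values (a baseline daily probability of 0.02 and the three coefficients) are assumptions chosen only to show how the logit-scale model maps to the three vaccine efficacies.

    ```python
    import math

    def logit(p):
        """Log-odds of a probability p in (0, 1)."""
        return math.log(p / (1.0 - p))

    def expit(x):
        """Inverse of logit."""
        return 1.0 / (1.0 + math.exp(-x))

    def pairwise_prob(p_k, r_i, r_j, theta, phi, psi):
        """Daily probability that infectious person j infects susceptible i.

        r_i and r_j are 0/1 vaccine indicators; theta, phi and psi are the
        susceptibility, infectiousness and interaction coefficients.
        """
        eta = logit(p_k) + r_i * theta + r_j * phi + r_i * r_j * psi
        return expit(eta)

    def vaccine_efficacies(theta, phi, psi):
        """VE_S, VE_I and VE_T as defined on the slide."""
        return {
            "VE_S": 1.0 - math.exp(theta),
            "VE_I": 1.0 - math.exp(phi),
            "VE_T": 1.0 - math.exp(theta + phi + psi),
        }

    # Hypothetical values for illustration only (not estimates from data).
    P_K = 0.02
    THETA, PHI, PSI = math.log(0.5), math.log(0.8), 0.0

    ve = vaccine_efficacies(THETA, PHI, PSI)
    p_unvaccinated = pairwise_prob(P_K, 0, 0, THETA, PHI, PSI)
    p_vaccinated_susceptible = pairwise_prob(P_K, 1, 0, THETA, PHI, PSI)
    ```

    Because the coefficients act on the logit scale, $e^{\theta}$ is an odds ratio; for small daily transmission probabilities it is close to a relative risk.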

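  • Chain binomial model (continued)

    Probability of escaping infection on day $t$, $t = 1, \ldots, T$:

    $$e_{it} = (1 - b_{it}) \prod_{j=1}^{N} \left(1 - p_{ijt}\, g(t - \tilde{t}_j)\right)$$

    Probability of escape up to day $t$: $Q_{it} = \prod_{\tau=1}^{t} e_{i\tau}$

    Probability of escape before day $t$ and infection on day $t$: $U_{it} = Q_{i,t-1}(1 - e_{it})$

    Let $\underline{t}_i = \tilde{t}_i - l_{\max}$ and $\bar{t}_i = \tilde{t}_i - l_{\min}$. The likelihood contributed by $i$ is:

    $$L_i = \begin{cases} Q_{iT}, & \text{escaped}, \\ U_{it}, & \text{if known to be infected on } t, \\ \sum_{t=\underline{t}_i}^{\bar{t}_i} U_{it} \times f(\tilde{t}_i - t), & \text{if symptom onset on } \tilde{t}_i. \end{cases}$$

    Secondary attack rate: $SAR_k = 1 - \prod_{d=0}^{D-1} \left[1 - p_k\, g(d)\right]$, $k = 1, 2$.

    Effective reproductive number: $R_0 = \sum_k n_k \times SAR_k$.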
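    The quantities on this slide can be traced with a small numeric sketch. It assumes, for simplicity, constant probabilities ($b_{it} = b$, $p_{ijt} = p$) and represents $f$ and $g$ as dictionaries keyed by lag in days; the function names and every numeric value are hypothetical and are not taken from TranStat.

    ```python
    def escape_prob(t, b, p, g, onset_days):
        """e_it: probability that person i escapes infection on day t."""
        e = 1.0 - b
        for onset in onset_days:  # symptom-onset days of i's infectious contacts
            e *= 1.0 - p * g.get(t - onset, 0.0)
        return e

    def likelihood(b, p, g, f, T, onset_days, l_min, l_max, own_onset=None):
        """L_i for a person who escaped (own_onset=None) or had symptom onset."""
        e = {t: escape_prob(t, b, p, g, onset_days) for t in range(1, T + 1)}
        Q = {0: 1.0}  # Q_i0 = 1: nobody is infected before day 1
        for t in range(1, T + 1):
            Q[t] = Q[t - 1] * e[t]
        if own_onset is None:
            return Q[T]
        first = max(1, own_onset - l_max)  # earliest possible infection day
        last = min(T, own_onset - l_min)   # latest possible infection day
        return sum(Q[t - 1] * (1.0 - e[t]) * f.get(own_onset - t, 0.0)
                   for t in range(first, last + 1))

    def secondary_attack_rate(p, g, D):
        """SAR = 1 - prod_{d=0}^{D-1} [1 - p g(d)]."""
        escape = 1.0
        for d in range(D):
            escape *= 1.0 - p * g.get(d, 0.0)
        return 1.0 - escape

    # Hypothetical natural history: incubation lag -> probability, and
    # days since onset -> probability of being infective.
    F = {1: 0.2, 2: 0.6, 3: 0.2}
    G = {0: 1.0, 1: 1.0, 2: 0.5}

    L_escape = likelihood(0.01, 0.1, G, F, T=5, onset_days=[2], l_min=1, l_max=3)
    L_onset = likelihood(0.01, 0.1, G, F, T=5, onset_days=[2], l_min=1, l_max=3,
                         own_onset=5)
    sar = secondary_attack_rate(0.1, G, D=3)
    ```

    With no common-source risk and no infectious contacts, the escape likelihood is exactly 1, which is a quick sanity check on any implementation of $Q_{iT}$.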

  • Missing Data Patterns

  • Mechanisms of Generating Missing Data

    Suppose the data $(X_i, Y_i, M_i)$ are generated (iid across all $i$) via

    $$f(X_i, Y_i, M_i \mid \theta, \phi) = f(X_i, Y_i \mid \theta)\, f(M_i \mid X_i, Y_i, \phi), \quad i = 1, \ldots, n.$$

    Suppose $X_i$ is always observed, and $Y_i$ is sometimes observed as indicated by $M_i$ (1 = missing, 0 = observed). Can we estimate $E(Y \mid X = x)$ by

    $$\frac{1}{\sum_{i=1}^{n} 1(M_i = 0, X_i = x)} \sum_{i=1}^{n} 1(M_i = 0, X_i = x)\, Y_i\ ?$$

    • Missing completely at random (MCAR): $f(M_i \mid X_i, Y_i, \phi) = f(M_i \mid \phi)$. Clearly, $E(Y_i \mid X_i, M_i = 0) = E(Y_i \mid X_i)$ because $f(Y_i \mid X_i, M_i) = f(Y_i \mid X_i)$.

    • Missing at random (MAR): $f(M_i \mid X_i, Y_i, \phi) = f(M_i \mid X_i, \phi)$. We still have $E(Y_i \mid X_i, M_i = 0) = E(Y_i \mid X_i)$ because

      $$f(Y_i \mid X_i, M_i) = \frac{f(Y_i, X_i, M_i)}{f(X_i, M_i)} = \frac{f(X_i, Y_i)\, f(M_i \mid X_i, Y_i)}{f(X_i)\, f(M_i \mid X_i)} = f(Y_i \mid X_i).$$

    • Missing not at random (MNAR): $f(M_i \mid X_i, Y_i, \phi)$ cannot be further reduced.

  • Methods for handling Missing Data

    • Using completely observed data only. Valid under MCAR, but not necessarily under MAR, e.g., using $\frac{1}{\sum_{i=1}^{n} 1(M_i = 0)} \sum_{i=1}^{n} 1(M_i = 0)\, Y_i$ to estimate $E(Y)$.

    • Weighting by inverse probability of not missing. Valid under both MCAR and MAR.

    • One-time imputation (hot deck imputation, mean imputation, carry-forward) and multiple imputation.

    • Model-based approaches to integrate out missing values, e.g., Expectation-Maximization algorithm, Markov chain Monte Carlo.
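    The difference between the first two approaches can be seen on a tiny constructed data set. This is an illustrative sketch under stated assumptions: the data are made up, missingness depends only on a binary covariate (so the mechanism is MAR), and the observation probabilities are taken as known rather than estimated.

    ```python
    def complete_case_mean(y, m):
        """Mean of Y over records with M = 0 (observed)."""
        observed = [yi for yi, mi in zip(y, m) if mi == 0]
        return sum(observed) / len(observed)

    def ipw_mean(y, m, prob_observed):
        """Inverse-probability-weighted mean of Y over observed records."""
        num = sum(yi / pi for yi, mi, pi in zip(y, m, prob_observed) if mi == 0)
        den = sum(1.0 / pi for mi, pi in zip(m, prob_observed) if mi == 0)
        return num / den

    # Hypothetical data: 10 people with X = 0 (Y = 1) and 10 with X = 1 (Y = 3).
    # Y is observed for 8 of the first group and only 4 of the second group.
    y = [1.0] * 10 + [3.0] * 10
    m = [0] * 8 + [1] * 2 + [0] * 4 + [1] * 6
    prob_observed = [0.8] * 10 + [0.4] * 10  # Pr(M = 0 | X), assumed known

    true_mean = sum(y) / len(y)              # 2.0 by construction
    cc = complete_case_mean(y, m)            # over-represents the X = 0 group
    ipw = ipw_mean(y, m, prob_observed)      # reweights back to the population
    ```

    In practice the observation probabilities would themselves be estimated, for example by a logistic regression of the missingness indicator on the fully observed covariates.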

  • Likelihood Model with Data Augmentation

    Augmenting data with unobserved quantities (Yang, Longini and Halloran, 2007):

    • Pairwise transmission outcome $Y_{ji}(t)$ (1: transmission, 0: escape).
    • $Y_{ji}(t)$ is defined only if $Y_{ji}(\tau) = 0$ for all $\tau < t$.
    • $Y_{ji}(t)$ is not observed when $j$ is infectious and $\underline{t}_i \le t \le \bar{t}_i$.
    • $Y_{ji}(t)$ is independent of $Y_{ki}(t)$ for the same day $t$.
    • More convenient to work with

    Zji(t) = Yji(t)∏

    k∈Di,τ

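  • [Figure: exposure timeline for persons 1 to 4 and the common source c, marking symptom onset and the potential transmissions that are not observable.]

    Expected frequencies of the unobserved outcomes (1: transmission, 0: escape) for exposures $i \leftarrow j$:

    Exposure            Outcome   Expected frequency
    $1 \leftarrow 2$    0         $\Pr(\bar{Z}_{21}(\tilde{t}_1 - 1) \mid S_1(\tilde{t}_1))$
    $1 \leftarrow 2$    1         $\Pr(Z_{21}(\tilde{t}_1 - 1) \mid S_1(\tilde{t}_1))$
    $1 \leftarrow c$    0         $\Pr(\bar{Z}_{c1}(\tilde{t}_1 - 1) \mid S_1(\tilde{t}_1))$
    $1 \leftarrow c$    1         $\Pr(Z_{c1}(\tilde{t}_1 - 1) \mid S_1(\tilde{t}_1))$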

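  • Likelihood Model with Data Augmentation (continued)

    Let $Z_i(t) = \max_{j \in D_i} Z_{ji}(t)$ indicate whether $Z_{ji}(t) = 1$ for any $j$ on day $t$. The likelihood of the augmented data is

    $$L_i(b, p, \theta, \phi \mid \tilde{t}_j, Z_{ji}(t), \bar{Z}_{ji}(t), j \in D_i, t \le T) = \prod_{t=1}^{T} \left\{ g(\tilde{t}_i \mid t)^{Z_i(t)} \prod_{j \in D_i} \left(p_{ji}(t)\right)^{Z_{ji}(t)} \left(1 - p_{ji}(t)\right)^{\bar{Z}_{ji}(t)} \right\}.$$

    The log-likelihood is

    $$\log\left(L_i(b, p, \theta, \phi \mid \tilde{t}_j, Z_{ji}(t), \bar{Z}_{ji}(t), j \in D_i, t \le T)\right) \propto \sum_{t=1}^{T} \sum_{j \in D_i} \left\{ Z_{ji}(t) \log\left(p_{ji}(t)\right) + \bar{Z}_{ji}(t) \log\left(1 - p_{ji}(t)\right) \right\}.$$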

  • Likelihood Model with Data Augmentation (continued)

    The E-M algorithm: define the following events

    • $S_i(t)$: $i$ has symptom onset on day $t$.
    • $I_i(t)$: $i$ is infected on day $t$.
    • $I_{ji}(t)$: $j$ infects $i$ on day $t$.

    whose probabilities are given by

    $$\Pr[I_{ji}(t)] = \hat{Q}_i(t-1)\, \hat{p}_{ji}(t),$$

    $$\Pr[I_i(t)] = \hat{Q}_i(t-1) \left\{1 - \hat{e}_i(t)\right\},$$

    $$\Pr[S_i(\tilde{t}_i)] = \sum_{\tau=\underline{t}_i}^{\bar{t}_i} f(\tilde{t}_i - \tau) \times \Pr[I_i(\tau)].$$

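  • Likelihood Model with Data Augmentation (continued)

    The conditional distributions of $Z_{ji}(t)$ and $\bar{Z}_{ji}(t)$ are

    $$\Pr(Z_{ji}(t) = 1 \mid b, p, \theta, \phi, \tilde{t}_i) = \begin{cases} \dfrac{\Pr[I_{ji}(t)] \times f(\tilde{t}_i - t)}{\Pr[S_i(\tilde{t}_i)]}, & \underline{t}_i \le t \le \bar{t}_i, \\ 0, & \text{otherwise}, \end{cases}$$

    and

    $$\Pr(\bar{Z}_{ji}(t) = 1 \mid b, p, \theta, \phi, \tilde{t}_i) = \begin{cases} \dfrac{f(\tilde{t}_i - t) \times \left\{\Pr[I_i(t)] - \Pr[I_{ji}(t)]\right\}}{\Pr[S_i(\tilde{t}_i)]} + \dfrac{\sum_{\tau=t+1}^{\bar{t}_i} f(\tilde{t}_i - \tau) \times \Pr[I_i(\tau)]}{\Pr[S_i(\tilde{t}_i)]}, & \underline{t}_i \le t \le \bar{t}_i, \\ 1, & t < \underline{t}_i, \\ 0, & \text{otherwise}. \end{cases}$$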
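    These conditional probabilities can be checked on a toy example. The sketch below assumes one symptomatic person $i$ with known onset day, a dictionary of daily transmission probabilities per source (a common source "c" and one infectious contact "j"), and an incubation distribution `f` keyed by lag; all names and numbers are hypothetical. Days after the infection window (where both probabilities are 0) are not evaluated.

    ```python
    def e_step(p, f, onset, l_min, l_max):
        """Expected values of Z_ji(t) and Zbar_ji(t) given symptom onset day.

        p: dict source -> {day: transmission probability on that day}
        f: dict lag -> Pr(symptom onset occurs `lag` days after infection)
        Returns two dicts keyed by (source, day).
        """
        first, last = onset - l_max, onset - l_min  # infection window bounds
        days = range(1, last + 1)
        e, Q = {}, {0: 1.0}
        for t in days:
            e[t] = 1.0
            for probs in p.values():
                e[t] *= 1.0 - probs.get(t, 0.0)  # escape from every source
            Q[t] = Q[t - 1] * e[t]
        pr_I = {t: Q[t - 1] * (1.0 - e[t]) for t in days}  # Pr[I_i(t)]
        pr_S = sum(f.get(onset - t, 0.0) * pr_I[t] for t in days if t >= first)

        z, zbar = {}, {}
        for j, probs in p.items():
            for t in days:
                if t < first:  # before the window: escape is certain
                    z[j, t], zbar[j, t] = 0.0, 1.0
                    continue
                pr_Iji = Q[t - 1] * probs.get(t, 0.0)  # Pr[I_ji(t)]
                f_t = f.get(onset - t, 0.0)
                later = sum(f.get(onset - s, 0.0) * pr_I[s]
                            for s in range(t + 1, last + 1))
                z[j, t] = pr_Iji * f_t / pr_S
                zbar[j, t] = (f_t * (pr_I[t] - pr_Iji) + later) / pr_S
        return z, zbar

    # Hypothetical inputs: onset on day 5, incubation lags of 1 to 3 days.
    P = {"c": {1: 0.01, 2: 0.01, 3: 0.01, 4: 0.01},
         "j": {2: 0.1, 3: 0.1, 4: 0.1}}
    F = {1: 0.2, 2: 0.6, 3: 0.2}
    z, zbar = e_step(P, F, onset=5, l_min=1, l_max=3)
    ```

    On the first day of the window the two probabilities for any source add up to 1, because infection must occur on that day or later given that symptoms appeared on $\tilde{t}_i$.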

  • EM-MCEM: Account for Missing Outcomes

    • Independent clusters (households), each of size $n_h$, $h = 1, \ldots, H$.

    • Let $\tilde{t}_i$ be the infectiousness onset time if $i$ is infected. This is also the symptom onset time if $i$ is a symptomatic case.

    • Partition the population into four final states according to $z_i$ (preseason immune status), $y_i$ (infection status), and $s_i$ (symptom status):

      1. prior immunity ($z_i = 1$, $y_i = 0$, $s_i = 0$)
      2. susceptible, but escaped infection ($z_i = 0$, $y_i = 0$, $s_i = 0$)
      3. symptomatic infection ($z_i = 0$, $y_i = 1$, $s_i = 1$), $\tilde{t}_i$ observed.
      4. asymptomatic infection ($z_i = 0$, $y_i = 1$, $s_i = 0$), $\tilde{t}_i$ not observed.

    • $u_i = (z_i, y_i, s_i, \tilde{t}_i)$ is the complete individual data.

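  • EM-MCEM: Account for Missing Outcomes (continued)

    • $z_i \sim \mathrm{Bernoulli}(\alpha)$, $\alpha$: proportion of pre-immunity.
    • $s_i \sim \mathrm{Bernoulli}(\phi)$, $\phi$: probability of symptoms if infected.
    • Covariates are adjusted for via logistic regressions:

      $$\mathrm{logit}(\alpha_i) = \mathrm{logit}(\alpha) + x_i^{\tau}\beta_{\alpha}, \quad \mathrm{logit}(\phi_{it}) = \mathrm{logit}(\phi) + x_{it}^{\tau}\beta_{\phi},$$

      $$\mathrm{logit}(b_{it}) = \mathrm{logit}(b) + x_{it}^{\tau}\beta_b, \quad \mathrm{logit}(p_{ijt}) = \mathrm{logit}(p) + x_{ijt}^{\tau}\beta_p.$$

    • Probabilities of escaping infection on and up to day $t$ are

      $$e_{it} = (1 - b_{it}) \prod_{j: c_j = c_i} \left(1 - \theta^{1 - s_j}\, p_{ijt}\, g(t - \tilde{t}_j + 1)\right) \quad \text{and} \quad Q_{it} = \prod_{l=1}^{t} e_{il}.$$

    • Let $\psi = \{b, p, \alpha, \phi, \beta_{\alpha}, \beta_{\phi}, \beta_b, \beta_p\}$. The likelihood contributed by individual $i$:

      $$L^{(i)}(\psi \mid u_i) = \alpha_i^{z_i} (1 - \alpha_i)^{1 - z_i} \times \left\{ Q_{iT}^{1 - y_i} \left( \sum_t f(\tilde{t}_i - t)\, Q_{i(t-1)} (1 - e_{it}) (\phi_{it})^{s_i} (1 - \phi_{it})^{1 - s_i} \right)^{y_i} \right\}^{1 - z_i}.$$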

  • For each household $h$, define collections of observed and missing data

    • $O_h = \{u_i : c_i = h \text{ and } u_i \text{ is completely observed}\}$,
    • $U_h = \{u_i : c_i = h \text{ and } u_i \text{ is not completely observed}\}$.

    For the population, define $O = \{O_h : h = 1, \ldots, H\}$ and $U = \{U_h : h = 1, \ldots, H\}$. Assuming independence between households, the household-level and population-level likelihoods based on the complete data are

    $$L_h(\psi \mid O_h, U_h) = \prod_{i: c_i = h} L^{(i)}(\psi \mid u_i)$$

    and

    $$L(\psi \mid O, U) = \prod_{h=1}^{H} L_h(\psi \mid O_h, U_h).$$

  • EM-MCEM: Account for Missing Outcomes (continued)

    • Imputation of $U_h$ is performed at the household level.
    • Let $\{U^{\star}_{hk} : k = 1, \ldots, \delta_h\}$ be the collection of all possible realizations of $U_h$ for household $h$, $h = 1, \ldots, H$.
    • $\delta_h = 1$ if $U_h$ is empty. For non-empty $U_h$, we expect $\delta_h > 1$.
      Example: if $\tilde{t}_i$ of individual $i$ of household $h$ is the only unobserved quantity in that household, then the range of $U_h$ is determined by $1 + l \le \tilde{t}_i \le T$ and $\delta_h = T - l$, where $l$ here is the minimum duration of the latent period.

    • $\delta_h$ could be too large for the EM algorithm to enumerate all possibilities.

    • We propose to use EM whenever affordable, and Monte Carlo EM otherwise. The MCEM is based on importance sampling (Levine and Casella, 2001).

  • Algorithm

    1. Choose $J$ to partition the households into three groups: $\Delta_{OBS} = \{h : \delta_h = 1\}$, $\Delta_{EM} = \{h : 1 < \delta_h < J\}$, and $\Delta_{MCEM} = \{h : \delta_h \ge J\}$.

    2. Choose a large integer $K$ to be the number of importance samples for the MCEM algorithm.

    3. Choose an initial value $\psi^{(0)}$ for $\psi$. For household $h \in \Delta_{MCEM}$, draw $K$ samples of $U_h$ from $\Pr(U_h \mid O_h, \psi^{(0)})$ using an MCMC algorithm, and let these samples be $\hat{U}_{hk}$, $k = 1, \ldots, K$.

    4. Set $\hat{\psi}^{(0)} = \psi^{(0)}$.

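  • Algorithm (continued)

    5. At iteration $r \ge 0$,

       (i) Update the conditional probabilities for all $h \in \Delta_{EM}$:

       $$\lambda^{(r)}_{hk} = \frac{L_h(\hat{\psi}^{(r)} \mid O_h, U^{\star}_{hk})}{\sum_{l=1}^{\delta_h} L_h(\hat{\psi}^{(r)} \mid O_h, U^{\star}_{hl})}, \quad k = 1, \ldots, \delta_h,$$

       and the importance weights for all $h \in \Delta_{MCEM}$:

       $$\omega^{(r)}_{hk} = \frac{L_h(\hat{\psi}^{(r)} \mid O_h, \hat{U}_{hk})}{L_h(\psi^{(0)} \mid O_h, \hat{U}_{hk})}, \quad k = 1, \ldots, K.$$

       (ii) Maximize

       $$\Omega(\psi, \hat{\psi}^{(r)}) = \sum_{h \in \Delta_{OBS}} \ln L_h(\psi \mid O_h) + \sum_{h \in \Delta_{EM}} \sum_{k=1}^{\delta_h} \lambda^{(r)}_{hk} \ln L_h(\psi \mid O_h, U^{\star}_{hk}) + \sum_{h \in \Delta_{MCEM}} \frac{1}{\omega^{(r)}_{h\cdot}} \sum_{k=1}^{K} \omega^{(r)}_{hk} \ln L_h(\psi \mid O_h, \hat{U}_{hk})$$

       with regard to $\psi$ to find $\hat{\psi}^{(r+1)}$, where $\omega^{(r)}_{h\cdot} = \sum_{k=1}^{K} \omega^{(r)}_{hk}$.

       Repeat this step until convergence in the estimates of $\psi$, and denote the final estimate by $\hat{\psi}$.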
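    The bookkeeping in steps 1 and 5(i) can be sketched as follows. The helper names are hypothetical and the household log-likelihood values are placeholders; a real implementation would evaluate $\ln L_h$ from the model. Working with log-likelihoods and subtracting the maximum before exponentiating is a standard safeguard against underflow, added here as an assumption rather than something stated on the slides.

    ```python
    import math

    def partition_households(delta, J):
        """Split households by the number of possible realizations delta_h."""
        groups = {"OBS": [], "EM": [], "MCEM": []}
        for h, d in delta.items():
            if d == 1:
                groups["OBS"].append(h)
            elif d < J:
                groups["EM"].append(h)
            else:
                groups["MCEM"].append(h)
        return groups

    def em_weights(loglik_current):
        """lambda_hk: normalized likelihoods over all realizations of U_h."""
        top = max(loglik_current)
        unnormalized = [math.exp(v - top) for v in loglik_current]
        total = sum(unnormalized)
        return [u / total for u in unnormalized]

    def importance_weights(loglik_current, loglik_initial):
        """omega_hk: likelihood ratio of current to initial parameter values."""
        return [math.exp(a - b) for a, b in zip(loglik_current, loglik_initial)]

    def mcem_term(weights, loglik_candidate):
        """Weighted average of ln L_h over the K importance samples."""
        return sum(w * v for w, v in zip(weights, loglik_candidate)) / sum(weights)

    # Placeholder values for illustration only.
    groups = partition_households({1: 1, 2: 5, 3: 200}, J=50)
    lam = em_weights([-10.0, -11.0, -12.5])
    omega0 = importance_weights([-8.0, -9.5], [-8.0, -9.5])  # iteration r = 0
    term = mcem_term(omega0, [-7.0, -9.0])
    ```

    At iteration $r = 0$ the current and initial parameter values coincide, so every importance weight equals 1 and the MCEM term reduces to a plain average over the $K$ samples.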

  • Variance estimation

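    • For point estimation, one maximizes at each EM iteration

      $$E_{U \mid O, \hat{\psi}^{(r)}} \ln L(\psi \mid O, U) = \sum_{h=1}^{H} E_{U_h \mid O_h, \hat{\psi}^{(r)}} \ln L_h(\psi \mid O_h, U_h).$$

    • Variance estimation requires evaluation of

      $$E_{U \mid O, \hat{\psi}} \left[ \left(\frac{d \ln L(\psi \mid O, U)}{d\psi}\right) \left(\frac{d \ln L(\psi \mid O, U)}{d\psi}\right)^{\tau} \right] \neq \left(E_{U \mid O, \hat{\psi}} \frac{d \ln L(\psi \mid O, U)}{d\psi}\right) \times \left(E_{U \mid O, \hat{\psi}} \frac{d \ln L(\psi \mid O, U)}{d\psi}\right)^{\tau}.$$

    • We generate $K$ new importance samples based on any parameter value $\tilde{\psi} \approx \hat{\psi}$ for all households in $\Delta_{EM} \cup \Delta_{MCEM}$.

    • If $K$ is sufficiently large, the algorithm should converge to the same final estimates $\hat{\psi}$.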

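  • Variance estimation (continued)

    • Denote the new importance samples by $\tilde{U}_{hk}$, $k = 1, \ldots, K$, $h \in \Delta_{EM} \cup \Delta_{MCEM}$.

    • Let $\tilde{U}_{\cdot k} = \{\tilde{U}_{hk} : h = 1, \ldots, H\}$ and $\tilde{\omega}_k = L(\hat{\psi} \mid O, \tilde{U}_{\cdot k}) / L(\tilde{\psi} \mid O, \tilde{U}_{\cdot k})$, $k = 1, \ldots, K$.

    • Ignoring the MC error, the covariance matrix of $\hat{\psi}$ is estimated by (Louis, 1982):

      $$\hat{V}^{-1}(\hat{\psi}, \tilde{\psi}, \tilde{U}) = \left[ \left( \frac{1}{\sum_{k=1}^{K} \tilde{\omega}_k} \sum_{k=1}^{K} \tilde{\omega}_k \frac{d \ln L(\psi \mid O, \tilde{U}_{\cdot k})}{d\psi} \right) \left( \frac{1}{\sum_{k=1}^{K} \tilde{\omega}_k} \sum_{k=1}^{K} \tilde{\omega}_k \frac{d \ln L(\psi \mid O, \tilde{U}_{\cdot k})}{d\psi} \right)^{\tau} - \frac{1}{\sum_{k=1}^{K} \tilde{\omega}_k} \sum_{k=1}^{K} \tilde{\omega}_k \left\{ \frac{d^2 \ln L(\psi \mid O, \tilde{U}_{\cdot k})}{d\psi^2} + \left(\frac{d \ln L(\psi \mid O, \tilde{U}_{\cdot k})}{d\psi}\right) \left(\frac{d \ln L(\psi \mid O, \tilde{U}_{\cdot k})}{d\psi}\right)^{\tau} \right\} \right]_{\psi = \hat{\psi}}.$$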

  • TranStat: an efficient tool for outbreak analysis

    • Any number of b's and p's.
    • Adjust for any number of time-dependent and time-independent covariates.
    • Flexible common-source-to-person (c2p) and person-to-person (p2p) contact structures.
    • Account for preseason immunity and asymptomatic infections.
    • Assess goodness of fit.
    • Written in C, and optimized for computational efficiency.

  • Input file I: "pop.dat", Population profile

    • One line per person.
    • Both individual ID and community ID are numbered 0, 1, 2, · · · .
    • Weight can be used for epidemic curve data.

    Ind.  Comm.  Pre-season  Inf.   Sym.   Sym. Onset  Cens.  Cens.  Idx   Imm.  Pat.  Wt.  Ignore
    ID    ID     Imm.        Stat.  Stat.  Day         Stat.  Day    Case  Grp.  Grp.
    0     0      0           1      1      13          0      -1     1     0     0     1    0
    1     0      0           0      0      -1          1      40     0     0     0     1    0
    2     0      0           0      0      -1          0      -1     0     0     0     1    0
    3     0      0           0      0      -1          0      -1     0     0     0     1    0
    4     0      0           0      0      -1          0      -1     0     0     0     1    0
    5     1      0           1      1      15          0      -1     0     0     0     1    0
    6     1      0           1      1      13          0      -1     1     0     0     1    0
    7     1      0           0      0      -1          1      45     0     0     0     1    0
    ...

    † Abbreviations: comm.=community, imm=immunity, inf=infection, stat=status, sym=symptom, cens=censoring, grp=group, wt=weight.
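    A reader preparing this file may want to sanity-check rows before running an analysis. The sketch below is not part of TranStat: the field names are hypothetical labels for the thirteen columns on the slide, and it assumes the values are separated by whitespace, as the sample rows suggest.

    ```python
    # Hypothetical field names for the 13 columns shown on the slide.
    POP_FIELDS = [
        "ind_id", "comm_id", "preseason_imm", "inf_stat", "sym_stat",
        "sym_onset_day", "cens_stat", "cens_day", "idx_case",
        "imm_grp", "pat_grp", "weight", "ignore",
    ]

    def parse_pop_line(line):
        """Turn one whitespace-separated row into a dict keyed by column."""
        values = line.split()
        if len(values) != len(POP_FIELDS):
            raise ValueError(
                f"expected {len(POP_FIELDS)} columns, got {len(values)}")
        record = {}
        for name, value in zip(POP_FIELDS, values):
            # The weight may be fractional; every other column is an integer.
            record[name] = float(value) if name == "weight" else int(value)
        return record

    # First two rows of the example table.
    SAMPLE = """\
    0 0 0 1 1 13 0 -1 1 0 0 1 0
    1 0 0 0 0 -1 1 40 0 0 0 1 0
    """

    records = [parse_pop_line(row) for row in SAMPLE.splitlines() if row.strip()]
    ```

    In the sample rows, -1 appears wherever a day does not apply (no symptom onset, or no censoring); treating it as a "not applicable" code is an inference from the table, not a documented rule.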

  • Input file II: "community.dat", Community profile

    • For prospective design, starting and stopping days generally cover the duration of the epidemic.

    • For case-ascertained design, the starting day should be a few days before the symptom onset day of the index case, at least covering the maximum duration of the incubation period.

    Community  Start  Stop
    ID         Day    Day
    0          1      100
    1          1      100
    2          1      120
    3          1      120
    ...

  • Input file III: "time_ind_covariate.dat", time-independent covariates

    • One line per individual.
    • Put as many covariates as you want, but remember the order.

    Individual  Age  Gender  Pre-season  · · ·
    ID                       HI Titer
    0           10   1       20          · · ·
    1           11   0       20          · · ·
    2           35   0       40          · · ·
    3           37   1       80          · · ·
    ...

  • Input file IV: "time_dep_covariate.dat", time-dependent covariates

    • One line per individual per time unit.
    • Variables are numbered after time-independent ones.

    Individual  Start  Stop  Antiviral  Viral     · · ·
    ID          Day    Day   Treatment  Shedding
    0           1      12    0          1         · · ·
    0           13     45    1          1         · · ·
    1           1      25    0          0         · · ·
    1           26     45    1          0         · · ·
    ...

  • Input file V: "c2p_contact.dat", common-source-to-person contact history

    • Community-specific or individual-specific format, depending on whether all individuals in the same community share the same c2p contact profile.

    • Contact modes are numbered 0, 1, 2, · · · .
    • Offsets reflect variation in the infectivity level of the common source, e.g., log(daily number of infectious people).

    Community/     Start  Stop  Contact  Offset  Ignore
    Individual ID  Day    Day   Mode
    0              1      22    0        0.8     0
    0              23     30    1        1.0     0
    1              1      30    0        1.0     0
    2              1      10    0        1.5     0
    2              11     30    1        0.8     0
    ...

  • Input file VI: "p2p_contact.dat", person-to-person contact history

    • Only need to list contacts during the infectious periods of cases.

    Community-specific

    Community  Individual  Start  Stop  Contact  Offset  Ignore
    ID         ID          Day    Day   Mode
    0          0           5      11    0        0       0
    0          1           20     26    1        0       0
    1          8           2      10    0        0       0
    1          10          6      12    0        0       0
    ...

    Or, individual-specific

    Start  Stop  Person 1  Person 2  Contact  Offset  Ignore
    Day    Day   ID        ID        Mode
    5      11    1         0         0        0       0
    5      11    2         0         0        0       0
    5      11    3         0         0        0       0
    2      8     4         5         0        0       0
    2      8     6         5         0        0       0
    ...

  • Input file VII: "impute.dat", imputation profile

    • Four possibilities: preseason immunity, non-infection, symptomatic infection, asymptomatic infection.

    • For symptomatic infection, give first and last possible days of illness onset.

    • For asymptomatic infection, give first and last possible days of peak infectivity.

    Person  Preseason  Non-      Sym.  First     Last      Asym.  First     Last
    ID      Immunity   Infected  Inf.  Possible  Possible  Inf.   Possible  Possible
                                       Day       Day              Day       Day
    274     0          0         0     -1        -1        1      12        41
    374     0          0         0     -1        -1        1      12        40
    375     0          0         0     -1        -1        1      12        40
    436     0          0         0     -1        -1        1      12        38
    531     0          0         0     -1        -1        1      12        38
    ...

  • Input file VIII: "config.file", model configuration file

    • Natural history of disease, i.e., incubation and infectious periods.

    • Parameters to be estimated, e.g.,
      • Numbers of c2p and p2p contact modes.
      • Numbers of time-independent and time-dependent covariates.
      • Covariates affecting susceptibility (c2p/p2p), infectiousness (p2p), or interaction (p2p).
    • Define equivalence classes of parameters. Parameters in the same class are equal.
    • Specify which parameters have fixed values.

  • Input file VIII (continued)

    • Choose the EM-MCEM algorithm if there is uncertainty in infection/disease outcome.
      • Number of burn-in runs
      • Number of importance samples
      • Threshold for MCEM activation

    • Statistical adjustment: selection bias or right censoring.
    • Whether to simulate epidemics to check goodness of fit.
    • Optimization options.
    • Many more.

  • Outputs of TranStat

    • Estimates, SD, 95% CIs and p-values of parameters.
    • Transmission probabilities: $b$ and $p$.
    • Secondary attack rate: $SAR_k = 1 - \prod_t \left(1 - p_k\, g(t \mid \tilde{t})\right)$.
    • Local reproductive number: $R = \sum_k N_k\, SAR_k$.
    • Odds ratios, i.e., $\exp(\alpha_S)$, $\exp(\beta_S)$, $\exp(\beta_I)$ and $\exp(\beta_{SI})$.
    • Two output formats: detailed vs. simplified.

    • p-value for testing existence of person-to-person transmission.

    • Goodness of fit: observed and fitted daily numbers of infections, together with simulated bounds.

  • Case study 1: Contact-tracing data of COVID-19 in Guangzhou, China

    • Timeline: Jan 7 - Feb 18, 2020.
    • 195 close contact groups, 215 primary cases, 134 secondary cases, 1964 uninfected contacts.
    • Among the 349 cases, 19 (5.4%) were asymptomatic.
    • 153 (73%) primary cases and 66 (46%) secondary cases were imported.
    • Main goals:
      • Estimate household SAR and non-household SAR;
      • Evaluate age and gender effects on susceptibility and infectivity;
      • Assess infectivity during the incubation (preclinical) period.

  • [Figure legend: household secondary case; non-household secondary case.]

  • Table 1. Demographic compositions of the study population stratified by case type (primary, secondary and non-case) and contact type (household [HH] and non-household). Contact type is determined by relationship with the primary cases of each close contact group. Percentages are enclosed in parentheses. The data-based secondary attack rate is calculated as the number of secondary cases divided by the sum of secondary cases and non-cases.

    Definition of household: close relatives

    Factor     Category  Primary    Secondary cases       Non-cases              Overall     Data-based SAR (%)
                         cases      HH         Non-HH     HH         Non-HH                  Household          Non-household
    Age group  6         55 (26)    34 (33)    8 (26)     379 (56)   409 (32)    885 (38)    8.2 (5.8, 11.3)    1.9 (0.83, 3.7)
    Origin     Imported  158 (73)   59 (57)    3 (10)
    Origin     Local     57 (27)    44 (43)    28 (90)
    Total                215 (100)  103 (100)  31 (100)   681 (100)  1283 (100)  2313 (100)  13.2 (10.9, 15.7)  2.4 (1.6, 3.3)

    Definition of household: residential address

    Factor     Category  Primary    Secondary cases       Non-cases              Overall     Data-based SAR (%)
                         cases      HH         Non-HH     HH         Non-HH                  Household          Non-household
    Age group  6         27 (13)    14 (15)    9 (22)     140 (31)   324 (21)    514 (22)    9.1 (5.1, 14.8)    2.7 (1.2, 5.1)
    Origin     Imported  158 (73)   56 (60)    6 (15)
    Origin     Local     57 (27)    37 (40)    35 (85)
    Total                215 (100)  93 (100)   41 (100)   449 (100)  1515 (100)  2313 (100)  17.2 (14.1, 20.6)  2.6 (1.9, 3.6)

    ‡ Secondary cases and non-cases in each CCG were allocated to January or February of 2020 according to the proportion of the primary cases' infectious periods falling in January vs. that in February.

  • Table 2. Model-based estimates (and 95% confidence intervals) of secondary attack rates among household and non-household contacts, and model-based estimates of the local reproductive number (local R) with and without quarantine. Estimates are reported using two different definitions of household contact (close relatives or individuals sharing the same residential address) and for selected settings of the natural history of disease. This model is not adjusted for age group, epidemic phase or household size.

    Columns A to D give the natural history setting:
    A: mean incubation period = 5 days, max infectious period = 13 days
    B: mean incubation period = 5 days, max infectious period = 22 days
    C: mean incubation period = 7 days, max infectious period = 13 days
    D: mean incubation period = 7 days, max infectious period = 22 days

    Definition of household  Parameter  Setting          A                  B                  C                  D
    Close relatives          SAR (%)    Household        12.4 (9.8, 15.4)   15.5 (11.7, 20.2)  11.4 (9.0, 14.2)   13.1 (9.9, 17.1)
    Close relatives          SAR (%)    Non-household    7.9 (5.3, 11.8)    10.4 (6.7, 15.8)   7.5 (5.0, 11.2)    8.9 (5.7, 13.6)
    Close relatives          Local R    With quarantine  0.50 (0.41, 0.62)  0.51 (0.39, 0.66)  0.51 (0.41, 0.63)  0.51 (0.39, 0.67)
    Close relatives          Local R    No quarantine    0.60 (0.49, 0.74)  0.76 (0.59, 1.00)  0.56 (0.45, 0.69)  0.65 (0.49, 0.85)
    Residential address      SAR (%)    Household        17.1 (13.3, 21.8)  21.2 (15.8, 27.8)  16.1 (12.5, 20.4)  18.3 (13.6, 24.1)
    Residential address      SAR (%)    Non-household    7.3 (5.4, 9.9)     9.3 (6.5, 13.1)    6.8 (5.0, 9.2)     7.8 (5.5, 11.0)
    Residential address      Local R    With quarantine  0.50 (0.40, 0.61)  0.50 (0.38, 0.65)  0.50 (0.41, 0.62)  0.51 (0.39, 0.66)
    Residential address      Local R    No quarantine    0.59 (0.48, 0.72)  0.74 (0.57, 0.96)  0.55 (0.45, 0.67)  0.63 (0.48, 0.82)

  • Table 4. Model-based odds ratios (and 95% confidence intervals) for the effects of age group and epidemic phase (Feb. vs. Jan.) on susceptibility and relative infectivity during the illness period compared to the incubation period. Estimates are reported using two different definitions of household contact (close relatives or individuals sharing the same residential address) and for selected settings of the natural history of disease. This model is adjusted for age group, epidemic phase, and household size.

    Column headings: definition of household contact; parameter; odds ratio for mean incubation period = 5 days or 7 days, each with max infectious period = 13 days or 22 days.

    Close relatives Susceptibility Age group

  • Case study 2: Estimate Rt of COVID-19 in Wuhan, China

    • Timeline: Dec 7, 2019 - Jan 26, 2020.
    • 8866 probable cases in 30 provinces of China, 4021 (45%) lab-confirmed.
    • 3731 probable cases (1664 lab-confirmed) in Wuhan.
    • Mean age was 48 years (SD = 16 years).
    • Main goal: to estimate the time-varying effective reproduction number Rt.

  • Introduction
    Simple chain binomial models
    Missing Data Patterns and Mechanisms
    Data Augmentation
    EM-MCEM algorithm
    Case studies