Space-Time Scan Statistics for Early Warning Systems Martin Kulldorff Department of Ambulatory Care and Prevention Harvard University Medical School and Harvard Pilgrim Health Care
Page 1: Presentation

Space-Time Scan Statistics for Early Warning Systems

Martin Kulldorff

Department of Ambulatory Care and Prevention

Harvard University Medical School

and Harvard Pilgrim Health Care

Page 2: Presentation

Content

• Background on Disease Surveillance

• Purely Spatial Scan Statistics: Brain Cancer in the United States

• Early Warning System using a Space-Time Permutation Scan Statistic: Syndromic Surveillance in New York City

• Various Extensions

Page 3: Presentation

Collaborators

Harvard Medical School: Ken Kleinman, Richard Platt, Katherine Yih

New York City Department of Health: Jessica Hartman, Rick Heffernan, Farzad Mostashari

University of Connecticut: David Gregorio, Zixing Fang

Universidade Federal de Minas Gerais: Renato Assunção, Luiz Duczmal

Page 4: Presentation

Importance of Early Disease Outbreak Detection

• Eliminate health hazards

• Warn about risk factors

• Earlier diagnosis of new cases

• Quarantine cases

• Scientific research concerning treatments, vaccines, etc.

• Early detection is especially critical for infectious diseases

Page 5: Presentation

Disease Surveillance

Data Sources

• Disease Registries

• Reportable Diseases

• Electronic Health Records

• Health Insurance Claims Data

• Vital Statistics (Mortality)

Types of Data

• Diagnosed Diseases

• Symptoms (Syndromic Surveillance)

• Lab Test Results

• Pharmaceutical Drug Sales

Page 6: Presentation

Disease Surveillance

Frequency of Analyses

• Daily

• Weekly

• Monthly

• Yearly

Page 7: Presentation

Purely Temporal Methods

Farrington CP, Andrews NJ, Beale AD, Catchpole MA (1996) A statistical algorithm for the early detection of outbreaks of infectious disease. J R Stat Soc A Stat Soc 159: 547–563.

Hutwagner LC, Maloney EK, Bean NH, Slutsker L, Martin SM (1997) Using laboratory-based surveillance data for prevention: An algorithm for detecting salmonella outbreaks. Emerg Infect Dis 3: 395–400.

Nobre FF, Stroup DF (1994) A monitoring system to detect changes in public health surveillance data. Int J Epidemiol 23: 408–418.

Reis B, Mandl K (2003) Time series modeling for syndromic surveillance. BMC Med Inform Decis Mak 3: 2.

Page 8: Presentation

Three Important Issues

• An outbreak may start locally.

• Purely temporal methods can be used simultaneously for multiple geographical areas, but that leads to multiple testing.

• Disease outbreaks may not conform to the pre-specified geographical areas.
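
The multiple testing issue can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each geographical area is tested independently at the same significance level; the level 0.05 and the area counts are hypothetical and not taken from the slides.

```python
def familywise_false_alarm(alpha: float, n_areas: int) -> float:
    """Chance of at least one false alarm when n_areas independent
    tests are each run at significance level alpha (an assumption)."""
    return 1.0 - (1.0 - alpha) ** n_areas


# Hypothetical numbers of areas monitored in parallel.
for n in (1, 10, 100):
    print(n, round(familywise_false_alarm(0.05, n), 3))
```

With many areas the chance of a false alarm somewhere approaches one, which is the motivation for a single test that accounts for all candidate locations at once.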

Page 9: Presentation

Why Use a Scan Statistic?

With disease outbreaks:

• We do not know where they will occur.

• We do not know their geographical size.

• We do not know when they will occur.

• We do not know how rapidly they will emerge.

Page 10: Presentation

One-Dimensional Scan Statistic

Page 11: Presentation

The Spatial Scan Statistic

Create a regular or irregular grid of centroids covering the whole study region.

Create an infinite number of circles around each centroid, with the radius anywhere from zero up to a maximum so that at most 50 percent of the population is included.

Page 12: Presentation
Page 13: Presentation

For each circle:

– Obtain actual and expected number of cases inside and outside the circle.

– Calculate likelihood function.

Compare Circles:

– Pick circle with highest likelihood function as Most Likely Cluster.

Inference:

– Generate random replicas of the data set under the null-hypothesis of no clusters (Monte Carlo sampling).

– Compare most likely clusters in real and random data sets (Likelihood ratio test).
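
The steps above can be sketched in a few lines of Python. This is a minimal illustration on aggregated data, not the SaTScan implementation: the function names, the toy coordinates, populations and case counts are all hypothetical, expected counts are taken as proportional to population (no covariate adjustment), only high-rate clusters are scored, and the null replicas are drawn by assigning each case to a region with probability proportional to population.

```python
import math
import random


def llr(c, mu, C):
    """Poisson log likelihood ratio; zero unless the circle has excess cases."""
    if c <= mu:
        return 0.0
    outside = (C - c) * math.log((C - c) / (C - mu)) if c < C else 0.0
    return c * math.log(c / mu) + outside


def candidate_circles(coords, pop, max_frac=0.5):
    """Grow a circle around each centroid over its nearest neighbours until
    the population cap is exceeded. Distance ties are broken arbitrarily."""
    total = sum(pop)
    zones = set()
    for xi, yi in coords:
        order = sorted(range(len(coords)),
                       key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi))
        members, p = [], 0
        for j in order:
            p += pop[j]
            if p > max_frac * total:
                break
            members.append(j)
            zones.add(tuple(sorted(members)))
    return sorted(zones)


def scan(cases, pop, zones):
    """Return (highest LLR, zone) over all candidate circles."""
    C, P = sum(cases), sum(pop)
    best = (0.0, None)
    for z in zones:
        c = sum(cases[j] for j in z)
        mu = C * sum(pop[j] for j in z) / P
        best = max(best, (llr(c, mu, C), z), key=lambda b: b[0])
    return best


def monte_carlo_p(cases, pop, zones, n_rep=99, seed=0):
    """Rank the observed maximum among maxima from random replicas."""
    rng = random.Random(seed)
    observed, zone = scan(cases, pop, zones)
    C, rank = sum(cases), 1
    for _ in range(n_rep):
        sim = [0] * len(pop)
        for j in rng.choices(range(len(pop)), weights=pop, k=C):
            sim[j] += 1
        if scan(sim, pop, zones)[0] >= observed:
            rank += 1
    return observed, zone, rank / (n_rep + 1)


# Toy data: six regions, the first three with elevated counts.
coords = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
pop = [1000] * 6
cases = [12, 9, 10, 3, 2, 4]
zones = candidate_circles(coords, pop)
llr_obs, cluster, p = monte_carlo_p(cases, pop, zones)
print(cluster, round(llr_obs, 2), p)
```

Because the p-value compares the observed maximum with the maximum from each replica, the search over many circles is already built into the inference.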

Page 14: Presentation

Poisson Likelihood Function

$\left(\frac{c}{\mu}\right)^{c} \left(\frac{C - c}{C - \mu}\right)^{C - c}$

c=cases in circle

μ = expected cases in circle

C = total cases
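
In practice the likelihood function is usually evaluated on the log scale to avoid overflow. A minimal sketch, assuming 0 < μ < C; the function name is hypothetical, and the example reuses the observed and expected counts of the first children's cluster (86 and 51) together with the 5,062 total child deaths from the slides, for which the slides themselves do not report a likelihood value.

```python
import math


def poisson_log_likelihood_ratio(c, mu, C):
    """Log of (c/mu)^c * ((C-c)/(C-mu))^(C-c), with 0*log(0) taken as 0."""
    def xlogy(x, y):
        return 0.0 if x == 0 else x * math.log(y)
    return xlogy(c, c / mu) + xlogy(C - c, (C - c) / (C - mu))


# c = 86 observed, mu = 51 expected, C = 5062 total cases.
value = poisson_log_likelihood_ratio(86, 51, 5062)
print(round(value, 2))
```

When the observed count equals the expected count the function returns zero, and it grows as the two diverge.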

Page 15: Presentation

Spatial Scan Statistic: Properties

– Adjusts for inhomogeneous population density.

– Simultaneously tests for clusters of any size and any location, by using circular windows with continuously variable radius.

– Accounts for multiple testing.

– Possibility to include confounding variables, such as age, sex or socio-economic variables.

– Aggregated or non-aggregated data (states, counties, census tracts, block groups, households, individuals).

Page 16: Presentation

U.S. Brain Cancer Mortality 1986-1995

                         Deaths    Rate*   (95% CI)
Children (age <20):       5,062    0.75    (0.66-0.83)
Adults (age 20+):       106,710    6.0     (5.8-6.2)
Adult Women:             48,650    4.9     (4.7-5.0)
Adult Men:               58,060    7.2     (7.0-7.5)

* annual deaths / 100,000

Page 17: Presentation

Brain Cancer

Known risk factors:

• High dose ionizing radiation

• Selected congenital and genetic disorders

Explains only a small percent of cases.

Potential risk factors:

N-nitroso compounds?, phenols?, pesticides?, polycyclic aromatic hydrocarbons?, organic solvents?

Page 18: Presentation

Adjustments

All subsequent analyses were adjusted for:

• Age

• Gender

• Ethnicity (African-American, White, Other)

Page 19: Presentation

[Map: Brain Cancer Mortality, Children 1986-1995. Legend (SMR): 2.07-42.82 (highest 10%), 1.20-2.06, 0.83-1.19, 0.50-0.82, zero cases (1867 counties). Scale bar: 0-600 miles.]

Page 20: Presentation

[Map: Spatial Scan Statistic, Children. Numbered clusters; color key: High Risk, Not Significant. Scale bar: 0-600 miles.]

Page 21: Presentation

Children: Seven Most Likely Clusters

Cluster              Obs    Exp    RR     p=
1. Carolinas          86    51     1.7    0.24
2. California         16    4.9    3.3    0.74
3. Michigan          318    250    1.3    0.74
4. S Carolina         24    10     2.5    0.79
5. Kentucky-Tenn     127    88     1.4    0.79
6. Wisconsin          10    2.4    4.1    0.98
7. Nebraska           12    3.6    3.3    0.99

Page 22: Presentation

Conclusions: Children

No statistically significant clusters detected.

Any part of the pattern seen on the original map may be due to chance.

Page 23: Presentation

What About Adults?

Page 24: Presentation

[Map: Brain Cancer Mortality, Adults 1986-1995. Legend (SMR): 9.46-24.44 (highest 10%), 8.05-9.45, 7.27-8.04, 6.72-7.26, 6.17-6.71, 5.68-6.16, 5.19-5.67, 4.51-5.18, 3.40-4.50, zero cases (312 counties). Scale bar: 0-600 miles.]

Page 25: Presentation

[Map: Spatial Scan Statistic: Adults. Numbered clusters 1-13.]

Page 26: Presentation

[Map: Spatial Scan Statistic, Women. Numbered clusters; color key: Low Risk, p < 0.05; High Risk, p < 0.05; Low Risk, Not Significant; High Risk, Not Significant. Scale bar: 0-600 miles.]

Page 27: Presentation

Women: Most Likely Clusters

Cluster                  Obs     Exp     RR      p=
1. Arkansas et al.       2830    2328    1.22    0.0001
2. Carolinas             1783    1518    1.17    0.0001
3. Oklahoma et al.       1709    1496    1.14    0.003
4. Minnesota et al.      2616    2369    1.10    0.01

10. N.J. / N.Y.          1809    2300    0.79    0.0001
11. S Texas               127     214    0.59    0.0001
12. New Mexico et al.     849    1049    0.81    0.0001

Page 28: Presentation

[Map: Spatial Scan Statistic: Men. Numbered clusters 1-15; color key: Low Risk, p < 0.05; High Risk, Not Significant; High Risk, p < 0.05. Scale bar: 0-600 miles.]

Page 29: Presentation

Men: Most Likely Clusters

Cluster                    Obs     Exp     RR      p=
1. Kentucky et al.         3295    2860    1.15    0.0001
2. Carolinas               1925    1658    1.16    0.0001
3. Arkansas et al.         1143     964    1.19    0.001
4. Washington et al.       1664    1455    1.14    0.003
5. Michigan                1251    1074    1.17    0.005

11. N.J. / N.Y.            2084    2615    0.80    0.0001
12. S Texas                 157     262    0.60    0.0001
13. New Mexico et al.      1418    1680    0.84    0.0001
14. Upstate N.Y. et al.    1642    1895    0.87    0.0001

Page 30: Presentation

Conclusions: Adults

It is possible to pinpoint specific areas with higher and lower rates that are statistically significant, and unlikely to be due to chance.

The exact borders of detected clusters are uncertain.

Similar patterns for men and women.

Page 31: Presentation

Conclusion: General

The spatial scan statistic can be useful as an addition to disease maps, in order to determine if the observed patterns are likely due to chance or not.

A complement rather than a replacement for regular disease maps.

Page 32: Presentation

Space-Time Scan Statistic

Use a cylindrical window, with the circular base representing space and the height representing time.

We will only consider cylinders that reach the present time.
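
One way to enumerate such cylinders is sketched below. It is an illustration only: the function name is hypothetical, coordinates are assumed to be planar and in kilometers, and the default limits (5 km radius, 1 to 7 days) are borrowed from the New York City settings listed later in the talk.

```python
import math


def alive_cylinders(coords, n_days, max_radius_km=5.0, max_len_days=7):
    """Yield (locations, first_day, last_day) for every distinct cylinder
    whose time interval ends on the most recent day (index n_days - 1)."""
    last_day = n_days - 1
    seen = set()
    for xi, yi in coords:
        by_dist = sorted((math.hypot(x - xi, y - yi), j)
                         for j, (x, y) in enumerate(coords))
        members = []
        for k, (d, j) in enumerate(by_dist):
            if d > max_radius_km:
                break
            members.append(j)
            # Locations at the same distance always enter a circle together.
            tied_with_next = k + 1 < len(by_dist) and by_dist[k + 1][0] == d
            zone = tuple(sorted(members))
            if tied_with_next or zone in seen:
                continue
            seen.add(zone)
            for length in range(1, min(max_len_days, n_days) + 1):
                yield zone, last_day - length + 1, last_day


# Three hypothetical locations on a line, 30 days of data.
cylinders = list(alive_cylinders([(0, 0), (3, 0), (20, 0)], n_days=30))
print(len(cylinders))
```

Restricting the search to cylinders that end today keeps the daily analysis focused on outbreaks that are still ongoing.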

Page 33: Presentation

For each cylinder:

– Obtain actual and expected number of cases inside and outside the cylinder.

– Calculate likelihood function.

Compare Cylinders:

– Pick cylinder with highest likelihood function as Most Likely Cluster.

Inference:

– Generate random replicas of the data set under the null-hypothesis of no clusters (Monte Carlo sampling).

– Compare most likely clusters in real and random data sets (Likelihood ratio test).

Page 34: Presentation

Page 35: Presentation

Space-Time Permutation Scan Statistic

1. For each cylinder, calculate the expected number of cases conditioning on the marginals

$\mu_{st} = \left(\sum_{s} c_{st}\right) \left(\sum_{t} c_{st}\right) / C$

where $c_{st}$ = # cases at time $t$ in location $s$

and $C$ = total number of cases
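
A minimal sketch of this calculation, assuming the counts are held in a nested list indexed as cases[location][day]. The function names and the toy counts are hypothetical, and summing the cell-level expectations over the cells of a cylinder is one reading of how the per-cell formula extends to a whole cylinder.

```python
def expected_counts(cases):
    """Expected count per (location, day) cell from the marginal totals:
    (location total) * (day total) / (grand total)."""
    by_location = [sum(row) for row in cases]
    by_day = [sum(col) for col in zip(*cases)]
    total = sum(by_location)
    return [[loc * day / total for day in by_day] for loc in by_location]


def cylinder_expected(mu, locations, first_day, last_day):
    """Sum the cell expectations over the cells inside a cylinder."""
    return sum(mu[s][t] for s in locations
               for t in range(first_day, last_day + 1))


# Toy counts: 3 locations (rows) by 3 days (columns).
cases = [[2, 1, 5],
         [1, 1, 1],
         [3, 2, 4]]
mu = expected_counts(cases)
print(mu[0][2], cylinder_expected(mu, [0, 2], 1, 2))
```

No population denominators are needed: the expectation is built entirely from the case counts themselves.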

Page 36: Presentation

Space-Time Permutation Scan Statistic

2. For each cylinder, calculate

$T_{st} = \left(\frac{c_{st}}{\mu_{st}}\right)^{c_{st}} \left(\frac{C - c_{st}}{C - \mu_{st}}\right)^{C - c_{st}}$ if $c_{st} > \mu_{st}$

$T_{st} = 1$, otherwise

3. Test statistic $T = \max_{st} T_{st}$
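
Steps 2 and 3 can be sketched on the log scale, which leaves the location of the maximum unchanged because the logarithm is monotone. The function names, the cylinder labels, the counts and the total of 500 cases are hypothetical.

```python
import math


def log_t(c, mu, C):
    """log T_st: zero (T_st = 1) unless the cylinder has more cases than expected."""
    if c <= mu:
        return 0.0
    outside = (C - c) * math.log((C - c) / (C - mu)) if c < C else 0.0
    return c * math.log(c / mu) + outside


def most_likely_cluster(cylinders, C):
    """cylinders: iterable of (label, observed, expected). Returns (log T, label)."""
    return max(((log_t(c, mu, C), label) for label, c, mu in cylinders),
               key=lambda pair: pair[0])


# Hypothetical cylinders evaluated on the same day, with C = 500 total cases.
candidates = [("X", 10, 2.3), ("Y", 20, 15.0), ("Z", 5, 6.0)]
best_value, best_label = most_likely_cluster(candidates, C=500)
print(best_label, round(best_value, 2))
```

Cylinders with fewer cases than expected score zero, so only excesses can become the most likely cluster.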

Page 37: Presentation

Space-Time Permutation Scan Statistic

4. Generate random replicas of the data set conditioned on the marginals, by permuting the pairs of spatial locations and times.

5. Compare test statistic in real and random data sets using Monte Carlo hypothesis testing (Dwass, 1957):

p = rank(Treal) / (1+#replicas)
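
Steps 4 and 5 can be sketched as follows. Shuffling the day labels across the case records keeps both the number of cases per location and the number of cases per day fixed, which is what conditioning on the marginals requires. The record layout, the function names and the toy data are hypothetical, and cylinders are given simply as a set of locations paired with a set of days.

```python
import math
import random
from collections import Counter


def log_t(c, mu, C):
    """log T_st, zero unless observed exceeds expected."""
    if c <= mu:
        return 0.0
    outside = (C - c) * math.log((C - c) / (C - mu)) if c < C else 0.0
    return c * math.log(c / mu) + outside


def max_stat(records, cylinders):
    """records: list of (location, day); cylinders: list of (locations, days)."""
    C = len(records)
    by_loc = Counter(s for s, _ in records)
    by_day = Counter(t for _, t in records)
    cell = Counter(records)
    best = 0.0
    for locs, days in cylinders:
        c = sum(cell[(s, t)] for s in locs for t in days)
        mu = sum(by_loc[s] * by_day[t] for s in locs for t in days) / C
        best = max(best, log_t(c, mu, C))
    return best


def permutation_p_value(records, cylinders, n_replicas=999, seed=0):
    """p = rank(T_real) / (1 + #replicas), with rank 1 for the largest value."""
    rng = random.Random(seed)
    t_real = max_stat(records, cylinders)
    locs = [s for s, _ in records]
    days = [t for _, t in records]
    rank = 1
    for _ in range(n_replicas):
        rng.shuffle(days)  # permute times among the fixed locations
        if max_stat(list(zip(locs, days)), cylinders) >= t_real:
            rank += 1
    return rank / (1 + n_replicas)


# Toy data: one case per location per day, plus six extra in "a" on the last day.
records = [(s, t) for s in "abc" for t in range(10)] + [("a", 9)] * 6
cylinders = [({s}, set(range(10 - k, 10))) for s in "abc" for k in (1, 2, 3)]
p = permutation_p_value(records, cylinders)
print(p)
```

With 999 replicas the smallest attainable p-value is 1/1000, so the number of replicas sets the resolution of the test.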

Page 38: Presentation

Space-Time Permutation Scan Statistic: Properties

– Adjusts for purely geographical clusters.

– Adjusts for purely temporal clusters.

– Simultaneously tests for outbreaks of any size at any location, by using cylindrical windows with variable radius and height.

– Accounts for multiple testing.

– Aggregated or non-aggregated data (counties, zip-code areas, census tracts, individuals, etc).

Page 39: Presentation
Page 40: Presentation

Let’s Try It!

• Historic data, Nov 15, 2001 – Nov 14, 2002

• Diarrhea, all age groups

• Use last 30 days of data.

• Temporal window size: 1-7 days

• Spatial window size: 0-5 kilometers

• Residential zip code and hospital coordinates

Page 41: Presentation

Results: Hospital Analyses

Signal  Date    #days  #hosp  #cases  #exp   RR    p=       recurrence interval
A       Nov 21  6      1      101     73.6   1.4   0.0008   1 / 3.4 years
B       Jan 11  1      1      10      2.3    4.4   0.0007   1 / 3.9 years
C       Feb 26  4      2      97      66.9   1.4   0.0018   1 / 1.5 years
D       Mar 31  2      1      38      19.2   2.0   0.0017   1 / 1.6 years
E       Nov 1   6      3      122     86.6   1.4   0.0017   1 / 1.6 years
F       Nov 2   7      3      135     98.3   1.4   0.0008   1 / 3.4 years

Page 42: Presentation

Results: Residential Analyses

Signal  Date   #days  #zips  #cases  #exp   RR    p=       recurrence interval
G       Feb 9  2      15     63      34.7   1.8   0.0005   1 / 5.5 years
H       Mar 7  2      8      63      37.3   1.7   0.0027   1 / 1.0 years
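
The recurrence intervals in the hospital and residential results are consistent with reading them as 1/p days when one analysis is run per day. That reading is an inference from the tabulated numbers rather than a definition given on the slides, and the function name below is hypothetical.

```python
def recurrence_interval_days(p_value, analyses_per_day=1):
    """Expected number of days between signals at least this strong,
    assuming a fixed number of analyses per day (an assumption)."""
    return 1.0 / (p_value * analyses_per_day)


# p-values of signals A, B and G from the tables.
for label, p in [("A", 0.0008), ("B", 0.0007), ("G", 0.0005)]:
    days = recurrence_interval_days(p)
    print(label, round(days), "days,", round(days / 365.25, 1), "years")
```

Because the tabulated p-values are rounded, intervals recomputed this way can differ slightly from the printed ones.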

Page 43: Presentation
Page 44: Presentation

[Figure: Number of visits (y-axis, 0 to 200) by month (x-axis, Nov 2001 to Nov 2002), with signals A through H marked. Legend: Areas with residential signals; Areas with hospital signals; Citywide.]

Page 45: Presentation

Real-Time Daily Analyses

• Starting November 1, 2003.

• Respiratory, Fever/Flu, Diarrhea, (+Vomiting)

• Hospital (and Residential) Analyses

• Spatial window size: 0-5 kilometers

• Temporal window size: 1-7 days

Page 46: Presentation

Real-Time Results, Nov 24, 2003: Hospital Analysis

Syndrome     #days  #hosp  #cases  #exp   RR    p=     recurrence interval
Respiratory  2      3      80      57.4   1.4   0.13   every 8 days
Fever/Flu    3      1      24      14.8   1.6   0.68   every day
Diarrhea     2      4      18      8.2    2.2   0.04   every 26 days

Page 47: Presentation

Real-Time Results, Nov 25, 2003: Hospital Analysis

Syndrome     #days  #hosp  #cases  #exp   RR    p=     recurrence interval
Respiratory  7      1      45      30.4   1.5   0.46   every 2 days
Fever/Flu    1      5      50      31.5   1.6   0.04   every 23 days
Diarrhea     3      4      22      11.5   1.9   0.17   every 6 days

Page 48: Presentation

Real-Time Results, Nov 26, 2003: Hospital Analysis

Syndrome     #days  #hosp  #cases  #exp    RR    p=     recurrence interval
Respiratory  5      2      233     199.4   1.1   0.63   every 2 days
Fever/Flu    7      7      299     252.1   1.2   0.05   every 22 days
Diarrhea     4      4      23      12.6    1.8   0.22   every 5 days

Page 49: Presentation

Real-Time Results, Nov 27, 2003: Hospital Analysis

Syndrome     #days  #hosp  #cases  #exp    RR    p=     recurrence interval
Respiratory  1      4      41      26.9    1.5   0.45   every 2 days
Fever/Flu    6      4      181     142.9   1.3   0.03   every 36 days
Diarrhea     5      3      29      14.1    1.7   0.50   every 2 days

Page 50: Presentation

Real-Time Results, Nov 28, 2003: Hospital Analysis

Syndrome     #days  #hosp  #cases  #exp    RR    p=      recurrence interval
Respiratory  2      4      98      78.8    1.2   0.82    every day
Fever/Flu    7      5      228     178.0   1.3   0.001   every 1000 days
Diarrhea     6      3      29      17.5    1.5   0.26    every 4 days

Page 51: Presentation

Real-Time Results, Nov 29, 2003: Hospital Analysis

Syndrome     #days  #hosp  #cases  #exp    RR    p=      recurrence interval
Respiratory  7      2      146     123.6   1.2   0.95    every day
Fever/Flu    7      4      253     195.7   1.3   0.001   every 1000 days
Diarrhea     7      4      44      29.4    1.5   0.21    every 5 days

Page 52: Presentation

Real-Time Results, Nov 30, 2003: Hospital Analysis

Syndrome     #days  #hosp  #cases  #exp    RR    p=      recurrence interval
Respiratory  1      1      19      10.7    1.8   0.69    every day
Fever/Flu    6      9      429     364.1   1.2   0.002   every 500 days
Diarrhea     1      5      12      4.4     2.7   0.06    every 17 days

Page 53: Presentation

Summary

Four strong diarrhea signals:

• Two were early signals for city-wide outbreaks likely due to norovirus.

• One was an early signal for a city-wide outbreak among children, likely due to rotavirus.

• One small outbreak of unknown etiology.

Three medium strength diarrhea signals:

• All during the rotavirus outbreak, possibly due to a shift in the geographical epicenter.

One real-time fever/flu signal, coinciding with the start of the flu season.

Page 54: Presentation

Different Data Streams

For example:

• Nurses Hotline Calls

• Regular Physician Visits

• Emergency Department Visits

• Ambulance Dispatches

• Pharmaceutical Drug Sales

• Lab Test Results

Page 55: Presentation

Multiple Data Streams

For each cylinder, add the Poisson log likelihoods:

$T_{st} = \log T^{[1]}_{st} + \log T^{[2]}_{st} + \log T^{[3]}_{st}$

Test statistic $T = \max_{st} T_{st}$
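
A minimal sketch of the combined statistic for a single cylinder. It assumes each stream contributes its own log likelihood ratio, counted as zero when that stream shows no excess; the function names, the three streams' counts and their totals are hypothetical.

```python
import math


def log_t(c, mu, C):
    """Per-stream log likelihood ratio, zero unless observed exceeds expected."""
    if c <= mu:
        return 0.0
    outside = (C - c) * math.log((C - c) / (C - mu)) if c < C else 0.0
    return c * math.log(c / mu) + outside


def multistream_log_stat(streams):
    """streams: list of (observed, expected, stream_total) for one cylinder."""
    return sum(log_t(c, mu, C) for c, mu, C in streams)


# Hypothetical counts for one cylinder in three data streams.
streams = [(5, 0.5, 600), (3, 2.0, 270), (4, 5.0, 660)]
combined = multistream_log_stat(streams)
print(round(combined, 2))
```

A signal that is modest in each stream can still add up to a strong combined signal, while a stream without an excess neither adds to nor subtracts from the total.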

Page 56: Presentation

Syndromic Surveillance in Boston: Upper and Lower GI

• Harvard Pilgrim Health Care HMO members cared for by Harvard Vanguard Medical Associates

• Historical Data from Jan 1 to Dec 31, 2002

• Mimicking Surveillance from Sept 1 to Dec 31, 2002

Page 57: Presentation

Three Data Streams

• Telephone Calls ( ~ 20 / day)

• Urgent Care Visits ( ~ 9 / day)

• Regular Physician Visits ( ~ 22 / day)

Multiple contacts by the same person removed.

Page 58: Presentation

Strongest Signal: October 18

Recurrence Interval

Multiple Data Streams: < 1 / 1000 days

Single Data Streams:

Tele: < 1 / 1000 days

Urgent: ~ every day

Regular: ~ every day

Page 59: Presentation

October 18 Signal

• Friday

• Number of Cases: 5

• Expected Cases: 0.04

• Location: Zip Code 01740

• Time Length: One Day

Page 60: Presentation

October 18 Signal

• Friday

• Number of Cases: 5

• Expected Cases: 0.04

• Location: Zip Code 01740

• Time Length: One Day

• Diagnosis: Pinworm Infestation (all 5)

Page 61: Presentation

October 18 Signal

• Friday

• Number of Cases: 5 (all tele)

• Expected Cases: 0.04

• Location: Zip Code 01740

• Time Length: One Day

• Diagnosis: Pinworm Infestation (all 5)

• Same Family: Mother, Father, 3 Kids

Page 62: Presentation

Limitations

• Space-time clusters may occur for other reasons than disease outbreaks.

• Automated detection systems do not replace the observant eyes of physicians and other health workers.

• Epidemiological investigations by public health departments are needed to confirm or dismiss the signals.

Page 63: Presentation

Scan Statistics for Irregular Shaped Clusters

Duczmal, Assunção. A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. Computational Statistics and Data Analysis, 2004.

Patil, Taillie. Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environmental and Ecological Statistics, 2004.

Iyengar. Space-time clusters with flexible shapes. Morbidity and Mortality Weekly Report, 2005.

Tango, Takahashi. A flexibly shaped spatial scan statistic for detecting clusters. Int J Health Geographics, 2005.

Assunção, Costa, Tavares, Ferreira. Fast detection of arbitrarily shaped disease clusters. Statistics in Medicine, 2006.

Page 64: Presentation

Probability Models

• Poisson model (e.g. incidence, mortality)

• Bernoulli model (e.g. case-control data)

• Normal model (e.g. weight, blood lead levels)

• Exponential model (e.g. survival data)

• Ordinal model (e.g. early, medium and late stage cancer)

• Space-time permutation model (when only case data is available)

Page 65: Presentation

Application Areas

• Chronic Diseases

• Infectious Diseases

• Health Services

• Accidents

• Brain Imaging

• Toxicology

• Veterinary Medicine

• Psychology

• Demography

• Criminology

• History

• Archeology

• Ecology

Page 66: Presentation

Examples of Applications

Beato Filho, Assunção, Silva, Marinho, Reis, Almeida. Homicide clusters and drug traffic in Belo Horizonte, Minas Gerais, Brazil from 1995 to 1999. Cadernos de Saúde Pública, 2001.

Pellegrini. Analise espaço-temporal da leptospirose no municipio do Rio de Janeiro. Fiocruz, 2002.

Andrade, Silva, Martelli, Oliveira, Morais Neto, Siqueira Junior, Melo, Di Fabio. Population-based surveillance of pediatric pneumonia: use of spatial analysis in an urban area of Central Brazil. Cadernos de Saúde Pública, 2004.

Ceccato. Homicide in São Paulo, Brazil: Assessing spatial-temporal and weather variations. J Environmental Psychology, 2005.

Simões, Mendes, Marques, Pereira, Bagagli. Spatial clusters of paracoccidioidomycosis in southeastern Brazil. Revista do Instituto de Medicina Tropical de São Paulo, 2005.

Page 67: Presentation

SaTScan Software

Free. Download from www.satscan.org

Registered users in 116 countries:

1. USA

2. Canada

3. United Kingdom

4. Brazil

5. Italy

. . .

100s. Albania, Bhutan, Burma, Fiji, Grenada, Guinea, Iraq, Macao, Madagascar, Malawi, Malta, etc.

Page 68: Presentation

Future Topics

• Irregular shaped clusters

• Non-Euclidean neighbor definitions

• Multivariate data

• Multiple locations per observation

• Computational speed

Page 69: Presentation

Acknowledgement

Research funded by:

Alfred P Sloan Foundation

Centers for Disease Control and Prevention

Massachusetts Department of Health

National Cancer Institute

National Institute of Child Health and Development

National Institute of General Medical Sciences: Modeling Infectious Disease Agent Study (MIDAS)

Page 70: Presentation

References

Kulldorff. A spatial scan statistic. Communications in Statistics, Theory and Methods. 26:1481-1496, 1997.

Fang, Kulldorff, Gregorio. Brain cancer in the United States 1986-1995, A Geographical Analysis. Neuro-Oncology, 6:179-187, 2004.

Kulldorff, Heffernan, Hartman, Assunção, Mostashari. A space-time permutation scan statistic for disease outbreak detection. PLoS Medicine, 2(3):e59, 2005.

Kulldorff, Mostashari, Duczmal, Yih, Kleinman, Platt. Multivariate spatial scan statistics for disease surveillance. Statistics in Medicine, 2006, in press.

Kulldorff and IMS Inc. SaTScan v.7.0: Software for the spatial and space-time scan statistics, 2004. Free: http://www.satscan.org/