Scientific Schedule and Program Abstracts
WNAR/IMS and Graybill Conference
Ft Collins, CO, June 17-21, 2012


Monday, 18 June 2012

WNAR invited 1 Room: Clark A201

Statistical immunomics
Organizer: Raphael Gottardo; Fred Hutchinson Cancer Research Center

8:30-8:55 Analytical strategies for high dimensional immunomics data (87). Ann Oberg; Mayo Clinic

8:55-9:20 Model-Based Sieve Analysis (19). Paul Edlefsen; Fred Hutchinson Cancer Research Center

9:20-9:45 MIMOSA: Mixture Models for Single Cell Assays With Applications to Vaccine Studies (61). Greg Finak; Fred Hutchinson Cancer Research Center

9:45-10:10 Discussion

10:10-10:20 Floor Discussion

Graybill Invited Session Room: Clark A202

Spatial and Spatio-Temporal Statistical Modeling
Organizer: Mevin Hooten; Colorado State University

8:30-8:55 Multivariate multilevel latent Gaussian process model to evaluate wetland condition (109). Jennifer Hoeting; Colorado State University

8:55-9:20 Modeling Resource Selection and Space Usage with Spatial Capture-Recapture Models (111). Andy Royle; USGS Patuxent Wildlife Research Center

9:20-9:45 Multivariate Nonlinear Spatio-Temporal Dynamic Models for Ecological Processes (110). Chris Wikle; University of Missouri

9:45-10:10 A spatial model for large geophysical datasets (107). Stephan Sain; NCAR

10:10-10:20 Floor Discussion


Monday, 18 June 2012

Student Paper 1 Room: Clark A203

Student paper competition
Organizer: Brandie Wagner (committee chair); University of Colorado Denver

8:30-8:55 Adjusting for time-dependent sensitivity in an illness-death model (68). Elizabeth Teeple; University of Washington

8:55-9:20 Diagnostic Methods on Multiple Diagnostic Tests Without a Gold Standard (93). Jingyang Zhang; Fred Hutchinson Cancer Research Center

9:20-9:45 An Alternative Characterization of Hidden Regular Variation in Joint Tail Modeling (63). Grant Weller; Colorado State University

9:45-10:10 Information in a two stage adaptive optimal design (46). Adam Lane; University of Missouri

10:10-10:20 Floor Discussion

Contributed 1 Room: Clark A204

Study and Trial Design #1
Organizer: Program

8:30-8:55 Bayesian inference for the finite population total from a heteroscedastic probability proportional to size sample (100). Sahar Zangeneh; University of Michigan

8:55-9:20 Using Audit Information to Adjust Parameter Estimates for Data Errors in Clinical Trials (38). Pamela Shaw; Biostatistics Research Branch, NIAID/NIH

9:20-9:45 Construction of simultaneous confidence intervals for ratios of means of lognormal distributions using two-step method of variance estimates recovery (56). Amany Abdel-Karim; Tanta University

10:10-10:20 Floor Discussion


Monday, 18 June 2012

WNAR invited 2 Room: Clark A201

Advances in variable selection methods for high-dimensional data
Organizer: Mahlet Tadesse; Georgetown University

10:30-10:55 Invisible Fence Methods and the Identification of Differentially Expressed Gene Sets (34). J. Sunil Rao; University of Miami

10:55-11:20 Bias-variance trade-off in estimating posterior probabilities for variable selection (37). Joyee Ghosh; The University of Iowa

11:20-11:45 A stochastic partitioning method to associate high-dimensional response and covariate data (15). Mahlet Tadesse; Georgetown University

11:45-12:10 Bayesian Kernel Based Modeling and Selection of Genetic Pathways and Genes for Cancer Studies (35). Sounak Chakraborty; University of Missouri-Columbia

12:10-12:20 Floor Discussion

IMS invited 1 Room: Clark A202

Advances in Applied Spatial Statistics
Organizer: Ethan Anderes; University of California Davis

10:30-10:55 A test for stationarity of spatio-temporal random fields on planar and spherical domains (11). Mikyoung Jun; Texas A&M University

10:55-11:20 Spatial analysis of areal unit data (48). Debashis Mondal; University of Chicago

11:20-11:45 Optimally-Smoothed Maps of Pollution Source Potential via Particle Back Trajectories and Filtered Kriging (55). William Christensen; Brigham Young University

11:45-12:10 Nonstationary random fields and the gravitational lensing of the cosmic microwave background (10). Ethan Anderes; University of California at Davis

12:10-12:20 Floor Discussion


Monday, 18 June 2012

Student Paper 2 Room: Clark A203

Student paper competition
Organizer: Brandie Wagner (committee chair); University of Colorado Denver

10:30-10:55 Building DAGs via Bootstrapping (13). Ru Wang; University of California, Davis

10:55-11:20 Semiparametric modeling of non-autonomous nonlinear dynamical systems with applications (24). Siyuan Zhou; University of California, Davis

11:20-11:45 Fitting and interpreting continuous-time latent Markov models for panel data (79). Jane Lange; University of Washington, Dept. of Biostatistics

11:45-12:10 Bayesian Elastic-Net and Fused Lasso for Semiparametric Structural Equation Models (95). Zhenyu Wang; University of Missouri, Columbia

12:10-12:20 Floor Discussion

Contributed 2 Room: Clark A204

Multivariate and Longitudinal Methods
Organizer: Program

10:30-10:55 Bivariate Nonlinear Mixed Models for Longitudinal Dichotomous Response Data: Estimating Temporal Relationships Among Daily Marijuana, Cigarette, and Alcohol Use (65). Susan Mikulich-Gilbertson; University of Colorado Anschutz Medical Center

10:55-11:20 A joint Markov chain model for the association of two longitudinal binary processes (77). Catherine M. Crespi; University of California Los Angeles

11:20-11:45 Stochastic Models of US Population (102). Asma Tahir; Colorado State University

12:10-12:20 Floor Discussion


Monday, 18 June 2012

WNAR invited 3 Room: Clark A201

Statistical challenges in HIV prevention research
Organizer: Deborah Donnell; Fred Hutchinson Cancer Research Center

1:45-2:10 Estimating the effects of chemoprophylaxis of HIV in compliers (6). Dave Glidden; University of California, San Francisco

2:10-2:35 Design considerations for cluster randomized trials in HIV prevention (43). Rui Wang; Harvard School of Public Health

2:35-3:00 Consonant Multiple Testing Procedures for Subpopulation Treatment Effects in Randomized Trials (4). Michael Rosenblum; Johns Hopkins Bloomberg School of Public Health

3:00-3:25 Discussion

3:25-3:35 Floor Discussion

IMS invited 2 Room: Clark A202

High-Dimensional Statistical Inference for Matrix Models
Organizer: Florentina Bunea; Cornell University

1:45-2:10 Riemannian metric estimation (114). Marina Meila; University of Washington

2:10-2:35 Joint variable and rank selection for parsimonious estimation of high dimensional matrices (14). Marten Wegkamp; Cornell

2:35-3:00 Nuclear norm penalization and optimal rates for noisy low rank matrix completion (16). Karim Lounici; Georgia Institute of Technology

3:00-3:25 Discussion

3:25-3:35 Floor Discussion


Monday, 18 June 2012

Student Paper 3 Room: Clark A203

Student paper competition
Organizer: Brandie Wagner (committee chair); University of Colorado Denver

1:45-2:10 Fast Computation for Genome Wide Association Studies using Boosted One-Step Statistics (70). Arend Voorman; University of Washington

2:10-2:35 Two penalized likelihood parameter estimation approaches for a multivariate stochastic differential equations system with partially observed discrete sparse data (96). Libo Sun; Colorado State University

2:35-3:00 Comparing and exploring a conservative approach to handle overrun based on maximum likelihood ordering (86). Timothy Skalland; Oregon State University

3:25-3:35 Floor Discussion

Contributed 3 Room: Clark A204

Genomics/GWAS Methods #1
Organizer: Program

1:45-2:10 'Next Generation' Imputation: Widening the Genome-Wide Association Study (25). Sarah Nelson; University of Washington, Genetics Coordinating Center

2:10-2:35 Genome-wide evaluation of gene-environment interactions: Using longitudinal data to increase power (26). Colleen Sitlani; University of Washington

2:35-3:00 The Phenotype-Driven (PhD) Approach to Discovery and Validation of Genomic Networks (27). Cheng Cheng; St. Jude Children's Research Hospital

3:00-3:25 Segmentation and Intensity Estimation for Microarray Images with Saturated Pixels (50). Yan Yang; Arizona State University

3:25-3:35 Floor Discussion


Tuesday, 19 June 2012

WNAR invited 4 Room: Clark A201

Recent Advances in Methods for Clinical Trials
Organizer: Ying Qing Chen; Fred Hutchinson Cancer Research Center

8:30-8:55 Pooling Treatment Effects across RCTs: Comparison of the Odds Ratio and the Risk Difference (116). Joan Hilton; University of California San Francisco

8:55-9:20 Estimating Regression Parameters in an Extended Proportional Odds Model (51). Ying Qing Chen; Fred Hutchinson Cancer Research Center

9:20-9:45 Likelihood-Based Prediction of Endpoint Occurrences in Clinical Trials (47). Megan Smith; University of Washington

9:45-10:10 Discussion

10:10-10:20 Floor Discussion

IMS invited 3 Room: Clark A202

Statistical Network Analysis: Application, Theory and Methods
Organizer: Peter Hoff; University of Washington

8:30-8:55 Inferring Gene Regulatory Networks by Integrating Perturbation Screens and Steady-State Expression Profiles (52). Ali Shojaie; University of Washington

8:55-9:20 Co-clustering for directed graphs: an algorithm, a model, and some asymptotics (20). Karl Rohe; UW Madison

9:20-9:45 Relating Social Networks and Nodal Behaviors (41). Bailey Fosdick; University of Washington

9:45-10:10 Discussion

10:10-10:20 Floor Discussion


Tuesday, 19 June 2012

Student Paper 4 Room: Clark A203

Student paper competition
Organizer: Brandie Wagner (committee chair); University of Colorado Denver

8:30-8:55 Boosting for detection of gene-environment interactions (103). Hristina Pashova; University of Washington

8:55-9:20 Heritability estimation of an ordinal trait: lessons from simulations and from analyzing osteoarthritis in pig-tailed macaques (85). Peter Chi; University of Washington

9:20-9:45 High throughput epitope mapping via Bayesian hierarchical modeling of peptide tiling arrays (94). Gregory Imholte; University of Washington, Seattle

10:10-10:20 Floor Discussion

Contributed 4 Room: Clark A204

High Dimensional and Image Data Methods
Organizer: Program

8:30-8:55 Functional Median Polish with Climate Applications (39). Ying Sun; Statistical and Applied Mathematical Sciences Institute (SAMSI)

8:55-9:20 Improved mean estimation and its application to diagonal discriminant analysis (42). Tiejun Tong; Hong Kong Baptist University

9:20-9:45 Ascertainment correction for a population tree via a pruning algorithm for computation of composite likelihood from dependent genetic loci (54). Arindam RoyChoudhury; Columbia University

9:45-10:10 Alternatives to Penalization for Sparse Models (104). Sarah Emerson; Oregon State University

10:10-10:20 Floor Discussion


Tuesday, 19 June 2012

WNAR invited 5 Room: Clark A201

Emerging Topics in Causal Inference
Organizer: James Dai; Fred Hutchinson Cancer Research Center

10:30-10:55 To be announced. Eric Tchetgen

10:55-11:20 Estimating the impact of community level interventions: The SEARCH Trial and HIV Prevention in Sub-Saharan Africa (57). Maya Petersen; University of California, Berkeley School of Public Health

11:20-11:45 An Evaluation of Different Approaches to Signature Identification in the Adaptive Signature Design (97). Mary Redman; Fred Hutchinson Cancer Research Center

11:45-12:10 Estimating the efficacy of pre-exposure prophylaxis for HIV prevention among participants with a threshold level of drug concentration (59). James Dai; Fred Hutchinson Cancer Research Center

12:10-12:20 Floor Discussion

IMS invited 4 Room: Clark A202

Objects, Shape and Geometry
Organizer: Wolfgang Polonik; University of California Davis

10:30-10:55 Statistical methods for diffusion MRI: tensor smoothing and fiber orientation estimation (5). Jie Peng; University of California, Davis

10:55-11:20 Asymptotics in graph partitioning and clustering (9). Bruno Pelletier; Universite Rennes 2, France

11:20-11:45 Estimating Filaments and Manifolds: Methods and Surrogates (8). Christopher Genovese; Carnegie Mellon University

11:45-12:10 Discussion

12:10-12:20 Floor Discussion


Tuesday, 19 June 2012

WNAR invited 6 Room: Clark A203

Spatial methods for climate applications
Organizer: Armin Schwartzman; Harvard University

10:30-10:50 Spatial Inference for Climate Change (82). Joshua French; University of Colorado Denver

10:50-11:10 Variability in annual temperature profiles: A multivariate spatial analysis of regional climate model output (92). Tamara Greasby; National Center for Atmospheric Research

11:10-11:20 False Discovery Control in Large-Scale Spatial Multiple Testing (108). Brian Reich; North Carolina State University

11:20-11:40 Calibration of Computer Models for Large Multivariate Spatial Data with Climate Applications (90). K. Sham Bhat; Los Alamos National Laboratory

12:10-12:20 Floor Discussion

Contributed 5 Room: Clark A204

Study and Trial Design #2
Organizer: Program

10:30-10:55 Designing intervention studies with multiple intermediate biomarker outcomes (66). Loki Natarajan; University of California San Diego

10:55-11:20 Single Arm Trial Design with Comparison to a Heterogeneous Set of Historical Control Trials (71). Arzu Onar-Thomas; St Jude Children's Research Hospital

11:20-11:45 Statistical Criteria for Establishing the Safety and Efficacy of Allergenic Products: A Discussion of the Variety of Variables that May Influence Success or Failure (81). Tammy Massie; Food and Drug Administration

12:10-12:20 Floor Discussion


Tuesday, 19 June 2012

WNAR invited 7 Room: Clark A201

Young Investigators Special Invited Session
Organizer: Elizabeth Juarez-Colunga; University of Colorado Denver

1:45-2:05 Failure-time imputation of interval-censored time-to-event data through joint modeling with a longitudinal outcome (64). Darby Thompson; EMMES Canada

2:05-2:25 Hypothesis Testing for an Extended Cox Model with Time-Varying Coefficients (33). Takumi Saegusa; University of Washington

2:25-2:45 Bayesian Multistate Models for Multivariate Event Histories (32). Adam King; University of California, Los Angeles

2:45-3:05 Subgroup Methods Designed to Evaluate Heterogeneity of Treatment Effects: Enhancing the Interpretation of Study Results for Making Individual Patient Decisions (106). Ann Lazar; University of California, San Francisco

3:05-3:25 Effectively selecting a target population for a future comparative study (67). Lihui Zhao; Northwestern University

3:25-3:35 Floor Discussion

WNAR invited 8 Room: Clark A202

Perspectives in functional data analysis and nonparametrics
Organizer: Donatello Telesca; UCLA

1:45-2:10 To be announced. Damla Shenturk

2:10-2:35 Modeling Criminal Careers as Departures from a Unimodal Population Age-Crime Curve: The Case of Marijuana Use (49). Donatello Telesca; UCLA

2:35-3:00 A Bayesian hierarchical model for estimating and partitioning Bernstein polynomial density functions (83). Charlotte Gard; New Mexico State University

3:00-3:25 Discussion

3:25-3:35 Floor Discussion


Tuesday, 19 June 2012

IMS invited 5 Room: Clark A203

Phylogenetic Analysis and Molecular Evolution
Organizer: Ruriko Yoshida; University of Kentucky

1:45-2:10 Spectral tree reconstruction via a theory for partially-supplied graphs (113). Eric Stone; North Carolina State University

2:10-2:35 TIPP: Taxonomic Identification using Phylogenetic Placement (7). Nam-phuong Nguyen; University of Texas-Austin

2:35-3:00 Bayesian Nonparametric Phylodynamics (17). Vladimir Minin; University of Washington, Seattle

3:00-3:25 Discussion

3:25-3:35 Floor Discussion

Contributed 6 Room: Clark A203

Bayesian Modeling
Organizer: Program

1:45-2:10 Extended Bayesian stable isotope trophic level inference (40). Erik Erhardt; University of New Mexico

2:10-2:35 K-Bayes Reconstruction and Estimation of Functional MRI (58). John Kornak; University of California, San Francisco

2:35-3:00 Bayesian Semi-parametric Analysis of Multi-rater Ordinal Data, With Application to Prioritizing Research Goals For Suicide Prevention (76). Terrance Savitsky; RAND Corporation

3:00-3:25 Multiplicity in Bayesian Graphical Models (80). Riten Mitra; ICES, University of Texas, Austin

3:25-3:35 Floor Discussion


Wednesday, 20 June 2012

WNAR invited 9 Room: Clark A201

Frontiers in Statistical Methods for Next Generation Sequencing
Organizers: Katerina Kechris and Laura Saba; University of Colorado Denver

8:30-8:55 Differential Expression Identification and False Discovery Rate Estimation in RNA-Seq Data (78). Jun Li; Stanford University

8:55-9:20 Timing Chromosomal Abnormalities using Mutation Data (105). Elizabeth Purdom; UC Berkeley

9:20-9:45 Bayesian analysis of high-throughput quantitative measurement of protein-DNA interactions (44). Katerina Kechris; University of Colorado Denver

9:45-10:10 Improved moderation for gene-wise variance estimation in RNA-Seq data (112). Yee Hwa (Jean) Yang; University of Sydney

10:10-10:20 Floor Discussion

WNAR invited 10 Room: Clark A202

Power for longitudinal and multilevel trials
Organizer: Deb Glueck; University of Colorado Denver

8:30-8:55 Power and Sample Size for the Most Common Hypotheses in Mixed Models (31). Anna Baron; University of Colorado

8:55-9:20 Mixed Model Power Analysis by Examples: Using Free Web-Based Power Software (28). Sarah Kreidler; University of Colorado Denver

9:20-9:45 Sample Size Determination for High Dimensional Global Hypothesis Testing (29). Yueh-Yun Chi; University of Florida

9:45-10:10 Adaptive Sample Size Designs for Comparative Effectiveness Clinical Trials (30). Mitchell Thomann; University of Iowa

10:10-10:20 Floor Discussion


Wednesday, 20 June 2012

IMS invited 6 Room: Clark A203

Applied Bayesian Nonparametric Methods
Organizer: Athanasios Kottas; University of California Santa Cruz

8:30-8:55 Local Clustering: Biostatistics and Bioinformatics Applications (88). Juhee Lee; UT MD Anderson Cancer Center

8:55-9:20 Regression Analysis Using Dependent Polya Trees (22). Adam Branscum; Oregon State University

9:20-9:45 A nonparametric Bayesian approach to the analysis of bioassay experiments with ordinal responses (36). Kassie Fronczyk; UT MD Anderson Cancer Center/Rice University

9:45-10:10 Discussion

10:10-10:20 Floor Discussion

Contributed 7 Room: Clark A204

Genomics/GWAS Methods #2
Organizer: Program

8:30-8:55 Integrative Bayesian Analysis of High-Dimensional Multi-platform Genomics Data (62). Kim-Anh Do; UT M.D. Anderson Cancer Center

8:55-9:20 Higher Order Asymptotics for Negative Binomial Regression Inferences from RNA-Sequencing Data (75). Yanming Di; Oregon State University

9:20-9:45 Mixture models vs. supervised learning for integrative genomic analysis (89). Daniel Dvorkin; University of Colorado Denver, Anschutz Medical Campus

9:45-10:10 Incorporating Partial Phase Information into Inference of Coancestry (91). Chris Glazner; University of Washington

10:10-10:20 Floor Discussion


Abstracts (sequential by abstract number)


Abstract (4)
Paper: Consonant Multiple Testing Procedures for Subpopulation Treatment Effects in Randomized Trials
Author: Michael Rosenblum; Johns Hopkins Bloomberg School of Public Health

Abstract: In planning a randomized trial of a new treatment, such as in an HIV prevention trial, it may be suspected that certain subpopulations will benefit more than others. These subpopulations could be defined by a risk factor measured at baseline. We consider situations where the overall population can be partitioned into two such subpopulations (i.e., high risk and low risk). If the null hypothesis of no treatment effect in the overall study population is rejected, a natural question is what can be said about these subpopulations. Whenever there is a treatment effect in the overall population, it follows logically that there must be a treatment effect in at least one of these subpopulations. Therefore, it would be desirable to reject at least one subpopulation null hypothesis whenever the null hypothesis for the overall population is rejected. Furthermore, it would be desirable to do so without sacrificing any power for detecting a treatment effect in the overall population. We give the first multiple testing procedure that has both these properties and that strongly controls the familywise Type I error rate at level 0.05. Our procedure is simple to implement and can be used with binary, continuous, or time-to-event outcomes. In addition, this procedure is the first to satisfy a certain maximin optimality property in this setting. The proofs of these properties rely on a general method we present for transforming analytically difficult expressions arising in some multiple testing problems into more tractable nonlinear optimization problems, which are then solved using intensive computation. We compare our multiple testing procedure to other procedures in simulations and in an analysis of data from a randomized trial of a breast cancer therapy. We also prove a general result for the case of k > 2 subpopulations that partition the overall population.


Abstract (5)
Paper: Statistical methods for diffusion MRI: tensor smoothing and fiber orientation estimation
Author: Jie Peng; University of California, Davis
Co-Authors: Debashis Paul; Jun Chen; Owen Carmichael

Abstract: We will talk about two problems in analyzing diffusion MRI data. One is tensor smoothing, where we extend kernel smoothing to tensor space. We conduct theoretical and numerical analysis to compare three kernel-based DTI smoothers based on three different geometries on the tensor space, namely the Euclidean, log-Euclidean and affine-invariant geometries. In the second problem, we consider estimating fiber orientation distributions (FOD) based on High Angular Resolution Diffusion Imaging (HARDI). We adopt a spherical wavelet decomposition and utilize sparse penalties to select fiber orientations. This method is compared with approaches based on spherical harmonic representation through simulation examples. This talk is based on joint work with Debashis Paul, Jun Chen and Owen Carmichael at UC Davis.
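[Editor's illustration] The log-Euclidean geometry mentioned in this abstract can be sketched in a few lines: smooth a field of symmetric positive-definite (SPD) tensors by kernel-averaging their matrix logarithms and exponentiating back. This is a minimal sketch, not the authors' code; the Nadaraya-Watson weighting and all function names are assumptions for the example.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def spd_exp(L):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(L)
    return (V * np.exp(w)) @ V.T

def log_euclidean_smooth(coords, tensors, x0, bandwidth):
    """Kernel-smooth an SPD tensor field at location x0 in the log-Euclidean
    domain: take Gaussian-weighted averages of the matrix logs, then map back
    with the matrix exponential, so the result is guaranteed to stay SPD."""
    d2 = np.sum((coords - x0) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w = w / w.sum()
    L = sum(wi * spd_log(S) for wi, S in zip(w, tensors))
    return spd_exp(L)
```

Averaging in log space rather than entrywise is what keeps the smoothed tensor positive-definite, the key practical advantage of the log-Euclidean geometry over the plain Euclidean one.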


Abstract (6)
Paper: Estimating the effects of chemoprophylaxis of HIV in compliers
Author: Dave Glidden; University of California, San Francisco
Co-Authors: Dean Follmann, Pamela Shaw, Wenjuan Gu; Biostatistics Research Branch, NIAID

Abstract: Randomized placebo-controlled clinical trials have shown that tenofovir, with or without emtricitabine, can prevent HIV infection in HIV-negative individuals. In the iPrEx trial, an intent-to-treat (ITT) estimate of the effect of assignment to drug revealed a 42% reduction in the HIV infection rate. However, most participants in the drug group did not take the drug as directed. Our interest is to estimate the effect of drug assignment in a trial where all would comply. Using measured drug-in-the-blood (DITB) of drug group HIV+ participants and matched controls, we build a prediction model for DITB in the drug group and predict whether placebo participants would have had DITB, had they been randomized to drug. We then estimate the ITT effect of drug assignment in the subgroup of would-be drug compliers. Different methods are used, including an even-handed approach where predicted DITB is used for all drug and placebo volunteers, and an asymmetric method which uses observed DITB in the drug group and predicted DITB in the placebo group. We compare our methods to a more standard case-control analysis of measured DITB in the drug group and evaluate the different methods using simulation.


Abstract (7)
Paper: TIPP: Taxonomic Identification using Phylogenetic Placement
Author: Nam-phuong Nguyen; University of Texas-Austin
Co-Authors: Warnow, Tandy; Mirarab, Siavash

Abstract: An estimated 99% of all microbes cannot be cultured in a laboratory. Metagenomic analyses circumvent this problem by sequencing organisms directly from the environment. The typical output of a metagenomic analysis is millions of short fragmentary reads from various species in the sample. Identifying the species of each read is a fundamental challenge for metagenomic analysis. We present a new method, TIPP, for the taxonomic identification of short reads. TIPP takes as input a set of backbone alignments and trees, a set of short reads, a taxonomy, and statistical support thresholds for alignment and classification, and the output is the taxonomic identification of each short read. The taxonomic identification is performed through phylogenetic placement, where each read is aligned to the backbone alignment and then placed into the taxonomic tree. To prevent overclassification, TIPP takes into account the uncertainty in the backbone alignment and tree, the uncertainty in the alignment of the short reads, and the uncertainty in the placement into the taxonomy. We present results that show TIPP greatly increases the number of correct classifications over existing methods, with a small increase in incorrect classifications.


Abstract (8)
Paper: Estimating Filaments and Manifolds: Methods and Surrogates
Author: Christopher Genovese; Carnegie Mellon University

Abstract: Spatial data and high-dimensional data, such as collections of images, often contain high-density regions that concentrate around some lower dimensional structure. In many cases, these structures are well-modeled by smooth manifolds, or collections of such manifolds. For example, the distribution of matter in the universe at large scales forms a web of intersecting clusters (0-dimensional manifolds), filaments (1-dimensional manifolds), and walls (2-dimensional manifolds), and the shape and distribution of these structures have cosmological implications. I will discuss theory and methods for the problem of estimating manifolds (and collections of manifolds) from noisy data in the embedding space. The noise distribution has a dramatic effect on the performance (e.g., minimax rates) of estimators that is related to but distinct from what happens in measurement-error problems. Some variants of the problem are "hard" in the sense that no estimator can achieve a practically useful level of performance. I will show that in the "hard" case, it is possible to achieve accurate estimators for a suitable surrogate of the unknown manifold that captures many of the key features of the object, and will describe a method for doing this efficiently.


Abstract (9)
Paper: Asymptotics in graph partitioning and clustering
Author: Bruno Pelletier; Universite Rennes 2, France

Abstract: Graph partitioning and related methods in clustering arise in many applications where the goal is to find a partition in such a way that a certain cost function is minimized. A central quantity appearing in this context is the Cheeger isoperimetric constant, which can be defined for a Euclidean domain as well as for a graph. In either case, it quantifies how well the domain or graph can be bisected, or cut into two pieces that are as little connected as possible. Motivated by recent developments in spectral graph clustering and computational geometry, we present in this talk asymptotic results regarding the partition of a random neighborhood graph whose nodes are random points drawn in a given domain. We relate the Cheeger constant of the graph to the Cheeger constant of the domain itself, and we establish the convergence of optimal graph partitions towards the Cheeger sets of the domain. These results imply the consistency of the bipartite graph cut algorithm and provide a characterization of the limit partition in terms of subsets achieving the isoperimetric Cheeger constant.
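[Editor's illustration] The bipartite graph cut on a random neighborhood graph can be illustrated with the standard spectral relaxation: split the graph by the sign of the Fiedler vector and score the split with a Cheeger-style cut ratio. This is a sketch of the general technique, not the speaker's method; all function names are assumptions.

```python
import numpy as np

def neighborhood_graph(points, radius):
    """Adjacency matrix of the r-neighborhood graph on a point cloud:
    two points are joined when their Euclidean distance is at most radius."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    A = (d <= radius).astype(float)
    np.fill_diagonal(A, 0.0)
    return A

def spectral_bisection(A):
    """Bisect a graph by the sign of the Fiedler vector (eigenvector of the
    second-smallest eigenvalue of the unnormalized Laplacian), a standard
    relaxation of the balanced-cut problem behind the Cheeger constant."""
    L = np.diag(A.sum(axis=1)) - A
    _, V = np.linalg.eigh(L)
    return V[:, 1] >= 0

def cut_ratio(A, side):
    """Cheeger-style cut ratio: total edge weight crossing the cut divided by
    the volume (sum of degrees) of the smaller side."""
    cross = A[np.ix_(side, ~side)].sum()
    vol = min(A[side].sum(), A[~side].sum())
    return cross / vol
```

On a "dumbbell" point cloud (two tight clusters joined by a single bridge edge), the sign split recovers the two clusters and the cut ratio is small, mirroring the isoperimetric quantity the abstract discusses.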


Abstract (10)
Paper: Nonstationary random fields and the gravitational lensing of the cosmic microwave background
Author: Ethan Anderes; University of California at Davis

Abstract: This talk presents a new regression characterization for the quadratic estimator of weak lensing, developed by Hu and Okamoto (2001, 2002) for cosmic microwave background observations. This characterization motivates a modification of the quadratic estimator by an adaptive Wiener filter which uses the robust Bayesian techniques developed by Berger (1980) and Strawderman (1971). This technique requires the user to propose a fiducial model for the spectral density of the unknown lensing potential, but the resulting estimator is developed to be robust to misspecification of this model. The role of the fiducial spectral density is to give the estimator superior statistical performance in a "neighborhood of the fiducial model" while controlling the statistical errors when the fiducial spectral density is drastically wrong. Our estimate also highlights some advantages provided by a Bayesian analysis of the quadratic estimator. This work is based on collaboration with Debashis Paul (Statistics, UC Davis).


Abstract (11)
Paper: A test for stationarity of spatio-temporal random fields on planar and spherical domains
Author: Mikyoung Jun; Texas A&M University
Co-Authors: Marc Genton

Abstract: A formal test for weak stationarity of spatial and spatio-temporal random fields is proposed. We consider the cases where the spatial domain is planar or spherical, and we do not require distributional assumptions for the random fields. The method can be applied to univariate or to multivariate random fields. Our test is based on the asymptotic normality of certain statistics that are functions of estimators of covariances at certain spatial and temporal lags under weak stationarity. Simulation results for spatial as well as spatio-temporal cases on the two types of spatial domains are reported. We describe the results of two applications of the proposed method. One is to test stationarity of Pacific wind data. The other is to test axial symmetry of climate model errors for surface temperature, using the NOAA GFDL model outputs and the observations from the Climate Research Unit in East Anglia and the Hadley Centre.
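[Editor's illustration] The idea of testing stationarity through the asymptotic normality of lag-covariance estimators can be shown in a toy one-dimensional analogue (not the authors' procedure): compare the lag-1 covariance on the two halves of the domain, averaging the difference over independent replicate fields so the central limit theorem supplies the null distribution. The function names and the replicate-based variance estimate are assumptions of this sketch.

```python
import numpy as np

def lag1_cov(x):
    """Empirical lag-1 autocovariance of a 1-D sample."""
    xc = x - x.mean()
    return np.mean(xc[:-1] * xc[1:])

def stationarity_z(replicates):
    """Toy moment-based stationarity check: for each replicate field, take the
    difference between the lag-1 covariance on the left and right halves of
    the domain. Under weak stationarity the differences have mean zero, so
    the studentized mean over replicates is approximately N(0, 1)."""
    half = replicates.shape[1] // 2
    d = np.array([lag1_cov(r[:half]) - lag1_cov(r[half:]) for r in replicates])
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
```

A white-noise field gives |z| of ordinary size, while a field whose right half follows a strongly autocorrelated process gives a very large |z|, the qualitative behavior the formal test exploits with properly derived asymptotic variances.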


Abstract (13)
Paper: Building DAGs via Bootstrapping
Author: Ru Wang

[email protected]@ucdavis.edu

Co-Authors: Jie Peng
Abstract: This work is motivated by revealing the structure of gene regulatory networks based on high-throughput genomic data. In particular, we consider structural learning of directed acyclic graphs (DAGs) under the high dimension, low sample size setting. We first propose a hybrid algorithm. It utilizes the 'space' method for learning Gaussian graphical models to estimate the moral graph of the DAG. It then uses the hill climbing (hc) algorithm with the BIC score while constraining edges to the set inferred by 'space'. Through simulations, this method performs favorably compared to other hybrid methods such as mmhc and rsmax2. Secondly, we adopt the idea of bagging by building an ensemble of DAGs through bootstrap resampling. An aggregated DAG is then derived through minimizing a graph distance measure. This aggregation-based method is also compared with commonly used (non-aggregated) methods such as hc, pcalg, etc., and is shown to improve on these methods. In addition to these newly proposed methods for DAG learning, our implementation is also more efficient than some existing software packages.
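The bootstrap-aggregation step can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: `toy_edge_learner` is a hypothetical stand-in for a real DAG learner such as hill climbing with BIC, and the 0.5 frequency threshold is an arbitrary cutoff.

```python
import numpy as np

def toy_edge_learner(X):
    # Hypothetical stand-in for a real DAG learner (e.g., hill climbing):
    # orient i -> j for i < j whenever |corr(X_i, X_j)| > 0.5.
    C = np.corrcoef(X, rowvar=False)
    d = X.shape[1]
    return {(i, j) for i in range(d) for j in range(i + 1, d)
            if abs(C[i, j]) > 0.5}

def bagged_edges(X, n_boot=50, threshold=0.5, seed=0):
    # Fit the learner on bootstrap resamples of the rows and keep the
    # edges that appear in at least `threshold` of the resampled graphs.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = {}
    for _ in range(n_boot):
        for e in toy_edge_learner(X[rng.integers(0, n, n)]):
            counts[e] = counts.get(e, 0) + 1
    return {e for e, c in counts.items() if c / n_boot >= threshold}
```

Aggregating by edge frequency is only a simple surrogate for the abstract's graph-distance minimization, which instead selects the DAG closest on average to all bootstrap DAGs.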


Abstract (14)
Paper: Joint variable and rank selection for parsimonious estimation of high dimensional matrices
Author: Marten Wegkamp

[email protected]

Co-Authors: F. Bunea, Y. She
Abstract: This talk is devoted to optimal dimension reduction methods for sparse, high dimensional multivariate response regression models. Both the number of responses and that of the predictors may exceed the sample size. Sometimes viewed as complementary, predictor selection and rank reduction are the most popular strategies for obtaining lower dimensional approximations of the parameter matrix in such models. We show that important gains in prediction accuracy can be obtained by considering them jointly. For this, we first motivate a new class of sparse multivariate regression models, in which the coefficient matrix has both low rank and zero rows or can be well approximated by such a matrix. Then, we introduce estimators that are based on penalized least squares, with novel penalties that impose simultaneous row and rank restrictions on the coefficient matrix. We prove that these estimators indeed adapt to the unknown matrix sparsity and have fast rates of convergence. Our theoretical results are supported by a simulation study.


Abstract (15)
Paper: A stochastic partitioning method to associate high-dimensional response and covariate data
Author: Mahlet Tadesse

Georgetown University
[email protected]

Co-Authors: Stefano Monni
Abstract: In recent years, there has been a growing interest in relating data sets in which both the number of regressors and response variables are substantially larger than the sample size. For example, in an attempt to gain new insights into molecular processes, many efforts are being carried out to integrate data sets from various high-throughput genomic sources. We propose a Bayesian stochastic partitioning method that provides a unified approach to identify cluster structures and relationships between data sets by identifying correlated response variables and their associated subsets of covariates. We illustrate the method with an application to genomic studies.


Abstract (16)
Paper: Nuclear norm penalization and optimal rates for noisy low rank matrix completion
Author: Karim Lounici

Georgia Institute of Technology
[email protected]

Co-Authors: Vladimir Koltchinskii; Alexander Tsybakov
Abstract: This paper deals with the trace regression model where n entries or linear combinations of entries of an unknown m1 × m2 matrix A0 corrupted by noise are observed. We propose a new nuclear norm penalized estimator of A0 and establish a general sharp oracle inequality for this estimator for arbitrary values of n, m1, m2 under the condition of isometry in expectation. Then this method is applied to the matrix completion problem. In this case, the estimator admits a simple explicit form and we prove that it satisfies oracle inequalities with faster rates of convergence than in previous works. They are valid, in particular, in the high-dimensional setting m1 m2 ≫ n. We show that the obtained rates are optimal up to logarithmic factors in a minimax sense and also derive, for any fixed matrix A0, a non-minimax lower bound on the rate of convergence of our estimator, which coincides with the upper bound up to a constant factor. Finally, we show that our procedure provides an exact recovery of the rank of A0 with probability close to 1. We also discuss the statistical learning setting where there is no underlying model determined by A0 and the aim is to find the best trace regression model approximating the data.
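The abstract notes that the matrix completion estimator admits a simple explicit form. For intuition, here is a closely related construction (a hedged sketch, not the paper's exact estimator): soft-thresholding of the singular values, which is the proximal operator of the nuclear norm penalty and the mechanism behind both shrinkage and rank recovery.

```python
import numpy as np

def svd_soft_threshold(Y, lam):
    # Shrink every singular value of Y toward zero by lam and drop those
    # below lam; this is the proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt
```

With lam = 0 the input is returned unchanged, while a sufficiently large lam yields a low-rank (eventually zero) matrix.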


Abstract (17)
Paper: Bayesian Nonparametric Phylodynamics
Author: Vladimir Minin

University of Washington, Seattle
[email protected]

Co-Authors: Julia Palacios; Marc Suchard; Mandev Gill
Abstract: Estimating evolutionary trees, called phylogenies or genealogies, is a fundamental task in modern biology. Once phylogenetic reconstruction is accomplished, scientists are faced with a challenging problem of interpreting phylogenetic trees. In certain situations, a coalescent process, a stochastic model that randomly generates evolutionary trees, comes to the rescue by probabilistically connecting phylogenetic reconstruction with the demographic history of the population under study. From a Bayesian hierarchical modeling perspective, the coalescent process can be viewed as a prior for evolutionary trees, parameterized in terms of unknown demographic parameters. In particular, we are interested in the application of the coalescent to phylodynamics, an area that aims at reconstructing past population dynamics from genomic data through phylogenetic reconstruction. We propose a new Gaussian process-based Bayesian nonparametric framework for such demographic reconstruction. Viewing the coalescent process as a point process, we develop a novel Markov chain Monte Carlo algorithm that allows us to approximate the posterior distribution of population size trajectories efficiently and without artificial discretization. Using simulations, we show that our new method is more accurate and precise than a competing Gaussian Markov random field smoothing approach. Our analyses of population dynamics of hepatitis C and human influenza viruses demonstrate that our new method produces biologically plausible reconstructions.


Abstract (19)
Paper: Model-Based Sieve Analysis
Author: Paul Edlefsen

Fred Hutchinson Cancer Research Center
[email protected]

Co-Authors:
Abstract: At the AIDS Vaccine 2011 meeting in Bangkok, two years after the release of the initial results of the RV144 Thailand HIV Vaccine Trial, I presented results of a sieve analysis of the V2 region of the Envelope protein of the viral sequences from infected RV144 subjects. Our analysis showed significant evidence of vaccine-induced filtering of HIV viruses with particular genomic sequence features. This result supports an important hypothesis generated by the groundbreaking immune correlates (case-control) analysis: that the vaccine induced anti-V2 antibodies, and that those antibodies conferred protection against HIV. This sieve analysis result, along with the recently published findings of a strong T-cell-mediated sieve effect in the failed Merck STEP (HVTN 502) HIV vaccine trial, has generated increased interest in sieve effects and in sieve analysis methods. In this talk I will review these findings and present my 'Model-Based Sieve Analysis' methods, which were among those used for the RV144 sieve analysis conducted by our Sieve Analysis Group at SCHARP at the Fred Hutchinson Cancer Research Center in Seattle. The framework of model-based sieve analysis provides a useful formalism for testing hypotheses and estimating parameters regarding sieve effects in both frequentist and Bayesian contexts, and provides a useful foundation for reasoning about the sieving phenomenon and its nuances.


Abstract (20)
Paper: Co-clustering for directed graphs: an algorithm, a model, and some asymptotics
Author: Karl Rohe

UW Madison
[email protected]

Co-Authors: Bin Yu
Abstract: Although the network clustering literature has focused on undirected networks, many networks are directed. For example, communication networks contain asymmetric relationships, representing the flow of information from one person to another. This talk will (1) demonstrate that co-clustering, instead of clustering, is more natural for many directed graphs, (2) propose a spectral algorithm and a statistical model for co-clustering, (3) show some asymptotic results, and (4) present a preliminary analysis of a citation network from arXiv. Of key interest is the discovery of bottleneck nodes that transmit information between clusters of papers.
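To make the co-clustering idea concrete, here is a bare-bones sketch, not the talk's algorithm: the left singular vectors of the (asymmetric) adjacency matrix group nodes by their outgoing links and the right singular vectors by their incoming links, so rows and columns get separate clusterings. The argmax assignment below is an assumed simplification standing in for k-means on the singular vectors.

```python
import numpy as np

def spectral_coclusters(A, k=2):
    # A[i, j] = 1 when node i links to node j. Because A is asymmetric,
    # senders (rows) and receivers (columns) are clustered separately.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    send = np.argmax(np.abs(U[:, :k]), axis=1)   # cluster by out-link pattern
    recv = np.argmax(np.abs(Vt[:k, :]), axis=0)  # cluster by in-link pattern
    return send, recv
```

The argmax over singular-vector magnitudes suffices only when the graph has well-separated blocks; a practical implementation would run k-means on the rows of the singular-vector matrices.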


Abstract (21)
Paper: Estimation for general birth-death processes
Author: Forrest Crawford

[email protected]

Co-Authors: Marc A. Suchard and Vladimir N. Minin
Abstract: Birth-death processes (BDPs) are continuous-time Markov chains that track the number of 'particles' in a system over time. While widely used in population biology, genetics and ecology, statistical inference of the instantaneous particle birth and death rates remains largely limited to restrictive linear BDPs in which per-particle birth and death rates are constant. Researchers often observe the number of particles at discrete times, necessitating data augmentation procedures such as expectation-maximization (EM) to find maximum likelihood estimates. The E-step in the EM algorithm is available in closed form for some linear BDPs, but otherwise previous work has resorted to approximation or simulation. Remarkably, the E-step conditional expectations can also be expressed as convolutions of computable transition probabilities for any general BDP with arbitrary rates. This important observation, along with a convenient continued fraction representation of the Laplace transforms of the transition probabilities, allows novel and efficient computation of the conditional expectations for all BDPs, eliminating the need for approximation or costly simulation. We use this insight to derive EM algorithms that yield maximum likelihood estimation for general BDPs characterized by various rate models, including generalized linear models. We show that our Laplace convolution technique outperforms competing methods when available and demonstrate a technique to accelerate EM algorithm convergence. Finally, we validate our approach using synthetic data and then apply our methods to estimation of mutation parameters in microsatellite evolution.
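For readers unfamiliar with the model class, a general BDP with arbitrary state-dependent rates can be simulated with a standard Gillespie-style loop. This is an illustrative sketch only; the talk concerns inference for such processes, not simulation.

```python
import numpy as np

def simulate_bdp(n0, birth, death, t_end, rng):
    # One sample path of a birth-death process: from state n, wait an
    # Exponential(birth(n) + death(n)) time, then jump up with
    # probability birth(n) / (birth(n) + death(n)), otherwise down.
    t, n = 0.0, n0
    states = [n0]
    while t < t_end:
        total = birth(n) + death(n)
        if total == 0.0:          # absorbing state (e.g., extinction)
            break
        t += rng.exponential(1.0 / total)
        if t >= t_end:
            break
        n += 1 if rng.random() < birth(n) / total else -1
        states.append(n)
    return states
```

Setting birth(n) = λn and death(n) = μn recovers the restrictive linear BDP mentioned in the abstract; general rate functions, including generalized linear models for the rates, are what the authors' EM machinery targets.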


Abstract (22)
Paper: Regression Analysis Using Dependent Polya Trees
Author: Adam Branscum

Oregon State University
[email protected]

Co-Authors: Angela Schoergendorfer
Abstract: A flexible Bayesian model is developed for regression analysis using dependent Polya trees. The modeling framework can be applied to both cross-sectional and longitudinal data because it involves residual distributions that can evolve with respect to a covariate such as time. By modeling residual distributions at consecutive time points or across ordered groups using separate, correlated Polya tree priors, distributional information can be pooled across values of the covariate while allowing flexibility to accommodate evolving residual distributions. The approach can be embedded in a wide range of models, such as linear and nonlinear mixed models. Application to growth curve data will be presented.


Abstract (24)
Paper: Semiparametric modeling of non-autonomous nonlinear dynamical systems with applications
Author: Siyuan Zhou

University of California, Davis
[email protected]

Co-Authors: Debashis Paul; Jie Peng
Abstract: Continuous time dynamical systems arise in modeling various processes, such as growth of plants or organisms, chemical reactions, and disease or ecological dynamics. In this paper, we propose a semi-parametric approach for modeling population-level data on systems governed by possibly non-autonomous differential equations. This model includes an unknown common gradient function, and the subject-specific effects on the rates of change are multiplicative. These subject-specific effects are modeled as polynomials in time. We devise a procedure for fitting this model based on a hierarchical likelihood framework where the baseline gradient function g is represented in a spline basis. We also propose two computationally efficient model selection procedures using cross-validation scores. We show by simulation studies that the proposed estimation and model selection procedures can handle both sparse and dense data with various noise levels. The proposed method is illustrated through applications to data on growth trajectories.


Abstract (25)
Paper: 'Next Generation' Imputation: Widening the Genome-Wide Association Study
Author: Sarah Nelson

University of Washington, Genetics Coordinating Center
[email protected]

Co-Authors: Cathy Laurie; Brian Browning; Sharon Browning
Abstract: Genotype imputation is the process of inferring unobserved genotypes in a study sample based on the haplotypes observed in a more densely genotyped reference sample of similar genetic background [1]. When first applied to genome-wide association studies (GWAS), imputation most commonly used reference panels from the International HapMap Consortium (Phases 2 and 3) [2, 3]. More recently, reference panel choice has expanded and improved with the advent of the 1000 Genomes Project, which is enabling the imputation of many more and rarer variants [1, 4, 5]. Since 2010, the University of Washington Genetics Coordinating Center has completed imputation for numerous Genome-Wide Association Studies in the GENEVA and GARNET consortia, using both HapMap Phase 3 and 1000 Genomes reference panels. Along the way, we have assessed several aspects of imputation including reference panel design, computational efficiency, family studies, and multi-ethnic study cohorts. Our experience imputing into studies that vary by ascertainment design, genotyping platforms, reference samples, and ethnic composition has enabled us to test and refine these various aspects of imputation.
1. Li, Y., Willer, C., Sanna, S. & Abecasis, G. Genotype imputation. Annu Rev Genomics Hum Genet 10, 387-406 (2009).
2. Frazer, K. et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851-61 (2007).
3. Altshuler, D. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52-8 (2010).
4. Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat Rev Genet 11, 499-511 (2010).
5. Ellinghaus, D., Schreiber, S., Franke, A. & Nothnagel, M. Current software for genotype imputation. Hum Genomics 3, 371-80 (2009).


Abstract (26)
Paper: Genome-wide evaluation of gene-environment interactions: Using longitudinal data to increase power
Author: Colleen Sitlani

University of [email protected]

Co-Authors: Kenneth Rice; Thomas Lumley; Barbara McKnight; Arend Voorman
Abstract: Genome-wide association studies of gene-environment interactions seek to discover some of the missing heritability that remains unexplained by the genome-wide search for main effects. However, even in consortia with tens of thousands of individuals, power to detect such interactions may be low. One way to increase power is to incorporate repeated longitudinal measures of the phenotype and the environmental exposure of interest. Such longitudinal analyses are particularly useful when the rate of exposure is low, individual exposure changes over time, and past outcomes do not influence future treatment. We discuss the advantages and disadvantages of generalized estimating equations versus mixed effects models in the setting of genome-wide gene-environment interactions. We also discuss practical considerations involved in fitting longitudinal models within one cohort and in meta-analyzing results across cohorts.


Abstract (27)
Paper: The Phenotype-Driven (PhD) Approach to Discovery and Validation of Genomic Networks
Author: Cheng Cheng

St. Jude Children’s Research Hospital
[email protected]

Co-Authors:
Abstract: The recent advance of biotechnology has enabled biomedical investigators to collect, on a large number of subjects and on the genome-wide scale, observations of massive numbers of molecular variables such as SNPs, gene expressions, DNA methylations, and microRNA expressions. With such a massive and diverse dataset, there is a unique opportunity to perform molecular systems biology studies to elucidate the biological process underlying a phenotype of interest, by discovering genomic networks (GNs) that affect the phenotypic variation. In general, genomic network discovery is an NP-hard problem in terms of the number of molecular variables involved. This presentation describes a phenotype-driven (PhD) approach to effective dimension reduction that preserves important biological information relevant to the study context defined by the phenotype of interest, along with a validation method for the discovered GNs. An application to studying treatment response of childhood leukemia is presented.


Abstract (28)
Paper: Mixed Model Power Analysis by Examples: Using Free Web-Based Power Software
Author: Sarah Kreidler

University of Colorado [email protected]

Co-Authors:
Abstract: Join us for an interactive demonstration of power and sample size calculations for longitudinal and multilevel designs. Popular study designs in community-based, behavioral research will provide the driving examples. We will walk through sample size calculations for four designs: 1) a cluster-randomized trial of an intervention to increase physical activity in middle school girls, 2) a longitudinal study of a sensory focus intervention on memories of dental pain, 3) a multilevel trial to promote oral cancer screening, and 4) a multilevel and longitudinal study of risk factors for drinking and driving among youth. For each example, we will demonstrate how to obtain inputs for the power analysis, how to summarize the study design, how to calculate power or sample size, and how to interpret the results. All calculations will use a free, web-based software tool with a point-and-click graphical user interface (http://www.samplesizeshop.com). We encourage attendees to bring laptops to the session to practice techniques. Session attendees will receive detailed lecture notes describing the sample size analyses for future reference.


Abstract (29)
Paper: Sample Size Determination for High Dimensional Global Hypothesis Testing
Author: Yueh-Yun Chi

University of [email protected]

Co-Authors: Matthew Gribbin, Deborah Glueck, Keith Muller
Abstract: In pharmacogenomics, scientists often study functionally related genes, or a complete genetic pathway. The effect of individual genes may be too small to achieve statistical significance. The standard analytic approach involves performing a single test for each gene and correcting for multiple comparisons. A global hypothesis test allows evaluation of a set of genes and may allow greater sensitivity for a collection of small effects. We propose a new global hypothesis test based on the univariate approach to repeated measures. We demonstrate that the proposed test has good control of Type I error, even when the number of genes studied exceeds the number of research participants. We derive exact and approximate power and sample size methods. We show that the global hypothesis test has greater power than performing the standard analysis. Extensive numerical simulations confirm the analytic results for power and Type I error rate. We illustrate the new methods with a sample size calculation for a proposed study of the metabolic consequences of vitamin B6 deficiency.


Abstract (30)
Paper: Adaptive Sample Size Designs for Comparative Effectiveness Clinical Trials
Author: Mitchell Thomann

University of [email protected]

Co-Authors:
Abstract: In contrast to traditional clinical trials, comparative effectiveness trials evaluate medical treatments already in practice. Primary outcomes typically include relative effectiveness, safety, and cost. Adaptive designs appeal because fixed designs have two shortcomings: 1. the variability in actual medical practice makes fixed plans unlikely to be optimal; 2. the traditional dependence on a fixed value of a 'minimal clinically meaningful effect' does not seem valid for population-based research. Traditional adaptive decision rules allow early stopping due to efficacy, futility, or safety. Adding sample size re-estimation abilities can help balance sensitivity and efficiency. Existing adaptive early stopping and sample-size methods for linear models in comparative effectiveness fall into one of three tiers: t tests, univariate models with multiple predictors (including ANOVA), and repeated measures (mixed models). For each tier, early stopping and sample size re-estimation designs (with or without early stopping) include: 1. internal pilot designs assuming fixed effects of interest and re-estimated variances; 2. standard group sequential designs using estimated effect sizes for early stopping; 3. combinations of 1 and 2; and 4. designs changing the effect size of interest for interim sample size calculations. Types 1, 2 and 3 generate little controversy. In contrast, in the context of pharmaceutical science, type 4 methods have been criticized on both ethical and theoretical grounds. We propose that the ethical and cost profiles of comparative effectiveness trials encourage considering all four classes of designs. We review existing methods, and highlight open questions needing new results.


Abstract (31)
Paper: Power and Sample Size for the Most Common Hypotheses in Mixed Models
Author: Anna Baron

University of [email protected]

Co-Authors:
Abstract: Mixed models have become the standard approach for handling correlated observations and accommodating missing data. The complexity of the resulting covariance matrices may seem to require correspondingly complex power and sample size methods. We demonstrate that in many cases a matrix-based transformation process can reduce the mixed model and hypothesis to an equivalent general linear multivariate model and hypothesis. The resulting power analysis is simpler. Using existing power techniques for the equivalent multivariate model yields exact, or extremely accurate approximate, power results for the original problem. The method applies when the hypothesis concerns only fixed effects, and under the assumption that analysis uses the Wald test with a Kenward-Roger degrees of freedom approximation. We show that an additional inflation factor for sample size can often account for missing data, when the data are missing completely at random.


Abstract (32)
Paper: Bayesian Multistate Models for Multivariate Event Histories
Author: Adam King

University of California, Los Angeles
[email protected]

Co-Authors: Robert Weiss
Abstract: We present Bayesian multistate models for event history data with discrete time. The data for each subject consist of records of time segments (called episodes) during which each of a predetermined collection of characteristics, behaviors, or circumstances was present. The lengths of these episodes, as well as the lengths of time spent in between episodes, are discrete time survival outcomes, where the event of interest is a transition between the two states of the characteristic being present and absent. We propose models that include the dichotomous processes for multiple characteristics simultaneously. These models use nonparametric baseline hazards for multiple time variables (such as time spent in current state and age) and can accommodate informative censoring. We apply our methods to retrospectively collected lifetime event histories of cocaine usage and related characteristics of 500 drug users in Los Angeles.


Abstract (33)
Paper: Hypothesis Testing for an Extended Cox Model with Time-Varying Coefficients
Author: Takumi Saegusa

University of [email protected]

Co-Authors: Chongzhi Di, Fred Hutchinson Cancer Research Center; Ying Qing Chen, Fred Hutchinson Cancer Research Center
Abstract: We propose spline-based score tests for time-varying treatment effects in an extended Cox model. The log-rank test has been widely used to test treatment effects under the Cox model, but may lose power substantially when the proportional hazards assumption does not hold. There are approaches to test plausibility of the proportional hazards assumption, such as the smoothing spline-based score test of Lin, Zhang and Davidian (2006). In this study, we consider an extended Cox model that uses smoothing splines to model the time-varying treatment effect and propose test statistics for the overall treatment effect. The proposed tests combine statistical evidence from both the magnitude and shape of the hazard ratio function, and thus are omnibus procedures that are powerful against various types of alternatives. Simulation studies confirmed that the proposed tests performed well in finite samples and were often more powerful than both log-rank and proportionality tests under many settings. We applied our methods to the HIVNET 012 Study, a randomized clinical trial conducted by the HIV Prevention Trial Network.


Abstract (34)
Paper: Invisible Fence Methods and the Identification of Differentially Expressed Gene Sets
Author: J. Sunil Rao

University of [email protected]

Co-Authors: Jiming Jiang (UC-Davis); Thuan Nguyen (Oregon Health and Science University)
Abstract: The fence method (Jiang et al. 2008; Ann. Statist. 36, 1669-1692) is a recently developed strategy for model selection. The idea involves a procedure to isolate a subgroup of correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from amongst those within the fence according to a criterion which can be made flexible. We extend the fence method to situations where a true model may not exist or be among the candidate models. Furthermore, another look at fence methods leads to a new procedure known as the invisible fence (IF). We develop IF ideas for the problem of gene set analysis (GSA) for gene expression experiments where the number of genes is much larger than the number of samples. We detail the theory and compare empirical performance against other GSA-type procedures currently in use. We demonstrate, theoretically and empirically, the consistency property of IF while pointing out an inconsistency of GSA under certain situations. An application tracking pathway involvement in late versus early stage colon cancers is considered.


Abstract (35)
Paper: Bayesian Kernel Based Modeling and Selection of Genetic Pathways and Genes for Cancer Studies
Author: Sounak Chakraborty

University of [email protected]

Co-Authors: Zhenyu Wang, Jianguo Sun
Abstract: Much attention has been given recently to the development of methods that utilize the large quantity of genetic information available in online databases. Most of the proposed methods look at the entire set of genes and their impact on a disease. Recently a new philosophy emerged which considers the combined effect on a disease of genetic pathways, each of which contains a set of genes. Under the new philosophy the goal is to identify the significant genetic pathways and the corresponding influential genes in regards to different diseases. In this research, a Bayesian kernel machine model which incorporates existing information on pathways and gene networks in the analysis of DNA microarray data is developed. Each pathway is modeled nonparametrically using a reproducing kernel Hilbert space. Mixture priors on the pathway indicator variable and the gene indicator variable are assigned. This approach can be used to model both linear and non-linear pathway effects and can pinpoint the important pathways along with the active genes within each pathway. An efficient Markov chain Monte Carlo (MCMC) algorithm is developed to fit our model. A simulation study and a real data analysis, using the van't Veer et al. (2002) breast cancer microarray data, are used to illustrate the proposed method.


Abstract (36)
Paper: A nonparametric Bayesian approach to the analysis of bioassay experiments with ordinal responses
Author: Kassie Fronczyk

UT MD Anderson Cancer Center / Rice University
[email protected]

Co-Authors: Athanasios Kottas
Abstract: Motivated by the area of cytogenetic dosimetry, where interest lies in the relationship between the dosage of exposure to radiation and some measure of genetic aberration, we aim to model the general case of bioassay experiments with ordinal outcomes. We adopt a structured nonparametric mixture model built upon modeling dose-dependent response distributions. The framework capitalizes on a continuation-ratio type construction for the mixture kernel to promote flexibility in the dose-response curves and to maintain relative ease of implementation. A portion of an experimental data set is used to illustrate the method with respect to the prime inferential objectives.


Abstract (37)
Paper: Bias-variance trade-off in estimating posterior probabilities for variable selection
Author: Joyee Ghosh

The University of [email protected]

Co-Authors: Merlise A. Clyde
Abstract: Markov chain Monte Carlo algorithms are commonly used to identify a set of promising models for Bayesian model selection or Bayesian model averaging. Because Monte Carlo estimates of model probabilities are often unreliable in high dimensional problems that preclude enumeration, posterior probabilities calculated from the observed marginal likelihoods, renormalized over the set of sampled models, are often employed. Such estimates are the only recourse in several newer stochastic search algorithms. In this talk, we show that renormalization of posterior probabilities over the set of sampled models generally leads to bias which may dominate mean squared error. Viewing the model space in Bayesian variable selection as a finite population, we propose a new estimator based on a ratio of Horvitz-Thompson estimators. This estimator incorporates observed marginal likelihoods like the renormalized estimators, but also enjoys an approximate unbiasedness property. Based on simulation studies, we demonstrate that this new estimator may lead to a reduction in mean squared error compared to the Monte Carlo or renormalized estimators, with a small increase in computational cost.
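The two baseline estimators contrasted in the abstract are easy to state concretely. The following is a hedged sketch on a toy, enumerable model space, not the talk's Horvitz-Thompson proposal: the renormalized estimator divides each sampled model's marginal likelihood by the total over the sampled set, and its bias appears whenever a non-negligible model is never visited.

```python
import numpy as np

def renormalized_probs(sampled, marglik):
    # Renormalize observed marginal likelihoods over the *sampled* models;
    # duplicates in the sample do not change the estimate.
    uniq = sorted(set(sampled))
    w = np.array([marglik[m] for m in uniq])
    return dict(zip(uniq, w / w.sum()))

def monte_carlo_probs(sampled):
    # Monte Carlo estimate: relative visit frequency of each model.
    uniq, counts = np.unique(sampled, return_counts=True)
    return dict(zip(uniq.tolist(), counts / counts.sum()))
```

With marginal likelihoods {1, 2, 7} and all three models visited, the renormalized estimate for the top model is exactly 0.7; if the smallest model is never sampled, it inflates to 7/9 ≈ 0.78, illustrating the conditional bias the talk quantifies.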


Abstract (38)
Paper: Using Audit Information to Adjust Parameter Estimates for Data Errors in Clinical Trials
Author: Pamela Shaw

Biostatistics Research Branch, NIAID/NIH, [email protected]

Co-Authors: Bryan Shepherd; Pamela Shaw*; Lori Dodd
Abstract: Audits are often performed to assess the quality of clinical trial data, but beyond detecting fraud or sloppiness, the audit data are generally ignored. Database errors are a type of measurement error that can lead to bias in study estimates. We examine the usefulness of audit-based error-correction methods in clinical trial settings where a continuous outcome is of primary interest. We examine the bias of the naive multiple linear regression estimates, which ignore the data errors, in general settings where both the outcome and covariates may have errors. We study this bias under different assumptions, including: correlated errors between covariates and outcome; independence between treatment assignment, covariates, and data errors (conceivable in a double-blinded randomized trial); and independence between treatment assignment and covariates but not data errors in the outcome (possible in an unblinded randomized trial). We discuss moment-based estimators proposed by Shepherd and Yu (2011) and propose new multiple imputation estimators that incorporate audit data to adjust for the error. Treatment and covariate effect estimates can be corrected by incorporating audit data using either the multiple imputation or moment-based approaches. The performance of the error-correction and naive estimators is studied in simulations. The extent of bias and relative performance of the methods depend on the extent and nature of the error as well as the size of the audit.


Abstract (39)
Paper: Functional Median Polish with Climate Applications
Author: Ying Sun

Statistical and Applied Mathematical Sciences Institute (SAMSI), [email protected]

Co-Authors: Marc G. Genton
Abstract: This article proposes functional median polish, an extension of univariate median polish, for one-way and two-way functional analysis of variance (ANOVA). The functional median polish estimates the functional grand effect and functional main factor effects based on functional medians in an additive functional ANOVA model assuming no interaction among factors. A functional rank test is used to assess whether the functional main factor effects are significant. The robustness of the functional median polish is demonstrated by comparing its performance with the traditional functional ANOVA fitted by means under different outlier models in simulation studies. The functional median polish is illustrated on various applications in climate science, including one-way and two-way ANOVA when functional data are either curves or images. Specifically, Canadian temperature data, U.S. precipitation observations and outputs of global and regional climate models are considered, which can facilitate the research on the close link between local climate and the occurrence or severity of some diseases and other threats to human health.
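For readers unfamiliar with the univariate building block, Tukey's median polish for a scalar two-way table can be sketched as below; the functional version sweeps medians of curves instead of scalars. This is a generic textbook sketch under the standard additive decomposition, not the authors' code.

```python
from statistics import median

def median_polish(table, n_iter=10):
    """Tukey's median polish for a two-way table (list of rows).
    Decomposes table[i][j] into overall + row_eff[i] + col_eff[j] + resid[i][j]
    by alternately sweeping row and column medians out of the residuals."""
    resid = [list(r) for r in table]
    nr, nc = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * nr, [0.0] * nc
    for _ in range(n_iter):
        for i in range(nr):                      # sweep row medians
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        d = median(col_eff)                      # recenter column effects
        overall += d
        col_eff = [c - d for c in col_eff]
        for j in range(nc):                      # sweep column medians
            m = median(resid[i][j] for i in range(nr))
            col_eff[j] += m
            for i in range(nr):
                resid[i][j] -= m
        d = median(row_eff)                      # recenter row effects
        overall += d
        row_eff = [r - d for r in row_eff]
    return overall, row_eff, col_eff, resid
```

Every sweep preserves the identity table = overall + row + column + residual, so the decomposition is exact at any iteration.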


Abstract (40)
Paper: Extended Bayesian stable isotope trophic level inference
Author: Erik Erhardt

University of New Mexico, [email protected]

Co-Authors: Rachel M. Wilson; James Nelson; Jeffrey P. Chanton
Abstract: We developed an extended Bayesian mixing model to jointly infer organic matter utilization and isotopic enrichment of organic matter sources in order to infer the trophic levels of several numerically abundant fish species (consumers) present in Apalachicola Bay, FL, USA. Bayesian methods apply for arbitrary numbers of isotopes and diet sources, but existing models are somewhat limited as they assume that trophic fractionation is estimated without error or that isotope ratios are uncorrelated. The model uses stable isotope ratios of carbon, nitrogen, and sulfur, isotopic fractionations, elemental concentrations, elemental assimilation efficiencies, as well as prior information (expert opinion) to inform the diet and trophic level parameters. The model appropriately accounts for uncertainty and prior information at all levels of the analysis. Comparing results between models including concentration and assimilation efficiency with a simpler model that only uses isotopic fractionations revealed substantial differences, illustrating the importance of including this "additional" information, in particular for organic matter sources that varied nine-fold in sulfur concentrations. Comparing results to a simpler two-stage model that did not account for uncertainty (Wilson et al., Estuaries and Coasts (2009) 32:999–1010) correctly showed more estimate variability than the previous two-stage model, though estimates themselves did not differ substantially. Furthermore, well-specified prior information can have a substantial impact on the results, in particular by reducing the uncertainty for key parameters, such as diet and trophic level, though great care in eliciting priors is suggested. Finally, we illustrate how prior information is parameterized, how the model is tailored to the specific problem, and how this general strategy can be applied to other situations.


Abstract (41)
Paper: Relating Social Networks and Nodal Behaviors
Author: Bailey Fosdick

University of Washington, [email protected]

Co-Authors: Peter D. Hoff
Abstract: Describing the association between people's network positions and their individual behaviors has historically been done by considering either the network or the nodal characteristics as fixed and the other as random, or the dependent variable. However, in many networks it is likely that the network is influenced by the nodal attributes and the nodal attributes affect the network. We propose a joint model of the network relations and nodal attributes. This model extends the model in Hoff (2009) by incorporating a joint dependence model for an individual's network relations and their individual-level behaviors. Invariances present in the model are addressed via a transparent parameterization of the network effects, and tests of association between the network and nodal variables are proposed. We apply this model to high school friendship networks from the National Longitudinal Study of Adolescent Health and investigate the relationship between the network and behavioral characteristics such as drinking, smoking, and grade point average.


Abstract (42)
Paper: Improved mean estimation and its application to diagonal discriminant analysis
Author: Tiejun Tong

Hong Kong Baptist University, [email protected]

Co-Authors: Liang Chen; Hongyu Zhao
Abstract: High-dimensional data such as microarrays have created new challenges for traditional statistical methods. One such example is class prediction with high-dimension, low-sample-size data. Due to the small sample size, the sample mean estimates are usually unreliable. As a consequence, the performance of class prediction methods using the sample mean may also be unsatisfactory. To obtain more accurate parameter estimation, some statistical methods, such as regularization through shrinkage, are often desired. In this paper we investigate the family of shrinkage estimators for the mean value under the quadratic loss function. The optimal shrinkage parameter is proposed for the scenario in which the sample size is fixed and the dimension is large. We then construct a shrinkage-based diagonal discriminant rule by replacing the sample mean with the proposed shrinkage mean. Finally, we demonstrate via simulation studies and real data analysis that the proposed shrinkage-based rule outperforms its original competitor in a wide range of settings.
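As a classical point of reference, the positive-part James-Stein estimator shrinks a vector of sample means toward their grand mean under quadratic loss. The sketch below is that textbook rule with known variance, not the paper's optimal shrinkage estimator for fixed sample size and large dimension; it only illustrates the shrinkage idea the abstract builds on.

```python
def js_shrink(x, sigma2):
    """Positive-part James-Stein estimate: shrink the p sample means in x
    toward their grand mean, assuming each mean has known variance sigma2
    (sensible for p >= 4)."""
    p = len(x)
    grand = sum(x) / p
    ss = sum((xi - grand) ** 2 for xi in x)           # spread around grand mean
    shrink = max(0.0, 1.0 - (p - 3) * sigma2 / ss)    # shrinkage factor in [0, 1]
    return [grand + shrink * (xi - grand) for xi in x]
```

Each estimate is pulled part of the way from the raw mean toward the grand mean, with more shrinkage when the means are noisy relative to their spread.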


Abstract (43)
Paper: Design considerations for cluster randomized trials in HIV prevention
Author: Rui Wang

Harvard School of Public Health, [email protected]

Co-Authors: Ravi Goyal; Victor De Gruttola
Abstract: Cluster randomized trials have been utilized to evaluate the effectiveness of HIV prevention strategies when the study endpoint is population-level incidence. Outcomes of individuals within the same community are expected to be more similar than those from individuals in different communities, inducing a correlation among study participants. Available sample size formulas for cluster randomized trials can make use of a 'design effect', a function of the intraclass correlation. Such approaches have been derived from random effects models in which cluster-level random effects are assumed to be independent, as are individual outcomes within a cluster (e.g., an exchangeable correlation structure). These assumptions are likely to be violated when the outcome is HIV infection, because the correlation between two individuals in a partnership would be expected to be higher than that between two individuals who are far apart in a sexual network. We investigate analytically and through simulations how deviations from an exchangeable correlation structure affect study power. In the design of a new CDC-Botswana Harvard Partnership HIV prevention study, we use simulation studies to predict effect size and investigate power under different study designs and conditions regarding individual behaviors and community characteristics. We develop a new method to generate a robust collection of sexual networks utilizing both the estimated degree mixing matrix and its sampling variability. In order to model a realistic community-level correlation structure, we first generate a collection of sexual networks using data from a BHP pilot study in Mochudi, Botswana, and a network study on Likoma Island, Malawi, and then propagate disease on these networks. Simulations allow us to take into consideration some sexual network characteristics, such as mixing within and between communities, as well as coverage levels for different prevention modalities in the combination prevention package under study.
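The exchangeable-correlation 'design effect' that the abstract takes as its point of departure has a simple closed form. This is the standard formula for equal cluster sizes, not the network-based correction the talk develops:

```python
def design_effect(m, icc):
    """Standard design effect for cluster randomization with equal cluster
    size m and exchangeable intraclass correlation icc: 1 + (m - 1) * icc."""
    return 1.0 + (m - 1.0) * icc

def effective_n(n_individuals, m, icc):
    """Effective sample size after inflating the variance by the design effect."""
    return n_individuals / design_effect(m, icc)
```

For example, 1000 individuals in clusters of 11 with icc = 0.05 carry the information of roughly 667 independent observations; network-induced correlation can make the true penalty larger or smaller than this formula suggests, which is the point of the abstract.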


Abstract (44)
Paper: Bayesian analysis of high-throughput quantitative measurement of protein-DNA interactions
Author: Katerina Kechris

University of Colorado Denver, [email protected]

Co-Authors: David Pollock; Jason de Koning; Hyun-min Kim; Todd Castoe; Mair Churchill
Abstract: Transcriptional regulation depends upon the binding of transcription factor (TF) proteins to DNA in a sequence-dependent manner. Although many experimental methods address the interaction between DNA and proteins, they generally do not comprehensively and accurately assess the full binding repertoire (the complete set of sequences that might be bound with at least moderate strength). Here, we develop and evaluate through simulation an experimental approach that allows simultaneous high-throughput quantitative analysis of TF binding affinity to thousands of potential DNA ligands. Tens of thousands of putative binding targets can be mixed with a TF, and both the pre-bound and bound target pools sequenced. A hierarchical Bayesian Markov chain Monte Carlo approach determines posterior estimates for the dissociation constants, sequence-specific binding energies, and free TF concentrations. A unique feature of our approach is that dissociation constants are jointly estimated from their inferred degree of binding and from a model of binding energetics, depending on how many sequence reads are available and the explanatory power of the energy model. Therefore, this method is theoretically capable of rapid and accurate elucidation of an entire TF-binding repertoire.


Abstract (45)
Paper: Adaptive LASSO for Varying-Coefficient Partially Linear Measurement Error Models
Author: HaiYing Wang

University of Missouri - Columbia, [email protected]

Co-Authors: Guohua Zou; Alan T. K. Wan
Abstract: This paper extends the adaptive LASSO (ALASSO) for simultaneous parameter estimation and variable selection to a varying-coefficient partially linear model where some of the covariates are subject to measurement errors of an additive form. We draw comparisons with the SCAD, and prove that both the ALASSO and SCAD attain the oracle property under this setup. We further develop an algorithm in the spirit of LARS for finding the solution path of the ALASSO in practical applications. Finite sample properties of the proposed methods are examined in a simulation study, and a real data example based on the U.S. Department of Agriculture's Continuing Survey of Food Intakes by Individuals (CSFII) is considered.


Abstract (46)
Paper: Information in a two-stage adaptive optimal design
Author: Adam Lane

University of Missouri, [email protected]

Co-Authors: Ping Yao; Nancy Flournoy
Abstract: In adaptive optimal designs, each stage uses an optimal design evaluated at maximum likelihood estimates that are derived using cumulative data from all prior stages. This dependency affects the properties of maximum likelihood estimates. To illuminate the effects, we assume for simplicity that there are only two stages and that the first stage design is fixed. The information measure most commonly used in the optimal design literature is compared with Fisher's information. To make the analysis explicit, responses are assumed to be normal with a one-parameter exponential mean function. With this model, several estimates of information are compared and a procedure for selecting the proportion of subjects allocated to stage 1 is recommended.


Abstract (47)
Paper: Likelihood-Based Prediction of Endpoint Occurrences in Clinical Trials
Author: Megan Smith

University of Washington, [email protected]

Co-Authors: Ying Qing Chen
Abstract: The HIV Prevention Trials Network (HPTN) 052 Study is a Phase III, controlled, randomized clinical trial to assess the effectiveness of immediate versus delayed antiretroviral therapy (ART) strategies on sexual transmission of Human Immunodeficiency Virus Type-1 (HIV-1). An interim analysis of this study was reviewed on April 28, 2011. The analysis results were hailed by the journal Science as the Breakthrough of the Year 2011. This talk demonstrates a likelihood-based statistical method to determine the timing of the interim analysis for this landmark HIV/AIDS prevention study.


Abstract (48)
Paper: Spatial analysis of areal unit data
Author: Debashis Mondal

University of Chicago, [email protected]

Co-Authors:
Abstract: In spatial statistics, there is great interest in the analysis of areal unit data, with potential applications in agricultural variety trials, geographic epidemiology, pycnophylactic and other area-to-point interpolations, histogram smoothing, climate downscaling, small area estimation and numerous other areas of science. In this talk, I propose stochastic spatial modeling of areal unit data using generalized linear models and (generalized) Gaussian Markov random fields. My approach leads to interesting continuum random field theories that are geostatistical in nature, and I provide approximations of the proposed stochastic models based on lattice Gaussian Markov random fields. The sparse structures of lattice Gaussian Markov random fields in turn give rise to fast and efficient statistical computations. I also show some applications of this method for Gaussian areal data as well as for Poisson areal count data.


Abstract (49)
Paper: Modeling Criminal Careers as Departures from a Unimodal Population Age-Crime Curve: The Case of Marijuana Use
Author: Donatello Telesca

UCLA, [email protected]

Co-Authors:
Abstract: A major aim of longitudinal analyses of life course data is to describe the within- and between-individual variability in a behavioral outcome, such as crime. Statistical analyses of such data typically draw on mixture and mixed-effects growth models. In this work, we present a functional analytic point of view and develop an alternative method that models individual crime trajectories as departures from a population age-crime curve. Drawing on empirical and theoretical claims in criminology, we assume a unimodal population age-crime curve and allow individual expected crime trajectories to differ by their levels of offending and patterns of temporal misalignment. We extend Bayesian hierarchical curve registration methods to accommodate count data and to incorporate the influence of baseline covariates on individual behavioral trajectories. Analyzing self-reported counts of yearly marijuana use from the Denver Youth Survey, we examine the influence of race and gender categories on differences in levels and timing of marijuana smoking. We find that our approach offers a flexible and realistic model for longitudinal crime trajectories that fits individual observations well and allows for a rich array of inferences of interest to criminologists and drug abuse researchers.


Abstract (50)
Paper: Segmentation and Intensity Estimation for Microarray Images with Saturated Pixels
Author: Yan Yang

Arizona State University, [email protected]

Co-Authors: Phillip Stafford; YoonJoo Kim
Abstract: Microarray image analysis processes scanned digital images of hybridized arrays to produce the input spot-level data for downstream analysis, so it can have a potentially large impact on those and subsequent analyses. Signal saturation is an optical effect that occurs when some pixel values for highly expressed genes exceed the upper detection threshold of the scanner software. In practice, spots with a sizable number of saturated pixels are often flagged and discarded. Alternatively, the saturated values are used without adjustment for estimating spot intensities. The resulting expression data tend to be biased downwards and can distort high-level analyses that rely on these data. We developed a flexible mixture model-based segmentation and spot intensity estimation procedure that accounts for saturated pixels by incorporating a censored component in the mixture model. As demonstrated with biological data and simulation, our method extends the dynamic range of expression data beyond the saturation threshold and is effective in correcting saturation-induced bias when the lost information is not tremendous. We further illustrate the impact of image processing on downstream classification, showing that the proposed method can increase diagnostic accuracy using data from a lymphoma cancer diagnosis study. The presented method adjusts for signal saturation at the segmentation stage that identifies a pixel as part of the foreground, background or other. The cluster membership of a pixel can be altered versus treating saturated values as truly observed. Thus, the resulting spot intensity estimates may be more accurate than those obtained from existing methods that correct for saturation based on already segmented data. As a model-based segmentation method, our procedure is able to identify inner holes, fuzzy edges and blank spots that are common in microarray images. The approach is independent of microarray platform and applicable to both single- and dual-channel microarrays.


Abstract (51)
Paper: Estimating Regression Parameters in an Extended Proportional Odds Model
Author: Ying Qing Chen

Fred Hutchinson Cancer Research Center, [email protected]

Co-Authors: Nan Hu; Su-Chun Cheng; Philippa Musoke; Lue Ping Zhao
Abstract: The proportional odds model may serve as a useful alternative to the Cox proportional hazards model to study the association between covariates and survival functions in medical studies. In this article, we study an extended proportional odds model that incorporates so-called 'external' time-varying covariates. In the extended model, regression parameters have a direct interpretation of comparing survival functions, without specifying the baseline survival odds function. Semiparametric and maximum likelihood estimation procedures are proposed to estimate the extended model. Our methods are demonstrated by Monte Carlo simulations, and applied to a landmark randomized clinical trial of a short course of Nevirapine (NVP) for mother-to-child transmission (MTCT) of human immunodeficiency virus type-1 (HIV-1). An additional application includes analysis of the well-known Veterans Administration (VA) Lung Cancer Trial.


Abstract (52)
Paper: Inferring Gene Regulatory Networks by Integrating Perturbation Screens and Steady-State Expression Profiles
Author: Ali Shojaie

University of Washington, [email protected]

Co-Authors: Alexandra Jauhiainen; Michael Kallitsis; George Michailidis (Univ of Michigan)
Abstract: Reconstructing a transcriptional regulatory network is an important task in functional genomics. Data obtained from experiments that perturb genes by knock-outs or RNA interference contain useful information for addressing the reconstruction problem. However, such data can be limited in size and/or expensive to acquire. On the other hand, observational data of the organism in steady state are more readily available, but their informational content is inadequate for the task at hand. We develop a computational approach that appropriately utilizes both data sources for estimating a regulatory network. The proposed method offers significant advantages over existing techniques. We develop a three-step algorithm to estimate the underlying directed acyclic regulatory network that uses as input both perturbation screens and steady-state gene expression data. In the first step, the algorithm determines causal orderings of the genes that are consistent with the perturbation data. In the second step, for each ordering, a regulatory network is estimated using a penalized likelihood based method, while in the third step a consensus network is constructed from the highest scored ones. The algorithm offers two options for determining all possible causal orderings: an exhaustive search that becomes prohibitive for larger scale problems, and a fast heuristic that couples a Monte Carlo technique with a fast search algorithm. Further, it is established that the algorithm produces a consistent estimate of the regulatory network. Numerical results show that the algorithm performs well in uncovering the underlying network and clearly outperforms competing approaches that rely only on a single data source.


Abstract (54)
Paper: Ascertainment correction for a population tree via a pruning algorithm for computation of composite likelihood from dependent genetic loci
Author: Arindam RoyChoudhury

Columbia University, [email protected]

Co-Authors:
Abstract: We present a method for correcting ascertainment bias in a coalescent-based composite likelihood estimation method for estimating population trees from dependent loci. In a set of individuals, the allele counts in nearby genetic loci are usually associated with each other due to linkage disequilibrium and joint coalescent events. In some analyses the parameters of interest could be estimated from the marginal distributions of these loci, where each locus is a separate data point. As they are dependent, it is often difficult to derive the joint likelihood from multiple loci. To simplify the derivation, we treat the loci as if they were independent, computing a composite likelihood. To correct for the ascertainment bias in the composite likelihood, we compute the probability of allele counts conditioned on the locus being included. This conditional probability is the product, across the loci, of the uncorrected marginal likelihood divided by the inclusion probability. We modify the computation so that the ascertainment-bias-corrected composite likelihood can be computed with a single run of the algorithm. Our computation is exact and avoids Monte Carlo based methods. We will also present a modification of our methods for missing data.
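The ascertainment correction described here, dividing each locus's marginal likelihood by its inclusion probability, can be illustrated with a toy single-locus binomial model in which a locus is included only if the variant allele is observed at least once. This toy example is for intuition only; the paper's computation is coalescent-based and uses a pruning algorithm over a population tree.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial(n, p) probability of observing k variant alleles."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def ascertained_pmf(k, n, p):
    """Pmf of the allele count conditioned on the locus being included,
    where a locus is ascertained only if the variant is seen (k >= 1):
    the uncorrected likelihood divided by the inclusion probability."""
    if k < 1:
        return 0.0
    inclusion_prob = 1.0 - binom_pmf(0, n, p)
    return binom_pmf(k, n, p) / inclusion_prob
```

Dividing by the inclusion probability renormalizes the truncated distribution so that it sums to one over the counts that can actually be observed.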


Abstract (55)
Paper: Optimally-Smoothed Maps of Pollution Source Potential via Particle Back Trajectories and Filtered Kriging
Author: William Christensen

Brigham Young University, [email protected]

Co-Authors:
Abstract: For over 20 years, the Potential Source Contribution Function (PSCF) has been used by the aerosol research community to identify regions around an air-quality receptor location that are associated with high levels of a pollutant. PSCF uses particle back trajectories and measured levels of a pollutant in order to link high-measurement days with specific back trajectories. For a given rectangular area s on a map, the probability π(s) that the area contains an important source of the pollutant of interest is estimated with p(s) = X(s)/n(s), where n(s) is the number of back trajectories that run through that area and X(s) is the number of those back trajectories that were associated with a 'high' day for the pollutant. Results are generally illustrated with a PSCF plot in which p(s) is plotted at each area (or pixel) on the map. In this talk, we use the basic notion of the PSCF plot and propose a modified potential source map that exploits both prior knowledge about π(s) and a recently-developed filtered kriging approach. Results are illustrated using data from air quality receptors in Pittsburgh and Southern California.
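The raw PSCF estimate p(s) = X(s)/n(s) is straightforward to compute from gridded back trajectories. A minimal sketch with a hypothetical data layout (each trajectory given as a list of grid-cell indices plus a flag for whether it belongs to a 'high' day), before any of the prior-informed smoothing or kriging the talk proposes:

```python
def pscf(trajectories, high_day):
    """Raw PSCF map: for each grid cell s, p(s) = X(s)/n(s), where n(s) is
    the number of back trajectories passing through s and X(s) is the number
    of those associated with a 'high' pollutant day."""
    n, X = {}, {}
    for cells, high in zip(trajectories, high_day):
        for s in set(cells):  # count each trajectory at most once per cell
            n[s] = n.get(s, 0) + 1
            if high:
                X[s] = X.get(s, 0) + 1
    return {s: X.get(s, 0) / n[s] for s in n}
```

Cells visited by few trajectories get noisy estimates, which is the instability the proposed smoothed map is meant to address.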


Abstract (56)
Paper: Construction of simultaneous confidence intervals for ratios of means of lognormal distributions using two-step method of variance estimates recovery
Author: Amany Abdel-Karim

Tanta University, [email protected]

Co-Authors:
Abstract: This research considers the construction of simultaneous confidence intervals for ratios of means of independent lognormal distributions. Two approaches using a two-step method of variance estimates recovery (MOVER) are proposed. One approach uses fiducial generalized confidence intervals in the first step and the method of variance estimates recovery in the second step (FGCIs-MOVER). The other approach uses the MOVER in both the first and second steps (MOVER-MOVER). The performance of the proposed approaches is compared to an existing approach, simultaneous fiducial generalized confidence intervals (SFGCIs). Monte Carlo simulation is used to evaluate the performance of these approaches in terms of coverage probability, mean interval width, and time consumption. A clinical example is used to illustrate the proposed approaches. Simulation studies show that the three approaches have a satisfactory performance. The MOVER-MOVER approach outperforms the other approaches in terms of interval width and time consumption. The improvements are more significant with increasing sample sizes and decreasing variance values. The results of the real data example are consistent with the results of the simulation studies.
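The MOVER step for a ratio combines the separate confidence limits of the numerator and denominator in closed form (in the style of Zou and Donner). The sketch below shows that single combination step only; the two-step procedures in the abstract first construct the component intervals for each lognormal mean. The formula is reproduced from the general MOVER literature, so treat it as an illustrative sketch rather than the paper's exact recipe.

```python
from math import sqrt

def mover_ratio(t1, l1, u1, t2, l2, u2):
    """MOVER confidence limits for theta1/theta2, given point estimates
    t1, t2 (> 0) and separate two-sided limits (l1, u1) and (l2, u2)."""
    L = (t1 * t2 - sqrt((t1 * t2) ** 2
         - l1 * u2 * (2 * t1 - l1) * (2 * t2 - u2))) / (u2 * (2 * t2 - u2))
    U = (t1 * t2 + sqrt((t1 * t2) ** 2
         - u1 * l2 * (2 * t1 - u1) * (2 * t2 - l2))) / (l2 * (2 * t2 - l2))
    return L, U
```

In the symmetric case (identical estimates and limits for both parameters) the interval for the ratio straddles 1, with limits that are reciprocals of each other.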


Abstract (57)
Paper: Estimating the impact of community level interventions: The SEARCH Trial and HIV Prevention in Sub-Saharan Africa
Author: Maya Petersen

University of California, Berkeley School of Public Health, [email protected]

Co-Authors: Joshua Schwab; Laura Balzer; Mark van der Laan; The SEARCH Team
Abstract: Evaluation of community level interventions to prevent HIV infection presents significant methodological challenges. Even when it is feasible to randomly assign a treatment versus control level of the intervention to each community in a sample, measurement of incident HIV infection remains difficult. In this talk we describe an experimental design developed for the SEARCH Trial, a large community randomized trial that will evaluate the impact of expanded treatment on incident HIV and other outcomes. Regular community-wide testing campaigns are conducted, and a random sample of community members who fail to attend a campaign are tracked. The data generated by this experiment are subject to non-monotone missingness; however, the missing at random assumption is known to hold by design, and the missingness mechanism is known. We present two approaches to estimating the effect of the randomized intervention on incident HIV infection using these data. The first approach is based on targeted maximum likelihood estimation of a community level outcome based on the community-specific sample, followed by a community level ANCOVA to estimate the treatment effect. The second approach applies a single targeted maximum likelihood estimator to pooled individual level data. The methods described can also be applied, under additional non-testable assumptions, to estimate the effects of non-randomized community level interventions in the setting of incomplete tracking success.


Abstract (58)
Paper: K-Bayes Reconstruction and Estimation of Functional MRI
Author: John Kornak

University of California, San Francisco, [email protected]

Co-Authors:
Abstract: A statistical procedure (K-Bayes) is presented for the improved reconstruction and analysis of functional Magnetic Resonance Imaging (MRI) of the brain. K-Bayes improves image quality, spatial resolution and activation pattern estimation while overcoming inherent limitations and artifacts of conventional discrete Fourier transform (DFT) based reconstruction. K-Bayes achieves these improvements by 1) using high-resolution anatomical prior information from structural MRI to provide constraints for the functional MRI process; and 2) simultaneously relating the functional activation pattern to be estimated to the sampled data points in k-space (frequency space).


Abstract (59)
Paper: Estimating the efficacy of pre-exposure prophylaxis for HIV prevention among participants with a threshold level of drug concentration
Author: James Dai

Fred Hutchinson Cancer Research Center, [email protected]

Co-Authors: Peter Gilbert; Jim Hughes; Elizabeth Brown
Abstract: Assays to detect antiretroviral drug levels in study participants are increasingly popular in pre-exposure prophylaxis (PrEP) trials, as they provide an objective measure of adherence. Current correlation analyses of drug concentration data are prone to bias. In this article, the authors formulate the causal estimand of prevention efficacy among drug compliers, those who would have had a threshold level of drug concentration had they been assigned to the drug arm. The identifiability of the causal estimand is facilitated by exploiting the exclusion restriction, that is, drug noncompliers do not acquire any prevention benefit. In addition, the authors develop an approach to sensitivity analysis that relaxes the exclusion restriction. Applications to published data from two PrEP trials, namely the Pre-exposure Prophylaxis Initiative (iPrEx) trial and the Centre for the AIDS Programme of Research in South Africa (CAPRISA) 004 trial, suggest high efficacy estimates among drug compliers (odds ratio 0.097 and 95% confidence interval [0.027, 0.352] in the iPrEx trial; odds ratio 0.104 and 95% confidence interval [0.024, 0.447] in the CAPRISA 004 trial). In summary, the proposed inferential method provides an unbiased assessment of PrEP efficacy among drug compliers, thus adding to the primary intent-to-treat analysis and correlation analyses of drug concentration data.

Abstract (61)
Paper: MIMOSA: Mixture Models for Single Cell Assays With Applications to Vaccine Studies
Author: Greg Finak

Fred Hutchinson Cancer Research Center
[email protected]

Co-Authors: SC De Rosa; M Roederer; R Gottardo
Abstract: Immunological endpoints in vaccine trials are measured through a variety of assays that provide single-cell measurements of multiple genes and proteins in specific immunological cell populations. Using single-cell data, we consider the problem of identifying subjects where these cell populations exhibit differential responses under different experimental conditions. For example, in the intracellular cytokine staining assay from flow cytometry, individual cells are classified as either positive or negative for a marker based on a predetermined threshold. The assay is used to assess an individual's immune response to a vaccine by measuring the number of antigen-specific cells producing different cytokines in different T-cell subpopulations in response to different antigen stimulations. Individuals whose T-cells exhibit increased production of a cytokine in response to stimulation are termed 'positive' for that cytokine, and multiple such 'positivity calls' are used to identify vaccine responders. Here we present a framework based on mixtures of Beta-binomial or Dirichlet-Multinomial distributions for analyzing count data derived from such single-cell assays. Our method models cellular responses in a marker-specific manner, treating the responding and non-responding observations as separate components in the model. Cell counts from the different experimental conditions are modelled independently, while sharing information across responding and non-responding observations through empirical Bayes priors in order to increase the sensitivity and specificity of positivity calls. We compare our method against Fisher's exact test and show how it can be extended to model multivariate (multiple markers) cellular responses. In simulations and in HIV vaccine trial data we find that our method has higher sensitivity and specificity than Fisher's exact test for positivity calls.
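Editor's note: the Fisher's exact test baseline mentioned above compares, for a single marker and subject, positive-cell counts between stimulated and unstimulated samples. A minimal sketch with hypothetical counts:

```python
from scipy.stats import fisher_exact

# Hypothetical counts: cytokine-positive cells out of total cells measured,
# in antigen-stimulated vs. unstimulated samples from one subject
stim_pos, stim_total = 60, 100_000
unstim_pos, unstim_total = 20, 100_000

table = [[stim_pos, stim_total - stim_pos],
         [unstim_pos, unstim_total - unstim_pos]]

# One-sided test: is the positive-cell proportion higher after stimulation?
_, p_value = fisher_exact(table, alternative="greater")
```

A small p-value here would yield a 'positivity call' for this marker; MIMOSA replaces this per-subject test with a mixture model that borrows strength across subjects.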

Abstract (62)
Paper: Integrative Bayesian Analysis of High-Dimensional Multi-platform Genomics Data
Author: Kim-Anh Do

UT M.D. Anderson Cancer Center
[email protected]

Co-Authors: Wenting Wang; Veerabhadran Baladandayuthapani; Jeffrey Morris; Bradley Broom; Ganiraju Manyam
Abstract: Analyzing data from multi-platform genomics experiments combined with patients' clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current integration approaches are limited in that they do not consider the fundamental biological relationships that exist among the data from different platforms. We propose an integrative Bayesian analysis of genomics data (iBAG) framework for identifying important genes/biomarkers that are associated with clinical outcome. This framework uses a hierarchical modeling technique to combine the data obtained from multiple platforms into one model. We assess the performance of our methods using several synthetic and real examples. Simulations show our integrative methods to have higher power to detect disease-related genes than non-integrative methods. Using The Cancer Genome Atlas glioblastoma dataset, we apply the iBAG model to integrate expression and methylation data to study their associations with patient survival. Our proposed method discovers multiple methylation-regulated genes that are related to patient survival, most of which have important biological functions in other diseases but have not been previously studied in glioblastoma.

Abstract (63)
Paper: An Alternative Characterization of Hidden Regular Variation in Joint Tail Modeling
Author: Grant Weller

Colorado State University
[email protected]

Co-Authors: Dan Cooley
Abstract: In modeling the joint upper tail of a multivariate distribution, a fundamental deficiency of classical extreme value theory is the inability to distinguish between asymptotic independence and exact independence. In this work, we examine multivariate threshold modeling based on the framework of regular variation on cones. Tail dependence is described by an angular measure, which in some cases is degenerate on joint tail regions, despite strong sub-asymptotic dependence in such regions. The canonical example is a bivariate Gaussian distribution with any correlation less than one. Hidden regular variation (Resnick, 2002), a second-order tail decay on these regions, offers a refinement of the classical theory. Previous characterizations of random vectors with hidden regular variation are not well-suited for joint tail estimation in finite samples, and estimation approaches thus far have been unable to model both the heavier-tailed regular variation and the hidden regular variation simultaneously. We propose to represent a random vector with hidden regular variation as the sum of independent first- and second-order regularly varying pieces. We show our model is asymptotically valid via the concept of multivariate tail equivalence, and illustrate simulation methods with the bivariate Gaussian example. Finally, we outline a framework for estimation from our model via the EM algorithm.

Abstract (64)
Paper: Failure-time imputation of interval-censored time-to-event data through joint modeling with a longitudinal outcome
Author: Darby Thompson

EMMES
[email protected]

Co-Authors:
Abstract: Analysis of interval-censored time-to-event data commonly relies on imputed event times to permit standard proportional hazards analyses. Imputation techniques such as mid-point imputation, multiple imputation, or non-parametric (e.g., Finkelstein or kernel-smoothing) imputation can provide adequate solutions in many situations. Often, however, a longitudinal covariate is observed which may be related to the failure-time process, and these techniques fail to incorporate this information. This talk will introduce a method which leverages the relationship between a longitudinal covariate and the event process to provide more accurate imputed failure times and improve the accuracy and efficiency of treatment effect estimation. The method involves iterative application and solution of a conditional-independence joint model. Joint models, particularly conditional-independence models of longitudinal and right-censored time-to-event outcomes, will be reviewed, including a motivating example. Imputation in a designed experiment and results of simulation studies will also be provided to demonstrate the performance of the technique in many situations.

Abstract (65)
Paper: Bivariate Nonlinear Mixed Models for Longitudinal Dichotomous Response Data: Estimating Temporal Relationships Among Daily Marijuana, Cigarette, and Alcohol Use
Author: Susan Mikulich-Gilbertson

University of Colorado Anschutz Medical Campus
[email protected]

Co-Authors: Gary Zerbe
Abstract: We model and estimate temporal relationships of marijuana use with cigarette and alcohol use in adolescent patients with bivariate nonlinear mixed models appropriate for longitudinal dichotomous response data. Assuming conditional independence given the random effects eliminates the need for assuming multivariate normality among residuals and facilitates the modeling of bivariate responses across time with missing-at-random incomplete data. Useful properties are that responses: 1) do not need to be completely independent but only conditionally independent, 2) do not need to be measured at the same times, allowing for missing data, and 3) might still be correlated through the normally distributed random subject effects. Data are from a pharmacotherapy trial of Osmotic-Release Methylphenidate for Attention-Deficit Hyperactivity Disorder with cognitive behavioral therapy (CBT) for co-occurring substance use disorders (Riggs et al., 2011). Data utilized in these analyses consist of daily reports of use (yes/no) of each drug, with up to 112 observations per drug, on the subset of 64 adolescents who completed the trial (receiving adequate CBT therapy) and who used marijuana at least 15 days and smoked cigarettes at least 15 days of the 28 days prior to baseline. Models were implemented in SAS Proc NLMIXED.

Abstract (66)
Paper: Designing intervention studies with multiple intermediate biomarker outcomes
Author: Loki Natarajan

University of California San Diego
[email protected]

Co-Authors: Hongying Li
Abstract: Prevention trials aim to reduce disease risk. Assessing the efficacy of such trials, however, requires long-term follow-up, which is often not feasible. Instead, intervention effects on intermediate outcomes, such as biomarkers, may be more easily obtained. Of note, lifestyle interventions frequently impact multiple pathways and biomarkers; e.g., weight loss interventions could influence both inflammation and insulin pathways. Rarely are there existing models that consider all relevant biomarkers together. Hence, designing a study that translates intervention-related changes in multiple, possibly correlated biomarkers into survival benefits raises analytic challenges. To address this, we will develop a method that yields appropriate relative weights for different biomarkers, and use these weights to create a composite risk score. The efficacy of the intervention trial is assessed by the change in the risk score. The appropriate relative weights are derived by comparing a disease model conditioning on the joint distribution of the markers to the corresponding marginal models, and incorporating the correlations among the markers. We will describe the proposed analytic method in a logistic regression setting and will present simulation studies to evaluate our proposed method.

Abstract (67)
Paper: Effectively selecting a target population for a future comparative study
Author: Lihui Zhao

Northwestern University
[email protected]

Co-Authors: Lu Tian; Tianxi Cai; Brian Claggett; Lee-Jen Wei
Abstract: When comparing a new treatment with a control in a randomized clinical study, the treatment effect is generally assessed by evaluating a summary measure over a specific study population. The success of the trial heavily depends on the choice of such a population. In this research, we show a systematic, effective way to identify a promising population, for which the new treatment is expected to have a desired benefit, using the data from a current study involving similar comparator treatments. Specifically, with the existing data we first create a parametric scoring system using multiple covariates to estimate subject-specific treatment differences. Using this system, we specify a desired level of treatment difference and create a subgroup of patients, defined as those whose estimated scores exceed this threshold. An empirically calibrated group-specific treatment difference curve across a range of threshold values is constructed. The population of patients with any desired level of treatment benefit can then be identified accordingly. To avoid any “self-serving” bias, we utilize a cross-training-evaluation method for implementing the above two-step procedure. Lastly, we show how to select the best scoring system among all competing models. The proposals are illustrated with the data from a randomized comparative study.

Abstract (68)
Paper: Adjusting for time-dependent sensitivity in an illness-death model
Author: Elizabeth Teeple

University of Washington
[email protected]

Co-Authors: Elizabeth Brown
Abstract: We consider an illness-death model with disease status assessed at scheduled visits using an imperfect diagnostic test. Under this scenario, a participant's true state may be unknown at the start and end times of the study, and the detection of transitions into illness may be delayed or missed altogether. When the test has imperfect sensitivity, but perfect specificity, the additional uncertainty can be captured as a random variable measuring delay in detection. The cumulative distribution then defines a sensitivity function that increases over time. We present a maximum likelihood based illness-death model that accounts for imperfect sensitivity by modeling the delay with an exponential distribution. We apply this method to estimate the rate of postnatal mother-to-child transmission of HIV (MTCT) and examine covariate effects on transmission through Cox proportional hazards regression. Additionally, we allow for non-homogeneity by specifying transition rates as penalized B-splines and examine the model under Markov and semi-Markov assumptions.

Abstract (70)
Paper: Fast Computation for Genome Wide Association Studies using Boosted One-Step Statistics
Author: Arend Voorman

University of Washington
[email protected]

Co-Authors: Kenneth Rice; Thomas Lumley
Abstract: Statistical analyses of Genome Wide Association Studies (GWAS) require fitting large numbers of very similar regression models, each with low statistical power. Taking advantage of repeated observations or correlated phenotypes can increase this statistical power, but fitting the more complicated models required can make computation impractical. In this paper we present simple methods that capitalize on the structure inherent in GWAS to dramatically speed up computation for a wide variety of problems, with a special focus on methods for correlated phenotypes. Availability: The R package 'boss' is available on the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/boss/
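Editor's note: the "fit once, test many" structure such methods exploit can be illustrated with a simple score-test sketch on hypothetical data (this is not the boss package's implementation): the null model is fit a single time, and a score statistic for every variant then comes from one matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 200                      # subjects, genetic variants
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)  # genotype dosages
y = rng.normal(size=n)               # phenotype simulated under the null

# Fit the null model (here: intercept only) once, then reuse its residuals
r = y - y.mean()
sigma2 = r @ r / (n - 1)

# Center genotypes and compute all m score statistics in one matrix product
Gc = G - G.mean(axis=0)
U = Gc.T @ r                         # score for each variant
V = sigma2 * (Gc ** 2).sum(axis=0)   # approximate null variance of each score
z = U / np.sqrt(V)                   # one z-statistic per variant
```

Under the null, each z-statistic is approximately standard normal, so a genome's worth of single-variant tests reduces to a handful of vectorized operations.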

Abstract (71)
Paper: Single Arm Trial Design with Comparison to a Heterogeneous Set of Historical Control Trials
Author: Arzu Onar-Thomas

St. Jude Children’s Research Hospital
[email protected]

Co-Authors:
Abstract: This presentation will focus on a design strategy for a single-arm registration study. The approach utilizes historical data from 7 contemporary single-arm trials conducted by the Pediatric Brain Tumor Consortium and the Children’s Oncology Group for newly diagnosed diffuse intrinsic pontine glioma patients. The overall design methodology borrows from the meta-analysis literature, where the outcomes from previous trials are treated as random effects from a common population. A prediction interval approach is then used to determine whether the outcome from the new trial can be considered an improvement over historical data. The statistical properties of designs based on median overall survival and log-rank tests treating one of the trials as baseline will be compared with respect to power and sensitivity to model assumptions.

Abstract (72)
Paper: A dropout-varying generalized linear mixed-effects model
Author: Jeri Forster

University of Colorado Denver School of Public Health
[email protected]

Co-Authors: Samantha MaWhinney; Xinshuo Wang
Abstract: Longitudinal cohorts are a valuable research resource, and many include binary outcomes of interest such as undetectable viral load (yes/no) in HIV studies. Additionally, dropout due to loss to follow-up and death can be common. When the probability of dropout depends on the unobserved outcomes, even after conditioning on observable data, the missing data are missing not at random and therefore nonignorable. Despite the likelihood of nonignorable dropout, many longitudinal cohort studies have used traditional methods, potentially yielding biased results. We have developed a dropout-varying generalized linear mixed-effects model approach (NSVbin). The NSVbin is a mixture model that semiparametrically models dropout time using natural cubic B-splines. Mixture models account for dropout by factoring the joint outcome-dropout distribution into the dropout time distribution, f(u), and the distribution of the outcome given dropout, f(y|u). We assume a binary response. Marginal estimates are obtained from the conditional model, and estimates of interest such as odds ratios and probabilities follow as in standard methods. Simulation studies that both meet and violate the smoothness and continuity assumptions of the mixture models will be presented.

Abstract (73)
Paper: Estimating the impact of community level interventions: The SEARCH Trial and HIV Prevention in Sub-Saharan Africa
Author: Laura Balzer

University of California, Berkeley School of Public Health
[email protected]

Co-Authors: Maya Petersen; Mark van der Laan; Joshua Schwab; The SEARCH Team
Abstract: Evaluation of community level interventions to prevent HIV infection presents significant methodological challenges. Even when it is feasible to randomly assign a treatment versus control level of the intervention to each community in a sample, measurement of incident HIV infection remains difficult. In this talk we describe an experimental design developed for the SEARCH Trial, a large community randomized trial that will evaluate the impact of expanded treatment on incident HIV and other outcomes. Regular community-wide testing campaigns are conducted, and a random sample of community members who fail to attend a campaign are tracked. The data generated by this experiment are subject to non-monotone missingness; however, the missing at random assumption is known to hold by design, and the missingness mechanism is known. We present two approaches to estimating the effect of the randomized intervention on incident HIV infection using these data. The first approach is based on targeted maximum likelihood estimation of a community level outcome based on the community specific sample, followed by a community level regression to estimate the treatment effect. The second approach applies a single targeted maximum likelihood estimator to pooled individual level data. The methods described can also be applied, under additional non-testable assumptions, to estimate the effects of non-randomized community level interventions in the setting of incomplete tracking success.

Abstract (75)
Paper: Higher Order Asymptotics for Negative Binomial Regression Inferences from RNA-Sequencing Data
Author: Yanming Di

Oregon State University
[email protected]

Co-Authors: Sarah C Emerson; Daniel W Schafer; Jeff A Kimbrel; Jeff H Chang
Abstract: RNA sequencing (RNA-Seq) has become the technology of choice for mapping and quantifying transcriptomes and for studying gene expression. The negative binomial (NB) distribution has been shown to be a useful model for frequencies of mapped RNA-Seq reads. The NB model uses a dispersion parameter to capture the extra-Poisson variation commonly observed in RNA-Seq read frequencies. An extension to NB regression is needed to permit the modeling of gene expression as a function of explanatory variables and to compare groups after accounting for other factors. A considerable obstacle in the development of NB regression is the lack of accurate small-sample inference for the NB regression coefficients. The exact test available for two-group comparisons does not extend to the regression setting. Asymptotic inferences, through the Wald test and the likelihood ratio test, are mathematically justified only for large sample sizes. Because of the labor associated with RNA-Seq experiments, sample sizes are almost always small. There is an obvious concern that the large-sample tests may be inappropriate for such small sample sizes. In this paper we address that issue by showing that likelihood ratio tests for regression coefficients in NB regression models, possibly with a higher-order asymptotic (HOA) adjustment, are nearly exact, even for very small sample sizes. In particular, we demonstrate 1) that the HOA-adjusted likelihood ratio test p-values are, for practical purposes, indistinguishable from exact test p-values in situations where the exact test is available, and 2) via simulation, that the behavior of the test matches the nominal specifications more generally. With HOA inference, negative binomial regression will provide statistical modeling tools for RNA-Seq analysis that are on par with those currently available, through more traditional regression analysis, for microarray analysis. Additionally, this important application to analysis of biological data will draw attention to HOA, a somewhat neglected yet extremely useful development of modern statistical theory.

Abstract (76)
Paper: Bayesian Semi-parametric Analysis of Multi-rater Ordinal Data, With Application to Prioritizing Research Goals For Suicide Prevention
Author: Terrance Savitsky

RAND Corporation
[email protected]

Co-Authors: Siddhartha Dalal
Abstract: This article devises a Bayesian formulation for analysis of ordinal multi-rater data acquired from an on-line generalization of the Delphi elicitation process, whose scalability in the number of participants, or raters, results in the employment of typically many hundreds of raters, J, scoring relatively few items, N, such that J ≫ N. The elicitation employs two rounds of scorings by raters on items, allowing for the possibility of rater learning expressed in the evolution of their scores. Raters are typically drawn from multiple domains of expertise and express heterogeneity and grouping in their beliefs about the items rated, which generate multi-modality and edge effects for their observed ordinal scores. Our approach performs a robust semi-parametric extension of typically used parametric, multi-rater ordinal models by explicitly parameterizing rater beliefs and permitting the data to discover rater sub-populations, which expands the scope for inference to better support prioritization decisions for policy-makers. The model framework introduces a set of latent continuous variates under employment of a semi-parametric (countably infinite) mixture of Gaussian distributions prior formulation, induced with prior imposition of a Dirichlet process on mean and precision parameters indexed by rater, goal and round. The mixing is performed over raters, rather than items, in a fashion that borrows strength to capture dependence among raters and simultaneously accomplishes dimension reduction. We leverage the by-round indexing to make inference on shifts among raters across rounds towards consensus in underlying true beliefs and the precisions with which their beliefs are expressed. This framework supports a broad class of on-line crowdsourcing applications. We deploy our model for analysis of a National Action Alliance for Suicide Prevention and RAND Corporation suicide prevention dataset acquired using the ExpertLens on-line group elicitation process, where raters from diverse areas of expertise perform ratings in multiple panels on the importance of research goals for the improvement of prevention methods and treatments towards reducing the U.S. national suicide rate.

Abstract (77)
Paper: A joint Markov chain model for the association of two longitudinal binary processes
Author: Catherine M. Crespi

University of California Los Angeles
[email protected]

Co-Authors: Sherry Lin
Abstract: In many longitudinal studies, two related health state processes are measured on the same individual, and interest lies in the association between the processes. We consider the setting in which longitudinal data on two binary health state processes are collected from each individual at a series of discrete time points. We propose a joint model in which each process is modeled as a first-order Markov chain, and develop measures of association between the two processes. We apply our model to longitudinal data on viral shedding collected from individuals infected with type 2 herpes simplex virus.
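Editor's note: a first-order Markov chain for a single binary process, the building block of the joint model above, can be estimated by counting one-step transitions and normalizing each row; a minimal sketch:

```python
import numpy as np

def transition_matrix(states):
    """Estimate a 2-state first-order Markov transition matrix by counting
    observed one-step transitions and normalizing each row to sum to 1."""
    counts = np.zeros((2, 2))
    for prev, curr in zip(states[:-1], states[1:]):
        counts[prev, curr] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Example: a short binary history (1 = state present at that visit)
P = transition_matrix([0, 0, 1, 1, 1, 0])
```

The joint model then couples two such chains and derives association measures between them.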

Abstract (78)
Paper: Differential Expression Identification and False Discovery Rate Estimation in RNA-Seq Data
Author: Jun Li

Stanford University
[email protected]

Co-Authors: Robert Tibshirani
Abstract: RNA-Sequencing (RNA-Seq) is taking the place of microarrays and becoming the primary tool for measuring genome-wide transcript expression. We discuss the identification of features (genes, isoforms, exons, etc.) that are associated with an outcome in RNA-Seq and other sequencing-based comparative genomic experiments. That is, we aim to find features that are differentially expressed in samples in different biological conditions or under different disease statuses. RNA-Seq data take the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especially challenging because different sequencing experiments may generate quite different total numbers of reads, or 'sequencing depths'. Existing methods for this problem are based on Poisson or negative-binomial models: they are useful but can be heavily influenced by 'outliers' in the data. We introduce a simple, non-parametric method with resampling to account for the different sequencing depths. The new method is more robust than parametric methods. It can be applied to data with quantitative, survival, two-class, or multiple-class outcomes. We compare our proposed method to Poisson and negative-binomial based methods in simulated and real data sets, and find that our method discovers more consistent patterns than competing methods.
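Editor's note: the abstract does not specify its resampling scheme; one common way to put samples with unequal sequencing depths on a comparable footing is binomial thinning of the deeper sample, sketched here with hypothetical counts.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical read counts for 5 genes in two samples with unequal depths
counts_a = np.array([120, 30, 0, 450, 60])    # shallower sample
counts_b = np.array([300, 90, 10, 900, 200])  # deeper sample

# Binomial thinning: randomly keep each read of the deeper sample with
# probability depth_a / depth_b, so per-gene counts become comparable
# without a parametric model for the counts
depth_a, depth_b = counts_a.sum(), counts_b.sum()
thinned_b = rng.binomial(counts_b, depth_a / depth_b)
```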

Abstract (79)
Paper: Fitting and interpreting continuous-time latent Markov models for panel data
Author: Jane Lange

University of Washington, Dept. of Statistics
[email protected]

Co-Authors: University of Washington, Dept. of Statistics
Abstract: Multistate models are used to characterize disease processes within an individual. Clinical studies often observe the disease status of individuals at discrete time points, making exact times of transition unknown. Such panel data pose considerable modeling challenges. Assuming a standard continuous time Markov chain (CTMC) yields tractable likelihoods, but the assumption of exponential sojourn time distributions is typically unrealistic. More flexible semi-Markov models permit generic sojourn distributions, yet yield intractable likelihoods for panel data in the presence of reversible transitions. One attractive alternative is to assume the disease process is characterized by an underlying latent CTMC, with multiple latent states mapping to each disease state. These models retain analytic tractability due to the CTMC framework, but allow for flexible, duration-dependent disease state sojourn distributions. We have developed a robust and efficient expectation-maximization (EM) algorithm in this context. Our complete data state space consists of the observed data and the underlying latent trajectory, yielding computationally efficient expectation and maximization steps. Our algorithm outperforms alternative methods in terms of time to convergence and robustness. We also examine the frequentist performance of latent CTMC point and interval estimates of disease process functionals based on simulated data. The performance of estimates depends on time, functional, and data-generating scenario. Finally, we illustrate the interpretive power of latent CTMC models for describing disease processes on a dataset of lung transplant patients. We hope our work will encourage wider use of these models in the biomedical setting.

Abstract (80)
Paper: Multiplicity in Bayesian Graphical Models
Author: Riten Mitra

ICES, University of Texas, Austin
[email protected]

Co-Authors: Peter Mueller, Yuan Ji
Abstract: Graphical models characterize associations among a set of random variables. We discuss the joint estimation of multiple graphical models as a multiple comparison problem. Here we extend the results of mean effects and variable selection models by focusing on the conditional independence structure. We also deal with multiple latent graphical structures simultaneously. We show that the choice of a suitable Bayesian prior matters. Compared to independent analysis of multiple graphs, this can lead to significant improvements in posterior inference when the total number of possible edges is large. We formalize our intuition in results on the KL divergence between two posteriors. We then apply the joint graphical model to estimate dependencies among 18 protein functional markers present in bone marrow cells under two different stimulation conditions. The data are obtained from a novel mass cytometry platform called CyTOF.

Abstract (81)
Paper: Statistical Criteria for Establishing the Safety and Efficacy of Allergenic Products: A Discussion of the Variety of Variables that May Influence Success or Failure
Author: Tammy Massie

Food and Drug Administration
[email protected]

Co-Authors:
Abstract: Allergenic products, including drugs and biologics, are used for the treatment and prevention of a variety of allergic responses. These allergic responses range from deadly reactions to bee stings; to mild to severe reactions to ingestion of foods; to symptoms such as congestion, wheezing, scratchy eyes, and even a feeling of malaise due to seasonal allergies; to allergies from chronic exposure to items such as dogs, cats, household products, and dust. The primary endpoints are frequently patient-reported outcomes (PROs), in which the individual notes how they are feeling and how severe their symptoms are over time. Determining the safety and efficacy of allergenic products meant to treat or prevent allergic symptoms in clinical studies is a challenging task, not only because of subject response variability but also because these studies can be weeks, months, or even years in length. Many clinical trials for allergenic products involve environmental exposure or chamber studies, each of which has its own challenges. In this presentation the speaker will discuss specific challenges faced in environmental exposure studies of seasonal allergies, since some of these issues may be statistical in nature. The presentation will begin with a brief introduction to commonly implemented clinical studies of allergenic products, including endpoints/symptom scales and timing. Specific challenges faced in natural exposure studies, including missing values, variability in exposure due to the pollen/allergen season, variability in scoring mechanisms, and the influence of other concomitant allergies, will be discussed at length. Graphics highlighting both timing and symptom scores will be presented to provide insight related to these studies. The audience will then be challenged with how they might address these real-world situations.

Abstract (82)
Paper: Spatial Inference for Climate Change
Author: Joshua French

University of Colorado Denver
[email protected]

Co-Authors: Stephan R. Sain
Abstract: The study of climate change often focuses on determining where climate change is likely to occur and its possible magnitude. We propose methodology for drawing conclusions about where climate change is possible and/or likely to occur for various levels of temperature change. The methodology uses kriging and conditional simulation to create confidence regions for the areas where temperature change exceeds certain thresholds of interest. Discussion will include assessment of future climate change for several regions of North America based on climate models from the North American Regional Climate Change Assessment Program (NARCCAP).


Abstract (83)
Paper: A Bayesian hierarchical model for estimating and partitioning Bernstein polynomial density functions
Author: Charlotte Gard

New Mexico State University, [email protected]

Co-Authors: Elizabeth R. Brown
Abstract: We present a Bayesian hierarchical model for simultaneously estimating and partitioning probability density functions. Individual density functions are modeled using Bernstein densities, which are mixtures of beta densities whose parameters depend only on the number of mixture components. We place a prior on the number of mixture components and express the mixture weights as increments of a distribution function G. We place a Dirichlet process prior on G and treat the parameters of the Dirichlet process, the baseline distribution and the precision parameter, as random. We use a mixture of a product of beta densities to partition subjects into groups, with subjects in the same group sharing information via a common baseline distribution. Inference is carried out using Markov chain Monte Carlo. We offer a computing algorithm based on the constructive definition of the Dirichlet process. We consider the case where the number of groups is fixed and the case where the number of groups is unknown, using a birth-death algorithm to make inference regarding the number of groups. We demonstrate the model using radiologist-specific distributions of percent mammographic density.
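As background, a Bernstein density of order k is a mixture of fixed Beta(j, k − j + 1) kernels in which only the weights are free parameters; a minimal evaluation sketch (illustrative, not the authors' code — the weights below are made up):

```python
import numpy as np
from math import gamma

def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1)."""
    return x**(a - 1) * (1 - x)**(b - 1) * gamma(a + b) / (gamma(a) * gamma(b))

def bernstein_density(x, weights):
    """Bernstein density of order k = len(weights):
    f(x) = sum_j weights[j-1] * Beta(x; j, k - j + 1), weights summing to 1."""
    k = len(weights)
    return sum(w * beta_pdf(x, j, k - j + 1)
               for j, w in enumerate(weights, start=1))

x = np.linspace(0.001, 0.999, 1001)
f = bernstein_density(x, np.array([0.2, 0.5, 0.3]))
area = float(np.mean(f) * (x[-1] - x[0]))   # close to 1: it is a density
u = bernstein_density(np.array([0.37]), np.ones(5) / 5)
```

A handy sanity check, used in the last line: equal weights of any order reproduce the uniform density on (0, 1) exactly.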


Abstract (84)
Paper: A multiple dimensional semi-parametric varying-coefficient approach for nonignorable dropout in longitudinal cohort studies
Author: Samantha MaWhinney

Colorado School of Public Health, [email protected]

Co-Authors: Jeri Forster; Nichole Carlson; Xinshuo Wang
Abstract: Longitudinal cohort studies are often impacted by nonignorable dropout and exhibit complex patterns over time. A motivating example is CD4 T cell counts in untreated HIV-infected subjects identified during acute or early infection. Dropout reasons include loss to follow-up, death, or antiretroviral treatment initiation. Here, subjects often demonstrate an initial increase in CD4 T cell count with the onset of immune response, generally followed by a decline over time. For this study, methods accounting for dropout that assume a linear decline in CD4 count are inadequate. We consider a multi-dimensional, semi-parametric, varying-coefficient mixture model method. The approach extends existing methods to allow coefficients to vary smoothly across the pairwise interactions between time, baseline CD4 count, and/or dropout time. Additional covariates can be included, and the dropout mechanism may differ by subject group. Group-specific marginal estimates of the CD4 trajectory are obtained by averaging over the dropout-specific curves.


Abstract (85)
Paper: Heritability estimation of an ordinal trait: lessons from simulations and from analyzing osteoarthritis in pig-tailed macaques
Author: Peter Chi

University of Washington, [email protected]

Co-Authors: Patricia Kramer; Andrea Duncan; Vladimir Minin
Abstract: We examine heritability estimation of an ordinal trait for osteoarthritis, using a population of pig-tailed macaques from the Washington National Primate Research Center (WaNPRC). Modeling considerations were non-trivial, as the data were ordinal measurements from 16 intervertebral spaces along each macaque's spine, with many missing values. Although the data could be treated as continuous, we show that standard assumptions result in severely biased estimates under simulation. We thus utilize the threshold model, as first proposed by Sewall Wright (1934) and recently implemented for heritability estimation as a Bayesian routine in the MCMCglmm R package (Hadfield 2010). We find that our WaNPRC study sample does not provide enough information for heritability estimation, as evidenced by the resulting posterior distributions of heritability. We thus explore sample size requirements under our data's pedigree structure and other hypothetical structures.
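The threshold model mentioned above is easy to sketch by simulation: an ordinal trait arises by cutting a latent Gaussian liability (additive genetic plus environmental) at fixed thresholds. The code below is a toy illustration with made-up parameter values, not the MCMCglmm routine itself:

```python
import numpy as np

def simulate_ordinal_trait(n, h2, thresholds, seed=0):
    """Threshold model: an ordinal trait is obtained by cutting a latent
    unit-variance Gaussian liability at fixed thresholds; h2 is the
    liability-scale heritability."""
    rng = np.random.default_rng(seed)
    g = rng.normal(0.0, np.sqrt(h2), n)        # additive genetic value
    e = rng.normal(0.0, np.sqrt(1 - h2), n)    # environmental deviation
    liability = g + e
    return np.digitize(liability, thresholds), liability

cats, liab = simulate_ordinal_trait(50_000, h2=0.4,
                                    thresholds=[-0.5, 0.5, 1.5])
```

Estimation then works backwards: infer the latent liabilities and variance components from the observed categories, which is where the Bayesian machinery comes in.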


Abstract (86)
Paper: Comparing and exploring a conservative approach to handle overrun based on maximum likelihood ordering
Author: Timothy Skalland

Oregon State University, [email protected]

Co-Authors:
Abstract: A common occurrence in sequential clinical trials is the accrual of data after a stopping region has been reached. The data acquired after a stopping boundary has been passed are called overrun, and several methods have been examined for dealing with overrun (Whitehead 1992, 1997; Hall & Ding 2001; Hall & Liu 2002; Sooriyarachchi et al. 2003). The deletion method introduced by Whitehead essentially removes the kth interim analysis, where the stopping boundary was reached, and uses the overrun as the next analysis with the same error allocation as the kth interim analysis. The method of combining p-values with random weights introduced by Hall & Ding weights the p-values from the kth stopping analysis and the overrun data using the values of Fisher's information at the trial's termination and in the overrun. Both methods were explored by Sooriyarachchi et al. using an analysis-time ordering of the outcome space, which orders observations based on the observed statistic (i.e., the sample mean) and the time of the analysis. Another approach is to order the outcome space by the sample mean (maximum likelihood ordering), which orders observations solely by the observed statistic. In this paper, I use simulations to compare the deletion and combining-p-values methods under these two orderings of the outcome space. Furthermore, I will explore a conservative approach to dealing with overrun using the sample mean ordering and explore the power of this approach.


Abstract (87)
Paper: Analytical strategies for high dimensional immunomics data
Author: Ann Oberg

Mayo Clinic, [email protected]

Co-Authors:
Abstract: Vaccines are an effective tool of modern medicine for disease prevention and have successfully reduced disease burden worldwide. Despite this effectiveness, some people fail to achieve an immune response following vaccination, for reasons we do not completely understand. Our group is funded to develop systems-level immune profiles that explain the key drivers of immune response to influenza vaccination. High-dimensional data (mRNA-Seq, methylation, and protein 'profiles') are being collected at multiple time points. We proposed a two-pronged analysis approach in order to maximize power and minimize false discoveries. I will discuss our experiences to date with mRNA-Seq data, some of the analytical challenges in this research, and ideas for solutions, including strategies for normalization and variable reduction.


Abstract (88)
Paper: Local Clustering – Biostatistics and Bioinformatics Applications
Author: Juhee Lee

UT MD Anderson Cancer Center, [email protected]

Co-Authors: Peter Mueller; Yuan Ji
Abstract: We discuss several applications of an approach to model-based local clustering (NoB-LoC). The model formalizes the notion that biological samples cluster differently with respect to different processes, and that each process might be related to only a subset of samples. We briefly review the model and then discuss two important applications and possible extensions. The first application is inference for RPPA data that record protein expression levels of samples from breast cancer patients. The inference of interest is to identify meaningful and comprehensive profiling of samples for possibly more accurate disease prognosis. The second application is protein expression data of cells in human bone marrow from a novel experimental platform of mass cytometry. The investigators expect that subsets of markers define different sets of cell types. The grouping of cells by protein markers requires the more general clustering framework that we develop. The examples also highlight a specific limitation of the NoB-LoC model. The model assumes that each protein (marker) is in at most one subset of proteins. However, consider for example protein sets that correspond to different pathways: several proteins are part of multiple pathways. Similar issues arise in the two discussed applications. In the model, this requires that the definition of protein subsets allow for overlapping subsets rather than a strict partition. We briefly discuss an alternative probability model that allows such inference.


Abstract (89)
Paper: Mixture models vs. supervised learning for integrative genomic analysis
Author: Daniel Dvorkin

University of Colorado Denver — Anschutz Medical Campus, [email protected]

Co-Authors: Katerina Kechris
Abstract: Gene classification using multiple data sources, such as expression and transcription factor binding, is an important problem in bioinformatics. Given high-quality training data, supervised machine learning methods such as logistic regression, naive Bayes classifiers, and random forests are generally more powerful than unsupervised methods. However, we show that when training data sets are small or contain labeling errors, unsupervised or semi-supervised mixture models are more robust. We use both simple mixture models for single data sources and graphical mixture models to integrate multiple data sources. Our methods are sufficiently flexible to model the joint distributions of heterogeneous data from a variety of biological sources and marginal probability distributions. We compare mixture model methods to fully supervised machine learning methods, and discuss the circumstances under which the former may be expected to outperform the latter. Identification of critical genes in Drosophila development is used as an example of a case in which training data may not be of sufficient size, quality, or completeness to justify the use of fully supervised methods.


Abstract (90)
Paper: Calibration of Computer Models for Large Multivariate Spatial Data with Climate Applications
Author: K. Sham Bhat

Los Alamos National Laboratory, [email protected]

Co-Authors:
Abstract: Characterizing uncertainty in climate predictions and models is crucial for climate decision-making. One source of uncertainty is the inability to resolve complex physical processes, which are approximated using parameterizations. We characterize parameter uncertainty by calibrating the computer model, combining model output with physical observations of the climate process to infer these parameters. Both model output and observations are often large multivariate spatial (or space-time) fields. We use Bayesian methods to infer these parameters while incorporating space-time dependence and other sources of uncertainty. The climate model is approximated using a fast surrogate, a flexible Gaussian process emulator which may include nonlinear relationships among spatial fields and globally nonstationary and nonseparable covariance functions. Model discrepancy and observation error are incorporated in our approach, resulting in improved inference and characterization of uncertainty. Dimension reduction techniques using kernel mixing and low-rank matrix identities are utilized for computational tractability. Our approaches are applied to complex ocean models.


Abstract (91)
Paper: Incorporating Partial Phase Information into Inference of Coancestry
Author: Chris Glazner

University of Washington, [email protected]

Co-Authors: Elizabeth Thompson
Abstract: Data used in genetic analyses typically come as genotypes with unknown phase, but knowledge of phase can greatly improve the performance of inference methods for detection of genome segments of shared coancestry (identity by descent, or IBD). New sequencing technologies and improved statistical phasing mean that more datasets now come with some phase information. Single-molecule sequencing and statistical methods based on linkage disequilibrium allow precise phasing over short distances but cannot phase distant variants. Methods based on families or founder populations can use inferred inheritance to assign widely spaced variants to the same haplotype, but not all loci can be phased this way. If data are modelled only as unphased or completely phased, partial phase information must be thrown away or treated as more complete than it really is. Allowing intermediate possibilities makes use of all available information while respecting the limitations of the phasing method. In our hidden Markov model for the inference of genome segments shared IBD, the latent state is the locus-specific IBD pattern among the chromosomes of the individuals. If complete haplotypic data are available, the emission probabilities for the data are straightforward. Where there is no phase information, the probabilities of genotypic emissions are given by assuming a uniform prior over marker phasings. Other prior distributions can express more complex phase information. Two forms of incomplete phasing are easily incorporated into the HMM, corresponding to the short- and long-range phasing referred to above. This research was supported in part by NIH grants R37 GM046255 and T32 GM081062.
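As a toy illustration of the "uniform prior over marker phasings" idea, here is a hypothetical single-marker emission probability (my simplification, not the authors' full HMM): the two latent haplotypes carry allele 1 with different probabilities, and when phase is unknown the two orderings of a genotype are averaged.

```python
def marker_emission(obs, p1, p2, phase_known):
    """P(observed allele pair | the two latent haplotypes carry allele 1
    with probabilities p1 and p2). obs = (a, b) with a, b in {0, 1}.
    With known phase, a sits on haplotype 1 and b on haplotype 2; with
    unknown phase, the two orderings are averaged (uniform phase prior)."""
    def p_allele(p, x):
        return p if x == 1 else 1.0 - p
    a, b = obs
    ordered = p_allele(p1, a) * p_allele(p2, b)
    if phase_known:
        return ordered
    swapped = p_allele(p1, b) * p_allele(p2, a)
    return 0.5 * (ordered + swapped)

e_phased = marker_emission((1, 0), 0.9, 0.1, phase_known=True)
e_unphased = marker_emission((1, 0), 0.9, 0.1, phase_known=False)
```

Phase information matters exactly when the haplotype-specific allele probabilities differ, as they do under IBD sharing; here the phased emission is 0.81 versus 0.41 unphased.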


Abstract (92)
Paper: Variability in annual temperature profiles: A multivariate spatial analysis of regional climate model output
Author: Tamara Greasby

National Center for Atmospheric Research, [email protected]

Co-Authors: Steve Sain
Abstract: Annual profiles of average daily temperature are one way to describe climate and climate change. Understanding how these profiles vary across the members of a climate ensemble can provide insight to both impacts researchers and climate scientists. This talk will discuss a statistical approach based on a Bayesian hierarchical technique to simultaneously represent how these temperature profiles change over time, how they vary spatially, and how they vary between different climate models. This approach will be demonstrated using the ensemble from the North American Regional Climate Change Assessment Program (NARCCAP) and will focus on changes in seasonality and the length of the hot season, in addition to other measures of interest to health impacts researchers, such as growing degree days.


Abstract (93)
Paper: Diagnostic Methods on Multiple Diagnostic Tests Without a Gold Standard
Author: Jingyang Zhang

Fred Hutchinson Cancer Research Center, [email protected]

Co-Authors: Ying Zhang; Kathryn Chaloner; Jack T. Stapleton
Abstract: When no gold standard is available, it is common to reconcile information from multiple imperfect diagnostic tests in order to obtain greater accuracy. In this paper, we generalize the linear discriminant method and the optimal risk score method to accommodate the situation in which a gold standard is lacking. We also study an alternative sequential diagnostic method which does not require all tests to be applied to each subject. All the methods are developed under parametric distributional assumptions. A mixture of two multivariate normal distributions is used to fit the unclassified data, and the optimal diagnostic rule for each method is derived based on the fitted model. We provide the numerical implementation of all methods. Asymptotic results for statistical inference about the methods are also given. Simulation studies are carried out to compare the methods, and an illustration with a real-life data set is included.
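When the two component distributions share a common covariance, the optimal rule reduces to the classical linear discriminant; a generic sketch with made-up parameters (the paper's method additionally fits the mixture to unclassified data, which is not shown here):

```python
import numpy as np

def lda_rule(mu0, mu1, Sigma, prior1=0.5):
    """Linear discriminant from a two-component normal mixture with a
    common covariance: classify as class 1 when the log-likelihood ratio
    exceeds the prior log-odds threshold."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu0)                           # discriminant direction
    c = 0.5 * (mu0 + mu1) @ w + np.log((1 - prior1) / prior1)
    return lambda x: (np.atleast_2d(x) @ w > c).astype(int)

# toy parameters: class 0 centered at the origin, class 1 at (2, 2)
rule = lda_rule(np.zeros(2), np.array([2.0, 2.0]), np.eye(2))
labels = rule(np.array([[0.0, 0.0], [2.0, 2.0]]))   # -> [0, 1]
```

When the covariances differ, the same log-likelihood-ratio construction yields a quadratic rather than linear boundary.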


Abstract (94)
Paper: High throughput epitope mapping via Bayesian hierarchical modeling of peptide tiling arrays
Author: Gregory Imholte

University of Washington - Seattle, [email protected]

Co-Authors: Raphael Gottardo
Abstract: Antibodies play a key role in the immune system by preventing and controlling infection. Antibody type and binding location provide key information both for understanding natural infection and for deriving effective vaccines. Recently, peptide microarrays tiling immunogenic regions of pathogens (e.g., envelope proteins of a virus) have become an important high-throughput tool for querying and mapping antibody binding. The peptide microarray simultaneously screens thousands of peptides against sample serum, determining the presence of antibodies binding to specific peptides on the array. Because of the many steps involved in the experimental process, from hybridization to image analysis, peptide microarray data often contain outliers and censored values. For example, an outlying data value could arise due to scratches or dust on the surface, imperfections in the glass, or imperfections in array production. Censored values occur when peptide fluorescence is above the scanner detection limit. We develop a robust Bayesian hierarchical model to test for antibody binding. Errors are modeled explicitly using a t-distribution, which accounts for outliers. Bayesian data augmentation accounts for censored values. The model includes an exchangeable prior for the variances, which allows different variances for the peptides but still shrinks extreme empirical variances. Due to between-subject variability of immune systems, subjects may produce different antibody profiles in response to an identical vaccine stimulus or infection. Our model directly accounts for this by allowing each peptide to be "bound/unbound" for each individual, while sharing information across individuals when a peptide is believed to be bound by an antibody. A Metropolis-within-Gibbs sampling scheme estimates the posterior distribution. We apply our model to several vaccine trial datasets to demonstrate model performance compared to standard analysis tools for such data.


Abstract (95)
Paper: Bayesian Elastic-Net and Fused Lasso for Semiparametric Structural Equation Models
Author: Zhenyu Wang

University of Missouri, Columbia, [email protected]

Co-Authors: Adam Lane; Na Hu; Sounak Chakraborty
Abstract: Structural equation models are well-developed statistical tools for multivariate data that have latent variables. Recently, much attention has been given to developing structural equation models that account for nonlinear relationships between the endogenous latent variables, the covariates, and the exogenous latent variables. Guo et al. (2012) developed a semiparametric structural equation model in which the nonlinear functional relationships are approximated using basis expansions and analyzed using a Bayesian Lasso. In this paper we consider semiparametric structural equation models using cubic splines as the basis expansion. Cubic splines are known to induce correlations. Bayesian fused Lasso and Bayesian elastic-net priors are used to account for correlations in both the covariates and the basis expansions. We illustrate the usefulness of our proposed methods through two simulation studies. In both simulation studies, the semiparametric structural equation models based on Bayesian fused Lasso and Bayesian elastic-net priors outperform the Bayesian Lasso-based model (Guo et al., 2012).


Abstract (96)
Paper: Two penalized likelihood parameter estimation approaches for a multivariate stochastic differential equations system with partially observed discrete sparse data
Author: Libo Sun

Colorado State University, [email protected]

Co-Authors: Chihoon Lee; Jennifer Hoeting
Abstract: In this paper, we consider the problem of estimating parameters of a multivariate stochastic differential equations system. It is common, particularly for ecological problems, to have partially observed, discrete sparse data. We propose two penalized likelihood parameter estimation approaches for this situation. The penalty term controls the randomness introduced by simulation of unobserved variates. In the first method, a penalty term is added to the simulated likelihood and an Euler-Maruyama approximation is used to approximate the stochastic differential equations. In our other approach, we also apply a penalty term to the likelihood but use Monte Carlo Expectation-Maximization for estimation. We apply the methods to a real epidemic of chronic wasting disease in mule deer, and a simulation study illustrates the effectiveness and accuracy of our methods.
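The Euler-Maruyama approximation referred to above discretizes dX_t = a(X_t)dt + b(X_t)dW_t on a time grid; a generic sketch on an Ornstein-Uhlenbeck toy example (illustrative only, not the authors' epidemic model):

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, t_grid, seed=0):
    """Euler-Maruyama path of dX_t = drift(X)dt + diffusion(X)dW_t:
    each step adds drift*dt plus a sqrt(dt)-scaled Gaussian shock."""
    rng = np.random.default_rng(seed)
    x = np.empty(len(t_grid))
    x[0] = x0
    for i in range(1, len(t_grid)):
        dt = t_grid[i] - t_grid[i - 1]
        x[i] = (x[i - 1] + drift(x[i - 1]) * dt
                + diffusion(x[i - 1]) * np.sqrt(dt) * rng.standard_normal())
    return x

# Ornstein-Uhlenbeck process mean-reverting to 1.0
path = euler_maruyama(lambda x: 2.0 * (1.0 - x), lambda x: 0.3,
                      x0=5.0, t_grid=np.linspace(0.0, 5.0, 1001))
```

For partially observed systems, such simulated paths of the unobserved coordinates are what the penalized likelihood in the abstract has to average over.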


Abstract (97)
Paper: An Evaluation of Different Approaches to Signature Identification in the Adaptive Signature Design
Author: Mary Redman

Fred Hutchinson Cancer Research Center, [email protected]

Co-Authors:
Abstract: As our understanding of the heterogeneity of cancers increases and agents targeting these subgroups are developed, target populations will become smaller. Methods developed in the causal inference literature have great applicability in this setting. In the setting of randomized studies, approaches such as the augmented estimating equation approaches proposed by various authors (e.g., Robins, Tsiatis, and van der Laan) provide a way to incorporate baseline or prognostic information while still modeling marginal associations between treatments (either overall or within subgroups), which can achieve efficiency gains over standard approaches. In this talk we evaluate the use of these approaches in the context of the biomarker adaptive signature design proposed by Freidlin and Simon (CCR 2005), with the aim to both discover and validate a biomarker signature within a single phase 3 study.


Abstract (99)
Paper: Examining Brain's Default Mode Network differences in Male Youths with Conduct and Substance Problems and Controls Using Group Independent Component Analysis of functional MRI data
Author: Manish Dalwani

University of Colorado Anschutz Medical Campus, [email protected]

Co-Authors: Jason R. Tregellas; Jessica R. Andrews-Hanna; Joseph T. Sakai; Marie T. Banich; Thomas J. Crowley
Abstract: Default mode network (DMN) activity is an important neuromarker and one of several resting-state networks in the brain. The DMN can be obtained from resting-state or active functional magnetic resonance imaging (fMRI) paradigms. Examining the DMN using the most popular standard hemodynamic-based general linear model (GLM) has limitations. The GLM takes a univariate approach in which a temporal model is specified and a statistical parameter is computed at each brain voxel individually. Model-based approaches are effective when the time course of the hemodynamic response can be inferred a priori; the specificity of the GLM for brain regions of interest therefore depends on the accuracy of the formulated hypothesis. The univariate approach of the GLM may also cause voxels with slightly different temporal behavior to appear co-activated, leading to a loss of sensitivity. Deconvolving DMN activity from other brain network activations, especially in a busy fMRI paradigm, is challenging with the GLM. Independent component analysis (ICA) is a multivariate, data-driven approach that is independent of any reference paradigm or hypothesis. Using spatial or temporal ICA, the brain activation pattern can be separated into meaningful independent components (including the DMN) that may provide useful information about co-activation in spatially distinct areas of the brain. We used a spatial group ICA technique to examine DMN activity in patients (boys aged 14-18 with conduct and substance problems (CSP)) and controls (boys aged 14-18 with no CSP) as they engaged in the risk-taking decision paradigm 'Colorado Balloon Game'. The novelty of this experiment is the application of ICA to a rapid-event fMRI paradigm. Using group ICA, we were able to obtain DMN activity for every subject from the active fMRI task. Comparisons of DMN activity between groups showed that controls have more activity in brain regions associated with self-referential evaluation and assessment during an active fMRI paradigm.


Abstract (100)
Paper: Bayesian inference for the finite population total from a heteroscedastic probability proportional to size sample
Author: Sahar Zangeneh

University of Michigan, [email protected]

Co-Authors: Roderick Little
Abstract: We study Bayesian inference for the population total in probability-proportional-to-size (PPS) sampling. The sizes of non-sampled units are not required for the usual Horvitz-Thompson or Hajek estimates, and this information is rarely included in public-use data files. Zheng and Little (2003) showed that including the non-sampled sizes as predictors in a spline model can result in improved point estimates of the finite population total. In Little and Zheng (2007), the spline model is combined with a Bayesian bootstrap (BB) model for the sizes, for point estimation when the sizes are only known for the sampled units. We further develop their methods by (a) including an unknown parameter to model heteroscedastic error variance in the spline model, an important modeling feature in the PPS setting; and (b) developing an improved Bayesian method for including summary information about the aggregate size of non-sampled units. Simulation studies suggest that the resulting Bayesian method, which includes information on the number and total size of the non-sampled units, recovers most of the information in the individual sizes of the non-sampled units and provides significant gains over the traditional Horvitz-Thompson estimator. The method is applied to two data sets from the US Census Bureau.
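For reference, the Horvitz-Thompson estimator mentioned above weights each sampled value by the inverse of its inclusion probability (a textbook sketch with toy numbers):

```python
import numpy as np

def horvitz_thompson(y_sample, pi_sample):
    """Horvitz-Thompson estimator of a population total: the sum of
    sampled values, each weighted by 1 / (its inclusion probability)."""
    return float(np.sum(np.asarray(y_sample) / np.asarray(pi_sample)))

# if every unit is sampled with certainty, the estimate is the plain total
t_hat = horvitz_thompson([3.0, 5.0, 2.0], [1.0, 1.0, 1.0])   # -> 10.0
```

The estimator is design-unbiased but uses only the sampled sizes, which is exactly the information gap the Bayesian spline approach above tries to close.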


Abstract (101)
Paper: A Sensitivity Analysis for Clinical Trials with Informatively Censored Survival Endpoints
Author: Eric Meier

University of Washington, [email protected]

Co-Authors: Scott Emerson
Abstract: Analyses of clinical trials with time-to-event endpoints typically employ the assumption of noninformative censoring. While this assumption is usually appropriate for end-of-study censoring, its applicability to lost-to-follow-up censoring is often suspect and may result in biased estimates of the treatment effect. To assess the robustness of estimates to departures from noninformative censoring, authors have proposed sensitivity analyses that assume a semiparametric model for the censoring mechanism, with the parameters representing associations between censoring and increased or decreased rates of survival. The parameters are varied over a plausible range, resulting in a corresponding range of estimates for the treatment effect. We consider such an approach for two-arm trials in which the sensitivity parameters represent hazard ratios comparing subjects who have been lost to follow-up to all other subjects. Using hypothesized hazard ratios for each arm separately, we multiply impute the unobserved data as it might have been observed in the absence of informative censoring. The treatment effect estimates computed using the imputed data are then summarized in a graphical display. Of particular interest in this research is the robustness of our approach to violations of the proportional hazards assumptions used when imputing the missing data. On the basis of extensive simulation studies, we find that the accuracy of the sensitivity analyses is relatively unaffected by departures from the semiparametric assumptions.
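The imputation step can be sketched under a simplifying exponential assumption (mine, not the authors'): after loss to follow-up, a subject's hazard is taken to be hr times a baseline rate, and the memoryless property lets us add an exponential residual lifetime. Setting hr = 1 recovers noninformative censoring.

```python
import numpy as np

def impute_censored(times, censored, base_rate, hr, rng):
    """Impute event times for lost-to-follow-up subjects under an
    exponential model: beyond the censoring time, the hazard is
    hr * base_rate, where hr is the sensitivity parameter."""
    out = np.asarray(times, dtype=float).copy()
    c = np.asarray(censored, dtype=bool)
    out[c] += rng.exponential(1.0 / (hr * base_rate), c.sum())
    return out

rng = np.random.default_rng(0)
t = impute_censored([1.0, 2.0, 3.0], [False, True, False],
                    base_rate=0.5, hr=2.0, rng=rng)
```

Repeating the draw many times per hr value gives the multiple imputations whose treatment-effect estimates are summarized graphically in the abstract.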


Abstract (102)
Paper: Stochastic Models of US Population
Author: Asma Tahir

Colorado State University, [email protected]

Co-Authors: M. M. Siddiqui
Abstract: For demographic and planning purposes, estimates of the future US population are needed. According to the structure of the population, different trends such as polynomial and exponential growth models are discussed for a macro-level analysis, which gives long-term results. However, to be more realistic, a micro-level stochastic bisexual birth-and-death-process growth model is developed using the most recent information. The birth-and-death-process model gives better results than the trend models studied, based on a comparison of the variances of the projected values, and is therefore preferred for a population projection up to the year 2020.
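A micro-level birth-and-death growth model of this kind can be simulated exactly with the Gillespie algorithm; below is a minimal single-sex sketch with made-up per-capita rates (the paper's model is bisexual and fit to census data, which is not attempted here):

```python
import numpy as np

def birth_death(n0, birth, death, t_end, seed=0):
    """Gillespie simulation of a linear birth-death process: waiting
    times are exponential with total rate n*(birth + death), and each
    event is a birth with probability birth / (birth + death)."""
    rng = np.random.default_rng(seed)
    t, n = 0.0, n0
    while t < t_end and n > 0:
        rate = n * (birth + death)
        t += rng.exponential(1.0 / rate)
        if t >= t_end:
            break
        n += 1 if rng.random() < birth / (birth + death) else -1
    return n

n_final = birth_death(1000, birth=0.02, death=0.01, t_end=10.0)
```

Repeated runs give an empirical distribution of the projected population size, whose variance is the quantity compared against the trend models in the abstract.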


Abstract (103)
Paper: Boosting for detection of gene-environment interactions
Author: Hristina Pashova

University of Washington, [email protected]

Co-Authors: Charles Kooperberg; Michael LeBlanc
Abstract: In genetic association studies, it is typically thought that genetic variants and environmental variables jointly will explain more of the inheritance of a phenotype than either of these two components separately. Traditional methods to identify gene-environment interactions typically consider only one measured environmental variable at a time. However, in practice, multiple environmental factors may each be imprecise surrogates for the underlying physiological process that actually interacts with the genetic factors. In this paper we develop a variant of L2 boosting that is specifically designed to identify combinations of environmental variables that jointly modify the effect of a gene on a phenotype. Because the effect modifiers might have a small signal compared to the main effects, working in a space that is orthogonal to the main predictors allows us to focus on the interaction space. In a simulation study investigating some plausible underlying model assumptions, our method outperforms the lasso and AIC and BIC model selection procedures, achieving the lowest test error. In an example from the WHI-PAGE study, the dedicated boosting method was able to pick out two single nucleotide polymorphisms for which effect modification appears present. The performance was evaluated on an independent test set, and the results are promising.
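For readers unfamiliar with L2 boosting, here is a generic componentwise sketch (not the authors' variant, which works in the space orthogonal to the main effects): each step fits the single predictor that best explains the current residual and adds a shrunken copy of that fit.

```python
import numpy as np

def l2_boost(X, y, steps=200, nu=0.1):
    """Componentwise L2 boosting: at each step, choose the column of X
    whose least-squares fit to the current residual reduces SSE the most,
    then update the coefficient by a shrinkage factor nu."""
    n, p = X.shape
    coef = np.zeros(p)
    resid = y.astype(float).copy()
    for _ in range(steps):
        b = X.T @ resid / (X**2).sum(axis=0)          # per-column LS slopes
        sse = ((resid[:, None] - X * b)**2).sum(axis=0)
        j = int(np.argmin(sse))                        # best single predictor
        coef[j] += nu * b[j]
        resid -= nu * b[j] * X[:, j]
    return coef

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = 3.0 * X[:, 2] + 0.1 * rng.standard_normal(200)    # only column 2 matters
coef = l2_boost(X, y)
```

The small step size nu is what gives boosting its implicit regularization; stopping early plays the role of the lasso's tuning parameter.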


Abstract (104)
Paper: Alternatives to Penalization for Sparse Models
Author: Sarah Emerson

Oregon State University, [email protected]

Co-Authors:
Abstract: Penalized methods, such as the lasso, adaptive lasso, and L2 shrinkage, are employed in a wide variety of high-dimensional problems, including regression modeling, covariance matrix estimation and decomposition, and variable selection and clustering. These methods are frequently applied in the analysis of genomic data, or more generally in any setting where a large number of predictors are available, with the goal of identifying or discriminating between phenotypes or subpopulations. In some of these settings, particularly where sparsity is desired, it is not clear that the penalization approach or the chosen penalties are an efficient or optimal solution to the problem. While minimizing the sum of absolute values of a collection of parameters, as is done by the lasso (L1) penalty, does produce a sparse solution for most problems, it does not necessarily produce the best sparse solution, and it involves the inconvenient choice of the tuning parameter value required to obtain a desired level of sparsity. We explore computationally simpler, faster, and more direct approaches to obtaining sparse matrix decompositions and variable selection for clustering, and demonstrate that the resulting solutions are generally superior to the lasso (L1) penalty approach in the sense that, for a given degree of sparsity, our solutions retain/recover a higher proportion of the signal present. Furthermore, the proposed approach makes it easier to obtain a solution with a desired sparsity.
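The contrast between penalization and a direct sparse solution is easiest to see in the orthonormal case, where the lasso reduces to soft thresholding while direct selection is hard thresholding at a chosen sparsity level (toy numbers below; this is a standard illustration, not the paper's method):

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso solution under an orthonormal design: shrink toward zero,
    zeroing entries smaller than lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def hard_threshold_topk(z, k):
    """Direct sparsification: keep the k largest-magnitude entries
    unshrunk, at an exactly specified sparsity level."""
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-k:]
    out[idx] = z[idx]
    return out

z = np.array([5.0, -4.0, 0.5, 0.3, -0.2])
soft = soft_threshold(z, lam=1.0)    # 2 nonzeros, but both shrunk by 1
hard = hard_threshold_topk(z, k=2)   # 2 nonzeros at full magnitude
```

At matched sparsity, the hard-thresholded solution retains the full magnitude of the surviving signal, which is the flavor of the "higher proportion of the signal" claim above; it also takes the desired sparsity k directly rather than indirectly through a tuning parameter.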

Abstract (105)
Paper: Timing Chromosomal Abnormalities using Mutation Data
Author: Elizabeth Purdom

UC, Berkeley

Co-Authors: Christine Ho
Abstract: Tumors accumulate large numbers of mutations and other chromosomal abnormalities due to the breakdown in genomic repair mechanisms that is a hallmark of tumors. However, not all of these abnormalities are believed to be crucial for tumor growth and progression, and it is a question of great importance to identify the critical abnormalities, particularly as possible targets for drug treatment. One important indicator of an abnormality's importance is when it occurred relative to other abnormalities. Outside of animal models, we generally will not have tumors from multiple time points in the progression of the tumor, but only from the time point at which the tumor was removed; therefore we cannot directly observe the temporal ordering of genomic abnormalities. However, the distribution of allele frequencies within regions with copy number aberrations provides information about when the chromosomal abnormality occurred relative to other abnormalities in the tumor. Using sequencing data, we develop a probabilistic model for the observed allele frequency of a mutation (defined as the proportion of the reads covering the nucleotide position that contain the mutation) that allows us to order abnormalities within a tumor. Our method gives novel insight into the biology of tumor progression through a quantitative evaluation of the temporal ordering of chromosomal abnormalities. Moreover, it gives a quantitative measure to compare across samples for highlighting driver mutations and events. This is joint work with Christine Ho, Haiyan Huang, Steffen Durinck, Paul T. Spellman, and Raymond J. Cho.
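
The allele-frequency definition above lends itself to a small numerical illustration. The binomial read-count model and the copy-number scenario below are simplified stand-ins, not the authors' probabilistic model; all counts are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)
depth = 200                  # sequencing reads covering each mutated position

# Toy scenario: in a region duplicated to three copies, a mutation that
# occurred BEFORE the duplication sits on two of the three copies
# (expected allele frequency 2/3), while one that occurred AFTER sits on
# a single copy (expected allele frequency 1/3).
before = rng.binomial(depth, 2/3, size=40) / depth
after = rng.binomial(depth, 1/3, size=40) / depth

# The observed allele-frequency distributions separate the two temporal
# classes, which is the information the model exploits to order events
print(round(before.mean(), 2), round(after.mean(), 2))
```

Dividing mutant read counts by total depth is exactly the "observed allele frequency" of the abstract; the separation of the two frequency clusters is what makes timing inference possible from a single sequenced sample.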

Abstract (106)
Paper: Subgroup Methods Designed to Evaluate Heterogeneity of Treatment Effects: Enhancing the Interpretation of Study Results for Making Individual Patient Decisions
Author: Ann Lazar

University of California, San Francisco

Co-Authors:
Abstract: Comparative studies, in particular clinical trials, are often designed to evaluate the overall treatment effect, or average treatment effect, for the typical study patient. The results from a study may indicate an overall positive effect (or negative effect) of therapy on average. Such results may imply that all future patients will be treated with the new therapy (or with standard therapy, for a negative study), even though the truth may be that the new therapy is more effective (or less effective, or harmful) for certain subgroups of patients, where the subgroups are defined according to patient characteristics. Understanding such heterogeneity would allow researchers to tailor therapies according to the characteristics of an individual. In this talk, we will discuss some of the recent developments in subgroup analysis, which explores whether there is evidence that the treatment effect depends on certain patient characteristics, including the Subpopulation Treatment Effect Pattern Plot (STEPP), Multivariable Fractional Polynomial Interactions (MFPI), and the Johnson-Neyman technique. Examples will be provided to illustrate how subgroup analysis may enhance the interpretation of study results for making individual patient decisions.

Abstract (107)
Paper: A spatial model for large geophysical datasets
Author: Stephan Sain

[email protected]

Co-Authors:
Abstract: Many research problems in the geosciences involve the analysis of large spatial and spatial-temporal fields. For example, the output from the regional climate models associated with the North American Regional Climate Change Assessment Program (NARCCAP) involves more than ten thousand grid boxes, and more recent higher-resolution climate model experiments involve upwards of a hundred thousand grid boxes. Our approach involves representing a spatial field through an expansion of basis functions with a spatial prior distribution on the coefficients. Computational savings come from the sparsity implied by finite-support basis functions and from the choice of prior. The methodology will be illustrated through an analysis of the sources of variation in the NARCCAP ensemble and other regional climate experiments.
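
A minimal sketch of the basis-expansion idea, using made-up tent-shaped basis functions on a 1-D domain (the actual work uses spatial basis functions and a spatial prior on the coefficients). The point is that finite-support bases make the basis matrix mostly zeros, which is the source of the computational savings.

```python
import numpy as np

rng = np.random.default_rng(3)
s = np.linspace(0, 1, 2000)              # observation locations on a 1-D domain
knots = np.linspace(0, 1, 60)            # basis-function centers
width = 0.05                             # support radius of each basis function

# Compactly supported "tent" basis: zero beyond `width` of each knot,
# so each row of the basis matrix has only a handful of nonzero entries
B = np.maximum(1 - np.abs(s[:, None] - knots[None, :]) / width, 0.0)
print("nonzero fraction:", np.mean(B > 0))

# Field = basis expansion; iid coefficients stand in for the spatial prior
coef = rng.normal(size=knots.size)
field = B @ coef
```

With a Gaussian prior on the coefficients, the implied field covariance is B @ Sigma @ B.T, and sparse B (together with a sparse prior precision) is what keeps the computations feasible at tens of thousands of grid boxes.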

Abstract (108)
Paper: False Discovery Control in Large-Scale Spatial Multiple Testing
Author: Brian Reich

North Carolina State University

Co-Authors: Wen Sun; Michele Guindani; Armin Schwartzman; Tony Cai
Abstract: In this talk, we develop a unified theoretical and computational framework for false discovery control in multiple testing of spatial signals. We consider both point-wise and cluster-wise spatial analyses, and derive oracle procedures which optimally control the false discovery rate, false discovery exceedance, and false cluster rate, respectively. The implementation of the oracle procedures is very challenging in practice, especially on a continuous spatial domain. Hence we develop a class of data-driven procedures, based on a finite approximation strategy, to mimic the oracle procedures. Our data-driven procedures are asymptotically valid and can be effectively implemented using Bayesian computational algorithms for the analysis of large spatial data sets. In particular, we discuss how to summarize the fitted spatial models using posterior samples to address related large-scale testing problems. Numerical results show that our data-driven procedures lead to more accurate error control and enhanced power than conventional methods. The proposed methods are demonstrated on an analysis of time trends in tropospheric ozone in the eastern US.
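
For context, a conventional non-spatial baseline of the kind such procedures are measured against is the classical Benjamini-Hochberg step-up rule for FDR control; the abstract does not name it, and the simulated p-values below are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated p-values: 900 nulls (uniform) plus 100 signals (piled near 0)
p = np.concatenate([rng.uniform(size=900), rng.beta(0.1, 10, size=100)])

def benjamini_hochberg(p, q=0.05):
    """Reject the k smallest p-values, where k is the largest index with
    p_(k) <= q * k / m (the classical step-up FDR procedure)."""
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

reject = benjamini_hochberg(p)
print("rejections:", reject.sum())
```

This rule treats every test as exchangeable; the point of the spatial framework in the talk is precisely that neighboring locations are not exchangeable, so borrowing spatial structure can improve both error control and power.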

Abstract (109)
Paper: Multivariate multilevel latent Gaussian process model to evaluate wetland condition
Author: Jennifer Hoeting

Colorado State University

Co-Authors: Erin M. Schliep
Abstract: We propose a Bayesian model for mixed ordinal and continuous multivariate data to evaluate a latent spatial Gaussian process. Our proposed model can be used in many contexts where mixed continuous and discrete multivariate responses are observed in an effort to quantify an unobservable continuous measurement, which we call the latent, or unobserved, wetland condition. While the predicted values of the latent wetland condition, or health, variable at each location produced by the model do not hold any intrinsic value, the ranks of the wetland condition values are of interest. In addition, by including point-referenced covariates in the model, we are able to make predictions at new locations for both the latent random variable and the multivariate response. Lastly, the model produces ranks of the multivariate responses in relation to the unobserved latent random field. This is an important result, as it allows us to determine which variables of the multivariate response are relevant to understanding the latent variable. This offers an alternative to traditional indexes based on best professional judgement that are frequently used in ecology. We apply our model to create a profile of wetland condition in the North Platte and Rio Grande River Basins in Colorado. The model facilitates three types of inference that are important for evaluating wetland condition: it provides ranks of wetland condition at multiple locations, predictions of wetland condition at locations not visited in the field sample, and rankings of the importance of in-field measurements.
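
A toy version of the ordinal piece of such a model: ordinal indicators arise by thresholding a latent continuous variable at cutpoints, and only the ranks of the latent values carry meaning. The cutpoints and data below are made up, and the latent draw is iid rather than a spatial Gaussian process.

```python
import numpy as np

rng = np.random.default_rng(5)

# Latent "condition" at 500 sites (iid stand-in for a spatial process)
latent = rng.normal(size=500)

# An ordinal field indicator records which interval between cutpoints
# the latent value falls into
cutpoints = np.array([-0.5, 0.5])
ordinal = np.digitize(latent, cutpoints)       # categories 0, 1, 2

# Only the ordering of latent values is meaningful, not their scale:
# any increasing transformation yields the same ranks
ranks = latent.argsort().argsort()
print(np.bincount(ordinal))
```

This is why the abstract emphasizes ranks of wetland condition rather than the latent values themselves: the latent scale is only identified up to monotone transformation by the ordinal observations.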

Abstract (110)
Paper: Multivariate Nonlinear Spatio-Temporal Dynamic Models for Ecological Processes
Author: Chris Wikle

University of Missouri

Co-Authors:
Abstract: Spatio-temporal statistical models are increasingly being used across ecology to describe and predict spatially explicit processes that evolve over time. Most such processes are dynamic. The challenge with the specification of such dynamical models has been the curse of dimensionality and the specification of realistic dependence structures. These problems are compounded in the case of multivariate processes that exhibit nonlinear dynamics, yet these are the processes that govern the ecological and physical sciences. We discuss methods for accommodating realistic multivariate nonlinear structure in hierarchical spatio-temporal models that represent a combination of scientific (mechanistic) knowledge, stochastic representations of uncertainty and dependence, and observations in a conditional framework.

Abstract (111)
Paper: Modeling Resource Selection and Space Usage with Spatial Capture-Recapture Models
Author: Andy Royle

USGS Patuxent Wildlife Research Center

Co-Authors:
Abstract: Capture-recapture methods are widely used to obtain basic demographic information for many species from encounter history data obtained using spatial arrays of traps or other devices. Despite the widespread use of such methods, classical capture-recapture models are not spatially explicit: they accommodate neither the spatial indexing of traps nor the spatial organization of individuals in the population. As a result, a number of practical problems arise when attempting to use such models for inference about population density and related objectives. New classes of spatially explicit capture-recapture methods accommodate the spatial organization of traps and individuals to model individual encounter probability mechanistically, based on the spatial proximity of individuals to traps, and allow for direct inferences about density by associating individuals with explicit spatial locations. I will review conceptual and methodological aspects of spatial capture-recapture models, and discuss recent developments related to modeling space usage and integrating telemetry data with SCR models.
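
The mechanistic encounter model described above can be sketched with the half-normal detection function that is common in the SCR literature: detection probability decays with the distance between an individual's activity center and each trap. The trap layout, activity centers, and the parameters p0 and sigma below are all made up.

```python
import numpy as np

rng = np.random.default_rng(6)

# A 5x5 grid of traps and 30 hypothetical animal activity centers
traps = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
centers = rng.uniform(0, 4, size=(30, 2))

# Half-normal SCR encounter model: p falls off with distance to the trap
p0, sigma = 0.6, 0.8                               # made-up parameters
d = np.linalg.norm(centers[:, None, :] - traps[None, :, :], axis=2)
p = p0 * np.exp(-d**2 / (2 * sigma**2))            # individuals x traps

# Simulate one sampling occasion of spatial encounter histories
captures = rng.random(p.shape) < p
print("total captures:", captures.sum())
```

Because encounter probability is tied to explicit locations, fitting this model yields the activity-center density directly, which is exactly the inference classical (non-spatial) capture-recapture cannot provide.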

Abstract (112)
Paper: Improved moderation for gene-wise variance estimation in RNA-Seq data
Author: Yee Hwa (Jean) Yang

University of Sydney

Co-Authors: Ellis Patrick; Michael Buckley
Abstract: The cost of RNA-Seq has been decreasing over the last few years. However, experiments with a small number (<5) of biological replicates are still quite common. Estimating the variances of gene expression estimates becomes both a challenging and an interesting problem in these situations of low replication. In general, many assumptions need to be made to make any progress in estimating these variances. In this talk, we will outline some of these assumptions and explore the variance profiles of genes across various publicly available datasets. We will also assess how utilizing information from external experiments can affect both the power and the stability of a differential expression analysis. We will discuss a new information sharing procedure to improve the estimation of gene-wise variances. Initial results are very promising, indicating that it performs comparably to or better than existing methods such as DESeq and edgeR.
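
One generic form of information sharing is to shrink each gene-wise variance toward a pooled estimate. The sketch below illustrates that general moderation idea on simulated data, not the new procedure in the talk; the prior weight d0 is a made-up choice, and real RNA-Seq counts would need a mean-variance model rather than Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy expression matrix: 1000 genes, 4 replicates (low replication)
true_var = rng.gamma(2.0, 0.5, size=1000)
expr = rng.normal(0, np.sqrt(true_var)[:, None], size=(1000, 4))

s2 = expr.var(axis=1, ddof=1)            # noisy per-gene sample variances
prior_var = s2.mean()                    # shared (pooled) variance estimate

# Moderation: weighted compromise between the gene-wise and pooled values,
# with d0 (prior degrees of freedom) controlling the amount of shrinkage
d0, d = 4.0, expr.shape[1] - 1
s2_mod = (d0 * prior_var + d * s2) / (d0 + d)

mse_raw = np.mean((s2 - true_var) ** 2)
mse_mod = np.mean((s2_mod - true_var) ** 2)
print(mse_raw, mse_mod)
```

With only three degrees of freedom per gene, the raw variances are extremely noisy, and even this crude shrinkage toward a common value reduces the average estimation error, which is the motivation behind moderated variance methods generally.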

Abstract (113)
Paper: Spectral tree reconstruction via a theory for partially-supplied graphs
Author: Eric Stone

North Carolina State University

Co-Authors: Alexander Griffing; Benjamin Lynch
Abstract: This talk introduces a divisive method for phylogenetic reconstruction from pairwise distance data. I show how the distance matrix, after double centering, admits an eigendecomposition that is informative of phylogenetic structure. I then show how the eigenvectors can be used in a recursive algorithm to reconstruct a tree from the inside out. The approach is based upon mathematical results that my group has established for something we call 'partially-supplied graphs'.
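
The double-centering step can be shown concretely. The 4-leaf distance matrix below is a made-up additive tree example, and this is classical MDS-style centering only, not the recursive reconstruction algorithm of the talk.

```python
import numpy as np

# Pairwise leaf distances from a small made-up additive tree:
# two leaves {0,1} on one side of a long internal edge, {2,3} on the other
D = np.array([[0, 3, 7, 8],
              [3, 0, 8, 9],
              [7, 8, 0, 5],
              [8, 9, 5, 0]], dtype=float)

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n      # centering matrix

# Double centering of the squared distances: B = -1/2 * J * D^2 * J
B = -0.5 * J @ (D ** 2) @ J

# Eigendecomposition of the centered matrix; the leading eigenvector
# already reflects the deepest split in the tree
vals, vecs = np.linalg.eigh(B)
lead = vecs[:, np.argmax(vals)]
print(np.sign(lead / lead[0]))           # separates leaves {0,1} from {2,3}
```

The sign pattern of the top eigenvector partitions the leaves across the long internal edge, which suggests how recursing on eigenvector-defined splits can rebuild the tree "from the inside out".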

Abstract (114)
Paper: Riemannian metric estimation
Author: Marina Meila

University of Washington

Co-Authors: Dominique Perrault-Joncas
Abstract: The Riemannian metric allows one to define and compute geometric quantities (such as angle, length, or volume) on a manifold. For a given point and coordinate system, the Riemannian metric is represented by a positive definite matrix. In recent years, manifold learning has become increasingly popular as a tool for performing non-linear dimensionality reduction. This has led to the development of numerous algorithms of varying degrees of complexity, but also to the realization that in most cases a manifold learning algorithm will distort the original data geometry. We provide an algorithm for estimating the Riemannian metric from data, when the data are sampled from a probability distribution supported on a manifold. This new paradigm offers the guarantee, under reasonable assumptions, that one can recover the original data geometry from any manifold learning algorithm. We consider our algorithm's consistency, and demonstrate the advantages of our approach in a variety of examples.
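
How a positive definite metric matrix turns into geometric quantities can be shown in a few lines. The matrix g below is a made-up example at a single point, not an estimate produced by the algorithm in the talk.

```python
import numpy as np

# At a point, in a given coordinate system, the Riemannian metric is a
# symmetric positive definite matrix g; lengths and angles of tangent
# vectors are computed through it rather than the Euclidean inner product.
g = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def length(v, g):
    """Length of tangent vector v under metric g: sqrt(v' g v)."""
    return float(np.sqrt(v @ g @ v))

def angle(u, v, g):
    """Angle between tangent vectors u and v under metric g."""
    cos = (u @ g @ v) / (length(u, g) * length(v, g))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
print(length(u, g), angle(u, v, g))
```

This is the sense in which carrying an estimated metric along with an embedding "undoes" the distortion: even if a manifold learning algorithm stretches the coordinates, the metric it is paired with restores the original lengths, angles, and volumes.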

Abstract (115)
Paper: Varying Coefficient Models for Sparse Noise-Contaminated Longitudinal Data
Author: Damla Senturk

[email protected]

Co-Authors:
Abstract: In this paper we propose a varying coefficient model for sparse longitudinal data that allows for error-prone time-dependent variables and time-invariant covariates. We develop a new estimation procedure, based on covariance representation techniques, that enables effective borrowing of information across all subjects in sparse and irregular longitudinal data observed with measurement error, a challenge for which there is no current adequate solution. Sparsity is addressed via a functional analysis approach that considers the observed longitudinal data as noise-contaminated realizations of a random process that produces smooth trajectories. This approach allows for estimation based on pooled data, borrowing strength from all subjects, in targeting the mean functions and auto- and cross-covariances to overcome sparse noisy designs. The resulting estimators are shown to be uniformly consistent. Consistent prediction of the response trajectories is also obtained via conditional expectation under Gaussian assumptions. Asymptotic distributions of the predicted response trajectories are derived, allowing for the construction of asymptotic pointwise confidence bands. The efficacy of the proposed method is investigated in simulation studies and compared to the commonly used local polynomial smoothing method. The proposed method is illustrated with a sparse longitudinal data set, examining the age-varying relationship between calcium absorption and dietary calcium. Prediction of individual calcium absorption curves as a function of age is also examined.

Abstract (116)
Paper: Pooling Treatment Effects across RCTs: Comparison of the Odds Ratio and the Risk Difference
Author: Joan Hilton

University of California San Francisco

Co-Authors:
Abstract: When the baseline event rate is very small, as commonly occurs when adverse events are studied, there is interest in pooling data across randomized clinical trials to obtain a stable estimate of excess risk. While some authors have reported problems with the Peto OR, others recommend this statistic. The Peto OR can be extremely biased when the data distribution is highly imbalanced or treatment effects are exceptionally large (Greenland & Salvan 1990). In other circumstances the Mantel-Haenszel OR (i.e., logistic regression) is less biased than the Peto OR, and the Risk Difference produces relatively unbiased estimates of treatment effect but has low power when the event rate is low (Bradburn et al 2007). We sought to provide clearer guidance to clinical researchers on the ranges of parameter values associated with good performance of each of these summary statistics.
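
The pooled statistics being compared can be computed directly from per-trial 2x2 tables. The trial counts below are invented for illustration, and the risk-difference pooling here uses simple sample-size weights rather than any particular meta-analytic weighting scheme.

```python
import numpy as np

# Per-trial 2x2 tables as (events_trt, n_trt, events_ctl, n_ctl); the
# rare-event counts are made up to mimic adverse-event meta-analysis
trials = [
    (2, 100, 1, 100),
    (1, 200, 0, 200),
    (3, 150, 1, 150),
]

def mantel_haenszel_or(tables):
    """Pooled odds ratio: sum(a*d/N) / sum(b*c/N) across strata."""
    num = den = 0.0
    for a, n1, c, n0 in tables:
        b, d = n1 - a, n0 - c
        N = n1 + n0
        num += a * d / N
        den += b * c / N
    return num / den

def pooled_risk_difference(tables):
    """Sample-size-weighted average of per-trial risk differences."""
    rd = np.array([a / n1 - c / n0 for a, n1, c, n0 in tables])
    w = np.array([n1 + n0 for _, n1, _, n0 in tables], dtype=float)
    return float((w * rd).sum() / w.sum())

print(mantel_haenszel_or(trials), pooled_risk_difference(trials))
```

Note that the zero-event control arm in the second trial contributes to the Mantel-Haenszel numerator only, whereas its per-trial odds ratio would be undefined; this kind of sparse-table behavior is exactly what drives the comparisons among the Peto OR, Mantel-Haenszel OR, and risk difference.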
