displayHTS: An R package for displaying data and results from high-throughput screening experiments

Xiaohua Douglas Zhang
Head, Early Development Statistics – Asian Pacific, BARDS
Merck Research Laboratories

May 18, 2013

Outline
• Background knowledge for the R package
– Basic drug discovery & development process
– High-throughput screening
• Brief description of our R package “displayHTS”
• Main functions in the package
– plateWellSeries.fn
– imageDesign.fn
– imageIntensity.fn
– dualFlashlight.fn
• An Example
• Summary

Drug Discovery & Development Process

Introduction
• Target Discovery (e.g., receptor)
• Drug Discovery (e.g., effector)
• Pre-clinical (safety & drug metabolism)
• Phase I / II
• Phase III
• FDA Approval
• Phase IV (Registration & Pharmacovigilance)

Drug Discovery Using High-Throughput Biotechnologies

• High-throughput biotechnologies
– High-throughput screening (HTS)
• A book on HTS has already been published
• A second book, “Statistical Omics”, is to go under contract

[Figure: workflow of a high-throughput screen. Labels: Cell of Interest; Transfection; Gene Identification or Therapeutic Target; Library; Treatment; Scanning; Numeric Data; Statistical Analysis; High-Throughput Screen.]

HTS Project and Data
• An HTS project may contain
– one primary screen of millions of compounds, without replicates
– one confirmatory screen with replicates
• The measured response is usually the intensity emitted by labeled particles such as fluorescent dyes.
• The data and results need to be displayed
• The R package “displayHTS” serves this need
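To make the shape of such data concrete, the sketch below builds a simulated long-format table for two 384-well plates. The column names mirror those used in the package examples in this deck (BARCODE, XPOS, YPOS, WELL_USAGE, log2Intensity). The intensities, the control layout and the helper name simulate_plate are invented for illustration and do not come from the package.

```r
# Simulated stand-in for HTS plate data; every value here is made up.
simulate_plate <- function(barcode, n_row = 16, n_col = 24, seed = 1) {
  set.seed(seed)
  plate <- expand.grid(XPOS = seq_len(n_row), YPOS = seq_len(n_col))
  # Assumed layout: first column negative controls, last column positive controls
  plate$WELL_USAGE <- "Sample"
  plate$WELL_USAGE[plate$YPOS == 1] <- "negCTRL"
  plate$WELL_USAGE[plate$YPOS == n_col] <- "posCTRL1"
  # Intensities on the log2 scale, with the positive control shifted down by 1
  plate$log2Intensity <- rnorm(nrow(plate), mean = 21, sd = 0.3)
  is_pos <- plate$WELL_USAGE == "posCTRL1"
  plate$log2Intensity[is_pos] <- plate$log2Intensity[is_pos] - 1
  plate$BARCODE <- barcode
  plate
}

hts <- rbind(simulate_plate("PL000001", seed = 1),
             simulate_plate("PL000002", seed = 2))
# Count wells of each type on each plate
print(table(hts$BARCODE, hts$WELL_USAGE))
```

One row per well keeps plate, position, well type and response together, which is the form the plotting calls in this deck take as input.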

R Package: displayHTS

• Freely available from CRAN: http://cran.r-project.org/mirrors.html
• displayHTS has four main functions:
– plateWellSeries.fn
– imageDesign.fn
– imageIntensity.fn
– dualFlashlight.fn

plateWellSeries.fn()

library(displayHTS)
data(HTSdataSort)

# One color and one plotting order per well type found in the data
wells = as.character(unique(HTSdataSort[, "WELL_USAGE"]))
colors = c("black", "pink", "grey", "blue", "skyblue", "green", "red")
orders = c(1, 3, 2, 4, 5, 7, 6)

par(mfrow = c(1, 1))
# Plot the first two 384-well plates
plateWellSeries.fn(data.df = HTSdataSort[1:(384*2), ],
                   intensityName = "log2Intensity", plateName = "BARCODE",
                   wellName = "WELL_USAGE", rowName = "XPOS", colName = "YPOS",
                   show.wellTypes = wells, order.wellTypes = orders,
                   color.wells = colors, pch.wells = rep(1, 7),
                   ppf = 6, byRow = TRUE, yRange = NULL,
                   cex.point = 0.75, cex.legend = 0.75,
                   main = "A: Plate-well series plot")
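The idea behind a plate-well series plot can be sketched in base R: wells are ordered by plate and then by position within the plate, and the intensity is plotted against that running index. This is not the package's implementation. The data are simulated, well_series_index is a hypothetical helper, and its by_row switch only mimics what the byRow argument above appears to mean, which is an assumption.

```r
# Running index of wells: plate first, then row-major or column-major within a plate.
well_series_index <- function(df, by_row = TRUE) {
  if (by_row) {
    ord <- order(df$BARCODE, df$XPOS, df$YPOS)
  } else {
    ord <- order(df$BARCODE, df$YPOS, df$XPOS)
  }
  idx <- integer(nrow(df))
  idx[ord] <- seq_len(nrow(df))
  idx
}

# Simulated intensities for two 384-well plates (16 rows by 24 columns)
set.seed(42)
toy <- expand.grid(XPOS = 1:16, YPOS = 1:24,
                   BARCODE = c("PL000001", "PL000002"),
                   stringsAsFactors = FALSE)
toy$log2Intensity <- rnorm(nrow(toy), mean = 21, sd = 0.3)
toy$series <- well_series_index(toy, by_row = TRUE)

# Draw to a null device so the sketch also runs without a display
pdf(NULL)
plot(toy$series, toy$log2Intensity, pch = 1, cex = 0.75,
     xlab = "Well index (plate-well series)", ylab = "log2Intensity")
abline(v = 384.5, lty = 2)  # boundary between the two plates
invisible(dev.off())
```

Plotting against a running index makes drifts across a plate, or jumps between plates, visible as trends or steps in the point cloud.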

[Figure A: Plate-well series plot. y-axis: log2Intensity (ticks 20 to 23); x-axis: plates 1: PL000001 and 2: PL000002; legend: mock1, Sample, mock2, posCTRL3, posCTRL2, negCTRL, posCTRL1.]


Zhang’s Book

imageDesign.fn()

data(HTSresults)

# Hit criteria for sample wells: |SSMD| >= 1 with at least a 1.2-fold change in the same direction
condtSample = HTSresults[, "WELL_USAGE"] == "Sample"
condtUp = HTSresults[, "ssmd"] >= 1 & HTSresults[, "mean"] >= log2(1.2)
condtDown = HTSresults[, "ssmd"] <= -1 & HTSresults[, "mean"] <= -log2(1.2)
# Proportion of sample wells selected as hits
sum(condtSample & (condtUp | condtDown)) / sum(condtSample)

# Label each sample well with its hit category; controls keep their well type
hit.vec = as.character(HTSresults[, "WELL_USAGE"])
hit.vec[condtSample & condtUp] = "up-hit"
hit.vec[condtSample & condtDown] = "down-hit"
hit.vec[condtSample & !condtUp & !condtDown] = "non-hit"
result.df = cbind(HTSresults, "hitResult" = hit.vec)
wells = as.character(unique(result.df[, "hitResult"])); wells
colors = c("black", "green", "white", "grey", "red",
           "purple1", "purple2", "pink", "purple3")

par(mfrow = c(1, 1))
# Show the first 384-well plate
imageDesign.fn(result.df[1:384, ], wellName = "hitResult",
               rowName = "XPOS", colName = "YPOS",
               wells = wells, colors = colors,
               title = "B: Image of hits and controls")
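The hit-calling rule in the code above can be wrapped in a small function and checked on a toy table. The thresholds (SSMD of 1, fold change of 1.2) are the ones used on the slide. The function name classify_hits, its argument names and the toy values are hypothetical and are not part of displayHTS.

```r
# Classify sample wells as up-hit, down-hit or non-hit; controls keep their label.
classify_hits <- function(well_usage, ssmd, mean_log2fc,
                          ssmd_cut = 1, fc_cut = 1.2) {
  is_sample <- well_usage == "Sample"
  up <- ssmd >= ssmd_cut & mean_log2fc >= log2(fc_cut)
  down <- ssmd <= -ssmd_cut & mean_log2fc <= -log2(fc_cut)
  out <- as.character(well_usage)
  out[is_sample & up] <- "up-hit"
  out[is_sample & down] <- "down-hit"
  out[is_sample & !up & !down] <- "non-hit"
  out
}

# Toy results table (invented values); "mean" is on the log2 scale
toy <- data.frame(
  WELL_USAGE = c("Sample", "Sample", "Sample", "Sample", "negCTRL"),
  ssmd = c(2.5, -1.8, 0.4, 1.5, 0.1),
  mean = c(0.9, -0.6, 0.1, 0.1, 0.0),
  stringsAsFactors = FALSE
)
toy$hitResult <- classify_hits(toy$WELL_USAGE, toy$ssmd, toy$mean)
print(toy)

# Fraction of sample wells called as hits
hit_rate <- mean(toy$hitResult[toy$WELL_USAGE == "Sample"] != "non-hit")
print(hit_rate)
```

The fourth row shows why both conditions matter: its SSMD passes the cutoff, but its fold change is below 1.2, so it stays a non-hit.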

[Figure B: Image of hits and controls. Plate layout of 16 rows by 24 columns; legend: mock1, down-hit, non-hit, mock2, up-hit, posCTRL3, posCTRL2, negCTRL, posCTRL1.]

imageIntensity.fn()

# Image of log2 intensities for the first 384-well plate
imageIntensity.fn(HTSdataSort[1:384, ],
                  intensityName = "log2Intensity", plateName = "BARCODE",
                  wellName = "WELL_USAGE", rowName = "XPOS", colName = "YPOS",
                  sampleName = "Sample", sourcePlateName = "SOBARCODE")
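A base-R sketch of the same idea: reshape the long-format well table into a row-by-column matrix and draw it with image(). This is only an illustration on simulated data, not how imageIntensity.fn is implemented. The helper plate_matrix is a hypothetical name.

```r
# Reshape long-format well data into a row-by-column matrix of intensities.
plate_matrix <- function(df, value = "log2Intensity") {
  m <- matrix(NA_real_, nrow = max(df$XPOS), ncol = max(df$YPOS))
  m[cbind(df$XPOS, df$YPOS)] <- df[[value]]
  m
}

# Simulated intensities for one 384-well plate
set.seed(7)
toy <- expand.grid(XPOS = 1:16, YPOS = 1:24)
toy$log2Intensity <- rnorm(nrow(toy), mean = 20.4, sd = 0.3)
m <- plate_matrix(toy)

pdf(NULL)  # null device so the sketch runs without a display
# image() draws the first matrix dimension along x, so transpose and flip rows
# to put plate row 1 at the top and plate column 1 at the left.
image(x = 1:24, y = 1:16, z = t(m[16:1, ]),
      col = heat.colors(22), axes = FALSE, xlab = "", ylab = "")
axis(1, at = 1:24)
axis(2, at = 1:16, labels = 16:1, las = 1)
invisible(dev.off())
```

Keeping the physical plate orientation makes spatial artifacts, such as edge or row effects, easy to spot.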

[Figure: intensity image for SO000001 - PL000001. Plate layout of 16 rows by 24 columns; color key for log2 intensity from 19.60 to 21.19.]

An ApoA1 siRNA Confirmatory Screen

J. Biomol. Screen 2008 13:378-389

[Figure panels, each plate shown as 16 rows by 24 columns:
A1: Plate design (well types: Others, Sample, Negative, Inhibition)
A2: Raw data in a plate (color key from 852.25 to 1512.14)
A3: Adjusted data in a plate (color key from 730.31 to 1596.48)
B1: Raw Data. y-axis: Raw Intensity (0 to 2000); x-axis: Plate Number (Plate-well series), plates 16 to 20
B2: Adjusted Data. y-axis: Adjusted Intensity (0 to 2000); x-axis: Plate Number (Plate-well series), plates 16 to 20]


dualFlashlight.fn() for Generating a Dual-Flashlight Plot

par(mfrow = c(1, 1))
# x-axis: average fold change (log2 scale); y-axis: SSMD
dualFlashlight.fn(HTSresults, wellName = "WELL_USAGE",
                  x.name = "mean", y.name = "ssmd",
                  sampleName = "Sample", sampleColor = "black",
                  controls = c("negCTRL", "posCTRL1", "mock1"),
                  controlColors = c("green", "red", "lightblue"),
                  xlab = "Average Fold Change", ylab = "SSMD",
                  main = "C: Dual-Flashlight Plot",
                  x.legend = 0.1, y.legend = -12,
                  cex.point = 1, cex.legend = 0.8,
                  xat = log2(c(1/4, 1/2, 1/1.2, 1, 1.2, 2, 4)),
                  xMark = c("1/4", "1/2", "1/1.2", "1", "1.2", "2", "4"),
                  xLines = log2(c(1/4, 1/2, 1/1.2, 1, 1.2, 2, 4)),
                  yLines = c(-5, -3, -2, -1, 0, 1, 2, 3, 5))
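The dual-flashlight plot puts the average log2 fold change on the x-axis and SSMD on the y-axis. As a rough sketch of where such values could come from, the code below computes both from simulated replicate differences, using the method-of-moments form SSMD = mean(d) / sd(d) for paired differences. The slides do not say which estimator produced the packaged HTSresults values, so the estimator choice, the data and the helper name ssmd_paired are all assumptions.

```r
# Method-of-moments SSMD for paired differences: mean(d) / sd(d).
ssmd_paired <- function(d) {
  d <- d[!is.na(d)]
  if (length(d) < 2) return(NA_real_)
  mean(d) / sd(d)
}

# Rows are siRNAs, columns are replicate log2 differences (simulated).
set.seed(11)
true_effect <- c(-1.0, 0, 0, 0.5, 0)
d_mat <- t(sapply(true_effect, function(mu) rnorm(4, mean = mu, sd = 0.2)))

res <- data.frame(
  mean = rowMeans(d_mat),
  ssmd = apply(d_mat, 1, ssmd_paired)
)

# Fold-change labels at log2 positions, as in the xat/xMark arguments above
fold <- c(1/4, 1/2, 1/1.2, 1, 1.2, 2, 4)
ticks <- data.frame(label = c("1/4", "1/2", "1/1.2", "1", "1.2", "2", "4"),
                    position = log2(fold))
print(round(res, 2))
print(ticks)
```

Because SSMD divides the mean by the standard deviation, a well with a modest but very consistent change can score higher than one with a large but noisy change, which is why the plot shows both axes together.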

[Figure C: Dual-Flashlight Plot. x-axis: Average Fold Change (ticks at 1/2, 1/1.2, 1, 1.2); y-axis: SSMD (-20 to 5); legend: Sample, negCTRL, posCTRL1, mock1.]

dualFlashlight.fn() for Generating a Volcano Plot

# Add the p-value on the -log10 scale as the y variable
result.df = cbind(HTSresults, "neg.log10.pval" = -log10(HTSresults[, "p.value"]))
dualFlashlight.fn(result.df, wellName = "WELL_USAGE",
                  x.name = "mean", y.name = "neg.log10.pval",
                  sampleName = "Sample", sampleColor = "black",
                  controls = c("negCTRL", "posCTRL1", "mock1"),
                  controlColors = c("green", "red", "lightblue"),
                  xlab = "Average Fold Change", ylab = "p-value in -log10 scale",
                  main = "D: Volcano Plot",
                  x.legend = NA, y.legend = -log10(0.006),
                  cex.point = 1, cex.legend = 0.8,
                  xat = log2(c(1/4, 1/2, 1/1.2, 1, 1.2, 2, 4)),
                  xMark = c("1/4", "1/2", "1/1.2", "1", "1.2", "2", "4"),
                  xLines = log2(c(1/4, 1/2, 1/1.2, 1, 1.2, 2, 4)),
                  yLines = c(-5, -3, -2, -1, 0, 1, 2, 3, 5))
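The y variable of the volcano plot is just the p-value on the -log10 scale. The sketch below derives it for two simulated rows of replicate differences, using a one-sample t-test as the source of the p-values. That test is an assumption: the slides do not say how the p.value column in HTSresults was obtained, and row_pvalue is a hypothetical helper.

```r
# One-sample t-test p-value for a vector of replicate log2 differences.
row_pvalue <- function(d) t.test(d, mu = 0)$p.value

# Simulated replicate differences: one strong effect, one null effect
set.seed(3)
d_mat <- rbind(rnorm(4, mean = -1, sd = 0.2),
               rnorm(4, mean = 0, sd = 0.2))

p <- apply(d_mat, 1, row_pvalue)
volcano <- data.frame(mean = rowMeans(d_mat),
                      p.value = p,
                      neg.log10.pval = -log10(p))
print(volcano)

# A p-value threshold maps to a horizontal line on the -log10 scale
threshold_line <- -log10(0.05)
print(threshold_line)
```

On this scale smaller p-values sit higher on the plot, so points far from the center and high up combine a large fold change with strong evidence.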

[Figure D: Volcano Plot. x-axis: Average Fold Change (ticks at 1/2, 1/1.2, 1, 1.2); y-axis: p-value in -log10 scale (0 to 6); legend: Sample, negCTRL, posCTRL1, mock1.]

An Example in Drug Discovery
• New technology for drug discovery: RNA interference high-throughput screening
• RNAi HTS for HIV: Zhou H, Xu M, Huang Q, Gates AT, Zhang XHD, Stec EM, Ferrer M, Hazuda DJ, Espeseth AS. 2008. Genome-scale RNAi screen for host factors required for HIV replication. Cell Host & Microbe 4(5):495-504
• Listed by Nature Medicine in its year-end review of notable advances in 2008

Summary

• Knowledge about drug R&D is important
• HTS is a critical biotechnology for drug R&D
• “displayHTS” can display HTS data and results
– plateWellSeries.fn(): display data and results plate by plate and well by well
– imageDesign.fn(): display the positions of control types and result categories
– imageIntensity.fn(): display data and results as plate images
– dualFlashlight.fn(): display calculated results such as SSMD and p-value

References for Data Analysis in HTS (2006 – 2007)

1. Zhang XHD, Yang XC, Chung N, Gates AT, Stec EM, Kunapuli P, Holder DJ, Ferrer M, Espeseth AS. 2006. Robust statistical methods for hit selection in RNA interference high throughput screening experiments. Pharmacogenomics 7(3): 299-309.

2. Espeseth AS, Huang Q, Gates AT, Xu M, Yu Y, Simon AJ, Shi X, Zhang XHD, Hodor PG, Stone D, Burchard J, Cavet GL, Bartz S, Linsley PS, Ray WJ, Hazuda DJ. 2006. A genome wide analysis of ubiquitin ligases in APP processing identifies a novel regulator of BACE1 mRNA levels. Molecular and Cellular Neuroscience 33(3): 227-235.

3. Zhang XHD, Espeseth AS, Chung N, Holder DJ, Ferrer M. 2006. The use of strictly standardized mean difference for quality control in RNA interference high throughput screening experiments. The 2006 American Statistical Association Proceedings, Alexandria, VA: American Statistical Association: 882-886.

4. Zhang XHD, Espeseth AS, Chung N, Ferrer M. 2006. Evaluation of a novel metric for quality control in an RNA interference high throughput screening assay. BIOCOMP: 385-390.

5. Zhang XHD. 2007. Threshold determination of strictly standardized mean difference in RNA interference high throughput screening assays. IMECS Proceeding: 261-266.

6. Zhang XHD, Ferrer M, Espeseth AS, Marine SD, Stec EM, Crackower MA, Holder DJ, Heyse JF, Strulovici B. 2007. The use of strictly standardized mean difference for hit selection in primary RNA interference high throughput screening experiments. Journal of Biomolecular Screening 12(4): 497-509.

7. Zhang XHD. 2007. A new method with flexible and balanced control of false negatives and false positives for hit selection in RNA interference high throughput screening assays. Journal of Biomolecular Screening 12(5): 645-655.

8. Zhang XHD. 2007. A pair of new statistical parameters for quality control in RNA interference high throughput screening assays. Genomics 89: 552-561.

References (2008 - 2009)

9. Zhang XHD, Kuan PF, Ferrer M, Shu X, Liu YC, Gates AT, Kunapuli P, Stec EM, Xu M, Marine SD, Holder DJ, Strulovici B, Heyse JF, Espeseth AS. 2008. Hit selection with false discovery rate control in genome-scale RNAi screens. Nucleic Acids Research 36(14): 4667-4679.

10. Zhang XHD, Espeseth AS, Johnson E, Chin J, Gates A, Mitnaul L, Marine SD, Tian J, Stec EM, Kunapuli P, Holder DJ, Heyse JF, Strulovici B, Ferrer M. 2008. Integrating experimental and analytic approaches to improve data quality in genome-wide RNAi screens. Journal of Biomolecular Screening 13(5): 378-389.

11. Zhang XHD. 2008. Novel analytic criteria and effective plate designs for quality control in genome-wide RNAi screens. Journal of Biomolecular Screening 13(5): 363-377.

12. Zhang XHD. 2008. Genome-wide screens for effective siRNAs through assessing the size of siRNA effects. BMC Research Notes 1: 33.

13. Chung N, Zhang XHD, Kreamer A, Locco L, Kuan PF, Bartz S, Linsley PS, Ferrer M, Strulovici B. 2008. Median absolute deviation to improve hit selection for genome-scale RNAi screens. Journal of Biomolecular Screening 13: 149-158.

14. Zhou H, Xu M, Huang Q, Gates AT, Zhang XHD, Stec EM, Ferrer M, Hazuda DJ, Espeseth AS. 2008. Genome-scale RNAi screen for host factors required for HIV replication. Cell Host & Microbe 4(5): 495-504.

15. Zhang XHD, Marine SD, Ferrer M. 2009. Error rates and power in genome-scale RNAi screens. Journal of Biomolecular Screening 14: 230-238.

16. Zhang XHD. 2009. A method effectively comparing gene effects in multiple conditions in RNAi and expression profiling research. Pharmacogenomics 10: 345-358.

17. Zhang XHD, Heyse JF. 2009. Determination of sample size in genome-scale RNAi screens. Bioinformatics 25: 841-844.

18. Klinghoffer RA, Frazier J, Annis J, Berndt JD, Roberts BS, Arthur WT, Lacson R, Zhang XHD, Ferrer M, Moon RT, Cleary MA. 2009. A lentivirus-mediated genetic screen identifies dihydrofolate reductase (DHFR) as a modulator of β-catenin/GSK3 signaling. PLoS ONE 4(9): e6892.

References (2010)

19. Zhang XHD. 2010. Assessing the size of gene or RNAi effects in multi-factor high-throughput experiments. Pharmacogenomics 11(2): 199-213.

20. Zhang XHD. 2010. Strictly standardized mean difference, standardized mean difference and classical t-test for the comparison of two groups. Statistics in Biopharmaceutical Research 2(2): 292-299.

21. Zhang XHD. 2010. A statistical method assessing collective activity of multiple siRNAs targeting a gene in RNAi screens. The 2010 American Statistical Association Proceedings [CD-ROM], Alexandria, VA: American Statistical Association.

22. Zhang XHD. 2010. An effective method controlling false discoveries and false non-discoveries in genome-scale RNAi screens. Journal of Biomolecular Screening 15: 1116-1122.

23. Zhang XHD, Lacson R, Yang R, Marine SD, McCampbell A, Toolan DM, Hare TR, Kajdas J, Holder DJ, Heyse JF, Ferrer M. 2010. The use of SSMD-based false discovery and false non-discovery rates in genome-scale RNAi screens. Journal of Biomolecular Screening 15: 1123-1131.

24. Zhang XHD. 2010. Contrast variable potentially providing a consistent interpretation to effect sizes. Journal of Biometrics & Biostatistics 1: 108.

25. Zhao WQ, Santini F, Breese R, Ross D, Zhang XHD, Stone DJ, Ferrer M, Townsend M, Wolfe AL, Seager MA, Kinney GG, Shughrue PJ, Ray WJ. 2010. Inhibition of calcineurin-mediated endocytosis and AMPA receptor prevent amyloid oligomer-induced synaptic disruption. Journal of Biological Chemistry 285(10): 7619-7632.

References (2011-2013)

26. Zhang XHD. 2011. Illustration of SSMD, z-score, SSMD*, z*-score and t-statistic for hit selection in high-throughput screens. Journal of Biomolecular Screening 16(7): 775-785.

27. Zhang XHD, Santini F, Lacson R, Marine SD, Wu Q, Benetti L, Yang R, McCampbell A, Berger JP, Toolan DM, Stec EM, Holder DJ, Soper KA, Heyse JF, Ferrer M. 2011. cSSMD: Assessing collective activity of multiple siRNAs in genome-scale RNAi screens. Bioinformatics 27(20): 2775-2781.

28. Zhang XHD, Heyse JF. 2012. Contrast variable for comparing groups in biopharmaceutical research. Statistics in Biopharmaceutical Research 4(3): 228-239.

29. Huang W, Zhang XHD, Li Y, Wang WW, Soper K. 2012. Standardized median difference for quality control in high-throughput screening. Proceedings of 2012 International Symposium on Information Technologies in Medicine and Education (ITME): 515-518.

30. Yang R, Lacson RG, Castriota G, Zhang XHD, Liu Y, Zhao WQ, Einstein M, Camargo LM, Qureshi S, Wong KK, Zhang BB, Ferrer M, Berger JP. 2012. A genome-wide siRNA screen to identify modulators of insulin sensitivity and gluconeogenesis. PLoS ONE 7(5): e36384.

31. Zhang XHD, Zhang ZZ. 2013. displayHTS: a R package for displaying data and results from high-throughput screening experiments. Bioinformatics 29(6): 794-796.

32. BOOK 1: Zhang XHD. 2011. Optimal High-Throughput Screening: Practical Experimental Design and Data Analysis for Genome-scale RNAi Research. Cambridge University Press, Cambridge, UK (ISBN: 9780521734448).

33. BOOK 2: Zhang XHD, Heyse JF (editors). Statistical Omics. In preparation, expected in 2014. Chapman & Hall/CRC Press, California, USA.