
Last updated November 8, 2016

AMP T2D Knowledge Portal Submitter and Analysis Guide for Data at the DCC

Contents

Executive Summary
Summary of Milestones for Data Submission to the Data Coordinating Center
Contacting the DCC
Introduction
AMP and the AMP T2D Knowledge Portal Overview
Types of Data Requested for the AMP T2D Knowledge Portal
Overview of Data Aggregation and Analysis Process
Policies and Data Use
Submitting Data that Cannot Enter the United States
Data Transfer Agreement
Preparing for Data Submission to the Portal
Required and Requested Files
1. AMP DCC Data Intake Form
2. Analysis result files
3. Primary Genotype Data File Types
4. Intensity files for SNP array data
5. Read files for sequencing data
6. Phenotype Data
Overview of the Data Intake, Analysis, and Deposition Process
Data Transfer
Description of Project/Cohort
Summary Statistics Only
Data QC and Analysis at DCC
QC Process at the DCC
Association Analysis Process at the DCC


Data Deposition and Release
Publication Policy
Appendix A: AMP DCC Data Intake to Data Deposit in AMP T2D Knowledge Portal
Appendix B: AMP DCC Data Intake Form
Appendix C: Phenotype Submission
Appendix D: Detailed Overview of QC Process at the DCC
Quality Control Process at the DCC
Initial Data Review
Ancestry Inference, Clustering, and Outlier detection
Sample Metric Outlier Detection
Pedigree Reconstruction
QC Report


Executive Summary

The AMP T2D Knowledge Portal is a web-based portal for the type 2 diabetes scientific community that is transforming the way research communities share and visualize genetic data and facilitating new disease discoveries. In order to enable scientists to utilize these new tools on their datasets and increase the power of the data on the knowledge portal, the AMP T2D Data Coordinating Center (DCC) is bringing in new datasets for deposition into the AMP T2D Knowledge Portal. For every dataset submitted to the DCC, the results of the analysis performed at the DCC are uploaded to the knowledge portal, where they can be viewed by the knowledge portal users. Individual level data will not be shared on the knowledge portal. This document is a guide for studies that are interested in depositing their array, whole exome sequencing, or whole genome sequencing data into the portal. Other data types will be accepted in the future. Please see Figure 1 below for a brief overview of the submission process. Note that the association analysis will be done only on datasets with individual level data.

Submitting your data to the DCC will be an interactive process between your analyst/PI and our analysis team. The data intake team at the DCC will be reviewing the QC and association analyses with the submitter before any data is uploaded onto the portal and working with the data submitter to resolve any issues found with the data. The analysis process is intended to be iterative and the data submitter and DCC will decide together the order and timeline for the association analysis.

Once an analysis is ready for submission to the knowledge base, the analysis will go live in the portal on the next release. Once the data is live on the portal, our publication policy comes into effect. The data will initially enter Early Access Period 1 for months 0-3 and Early Access Period 2 for months 3-6. During the first 6 months on the portal, data will be flagged as Early Access and fall under the guidelines of the Fort Lauderdale principles. After the data has been on the portal for 6 months, the open access period will start for the data.
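The release timeline described above can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and because the guide gives the ranges as months 0-3 and 3-6, the sketch assumes that a boundary month belongs to the later period.

```python
def access_phase(months_on_portal):
    """Return the access period for data that has been live for the given number of months.

    Assumption (not stated in the guide): month 3 counts as Early Access
    Period 2 and month 6 counts as open access.
    """
    if months_on_portal < 0:
        raise ValueError("months_on_portal must be non-negative")
    if months_on_portal < 3:
        return "Early Access Period 1"
    if months_on_portal < 6:
        return "Early Access Period 2"
    return "Open Access"


for months in (0, 4, 6):
    print(months, access_phase(months))
```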

Figure 1. Overview of AMP T2D Knowledge Portal Data Submission at DCC

Summary of Milestones for Data Submission to the Data Coordinating Center

1. Signed DTA executed by both the submitter’s institution and the Broad Institute, serving as DCC.
2. Submitter has prepared necessary files for transfer to the DCC.
3. Genetic data uploaded to secure transfer site. (Individual level data only)
4. Initial phenotype data uploaded to secure transfer site. (Individual level data only)
5. Precomputed analyses uploaded to secure transfer site. (Required for summary statistics submission and strongly recommended for individual level data submission)
6. Project info shared with submitter before being loaded to the portal.
7. Submitter and DCC approve project description.
8. Results of compliance check and analysis that will be shown on portal shared with submitter. (Summary statistics only)
9. Submitter and DCC approve deposition of summary level analyses on the portal. (Summary statistics only)
10. Analysis results shared with submitter. (Individual level data only)
11. Project information that will be loaded on the portal shared with submitter.
12. Submitter and DCC approve QC’ed data. (Individual level data only)
13. Submitter and DCC approve association analysis. (Individual level data only)
14. Submitter and DCC approve project information.
15. Analysis goes live on portal.

Contacting the DCC

To get started on this process, please reach out to the Data Coordinating Center at the Broad Institute by emailing us here: amp-dcc-data-submission@broadinstitute.com. Please tell us about the dataset you’d like to submit and any concerns you have about depositing your data. A member of the data intake team will reply with additional information and guide you through the submission process.

Introduction

Welcome to the AMP T2D Submission Guideline! Bringing in new data to the knowledge portal increases the value of the AMP T2D Knowledge Portal for the type 2 diabetes research community and allows the submitter to see their data in the context of hundreds of thousands of type 2 diabetes and control samples gathered from around the world. If you haven’t yet, please check out the portal here: http://www.type2diabetesgenetics.org/. All you need to get started is a Google login.

This document outlines the process of submitting data to the AMP T2D Data Coordinating Center (DCC) at the Broad Institute in Cambridge, MA, USA and will serve as a guide to submitters throughout the process. The process mapped out below begins with getting your Data Transfer Agreement signed and ends with the deposition of your data in the portal. It reviews the process, roles, and responsibilities of the DCC and the data submitter. In addition to reviewing the information below, we encourage you to reach out to your project manager with any issues or questions you encounter during your submission process. Each process has defined milestones that highlight significant progress in getting the data ready for deposition.

If you haven’t started the process yet and are interested in depositing your data into the AMP T2D knowledge portal, please contact the DCC at the Broad Institute by emailing us here: amp-dcc-data-submission@broadinstitute.com. If you are unable to send your data to the USA for any reason, you have the option of submitting your data through a federated node. Please contact the DCC for more information.

AMP and the AMP T2D Knowledge Portal Overview

The Accelerating Medicines Partnership (AMP) effort is a public-private partnership between the National Institutes of Health (NIH), 10 pharmaceutical companies and multiple non-profit organizations that joined together to transform the way researchers identify and validate therapeutic targets for several diseases, including type 2 diabetes. To read more about the AMP initiative and to see who’s involved, please visit: https://www.nih.gov/research-training/accelerating-medicines-partnership-amp/type-2-diabetes

The AMP type 2 diabetes (AMP T2D) consortium is a collaboration of a number of AMP funded investigators from around the world, including the Broad Institute, University of Oxford, and University of Michigan. The goal of the AMP T2D consortium is to create a knowledge portal using genetic and phenotypic data generated from type 2 diabetics and controls across multiple populations in order to bring forth discoveries in the genetic architecture of type 2 diabetes and to facilitate the development of new therapeutic targets for treating this disease. Using the genetic data collected from researchers around the world in an interactive web portal environment, researchers are able to ask questions of the data and see summary level results. You can also search for your gene, variant, or region of interest and see if any of the AMP T2D knowledge portal datasets show an association with type 2 diabetes or related traits.

The AMP T2D knowledge portal will continue working to improve the value of the portal for the type 2 diabetes community. To this end, the AMP T2D consortium will be working to add new datasets to the portal and improve the web-based tools used for analysis within the portal. We will update the community on our progress through our mailing list and Twitter feed; you can sign up on the home page of the portal: http://www.type2diabetesgenetics.org/home/portalHome.

Types of Data Requested for the AMP T2D Knowledge Portal

The Data Coordinating Center is currently able to accept array data, whole exome sequencing data, and whole genome sequencing data that is able to be transferred to the United States. We are building the capacity to accept other data types, such as gene expression, metabolomics, and epigenetic data.


Overview of Data Aggregation and Analysis Process

As a submitter to the knowledge portal, we know it’s important for you to understand how your data will be handled once it is at the DCC. Only analytical results, and not individual level data, will be accessible through the portal. We anticipate that multiple versions of results, of increasing detail and harmonization with other datasets, will be released to the portal in time.

Each cohort/project being submitted to the knowledge portal will have the appropriate analytical results identified and harmonized with existing analyses in the portal through a collaborative process between an analyst at the DCC and an analyst at your institution. The analytical results that are prioritized will be dependent on the phenotype data available, the value the analysis adds to the knowledge portal, and any special requests made by the data submitter.

The analysis will be released to the portal in 3 stages: Early Access Phase 1, Early Access Phase 2, and Open Access Period. Early Access Phase 1 gets the data analysis uploaded and available on the portal with limited QC. The subsequent revisions of the results will occur in Early Access Phase 2, which will aim to (a) address any inconsistencies identified by the initial harmonization process, (b) apply more uniform QC across all datasets in the portal, (c) compute additional statistics desired in the portal but not available in the initial version, and (d) enable on-demand interactive analyses of your data. For these revisions we will require the original genotype and phenotype data. Additionally, we will require data in as unprocessed a format as possible, in order to facilitate harmonization and quality control. Once the data is QC’ed and complete, the Open Access Period will begin for your data. We expect the time from the start of Early Access Phase 1 to the beginning of the Open Access Period to be 6 months.

Policies and Data Use

We are committed to ensuring that collaborators submitting genetic data to the AMP T2D knowledge portal understand how the data will be used after transfer to the DCC at The Broad Institute. By sending your data to the Broad Institute for upload into the knowledge portal the data submitter and DCC are agreeing to the following:

1. Throughout this process, the Broad is committed to protecting your data, both in transit and while the data is in our servers.

2. We will only be able to receive de-identified level data that is able to be transferred and stored at the DCC. We will have options available for those who cannot submit data to the United States.

3. Individual data will be stored in our secure servers and only accessed for QC and analysis purposes related to the AMP T2D knowledge portal.

4. Individual data will never be posted directly to the portal. Only summary level metrics are available to portal users.

5. Summary level analysis of the submitted data will be posted to the knowledge portal and available to users. This includes p-values, odds ratio, minor allele frequency, effect, direction of effect, allele frequencies across ethnicities, and other analyses that are deemed appropriate by the AMP T2D Knowledge Portal team and AMP T2D consortium.

6. Users of the portal will be able to create custom queries and view summary level results for those queries. This will include displaying results for specific projects/cohorts.

7. The Broad will QC and analyze your data for T2D and related traits in partnership with the submitter. This is a collaborative process so the submitter will get to view the analysis before it is uploaded to the portal.

8. The Broad may be sending genotype data submitted to the portal to the Michigan Imputation Server for imputation. This is a free service hosted by the University of Michigan and allows us to use the Haplotype Reference Consortium panel for imputation. The University of Michigan is a key member of the AMP T2D consortium that is funded by the NIH to develop the AMP T2D knowledge portal. For additional information on the imputation server, please visit: https://imputationserver.sph.umich.edu/index.html.

The policies related to the data in the AMP T2D knowledge portal, including data use for the knowledge portal users, can be found here: http://www.type2diabetesgenetics.org/informational/policies#.

Submitting Data that Cannot Enter the United States

Our AMP T2D funded collaborators at the University of Oxford are currently building a capability to ingest data, QC, and harmonize data at EBI. If data can’t leave Europe or enter the United States you can still submit your data to the knowledge portal through this method. EBI will perform the same functions as the DCC and will work with you to submit your data to the knowledge portal.

Data Transfer Agreement

Before we begin transferring data we need a signed and executed Data Transfer Agreement (DTA). You will receive the DTA via email from the DCC project manager or you can find it on the knowledge portal here: http://www.type2diabetesgenetics.org/informational/policies. This document should be reviewed by your institution’s legal counsel before signing and any edits made will need to be signed off by the legal counsel at the Broad. The document outlines that as a data contributor to the AMP T2D Portal, you agree to transfer your data to the DCC (Broad Institute) and you have the approval to do so. Although not covered in this document, a similar DTA will be necessary to transfer data to a Federated node in cases where the data cannot enter the United States.

Milestone:

1. Signed DTA executed by both the submitter’s institution and the Broad Institute, serving as DCC.

Preparing for Data Submission to the Portal

While we work towards getting a DTA in place, the data submitter can begin the process of preparing their files for data submission to the DCC. The information below outlines the information we need to get your data uploaded to the portal. For a summary table of the information needed, please see Table 1 below.

If your data is unable to leave your site or come to the Broad Institute, located in Cambridge, MA, USA, then depositing your data in a Federated Node will allow you to still contribute your data to the knowledge portal. Please contact the DCC for more information.

Required and Requested Files

Below are guidelines for the types and desired formats of datasets transferred to the DCC. As a general rule, we encourage you to submit as much data and as many results as possible, and to annotate your files with as much information as is feasible. This information will be extremely helpful as our analysts start the QC and analysis process on your data. Please note that we understand that different sites will have different data types and different abilities to transform among data formats, and we are thus happy to work with you to facilitate this process on a case-by-case basis.

1. AMP DCC Data Intake Form. This form is required in order to submit your data to the DCC. The form will be sent via email; please contact your project manager or amp-dcc-data-submission@broadinstitute.com if you have not received it. For additional details on the type of information needed, please refer to Appendix B.

2. Analysis result files. These files are optional, but any analytical results that you transfer will help us expedite and verify our analysis. Any number of files can be provided. For each file, the following is required:

• A tab- or comma-delimited file, with a header row followed by one row for every variant in the results file. The header row can have as many columns as needed. Mandatory columns include the chromosome, position, effect allele (with respect to which any phenotypic effect is measured), and non-effect allele. All alleles should be aligned to the forward strand of the genome, the version of which should be specified in the annotation data (see below). Additional desired columns include minor allele frequency, p-value of association with one or more traits, estimated odds ratio or effect size, case/control counts, and number of analyzed samples. If multiple statistics are available across multiple traits (e.g. T2D vs. glucose) or across multiple sample groupings (e.g. all samples vs. only samples of a given ancestry) they can be included in a single file or split across multiple files. The set of variants need not be identical across different result files.

• Annotation data describing the meaning of each column are required. These should be human readable. The annotations can be embedded in the results file or provided as a separate document.
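As a rough illustration of the results-file layout described above, the following sketch checks a header row for the four mandatory columns. The guide names the concepts but not the exact header spellings, so the column names below are hypothetical, and the sample rows are made-up placeholder values rather than real results.

```python
import csv
import io

# Hypothetical header spellings; the guide only says which concepts are mandatory.
MANDATORY_COLUMNS = ("chromosome", "position", "effect_allele", "non_effect_allele")


def missing_mandatory_columns(text, delimiter="\t"):
    """Return the mandatory columns that are absent from the header row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = [name.strip().lower() for name in next(reader, [])]
    return [column for column in MANDATORY_COLUMNS if column not in header]


# Placeholder content for illustration only.
example = (
    "chromosome\tposition\teffect_allele\tnon_effect_allele\tp_value\n"
    "1\t12345\tA\tG\t0.5\n"
)
print(missing_mandatory_columns(example))
```

A submitter could run a check like this before upload; passing `delimiter=","` covers comma-delimited files.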


3. Primary Genotype Data File Types. In order to ensure the continued use of your data in the portal as demand for additional statistics and analyses grows, we request the following files encoding the genotypes of each sample. These genotype files will be used to compute statistics that are unavailable from the analysis files, which will be added to the portal in subsequent data versions.

• Genotype files in VCF or PLINK format are required. We will accept either format, provided that strand information is clearly annotated. Note that VCF files have a clear distinction between reference and alternate alleles, while alleles can be flipped by some PLINK analyses.

a. The VCF file format is available at XXX.

b. Information about the PLINK file format is available at XXX. We recommend transferring bed/bim/fam files, which can be created by PLINK.

• Lists of QC+ samples and variants that were advanced to your final analysis are optional. Providing these will ensure that we can recompute statistics concordant with those that you produced in your analysis. If you do not provide them, we will perform our own QC which will likely be similar (but not identical) to yours.

• Documentation of your original analysis plan is also optional. Any human readable document describing the motivations of your analysis, the statistical methods employed, and any parameter settings will also help us to replicate your analysis. A methods section of a paper, if sufficiently detailed, will also suffice.

4. Intensity files for SNP array data. Ultimately, it may be necessary for us to have access to the raw data used to call genotypes. This will assist with quality control (for example, examining evidence that a rare variant has accurate genotypes), as well as harmonization (for example, ensuring that all variants are called using similar procedures). Thus, although not essential for the first version of your data to appear on the portal, the following files are required to complete the data transfer process in its entirety.

• Raw intensity files (IDAT or the equivalent). For SNP array genotyping data, any file format that lists normalized X/Y intensity values for each sample is acceptable. When submitting IDAT, please remember to send both of the intensity files for each sample.

a. Example file formats accepted by the Sanger for a similar project are at XXX

b. A guide for the file formats used by zCall (a clustering algorithm for exome chip) is available at XXX.

• Cluster and manifest files to accompany the raw intensity files. For Illumina IDAT files, these two files are required for the necessary downstream analysis. They should be available from and familiar to the platform that produced your original genotype calls. The manifest file describes the samples that were genotyped; the cluster file records any information that was used to better cluster the intensities for each SNP.
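The reminder above to send both intensity files for each sample can be checked before upload. The sketch below assumes the common Illumina naming convention in which the two channels of a sample share a prefix and end in `_Red.idat` and `_Grn.idat`; that convention is not stated in this guide, and the function name and file names are hypothetical.

```python
# Assumed channel suffixes (common Illumina convention, not specified by this guide).
CHANNEL_SUFFIXES = ("_Red.idat", "_Grn.idat")


def incomplete_idat_pairs(filenames):
    """Return the sample prefixes for which only one of the two channel files is present."""
    channels_by_sample = {}
    for name in filenames:
        for suffix in CHANNEL_SUFFIXES:
            if name.endswith(suffix):
                prefix = name[: -len(suffix)]
                channels_by_sample.setdefault(prefix, set()).add(suffix)
    return sorted(
        prefix
        for prefix, channels in channels_by_sample.items()
        if len(channels) < len(CHANNEL_SUFFIXES)
    )


# Hypothetical file names for illustration.
files = ["sampleA_Red.idat", "sampleA_Grn.idat", "sampleB_Red.idat"]
print(incomplete_idat_pairs(files))
```

Files that do not end in either suffix are ignored, so the same listing can include cluster and manifest files.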

5. Read files for sequencing data. Similar to intensity files for SNP array data, read files are required for sequencing data. We will use these to run “joint variant calling” across all samples at the DCC, for maximum sensitivity and accuracy of variant calls. Since variant call sets from sequence data include novel variants and alleles, re-processing raw data is even more important in sequencing experiments than SNP array experiments; the ExAC paper (available at XXX) outlines some of the rationale for this.

• BAM or CRAM files for each sample are required for sequencing experiments. These files are the standard format for storing read data and should be produced by your sequencing platform. We would prefer raw, unaligned BAM files.

a. Information on the BAM file format is available at XXX

b. BAM files can be created from FASTQ files, as described at XXX.

6. Phenotype Data. This is required alongside submission of genotype and/or sequencing data. The official document with full instructions will be emailed to you. For an idea of the variables requested, please see Appendix C. If you have a specific variable that is not in this list but is relevant to type 2 diabetes or related conditions, let us know and we can include it for your submission.

For a summary view of what is needed for your data submission, please see Table 1 below.

Table 1. Summary of files accepted for data submission into the AMP T2D Portal

File Type                              Genotyping Submission    Sequencing Submission
AMP DCC Data Intake Form               Required                 Required
Analysis Results                       Optional                 Optional
Annotation Data                        Optional                 Optional
Genotype Files (VCF or PLINK)          Required                 Required (VCF)
List of QC+ samples and variants       Optional                 Optional
Analysis Plan Documentation            Optional                 Optional
Raw Intensity Files                    Required                 N/A
Cluster File                           Required                 N/A
Manifest File                          Required                 N/A
Sequencing Read Files (BAM or CRAM)    N/A                      Required
Phenotype Data                         Required                 Required
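The required rows of Table 1 can also be expressed as a simple checklist. The sketch below is one possible encoding; the keys and the function name are hypothetical labels chosen for illustration, and only the Required entries of the table are included.

```python
# Required files per submission type, transcribed from Table 1.
# The short keys are hypothetical labels, not names used by the DCC.
REQUIRED_FILES = {
    "genotyping": {
        "data_intake_form",
        "genotype_files",
        "raw_intensity_files",
        "cluster_file",
        "manifest_file",
        "phenotype_data",
    },
    "sequencing": {
        "data_intake_form",
        "genotype_files",  # VCF for sequencing submissions
        "sequencing_read_files",
        "phenotype_data",
    },
}


def missing_required_files(submission_type, provided):
    """Return the required file types not yet present in the provided collection."""
    return sorted(REQUIRED_FILES[submission_type] - set(provided))


print(missing_required_files("sequencing", ["data_intake_form", "phenotype_data"]))
```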

Milestone:

1. Submitter has prepared necessary files for transfer to the DCC.

Overview of the Data Intake, Analysis, and Deposition Process

When you are ready to start submitting files to the DCC for deposition into the AMP T2D portal, email amp_dcc_data_submission@broadinstitute.org and we will set up a secure transfer portal for you to upload your files. We will be using an ASPERA site, which will come with detailed instructions on how to upload the files. Once the ASPERA site is created, you have 30 days to upload data before the site expires. If it becomes necessary to extend that timeline, please let us know so we can extend the life of the ASPERA site.

The data transfer process is outlined step by step below. For a full picture of data intake to deposition, please see Appendix A.

Data Transfer

The data transfer process starts once the submitter and DCC at the Broad Institute have all necessary documentation in place and are ready to begin physically transferring the data to the DCC. During these steps, if individual level data is being provided, the data submitter will transfer the phenotypic and genetic data to the portal. This includes the raw data and any available precomputed analyses. For sites where we are receiving summary statistics only, we ask for the precomputed analyses to be sent to the DCC. Please see Figure 2 below for an overview of the data intake process at the DCC.

Regardless of which type of submission is being sent, we ask that each site complete a data intake form, as noted above. The purpose of the form is to inform the DCC of the data being submitted and to help us create a project/cohort description for this data on the portal.

Figure 2. Data Transfer Process at DCC

Milestones:

1. Genetic data uploaded to secure transfer site. (Individual level data only)
2. Initial phenotype data uploaded to secure transfer site. (Individual level data only)
3. Precomputed analyses uploaded to secure transfer site. (Required for summary statistics submission and strongly recommended for individual level data submission)

Description of Project/Cohort

Each project and cohort with data included in the AMP T2D portal will have a description of the project and/or cohort that is submitting data. This description will be created by the Content Manager at the DCC using the project information provided by the submitter on the Data Intake Form. During this process, the submitter will have the opportunity to provide feedback on the description of their study.

Figure 3. Description of Project and Cohort Information Submission Process at DCC

Milestones:

1. Project info shared with submitter before being loaded to the portal.
2. Submitter and DCC approve project description.

Summary Statistics Only

If you are unable to share raw data with the portal, the portal can accept summary-level statistics for posting. In this instance, the DCC will take the summary-level information that you have generated, securely store the data, and perform a data compliance check. Once the compliance check has been completed, the DCC will share the results with the submitter and confirm that we can proceed with depositing the data to the portal.


Figure 4. Summary Statistics Only Data Submission Process at DCC

Milestones:

1. Results of compliance check and analysis that will be shown on portal shared with submitter.
2. Submitter and DCC approve deposition of summary-level analyses on the portal.

Data QC and Analysis at DCC

Datasets with individual-level data being submitted to the DCC will undergo secure data storage, QC, and association analysis at the DCC. During this process the DCC will work with the data submitter to create an analysis plan that will be used to drive the future analyses and create datasets within the project/cohort that will be deposited into the portal. A dataset in this context refers to a specific set of samples paired with specific phenotype(s). We expect each data submission to contain a number of datasets, and we will work with the data submitters to prioritize the datasets for submission to the AMP T2D portal. Once the analysis is completed, the DCC will reach out to the submitter and review the results of the analysis. Results will not be uploaded to the portal until both the data submitter and the DCC affirm that the data is ready to share.

Figure 5. Data QC and Analysis Process for incoming data to the DCC

The DCC has compiled a list of standard single-variant association analyses that will be used as a guide for creating the analysis plan with the submitter. Each analysis plan will be unique to each site, depending on the phenotype variables that are available and the value each analysis adds to the portal.

Milestones:


1. Analysis results shared with submitter.
2. Project information that will be loaded on the portal shared with submitter.
3. Submitter and DCC approve QC'ed data.
4. Submitter and DCC approve association analysis.
5. Submitter and DCC approve project information.

QC Process at the DCC

The QC process at the DCC is vital to harmonizing the data being added to the AMP T2D Knowledge Portal. The goal of our QC is to identify artifacts, ensure statistics can be computed consistently, and help the DCC understand the data being submitted. This process is undertaken on individual-level data provided to the DCC. It is carried out by an analyst who works with all data being loaded into the knowledge portal and who performs the QC using an automated and consistent process.

The analyst at the DCC will compute metrics adjusted for ancestry and other confounders and then exclude outlier samples, which are potentially artifacts. The QC completed for data destined for the knowledge portal tends to be conservative, since we are aiming to ensure high-quality data for users. Once this QC has been completed, we will provide a report to share with the data submitters. An example QC report can be found in the AMP T2D Knowledge Portal: CAMP QC Report. For full details on the QC process please see Appendix D.

Association Analysis Process at the DCC

Association analysis at the DCC is an interactive process between the analyst at the DCC and the analyst at the submitting site. The initial analysis performed will consist of a set number of traits decided upon by the DCC and the data submitter. As a guide for our submitters, the DCC recommends focusing on some initial traits of relevance to type 2 diabetes for the initial analysis done at the DCC. Please see Table 2 for a list of recommended traits.

Table 2. Standard T2D traits for possible DCC association analysis

Categories of traits | Example related phenotype variables
Type 2 Diabetes status | T2D status, T2D age of diagnosis
Cardiometabolic | Systolic blood pressure, Diastolic blood pressure, Hypertension status
Lipids | HDL cholesterol, LDL cholesterol, Triglycerides, Total cholesterol
Glycemic | Insulin, glucose (2 hr, fasting, and/or random), HbA1c
Anthropometric | BMI, age, weight, waist-hip ratio
Kidney Function | Creatinine, Urinary albumin
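Table 2 lists BMI and waist-hip ratio as example traits, while the phenotype list in Appendix C asks for height, weight, waist circumference, and hip circumference. The following is a minimal sketch of how the derived traits could be computed from those measurements. The function names and the handling of missing values are assumptions made for illustration and are not taken from the DCC's pipeline.

```python
from typing import Optional


def bmi(weight_kg: Optional[float], height_cm: Optional[float]) -> Optional[float]:
    """Body mass index in kg/m^2; None when an input is missing or non-positive."""
    if weight_kg is None or height_cm is None or weight_kg <= 0 or height_cm <= 0:
        return None
    height_m = height_cm / 100.0  # Appendix C records height in centimeters
    return weight_kg / (height_m * height_m)


def waist_hip_ratio(waist_cm: Optional[float], hip_cm: Optional[float]) -> Optional[float]:
    """Waist circumference divided by hip circumference; None when not computable."""
    if waist_cm is None or hip_cm is None or waist_cm <= 0 or hip_cm <= 0:
        return None
    return waist_cm / hip_cm
```

Returning None rather than raising keeps a single missing measurement from stopping a whole cohort from being processed.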

Once an initial analysis has been completed at the DCC, we will send the data submitter an analysis report for their review and comments. An example report can be found on the AMP T2D Knowledge Portal: CAMP Analysis Report.


Data Deposition and Release

Once the analyses and the project/cohort description have been reviewed and approved by both the data submitter and the DCC, the data will be deposited onto the knowledge portal base before going live on the portal.

Figure 6. Data Deposition Process at DCC

The data will go live with the next quarterly portal release, occurring in February, May, August, and November.
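As a rough illustration of the quarterly schedule, the sketch below finds the release that follows a given approval date. The function name is hypothetical, and the rule that an approval landing inside a release month rolls over to the following release is an assumption; the guide does not give exact cut-off dates.

```python
from datetime import date
from typing import Tuple

RELEASE_MONTHS = (2, 5, 8, 11)  # February, May, August, November


def next_release(approved: date) -> Tuple[int, int]:
    """Return (year, month) of the first release month after the approval month.

    Assumption: approvals made during a release month wait for the next one.
    """
    for month in RELEASE_MONTHS:
        if month > approved.month:
            return approved.year, month
    # Approved in November or December: the next release is February of the next year.
    return approved.year + 1, RELEASE_MONTHS[0]
```

For example, under this assumption an approval dated November 8, 2016 maps to the February 2017 release.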

Milestone:

1. Analysis goes live on portal.

Publication Policy

Once your data is live on the knowledge portal, submitters are protected by a 6-month early access period that is subject to the guidelines of the Ft. Lauderdale principles. This 6-month period is broken down into a 3-month Early Access Phase 1, where data is live on the portal with limited QC, and a 3-month Early Access Phase 2, where the data is fully integrated into the portal. All data in either of the Early Access Phases will be flagged to knowledge portal users who are viewing the data. Please see Figure 7 for the scheduled data releases.


For more information on the Ft. Lauderdale policies, please visit: https://www.genome.gov/pages/research/wellcomereport0303.pdf.

Submitted data will be made live on the knowledge portal over a number of dataset freezes. Since we will be running association analysis over time and adding to the portal, each dataset will be defined as a set of genetic data associated with specific traits. Any analysis of additional traits, samples, or data will be considered a new dataset and will start again in Early Access Phase 1. For example, if for the initial analysis the data submitter and DCC chose to run an association analysis on 3,000 exome chip samples using type 2 diabetes status, that would equal one dataset and would start the Early Access Period on the next scheduled release of the portal. If the same 3,000 exome chip samples were then analyzed later for BMI, fasting glucose, and fasting insulin, that would create a new dataset that would start in the Early Access Period, even if the analysis of the same samples for type 2 diabetes is already in the open access period.
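The rule that each combination of samples and traits carries its own early access clock can be sketched as follows. The `Dataset` type, the sample-set label, and the choice to count phases in whole calendar months from the release month are all assumptions made for illustration; the release dates in the usage lines are invented.

```python
from datetime import date
from typing import FrozenSet, NamedTuple


class Dataset(NamedTuple):
    sample_set: str          # hypothetical label, e.g. "exome_chip_3000"
    traits: FrozenSet[str]   # traits analyzed together in this dataset


def add_months(d: date, months: int) -> date:
    """First day of the month that lies `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def access_phase(go_live: date, today: date) -> str:
    """Classify a dataset by time since release: 3 months per early access phase."""
    if today < add_months(go_live, 0):
        return "not yet released"
    if today < add_months(go_live, 3):
        return "Early Access Phase 1"
    if today < add_months(go_live, 6):
        return "Early Access Phase 2"
    return "open access"


# Same samples, different traits: two datasets with independent clocks.
t2d = Dataset("exome_chip_3000", frozenset({"T2D status"}))
glycemic = Dataset("exome_chip_3000", frozenset({"BMI", "fasting glucose", "fasting insulin"}))
```

With invented release dates of February 2017 for `t2d` and November 2017 for `glycemic`, a check in December 2017 would find the first in open access and the second still in Early Access Phase 1.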

Figure 7. Scheduled AMP T2D Portal Releases for Data Submission


Appendix A: AMP DCC Data Intake to Data Deposit in AMP T2D Knowledge Portal

Figure 8 is a complete flowchart outlining the DCC's data intake process, starting at the point where the data is legally and physically prepared to be transferred by the submitter to the DCC and ending with the data being live in the portal. The flowchart contains 5 subsections of work that are grouped together to create the larger process. The subsections are discussed in more detail in the main document.

Figure 8. Flowchart of AMP DCC Data Intake to Data Deposition in AMP T2D Knowledge Portal


Appendix B: AMP DCC Data Intake Form

For a copy of the AMP DCC Data Intake Form to complete, please contact your project manager or amp-dcc-data-submission@broadinstitute.com. To give you an idea of the type of information needed, we have included screenshots of the form below. The information will be given for the project on the first tab (see Figure 9) and by cohort(s) in the following tabs (see Figure 10).

In the first tab, we ask for general information about the file types you are submitting along with project information, including a project description, any study-specific covariates that were used during your analysis, special analysis requests, and other places the data can be found (i.e., dbGaP, EGA, or a project website).

The following tabs allow the data submitter to give additional details on the cohorts that make up the project. We expect that some projects will have one cohort, while others will have several.

Figure 9. AMP DCC Data Intake Form Project Information

Figure 10. AMP DCC Data Intake Form Cohort Information


Appendix C: Phenotype Submission

The AMP T2D consortium has determined a number of traits that will be useful for understanding your data and performing relevant analysis that can be shared on the knowledge portal. We ask that you submit any of these variables that are available for your data, and please let us know if you have a unique variable that we should be including. This list is meant for information purposes only. Please see the AMP Phenotype Variable Info sheet emailed to you for additional instructions and information.

Table 3. AMP T2D Knowledge Portal Phenotype Variables

Category | Variable | Format
ID variables | Study ID | character
ID variables | Sample ID used in genotype dataset (if different) | character
ID variables | dbGaP sample ID (if existing) | character
ID variables | Study ID of father | character
ID variables | Study ID of mother | character
Demographics | Race | character
Demographics | Race - open text description | character
Demographics | Ethnicity | character
Demographics | Sex | Please code values as "Male" and "Female"
Demographics | Year of birth | 4-digit integer
Type 2 Diabetes (T2D) status variables | T2D status based on self-report (1=T2D; 0=not T2D) | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D status based on history of healthcare provider diagnosis (1=T2D; 0=not T2D) | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D medication status (1=Yes; 0=No) | integer (1=yes; 0=no)
Type 2 Diabetes (T2D) status variables | T2D status based on fasting glucose level (1=T2D; 0=not T2D) | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D status based on HbA1c (1=T2D; 0=not T2D) | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | Glucose tolerance status based on oral glucose tolerance test (OGTT) | character
Type 2 Diabetes (T2D) status variables | T2D status defined in a way other than one of the approaches above (1=T2D; 0=not T2D) - e.g. a combination of the above that can't be separated into individual variables | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D status with unknown definition (1=T2D; 0=not T2D) - e.g. where a T2D status variable is available but there is no documentation on how it was defined | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D treatment with insulin or analogs | integer (1=yes; 0=no)
Type 2 Diabetes (T2D) status variables | T2D treatment with non-insulin medication | integer (1=yes; 0=no)
Type 2 Diabetes (T2D) status variables | T2D age of diagnosis (for those that are affected) (years) | nn.nnn
Type 2 Diabetes (T2D) status variables | Time interval, in years, between diagnosis of diabetes and beginning of treatment with insulin | integer
Type 2 Diabetes (T2D) status variables | This is an open text variable to indicate types of diabetes other than Type 2 (or unclear if Type 2). Examples include: "Type 1 diabetes", "MODY", "LADA", "Gestational diabetes", "Diabetes known to be caused by other processes such as cystic fibrosis, hemochromatosis or pancreatic surgery", "Diabetes status only available during pregnancy". | text
Blood biomarkers | Fasting plasma glucose (mmol/l) | nn.nn
Blood biomarkers | Fasting insulin (mU/l) | nnn.n
Blood biomarkers | OGTT 2-hour fasting glucose (mmol/l) | nn.nn
Blood biomarkers | OGTT 2-hour fasting Insulin (mU/l) | nnn.n
Blood biomarkers | Random glucose (i.e. not fasting or unknown fasting) (mmol/l) | nn.nn
Blood biomarkers | Fasting C-peptide (nmol/l) | nn.nn
Blood biomarkers | HbA1c (fraction, %) | nnn.n
Blood biomarkers | HbA1c (mmol/mol) | nn.nn
Blood biomarkers | Glutamic Acid Decarboxylase Autoantibodies (GAD Ab) | integer (1=positive; 0=negative)
Blood biomarkers | Islet Cell Autoantibodies | integer (1=positive; 0=negative)
Blood biomarkers | Anti-insulin Autoantibodies | integer (1=positive; 0=negative)
Blood biomarkers | ZNT8 Autoantibodies | integer (1=positive; 0=negative)
Blood biomarkers | Serum creatinine (umol/L) | nnn.n
Blood biomarkers | Adiponectin (ug/ml) | nn.nn
Blood biomarkers | Leptin (ng/ml) | nnn.n
Blood biomarkers | Total cholesterol (mmol/l) | nn.nn
Blood biomarkers | LDL cholesterol (mmol/l) (if measured directly, missing if not) | nn.nn
Blood biomarkers | Calculated LDL cholesterol (mmol/l) (using Friedewald equation) | nn.nn
Blood biomarkers | HDL cholesterol (mmol/l) | nn.nn
Blood biomarkers | Triglycerides (mmol/l) | nn.nn
Blood biomarkers | Any lipid lowering medication status (1=yes, 0=no) | integer (1=yes; 0=no)
Blood biomarkers | Statin medication status (1=yes, 0=no) | integer (1=yes; 0=no)
Anthropometry | Height (centimeters) | nnn.n
Anthropometry | Weight (kg) | nnn.n
Anthropometry | Hip circumference (centimeters) | nnn.n
Anthropometry | Waist circumference (centimeters) | nnn.n
Blood pressure and hypertension | Systolic blood pressure (mmHg) | nnn.n
Blood pressure and hypertension | Diastolic blood pressure (mmHg) | nnn.n
Blood pressure and hypertension | Hypertension status (1=yes, 0=no) | integer (1=yes; 0=no)
Blood pressure and hypertension | Hypertension medication status (1=yes, 0=no) | integer (1=yes; 0=no)
Urine measures | Urinary creatinine (mg/dL) | nn.nn
Urine measures | Urinary albumin (mg/dL) | nn.nn
Urine measures | Urinary albumin to creatinine ratio (mg/g) | nn.nn
Smoking status | Current smoking status (1=yes, 0=no) | integer (1=yes; 0=no)
Smoking status | Ever smoking status (1=yes, 0=no) | integer (1=yes; 0=no)
Reproductive and exogenous hormone use | Menopausal status | character
Reproductive and exogenous hormone use | Current use of any female hormones (1=yes, 0=no) | integer (1=yes; 0=no)
Reproductive and exogenous hormone use | Current use of, specifically, peri- or post-menopausal hormone use (i.e. not including contraceptives) (1=yes, 0=no) | integer (1=yes; 0=no)
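Before uploading, a submitter may find it useful to check a phenotype file against the coding rules in the table above. The sketch below checks three of them: sex coded as "Male" or "Female", a 4-digit year of birth, and 0/1 coding for binary variables. The column names, the CSV layout, and the use of an empty string for missing values are assumptions made for illustration; the actual names and conventions come from the AMP Phenotype Variable Info sheet.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hypothetical column names for two of the binary variables in the table.
BINARY_COLUMNS = ("t2d_self_report", "hypertension_status")


def check_row(row: Dict[str, str]) -> List[str]:
    """Return human-readable problems for one record (empty string = missing)."""
    problems = []
    sex = row.get("sex", "")
    if sex not in ("", "Male", "Female"):
        problems.append(f"sex must be 'Male' or 'Female', got {sex!r}")
    yob = row.get("year_of_birth", "")
    if yob and not (len(yob) == 4 and yob.isdigit()):
        problems.append(f"year_of_birth must be a 4-digit integer, got {yob!r}")
    for column in BINARY_COLUMNS:
        value = row.get(column, "")
        if value not in ("", "0", "1"):
            problems.append(f"{column} must be 0 or 1, got {value!r}")
    return problems


def check_file(handle: TextIO) -> Dict[str, List[str]]:
    """Map each study ID with at least one problem to its list of problems."""
    report = {}
    for row in csv.DictReader(handle):
        problems = check_row(row)
        if problems:
            report[row.get("study_id", "?")] = problems
    return report


sample = (
    "study_id,sex,year_of_birth,t2d_self_report,hypertension_status\n"
    "S1,Male,1956,1,0\n"
    "S2,F,56,2,\n"
)
report = check_file(io.StringIO(sample))
```

In this made-up sample, record S1 passes and record S2 is reported for its sex code, its 2-digit year of birth, and its T2D status value.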


Appendix D: Detailed Overview of QC Process at the DCC

Quality Control Process at the DCC

All data submitted to the DCC will be processed through comprehensive sample and variant quality control algorithms to promote harmonization with existing data on the portal. Since genotype data is likely to exhibit unique patterns of ancestry and classes of variants, we have developed algorithms for detecting major lines of ancestry and for identifying outliers among various sample metrics. Sample QC will be performed using bi-allelic variants only.

Initial Data Review

Initially, when the DCC receives your data, it will be checked for duplicates and any cryptic relatedness that may result from contamination or data collection errors. Duplicates and cryptic relatedness will be identified using a combination of pairwise identity by descent and a robust algorithm for calculating pairwise kinship in the presence of population stratification. Should any concerns arise, the submitter may be contacted in order to investigate possible causes and issues that might be corrected prior to continuing with sample QC.
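To make the idea of flagging duplicates and related pairs concrete, the sketch below sorts sample pairs into relationship categories from a precomputed pairwise kinship coefficient. The guide does not name the kinship algorithm or the cut-offs the DCC applies; the thresholds here are the inference ranges commonly quoted in the KING software documentation and are used purely as an assumption. The function names and input layout are hypothetical.

```python
from typing import List, Tuple


def classify_pair(kinship: float) -> str:
    """Map a kinship coefficient to a relationship category (assumed cut-offs)."""
    if kinship > 0.354:
        return "duplicate or monozygotic twin"
    if kinship > 0.177:
        return "first-degree"
    if kinship > 0.0884:
        return "second-degree"
    if kinship > 0.0442:
        return "third-degree"
    return "unrelated"


def flag_pairs(pairs: List[Tuple[str, str, float]]) -> List[Tuple[str, str, str]]:
    """Keep only pairs that look related, labelled with their category."""
    flagged = []
    for sample_a, sample_b, kinship in pairs:
        category = classify_pair(kinship)
        if category != "unrelated":
            flagged.append((sample_a, sample_b, category))
    return flagged
```

A pair in the top category is a candidate duplicate, and an unexpected one is the kind of finding that would prompt the DCC to contact the submitter.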

Ancestry Inference, Clustering, and Outlier Detection

After an agreement to proceed with QC, we will infer major lines of ancestry. Our approach consists of projecting your data onto principal components derived from a collection of common ancestry-informative variants in 1000 Genomes Project data. The PCs are then used as features in a Gaussian Mixture Modeling algorithm to cluster the samples according to their ancestry. Any samples that cannot be included in any of the subsets, due to their unique ancestry or bad genotyping, are flagged as outliers.
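The projection step can be illustrated with a small NumPy sketch: principal axes are learned from a reference panel and study samples are then projected onto them. This is a simplified illustration under stated assumptions, not the DCC's implementation. It assumes genotypes coded as allele counts (0/1/2) over the same variants in both matrices, centers on the reference means only, and leaves out both allele-frequency scaling and the Gaussian mixture clustering that follows. The function names are hypothetical.

```python
import numpy as np


def fit_reference_pcs(ref_genotypes: np.ndarray, n_components: int):
    """Learn principal axes from a (samples x variants) reference matrix.

    Returns the per-variant reference means and a (variants x n_components)
    loading matrix.
    """
    mean = ref_genotypes.mean(axis=0)
    centered = ref_genotypes - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components].T


def project(study_genotypes: np.ndarray, mean: np.ndarray, loadings: np.ndarray) -> np.ndarray:
    """Project study samples onto the reference principal axes."""
    return (study_genotypes - mean) @ loadings
```

The resulting scores, one row per study sample, are the kind of features that would then be passed to a clustering step.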

Sample Metric Outlier Detection

During clustering, metrics for each sample will be calculated. The metrics calculated will vary depending on the type of data received. Some of the more recognizable metrics are transition/transversion rate, call rate, and the number of singletons called. For each sample metric, we will calculate the residuals resulting from regressing the metric on principal components of ancestry. Then we will calculate principal components on those adjusted metrics. Gaussian Mixture Modeling is employed again at this stage, both on the principal components of the adjusted metrics and on each of the individual adjusted metrics. Any samples that do not cluster using these two approaches will be flagged as outliers.
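The adjustment step described above, regressing a sample metric on ancestry principal components and keeping the residuals, can be sketched with NumPy as follows. For brevity the outlier rule shown is a median-absolute-deviation cut-off, which is a deliberately simpler stand-in for the Gaussian Mixture Modeling the DCC uses. The function names and the threshold of 3.5 are assumptions made for illustration.

```python
import numpy as np


def adjust_metric(metric: np.ndarray, pcs: np.ndarray) -> np.ndarray:
    """Residuals from an ordinary least-squares fit of a metric on ancestry PCs."""
    design = np.column_stack([np.ones(len(metric)), pcs])  # intercept + PCs
    coef, *_ = np.linalg.lstsq(design, metric, rcond=None)
    return metric - design @ coef


def flag_outliers(residuals: np.ndarray, threshold: float = 3.5) -> np.ndarray:
    """Boolean mask of samples whose robust z-score exceeds the threshold.

    Stand-in for the mixture-model step: uses the median absolute deviation.
    """
    median = np.median(residuals)
    mad = np.median(np.abs(residuals - median))
    if mad == 0:
        return np.zeros(len(residuals), dtype=bool)
    robust_z = 0.6745 * (residuals - median) / mad
    return np.abs(robust_z) > threshold
```

Working on residuals rather than raw values means a sample is judged against others of similar ancestry, so a metric that legitimately differs between ancestry groups does not by itself trigger a flag.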

Pedigree Reconstruction

If your data is found to have pairs of related samples, pedigree reconstruction will be performed.

QC Report

Upon completion of sample QC, a report will be provided to the submitter to facilitate the creation of a suitable analysis plan.


Figure 11. AMP T2D Quality Control Process