Presenter: Oyekanmi Nash, PhD, Node Principal Investigator, H3Africa Bioinformatics Network Node at National Biotechnology Development Agency (NABDA), Abuja, Nigeria. Visual Analytical Screening System for Disease Linked Gene Variants. Visual Discovery Tools. Cardiometabolic Diseases.
Visual Analytical Screening System for Disease Linked Gene Variants - Oyekanmi Nash

Aug 23, 2014


Background: The major bottleneck in genome sequencing is no longer data generation, but the computational challenges around data analysis, display and integration. New approaches and methods are, therefore, required to meet these challenges. Visual analytics is the representation and presentation of data that exploits human visual perception abilities in order to amplify cognition. Opportunities exist for African researchers to expand the use of visual discovery tools and curated datasets to enable visual discovery (exploration, mining and analysis via interactive visual interfaces) of bioinformatics results from high-quality genomics research.
Methods: We are developing a system of visual analytics resources that are based on molecular and clinical data including molecular consequences of single nucleotide variants; the RNA-seq expression levels of transcripts; and the functional sites in protein sequences.
Results: We have developed an initial set of visual analytics resources, taking as the use case the major intrinsic protein family of water and glycerol transporters. Members of this protein family have been implicated in diverse cardiometabolic diseases. The computational resources developed can be adapted for gene lists, including those obtained from high-throughput assays. The long-term goal of the project is to empower researchers to make discoveries from large-scale molecular and clinical datasets to support decision-making on genetic and environmental determinants of cardiometabolic diseases in Africa.
Transcript
Page 1: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Presenter: Oyekanmi Nash, PhD, Node Principal Investigator,

H3Africa Bioinformatics Network Node at National Biotechnology Development Agency (NABDA)

Abuja, Nigeria

Visual Analytical Screening System for Disease Linked Gene Variants

Visual Discovery Tools

Cardiometabolic Diseases

Page 2: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

NIH/WT H3Africa Research Network

Page 3: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

H3Africa/HVP: Leveraging Potentials

• NIH/WT-H3Africa
• Collaborative Centers: 8
• Research Projects: 9
• BioBanks: 4
• Bioinformatics Network: 1

Page 4: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Map of Africa showing the distribution of nodes in the H3ABioNet network

Page 5: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

H3Africa: Bioinformatics Network

• H3ABioNet: a sustainable African Bioinformatics Network for H3Africa

The network provides:
• computational infrastructure and hardware,
• human resources,
• tools and computational solutions for genomic and population-based research, and
• communications among African researchers and other interested parties.

These aims are achieved by:
• providing user support,
• training and capacity development,
• research and tools development, and
• outreach and communication.

Page 6: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

ORGANIZATION OF THE HVP Nigeria Node

ICCAC Country Representative : Prof. Oyekanmi Nash, Alternate Representative: Hadiza Rasheed-Jada

Reports directly to the DG/CEO, NABDA/FMST

Page 7: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

ORGANIZATION OF THE HVP NIGERIA NODE II

The staff members of the Node include:
• Alternate Representative - Hadiza Rasheed-Jada
• Node Manager - Atinuke Hassan
• Systems Administrator - Adekunle Farouk
• Research Associates - Abimbola Kashim, Deborah Fasesan, Taoheed Abdulkareem, Ayodele Fakoya, Adijat Ozohu Jimoh
• Post-doctoral Researcher - Dr. Segun Fatumo

Institutional and researcher affiliations to the Node will drive the activities of the Node.

Page 8: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Background – Cardiometabolic Diseases

• Worldwide, cardiometabolic diseases are the major causes of:
  • Disability; Rising Healthcare Costs and Deaths
• Examples:
  • Type 2 diabetes, hypertension, dyslipidemia, coronary heart disease and chronic kidney disease
• Over the next 7 years
  • Africa is projected to experience the largest increase in death rates from cardiovascular disease, cancer, respiratory disease and diabetes (Aikins et al., 2010)

Noncommunicable Diseases    AFR - 2015      AFR - 2030      Fold Change
Diabetes mellitus           205,378.79      390,614.91      1.90
Malignant neoplasms         521,029.65      966,876.53      1.86
Other neoplasms              20,155.67       37,375.03      1.85
Cardiovascular diseases   1,179,320.20    1,966,212.66      1.67
Respiratory diseases        234,649.72      356,651.78      1.52

Source: Global Health Estimates (GHE) 2013: Deaths by age, sex and cause
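The Fold Change column in the table above appears to be the 2030 estimate divided by the 2015 estimate. The short Python sketch below recomputes it from the figures on the slide as a consistency check. The variable and function names are illustrative only and are not part of the presentation.

```python
# Death estimates copied from the slide's table: (AFR - 2015, AFR - 2030).
projected_deaths = {
    "Diabetes mellitus": (205_378.79, 390_614.91),
    "Malignant neoplasms": (521_029.65, 966_876.53),
    "Other neoplasms": (20_155.67, 37_375.03),
    "Cardiovascular diseases": (1_179_320.20, 1_966_212.66),
    "Respiratory diseases": (234_649.72, 356_651.78),
}


def fold_change(baseline: float, projected: float) -> float:
    """Ratio of the projected value to the baseline value."""
    return projected / baseline


# Round to two decimals, matching the precision shown on the slide.
fold_changes = {
    disease: round(fold_change(y2015, y2030), 2)
    for disease, (y2015, y2030) in projected_deaths.items()
}

# Print the diseases from largest to smallest projected increase.
for disease, fc in sorted(fold_changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{disease:<25} {fc:.2f}")
```

The recomputed ratios agree with the slide, with diabetes mellitus showing the largest projected increase.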

Page 9: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

A Strategy in Africa to Address Burden of Cardiometabolic Diseases

• Genomic and Environmental Determinants (H3Africa Projects)
  • H3Africa Kidney Disease Research Network
  • Genomic and environmental risk factors for cardiometabolic disease in Africans
  • Burden, spectrum and etiology of type 2 diabetes in sub-Saharan Africa
  • …..

Page 10: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Examples of Projected Massive and Complex Datasets from H3Africa Projects (2013….

Type 2 Diabetes Project
• 12,000 Cases and 12,000 Controls
• Sequencing of known T2DM regions
• Genome-wide genotyping arrays
• Whole exome/genome sequencing

Body Composition Project
• African genome structure
• Phenotyping and sampling for Cohorts
• Genetic and environmental contribution to body composition (~12,000 individuals)

These research investigations rely significantly on bioinformatics analysis and inferences from large and heterogeneous datasets obtained from populations inside and outside Africa.

Page 11: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

DATA SCIENCE

• Data Flow
• Data Curation
• Data Analysis

Page 12: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

“The major bottleneck in genome sequencing is no longer data generation—the computational challenges around data analysis, display and integration are now rate limiting. New approaches and methods are required to meet these challenges”.

National Human Genome Research Institute Strategic Plan: Charting a course for genomic medicine from base pairs to bedside http://www.genome.gov/Pages/About/Planning/2011NHGRIStrategicPlan.pdf

Making Discoveries from the Massive and Complex Genomics Datasets and Bioinformatics Results from H3Africa Projects

Page 13: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Visual Discovery Tools

Visual Discovery Tasks
• Exploration
• Mining
• Analysis

Visual discovery tools let users access and analyze data visually at the speed of thought with minimal or no IT assistance, and then share the results of their discoveries with colleagues, usually in the form of an interactive dashboard.

Benefits
• Data sharing
• Collaboration
• Easy to Deploy
• Research in Limited or No Internet Access

Page 14: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

What is Visual Analytics?

http://www.slideshare.net/TableauSoftware/visual-analytics-best-practices

“Visual analytics is the representation and presentation of data that exploits our visual perception abilities in order to amplify cognition.”

- Andy Kirk, author of “Data Visualization: a successful design process”

Page 15: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Knowledge-Building Insights from Visual Analytics

http://www.flickr.com/photos/pnnl/6310387725/

Page 16: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Visual Interfaces

Page 17: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Examples of Visual Analytics Software
http://www.vacommunity.org/Education+Resources

Toolkits

Analytic Tools

Jigsaw

Page 18: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Types of Visual Discovery Tools

Page 19: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

H3ABioNet Workshop: Visual Analytics of Human Genomics Variation Datasets

July 2013

Opportunities exist for African researchers to expand the use of visual discovery tools and curated datasets to enable visual discovery (exploration, mining and analysis via interactive visual interfaces) of bioinformatics results from high-quality genomics research.

Page 20: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash
Page 21: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Long-Term Goal of Project
• Visual Analytical System for
  • discovery of molecular consequences of variants and linked transcript expression for sets of genes or gene families

http://www.ensembl.org/info/genome/variation/predicted_data.html

Molecular Consequences of Gene Variants

Transcripts

Page 22: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Research Approach

Obtain Datasets
• Ensembl Genome Browser (www.ensembl.org): BioMart for genes and variants
• Database of Alternate Transcript Expression: Data Download for transcript expression values

Data Cleaning and Preparation
• Scripting and Spreadsheets

Construct Views and Dashboards
To address scientific questions such as:
• Identify molecular consequences of gene variants (Single Nucleotide Variants) in a specific disease or trait.
• Identify gene variants that result in multiple molecular consequences in gene transcripts.
• Identify gene variants specific to a transcript.
• Compare RNA-Seq expression values for gene transcripts in tissues.
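Two of the questions above (variants with multiple molecular consequences, and variants specific to one transcript) come down to grouping annotation rows by variant. The Python sketch below illustrates that grouping on a few made-up rows. The variant and transcript names are hypothetical placeholders, and the row layout is an assumption about what a cleaned BioMart export might look like. It is not the presenter's actual pipeline, which the slides describe only as scripting, spreadsheets and dashboards.

```python
from collections import defaultdict

# Hypothetical cleaned rows: (variant, transcript, consequence).
# None of these identifiers come from the slides.
rows = [
    ("var_A", "TX-001", "missense_variant"),
    ("var_A", "TX-002", "intron_variant"),
    ("var_B", "TX-001", "synonymous_variant"),
    ("var_B", "TX-002", "synonymous_variant"),
    ("var_C", "TX-002", "missense_variant"),
]

# Collect the distinct consequences and transcripts seen for each variant.
consequences_by_variant = defaultdict(set)
transcripts_by_variant = defaultdict(set)
for variant, transcript, consequence in rows:
    consequences_by_variant[variant].add(consequence)
    transcripts_by_variant[variant].add(transcript)

# Variants whose consequence differs between transcripts.
multi_consequence = sorted(
    v for v, cons in consequences_by_variant.items() if len(cons) > 1
)

# Variants annotated on exactly one transcript, mapped to that transcript.
transcript_specific = {
    v: next(iter(txs)) for v, txs in transcripts_by_variant.items() if len(txs) == 1
}

print("Multiple consequences:", multi_consequence)
print("Transcript-specific:", transcript_specific)
```

In a dashboard tool the same grouping would typically be expressed as a filter on a distinct count, so the script form is mainly useful for checking a view against the underlying data.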

Page 23: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Use Case – Gene Families

AQUAPORIN – Water and glycerol transporter
• 13 Mammalian Aquaporins (AQP0-AQP12).
• Malfunction or absence linked to disease.
• Adipose AQP7 deficiency is associated with an increase of intracellular glycerol content.
• Up-regulation of AQP1 in the glomeruli of most diseased kidneys.

Reference: Hibuse et al. (2005). Aquaporin 7 deficiency is associated with development of obesity through activation of adipose glycerol kinase. Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):10993-8. http://www.ncbi.nlm.nih.gov/pubmed/16009937

Page 24: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Molecular Consequences of Single Nucleotide Variants of Aquaporin Genes - Overview

Columns: Ensembl Gene ID / Approved Symbol

Ensembl Gene ID    Approved Symbol
ENSG00000135517    MIP
ENSG00000240583    AQP1
ENSG00000167580    AQP2
ENSG00000165272    AQP3
ENSG00000171885    AQP4
ENSG00000161798    AQP5
ENSG00000086159    AQP6
ENSG00000165269    AQP7
ENSG00000103375    AQP8
ENSG00000103569    AQP9
ENSG00000143595    AQP10
ENSG00000178301    AQP11
ENSG00000184945    AQP12A
ENSG00000185176    AQP12B

Rows: Consequence to transcript

3_prime_UTR_variant
5_prime_UTR_variant
coding_sequence_variant
downstream_gene_variant
feature_elongation
feature_truncation
frameshift_variant
inframe_deletion
inframe_insertion
initiator_codon_variant
intron_variant
missense_variant
nc_transcript_variant
NMD_transcript_variant
non_coding_exon_variant
splice_acceptor_variant
splice_donor_variant
splice_region_variant
stop_gained
stop_lost
stop_retained_variant
synonymous_variant
upstream_gene_variant

[Table cells: number of variants for each consequence and gene; the individual counts are not recoverable from this transcript.]
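An overview of this kind is a cross-tabulation: the number of annotations for each combination of gene and consequence. The Python sketch below shows one way such counts could be produced from a list of (gene symbol, consequence) pairs. The gene symbols and consequence terms are taken from the slide, but the pairs themselves are invented for illustration and do not reproduce the slide's counts.

```python
from collections import Counter

# Hypothetical (gene symbol, consequence) pairs, one per variant annotation.
# The counts that result are illustrative, not the figures from the slide.
annotations = [
    ("AQP1", "missense_variant"),
    ("AQP1", "missense_variant"),
    ("AQP1", "intron_variant"),
    ("AQP7", "missense_variant"),
    ("AQP7", "stop_gained"),
]

# Count how often each (gene, consequence) combination occurs.
counts = Counter(annotations)
genes = sorted({gene for gene, _ in annotations})
consequences = sorted({consequence for _, consequence in annotations})


def overview_table(counts, genes, consequences):
    """Render consequences as rows and genes as columns in plain text."""
    header = "consequence".ljust(22) + "".join(g.rjust(8) for g in genes)
    body = [
        c.ljust(22) + "".join(str(counts[(g, c)]).rjust(8) for g in genes)
        for c in consequences
    ]
    return "\n".join([header] + body)


print(overview_table(counts, genes, consequences))
```

A `Counter` returns 0 for combinations that never occur, so empty cells in the overview need no special handling.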

Page 25: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Visual Analytical System for Screening Disease Linked Gene Variants

Integrates data from ENSEMBL and Database of Alternate Transcript Expression (DBATE)

Data Sources

Blending of Data Dimensions from multiple Data Sources

Identifies Variants linked to Transcripts

Insights: rs199936776 is unique to AQP7-004 and could affect expression of transcript or properties of protein isoform
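Blending data from two sources, as described on this slide, amounts to joining them on a shared dimension. The Python sketch below illustrates a left join of variant annotations to transcript expression values, assuming that both sources can be keyed by a transcript identifier. That key, the field names and all values are assumptions made for illustration. The slides do not state how the blend was configured.

```python
# Hypothetical variant annotations (standing in for an Ensembl export).
variant_rows = [
    {"variant": "var_A", "transcript_id": "TX1"},
    {"variant": "var_B", "transcript_id": "TX1"},
    {"variant": "var_B", "transcript_id": "TX2"},
    {"variant": "var_C", "transcript_id": "TX3"},
]

# Hypothetical expression values (standing in for a DBATE download),
# keyed by transcript identifier and then by tissue.
expression_by_transcript = {
    "TX1": {"adipose": 42.0},
    "TX2": {"adipose": 3.5},
}


def blend(variant_rows, expression_by_transcript):
    """Left join: keep every variant row, attach expression when available."""
    blended = []
    for row in variant_rows:
        expression = expression_by_transcript.get(row["transcript_id"])
        blended.append({**row, "expression": expression})
    return blended


blended = blend(variant_rows, expression_by_transcript)
for row in blended:
    print(row)
```

A left join keeps variants on transcripts that have no expression record (here `TX3`), so gaps in the secondary source stay visible instead of silently dropping rows.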

Page 26: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Identification of variants that could affect transcript expression in adipose tissues

Variation Name        PolyPhen prediction   Associated Transcript Name   Ensembl Transcript ID
rs2381002             probably damaging     AQP7-001                     ENST00000379507
rs2381003             probably damaging     AQP7-001                     ENST00000379507
rs2381004             possibly damaging     AQP7-001                     ENST00000379507
rs62542743            probably damaging     AQP7-001                     ENST00000379507
rs62542744            probably damaging     AQP7-004                     ENST00000447660
                                            AQP7-001                     ENST00000379507
rs62542746            possibly damaging     AQP7-204                     ENST00000541274
rs74668961            probably damaging     AQP7-204                     ENST00000541274
rs114937176           probably damaging     AQP7-204                     ENST00000541274
rs138290992           probably damaging     AQP7-004                     ENST00000447660
                                            AQP7-001                     ENST00000379507
rs139024279           possibly damaging     AQP7-004                     ENST00000447660
                                            AQP7-001                     ENST00000379507
                      probably damaging     AQP7-004                     ENST00000447660
                                            AQP7-001                     ENST00000379507
rs140405277           probably damaging     AQP7-001                     ENST00000379507
rs140411243           probably damaging     AQP7-204                     ENST00000541274
rs199936776           possibly damaging     AQP7-004                     ENST00000447660
TMP_ESP_9_33385689    probably damaging     AQP7-004                     ENST00000447660
                                            AQP7-001                     ENST00000379507

[Bar charts per row: adipose_GSE12946 (axis 0 to 100) and adipose_GSE30611 (axis 0 to 40); bar values are not recoverable from this transcript.]

Page 27: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Summary

In Africa, researchers will be able to use visual discovery tools to make DISCOVERIES from large-scale molecular and clinical datasets to support decision-making on genetic and environmental determinants of cardiometabolic diseases.

Visual Analytics can facilitate collaboration between Data Experts and Subject Matter Experts

Page 28: Visual Analytical Screening System for  Disease Linked Gene Variants - Oyekanmi Nash

Acknowledgments
• H3Africa Bioinformatics Network (H3ABioNet)
  – National Human Genome Research Institute
  – NIH Common Fund
  – Grant U41HG006941
• National Institutes of Health
• Dr. Raphael Isokpehi, Bethune-Cookman University, Florida, USA
• National Biotechnology Development Agency, Federal Ministry of Science and Technology, Nigeria
• Visual Analytics in Biology Curriculum Network