Reproducible Workflows with Jupyter Notebook and Cytoscape
Keiichiro Ono
Cytoscape Core Developer Team
UC San Diego, Trey Ideker Lab / National Resource for Network Biology
5/19/2016, Advanced Cytoscape Workshop

Reproducible Workflow with Cytoscape and Jupyter Notebook

Jan 19, 2017



Keiichiro Ono

Page 2

Course Materials: Clone/Fork/Download this repository!

https://github.com/idekerlab/tsri-lecture

Setup Guide:

https://github.com/idekerlab/tsri-lecture/blob/master/documents/Setup%20Guide.pdf

Cytoscape 3.4.0:

http://www.cytoscape.org/download.php

Page 3

Keiichiro Ono

Cytoscape Core Developer since 2005 @UCSD Trey Ideker Lab

Area of Interest: Biological Data Integration & Visualization

Page 4

Agenda

• Reproducible Analysis & Visualization

• Introduction to Jupyter Notebook

• Creating reproducible network visualization workflows with Python

Page 5

Review: Cytoscape Core Features

Page 6

Review

- Network analysis / visualization is a powerful way to gain biological insights from your screening results

- Cytoscape is the de facto standard tool for this type of analysis

Page 7

Review

- Core features of Cytoscape
  - Navigation (pan/zoom/select)
  - Network / table data import
  - Automatic layout
  - Visual style

Page 8

Drawing Biological Networks

VS

Page 9

Drawing Tools

You have to specify the color of each node, the width of each edge, the shape of each node, and so on, by hand.

Page 10

There is one huge difference between Cytoscape and Illustrator…

Page 11

In Cytoscape, Your Data Controls the View

Page 12

Creating Visualizations in Cytoscape

Name      Type
BRCA1     gene
MAP2K1    gene
C05981    compound

• Mapping from Type to Node Shape
• Mapping from Type to Node Color

(Figure: nodes C05981, BRCA1 and MAP2K1 drawn using these mappings)

Creating mappings from data points to Visual Properties
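The two mappings on this slide can be sketched in plain Python. This is a minimal illustration, not how Cytoscape implements Visual Styles: the shape and color values and the helper `apply_discrete_mappings` are hypothetical. What it shows is that every visual property is looked up from the `Type` column, so editing the data table is enough to change the view.

```python
# Node table from the slide: each row has a Name and a Type.
NODE_TABLE = [
    {"name": "BRCA1", "type": "gene"},
    {"name": "MAP2K1", "type": "gene"},
    {"name": "C05981", "type": "compound"},
]

# Hypothetical discrete mappings: Type -> node shape and Type -> node color.
TYPE_TO_SHAPE = {"gene": "ELLIPSE", "compound": "DIAMOND"}
TYPE_TO_COLOR = {"gene": "#1f77b4", "compound": "#ff7f0e"}

# Fallback values used when a Type has no entry in a mapping.
DEFAULT_SHAPE = "RECTANGLE"
DEFAULT_COLOR = "#cccccc"


def apply_discrete_mappings(rows):
    """Return {node name: visual properties}, derived only from each row's Type."""
    view = {}
    for row in rows:
        node_type = row["type"]
        view[row["name"]] = {
            "shape": TYPE_TO_SHAPE.get(node_type, DEFAULT_SHAPE),
            "color": TYPE_TO_COLOR.get(node_type, DEFAULT_COLOR),
        }
    return view


if __name__ == "__main__":
    for name, props in apply_discrete_mappings(NODE_TABLE).items():
        print(name, props["shape"], props["color"])
```

Changing `C05981`'s type to `gene` in `NODE_TABLE` changes its shape and color without touching any drawing code, which is the contrast with a drawing tool.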

Page 13

Reproducibility

Page 14

Recap

Cytoscape Session File: for sharing results

But what about the process?

Page 15

http://www.the-scientist.com/?articles.view/articleNo/43632/title/Get-With-the-Program/

https://theconversation.com/how-computers-broke-science-and-what-we-can-do-to-fix-it-49938

http://www.nature.com/nature/journal/v483/n7391/full/483531a.html

Reproducibility: it’s a known issue

Page 16

Problems

- Reproducibility of biological research, especially for in vivo / in vitro experiments, is a hard problem

- But this is true even for in silico analysis! Sources of variation include:
  - OS version
  - Revision of scripts
  - Data analysis software versions
  - Version of data files
  - Command-line parameters written on a paper napkin
  - “Black magic” only a grad student knows

- This is something we need to fix, using the latest technologies and best practices
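One way to capture several items on this list (OS version, software versions, data file versions, command-line parameters) is to write them into a small manifest alongside each analysis run. The sketch below assumes Python 3.8+ for `importlib.metadata`; the function names, the package list and the `pvalue_cutoff` parameter are hypothetical examples, and script revisions are left to version control.

```python
import hashlib
import json
import platform
from importlib import metadata  # Python 3.8+


def package_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def file_sha256(path):
    """Return the SHA-256 digest of a data file, so its exact version is recorded."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(packages, data_files, parameters):
    """Collect OS, interpreter, package, data-file and parameter details."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "packages": package_versions(packages),
        "data_files": {path: file_sha256(path) for path in data_files},
        "parameters": parameters,  # written down instead of on a paper napkin
    }


if __name__ == "__main__":
    manifest = build_manifest(
        packages=["pandas", "numpy", "scipy"],  # example package list
        data_files=[],  # e.g. ["expression.csv"] (hypothetical file name)
        parameters={"pvalue_cutoff": 0.05},  # hypothetical analysis parameter
    )
    print(json.dumps(manifest, indent=2))
```

Saving this JSON next to the notebook gives a later reader the environment details that otherwise live only in someone's memory.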

Page 17

Typical Workflow

Page 18

Data Preparation → Analysis → Visualization

Page 19

Data Preparation

Page 20

Data Preparation

- Cleansing

- Normalization

- Missing values

- Corrupted values

- Reformat

- Conversion
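A minimal standard-library sketch of several of these preparation steps on a toy expression table. The data, the choice of median imputation for missing or corrupted values, and z-score normalization are illustrative assumptions rather than a recommended pipeline; with Pandas the same steps would be shorter.

```python
import statistics

# Hypothetical raw measurements: (gene symbol, expression value as text).
RAW = [
    (" brca1 ", "2.5"),
    ("MAP2K1", ""),     # missing value
    ("kras", "n/a"),    # corrupted value
    ("TP53", "4.0"),
    ("EGFR", "1.0"),
]


def parse_value(text):
    """Conversion: text to float; empty or unparsable strings become None."""
    try:
        return float(text)
    except ValueError:
        return None


def prepare(rows):
    """Cleanse symbols, impute missing/corrupted values, then z-score normalize."""
    # Cleansing / reformat: trim whitespace and upper-case gene symbols.
    cleaned = [(symbol.strip().upper(), parse_value(value)) for symbol, value in rows]

    # Missing and corrupted values: replace None with the median of observed values.
    observed = [v for _, v in cleaned if v is not None]
    fill = statistics.median(observed)
    filled = [(s, fill if v is None else v) for s, v in cleaned]

    # Normalization: z-scores using the population standard deviation.
    values = [v for _, v in filled]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return {s: (v - mean) / sd if sd else 0.0 for s, v in filled}
```

With this toy data, `prepare(RAW)` fills MAP2K1 and KRAS with the median 2.5, so both end up with a z-score of 0.0, while TP53 and EGFR land symmetrically above and below zero.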

Page 21

Data Preparation → Analysis → Visualization

Page 22

Analysis

Page 23

Analysis

- Filtering

- Standard graph statistics

- Density

- Betweenness

- Centrality

- Clustering

- Community Detection

- GO enrichment analysis
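Two of the standard statistics listed above, density and betweenness centrality, can be computed with a short standard-library sketch. The edge list is hypothetical, and `betweenness` implements Brandes' algorithm for unweighted, undirected graphs without normalization. In practice NetworkX (listed later in this deck) provides `nx.density` and `nx.betweenness_centrality`, the latter normalized by default.

```python
from collections import deque

# Hypothetical undirected interaction network as an edge list.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")]


def adjacency(edges):
    """Build an undirected adjacency dict: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Edges present divided by the n*(n-1)/2 possible undirected edges."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


if __name__ == "__main__":
    adj = adjacency(EDGES)
    print(density(adj), betweenness(adj))
```

For this graph the density is 0.5 (5 of 10 possible edges), and B and D each score 3.0 because each lies on three shortest paths between other nodes, while A, C and E score 0.0.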

Page 24

Data Preparation → Analysis → Visualization

Page 25

Visualization

Page 26

Visualization

- Mapping
  - Data points to visual variables
- Layout
  - For graphs:
    - Force-directed
    - Tree
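Mapping data points to visual variables also covers numeric columns. Cytoscape calls this a continuous mapping; the sketch below approximates one by linear interpolation from a data range onto a node-size range. The 20 to 80 size range, the degree values and both helper names are hypothetical.

```python
def continuous_mapping(value, data_min, data_max, size_min=20.0, size_max=80.0):
    """Linearly map a value in [data_min, data_max] onto [size_min, size_max]."""
    if data_max == data_min:
        # Degenerate range: every node gets the midpoint size.
        return (size_min + size_max) / 2
    t = (value - data_min) / (data_max - data_min)
    t = min(max(t, 0.0), 1.0)  # clamp values outside the data range
    return size_min + t * (size_max - size_min)


def map_column(values, **sizes):
    """Apply continuous_mapping to a {node: value} column using its own min/max."""
    lo, hi = min(values.values()), max(values.values())
    return {node: continuous_mapping(v, lo, hi, **sizes) for node, v in values.items()}


# Hypothetical node degrees used as the data column.
DEGREE = {"A": 1, "B": 3, "C": 2, "D": 3, "E": 1}

if __name__ == "__main__":
    print(map_column(DEGREE))
```

Here the lowest-degree nodes get size 20.0, the highest get 80.0, and C (degree 2) falls halfway at 50.0.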

Page 27

Data Preparation → Analysis → Visualization

Page 28

Data Preparation → Analysis → Visualization

Page 30

Cytoscape for Interactive Visualization

Python for Data Manipulation / Analysis
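This division of labour can be sketched as: Python builds the network, then hands it to Cytoscape for interactive visualization. The sketch assumes Cytoscape is running with cyREST on its default port 1234 and that `POST /v1/networks` accepts a Cytoscape.js-style JSON body; `to_cytoscape_js` and `post_network` are hypothetical helper names, and only the conversion step works without a running Cytoscape.

```python
import json
import urllib.request

# Assumption: cyREST listening on its default port on this machine.
CYREST = "http://localhost:1234/v1"


def to_cytoscape_js(edges, name="My network"):
    """Convert (source, target) pairs into a Cytoscape.js-style network dict."""
    nodes = sorted({node for edge in edges for node in edge})
    return {
        "data": {"name": name},
        "elements": {
            "nodes": [{"data": {"id": n, "name": n}} for n in nodes],
            "edges": [
                {"data": {"id": f"e{i}", "source": s, "target": t}}
                for i, (s, t) in enumerate(edges)
            ],
        },
    }


def post_network(cyjs, base=CYREST):
    """POST the network to a running Cytoscape and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base}/networks",
        data=json.dumps(cyjs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Cytoscape open, `post_network(to_cytoscape_js([("BRCA1", "MAP2K1")]))` would send a two-node network; the analysis that produced the edge list stays in Python, and styling and exploration happen in Cytoscape.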

Page 31

Lab Notebook for in silico Experiments

Page 32

Interactive Command Line + Markdown-based Documents

Page 33

IPython Notebook? Jupyter?

Page 34

IPython Notebook: Notebook UI + Python kernel

Jupyter: Notebook UI + language kernel (R/Julia/etc.)

Page 35

Language-Agnostic

- From the next version (4.x), the IPython Notebook will be an implementation of Jupyter

- You can switch to other language kernels

- In this lecture, we will use Python, but you can use the language of your choice to control Cytoscape

Page 37

Question

• Cytoscape is a desktop application

• Point & click GUI operation

• Easy to use, but how can we make our workflow reproducible?

Page 38

REST

Page 39

What is cyREST?

- Platform-independent, RESTful API module for Cytoscape
  - This means you can access basic Cytoscape data objects programmatically
- Now it’s a Cytoscape core feature!

Page 40

(Diagram: cyREST connecting Cytoscape Desktop to other tools and services)

- Your workstation
  - Cytoscape Desktop: Core + Apps (from the Cytoscape App Store), exposed through REST
  - Interactive data analysis environments
    - IPython Notebook: NumPy, SciPy, Pandas, NetworkX
    - RStudio: igraph, RCurl
  - Command line tools: sed, awk, grep, curl
  - Web browsers
- Data bus (Internet)
  - In-house databases
  - External computing resources: graph layout, statistical analysis, data pre-processing
  - File / code hosting services
  - Public data repositories
  - Data repository & collaboration services
  - PSICQUIC services
  - EBI RDF Platform
  - Other bioinformatics web applications / services

Page 41

REST API?

Page 42

curl "http://mygene.info/v2/query?q=kras"

{
  "hits": [
    {"taxid": 9606, "entrezgene": 3845, "symbol": "KRAS", "_id": "3845", "name": "Kirsten rat sarcoma viral oncogene homolog"},
    {"taxid": 10090, "entrezgene": 16653, "symbol": "Kras", "_id": "16653", "name": "Kirsten rat sarcoma viral oncogene homolog"},
    {"taxid": 10116, "entrezgene": 24525, "symbol": "Kras", "_id": "24525", "name": "Kirsten rat sarcoma viral oncogene"},
    {"taxid": 10090, "entrezgene": 110836, "symbol": "Kras2-rs2", "_id": "110836", "name": "Kirsten rat sarcoma oncogene 2, related sequence 2"},
    {"taxid": 10090, "entrezgene": 110832, "symbol": "Kras2-rs1", "_id": "110832", "name": "Kirsten rat sarcoma oncogene 2, related sequence 1"},
    {"taxid": 10090, "entrezgene": 111117, "symbol": "Kras1-ps", "_id": "111117", "name": "Kirsten rat sarcoma oncogene 1, pseudogene"}
  ],
  "max_score": 391.5175,
  "took": 4,
  "total": 6
}
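The same request can be made from Python with only the standard library. `query_mygene` and `hits_for_taxon` are hypothetical helpers written for this example, and the live service may return different results from the response captured on the slide.

```python
import json
import urllib.parse
import urllib.request


def query_mygene(term, base="http://mygene.info/v2/query"):
    """Same request as the curl command: GET <base>?q=<term>, decoded from JSON."""
    url = base + "?" + urllib.parse.urlencode({"q": term})
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def hits_for_taxon(result, taxid=9606):
    """Return (symbol, Entrez ID) pairs for hits from one species (9606 = human)."""
    return [
        (hit["symbol"], hit["entrezgene"])
        for hit in result.get("hits", [])
        if hit.get("taxid") == taxid
    ]
```

Applied to the response shown on the slide, `hits_for_taxon` keeps only `("KRAS", 3845)`, since 9606 is the NCBI taxonomy ID for human; passing `taxid=10090` would instead return the four mouse entries.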

Page 43

How cyREST Works

Clients send GET, POST, PUT and DELETE requests to the REST API of Cytoscape 3.1+.

Page 44

Mapping Cytoscape API to HTTP Methods

Cytoscape Operation    HTTP Method
Create                 POST
Read                   GET
Update                 PUT
Delete                 DELETE
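The table can be expressed directly as a lookup from operation to HTTP method. This sketch only builds `urllib` request objects without sending them; `BASE` assumes a local Cytoscape on the default cyREST port, and `build_request` and `CRUD_TO_HTTP` are hypothetical names. Send a request with `urllib.request.urlopen(...)` only when Cytoscape is running, and be careful with DELETE.

```python
import json
import urllib.request

# Assumption: local Cytoscape with cyREST on its default port.
BASE = "http://localhost:1234/v1"

# Create/Read/Update/Delete mapped onto HTTP methods, as in the table.
CRUD_TO_HTTP = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def build_request(operation, path, body=None, base=BASE):
    """Return an unsent urllib Request for a CRUD operation on a cyREST path."""
    data = None
    headers = {}
    if body is not None:
        # Request bodies are sent as JSON.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base + path, data=data, headers=headers, method=CRUD_TO_HTTP[operation]
    )
```

For example, `build_request("read", "/networks")` prepares a GET for `http://localhost:1234/v1/networks`, and `build_request("delete", "/networks/52")` prepares a DELETE for the network with ID 52.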

Page 45

Get the full network with session-unique ID (SUID) 52 as JSON

GET http://localhost:1234/v1/networks/52
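The same GET request from Python, as a sketch assuming Cytoscape is running locally and a network with SUID 52 exists in the current session. It also assumes the response is Cytoscape.js-style JSON with node and edge lists under `elements`; `get_network` and `summarize` are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: Cytoscape with cyREST is listening on the default port 1234.
BASE = "http://localhost:1234/v1"


def get_network(suid, base=BASE):
    """GET /v1/networks/<suid> and decode the JSON body (needs a running Cytoscape)."""
    with urllib.request.urlopen(f"{base}/networks/{suid}") as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(network):
    """Count nodes and edges, assuming Cytoscape.js-style 'elements' lists."""
    elements = network.get("elements", {})
    return {
        "nodes": len(elements.get("nodes", [])),
        "edges": len(elements.get("edges", [])),
    }
```

In a notebook, `summarize(get_network(52))` would report the size of the network shown in Cytoscape; replace 52 with an SUID from your own session.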

Page 46

http://localhost:1234/v1/networks/52

Page 47

Language-Specific Shims

- For Python
- For R

Page 48

REST

Page 49

- Lab notebook to record your workflow
- Make Cytoscape controllable via scripts (REST)
- Manage multiple versions of your notebooks and other scripts

Page 50

Hands-On: Using Cytoscape from Jupyter Notebook

Page 51

Where should we go from here?

Page 52

- Lab notebook to record your workflow
- Make Cytoscape controllable via scripts (REST)
- Manage multiple versions of your notebooks and other scripts

Missing: Environment to execute your workflow

Page 53

Python 3.5.0

Ubuntu 15.04

Pandas, numpy, scipy, jupyter…

Page 54

Docker as Portable Data Analysis Environment

Page 55

Stack, bottom to top: Bare Metal Machine → OS → Virtual Machine → Frameworks → Your App

Page 56

Stack, bottom to top: Bare Metal Machine → OS (Linux) → Docker → several isolated stacks, each with its own Frameworks + Application

Page 58

What is Docker?

- Container to run applications in an isolated environment

- Application = Layer of images

- Sharable Environments

- Environments as code

Page 59

Docker Hub

- Sharing environments as code!

- Dockerfile: definition of your container image

- “GitHub of Images”

Page 60

Jupyter Official Images

Page 61

Resources

- https://www.dataquest.io/blog/docker-data-science/

- https://try.jupyter.org/

Page 62

Getting Help

- Two Google Groups

  - [email protected]

  - [email protected]

- ANY question is OK!

Page 63

Further Readings

Page 64

Further Readings

• My presentation slides
  • http://www.slideshare.net/keiono

• cyREST web sites
  • http://apps.cytoscape.org/apps/cyrest
  • https://github.com/idekerlab/cyREST/wiki

• py2cytoscape: https://github.com/idekerlab/py2cytoscape