Data visualization with Python and SVG: Plotting an RNA secondary structure. Sukjun Kim, The Baek Research Group of Computational Biology, Seoul National University. April 11th, 2015. Special Lecture at Biospin Group.
Data visualization with Python and SVG

Jul 30, 2015

Data & Analytics

Sukjun Kim
Transcript
Page 1: Data visualization with Python and SVG

Data visualization with Python and SVG

Plotting an RNA secondary structure

Sukjun Kim

The Baek Research Group of Computational Biology

Seoul National University

April 11th, 2015

Special Lecture at Biospin Group

Page 2: Data visualization with Python and SVG

Plotting libraries for data visualization

• Each has its own language for plotting.

• They must be installed before use.

• They have dependencies on other libraries.

• They are suited to high-level graphics.

• We cannot customize a plot at a low level.

Examples: R, matplotlib, d3.js, gnuplot, Origin, PgfPlots, PLplot, Pyxplot, Grace

Page 3: Data visualization with Python and SVG

SVG (Scalable Vector Graphics)

• XML-based vector image format for two-dimensional graphics.

• The SVG specification is an open standard developed by the World Wide Web Consortium (W3C) since 1999.

• As XML files, SVG images can be created and edited with any text editor.

• All major modern web browsers – including Mozilla Firefox, Internet Explorer, Google Chrome, Opera, and Safari – have at least some degree of SVG rendering support.

(Wikipedia – Scalable Vector Graphics)

Data visualization by writing an SVG document

• The SVG markup language is an open standard and is easy to learn.

• Any programming language can be used, not only Python.

• It requires no additional libraries.

• We can customize graphic elements at a low level.

Page 4: Data visualization with Python and SVG

Structure of SVG document

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="100" height="100">

<circle cx="50" cy="50" r="40" stroke="green" stroke-width="4" fill="yellow"/>

</svg>

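Parts of the document, from top to bottom:

• <?xml ...?>: XML declaration

• <!DOCTYPE ...>: declaration of DOCTYPE

• <svg ...>: start of SVG tag

• <circle .../>: contents of the SVG document

• </svg>: end of SVG tag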
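The structure above can be reproduced from Python with plain strings. The following is a minimal sketch, not the author's code: the function name `minimal_svg_document` is made up for illustration, and the standard-library XML parser is used only to confirm that the result is well-formed.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def minimal_svg_document():
    """Return the example document from the slide as a single string.

    Hypothetical helper name, used here for illustration only.
    """
    lines = [
        # XML declaration
        '<?xml version="1.0" encoding="UTF-8"?>',
        # DOCTYPE declaration (two adjacent literals are joined by Python)
        '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" '
        '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">',
        # start of the SVG tag
        '<svg xmlns="%s" version="1.1" width="100" height="100">' % SVG_NS,
        # contents of the SVG document
        '<circle cx="50" cy="50" r="40" stroke="green" stroke-width="4" fill="yellow"/>',
        # end of the SVG tag
        '</svg>',
    ]
    return "\n".join(lines) + "\n"


doc = minimal_svg_document()

# Parsing raises xml.etree.ElementTree.ParseError if the markup is not well-formed.
root = ET.fromstring(doc.encode("utf-8"))
print(root.tag)
print([child.get("r") for child in root])
```

Parsing the generated text is a cheap way to catch a missing quote or an unclosed tag before opening the file in a browser.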

SVG elements

• SVG has some predefined shape elements.

• rectangle <rect>, circle <circle>, ellipse <ellipse>, line <line>, polyline <polyline>, polygon <polygon>, path <path>, ...

• group <g>, hyperlink <a>, text <text>, ...

[Figure: the circle from the example document, centered at (50,50) with radius 40]

Page 5: Data visualization with Python and SVG

RNA secondary structural data

## microRNA structural data
seq = 'CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG'
dotbr = '(((.((((.(((((((((.(((((((((((.........))))))))))).))))))))).)))).)))'
pairs = [(0,68), (1,67), (2,66), (4,64), (5,63), (6,62), ... , (29,39)]
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]

How to generate RNA structural data?

seq → RNAfold → dotbr, pairs → RNAplot → coor

(Vienna RNA package, http://www.tbi.univie.ac.at/RNA/)

• seq: RNA sequence.

• dotbr: dot-bracket notation which is used to define RNA secondary structure.

• pairs: base-pairing information.

• coor: x and y coordinates for nucleotides.
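The pairs list carries the same information as the dot-bracket string: each "(" pairs with its matching ")", and "." marks an unpaired nucleotide. A minimal sketch of that conversion follows, assuming balanced brackets and no pseudoknot symbols; `dotbracket_to_pairs` is a hypothetical helper name, not a function of the Vienna RNA package.

```python
def dotbracket_to_pairs(dotbr):
    """Convert dot-bracket notation into a sorted list of (i, j) index pairs.

    Hypothetical helper for illustration. Indexes are 0-based, as on the slide.
    """
    stack = []   # positions of '(' that are still waiting for a partner
    pairs = []
    for j, symbol in enumerate(dotbr):
        if symbol == '(':
            stack.append(j)
        elif symbol == ')':
            if not stack:
                raise ValueError('unmatched ")" at position %d' % j)
            # the most recent unmatched '(' is the partner of this ')'
            pairs.append((stack.pop(), j))
        # '.' is an unpaired nucleotide: nothing to record
    if stack:
        raise ValueError('unmatched "(" at position %d' % stack[-1])
    return sorted(pairs)


dotbr = '(((.((((.(((((((((.(((((((((((.........))))))))))).))))))))).)))).)))'
pairs = dotbracket_to_pairs(dotbr)
print(pairs[:3], len(pairs))
```

For the microRNA on this slide the result starts with (0, 68), (1, 67), (2, 66) and ends with (29, 39), which agrees with the pairs list shown in the deck.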

[Figure: the final image we will plot]

Page 6: Data visualization with Python and SVG

Writing an SVG tag in a Python script

An SVG document requires, at minimum, an opening and a closing <svg> tag.

rna.py

out = []
out.append('<svg xmlns="http://www.w3.org/2000/svg" version="1.1">\n')
## svg elements here
out.append('</svg>\n')
open('rna.svg', 'w').write(''.join(out))

rna.svg

<svg xmlns="http://www.w3.org/2000/svg" version="1.1">
</svg>
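The slide's script writes the file in a single expression and leaves closing it to the interpreter. One possible variation, sketched below, separates building the document from writing it and closes the file explicitly with a `with` block. The names `svg_document` and `write_svg` and the optional width and height arguments are assumptions made for this example.

```python
def svg_document(elements, width=None, height=None):
    """Wrap already-formatted element strings in opening and closing <svg> tags.

    Hypothetical helper; `elements` is a list of strings such as '<circle .../>\n'.
    """
    size = ''
    if width is not None and height is not None:
        size = ' width="%s" height="%s"' % (width, height)
    out = ['<svg xmlns="http://www.w3.org/2000/svg" version="1.1"%s>\n' % size]
    out.extend(elements)
    out.append('</svg>\n')
    return ''.join(out)


def write_svg(path, elements, **kwargs):
    """Write the document to `path`; the with-block closes the file on exit."""
    with open(path, 'w') as handle:
        handle.write(svg_document(elements, **kwargs))


# An empty document: just the two tags, as in rna.svg on the slide.
document = svg_document([])
print(document)
```

Calling `write_svg('rna.svg', [])` would produce the same rna.svg as the script on the slide.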

Page 7: Data visualization with Python and SVG

SVG Polyline

<polyline points="10,10 20,10 10,20 20,20" style="fill:none;stroke:black;stroke-width:3"/>

[Figure: a polyline through (10,10), (20,10), (10,20), and (20,20), annotated with fill:none, stroke:black, and stroke-width:3]

Page 8: Data visualization with Python and SVG

Drawing phosphate backbone

In DNA and RNA, the phosphate backbone is regarded as the skeleton of the molecule. The skeleton will be represented by an SVG <polyline> tag.

We have the x and y coordinates of each nucleotide, as below.

coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]

Using the coordinate information, we can specify the points attribute of the polyline tag.

points = ' '.join(['%.3f,%.3f'%(x, y) for x, y in coor])
out.append('<polyline points="%s" style="fill:none; stroke:black; stroke-width:1;"/>\n'%(points))
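To make the format of the points attribute concrete, the sketch below applies the same string formatting to the first three coordinates of the slide data. The function name `backbone_polyline` and its stroke parameters are assumptions introduced for this example.

```python
def backbone_polyline(coor, stroke='black', stroke_width=1):
    """Return one <polyline> element that connects the nucleotides in order.

    Hypothetical helper; `coor` is a list of (x, y) tuples.
    """
    # "x,y" pairs separated by spaces, each number with three decimals
    points = ' '.join('%.3f,%.3f' % (x, y) for x, y in coor)
    return ('<polyline points="%s" style="fill:none; stroke:%s; stroke-width:%s;"/>\n'
            % (points, stroke, stroke_width))


# First three coordinates from the slide data
coor_head = [(69.515, 526.033), (69.515, 511.033), (69.515, 496.033)]
element = backbone_polyline(coor_head)
print(element)
# <polyline points="69.515,526.033 69.515,511.033 69.515,496.033" style="fill:none; stroke:black; stroke-width:1;"/>
```

Setting fill:none matters here: with a fill, the viewer would paint the area enclosed by the backbone path.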

Page 9: Data visualization with Python and SVG

SVG Line

<line x1="0" y1="0" x2="20" y2="20" style="stroke:red;stroke-width:2"/>

[Figure: a line from (0,0) to (20,20), annotated with stroke:red and stroke-width:2]

Page 10: Data visualization with Python and SVG

Drawing base-pairing

Watson-Crick base pairs occur between A and U, and between C and G. We will use the <line> tag to represent the hydrogen bonds.

In addition to the coordinate information, we also have base-pairing information in the form of tuples carrying the indexes of two nucleotides.

pairs = [(0,68), (1,67), (2,66), (4,64), (5,63), (6,62), ... , (29,39)]
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]

From these two types of data, each base pair can be drawn as a simple line.

for i, j in pairs:
    x1, y1 = coor[i]
    x2, y2 = coor[j]
    out.append('<line x1="%.3f" y1="%.3f" x2="%.3f" y2="%.3f" style="stroke:black; stroke-width:1;"/>\n'%(x1, y1, x2, y2))
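Because the pair indexes are used to look up coordinates, a mismatch between the two lists goes unnoticed when an index is negative (Python wraps it around to the end of the list). The sketch below adds an explicit range check; `pair_lines` and the four-point toy data are made up for this example and are not part of the slide data.

```python
def pair_lines(pairs, coor):
    """Return one <line> element per base pair, checking indexes first.

    Hypothetical helper; `pairs` holds (i, j) index tuples into `coor`.
    """
    lines = []
    for i, j in pairs:
        if not (0 <= i < len(coor) and 0 <= j < len(coor)):
            raise IndexError('pair (%d, %d) is outside the coordinate list' % (i, j))
        (x1, y1), (x2, y2) = coor[i], coor[j]
        lines.append('<line x1="%.3f" y1="%.3f" x2="%.3f" y2="%.3f" '
                     'style="stroke:black; stroke-width:1;"/>\n' % (x1, y1, x2, y2))
    return lines


# Toy data: four nucleotides, the first paired with the last
toy_coor = [(0.0, 0.0), (0.0, 15.0), (15.0, 15.0), (15.0, 0.0)]
toy_pairs = [(0, 3)]
toy_lines = pair_lines(toy_pairs, toy_coor)
print(toy_lines[0])
# <line x1="0.000" y1="0.000" x2="15.000" y2="0.000" style="stroke:black; stroke-width:1;"/>
```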

Page 11: Data visualization with Python and SVG

SVG Circle

<circle cx="50" cy="50" r="20" style="fill:red;stroke:black;stroke-width:3"/>

[Figure: a circle centered at (50,50) with diameter 40, annotated with fill:red, stroke:black, and stroke-width:3]

Page 12: Data visualization with Python and SVG

SVG Text

<text x="0" y="15" font-size="15" style="fill:blue">I love SVG!</text>

[Figure: the text "I love SVG!" positioned at (0,15), annotated with fill:blue and font-size="15"]

Page 13: Data visualization with Python and SVG

Drawing nucleotides

Each nucleotide will be represented by a one-character text label enclosed in a circle.

[Figure: the letter A (<text>) inside a circle (<circle>)]

The RNA sequence and the coordinate information are required.

seq = 'CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG'
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]

The <text> tag should be written after the <circle> tag, because SVG paints elements in document order and the white-filled circle would otherwise cover the letter.

for i, base in enumerate(seq):
    x, y = coor[i]
    out.append('<circle cx="%.3f" cy="%.3f" r="%.3f" style="fill:white; stroke:black; stroke-width:1"/>\n'%(x, y, 5))
    out.append('<text x="%.3f" y="%.3f" font-size="6" text-anchor="middle" style="fill:black">%s</text>\n'%(x, y+6*0.35, base))
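In the loop on this slide, the y position of the letter is shifted by 6*0.35, where 6 is the font size. A plausible reading, stated here as an assumption, is that the y attribute of <text> sets the baseline, so moving it down by roughly a third of the font size centers a capital letter vertically in the circle. The sketch below makes that relationship explicit; `nucleotide_elements` and its parameter names are hypothetical.

```python
from xml.sax.saxutils import escape


def nucleotide_elements(seq, coor, radius=5, font_size=6, baseline_shift=0.35):
    """Return a <circle> followed by a <text> element for every nucleotide.

    Hypothetical helper. `baseline_shift` is the fraction of the font size by
    which the text baseline is moved below the circle center (0.35 on the slide).
    """
    if len(seq) != len(coor):
        raise ValueError('seq and coor must have the same length')
    out = []
    for base, (x, y) in zip(seq, coor):
        # circle first, so that the letter is painted on top of it
        out.append('<circle cx="%.3f" cy="%.3f" r="%.3f" '
                   'style="fill:white; stroke:black; stroke-width:1"/>\n' % (x, y, radius))
        out.append('<text x="%.3f" y="%.3f" font-size="%s" text-anchor="middle" '
                   'style="fill:black">%s</text>\n'
                   % (x, y + font_size * baseline_shift, font_size, escape(base)))
    return out


# First two nucleotides and coordinates from the slide data
elements = nucleotide_elements('CC', [(69.515, 526.033), (69.515, 511.033)])
print(''.join(elements))
```

With the defaults the output matches the elements produced by the slide's loop; changing `font_size` keeps the letter centered without editing the constant by hand.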

Page 14: Data visualization with Python and SVG

Content of the Python script

## microRNA structural data
seq = 'CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG'
dotbr = '(((.((((.(((((((((.(((((((((((.........))))))))))).))))))))).)))).)))'
pairs = [(0, 68), (1, 67), (2, 66), (4, 64), (5, 63), (6, 62), (7, 61), (9, 59), (10, 58),
    (11, 57), (12, 56), (13, 55), (14, 54), (15, 53), (16, 52), (17, 51), (19, 49), (20, 48),
    (21, 47), (22, 46), (23, 45), (24, 44), (25, 43), (26, 42), (27, 41), (28, 40), (29, 39)]
coor = [(69.515,526.033),(69.515,511.033),(69.515,496.033),(61.778,483.306),(69.515,469.506),
    (69.515,454.506),(69.515,439.506),(69.515,424.506),(62.691,412.302),(69.515,400.099),
    (69.515,385.099),(69.515,370.099),(69.515,355.099),(69.515,340.099),(69.515,325.099),
    (69.515,310.099),(69.515,295.099),(69.515,280.099),(61.778,266.298),(69.515,253.571),
    (69.515,238.571),(69.515,223.571),(69.515,208.571),(69.515,193.571),(69.515,178.571),
    (69.515,163.571),(69.515,148.571),(69.515,133.571),(69.515,118.571),(69.515,103.571),
    (56.481,95.317),(50.000,81.317),(52.139,66.039),(62.216,54.357),(77.015,50.000),
    (91.814,54.357),(101.891,66.039),(104.030,81.317),(97.549,95.317),(84.515,103.571),
    (84.515,118.571),(84.515,133.571),(84.515,148.571),(84.515,163.571),(84.515,178.571),
    (84.515,193.571),(84.515,208.571),(84.515,223.571),(84.515,238.571),(84.515,253.571),
    (92.252,266.298),(84.515,280.099),(84.515,295.099),(84.515,310.099),(84.515,325.099),
    (84.515,340.099),(84.515,355.099),(84.515,370.099),(84.515,385.099),(84.515,400.099),
    (91.339,412.302),(84.515,424.506),(84.515,439.506),(84.515,454.506),(84.515,469.506),
    (92.252,483.306),(84.515,496.033),(84.515,511.033),(84.515,526.033)]

out = []
out.append('<svg xmlns="http://www.w3.org/2000/svg" version="1.1">\n')

## [1] phosphate backbone - <polyline> tag
points = ' '.join(['%.3f,%.3f'%(x, y) for x, y in coor])
out.append('<polyline points="%s" style="fill:none; stroke:black; stroke-width:1;"/>\n'%(points))

## [2] base-pairing - <line> tag
for i, j in pairs:
    x1, y1 = coor[i]
    x2, y2 = coor[j]
    out.append('<line x1="%.3f" y1="%.3f" x2="%.3f" y2="%.3f" style="stroke:black; stroke-width:1;"/>\n'%(x1, y1, x2, y2))

## [3] nucleotide - <circle> and <text> tags
for i, base in enumerate(seq):
    x, y = coor[i]
    out.append('<circle cx="%.3f" cy="%.3f" r="%.3f" style="fill:white; stroke:black; stroke-width:1"/>\n'%(x, y, 5))
    out.append('<text x="%.3f" y="%.3f" font-size="6" text-anchor="middle" style="fill:black">%s</text>\n'%(x, y+6*0.35, base))

out.append('</svg>\n')
open('rna.svg', 'w').write(''.join(out))
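The <svg> tag in the script carries no width, height, or viewBox, so how much of the drawing is visible may depend on the viewer. One optional refinement, which is not part of the original script, is to compute a viewBox ("min-x min-y width height") from the coordinates. The sketch below assumes a margin of 10 units to leave room for the circles; `view_box` and `svg_open_tag` are hypothetical names.

```python
def view_box(coor, margin=10):
    """Return a viewBox value that encloses all coordinates plus a margin."""
    xs = [x for x, _ in coor]
    ys = [y for _, y in coor]
    min_x = min(xs) - margin
    min_y = min(ys) - margin
    width = max(xs) - min(xs) + 2 * margin
    height = max(ys) - min(ys) + 2 * margin
    return '%.3f %.3f %.3f %.3f' % (min_x, min_y, width, height)


def svg_open_tag(coor, margin=10):
    """Return an opening <svg> tag whose viewBox fits the drawing."""
    return ('<svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewBox="%s">\n'
            % view_box(coor, margin))


# The four extreme points of the slide data (leftmost, rightmost, top, bottom)
corners = [(50.000, 81.317), (104.030, 81.317), (77.015, 50.000), (69.515, 526.033)]
print(view_box(corners))
# 40.000 40.000 74.030 496.033
```

Passing the full coor list gives the same result, since only the minimum and maximum of each axis are used.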

Page 15: Data visualization with Python and SVG

How to use other SVG tags? Go to w3schools.com!

Page 16: Data visualization with Python and SVG

Real examples with Python and SVG

Page 17: Data visualization with Python and SVG

reciPlot

Plot for visualizing the tissue-specific expression of genes.

SVG tags: <text>, <polygon>

Page 18: Data visualization with Python and SVG

escPlot

Plot for representing expression, structure, and conservation data of RNA collectively in a single plot.

SVG tags: <line>, <text>, <path>, <circle>, <polyline>

Page 19: Data visualization with Python and SVG

wheelPlot

Plot for visualizing all suboptimal RNA secondary structures.

SVG tags: <circle>, <polyline>, <path>, <line>, <rect>, <text>

Page 20: Data visualization with Python and SVG

Conclusions

• There are many graphic tools and libraries for data visualization.

• These tools are limited to high-level graphics.

• Writing SVG documents requires no additional libraries and no significant time spent learning a tool-specific language.

• If you want to plot a non-standard type of graph and customize it at a low level, writing an SVG document with Python will be the best solution for your purpose.

Page 21: Data visualization with Python and SVG

Thank you! Have a nice weekend.