Data visualization with Python and SVG
Plotting an RNA secondary structure
Sukjun Kim
The Baek Research Group of Computational Biology
Seoul National University
April 11th, 2015
Special Lecture at Biospin Group
Jul 30, 2015
Plotting libraries for data visualization
• They have their own language for plotting.
• They should be installed prior to use.
• They often depend on other libraries.
• They are appropriate for high-level graphics.
• We cannot customize a plot at a low level.
Examples: R, matplotlib, d3.js, gnuplot, Origin, PgfPlots, PLplot, Pyxplot, Grace
SVG (Scalable Vector Graphics)
• XML-based vector image format for two-dimensional graphics.
• The SVG specification is an open standard developed by the World Wide Web Consortium (W3C) since 1999.
• As XML files, SVG images can be created and edited with any text editor.
• All major modern web browsers – including Mozilla Firefox, Internet Explorer, Google Chrome, Opera, and Safari – have at least some degree of SVG rendering support.
(Wikipedia – Scalable Vector Graphics)
Data visualization by writing an SVG document
• The SVG markup language is an open standard and easy to learn.
• Not only Python but any programming language can be used.
• It requires no additional libraries.
• We can customize graphic elements at a low level.
Structure of an SVG document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="100" height="100">
<circle cx="50" cy="50" r="40" stroke="green" stroke-width="4" fill="yellow"/>
</svg>
The parts of this document, from top to bottom:
• XML declaration
• declaration of DOCTYPE
• start of SVG tag
• contents of SVG document (here, the <circle> element)
• end of SVG tag
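As a minimal sketch of the layout above, the same parts can be assembled as Python strings and checked for well-formedness with the standard library XML parser. The variable names and the parsing check are illustrative additions, not part of the original slides.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# The parts of the document, in the order they appear on the slide.
xml_decl = '<?xml version="1.0" encoding="UTF-8"?>'
doctype = ('<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" '
           '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">')
svg_open = '<svg xmlns="%s" version="1.1" width="100" height="100">' % SVG_NS
contents = ('<circle cx="50" cy="50" r="40" stroke="green" '
            'stroke-width="4" fill="yellow"/>')
svg_close = '</svg>'

document = '\n'.join([xml_decl, doctype, svg_open, contents, svg_close]) + '\n'

# Parsing raises xml.etree.ElementTree.ParseError if the markup is not
# well-formed XML. Bytes are passed so the encoding declaration is honored.
root = ET.fromstring(document.encode('utf-8'))
print(root.tag)    # the root element, qualified with the SVG namespace
print(len(root))   # number of child elements inside <svg>
```

Only the start tag, the contents, and the end tag are used in the rest of the lecture; the XML declaration and DOCTYPE are left out of the later script.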
SVG elements
• SVG has some predefined shape elements.
• rectangle <rect>, circle <circle>, ellipse <ellipse>, line <line>, polyline <polyline>, polygon <polygon>, path <path>, ...
• group <g>, hyperlink <a>, text <text>, ...
Figure: the resulting circle, centered at (50,50) with radius 40.
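Because every shape element is just a tag name plus attributes, one small helper can emit any of them. The sketch below is an illustration only: `svg_element` is a hypothetical helper name, and the convention of turning underscores into hyphens is an assumption made here for convenience.

```python
from xml.sax.saxutils import quoteattr

def svg_element(tag, **attrs):
    """Return a self-closing SVG element such as <rect .../> as a string.

    Underscores in keyword names become hyphens, so stroke_width=4
    is written as stroke-width="4".
    """
    parts = [tag]
    for name, value in attrs.items():
        # quoteattr() escapes the value and adds the surrounding quotes.
        parts.append('%s=%s' % (name.replace('_', '-'), quoteattr(str(value))))
    return '<%s/>' % ' '.join(parts)

shapes = [
    svg_element('rect', x=10, y=10, width=30, height=20, fill='none', stroke='black'),
    svg_element('circle', cx=50, cy=50, r=40, stroke='green', stroke_width=4, fill='yellow'),
    svg_element('line', x1=0, y1=0, x2=20, y2=20, stroke='red'),
]
for shape in shapes:
    print(shape)
```

Elements with content, such as <text>, need an opening and a closing tag and are not covered by this helper.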
RNA secondary structural data
## microRNA structural data
seq = 'CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG'
dotbr = '(((.((((.(((((((((.(((((((((((.........))))))))))).))))))))).)))).)))'
pairs = [(0,68), (1,67), (2,66), (4,64), (5,63), (6,62), ... , (29,39)]
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]
How to generate RNA structural data?
seq → RNAfold → dotbr, pairs → RNAplot → coor
(Vienna RNA package, http://www.tbi.univie.ac.at/RNA/)
• seq: RNA sequence.
• dotbr: dot-bracket notation which is used to define RNA secondary structure.
• pairs: base-pairing information.
• coor: x and y coordinates for nucleotides.
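To show what the dot-bracket notation encodes, the sketch below recovers base pairs from a dot-bracket string with a stack. This is an illustration only: the slides take pairs from the Vienna RNA package, `dotbr_to_pairs` is a hypothetical helper name, and the code assumes a structure written with only '(', ')' and '.' (no pseudoknots).

```python
def dotbr_to_pairs(dotbr):
    """Return sorted 0-based (i, j) index pairs from a dot-bracket string."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for j, symbol in enumerate(dotbr):
        if symbol == '(':
            stack.append(j)
        elif symbol == ')':
            if not stack:
                raise ValueError('unmatched ")" at position %d' % j)
            # The most recent unmatched '(' pairs with this ')'.
            pairs.append((stack.pop(), j))
        elif symbol != '.':
            raise ValueError('unexpected symbol %r at position %d' % (symbol, j))
    if stack:
        raise ValueError('unmatched "(" at position %d' % stack[-1])
    return sorted(pairs)

print(dotbr_to_pairs('((..))'))   # [(0, 5), (1, 4)]
```

The result uses the same 0-based index convention as the pairs list on the slide.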
This is our final image to plot
Writing an SVG tag in a Python script
An SVG document requires opening and closing <svg> tags.
rna.py
out = []
out.append('<svg xmlns="http://www.w3.org/2000/svg" version="1.1">\n')
## svg elements here
out.append('</svg>\n')
with open('rna.svg', 'w') as f:
    f.write(''.join(out))
rna.svg
<svg xmlns="http://www.w3.org/2000/svg" version="1.1">
</svg>
SVG Polyline
<polyline points="10,10 20,10 10,20 20,20" style="fill:none;stroke:black;stroke-width:3"/>
Figure: a Z-shaped polyline through (10,10), (20,10), (10,20), and (20,20), drawn with fill:none, stroke:black, and stroke-width:3.
Drawing phosphate backbone
In DNA and RNA, the phosphate backbone is regarded as the skeleton of the molecule. The skeleton will be represented by an SVG <polyline> tag.
We have the x and y coordinates of each nucleotide as below.
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]
Using the coordinate information, we can specify the points attribute of the polyline tag.
points = ' '.join(['%.3f,%.3f'%(x, y) for x, y in coor])
out.append('<polyline points="%s" style="fill:none; stroke:black; stroke-width:1;"/>\n'%(points))
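The backbone step can be wrapped in a small function so that the points string is easy to inspect. This is a sketch: `backbone_polyline` is a hypothetical name, and the example uses only the first three coordinates from the full data set.

```python
def backbone_polyline(coor, stroke='black', stroke_width=1):
    """Return one <polyline> element that connects the nucleotides in order."""
    # Each point is written as "x,y"; points are separated by spaces.
    points = ' '.join('%.3f,%.3f' % (x, y) for x, y in coor)
    return ('<polyline points="%s" style="fill:none; stroke:%s; stroke-width:%s;"/>\n'
            % (points, stroke, stroke_width))

# First three nucleotides of the microRNA coordinates.
coor = [(69.515, 526.033), (69.515, 511.033), (69.515, 496.033)]
print(backbone_polyline(coor))
```

fill:none matters here: without it, the area enclosed by the polyline would be painted with the default fill.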
SVG Line
<line x1="0" y1="0" x2="20" y2="20" style="stroke:red;stroke-width:2"/>
Figure: a line from (0,0) to (20,20), drawn with stroke:red and stroke-width:2.
Drawing base-pairing
Watson-Crick base pairs occur between A and U, and between C and G. We will use the <line> tag to represent the hydrogen bonds.
In addition to the coordinate information, we also have base-pairing information in the form of tuples carrying the indexes of two nucleotides.
pairs = [(0,68), (1,67), (2,66), (4,64), (5,63), (6,62), ... , (29,39)]
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]
From these two types of data, each base pair can be visualized as a simple line.
for i, j in pairs:
    x1, y1 = coor[i]
    x2, y2 = coor[j]
    out.append('<line x1="%.3f" y1="%.3f" x2="%.3f" y2="%.3f" style="stroke:black; stroke-width:1;"/>\n'%(x1, y1, x2, y2))
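The lookup from index pairs to line endpoints can be tried on a small input. In this sketch `pair_lines` is a hypothetical helper, and the four-nucleotide coordinates are made-up toy data, not values from RNAplot.

```python
def pair_lines(pairs, coor):
    """Return one <line> element per base pair (i, j)."""
    lines = []
    for i, j in pairs:
        # The indexes select the two paired nucleotides from coor.
        x1, y1 = coor[i]
        x2, y2 = coor[j]
        lines.append('<line x1="%.3f" y1="%.3f" x2="%.3f" y2="%.3f" '
                     'style="stroke:black; stroke-width:1;"/>\n' % (x1, y1, x2, y2))
    return lines

# Toy data (assumed): two stacked base pairs, 0 with 3 and 1 with 2.
toy_coor = [(10.0, 40.0), (10.0, 25.0), (25.0, 25.0), (25.0, 40.0)]
toy_pairs = [(0, 3), (1, 2)]
for line in pair_lines(toy_pairs, toy_coor):
    print(line, end='')
```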
SVG Circle
<circle cx="50" cy="50" r="20" style="fill:red;stroke:black;stroke-width:3"/>
Figure: a circle centered at (50,50), drawn with fill:red, stroke:black, and stroke-width:3; the dimension label on the figure reads 40.
SVG Text
<text x="0" y="15" font-size="15" style="fill:blue">I love SVG!</text>
Figure: the text "I love SVG!" anchored at (0,15), drawn with font-size="15" and fill:blue.
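Since SVG is XML, characters such as & and < inside a <text> element must be escaped. The sketch below uses `xml.sax.saxutils.escape` from the Python standard library; `svg_text` is a hypothetical helper name, and the escaping step is an addition to what the slides show.

```python
from xml.sax.saxutils import escape

def svg_text(x, y, content, font_size=15, fill='blue'):
    """Return a <text> element whose content is safe to embed in XML."""
    return ('<text x="%s" y="%s" font-size="%s" style="fill:%s">%s</text>'
            % (x, y, font_size, fill, escape(content)))

print(svg_text(0, 15, 'I love SVG!'))
print(svg_text(0, 35, 'A & U, C & G'))   # the ampersands become &amp;
```

Single-letter nucleotide labels (A, C, G, U) contain no special characters, so the lecture script can write them directly.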
Drawing nucleotides
Each nucleotide will be represented by a one-character text enclosed in a circle.
Figure: the letter A (<text>) drawn inside a circle (<circle>).
The RNA sequence and the coordinate information are required.
seq = 'CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG'
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]
The <text> tag should be written after the <circle> tag, because SVG paints elements in document order and a later element is drawn on top of an earlier one.
for i, base in enumerate(seq):
    x, y = coor[i]
    out.append('<circle cx="%.3f" cy="%.3f" r="%.3f" style="fill:white; stroke:black; stroke-width:1"/>\n'%(x, y, 5))
    out.append('<text x="%.3f" y="%.3f" font-size="6" text-anchor="middle" style="fill:black">%s</text>\n'%(x, y+6*0.35, base))
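The nucleotide step can be sketched as a function with the radius and font size as parameters. The name `nucleotide_elements` is hypothetical, and the reading of the 0.35 factor (a downward shift of the text baseline so the letter sits near the circle's center) is an interpretation of the slide's y+6*0.35, not something the slides state.

```python
def nucleotide_elements(seq, coor, radius=5, font_size=6):
    """Return <circle> and <text> elements, circle first, for each nucleotide."""
    if len(seq) != len(coor):
        raise ValueError('seq and coor must have the same length')
    out = []
    for base, (x, y) in zip(seq, coor):
        # Circle first: the letter is painted afterwards, on top of the white fill.
        out.append('<circle cx="%.3f" cy="%.3f" r="%.3f" '
                   'style="fill:white; stroke:black; stroke-width:1"/>\n' % (x, y, radius))
        # text-anchor="middle" centers the letter horizontally on x;
        # the baseline is shifted down by 0.35 * font_size (assumed centering offset).
        out.append('<text x="%.3f" y="%.3f" font-size="%s" text-anchor="middle" '
                   'style="fill:black">%s</text>\n' % (x, y + font_size * 0.35, font_size, base))
    return out

# First two nucleotides of the microRNA data.
elements = nucleotide_elements('CC', [(69.515, 526.033), (69.515, 511.033)])
print(''.join(elements), end='')
```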
Content of the Python script
## microRNA structural data
seq = 'CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG'
dotbr = '(((.((((.(((((((((.(((((((((((.........))))))))))).))))))))).)))).)))'
pairs = [(0, 68), (1, 67), (2, 66), (4, 64), (5, 63), (6, 62), (7, 61), (9, 59), (10, 58), (11, 57), (12, 56), (13, 55), (14, 54), (15, 53), (16, 52), (17, 51), (19, 49), (20, 48), (21, 47), (22, 46), (23, 45), (24, 44), (25, 43), (26, 42), (27, 41), (28, 40), (29, 39)]
coor = [(69.515,526.033),(69.515,511.033),(69.515,496.033),(61.778,483.306),(69.515,469.506),(69.515,454.506),(69.515,439.506),(69.515,424.506),(62.691,412.302),(69.515,400.099),(69.515,385.099),(69.515,370.099),(69.515,355.099),(69.515,340.099),(69.515,325.099),(69.515,310.099),(69.515,295.099),(69.515,280.099),(61.778,266.298),(69.515,253.571),(69.515,238.571),(69.515,223.571),(69.515,208.571),(69.515,193.571),(69.515,178.571),(69.515,163.571),(69.515,148.571),(69.515,133.571),(69.515,118.571),(69.515,103.571),(56.481,95.317),(50.000,81.317),(52.139,66.039),(62.216,54.357),(77.015,50.000),(91.814,54.357),(101.891,66.039),(104.030,81.317),(97.549,95.317),(84.515,103.571),(84.515,118.571),(84.515,133.571),(84.515,148.571),(84.515,163.571),(84.515,178.571),(84.515,193.571),(84.515,208.571),(84.515,223.571),(84.515,238.571),(84.515,253.571),(92.252,266.298),(84.515,280.099),(84.515,295.099),(84.515,310.099),(84.515,325.099),(84.515,340.099),(84.515,355.099),(84.515,370.099),(84.515,385.099),(84.515,400.099),(91.339,412.302),(84.515,424.506),(84.515,439.506),(84.515,454.506),(84.515,469.506),(92.252,483.306),(84.515,496.033),(84.515,511.033),(84.515,526.033)]
out = []
out.append('<svg xmlns="http://www.w3.org/2000/svg" version="1.1">\n')

## [1] phosphate backbone - <polyline> tag
points = ' '.join(['%.3f,%.3f'%(x, y) for x, y in coor])
out.append('<polyline points="%s" style="fill:none; stroke:black; stroke-width:1;"/>\n'%(points))

## [2] base-pairing - <line> tag
for i, j in pairs:
    x1, y1 = coor[i]
    x2, y2 = coor[j]
    out.append('<line x1="%.3f" y1="%.3f" x2="%.3f" y2="%.3f" style="stroke:black; stroke-width:1;"/>\n'%(x1, y1, x2, y2))

## [3] nucleotide - <circle> and <text> tags
for i, base in enumerate(seq):
    x, y = coor[i]
    out.append('<circle cx="%.3f" cy="%.3f" r="%.3f" style="fill:white; stroke:black; stroke-width:1"/>\n'%(x, y, 5))
    out.append('<text x="%.3f" y="%.3f" font-size="6" text-anchor="middle" style="fill:black">%s</text>\n'%(x, y+6*0.35, base))

out.append('</svg>\n')
with open('rna.svg', 'w') as f:
    f.write(''.join(out))
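The script's <svg> tag has no width, height, or viewBox, so some viewers may crop a drawing whose coordinates reach beyond their default canvas. One possible variant, sketched below, computes a bounding box from coor and writes it into the <svg> tag. The function name `render_rna_svg`, the margin parameter, and the toy hairpin data are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def render_rna_svg(seq, pairs, coor, margin=10.0, radius=5.0, font_size=6):
    """Return a complete SVG document (as a string) sized to fit the structure."""
    xs = [x for x, _ in coor]
    ys = [y for _, y in coor]
    min_x, min_y = min(xs) - margin, min(ys) - margin
    width = max(xs) - min(xs) + 2 * margin
    height = max(ys) - min(ys) + 2 * margin

    out = ['<svg xmlns="http://www.w3.org/2000/svg" version="1.1" '
           'width="%.3f" height="%.3f" viewBox="%.3f %.3f %.3f %.3f">\n'
           % (width, height, min_x, min_y, width, height)]
    # [1] phosphate backbone
    points = ' '.join('%.3f,%.3f' % (x, y) for x, y in coor)
    out.append('<polyline points="%s" style="fill:none; stroke:black; '
               'stroke-width:1;"/>\n' % points)
    # [2] base-pairing
    for i, j in pairs:
        (x1, y1), (x2, y2) = coor[i], coor[j]
        out.append('<line x1="%.3f" y1="%.3f" x2="%.3f" y2="%.3f" '
                   'style="stroke:black; stroke-width:1;"/>\n' % (x1, y1, x2, y2))
    # [3] nucleotides: circle first, then the letter on top
    for base, (x, y) in zip(seq, coor):
        out.append('<circle cx="%.3f" cy="%.3f" r="%.3f" style="fill:white; '
                   'stroke:black; stroke-width:1"/>\n' % (x, y, radius))
        out.append('<text x="%.3f" y="%.3f" font-size="%s" text-anchor="middle" '
                   'style="fill:black">%s</text>\n'
                   % (x, y + font_size * 0.35, font_size, base))
    out.append('</svg>\n')
    return ''.join(out)

# Toy hairpin with made-up coordinates (not RNAplot output).
toy_seq = 'GAAAC'
toy_pairs = [(0, 4)]
toy_coor = [(20.0, 60.0), (15.0, 45.0), (27.5, 35.0), (40.0, 45.0), (35.0, 60.0)]
svg = render_rna_svg(toy_seq, toy_pairs, toy_coor)
root = ET.fromstring(svg)   # raises ParseError if the output is not well-formed
print(len(root))            # 1 polyline + 1 line + 5 circles + 5 texts = 12
```

The returned string can be written to rna.svg in the same way as in the script; calling the function with the full seq, pairs, and coor from the script should produce the same elements inside a fitted canvas.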
How to use other SVG tags? Go to w3schools.com!
Real examples with Python and SVG
reciPlot
Plot for visualizing the tissue-specific expression of genes.
SVG elements used: <text>, <polygon>
escPlot
Plot for representing expression, structure, and conservation data of RNA collectively in a single plot.
SVG elements used: <line>, <text>, <path>, <circle>, <polyline>
wheelPlot
Plot for visualizing all suboptimal RNA secondary structures.
SVG elements used: <circle>, <polyline>, <path>, <line>, <rect>, <text>
Conclusions
• There are many graphic tools and libraries for data visualization.
• These tools provide functionality that is limited to high-level graphics.
• Writing SVG documents requires no additional libraries and no significant time investment in learning a tool-specific language.
• If you want to plot a noncanonical type of graph and customize it at a low level, writing an SVG document with Python is likely the best solution for your purpose.
Thank you! Have a nice weekend.