Page 1
iTunes U Aggregator: A Rapid-Fire Walkthrough
iTunes® and its Logo are registered trademarks of Apple Inc., registered in the U.S. and other countries. tele-TASK™ is a trademark of the Hasso-Plattner-Institut für Systemtechnik GmbH. All trademarks are property of their respective owners.
Page 2
iTunes U, Django, Excel, Profiling, components, REST, templatetags, schema, visualization, trollfaces
Page 3
motivation
http://commons.wikimedia.org/wiki/File:Internet-mail.svg
Page 4
motivation
New Email
iTunes U Weekly Report for Hasso-Plattner-Institut für Systemtechnik (HPI)
http://commons.wikimedia.org/wiki/File:Internet-mail.svg
Page 8
hpi-de-public-dz-2011-10-16.xls
Page 13
hpi-de-public-dz-2011-10-16.xls
week 1 (calendar week #42)
week 2 (calendar week #43)
week 3 (calendar week #44)
week 4 (calendar week #45)
each report covers 4 weeks, but a new email arrives every week
Page 15
Browse
unique collection names?????????
Page 19
Tracks
provider collection track provider
Page 20
Tracks
provider collection track collection
Page 21
Tracks
provider collection track track
Page 24
GUIDs: globally unique, in your face
Page 26
Previews
yay: a match! (with Browse)
Page 28
Edits
double meh.
Page 29
Django to the rescue
Page 30
Django and the Django Logo are registered trademarks of Django Software Foundation.
BSD licensed
MVC driven
SERVER side
DOCUMENTED extremely well
RICH ecosystem
AWEsome
Page 32
Excel parsing schmarsing
Page 33
import xlrd
book = xlrd.open_workbook("hpi-de-public-dz-2011-10-16.xls")
summary = book.sheet_by_name("Summary")
Page 34
summary = book.sheet_by_name("Summary")
summary.cell_value(rowx=6, colx=2)  # C7 (rowx and colx are zero-based)
# => "2011-10-16"
summary.row_values(rowx=4, start_colx=2, end_colx=6)  # C5:F5
# => ["2011-10-16", "2011-10-09", ...]
Page 36
summary = book.sheet_by_name("Summary")
summary.cell_value(rowx=6, colx=2)  # C7 (rowx and colx are zero-based)
# => "2011-10-16"
summary.row_values(rowx=4, start_colx=2, end_colx=6)  # C5:F5
# => ["2011-10-16", "2011-10-09", ...]
UserActions.objects.create(date="2011-10-16", action="Browse", value=1991)
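xlrd addresses cells with zero-based rowx/colx indices, while the slide comments use spreadsheet (A1) notation. A minimal sketch of the mapping, assuming nothing beyond the standard library; the helper names a1, a1_range and column_letters are hypothetical and not part of xlrd:

```python
def column_letters(colx):
    """Convert a zero-based column index to spreadsheet letters (0 -> A, 26 -> AA)."""
    letters = ""
    colx += 1  # switch to one-based for the base-26 conversion
    while colx > 0:
        colx, remainder = divmod(colx - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def a1(rowx, colx):
    """Convert zero-based (rowx, colx) to an A1-style cell reference."""
    return "%s%d" % (column_letters(colx), rowx + 1)


def a1_range(rowx, start_colx, end_colx):
    """A1 range for a row slice; end_colx is exclusive, as in row_values()."""
    return "%s:%s" % (a1(rowx, start_colx), a1(rowx, end_colx - 1))


# The row slice from the slide: rowx=4, start_colx=2, end_colx=6
print(a1_range(4, 2, 6))  # C5:F5
```

Keeping such a helper next to the import code makes it easier to check index arithmetic against what Excel shows.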
Page 38
pyxlreader?
pyExcelerator? unmaintained & undocumented
Page 40
Resolver One: Excel meets IronPython
not suitable for Web applications
Page 42
Tracks
provider collection track
Series
Podcast: Internet Security, Anti-Virus Software, Attack Signatures, Social Hacking, ...
Sample: 2011-10-16: 60 tracks; 2011-10-09: 36 tracks; ...
Page 43
Tracks
podcast, _ = Podcast.objects.get_or_create(
    name="Internet Security", handle=6979209575)
sample, _ = Sample.objects.get_or_create(
    date="2011-10-16", podcast=podcast)
sample.tracks = 60
sample.save()
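get_or_create looks a row up first and inserts only when nothing matches, so every call costs at least one query. A rough sketch of that pattern on plain sqlite3; the table layout and the get_or_create function here are illustrative stand-ins, not Django's implementation:

```python
import sqlite3


def get_or_create(conn, name, handle):
    """Return (row_id, created): look the row up first, insert only if missing."""
    row = conn.execute(
        "SELECT id FROM podcast WHERE name = ? AND handle = ?", (name, handle)
    ).fetchone()
    if row is not None:
        return row[0], False
    cursor = conn.execute(
        "INSERT INTO podcast (name, handle) VALUES (?, ?)", (name, handle)
    )
    return cursor.lastrowid, True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE podcast (id INTEGER PRIMARY KEY, name TEXT, handle INTEGER)"
)

# The second call finds the existing row instead of inserting a duplicate.
first_id, created_first = get_or_create(conn, "Internet Security", 6979209575)
second_id, created_second = get_or_create(conn, "Internet Security", 6979209575)
print(created_first, created_second)
```

Because overlapping weekly reports repeat the same podcasts, most calls end in the lookup branch, yet each one still pays for a query.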
Page 46
import
VIRT TIME+ Command
153M 7:52.73 python manage.py runserver
aggregate sum
VIRT TIME+ Command
144M 0:19.26 python manage.py runserver
for a 2 MB database with 20k records
Page 48
caching ka-ching!
Page 49
Model.objects.create(…)  # transaction #1
# o = Model(…)
# o.save()
Model.objects.create(…)  # transaction #2
…
Page 51
from django.db import transaction
with transaction.commit_on_success():  # transaction #1 (transaction.atomic() in Django 1.6 and later)
    Model.objects.create(…)
    # o = Model(…)
    # o.save()
    Model.objects.create(…)
    …
90% sys time savings
no difference for user time
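The same idea without Django, as a sketch on sqlite3: committing after every insert versus wrapping all inserts in one transaction. The actions table and both function names are assumptions for illustration; on a file-backed database each commit forces a write to disk, which is where system time would plausibly go.

```python
import sqlite3

INSERT = "INSERT INTO actions (date, action, value) VALUES (?, ?, ?)"


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE actions (date TEXT, action TEXT, value INTEGER)")
    return conn


def insert_per_row(conn, rows):
    """One transaction per row: commit after every single INSERT."""
    for row in rows:
        conn.execute(INSERT, row)
        conn.commit()


def insert_batched(conn, rows):
    """One transaction for all rows: commit once at the end."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(INSERT, rows)


rows = [("2011-10-16", "Browse", n) for n in range(1000)]
slow, fast = make_db(), make_db()
insert_per_row(slow, rows)
insert_batched(fast, rows)
```

Both variants store the same rows; only the number of commits differs.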
Page 52
Y U NO FAST?
import logging
logging.info(...)
Page 55
Y U NO FAST?
0 [tasks:INFO] Processing ../hpi-de-public-dz-2011-10-16.xls..
2 [tasks:INFO] Processing week 2011-10-16..
2 [tasks:DEBUG] Inserting actions..
2 [tasks:DEBUG] Inserting clients..
2 [tasks:DEBUG] Opening browse sheet..
2 [tasks:DEBUG] Inserting 71 browse actions..
3 [tasks:DEBUG] Opening tracks sheet..
3 [tasks:DEBUG] Inserting 3779 tracks actions..
22 [tasks:DEBUG] Finished 500 rows.
37 [tasks:DEBUG] Finished 1000 rows.
51 [tasks:DEBUG] Finished 1500 rows.
66 [tasks:DEBUG] Finished 2000 rows.
71 [tasks:DEBUG] Finished 2500 rows.
86 [tasks:DEBUG] Finished 3000 rows.
100 [tasks:DEBUG] Finished 3500 rows.
108 [tasks:DEBUG] Opening Previews sheet.
1:48
where exactly is the time spent?
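A log in this shape (elapsed seconds, logger name, level, message) can be produced with the standard logging module. The sketch below is an assumption about how such output could be generated, not the talk's actual setup; ElapsedFormatter, make_logger and import_rows are hypothetical names:

```python
import io
import logging
import time


class ElapsedFormatter(logging.Formatter):
    """Render records as '<elapsed seconds> [<logger>:<LEVEL>] <message>'."""

    def __init__(self):
        super().__init__()
        self.start = time.time()

    def format(self, record):
        elapsed = max(0, int(record.created - self.start))
        return "%d [%s:%s] %s" % (
            elapsed, record.name, record.levelname, record.getMessage())


def make_logger(stream, name="tasks"):
    """Logger that writes elapsed-time lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(ElapsedFormatter())
    logger.handlers = [handler]
    return logger


def import_rows(rows, log, every=500):
    """Process rows and report progress every `every` rows."""
    for count, _row in enumerate(rows, 1):
        # ... insert the row here ...
        if count % every == 0:
            log.debug("Finished %d rows.", count)


buffer = io.StringIO()
import_rows(range(1200), make_logger(buffer))
print(buffer.getvalue(), end="")
```

Coarse progress lines like these show which phase is slow, but not which function; that is the job of a profiler.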
Page 59
Y U NO FAST?
python -m profile …
python -m cProfile …
python -m cProfile -s cumulative manage.py loadreport hpi-de-public-dz-2011-10-16.xls
60% of the time is spent in django.db.models.Manager.get_or_create
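The same cumulative-time view is available from inside Python via cProfile and pstats. A small sketch with made-up workload functions (slow_lookup and load_report are placeholders, not the real import code):

```python
import cProfile
import io
import pstats


def slow_lookup(n):
    """Placeholder for an expensive per-row operation."""
    return sum(i * i for i in range(n))


def load_report():
    """Placeholder for the import job: many expensive lookups."""
    return [slow_lookup(2000) for _ in range(50)]


def profile_top(func, limit=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


report = profile_top(load_report)
print(report)
```

Sorting by cumulative time puts the callers that dominate the run at the top, which is how a hot spot such as a repeated lookup becomes visible.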
Page 68
Series
Podcast: Internet Security, Anti-Virus Software, Social Hacking, Internet Memes, ...
Sample: 2011-10-16: 60 tracks, 21 previews; 2011-10-09: 12 previews; ...
Sample: 2011-10-09: 33 tracks, 12 previews
Tracks: 2011-10-09: 36
Previews: 2011-10-09: 12
50% time savings through denormalization
90% time savings through full denormalization
5% larger database through denormalization
320% larger database through full denormalization
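The slides do not show the actual schemas, so the following is only one plausible reading of the trade-off, sketched on sqlite3 with hypothetical tables: the normalized import has to resolve the podcast row before each insert, while the denormalized import repeats the podcast name in every row and skips the lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE podcast (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE sample (podcast_id INTEGER, date TEXT, tracks INTEGER);
    CREATE TABLE sample_flat (podcast_name TEXT, date TEXT, tracks INTEGER);
""")


def import_normalized(conn, name, date, tracks):
    """Normalized: look up (or create) the podcast before every insert."""
    row = conn.execute(
        "SELECT id FROM podcast WHERE name = ?", (name,)).fetchone()
    if row is None:
        podcast_id = conn.execute(
            "INSERT INTO podcast (name) VALUES (?)", (name,)).lastrowid
    else:
        podcast_id = row[0]
    conn.execute(
        "INSERT INTO sample VALUES (?, ?, ?)", (podcast_id, date, tracks))


def import_denormalized(conn, name, date, tracks):
    """Denormalized: store the podcast name in each row; no lookup needed."""
    conn.execute(
        "INSERT INTO sample_flat VALUES (?, ?, ?)", (name, date, tracks))


for date, tracks in [("2011-10-16", 60), ("2011-10-09", 36)]:
    import_normalized(conn, "Internet Security", date, tracks)
    import_denormalized(conn, "Internet Security", date, tracks)
```

The flat table saves a query per row at the cost of storing the name repeatedly, which matches the direction of the time and size figures on the slide.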
Page 69
Pokémon are copyrighted by Nintendo Co., Ltd. http://www.flickr.com/photos/darktabris/5654794283 © 2011 Sergio Cuellar
yo app’s so fat it consumes my whole memory
the dreaded MEMORY LEAK
Page 72
Series
Internet Security
Django object cache: purge via Model.objects.update()
Page 74
check with objgraph
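objgraph is a third-party package; the basic check it enables (how many objects of a type are still alive?) can be approximated with the standard gc module. A rough stand-in, where Record is a hypothetical class playing the role of a cached model instance:

```python
import gc


class Record(object):
    """Hypothetical stand-in for a model instance kept alive by a cache."""


def count_instances(cls):
    """Count live, gc-tracked objects whose exact type is cls."""
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


cache = [Record() for _ in range(100)]
before = count_instances(Record)  # instances still referenced by the cache

del cache[:]  # purge the cache
gc.collect()
after = count_instances(Record)  # nothing should keep them alive now

print(before, after)
```

If the count keeps growing across imports, something is still holding references to those objects.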
Page 78
view
from django.views.decorators.cache import cache_page
@cache_page(604800)  # timeout in seconds (assumed: one week, as in the template fragment)
def teletask_series(request, id):
    ...
template
{% load cache %}
{% cache 604800 views %}
{{ views }}
{% endcache %}
low-level
from django.core.cache import cache
cache.set('views', 60)
cache.get('views')
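All three levels rely on the same mechanism: store a value under a key and let it expire after a timeout. A minimal in-memory sketch of that mechanism; TTLCache is a hypothetical class for illustration, not Django's cache backend, and the injectable clock exists only to make expiry easy to demonstrate:

```python
import time


class TTLCache(object):
    """Tiny in-memory cache with per-key expiry."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._store = {}

    def set(self, key, value, timeout):
        """Store value under key for `timeout` seconds."""
        self._store[key] = (value, self._clock() + timeout)

    def get(self, key, default=None):
        """Return the cached value, or default if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return default
        return value


now = [0]  # fake clock so that a week can pass instantly
cache = TTLCache(clock=lambda: now[0])
cache.set("views", 60, timeout=604800)  # one week, as in the template fragment
fresh = cache.get("views")
now[0] = 604800
expired = cache.get("views")
print(fresh, expired)
```

A one-week timeout fits data that changes only when the next weekly report is imported.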
Page 83
<h1>{{ name }}</h1>
<p>{{ description }}</p>
<a href="itunesu/{{ id }}">iTunes U stats</a>
{% for track in tracks %}
<h2>{{ track.name }}</h2>
{% endfor %}

<h1>{{ name }}</h1>
<p>Total Views: {{ views }}</p>

two different pages
Page 87
<h1>Internet Security</h1>
<p>{{ description }}</p>
<span id="totalviews">{{ views }}</span>
{% for track in tracks %}
<h2>{{ track.name }}</h2>
{% endfor %}
the controller must be adapted to pass the views variable
Page 89
{% load itunesuagg %}
<h1>Internet Security</h1>
<p>{{ description }}</p>
<span id="totalviews">{% viewcount for name %}</span>
{% for track in tracks %}
<h2>{{ track.name }}</h2>
{% endfor %}
or: REST API
Page 93
Third-party: Google Chart API
Client (interaction): YUI, Google Chart Tools, Flot, Highcharts
Server (performance): Matplotlib, Cairo
Page 96
from pygooglechart import PieChart3D
chart = PieChart3D(250, 100)
chart.add_data([20, 10])
chart.set_pie_labels(['Hello', 'World'])
print(chart.get_url())
Google Chart API
http://chart.apis.google.com/chart?cht=p3&chs=250x100&chd=s:pU&chl=Hello|World
so what about large datasets?
Page 101
from pygooglechart import SimpleLineChart
chart = SimpleLineChart(200, 125)
data = [32, 34, 34, 32, 34, 34, 32, 32, 32, 34, 34, 32, 29, 29, 34, 34, 34, 37,
        37, 39, 42, 47, 50, 54, 57, 60, 60, 60, 60, 60, 60, 60, 62, 62, 60, 55,
        55, 52, 47, 44, 44, 40, 40, 37, 34, 34, 32, 32, 32, 31, 32]
chart.add_data(data)
...
print(chart.get_url())
Google Chart API
http://chart.apis.google.com/chart?cht=lc&chs=200x125&chd=e:UeVwVwUeVwVwUeUeUeVwVwUeSkSkVwVwVwXrXrY9a4eFgAijkemZmZmZmZmZmZmZnrnrmZjMjMhReFcKcKZmZmXrVwVwUeUeUeT1Ue&chco=0000FF&chf=c,ls,0,CCCCCC,0.2,FFFFFF,0.2&chxt=y,x&chxl=0:||25|50|75|100|1:|Jan|Feb|Mar|Apr|May|Jun&chg=0,25,5,5
2 KB URI length limitation
solution: POST it
16 KB limitation; not usable in <img>
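Every data point costs two characters in the chd=e: parameter, so the URL grows linearly with the dataset. The sketch below reimplements the Chart API's extended encoding (values scaled to 0..4095, two characters from a 64-character alphabet) and checks the URL length. The function names are hypothetical, the 0..100 value range is an assumption taken from the y-axis labels in the URL above, and 2048 stands in for the "2 KB" limit:

```python
EXTENDED_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-."
)
MAX_GET_URL_LENGTH = 2048  # the "2 KB" limit named on the slide


def extended_encode(values, max_value=100):
    """Encode values as 'e:' data: scale to 0..4095, two characters each."""
    pairs = []
    for value in values:
        scaled = int(round(float(value) / max_value * 4095))
        scaled = max(0, min(4095, scaled))
        high, low = divmod(scaled, 64)
        pairs.append(EXTENDED_ALPHABET[high] + EXTENDED_ALPHABET[low])
    return "e:" + "".join(pairs)


def chart_url(values, width=200, height=125):
    """Build a line chart URL carrying the encoded data."""
    return "http://chart.apis.google.com/chart?cht=lc&chs=%dx%d&chd=%s" % (
        width, height, extended_encode(values))


def fits_in_get(url):
    """True if the URL stays within the GET length limit."""
    return len(url) <= MAX_GET_URL_LENGTH


print(extended_encode([32, 34, 29]))  # e:UeVwSk
print(fits_in_get(chart_url([32] * 50)))
print(fits_in_get(chart_url([32] * 2000)))
```

A few hundred points still fit; a few thousand do not, which is when POST or a client-side library becomes necessary.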
Page 103
http://www.highcharts.com/demo/line-basic
Highcharts (SVG)
var chart;
$(document).ready(function() {
  chart = new Highcharts.Chart({
    chart: { renderTo: 'container' },
    series: [{
      name: 'Tokyo',
      data: [7.0, 6.9, 9.5, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3, 18.3, 13.9, 9.6]
    }]
  });
});
Page 104
http://www.highcharts.com/demo/line-basic
Highcharts (SVG)
What about license? Creative Commons Attribution-NonCommercial 3.0 License
Page 106
Flot (Canvas)
1000 points is not a problem, but as soon as you start having more points than the pixel width, you should probably start thinking about downsampling/aggregation.
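One simple form of such downsampling is to average consecutive buckets until no more points remain than the plot has pixels. A minimal sketch, assuming evenly spaced samples; downsample is a hypothetical helper, not part of Flot:

```python
def downsample(points, width):
    """Average consecutive buckets so that at most `width` points remain."""
    if width <= 0:
        raise ValueError("width must be positive")
    points = list(points)
    if len(points) <= width:
        return points
    bucket_size = -(-len(points) // width)  # ceiling division
    result = []
    for start in range(0, len(points), bucket_size):
        bucket = points[start:start + bucket_size]
        result.append(sum(bucket) / float(len(bucket)))
    return result


print(downsample(range(10), 5))  # [0.5, 2.5, 4.5, 6.5, 8.5]
print(len(downsample(range(1000), 300)))
```

Averaging smooths out spikes; keeping each bucket's minimum and maximum instead would preserve them at twice the point count.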
Page 107
Third-party
Client (interaction)
Server (performance)
Page 109
ready for primetime
Page 112
No file chosen Choose File Upload!
14,248 data sets imported.
Page 115
insights
get_or_create is dangerous
denormalization is key
profiling can be fun
licensing is hard
thanks for your attention.
always drink your milk.