Hassle-Free Data Science Apps with Bokeh
Presenters
Peter Wang is the CTO and Co-founder of Continuum Analytics and the creator of Bokeh.
He has been developing commercial scientific computing and visualization software for over 15 years.
As a creator of the PyData conference, he devotes time and energy to growing the Python data
community, and advocating and teaching Python at conferences worldwide.
Bryan Van de Ven is the lead developer on the Bokeh project.
He holds an undergraduate degree in Computer Science & Mathematics from UT Austin, and a Master's degree in Physics from UCLA.
Previously Bryan developed data exploration and visualization software for sonar feature detection, financial risk modeling, and fluid mixing simulation.
Overview
• What is Bokeh?
• Overview and tour of major features
• Demo 1: Scikit-learn clustering
• Demo 2: Gapminder
• Demo 3: Streaming data
• Really big data: Preview of data shading
• Q&A
Overview of Anaconda
Anaconda is the modern open source analytics platform powered by Python, the fastest growing open data science language
• Easy to Build, Maintain & Deploy Analytics
• Talks with Everything, Runs Anywhere
• High Performance, Scalable Analytics
Anaconda
Accelerating Adoption of Python for Enterprises
• COLLABORATIVE NOTEBOOKS with publication, authentication, & search: Jupyter/IPython
• PYTHON & PACKAGE MANAGEMENT for Hadoop & Apache stack: Spark
• PERFORMANCE with compiled Python for lightning fast execution: Numba
• VISUAL APPS for interactivity, streaming, & Big: Bokeh
• SECURE & ROBUST REPOSITORY of data science libraries, scripts, & notebooks: Conda
• ENTERPRISE DATA INTEGRATION with optimized connectors & out-of-core processing: NumPy & Pandas
Anaconda for Data Science
Empowering Everyone on the Team
Data Scientist
• Advanced analytics with Python & R
• Simplified library management
• Easily share data science notebooks & packages
Developer
• Support for common APIs & data formats
• Common language with data scientists
• Python extensibility with C, C++, etc.
Business Analyst
• Collaborative interactive analytics with notebooks
• Rich browser based visualizations
• Powerful MS Excel integration
Data Engineer
• Powerful & efficient libraries for data transformations
• Robust processing for noisy dirty data
• Support for common APIs & data formats
Ops
• Validated source of up-to-date packages including indemnification
• Agile Enterprise Package Management
• Supported across platforms
Computational Scientist
• Rich set of advanced analytics
• Trusted & production ready libraries for numerics
• Simplified scale up & scale out on clusters & GPUs
Modern Analytics Stack
Write Once, Deploy Anywhere
MANAGED PYTHON
• Explore & Visualize
• Python & R Advanced Analytics
• High Performance & Scalability
• Data Engineering & Analysis
• Collaboration & Integration
Servers: Linux, Windows, OSX
GPUs & High End Workstations: Linux & Windows; NVIDIA, AMD, X86/ARM
Clusters: Yarn, Mesos, MPI, Power8, LSF, Sun Grid Engine
NoSQL: MongoDB, Cassandra/DataStax
Hadoop: Cloudera, Hortonworks, Apache Hadoop & Spark
Files: Microsoft Excel, Trifacta, Import.io
DW & SQL: Any SQL DB, Any SQL DW, Impala
Bokeh Overview & Tour
Bokeh
http://bokeh.pydata.org
• Interactive visualization
• Novel graphics
• Streaming, dynamic, large data
• For the browser, with or without a server
• No need to write Javascript
Versatile Plots
Novel Graphics
Linked Plots (Notebook 2)
• Easy to show multiple plots and link them
• Easy to link data selections between plots
• Can easily customize the kind of linkage straight from Python, without needing to fiddle around with JS
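As a rough illustration of the linking described above, the sketch below shares one x range and one data source between two plots. The sample values and variable names are made up for illustration; `figure`, `ColumnDataSource`, and `gridplot` are Bokeh's own, though details may vary between Bokeh versions.

```python
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up sample data; both plots read from the same source.
source = ColumnDataSource(data=dict(
    x=[1, 2, 3, 4, 5],
    y0=[2, 5, 3, 8, 6],
    y1=[9, 4, 7, 1, 5],
))

tools = "pan,wheel_zoom,box_select,reset"

left = figure(title="y0 vs x", tools=tools)
left_renderer = left.scatter("x", "y0", source=source)

# Reusing left.x_range links panning and zooming along x.
right = figure(title="y1 vs x", tools=tools, x_range=left.x_range)
right_renderer = right.scatter("x", "y1", source=source)

# Sharing `source` links selections: box-selecting points in one plot
# highlights the same rows in the other.
layout = gridplot([[left, right]])
```

Passing `layout` to `bokeh.plotting.show` should open both plots side by side in the browser or notebook.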
Flexible Tools (Notebook 3)
• Many useful tools with built-in functionality
• Easy to extend with Javascript, if so inclined
rBokeh
http://hafen.github.io/rbokeh
Plays well with R ecosystem: HTMLwidget, RMarkdown…
rBokeh with RStudio & Shiny
Architecture
Traditional Web Visualization
[Diagram labels: Data (CSV, SQL); Server-side Data Processing (Python, Java, etc.); HTML; CSS; Javascript; JavaScript Plotting library (D3, Highcharts, Flot, nvd3, dcjs); Browser]
Tech:
• Python/R/Java
• HTML & browser compat
• CSS/LESS/Sass
• JS plotting library API
• Javascript
• jQuery, underscore
• svg, canvas2D
• webGL, three.js
• React
• Angular
• node.js, browserify, gulp, grunt, npm, …
Traditional Web Viz - Interaction
[Diagram labels: User; HTML, CSS, Javascript; Server (Python, Ruby, Java, .NET); Data; Data’]
Simple dashboard: Server language generating HTML, JS, CSS styling, subset of data
Handling user interaction: Custom Javascript, calling Server endpoint, which generates updated JSON or JS that gets pushed back to client via websocket
Bokeh Conceptual Architecture
[Diagram labels: User; Server (Bokeh; Python, R, Scala); JSON; Client (BokehJS; HTML, CSS); Data]
Simple dashboard: Single language, no need to write HTML, JS, CSS
Handling user interaction: Single language that you already know; interactive data updates feel seamless to the user
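The idea of a Python-side object model that is serialized to JSON for a JavaScript client can be sketched with the standard library alone. The classes and field names below (`Plot`, `Glyph`, `kind`, `data`, `glyphs`) are invented for illustration and are not Bokeh's actual wire format; the "client" is simulated in Python.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Glyph:
    """Hypothetical description of one visual mark and the columns it reads."""
    kind: str
    x: str
    y: str


@dataclass
class Plot:
    """Hypothetical declarative plot: data columns plus glyphs referencing them."""
    title: str
    data: dict
    glyphs: list = field(default_factory=list)


plot = Plot(
    title="demo",
    data={"t": [0, 1, 2], "value": [3.0, 1.5, 4.2]},
    glyphs=[Glyph(kind="line", x="t", y="value")],
)

# Server side: turn the object graph into a JSON message for the client.
message = json.dumps(asdict(plot))

# Client side (simulated here in Python): decode the message and resolve
# the glyph's column names against the data it came with.
decoded = json.loads(message)
first = decoded["glyphs"][0]
points = list(zip(decoded["data"][first["x"]], decoded["data"][first["y"]]))
```

The point of the design is that the user only ever touches the Python objects; everything the browser needs travels in the JSON message.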
Comparison Chart
Traditional web visualization (Server: Python, Ruby, Java, .NET; Client: CSS; Data)
• Skills required: 5-10 skills
• Time to market: weeks to months
• Server code: 100s to 1000s of lines
Bokeh (Server: Python, R, Bokeh; Client: BokehJS; Data)
• Skills required: ~1 skill
• Time to market: minutes
• Server code: 0
Some Bokeh Users
Community & Adoption
GitHub
• 3500+ watchers
• 680 forks
Mailing list
• 400+ members
• 150+ posts in November
Downloads
• 21,500 / month (conda)
• 10,000 / month (pip)
Demo: Clustering with Scikit-learn
Demo Overview
In this demo, we will build a basic application which lets us visualize different kinds of clustering approaches with Scikit-learn.
• We will use a drop-down to select the algorithm
• We will write a Python handler function which responds to the user action, and pushes an update to the plot in the browser.
• Notebook for basic viz: ~25 LOC
• Example app with 1 dropdown: < 100 LOC
• Multiple dropdown and sliders: < 200 LOC
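A minimal sketch of the drop-down plus Python handler pattern might look like the following. It is not the demo's code: the two "algorithms" are toy stand-ins for scikit-learn estimators (so the sketch needs only Bokeh), and the data, names, and colors are made up. `Select`, `on_change`, `ColumnDataSource`, and `curdoc` are Bokeh's own.

```python
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure

# Toy 2-D points (made up for illustration).
xs = [0.1, 0.3, 0.2, 0.9, 1.1, 1.0]
ys = [1.0, 0.2, 0.9, 0.1, 1.2, 0.3]


def split_on_x(xs, ys):
    """Stand-in 'clusterer': label each point by its side of the mean x."""
    mid = sum(xs) / len(xs)
    return [int(x > mid) for x in xs]


def split_on_y(xs, ys):
    """Stand-in 'clusterer': label each point by its side of the mean y."""
    mid = sum(ys) / len(ys)
    return [int(y > mid) for y in ys]


ALGORITHMS = {"split on x": split_on_x, "split on y": split_on_y}
PALETTE = ["navy", "firebrick"]


def colors_for(name):
    """Run the chosen stand-in algorithm and map labels to colors."""
    return [PALETTE[label] for label in ALGORITHMS[name](xs, ys)]


source = ColumnDataSource(data=dict(x=xs, y=ys, color=colors_for("split on x")))
plot = figure(title="split on x")
plot.scatter("x", "y", color="color", size=10, source=source)

select = Select(title="Algorithm", value="split on x", options=list(ALGORITHMS))


def on_algorithm_change(attr, old, new):
    """Handler called with the new drop-down value; updates the plot's data."""
    source.data = dict(x=xs, y=ys, color=colors_for(new))
    plot.title.text = new


select.on_change("value", on_algorithm_change)
curdoc().add_root(column(select, plot))
```

Saved as a script (say `app.py`, a hypothetical name), this would be run with `bokeh serve app.py`; replacing the stand-in functions with real scikit-learn estimators is left to the application.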
Demo: Gapminder
Demo Overview
This demo shows how we can embed a little bit of Javascript to make a server-less but very capable interactive visualization.
• We will build up the visualization from the ground up, showing different kinds of Bokeh plotting primitives
• We will do it inside the Jupyter Notebook, so we can see our changes immediately
• Then we will wire up an interactive slider
The resulting interactive visualization will be embedded in the browser, with no reliance on a server to handle user interactions.
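The server-less slider idea can be sketched roughly as follows, assuming a small made-up dataset rather than the Gapminder data. All values are shipped to the browser up front, and a short `CustomJS` snippet swaps the displayed column when the slider moves. The JavaScript body is an assumption modelled on the pattern used in Bokeh's documented slider examples; the variable names are illustrative.

```python
from bokeh.embed import file_html
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure
from bokeh.resources import CDN

# Made-up values: one list of y values per year.
years = [1960, 1980, 2000]
xs = [1, 2, 3]
ys_by_year = [[3, 1, 2], [4, 2, 5], [6, 5, 7]]

# The plot reads only the "x" and "y" columns of this source.
shown = ColumnDataSource(data=dict(x=xs, y=ys_by_year[0]))

plot = figure(title="Values by year", y_range=(0, 8))
plot.scatter("x", "y", size=12, source=shown)

slider = Slider(title="Year", start=1960, end=2000, step=20, value=1960)

# This snippet runs in the browser whenever the slider moves; no server
# is involved because every year's data is already on the page.
callback = CustomJS(
    args=dict(shown=shown, years=years, xs=xs, ys_by_year=ys_by_year),
    code="""
    const i = years.indexOf(cb_obj.value)
    shown.data = { x: xs, y: ys_by_year[i] }
""",
)
slider.js_on_change("value", callback)

# Produce one standalone HTML document containing the plot and slider.
html = file_html(column(slider, plot), CDN, "Slider sketch")
```

Writing `html` to a file gives a page that stays interactive when opened directly from disk, at the cost of embedding every year's data in the page.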
Demo: Animation & Streaming example
Demo Overview
In this demo, we will demonstrate how the Bokeh server makes it easy to visualize streaming and dynamic data.
• A minimal example with < 50 LOC
• Demonstrates ease of pushing data from Python code into the browser
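One way such a streaming app could be structured is sketched below, assuming a synthetic sine wave as the data feed. The names `ROLLOVER`, `ticks`, and `update` are illustrative; `ColumnDataSource.stream` and `add_periodic_callback` are Bokeh's own.

```python
import itertools
import math

from bokeh.io import curdoc
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

ROLLOVER = 100  # keep only the most recent 100 points

source = ColumnDataSource(data=dict(t=[], value=[]))
plot = figure(title="Streaming sine wave")
plot.line("t", "value", source=source)

ticks = itertools.count()  # 0, 1, 2, ... one tick per update


def update():
    """Append one new sample and drop the oldest once past ROLLOVER."""
    step = next(ticks)
    new_rows = dict(t=[step], value=[math.sin(step / 10)])
    source.stream(new_rows, rollover=ROLLOVER)


doc = curdoc()
doc.add_root(plot)
doc.add_periodic_callback(update, 100)  # call update() every 100 ms
```

Run under `bokeh serve`, the periodic callback keeps appending points and the plot in the browser scrolls along; in a real application `update` would read from the actual data feed.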
• Realtime audio sampling via PyAudio, realtime FFT via NumPy
• 30 fps
• ~200 lines of code
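The per-frame FFT step can be sketched with NumPy as below. To keep the sketch runnable without audio hardware, the PyAudio capture is replaced by a synthetic 500 Hz tone; the sample rate, frame size, and the use of a Hann window are assumptions, not details from the demo.

```python
import numpy as np

SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME_SIZE = 1024   # samples per frame (assumed)


def spectrum(frame, sample_rate):
    """Return (frequencies in Hz, magnitudes) for one real-valued frame."""
    window = np.hanning(len(frame))  # taper the frame to reduce leakage
    magnitudes = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs, magnitudes


# Synthetic stand-in for one captured audio frame: a pure 500 Hz tone.
t = np.arange(FRAME_SIZE) / SAMPLE_RATE
frame = np.sin(2 * np.pi * 500.0 * t)

freqs, magnitudes = spectrum(frame, SAMPLE_RATE)
peak_hz = freqs[np.argmax(magnitudes)]
```

In a streaming app, `spectrum` would be called once per captured frame and its output pushed into the plot's data source.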
Bokeh: Progress and Future
Visualizing Big Data: Preview of “Data Shading”
Billions and billions…
Data Shading Main Points
• When trying to visualize millions of points, browser vs. rich client doesn’t really matter
• Raft of common problems that are ignored: overdraw, over- & under-saturation, clipping, coarse binning
• Statistical transformations of data are a first-class aspect of the visualization
• Rapid iteration of visual styles & configs, interactive selections and filtering are key concerns in data exploration
When data is large, you don’t know when the viz is lying.
Data Shading Pipeline
[Diagram labels: Data; Project / Synthesize; Scene; Aggregates; Sample / Raster; Transfer; Image; Visual Abstraction; Data Transforms; Visual Mappings; View Transforms; Data Tables; Source Data; Views; Selection; Aggregation; Transfer; Significant Set; Aggregates]
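The aggregation and transfer stages named in the pipeline can be illustrated with a small pure-Python sketch. This is not the data shading implementation previewed in the talk: the function names, the 8x8 grid, the random test data, and the choice of a logarithmic transfer function are all assumptions made for illustration.

```python
import math
import random


def aggregate(points, width, height, x_range, y_range):
    """Aggregation stage: count how many points fall in each pixel bin."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if x0 <= x < x1 and y0 <= y < y1:
            col = min(int((x - x0) / (x1 - x0) * width), width - 1)
            row = min(int((y - y0) / (y1 - y0) * height), height - 1)
            grid[row][col] += 1
    return grid


def log_transfer(grid):
    """Transfer stage: map counts to 0..255 intensities on a log scale.

    A log scale keeps sparse bins visible without letting the densest
    bins saturate everything else.
    """
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    scale = 255 / math.log1p(peak)
    return [[round(math.log1p(count) * scale) for count in row] for row in grid]


# Made-up data: 100,000 points from a 2-D Gaussian.
random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]

counts = aggregate(points, 8, 8, (-4, 4), (-4, 4))
image = log_transfer(counts)
```

Because the image is computed from per-bin counts rather than by drawing each point, overdraw cannot hide structure, and swapping `log_transfer` for another transfer function changes the look without touching the aggregation.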
Anaconda Subscriptions and Resources
Anaconda Subscriptions
ANACONDA: Open Source Modern Analytics Platform Powered by Python. Community Support. Free forever. Download.
ANACONDA PRO: Anaconda with Support & Indemnification. Priority 1 support. Starting at $10,000 per year, + $1,000 per year for additional users. Contact us.
ANACONDA WORKGROUP: Anaconda with High Performance and Team Collaboration. Priority 1 support. Starting at $30,000 per year, + $3,000 per year for additional users. Contact us.
ANACONDA ENTERPRISE: Anaconda with Scalable High Performance and Team Collaboration. Priority 1 support with Dedicated Customer Support Rep. Starting at $60,000 per year, + $6,000 per year for additional users. Contact us.
Contact Information and Additional Details
• Contact [email protected] for more information about Anaconda subscriptions, consulting, or training
• View documentation and examples at bokeh.pydata.org
• View demo notebooks on Anaconda Cloud: notebooks.anaconda.org/pwang/
Thank you
Email: [email protected]
Twitter: @ContinuumIO
Peter Wang
Twitter: @pwang
Bokeh
Twitter: @bokehplots