R Meets Julia. Dr. Chris Musselle – Consultant, cmusselle@mango-solutions.com

Dec 27, 2015

Page 1: Dr. Chris Musselle – Consultant cmusselle@mango-solutions.com R Meets Julia Dr Chris Musselle.


R Meets Julia

Dr Chris Musselle


Outline

• Julia – What, So What, When?
• Julia – Where it's currently at
• Julia and R
• Case Study: Calculating String Similarity


- julialang.org

• A flexible dynamic language appropriate for scientific and numerical computing.

• Arrived Feb 2012 after 2 years of development at MIT.
• Julia 0.3 released Aug 2014.
• Free and open source (MIT Licensed)


Language Features

• Performance comparable to compiled languages.
• Designed with distributed computing in mind.
• Dynamic typing, optional type declarations, multiple dispatch.
• Libraries written in Julia, git-based package management.
• Direct calling of C and Fortran libraries.
• Interactive REPL (“Read-Eval-Print Loop”)


The Vision

“We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as MATLAB, as good at gluing programs together as the shell.

… something that provides the distributed power of Hadoop - without the kilobytes of boilerplate Java and XML”

--- Julia’s Authors

Source: http://julialang.org/blog/2012/02/why-we-created-julia/


Too Good to be True?

• Scientific computing, though it requires high performance, has shifted towards dynamic languages.
  • More productive.
  • Human time is more expensive than CPU time.

• Many advancements in compiler techniques and language design over the years e.g. JIT.

• Can now greatly mitigate the performance trade-off associated with a dynamic language.

• But has required building from the ground up.


So How Fast is Fast?

Source: http://julialang.org/benchmarks/


Where’s Julia at now?

• Standard Library
  • Core Syntax, Collections and Data Structures
  • Linear Algebra, BLAS, Sparse Matrices
  • Package Manager
  • Graphics
  • Unit and Functional Testing
  • Profiling

• External Packages
  • Total of 384 external packages written by 138 primary authors.
  • http://pkg.julialang.org/


Who Uses it?

• JuliaLang – The Core language
• JuliaStats – Statistics
• JuliaOpt – Numerical Optimization Library
• JuliaSparse – Sparse Matrix Solvers
• JuliaDiff – Differentiation Tools

• JuliaWeb – Web stack tools
• JuliaGPU – GPU computing

• JuliaQuant – Financial Analysis Libraries
• JuliaAstro / JuliaQuantum – Astronomy/Physics/Chemistry


When to Use it?

• Julia allows fast prototyping of code that is also fast to execute.
• Best used to code up bespoke algorithms.
• The Julia ecosystem is in its infancy; the majority of packages focus on numerical computation.
• May need to re-implement ‘tools’ from scratch, e.g. parsers, data structures, algorithms.


Julia and R?

• Calling R from Julia: https://github.com/lgautier/Rif.jl

• Calling Julia from R:
  • System calls – New session each time
  • https://github.com/armgong/RJulia


Case Study: String Similarity (Edit Distance)

• The minimum number of “edit” operations needed to turn one string into another, where an edit is:
  • An insertion
  • A deletion
  • A substitution

• E.g. edits between “kitten” and “sitting”:
  • Substitute “s” for “k” at position 1
  • Substitute “i” for “e” at position 5
  • Insert “g” at the end (after position 6)
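
As a minimal sketch (not taken from the slides), the three example edits can be replayed one at a time in Python, one of the languages used later in the case study. The variable names are illustrative, and positions are counted from 1 as on the slide.

```python
# Replay the three example edits that turn "kitten" into "sitting".
word = "kitten"
steps = [word]

# 1. Substitute "s" for "k" at position 1.
word = "s" + word[1:]
steps.append(word)

# 2. Substitute "i" for "e" at position 5 (index 4 when counting from 0).
word = word[:4] + "i" + word[5:]
steps.append(word)

# 3. Insert "g" at the end of the string.
word = word + "g"
steps.append(word)

print(" -> ".join(steps))  # kitten -> sitten -> sittin -> sitting
```

Three edits are needed here, so the edit distance between the two words is 3.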


Case Study: String Similarity (Edit Distance)

• This particular formulation is known as the Levenshtein Distance.

• Used the optimised “dynamic programming” approach.
• Pseudocode available at http://en.wikipedia.org/wiki/Levenshtein_distance
• Applications
  • Spell checking
  • Computational Biology
  • Natural Language Processing
  • Speech Recognition
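
The dynamic programming approach can be sketched roughly as follows. This is an illustrative pure-Python version that keeps only two rows of the distance table; it is not the code benchmarked in the talk, and the function name `levenshtein` is an assumption made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programming table."""
    if len(a) < len(b):
        a, b = b, a  # keep the stored rows as short as possible

    # previous[j] holds the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))

    for i, char_a in enumerate(a, start=1):
        # current[0] is the distance between a[:i] and the empty string.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current

    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping two rows rather than the full table reduces memory from O(len(a) × len(b)) to O(min(len(a), len(b))), while the running time stays proportional to the product of the two lengths.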


Case Study: String Similarity (Edit Distance)

• Compared 5 different approaches:
  • R_lev - Written purely in R.
  • R_adist - Using the built-in adist function in R
  • Julia – Written purely in Julia
  • Python_np_lev – Written in Python (using numpy)
  • Python_c_lev – Python wrapper to a C function


Results


Results (minus R lev)


Key Results

• Pure R implementation was over 10 times slower than adist and Python, and 33 times slower than Julia.

• Found Julia 2.5 to 3 times faster than Python and R
• Reading line by line <<< Reading in all at once
• Python + numpy ~ R’s built-in adist


Summary

• Julia – Certainly has great potential
  • Strengths – Numerical computation in a dynamic “REPL” language with clean syntax
  • Weaknesses – Playing catch-up with tools and libraries.

• Early days for integration with other languages.
  • Julia → other languages is good though.

• Don’t prototype your next algorithm in R if speed matters!

• Found Julia 2.5 to 3 times faster than Python and R


Thank You For Your Attention

Any Questions?

- julialang.org
Calling R from Julia: https://github.com/lgautier/Rif.jl
Calling Julia from R: https://github.com/armgong/RJulia
Edit distance: http://en.wikipedia.org/wiki/Levenshtein_distance


What’s Next?

• Accepted GSoC projects 2014
  • Libgit2 support
  • Linear algebra for generic types
  • Julia + Light Table – IDE development
  • IJulia Interactive Widgets
  • 3D Visualization Package for Julia