Computer Vision - now working in over 2 Billion Web Browsers!

Rob Manson, CEO & co-founder

Sebastian Montabone, Computer Vision Engineer

Mixed Reality. In the web. On any device. https://try.awe.media

So what is Mixed Reality?

Here’s a short demo of Milgram’s Mixed Reality Continuum - all running in a browser.

awe.media

http://www.apple.com

A brief/biased history of Computer Vision

1957 - Russell A. Kirsch scans the first photo with a computer

1963 - Larry Roberts publishes his PhD thesis at MIT

1964 - First facial recognition system (unnamed intelligence agency)

1976 - UK Police create first License Plate recognition system

1978 - David Marr proposes edge detection framework at MIT

1985 - Lockheed Martin/Carnegie Mellon create first self-driving land vehicle

1992 - Tom Caudell at Boeing coins the term Augmented Reality

1999 - Billinghurst & Kato publish/demo ARToolkit at IWAR/SIGGRAPH

2000 - Windows-only alpha version of OpenCV launched at CVPR

2007 - OpenCV 1.0 released

2008 - ARToolkit ported to Flash by @saqoosha

2011 - ARToolkit ported to JavaScript by Ilmari Heikkinen

2011 - FastCV/Vuforia 1.0 released

2017 - Facebook adds Computer Vision to their camera app

2017 - OpenCV in the browser demonstrated here by awe.media

How does Computer Vision work in the browser?

camera -> getUserMedia (gUM) -> video -> canvas -> pixels -> vision algorithms
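The video -> canvas -> pixels -> algorithm part of that chain can be sketched for a single frame as below. This is a minimal sketch, not awe.media's code: `VideoFrameSource`, `Canvas2D`, `PixelBuffer` and `processFrame` are hypothetical names, and the interfaces mirror only the `HTMLVideoElement` and `CanvasRenderingContext2D` members used (`videoWidth`, `videoHeight`, `drawImage`, `getImageData`) so the logic can be exercised without a DOM.

```typescript
// Only the members this sketch needs from HTMLVideoElement.
interface VideoFrameSource {
  videoWidth: number;
  videoHeight: number;
}

// Same shape as the ImageData returned by getImageData: RGBA, 4 bytes per pixel.
interface PixelBuffer {
  data: Uint8ClampedArray;
  width: number;
  height: number;
}

// Only the members this sketch needs from CanvasRenderingContext2D.
interface Canvas2D {
  drawImage(source: VideoFrameSource, dx: number, dy: number, dw: number, dh: number): void;
  getImageData(sx: number, sy: number, sw: number, sh: number): PixelBuffer;
}

// A "vision algorithm" here is any function that consumes one RGBA frame.
type VisionAlgorithm<R> = (frame: PixelBuffer) => R;

// video -> canvas -> pixels -> vision algorithm, for one frame.
// Returns null until the video has reported its dimensions.
function processFrame<R>(
  video: VideoFrameSource,
  ctx: Canvas2D,
  algorithm: VisionAlgorithm<R>,
): R | null {
  const { videoWidth: w, videoHeight: h } = video;
  if (w === 0 || h === 0) return null;
  ctx.drawImage(video, 0, 0, w, h); // paint the current video frame onto the canvas
  const frame = ctx.getImageData(0, 0, w, h); // read it back as raw RGBA bytes
  return algorithm(frame);
}
```

In a page you would size the canvas to `video.videoWidth` × `video.videoHeight` and call `processFrame` once per `requestAnimationFrame` tick. `getImageData` copies the pixels on every call, which is one reason smaller capture resolutions keep the loop fast.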

HTMLVideoElement

This is a container for decoding and presenting video streams. It brought plugin-free video to the web.
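A live camera stream is presented through the same element. The sketch below hands a stream to a video element for inline playback; `InlineVideo` and `attachStream` are hypothetical names, and the interface lists only the `HTMLVideoElement` members used (`srcObject`, `muted`, `playsInline`, `play()`). The muted/inline settings are common practice for mobile browsers, offered here as an assumption rather than a requirement of every platform.

```typescript
// Only the members this sketch needs from HTMLVideoElement.
interface InlineVideo {
  srcObject: unknown;
  muted: boolean;
  playsInline: boolean;
  play(): Promise<void>;
}

// Hand a live stream (e.g. from getUserMedia) to a <video> element, which then
// decodes and presents its frames like any other video, no plugin required.
async function attachStream(video: InlineVideo, stream: unknown): Promise<void> {
  video.srcObject = stream; // a MediaStream can be assigned directly, no URL needed
  video.muted = true; // muted playback is generally allowed to start without a user gesture
  video.playsInline = true; // ask iOS Safari to play inline rather than fullscreen
  await video.play();
}
```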

Canvas, WebGL & the ArrayBuffer

The 2D Canvas gave us the ability to convert a video stream into pixel data.

WebGL brought 3D canvases with access to the GPU. But most importantly, WebGL gave us typed arrays (ArrayBuffer), which gave us fast, direct access to the raw pixel data.
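To make "pixel data" concrete: `ImageData.data` from the 2D canvas is a `Uint8ClampedArray`, a typed-array view over an `ArrayBuffer`, holding 4 bytes per pixel (R, G, B, A) row by row. Many vision algorithms start from a single-channel image, so converting to grayscale is a common first step. A minimal sketch; `rgbaOffset` and `toGrayscale` are hypothetical helper names, and the ITU-R BT.601 luma weights are one common choice among several.

```typescript
// Byte offset of pixel (x, y) in a row-major RGBA buffer (4 bytes per pixel).
function rgbaOffset(x: number, y: number, width: number): number {
  return (y * width + x) * 4;
}

// Convert RGBA bytes (as in ImageData.data) to one luma byte per pixel using
// the BT.601 weights. Alpha is ignored. Writing into a Uint8ClampedArray
// rounds and clamps each value to 0..255.
function toGrayscale(rgba: Uint8ClampedArray, width: number, height: number): Uint8ClampedArray {
  const gray = new Uint8ClampedArray(width * height);
  for (let i = 0; i < width * height; i++) {
    const o = i * 4;
    gray[i] = 0.299 * rgba[o] + 0.587 * rgba[o + 1] + 0.114 * rgba[o + 2];
  }
  return gray;
}
```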

JSARToolkit

In 2011, Billinghurst & Kato's ARToolkit was ported to JavaScript.
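ARToolkit-style tracking looks for square, black-bordered markers: the camera image is thresholded, the dark pixels are grouped into connected regions, and those regions are then tested for square outlines. The sketch below covers only the first two stages. It is not JSARToolkit's API: `binarize` and `labelRegions` are hypothetical names, and the threshold of 100 and 4-connectivity are illustrative assumptions.

```typescript
// Mark "dark enough" pixels (candidate marker borders) with 1, others with 0.
function binarize(gray: Uint8ClampedArray, threshold = 100): Uint8Array {
  const mask = new Uint8Array(gray.length);
  for (let i = 0; i < gray.length; i++) {
    mask[i] = gray[i] < threshold ? 1 : 0;
  }
  return mask;
}

// Label 4-connected regions of 1s with an explicit-stack flood fill.
// labels[i] is 0 for background, otherwise the 1-based region number.
function labelRegions(
  mask: Uint8Array,
  width: number,
  height: number,
): { labels: Int32Array; count: number } {
  const labels = new Int32Array(mask.length);
  const stack: number[] = [];
  let count = 0;
  for (let start = 0; start < mask.length; start++) {
    if (mask[start] !== 1 || labels[start] !== 0) continue;
    count++;
    labels[start] = count;
    stack.push(start);
    while (stack.length > 0) {
      const p = stack.pop()!;
      const x = p % width;
      const y = (p - x) / width;
      // Left, right, up, down neighbours; -1 marks "outside the image".
      const neighbours = [
        x > 0 ? p - 1 : -1,
        x < width - 1 ? p + 1 : -1,
        y > 0 ? p - width : -1,
        y < height - 1 ? p + width : -1,
      ];
      for (const q of neighbours) {
        if (q >= 0 && mask[q] === 1 && labels[q] === 0) {
          labels[q] = count;
          stack.push(q);
        }
      }
    }
  }
  return { labels, count };
}
```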

Enter WebRTC's getUserMedia()

Some claim its latency makes the web unusable for AR. But here are the numbers measured on a Pixel phone: the gap is 100-200 ms, with a maximum difference of ~200 ms.

200-250 ms - camera stream in a native AR app

350-400 ms - gUM stream in a web app
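Requesting the camera itself is a single call. The sketch below asks for the rear camera at a modest resolution; `CameraConstraints`, `MediaDevicesLike`, `rearCameraConstraints` and `openRearCamera` are hypothetical names, and 640×480 is an assumed default. In a page you would pass `navigator.mediaDevices`, which browsers only expose in secure contexts.

```typescript
// The constraints object this sketch passes to getUserMedia.
interface CameraConstraints {
  audio: boolean;
  video: {
    facingMode: string;
    width: { ideal: number };
    height: { ideal: number };
  };
}

// Anything with a getUserMedia method, e.g. navigator.mediaDevices in a page.
interface MediaDevicesLike<S> {
  getUserMedia(constraints: CameraConstraints): Promise<S>;
}

// Prefer the rear ("environment") camera at a modest resolution. "ideal"
// values are hints: the browser picks the closest mode the camera supports.
function rearCameraConstraints(width = 640, height = 480): CameraConstraints {
  return {
    audio: false,
    video: { facingMode: "environment", width: { ideal: width }, height: { ideal: height } },
  };
}

// navigator.mediaDevices is undefined where gUM is unsupported or the page is
// not a secure context, so treat a missing object as "no camera".
async function openRearCamera<S>(devices: MediaDevicesLike<S> | undefined): Promise<S> {
  if (!devices) throw new Error("getUserMedia is not available here");
  return devices.getUserMedia(rearCameraConstraints());
}
```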

WebRTC's getUserMedia()

FAST feature detection & Tigerstail in 2012
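FAST ("Features from Accelerated Segment Test") calls a pixel a corner when enough contiguous pixels on a 16-pixel circle of radius 3 around it are all brighter, or all darker, than the centre by a threshold `t`. Below is a plain, unoptimised sketch of the FAST-9 segment test (9 contiguous pixels). It omits the high-speed pre-test and non-maximum suppression a production detector would add; `isFastCorner`, `detectFast` and the default `t = 20` are assumptions.

```typescript
// The 16 pixels of a Bresenham circle of radius 3, clockwise from the top.
const CIRCLE: ReadonlyArray<readonly [number, number]> = [
  [0, -3], [1, -3], [2, -2], [3, -1], [3, 0], [3, 1], [2, 2], [1, 3],
  [0, 3], [-1, 3], [-2, 2], [-3, 1], [-3, 0], [-3, -1], [-2, -2], [-1, -3],
];

// Segment test at (x, y). The caller must keep (x, y) at least 3 pixels
// away from every image edge.
function isFastCorner(gray: Uint8ClampedArray, width: number, x: number, y: number, t: number, n = 9): boolean {
  const centre = gray[y * width + x];
  // +1 = brighter than centre + t, -1 = darker than centre - t, 0 = similar.
  const states = CIRCLE.map(([dx, dy]) => {
    const v = gray[(y + dy) * width + (x + dx)];
    return v > centre + t ? 1 : v < centre - t ? -1 : 0;
  });
  for (const target of [1, -1]) {
    let run = 0;
    // Scan 16 + n - 1 positions so a run may wrap past the end of the circle.
    for (let i = 0; i < 16 + n - 1; i++) {
      run = states[i % 16] === target ? run + 1 : 0;
      if (run >= n) return true;
    }
  }
  return false;
}

// Brute-force FAST-n over every pixel far enough from the border.
function detectFast(gray: Uint8ClampedArray, width: number, height: number, t = 20, n = 9): Array<{ x: number; y: number }> {
  const corners: Array<{ x: number; y: number }> = [];
  for (let y = 3; y < height - 3; y++) {
    for (let x = 3; x < width - 3; x++) {
      if (isFastCorner(gray, width, x, y, t, n)) corners.push({ x, y });
    }
  }
  return corners;
}
```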

Tracking.js released in 2012
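One technique tracking.js offers is colour tracking: find the pixels that match a target colour and report where they are. The sketch below shows only the core idea in plain TypeScript; it is not tracking.js's API, it returns a single bounding box rather than separate blobs, and `trackColor`, `isMagenta` and the colour thresholds are hypothetical.

```typescript
// Axis-aligned box around every matching pixel, plus how many matched.
interface ColorBox {
  x: number;
  y: number;
  width: number;
  height: number;
  count: number;
}

type ColorMatcher = (r: number, g: number, b: number) => boolean;

// Example matcher: strongly magenta pixels (high red and blue, low green).
const isMagenta: ColorMatcher = (r, g, b) => r > 150 && b > 150 && g < 100;

// Scan an RGBA frame and return one bounding box around all matching pixels,
// or null if fewer than minCount pixels (and at least one) match.
function trackColor(rgba: Uint8ClampedArray, width: number, height: number, matches: ColorMatcher, minCount = 1): ColorBox | null {
  let minX = width, minY = height, maxX = -1, maxY = -1, count = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const o = (y * width + x) * 4;
      if (!matches(rgba[o], rgba[o + 1], rgba[o + 2])) continue;
      count++;
      minX = Math.min(minX, x);
      minY = Math.min(minY, y);
      maxX = Math.max(maxX, x);
      maxY = Math.max(maxY, y);
    }
  }
  if (count === 0 || count < minCount) return null;
  return { x: minX, y: minY, width: maxX - minX + 1, height: maxY - minY + 1, count };
}
```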

AR.js released in 2017

Transpiling OpenCV

This brings a more general computer vision toolkit to the web!
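OpenCV's JavaScript build, opencv.js, is compiled from the C++ sources with Emscripten and exposes much of the API on a global `cv` object. The sketch below runs grayscale conversion and Canny edge detection from one canvas to another using `cv.imread`, `cv.cvtColor`, `cv.Canny` and `cv.imshow`. The `CvLike` interface is a hypothetical stand-in listing only those calls (so the function can be exercised without loading opencv.js), and the canvas ids and thresholds 50/150 are assumptions.

```typescript
// Only the parts of opencv.js's global `cv` object that this sketch calls.
interface Mat {
  delete(): void;
}
interface CvLike {
  Mat: new () => Mat;
  COLOR_RGBA2GRAY: number;
  imread(source: string): Mat; // id of a canvas or <img> element
  cvtColor(src: Mat, dst: Mat, code: number): void;
  Canny(src: Mat, dst: Mat, threshold1: number, threshold2: number): void;
  imshow(canvasId: string, mat: Mat): void;
}

// Read a canvas, convert to grayscale, run Canny edge detection, and draw the
// result into another canvas. opencv.js Mats live on the Emscripten heap and
// are not garbage collected, so each one is freed explicitly with delete().
function detectEdges(cv: CvLike, inputId: string, outputId: string, low = 50, high = 150): void {
  const src = cv.imread(inputId);
  const gray = new cv.Mat();
  const edges = new cv.Mat();
  try {
    cv.cvtColor(src, gray, cv.COLOR_RGBA2GRAY);
    cv.Canny(gray, edges, low, high);
    cv.imshow(outputId, edges);
  } finally {
    src.delete();
    gray.delete();
    edges.delete();
  }
}
```

With opencv.js loaded in a page, the call would be `detectEdges(cv, "input", "output")`, where both ids refer to canvases in the page.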

Demo Time!

But there's no gUM on iOS?

For vision-based functionality we fall back to Visual Search.

For location-based apps we fall back to 360°/VR (like Pokémon Go with the camera off).

And remember: “video see-through” is not the only form of AR.
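The fallback rules above reduce to a small decision. A sketch, assuming the app detects gUM support by checking for `navigator.mediaDevices.getUserMedia`; `hasGetUserMedia`, `chooseMode` and the mode names are hypothetical, not awe.media's implementation.

```typescript
type ExperienceKind = "vision" | "location";
type Mode = "camera-ar" | "visual-search" | "360-vr";

// Only the part of Navigator this sketch inspects.
interface NavigatorLike {
  mediaDevices?: { getUserMedia?: unknown };
}

// Feature-detect gUM rather than sniffing the user agent.
function hasGetUserMedia(nav: NavigatorLike | undefined): boolean {
  return typeof nav?.mediaDevices?.getUserMedia === "function";
}

// Live camera AR when gUM exists; otherwise vision-based experiences fall back
// to Visual Search and location-based ones to 360°/VR.
function chooseMode(kind: ExperienceKind, gumAvailable: boolean): Mode {
  if (gumAvailable) return "camera-ar";
  return kind === "vision" ? "visual-search" : "360-vr";
}
```

In a page this would be used as `chooseMode("location", hasGetUserMedia(navigator))`.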
